7. Reordering Method

Denote the following.

  • \(V\) is a \(n\) x \(n\) weakness matrix where \(v_{ij}\) represents some dominance over the i-th team by the j-th team.

  • \(D\) is \(V^T\) and referred to as the strength matrix where \(d_{ij}\) represents dominance of the i-th team over the j-th team.

  • \(R\) is the rank-differential matrix that is \(n\) x \(n\) and an upper triangle matrix of the following form.

\(R=\left[\begin{matrix}0 & 1 & 2 & \dots & n - 1 \\ & & 1 & \dots & n - 2 \\ & & \ddots & \ddots & \vdots \\ & & & \ddots & 1 \\ & & & & 0 \end{matrix}\right]\)

The idea of the Reordering Method is to find some \(Q\) to reorder/transform \(D\) to be as close as possible to \(R\). \(Q\) is referred to as the permutation matrix. The minimization problem is formulated as follows.

\(\underset{Q}{\min}||Q^TDQ - R||\)

\(Q\) is a \(n\) x \(n\) sparse matrix with \(q_{ij} \in {0, 1}\). Also, \(Q\) is specified by a candidate ranking vector \(r\). Each index and value of \(r\) determines \(Q\). For example, \(r = [4, 1, 3, 2, 0]\) will create a \(Q\) such that

  • \(r_0 = 4\): column 0, row 4 will be 1,

  • \(r_1 = 1\): column 1, row 1 will be 1,

  • \(r_2 = 3\): column 2, row 3 will be 1,

  • \(r_3 = 2\): column 3, row 2 will be 1, and

  • \(r_4 = 0\): column 4, row 0 will be 1.

\(Q(r=[4, 1, 3, 2, 0]) = \left[\begin{matrix}0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{matrix}\right]\)

For a matrix of size \(n\), there will be \(n!\) permutations of \(r\) possible. Note

  • \(5! = 120\) and

  • \(10! = 3,628,800\).

It is not computationally tractable to exhaustively try all permutations of \(r\) to find the solution for the minimization problem. Instead, exploration and exploitation theoretical frameworks like genetic algorithms are advised to find the (near-) optimal solution.

7.1. NCAAF ACC 2005

This data \(V\) comes from the NCAAF ACC 2005 result, and is the point differential. The order of the columns and rows are Duke, Miami, UNC, UVA and VT. \(D\) is \(V^T\) and \(R\) is given as follows. What we want to do is to find a near-optimal \(Q\) to permute \(D\) into \(R\) as closely as possible.

[1]:
import numpy as np

V = np.array([
    [0, 45, 3, 31, 45],
    [0, 0, 0, 0, 0],
    [0, 18, 0, 0, 27],
    [0, 8, 2, 0, 38],
    [0, 20, 0, 0, 0]
])

D = V.T

R = np.array([
    [0, 1, 2, 3, 4],
    [0, 0, 1, 2, 3],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0]
])

Let’s normalize \(D\) and \(R\) over the sum of all elements.

[2]:
D = D / D.sum()
D
[2]:
array([[0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.18987342, 0.        , 0.07594937, 0.03375527, 0.08438819],
       [0.01265823, 0.        , 0.        , 0.00843882, 0.        ],
       [0.13080169, 0.        , 0.        , 0.        , 0.        ],
       [0.18987342, 0.        , 0.11392405, 0.16033755, 0.        ]])
[3]:
R = R / R.sum()
R
[3]:
array([[0.  , 0.05, 0.1 , 0.15, 0.2 ],
       [0.  , 0.  , 0.05, 0.1 , 0.15],
       [0.  , 0.  , 0.  , 0.05, 0.1 ],
       [0.  , 0.  , 0.  , 0.  , 0.05],
       [0.  , 0.  , 0.  , 0.  , 0.  ]])

Here’s a nifty function to produce \(Q\) from \(r\).

[4]:
def get_Q(r):
    if 0 not in r:
        r = [i - 1 for i in r]

    n = len(r)
    Q = np.zeros((n, n))
    for col, row in enumerate(r):
        Q[row][col] = 1
    return Q

For \(r=[5, 1, 4, 3, 2]\), let’s see what \(Q^TDQ\) looks like. Note that \(r\) as specified here states that - Duke is 4, - Miami is 1, - UNC is 4, - UVA is 3, and - VT is 2.

[5]:
Q = get_Q([5, 1, 4, 3, 2])
Q.T.dot(D).dot(Q)
[5]:
array([[0.        , 0.18987342, 0.16033755, 0.11392405, 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.13080169, 0.        , 0.        , 0.        ],
       [0.        , 0.01265823, 0.00843882, 0.        , 0.        ],
       [0.08438819, 0.18987342, 0.03375527, 0.07594937, 0.        ]])

This result is what \(Q^TDQ - R\) looks like.

[6]:
Q.T.dot(D).dot(Q) - R
[6]:
array([[ 0.        ,  0.13987342,  0.06033755, -0.03607595, -0.2       ],
       [ 0.        ,  0.        , -0.05      , -0.1       , -0.15      ],
       [ 0.        ,  0.13080169,  0.        , -0.05      , -0.1       ],
       [ 0.        ,  0.01265823,  0.00843882,  0.        , -0.05      ],
       [ 0.08438819,  0.18987342,  0.03375527,  0.07594937,  0.        ]])

This result is what \(||Q^TDQ - R||\) looks like.

[7]:
np.linalg.norm(Q.T.dot(D).dot(Q) - R)
[7]:
0.4265304195816933

Interestingly, when VT is specified as 1 and Miami is specified as 2, then the best \(Q\) is found!

[8]:
Q = get_Q([5, 2, 4, 3, 1])
np.linalg.norm(Q.T.dot(D).dot(Q) - R)
[8]:
0.14836639449076508

Here’s an exhuastive search over all the permutations of \(r\). It’s disturbing that the second best solution is when Duke is second and Miami is last!

[9]:
from itertools import permutations, chain
import pandas as pd

candidates = permutations(range(D.shape[0]), D.shape[0])
candidates = map(lambda p: (p, get_Q(p)), candidates)
candidates = map(lambda tup: (tup[0], [np.linalg.norm(tup[1].T.dot(D).dot(tup[1]) - R)]), candidates)
candidates = map(lambda tup: list(chain(*tup)), candidates)

dist_df = pd.DataFrame(candidates, columns=['r0', 'r1', 'r2', 'r3', 'r4', 'distance'])\
    .sort_values(['distance'])\
    .reset_index(drop=True)

dist_df.head(10)
[9]:
r0 r1 r2 r3 r4 distance
0 4 1 3 2 0 0.148366
1 1 4 3 2 0 0.173290
2 4 1 2 3 0 0.180447
3 1 4 2 3 0 0.201440
4 4 3 1 2 0 0.234359
5 4 1 3 0 2 0.235257
6 1 3 4 2 0 0.241453
7 1 4 3 0 2 0.251720
8 4 1 2 0 3 0.254222
9 4 2 1 3 0 0.267958

Just out of curiosity, we can also look at the distribution of the top 10 candidates.

[10]:
dist_df.head(10).r0.value_counts()
[10]:
4    6
1    4
Name: r0, dtype: int64
[11]:
dist_df.head(10).r1.value_counts()
[11]:
1    4
4    3
3    2
2    1
Name: r1, dtype: int64
[12]:
dist_df.head(10).r2.value_counts()
[12]:
3    4
2    3
1    2
4    1
Name: r2, dtype: int64
[13]:
dist_df.head(10).r3.value_counts()
[13]:
2    4
0    3
3    3
Name: r3, dtype: int64
[14]:
dist_df.head(10).r4.value_counts()
[14]:
0    7
2    2
3    1
Name: r4, dtype: int64