A confusion matrix
is a practical and conceptually simple tool to evaluate a classification model.
So we need to honour it with an equally simple way to compute it, as Gauss would have done in the past: without the magic of from sklearn.metrics import confusion_matrix, using only simple linear algebra operations.
A confusion matrix is the matrix multiplication of the true and predicted labels, both encoded as one-hot vectors.
If we have the true labels of 4 observations in vector \(\boldsymbol y = [1, 0, 2, 1]\), and 3 different classes (i.e. 0, 1 and 2), their one-hot encoding will be:
\[ \boldsymbol T = \begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{bmatrix}~\in~\{0,1\}^{4~\times~3} \]
Some classification model gives us the predicted label for each observation in the vector \(\hat{\boldsymbol y} = [2, 0, 2, 0]\); by the same logic as above, its one-hot encoding will be:
\[ \hat{\boldsymbol T} = \begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{bmatrix}~\in~\{0,1\}^{4~\times~3} \] We now have everything to compute the confusion matrix; it will be \(\boldsymbol T^{\top}\hat{\boldsymbol T}~\in~\mathbb{Z}_{0+}^{3\times3}\). So again,
A confusion matrix is the matrix multiplication of the true and predicted labels, both encoded as one-hot vectors.
\[ \boldsymbol T^{\top}\hat{\boldsymbol T} = \begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 1\\ 0 & 0 & 1 \end{bmatrix}~\in~\mathbb{Z}_{0+}^{3~\times~3} \]
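To see why this works, note that entry \((i,j)\) of \(\boldsymbol T^{\top}\hat{\boldsymbol T}\) multiplies the indicator for true class \(i\) with the indicator for predicted class \(j\) and sums over observations, so it simply counts how many observations have true label \(i\) and predicted label \(j\):
\[ (\boldsymbol T^{\top}\hat{\boldsymbol T})_{ij} = \sum_{n=1}^{4} T_{ni}\,\hat{T}_{nj} = \#\{n : y_n = i ~\text{and}~ \hat{y}_n = j\} \]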
As you can see, the confusion matrix correctly summarizes the information of both vectors.
\[ \boldsymbol y = [1,0,2,1] \\ \hat{\boldsymbol y} = [2,0,2,0] \]
The model misclassifies both observations whose true label is 1, producing two errors in the 1-observations: one predicted as a 2 and the other as a 0; look at the second row of the matrix.

import numpy as np
We need two steps to compute our confusion matrix. First, we need a way to transform a vector \(\boldsymbol v\) with k classes into its one-hot-encoded version, v_one_hot = one_hot_encoding(v):
def one_hot_encoding(v):
    '''Return the one-hot encoding matrix for a k-class label vector.'''
    # np.unique(v).size breaks when v does not contain every class
    # (as happens with y_pred below), so derive k from the largest
    # label instead, assuming the classes are labelled 0, 1, ..., k-1
    num_classes = np.max(v) + 1
    return np.eye(num_classes, dtype=int)[v]
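As a quick sanity check, applying it to the true labels from our example above reproduces \(\boldsymbol T\):

y = np.array([1, 0, 2, 1])
one_hot_encoding(y)
# array([[0, 1, 0],
#        [1, 0, 0],
#        [0, 0, 1],
#        [0, 1, 0]])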
Second, compute the confusion matrix, \(~\boldsymbol T^{\top}\hat{\boldsymbol T}~\in~\mathbb{Z}_{0+}^{K\times K}~\), for k classes; there are many ways of doing it with numpy, as you can see in the following code. Below I used the canonical notation to name the true labels (y) and the predicted ones (y_pred):
# 1st option: Using the matrix multiplication '@' operator
one_hot_encoding(y).T @ one_hot_encoding(y_pred)
# 2nd option: Using np.dot()
np.dot(one_hot_encoding(y).T, one_hot_encoding(y_pred))
# 3rd option: Using np.matmul()
np.matmul(one_hot_encoding(y).T, one_hot_encoding(y_pred))
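Putting everything together with the example labels (a minimal sketch; any of the three options above gives the same result):

y = np.array([1, 0, 2, 1])       # true labels
y_pred = np.array([2, 0, 2, 0])  # predicted labels
one_hot_encoding(y).T @ one_hot_encoding(y_pred)
# array([[1, 0, 0],
#        [1, 0, 1],
#        [0, 0, 1]])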
And we are done! Of course, you can always get your confusion matrix from your favourite store ;)
from sklearn.metrics import confusion_matrix
confusion_matrix(y, y_pred)
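As a final check, sklearn's result should match the one we computed by hand (assuming y and y_pred are defined as above):

np.array_equal(confusion_matrix(y, y_pred),
               one_hot_encoding(y).T @ one_hot_encoding(y_pred))
# True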
That’s the way computers talk to each other.