Confusion Matrix: a table used to describe the performance of a classification model.
For example, suppose our classification model made 100 predictions on whether a person has a disease or not.
- Out of the 100 predictions, the model predicted yes 70 times and no 30 times.
- In the actual results, 55 people have the disease and 45 do not.
Let’s build the confusion matrix, but first define the basic terms:
True Positive (TP): the actual label is yes and the model also predicts yes.
True Negative (TN): the actual label is no and the model predicts no.
False Positive (FP): the actual label is no but the model predicts yes. This is called a Type I error.
False Negative (FN): the actual label is yes but the model predicts no. This is called a Type II error.
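From the example above (70 predicted yes, 30 predicted no, 55 actual yes, 45 actual no), the matrix works out to TP = 45, FN = 10, FP = 25, TN = 20, which are the counts used in the calculations below. As a quick sanity check, here is a minimal Python sketch (using scikit-learn; the arrays are constructed only to reproduce these counts, they are not real data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Labels: 1 = has the disease ("yes"), 0 = does not ("no").
# 55 people actually have the disease, 45 do not.
y_true = np.array([1] * 55 + [0] * 45)

# The model predicts yes for 45 of the 55 sick people (TP) and no for 10 (FN);
# it predicts yes for 25 of the 45 healthy people (FP) and no for 20 (TN),
# which gives the 70 yes / 30 no predictions stated above.
y_pred = np.array([1] * 45 + [0] * 10 + [1] * 25 + [0] * 20)

print(confusion_matrix(y_true, y_pred, labels=[1, 0]))
# [[45 10]     rows    = actual    (yes, no)
#  [25 20]]    columns = predicted (yes, no)
```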
Metrics that can be calculated from the confusion matrix:
- Accuracy: (TP + TN)/total = (45 + 20)/100 = 0.65
- Misclassification Rate: (FP + FN)/total = (25 + 10)/100 = 0.35, or (1 − Accuracy) = 0.35
- True Positive Rate: TP/actual_yes = 45/55
- True Negative Rate: TN/actual_no = 20/45
- False Positive Rate: FP/actual_no = 25/45
- False Negative Rate: FN/actual_yes = 10/55
- Precision: TP/predicted_yes = 45/70
- Recall: TP/actual_yes = 45/55 (same as the True Positive Rate)
- Prevalence: actual_yes/total = 55/100
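Plugging the example counts (TP = 45, TN = 20, FP = 25, FN = 10) into the formulas above, a small self-contained sketch (variable names are my own):

```python
tp, tn, fp, fn = 45, 20, 25, 10
total = tp + tn + fp + fn               # 100

accuracy   = (tp + tn) / total          # 0.65
error_rate = (fp + fn) / total          # 0.35  (misclassification rate, 1 - accuracy)
tpr        = tp / (tp + fn)             # 45/55 ≈ 0.82  (also the recall)
tnr        = tn / (tn + fp)             # 20/45 ≈ 0.44
fpr        = fp / (fp + tn)             # 25/45 ≈ 0.56
fnr        = fn / (fn + tp)             # 10/55 ≈ 0.18
precision  = tp / (tp + fp)             # 45/70 ≈ 0.64
prevalence = (tp + fn) / total          # 55/100 = 0.55
```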
Other metrics that can be useful for a classification model:
- Null Error Rate: the error rate you would get by always predicting the majority class. In the example above, always predicting yes gives a null error rate of 45/100, since 45 people do not have the disease. This is a useful baseline to compare our classifier against.
- F1 Score: the harmonic mean of precision and recall, 2 × (precision × recall) / (precision + recall).
- ROC Curve: a plot of the true positive rate against the false positive rate at different classification thresholds.
- Cohen’s Kappa: a measure of agreement between the predicted and actual labels, corrected for agreement expected by chance.
- Important point for newcomers: accuracy is not always a good metric for predictive models. If one class dominates 99% of the time, a model that only ever predicts that class will reach 99% accuracy. Precision and recall are better metrics in such cases, though precision can also be skewed by a highly imbalanced class distribution. Keep learning and be innovative.
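To make the last point concrete, here is a small illustrative sketch with a made-up 99:1 class split: a "classifier" that always predicts the majority class scores 99% accuracy while never finding a single positive case.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical data: 990 negatives, 10 positives (99% of samples are class 0).
y_true = np.array([0] * 990 + [1] * 10)
y_pred = np.zeros(1000, dtype=int)   # always predict the majority class

print(accuracy_score(y_true, y_pred))                    # 0.99 -> looks impressive
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0  -> misses every positive
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0
```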