Confusion Matrix in ML
A Confusion Matrix is a performance measurement tool for evaluating the predictions of a classification model. It is especially useful for understanding how well a model performs on a binary or multi-class classification problem.
The matrix compares the predicted class labels with the actual class labels, revealing the types of errors the model makes. It is represented as a square matrix, where:
- Rows represent the actual classes.
- Columns represent the predicted classes.
Structure of a Confusion Matrix
For a binary classification problem, the confusion matrix looks like this:
| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
- True Positives (TP): The model correctly predicted the positive class.
- True Negatives (TN): The model correctly predicted the negative class.
- False Positives (FP): The model predicted the positive class, but the actual class was negative (also known as a Type I error).
- False Negatives (FN): The model predicted the negative class, but the actual class was positive (also known as a Type II error).
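To make the layout concrete, here is a minimal sketch using scikit-learn's `confusion_matrix` (assuming scikit-learn is installed; the label lists are made-up examples). scikit-learn follows the same convention as the table above: rows are actual classes, columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive, 0 = negative
y_actual    = [1, 1, 1, 0, 0, 0, 1, 0]
y_predicted = [1, 0, 1, 0, 1, 0, 1, 0]

# Passing labels=[1, 0] orders the matrix to match the table:
# [[TP, FN],
#  [FP, TN]]
cm = confusion_matrix(y_actual, y_predicted, labels=[1, 0])
tp, fn, fp, tn = cm.ravel()

print(cm)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```

Note that with scikit-learn's default label ordering (`labels=[0, 1]`), the matrix comes back as `[[TN, FP], [FN, TP]]` instead, so it is worth fixing the label order explicitly when reading off the four counts.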
Example of a Confusion Matrix
Suppose we have a model predicting whether an email is spam (positive) or not spam (negative). Here's a sample confusion matrix:
| | Predicted Spam | Predicted Not Spam |
| --- | --- | --- |
| Actual Spam | 50 (TP) | 10 (FN) |
| Actual Not Spam | 5 (FP) | 100 (TN) |
From this matrix:
- True Positives (TP): 50 emails correctly classified as spam.
- False Negatives (FN): 10 spam emails misclassified as not spam.
- False Positives (FP): 5 non-spam emails misclassified as spam.
- True Negatives (TN): 100 emails correctly classified as not spam.
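As a quick sketch of how these counts translate into an overall score, the following plain-Python snippet (using the counts from the spam example above) recovers the model's accuracy directly from the matrix:

```python
# Counts from the spam example above
tp, fn, fp, tn = 50, 10, 5, 100

total = tp + fn + fp + tn   # 165 emails in total
correct = tp + tn           # 150 predictions on the matrix diagonal are correct
accuracy = correct / total

print(f"Correct: {correct}/{total}")
print(f"Accuracy: {accuracy:.3f}")  # -> 0.909
```

The diagonal of the matrix (TP and TN) holds the correct predictions, so accuracy is simply the diagonal sum divided by the total number of examples.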