Confusion Matrix

Description

A confusion matrix represents “the relative frequencies with which each of a number of stimuli is mistaken for each of the others by a person in a task requiring recognition or identification of stimuli” (A. M. Colman, A Dictionary of Psychology, 2008). Each row represents the predicted label of an instance, and each column represents the observed label of that instance. The number in each (row, column) cell is the total number of instances with predicted label “row” that were observed as having label “column”.
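
To make this counting convention concrete, the table of counts can be modelled in plain Ruby as a nested hash, keyed first by predicted label and then by observed label. This is only an illustrative sketch, not the gem's internal representation:

counts = Hash.new { |h, k| h[k] = Hash.new(0) }

# One instance predicted :pos but observed :neg
counts[:pos][:neg] += 1

counts[:pos][:neg]   #=> 1
counts[:neg][:pos]   #=> 0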

A two-label example is:

Observed        Observed      | 
Positive        Negative      | Predicted
------------------------------+------------
    a               b         | Positive
    c               d         | Negative

Here the value:

a

is the number of true positives (those predicted positive and observed positive)

b

is the number of false negatives (those predicted positive but observed negative)

c

is the number of false positives (those predicted negative but observed positive)

d

is the number of true negatives (those predicted negative and observed negative)

From this matrix we can calculate statistics like:

true positive rate

a/(a+b)

positive precision

a/(a+c)
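
These formulas are easy to check in plain Ruby. The sketch below is not part of the library; it uses the cell counts from the Example section further down (a = 10, b = 3, c = 5):

a, b, c = 10, 3, 5

true_positive_rate = a.to_f / (a + b)   #=> 0.7692307692307693
positive_precision = a.to_f / (a + c)   #=> 0.6666666666666666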

The implementation supports confusion matrices with more than two labels, and hence most statistics are calculated with reference to a named label. When more than two labels are in use, the statistics are calculated as if the named label were positive and all the other labels were grouped together as negative.

For example, with three labels:

 Observed        Observed      Observed     | 
   Red            Blue          Green       | Predicted
--------------------------------------------+------------
    a               b             c         | Red
    d               e             f         | Blue
    g               h             i         | Green

We can calculate:

true red rate

a/(a+b+c)

red precision

a/(a+d+g)
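
Written out in plain Ruby for the red label, with made-up counts and the same layout (rows predicted, columns observed), the one-vs-rest grouping looks like this sketch:

m = {
  red:   { red: 7, blue: 1, green: 2 },  # a, b, c
  blue:  { red: 2, blue: 8, green: 0 },  # d, e, f
  green: { red: 1, blue: 1, green: 8 }   # g, h, i
}

a = m[:red][:red]
true_red_rate = a.to_f / m[:red].values.sum   # a/(a+b+c): share of predicted reds observed as red
observed_red = m[:red][:red] + m[:blue][:red] + m[:green][:red]
red_precision = a.to_f / observed_red         # a/(a+d+g): share of observed reds predicted as red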

Example

The following example creates a two-label confusion matrix, prints a few statistics and displays the matrix as a table.

require 'confusion_matrix'

cm = ConfusionMatrix.new :pos, :neg
cm.add_for(:pos, :pos, 10)           # 10 true positives (the third argument is a count)
3.times { cm.add_for(:pos, :neg) }   # 3 false negatives
20.times { cm.add_for(:neg, :neg) }  # 20 true negatives
5.times { cm.add_for(:neg, :pos) }   # 5 false positives

puts "Precision: #{cm.precision}"
puts "Recall: #{cm.recall}"
puts "MCC: #{cm.matthews_correlation}"
puts
puts(cm.to_s)

Output:

Precision: 0.6666666666666666
Recall: 0.7692307692307693
MCC: 0.5524850114241865

Observed |
pos neg  | Predicted
---------+----------
 10   3  | pos
  5  20  | neg
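
As a cross-check, the printed statistics can be recomputed by hand from the four cells of this table, using the a, b, c, d naming from the Description (a = 10, b = 3, c = 5, d = 20) and the standard formula for the Matthews correlation coefficient:

a, b, c, d = 10, 3, 5, 20

precision = a.to_f / (a + c)   #=> 0.6666666666666666
recall    = a.to_f / (a + b)   #=> 0.7692307692307693
mcc = (a * d - b * c) / Math.sqrt((a + b) * (a + c) * (d + b) * (d + c))
#=> 0.5524850114241865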