How to compare two histograms

For a small project, I need to compare one image with another to determine whether the images are approximately the same. The images are smallish, varying from 25 to 100px across. The images are meant to be of the same picture data but are subtly different, so a simple pixel equality check won't work. Consider these two possible scenarios:

    1. A security (CCTV) camera in a museum looking at an exhibit: we want to quickly see if two different video frames show the same scene, but slight differences in lighting and camera focus mean they won't be identical.
    2. A picture of a vector computer GUI icon rendered at 64x64 compared to the same icon rendered at 48x48 (but both images would be scaled down to 32x32 so the histograms have the same total pixel count).

    I've decided to represent each image using three 1D histograms, one for each RGB channel. It's safe for me to use only colour and to ignore texture and edge histograms. (An alternative approach uses a single 3D histogram for each image, but I'm avoiding that as it adds extra complexity.) I will therefore need to compare the histograms to see how similar they are, and if the similarity measure passes some threshold value I can say with confidence that the respective images are visually the same. I would compare each image's corresponding channel histograms: image 1's red histogram with image 2's red histogram, then the blue histograms, then the green histograms. I would not compare image 1's red histogram with image 2's blue histogram; that would just be silly.

    Let's say I have these three histograms, which represent a summary of the red RGB channel for three images (using 5 bins for 7-pixel images for simplicity):

    H1           H2           H3
      X          X                    X
      X   X      X       X            X
    X X   X X    X X   X X    X X X X X
    0 1 2 3 4    0 1 2 3 4    0 1 2 3 4

    H1 = [ 1, 3, 0, 2, 1 ]
    H2 = [ 3, 1, 0, 1, 2 ]
    H3 = [ 1, 1, 1, 1, 3 ]

    Image 1 (H1) is my reference image, and I want to see if Image 2 (H2) and/or Image 3 (H3) is similar to Image 1. Note that in this example, Image 2 is similar to Image 1, but Image 3 is not.

    When I did a cursory search for "histogram difference" algorithms (at least those I could understand), I found that a popular approach was to just sum the differences between each bin. However, this approach often fails because it weighs all bin differences the same.

    To demonstrate the problem with this approach, consider this C# code:

    Int32[] image1RedHistogram = new Int32[] { 1, 3, 0, 2, 1 };
    Int32[] image2RedHistogram = new Int32[] { 3, 1, 0, 1, 2 };
    Int32[] image3RedHistogram = new Int32[] { 1, 1, 1, 1, 3 };

    // Sums the absolute per-bin differences of two equal-length histograms.
    Int32 GetDifference(Int32[] x, Int32[] y) {
        Int32 sumOfDifferences = 0;
        for( int i = 0; i < x.Length; i++ ) {
            sumOfDifferences += Math.Abs( x[i] - y[i] );
        }
        return sumOfDifferences;
    }

    The output of which is:

    GetDifference( image1RedHistogram, image2RedHistogram ) == 6
    GetDifference( image1RedHistogram, image3RedHistogram ) == 6

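    This is incorrect: both comparisons give the same score, even though Image 2 is similar to Image 1 and Image 3 is not.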
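The arithmetic behind that result can be checked with a short Python port of the same sum-of-differences idea. This is only a sketch for illustration: the function name `get_difference` and the per-bin lists are my own additions, and the histogram values are the H1, H2 and H3 arrays given earlier.

```python
def get_difference(x, y):
    """Sum of absolute per-bin differences between two equal-length histograms."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(x, y))


h1 = [1, 3, 0, 2, 1]  # reference image
h2 = [3, 1, 0, 1, 2]
h3 = [1, 1, 1, 1, 3]

# Per-bin absolute differences, to show where each total comes from.
per_bin_12 = [abs(a - b) for a, b in zip(h1, h2)]
per_bin_13 = [abs(a - b) for a, b in zip(h1, h3)]

print(per_bin_12, get_difference(h1, h2))  # [2, 2, 0, 1, 1] 6
print(per_bin_13, get_difference(h1, h3))  # [0, 2, 1, 1, 2] 6
```

The per-bin differences are spread differently across the bins, yet both totals are 6, which is why this measure cannot tell the two cases apart.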

    Is there a way to determine the difference between two histograms that takes into account the shape of the distribution?


    Goal

    In this tutorial you will learn how to:

    • Use the function cv::compareHist to get a numerical parameter that expresses how well two histograms match each other.
    • Use different metrics to compare histograms

    Theory

    • To compare two histograms ( \(H_{1}\) and \(H_{2}\) ), first we have to choose a metric ( \(d(H_{1}, H_{2})\)) to express how well both histograms match.
    • OpenCV implements the function cv::compareHist to perform a comparison. It also offers 4 different metrics to compute the matching:
      1. Correlation ( CV_COMP_CORREL )

        \[d(H_1,H_2) = \frac{\sum_I (H_1(I) - \bar{H_1}) (H_2(I) - \bar{H_2})}{\sqrt{\sum_I(H_1(I) - \bar{H_1})^2 \sum_I(H_2(I) - \bar{H_2})^2}}\]

        where

        \[\bar{H_k} = \frac{1}{N} \sum _J H_k(J)\]

        and \(N\) is the total number of histogram bins.
      2. Chi-Square ( CV_COMP_CHISQR )

        \[d(H_1,H_2) = \sum _I \frac{\left(H_1(I)-H_2(I)\right)^2}{H_1(I)}\]

      3. Intersection ( CV_COMP_INTERSECT )

        \[d(H_1,H_2) = \sum _I \min (H_1(I), H_2(I))\]

      4. Bhattacharyya distance ( CV_COMP_BHATTACHARYYA )

        \[d(H_1,H_2) = \sqrt{1 - \frac{1}{\sqrt{\bar{H_1} \bar{H_2} N^2}} \sum_I \sqrt{H_1(I) \cdot H_2(I)}}\]
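To make the four formulas concrete, here is a minimal pure-Python sketch that evaluates them on small 1D histograms (the 5-bin H1, H2 and H3 from the question at the top of the page). It is not the OpenCV implementation. The function names are hypothetical, and three edge-case choices are my own assumptions: Chi-Square skips bins where H1 is zero, correlation returns 1.0 when a histogram is completely flat, and the Bhattacharyya term is clamped at zero to absorb rounding error.

```python
import math


def correlation(h1, h2):
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1) * sum((b - m2) ** 2 for b in h2))
    # Assumption: a flat histogram (zero variance) is treated as a perfect match.
    return num / den if den > 0 else 1.0


def chi_square(h1, h2):
    # Assumption: bins where H1 is zero are skipped to avoid division by zero.
    return sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a != 0)


def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))


def bhattacharyya(h1, h2):
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    norm = math.sqrt(m1 * m2 * n * n)
    overlap = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    coefficient = overlap / norm if norm > 0 else 0.0
    # Clamp at zero: rounding can push the coefficient slightly above 1.
    return math.sqrt(max(0.0, 1.0 - coefficient))


h1 = [1, 3, 0, 2, 1]
h2 = [3, 1, 0, 1, 2]
h3 = [1, 1, 1, 1, 3]

for name, metric in [("Correlation", correlation), ("Chi-Square", chi_square),
                     ("Intersection", intersection), ("Bhattacharyya", bhattacharyya)]:
    print(name, metric(h1, h1), metric(h1, h2), metric(h1, h3))
```

Comparing a histogram with itself gives the "perfect" value for each metric, which is a quick sanity check for any implementation of these formulas.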

    Code

    • What does this program do?
      • Loads a base image and 2 test images to be compared with it.
      • Generates 1 image that is the lower half of the base image.
      • Converts the images to HSV format.
      • Calculates the H-S histogram for all the images and normalizes them so that they can be compared.
      • Compares the histogram of the base image with the 2 test histograms, with the histogram of the lower half of the base image, and with the base image histogram itself.
      • Displays the numerical matching parameters obtained.

    C++

    • Downloadable code: Click here
    • Code at a glance:

      #include "opencv2/imgcodecs.hpp"
      #include "opencv2/highgui.hpp"
      #include "opencv2/imgproc.hpp"
      #include <iostream>

      using namespace std;
      using namespace cv;

      const char* keys =
          "{ help h| | Print help message. }"
          "{ @input1 | | Path to input image 1. }"
          "{ @input2 | | Path to input image 2. }"
          "{ @input3 | | Path to input image 3. }";

      int main( int argc, char** argv )
      {
          // Load the base image and the two test images.
          // Positional arguments are looked up with their '@' prefix.
          CommandLineParser parser( argc, argv, keys );
          Mat src_base = imread( parser.get<String>("@input1") );
          Mat src_test1 = imread( parser.get<String>("@input2") );
          Mat src_test2 = imread( parser.get<String>("@input3") );
          if( src_base.empty() || src_test1.empty() || src_test2.empty() )
          {
              cout << "Could not open or find the images!\n" << endl;
              parser.printMessage();
              return -1;
          }

          // Convert to HSV
          Mat hsv_base, hsv_test1, hsv_test2;
          cvtColor( src_base, hsv_base, COLOR_BGR2HSV );
          cvtColor( src_test1, hsv_test1, COLOR_BGR2HSV );
          cvtColor( src_test2, hsv_test2, COLOR_BGR2HSV );

          // Lower half of the base image
          Mat hsv_half_down = hsv_base( Range( hsv_base.rows/2, hsv_base.rows ), Range( 0, hsv_base.cols ) );

          // Histogram parameters: bins, ranges and channels (H and S)
          int h_bins = 50, s_bins = 60;
          int histSize[] = { h_bins, s_bins };
          float h_ranges[] = { 0, 180 };
          float s_ranges[] = { 0, 256 };
          const float* ranges[] = { h_ranges, s_ranges };
          int channels[] = { 0, 1 };

          // Calculate and normalize the histograms
          Mat hist_base, hist_half_down, hist_test1, hist_test2;
          calcHist( &hsv_base, 1, channels, Mat(), hist_base, 2, histSize, ranges, true, false );
          normalize( hist_base, hist_base, 0, 1, NORM_MINMAX, -1, Mat() );
          calcHist( &hsv_half_down, 1, channels, Mat(), hist_half_down, 2, histSize, ranges, true, false );
          normalize( hist_half_down, hist_half_down, 0, 1, NORM_MINMAX, -1, Mat() );
          calcHist( &hsv_test1, 1, channels, Mat(), hist_test1, 2, histSize, ranges, true, false );
          normalize( hist_test1, hist_test1, 0, 1, NORM_MINMAX, -1, Mat() );
          calcHist( &hsv_test2, 1, channels, Mat(), hist_test2, 2, histSize, ranges, true, false );
          normalize( hist_test2, hist_test2, 0, 1, NORM_MINMAX, -1, Mat() );

          // Apply the 4 comparison methods
          for( int compare_method = 0; compare_method < 4; compare_method++ )
          {
              double base_base = compareHist( hist_base, hist_base, compare_method );
              double base_half = compareHist( hist_base, hist_half_down, compare_method );
              double base_test1 = compareHist( hist_base, hist_test1, compare_method );
              double base_test2 = compareHist( hist_base, hist_test2, compare_method );
              cout << "Method " << compare_method << " Perfect, Base-Half, Base-Test(1), Base-Test(2) : "
                   << base_base << " / " << base_half << " / " << base_test1 << " / " << base_test2 << endl;
          }

          return 0;
      }

    Java

    • Downloadable code: Click here
    • Code at a glance:

      import java.util.Arrays;
      import java.util.List;

      import org.opencv.core.Core;
      import org.opencv.core.Mat;
      import org.opencv.core.MatOfFloat;
      import org.opencv.core.MatOfInt;
      import org.opencv.core.Range;
      import org.opencv.imgcodecs.Imgcodecs;
      import org.opencv.imgproc.Imgproc;

      class CompareHist {
          public void run(String[] args) {
              // Load the base image and the two test images
              if (args.length != 3) {
                  System.err.println("You must supply 3 arguments that correspond to the paths to 3 images.");
                  System.exit(0);
              }
              Mat srcBase = Imgcodecs.imread(args[0]);
              Mat srcTest1 = Imgcodecs.imread(args[1]);
              Mat srcTest2 = Imgcodecs.imread(args[2]);
              if (srcBase.empty() || srcTest1.empty() || srcTest2.empty()) {
                  System.err.println("Cannot read the images");
                  System.exit(0);
              }

              // Convert to HSV
              Mat hsvBase = new Mat(), hsvTest1 = new Mat(), hsvTest2 = new Mat();
              Imgproc.cvtColor( srcBase, hsvBase, Imgproc.COLOR_BGR2HSV );
              Imgproc.cvtColor( srcTest1, hsvTest1, Imgproc.COLOR_BGR2HSV );
              Imgproc.cvtColor( srcTest2, hsvTest2, Imgproc.COLOR_BGR2HSV );

              // Lower half of the base image (Range ends are exclusive)
              Mat hsvHalfDown = hsvBase.submat( new Range( hsvBase.rows()/2, hsvBase.rows() ), new Range( 0, hsvBase.cols() ) );

              // Histogram parameters: bins, ranges and channels (H and S)
              int hBins = 50, sBins = 60;
              int[] histSize = { hBins, sBins };
              float[] ranges = { 0, 180, 0, 256 };
              int[] channels = { 0, 1 };

              // Calculate and normalize the histograms
              Mat histBase = new Mat(), histHalfDown = new Mat(), histTest1 = new Mat(), histTest2 = new Mat();
              List<Mat> hsvBaseList = Arrays.asList(hsvBase);
              Imgproc.calcHist(hsvBaseList, new MatOfInt(channels), new Mat(), histBase, new MatOfInt(histSize), new MatOfFloat(ranges), false);
              Core.normalize(histBase, histBase, 0, 1, Core.NORM_MINMAX);
              List<Mat> hsvHalfDownList = Arrays.asList(hsvHalfDown);
              Imgproc.calcHist(hsvHalfDownList, new MatOfInt(channels), new Mat(), histHalfDown, new MatOfInt(histSize), new MatOfFloat(ranges), false);
              Core.normalize(histHalfDown, histHalfDown, 0, 1, Core.NORM_MINMAX);
              List<Mat> hsvTest1List = Arrays.asList(hsvTest1);
              Imgproc.calcHist(hsvTest1List, new MatOfInt(channels), new Mat(), histTest1, new MatOfInt(histSize), new MatOfFloat(ranges), false);
              Core.normalize(histTest1, histTest1, 0, 1, Core.NORM_MINMAX);
              List<Mat> hsvTest2List = Arrays.asList(hsvTest2);
              Imgproc.calcHist(hsvTest2List, new MatOfInt(channels), new Mat(), histTest2, new MatOfInt(histSize), new MatOfFloat(ranges), false);
              Core.normalize(histTest2, histTest2, 0, 1, Core.NORM_MINMAX);

              // Apply the 4 comparison methods
              for( int compareMethod = 0; compareMethod < 4; compareMethod++ ) {
                  double baseBase = Imgproc.compareHist( histBase, histBase, compareMethod );
                  double baseHalf = Imgproc.compareHist( histBase, histHalfDown, compareMethod );
                  double baseTest1 = Imgproc.compareHist( histBase, histTest1, compareMethod );
                  double baseTest2 = Imgproc.compareHist( histBase, histTest2, compareMethod );
                  System.out.println("Method " + compareMethod + " Perfect, Base-Half, Base-Test(1), Base-Test(2) : " + baseBase + " / " + baseHalf
                          + " / " + baseTest1 + " / " + baseTest2);
              }
          }
      }

      public class CompareHistDemo {
          public static void main(String[] args) {
              // Load the native OpenCV library
              System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
              new CompareHist().run(args);
          }
      }

    Python

    • Downloadable code: Click here
    • Code at a glance:

      from __future__ import print_function
      from __future__ import division
      import cv2 as cv
      import argparse

      # Load the base image and the two test images
      parser = argparse.ArgumentParser(description='Code for Histogram Comparison tutorial.')
      parser.add_argument('--input1', help='Path to input image 1.')
      parser.add_argument('--input2', help='Path to input image 2.')
      parser.add_argument('--input3', help='Path to input image 3.')
      args = parser.parse_args()

      src_base = cv.imread(args.input1)
      src_test1 = cv.imread(args.input2)
      src_test2 = cv.imread(args.input3)
      if src_base is None or src_test1 is None or src_test2 is None:
          print('Could not open or find the images!')
          exit(0)

      # Convert to HSV
      hsv_base = cv.cvtColor(src_base, cv.COLOR_BGR2HSV)
      hsv_test1 = cv.cvtColor(src_test1, cv.COLOR_BGR2HSV)
      hsv_test2 = cv.cvtColor(src_test2, cv.COLOR_BGR2HSV)

      # Lower half of the base image
      hsv_half_down = hsv_base[hsv_base.shape[0]//2:,:]

      # Histogram parameters: bins, ranges and channels (H and S)
      h_bins = 50
      s_bins = 60
      histSize = [h_bins, s_bins]
      h_ranges = [0, 180]
      s_ranges = [0, 256]
      ranges = h_ranges + s_ranges
      channels = [0, 1]

      # Calculate and normalize the histograms
      hist_base = cv.calcHist([hsv_base], channels, None, histSize, ranges, accumulate=False)
      cv.normalize(hist_base, hist_base, alpha=0, beta=1, norm_type=cv.NORM_MINMAX)
      hist_half_down = cv.calcHist([hsv_half_down], channels, None, histSize, ranges, accumulate=False)
      cv.normalize(hist_half_down, hist_half_down, alpha=0, beta=1, norm_type=cv.NORM_MINMAX)
      hist_test1 = cv.calcHist([hsv_test1], channels, None, histSize, ranges, accumulate=False)
      cv.normalize(hist_test1, hist_test1, alpha=0, beta=1, norm_type=cv.NORM_MINMAX)
      hist_test2 = cv.calcHist([hsv_test2], channels, None, histSize, ranges, accumulate=False)
      cv.normalize(hist_test2, hist_test2, alpha=0, beta=1, norm_type=cv.NORM_MINMAX)

      # Apply the 4 comparison methods
      for compare_method in range(4):
          base_base = cv.compareHist(hist_base, hist_base, compare_method)
          base_half = cv.compareHist(hist_base, hist_half_down, compare_method)
          base_test1 = cv.compareHist(hist_base, hist_test1, compare_method)
          base_test2 = cv.compareHist(hist_base, hist_test2, compare_method)
          print('Method:', compare_method, 'Perfect, Base-Half, Base-Test(1), Base-Test(2) :',\
                base_base, '/', base_half, '/', base_test1, '/', base_test2)

    • Load the base image (src_base) and the other two test images:

    C++

    // Positional arguments are looked up with their '@' prefix.
    CommandLineParser parser( argc, argv, keys );
    Mat src_base = imread( parser.get<String>("@input1") );
    Mat src_test1 = imread( parser.get<String>("@input2") );
    Mat src_test2 = imread( parser.get<String>("@input3") );
    if( src_base.empty() || src_test1.empty() || src_test2.empty() )
    {
        cout << "Could not open or find the images!\n" << endl;
        parser.printMessage();
        return -1;
    }

    Java

    if (args.length != 3) {
        System.err.println("You must supply 3 arguments that correspond to the paths to 3 images.");
        System.exit(0);
    }
    Mat srcBase = Imgcodecs.imread(args[0]);
    Mat srcTest1 = Imgcodecs.imread(args[1]);
    Mat srcTest2 = Imgcodecs.imread(args[2]);
    if (srcBase.empty() || srcTest1.empty() || srcTest2.empty()) {
        System.err.println("Cannot read the images");
        System.exit(0);
    }

    Python

    parser = argparse.ArgumentParser(description='Code for Histogram Comparison tutorial.')
    parser.add_argument('--input1', help='Path to input image 1.')
    parser.add_argument('--input2', help='Path to input image 2.')
    parser.add_argument('--input3', help='Path to input image 3.')
    args = parser.parse_args()

    src_base = cv.imread(args.input1)
    src_test1 = cv.imread(args.input2)
    src_test2 = cv.imread(args.input3)
    if src_base is None or src_test1 is None or src_test2 is None:
        print('Could not open or find the images!')
        exit(0)

    • Convert them to HSV format:

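    C++

    Mat hsv_base, hsv_test1, hsv_test2;
    cvtColor( src_base, hsv_base, COLOR_BGR2HSV );
    cvtColor( src_test1, hsv_test1, COLOR_BGR2HSV );
    cvtColor( src_test2, hsv_test2, COLOR_BGR2HSV );

    Java

    Mat hsvBase = new Mat(), hsvTest1 = new Mat(), hsvTest2 = new Mat();
    Imgproc.cvtColor( srcBase, hsvBase, Imgproc.COLOR_BGR2HSV );
    Imgproc.cvtColor( srcTest1, hsvTest1, Imgproc.COLOR_BGR2HSV );
    Imgproc.cvtColor( srcTest2, hsvTest2, Imgproc.COLOR_BGR2HSV );

    Python

    hsv_base = cv.cvtColor(src_base, cv.COLOR_BGR2HSV)
    hsv_test1 = cv.cvtColor(src_test1, cv.COLOR_BGR2HSV)
    hsv_test2 = cv.cvtColor(src_test2, cv.COLOR_BGR2HSV)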

    • Also, create an image of half the base image (in HSV format):

    C++

    Mat hsv_half_down = hsv_base( Range( hsv_base.rows/2, hsv_base.rows ), Range( 0, hsv_base.cols ) );

    Java

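    // Range ends are exclusive, so rows() and cols() select the full lower half, as in the C++ and Python versions.
    Mat hsvHalfDown = hsvBase.submat( new Range( hsvBase.rows()/2, hsvBase.rows() ), new Range( 0, hsvBase.cols() ) );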

    Python

    hsv_half_down = hsv_base[hsv_base.shape[0]//2:,:]

    • Initialize the arguments to calculate the histograms (bins, ranges and channels H and S ).

    C++

    int h_bins = 50, s_bins = 60;
    int histSize[] = { h_bins, s_bins };
    float h_ranges[] = { 0, 180 };
    float s_ranges[] = { 0, 256 };
    const float* ranges[] = { h_ranges, s_ranges };
    int channels[] = { 0, 1 };

    Java

    int hBins = 50, sBins = 60;
    int[] histSize = { hBins, sBins };
    float[] ranges = { 0, 180, 0, 256 };
    int[] channels = { 0, 1 };

    Python

    h_bins = 50
    s_bins = 60
    histSize = [h_bins, s_bins]
    h_ranges = [0, 180]
    s_ranges = [0, 256]
    ranges = h_ranges + s_ranges
    channels = [0, 1]
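To see what the bin counts and ranges above mean in practice, the following sketch maps a single channel value to its bin index. It assumes uniform bins with an inclusive lower bound and an exclusive upper bound, which is my understanding of how these parameters are interpreted. The helper `bin_index` is hypothetical and not part of OpenCV.

```python
def bin_index(value, lo, hi, bins):
    """Index of the uniform bin containing value, or None if value is outside [lo, hi)."""
    if not (lo <= value < hi):
        return None
    return int((value - lo) * bins / (hi - lo))


h_bins, s_bins = 50, 60
h_range, s_range = (0, 180), (0, 256)

# Each hue bin spans 180 / 50 = 3.6 units; each saturation bin spans 256 / 60 units.
print(bin_index(0, *h_range, h_bins))    # first hue bin
print(bin_index(90, *h_range, h_bins))   # middle of the hue range
print(bin_index(179, *h_range, h_bins))  # last hue bin
print(bin_index(255, *s_range, s_bins))  # last saturation bin
print(bin_index(180, *h_range, h_bins))  # outside the range, not counted
```

Under these assumptions, a pixel's (H, S) pair selects one cell of the 50 x 60 histogram, and values at or above the upper bound fall outside it.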

    • Calculate the Histograms for the base image, the 2 test images and the half-down base image:

    C++

    Mat hist_base, hist_half_down, hist_test1, hist_test2;
    calcHist( &hsv_base, 1, channels, Mat(), hist_base, 2, histSize, ranges, true, false );
    normalize( hist_base, hist_base, 0, 1, NORM_MINMAX, -1, Mat() );
    calcHist( &hsv_half_down, 1, channels, Mat(), hist_half_down, 2, histSize, ranges, true, false );
    normalize( hist_half_down, hist_half_down, 0, 1, NORM_MINMAX, -1, Mat() );
    calcHist( &hsv_test1, 1, channels, Mat(), hist_test1, 2, histSize, ranges, true, false );
    normalize( hist_test1, hist_test1, 0, 1, NORM_MINMAX, -1, Mat() );
    calcHist( &hsv_test2, 1, channels, Mat(), hist_test2, 2, histSize, ranges, true, false );
    normalize( hist_test2, hist_test2, 0, 1, NORM_MINMAX, -1, Mat() );

    Java

    Mat histBase = new Mat(), histHalfDown = new Mat(), histTest1 = new Mat(), histTest2 = new Mat();
    List<Mat> hsvBaseList = Arrays.asList(hsvBase);
    Imgproc.calcHist(hsvBaseList, new MatOfInt(channels), new Mat(), histBase, new MatOfInt(histSize), new MatOfFloat(ranges), false);
    Core.normalize(histBase, histBase, 0, 1, Core.NORM_MINMAX);
    List<Mat> hsvHalfDownList = Arrays.asList(hsvHalfDown);
    Imgproc.calcHist(hsvHalfDownList, new MatOfInt(channels), new Mat(), histHalfDown, new MatOfInt(histSize), new MatOfFloat(ranges), false);
    Core.normalize(histHalfDown, histHalfDown, 0, 1, Core.NORM_MINMAX);
    List<Mat> hsvTest1List = Arrays.asList(hsvTest1);
    Imgproc.calcHist(hsvTest1List, new MatOfInt(channels), new Mat(), histTest1, new MatOfInt(histSize), new MatOfFloat(ranges), false);
    Core.normalize(histTest1, histTest1, 0, 1, Core.NORM_MINMAX);
    List<Mat> hsvTest2List = Arrays.asList(hsvTest2);
    Imgproc.calcHist(hsvTest2List, new MatOfInt(channels), new Mat(), histTest2, new MatOfInt(histSize), new MatOfFloat(ranges), false);
    Core.normalize(histTest2, histTest2, 0, 1, Core.NORM_MINMAX);

    Python

    hist_base = cv.calcHist([hsv_base], channels, None, histSize, ranges, accumulate=False)
    cv.normalize(hist_base, hist_base, alpha=0, beta=1, norm_type=cv.NORM_MINMAX)
    hist_half_down = cv.calcHist([hsv_half_down], channels, None, histSize, ranges, accumulate=False)
    cv.normalize(hist_half_down, hist_half_down, alpha=0, beta=1, norm_type=cv.NORM_MINMAX)
    hist_test1 = cv.calcHist([hsv_test1], channels, None, histSize, ranges, accumulate=False)
    cv.normalize(hist_test1, hist_test1, alpha=0, beta=1, norm_type=cv.NORM_MINMAX)
    hist_test2 = cv.calcHist([hsv_test2], channels, None, histSize, ranges, accumulate=False)
    cv.normalize(hist_test2, hist_test2, alpha=0, beta=1, norm_type=cv.NORM_MINMAX)
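The normalization step rescales each histogram so that its smallest bin maps to 0 and its largest to 1. The sketch below shows that min-max rescaling on a plain list. The helper `normalize_minmax` is hypothetical, and returning `alpha` for a completely flat input is my own assumption for that edge case.

```python
def normalize_minmax(values, alpha=0.0, beta=1.0):
    """Linearly rescale values so that min(values) -> alpha and max(values) -> beta."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a flat input has no spread to rescale, so map everything to alpha.
        return [alpha for _ in values]
    scale = (beta - alpha) / (hi - lo)
    return [(v - lo) * scale + alpha for v in values]


counts = [1, 3, 0, 2, 1]
normalized = normalize_minmax(counts)
print(normalized)       # largest bin becomes 1.0, smallest becomes 0.0
print(sum(normalized))  # the bins do not sum to 1 after this kind of scaling
```

Because this scaling fixes the range of the bins rather than their sum, metrics that add up bin values (such as Intersection) are not bounded by 1, even when a histogram is compared with itself.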

    • Apply sequentially the 4 comparison methods between the histogram of the base image (hist_base) and the other histograms:

    C++

    for( int compare_method = 0; compare_method < 4; compare_method++ )
    {
        double base_base = compareHist( hist_base, hist_base, compare_method );
        double base_half = compareHist( hist_base, hist_half_down, compare_method );
        double base_test1 = compareHist( hist_base, hist_test1, compare_method );
        double base_test2 = compareHist( hist_base, hist_test2, compare_method );
        cout << "Method " << compare_method << " Perfect, Base-Half, Base-Test(1), Base-Test(2) : "
             << base_base << " / " << base_half << " / " << base_test1 << " / " << base_test2 << endl;
    }

    Java

    for( int compareMethod = 0; compareMethod < 4; compareMethod++ ) {
        double baseBase = Imgproc.compareHist( histBase, histBase, compareMethod );
        double baseHalf = Imgproc.compareHist( histBase, histHalfDown, compareMethod );
        double baseTest1 = Imgproc.compareHist( histBase, histTest1, compareMethod );
        double baseTest2 = Imgproc.compareHist( histBase, histTest2, compareMethod );
        System.out.println("Method " + compareMethod + " Perfect, Base-Half, Base-Test(1), Base-Test(2) : " + baseBase + " / " + baseHalf
                + " / " + baseTest1 + " / " + baseTest2);
    }

    Python

    for compare_method in range(4):
        base_base = cv.compareHist(hist_base, hist_base, compare_method)
        base_half = cv.compareHist(hist_base, hist_half_down, compare_method)
        base_test1 = cv.compareHist(hist_base, hist_test1, compare_method)
        base_test2 = cv.compareHist(hist_base, hist_test2, compare_method)
        print('Method:', compare_method, 'Perfect, Base-Half, Base-Test(1), Base-Test(2) :',\
              base_base, '/', base_half, '/', base_test1, '/', base_test2)

    1. We use as input the following images:

      [Image: Base_0]

      [Image: Test_1]

      [Image: Test_2]

      where the first one is the base (to be compared to the others) and the other 2 are the test images. We will also compare the first image with itself and with half of the base image.
    2. We should expect a perfect match when we compare the base image histogram with itself. Compared with the histogram of half the base image, it should also present a high match, since both come from the same source. The other two test images have very different lighting conditions, so the matching should not be very good.
    3. Here are the numeric results we got with OpenCV 3.4.1:

      Method          Base - Base   Base - Half   Base - Test 1   Base - Test 2
      Correlation     1.000000      0.880438      0.20457         0.0664547
      Chi-square      0.000000      4.6834        2697.98         4763.8
      Intersection    18.8947       13.022        5.44085         2.58173
      Bhattacharyya   0.000000      0.237887      0.679826        0.874173

      For the Correlation and Intersection methods, the higher the metric, the better the match. As we can see, the base-base match is the highest of all, as expected, and the base-half match is the second best (as we predicted). For the other two metrics, the lower the result, the better the match. The matches of test 1 and test 2 against the base are worse, which again was expected.
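Since two of the metrics are similarities and two are distances, any code that ranks candidates has to know which direction counts as "better". The sketch below applies that rule to the numbers from the results table (excluding the base-base column). The dictionary layout and the helper `rank_matches` are my own illustration, not part of OpenCV.

```python
# True means a higher value is a better match; False means a lower value is better.
HIGHER_IS_BETTER = {
    "Correlation": True,
    "Chi-square": False,
    "Intersection": True,
    "Bhattacharyya": False,
}

# Values copied from the results table above.
results = {
    "Correlation":   {"Base - Half": 0.880438, "Base - Test 1": 0.20457,  "Base - Test 2": 0.0664547},
    "Chi-square":    {"Base - Half": 4.6834,   "Base - Test 1": 2697.98,  "Base - Test 2": 4763.8},
    "Intersection":  {"Base - Half": 13.022,   "Base - Test 1": 5.44085,  "Base - Test 2": 2.58173},
    "Bhattacharyya": {"Base - Half": 0.237887, "Base - Test 1": 0.679826, "Base - Test 2": 0.874173},
}


def rank_matches(method, scores):
    """Return the comparison names ordered from best match to worst for the given method."""
    return sorted(scores, key=scores.get, reverse=HIGHER_IS_BETTER[method])


for method, scores in results.items():
    print(method, rank_matches(method, scores))
```

With the table's values, all four methods produce the same ordering: the half image first, then test 1, then test 2.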