Absolute joint moments: a novel image similarity measure

In this paper, we propose a novel approach for estimating image similarity. This measure is of importance in assessing image correspondence or image alignment and plays an important role in image registration. Currently, this problem is approached rather one-dimensionally since most registration methods consider the problem as either mono- or multi-modal. This perspective leads to the selection of some form of either the correlation coefficient (CC) or mutual information (MI) as image similarity measure (ISM). We propose a more generic framework for ISM construction, based on absolute joint moments, which can be considered as a generalization of CC. Within this framework, we propose a specific ISM that provides a different trade-off between MI and CC in terms of performance and computational cost for general registration problems. To illustrate this, we compared CC and MI with the proposed ISM and performed extensive experiments with regard to accuracy, robustness and speed. The evaluation demonstrated that the proposed absolute joint moments is a good combination of properties of CC and MI, with respect to speed and performance.


Introduction
Image registration is an optimization process that utilizes a similarity measure to find the optimal alignment of two images. The registration accuracy depends on the selection of the optimization algorithm and geometric transformation as well as on the definition of the similarity measure. Therefore, it is essential to use a suitable similarity measure for a given problem. The selection of an image similarity measure, especially in the case of medical image registration, is usually reduced down to the question whether a multi-or mono-modal registration is required. This black-and-white perspective leads to well-known answers and results in the selection of correlation coefficient (CC) for mono-modal registration and mutual information (MI) for multi-modal registration.
However, the problem of registration can be approached from several perspectives, and often, the registration of the same two images can be performed using different methodological approaches. For example, Zitova et al. in [1] distinguish not only a multi-modal approach but *Correspondence: hrvoje.kalinic@fer.hr 1 Department of Electronic Systems and Information Processing, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb 10000, Croatia Full list of author information is available at the end of the article also a multi-view or multi-temporal approach. Similarly, Maintz et al. in [2], classify registration not only from a intra-/inter-modality perspective but also from a intra-/ inter-subject perspective. Perhaps for this reason, many other image similarity measure (ISM) emerged over time (see Section 2).
In the following sections, we propose a wider framework for the construction of image similarity measures. Within this framework, we propose a specific ISM to show that, when constructed in this way, it combines beneficial properties of the two most used ones: CC and MI.
Whatever similarity measure is used for image registration, it has to satisfy one basic condition -at the exact location of the correct alignment of two images, the similarity (measure) has to be maximal. To find the maximum, the image registration incorporates an optimization algorithm, which iteratively calculates a similarity measure. The number of calculations may thus easily reach a number of a few thousands, especially if the geometric transformation is non-linear. Therefore, the complexity of the similarity measure also plays an important role in registration methods. For this reason, in our experiments, we measure the performance of the similarity measure in terms of accuracy, robustness and speed. http://jivp.eurasipjournals.com/content/2013/1/24

Background
The purpose of an ISM is to quantify the similarity between two images, usually referred to as source and target images. In this section, we will briefly discuss the properties of the two most prominent ISMs in order to compare them to the newly proposed one.
To set the notation, let us denote source and target images as S(x) and T(x), where x ∈ R n stands for the pixel coordinate vector. Further, the average value is denoted with a line above the function (e.g. S), and p is used to denote the (joint) probability density function (e.g. p TS (x, y)).
It is well known [3,4] that CC, defined as is an optimal choice if the pixel values between the images S(x) and T(x) are related by an affine function, even in the presence of a reasonable amount of noise. Since most mono-modal image registration problems assume this type of functional relationship between images, CC was considered a dedicated approach for this type of problems and thus is often the first choice. Similarly, MI defined as is established as the appropriate ISM for multi-modal registration [5,6]. All good properties, as well as the shortcomings, of MI and CC come directly from their definition. For example, CC is fast since it uses only summary statistics. The calculation of mean and standard deviation only, which are sufficient to describe an affine functional relationship between pixel values, makes CC a good and fast measure for registration of images with affine relationship between their pixel values. Many image registration problems violate the assumption of an affine relationship between image pixel values, thus making CC a suboptimal choice for images with more complex functional relationship between pixel values [3,7]. On the other hand, the definition of MI includes much more statistical information about the relationship of the images S(x) and T(x). This property comes directly from the joint probability density function p TS = p(T, S), which fully characterizes the relationship between images T and S, without any assumptions on a type of functional relationship or any noise that may exist in the image acquisition process. Many of the disadvantages of the MI come from the very same things that boost its good properties -the joint probability density function (PDF). For example, since the joint PDF needs to be estimated/approximated, various problems can emerge, like sensitivity to sample size, number of histogram bins or interpolation [7,8].
For this reason, we will primarily focus on CC and MI, investigate their properties and compare them to the ISM that we propose. In the following section, we will give some motivating examples to show that there is room for improvement besides the exiting ISMs. Next, we aim to propose a framework for ISM construction which will utilize a chosen amount of statistical values instead of only a few (such as standard deviation, skewness, kurtosis). In this way, we aim to present a framework which will bridge a gap between CC and MI and select a measure from this framework to show that it incorporates the good sides of both of them. In a way, this paper can be seen as an extension of the idea presented in the paper by Kalinic et al. [28].

Motivation
In previous sections, we mentioned that CC is considered ideal for mono-modal image registration due to its elegance and swiftness and with the implicit assumption that the relationship between pixel intensity values is affine and only corrupted by Gaussian noise in mono-modal image acquisition [3,4]. Due to its intrinsic properties, MI can be used in both mono-modal and multi-modal applications, the latter of which is extensively used. This section provides three simple examples where either CC or MI (or both) do not perform well (see Figure 1).
As will be shown in further sections, the performance of the MI may vary, depending on the number of bins selected to approximate the PDF. To distinguish one MI implementation from another, the number of bins is used as index (e.g. MI 8 and MI 256 ). The three examples shown in Figure 1 are given for each ISM implementation: CC, MI 8 and MI 256 . The comparison between the images is done while applying three different geometric transformations: scaling, rotation and translation. The images which are to be aligned are constructed from the same template image selected from the test set (see Section 5.1) by simple noise addition, with A n /A s = 0.5 (for details about noise degradation model, see Section 5.2). Since both images are constructed from the same template, the correct alignment is already known, i.e. the ISM should have a maximal value for unity scaling, zero rotation and zero translation.  In the first example, the image is consecutively translated by 1 pixel in the range of −30 to 30. In the second example, the image is consecutively rotated by 1°in the range of −30°to 30°. In the third example, the image is scaled by s = 0.9 + 0.01 · n where n ∈ {0, . . . , 20}. Figure 1 shows the values for MI 8  Notice that MI 256 does not perform well in the first example, and neither of the ISMs has the maximum at the correct location for both rotation and scaling. Therefore, an ISM defined in a different way might be able to produce a better result in these examples, but then, it obviously remains to be seen if its better performance would be related only to this particular image. Both questions will be addressed further on (see Section 7).

Absolute joint moments
In this section, we define source and target images as random variables and denote them as S and T instead of S(x) and T(x). The following notation is used:

Framework for constructing image similarity measures
As mentioned in the previous section, the idea is to use more statistical information from the images (instead of only a limited amount as in CC) for the ISM construction. Joint moments are the proposed information to use. The motivation behind this becomes clear if we write Equation 1 in the following form: where σ T and σ S denote standard deviations. If we rewrite Equation 3 to the following form we can see that this is equal to the joint central moment of order (1,1). From this, we propose a generalization in the following form: We refer to the absolute joint moment (AJM) framework for ISM construction. Notice that if we take only the first element of the sum and select ω n = ω m = 1, this reduces down to the absolute covariance (numerator of CC from Equation 3). http://jivp.eurasipjournals.com/content/2013/1/24

Proposed image similarity measure
For the proposed image similarity, we use a specific selection of weights ω n and ω m , which are computationally efficient and guarantee the convergence of the sum. Since this is just one of many possible selection of weights, we will denote ISM as AJM i to indicate that is just one instance.
Using the selected weights, Equation 5 can be rearranged into the form of the Taylor expansion of the following exponential function: As the expectation is estimated on the overlapping region of images S and T (here denoted by D X ), we will only have the estimation of AJM i : where N denotes the number of pixel pairs in D X . In further sections, the AJM i will be the only used and tested ISM from this framework, so we will denote it simply as AJM.
We will show that this ISM selected from the proposed framework, compared to MI and CC, will have a different trade-off between speed and performance as will result directly from the definition of AJM. In order to show this, in further sections, we will investigate the properties of AJM and compare them to MI and CC with regard to robustness, accuracy and speed.

Data set
Images from publicly available databases were used to test the properties of the registration implemented using AJM as similarity measure. The test set was constructed so as to have as much diversity as possible. First, we used all 44 miscellaneous images from the SIPI database [29]. Since this database does not have medical images, all 19 medical images from the VIS database [30] were added to the set as well as 3 mammography images from the MIAS database [31]. From the MIAS set, only three images were selected since the variability of the images from this set is low. Finally, 34 images of different objects from the ALOI database [32] were added to the set which made the total number of images in the testing set 100. The set constructed in this way contained images with different context, from natural to artificially constructed images. Both colour and greyscale images were represented in this set, but all images were converted to greyscale before the registration. All the images were coded with either 7 or 8 bits per pixel and the resolution of images ranged from 128 × 128 to 1,024 × 1,024. For the purpose of the paper, all images were converted to double-precision floating-point format and scaled to interval [ 0, 1].
A few images from each database, forming a representative collection of images from the test set, are depicted in the first row of Figure 2. In the second row of Figure 2, the same images after artificial degradation is shown (see Section 5.2 for details). The pairs of images constructed in this way will be used for the evaluation of the ISMs in most of further experiments.

Image degradation model
To evaluate the performance of the ISMs, we will need a pair of each image. Therefore, we introduced several degradation models that will be applied on each image in order to simulate different effects that may happen during the image acquisition process such as excessive noise, contrast changes or non-linear intensity distortion. The image degradation models are described in the following subsection and, as can be noticed, are inspired by the paper of Maes et al. [33].

Contrast inhomogeneity
A linear spatially variant intensity transformation is used to simulate image contrast inhomogeneity effects and is modeled by the following expression: where α(x, y) is defined as follows: with (x c , y c ) being the coordinates of the point around which the curve is positioned and k 1 is the distortion parameter.

Noise
To assure that noise does not randomly affect the contrast, we selected a distribution with finite support. Thus, the uniform distribution seemed a natural choice as a noise degradation model. Notice that this simple degradation model becomes more complex when the sequential image degradation model is applied since that degradation affects the noise distribution as well. Additive uniform noise from the interval [0, k 2 A s ] is superimposed on the original image. Here, A s stands for the amplitude of the signal and k 2 stands for the amplitude ratios between noise and signal (A n /A s ).

Non-linear intensity distortion
A non-linear intensity transformation is used to simulate pixel value distortion and is described by the following polynomial: where I xy stands for the intensity level (at position (x, y)) and i 1 , i 2 , i 3 . . . , i n are roots of the polynomial that simulates the intensity distortion. After each distortion, the image is normalized to keep the original range of pixel values.

Experiments and results
In the first two experiments (Sections 6.1 and 6.2), the image pairs, between which the correct alignment is to be determined, are the original and the degraded image. Therefore, the gold standard is well known since the correct alignment is for unity scaling, and zero translation and rotation. To evaluate the performance of the ISM, an exhaustive search for the global maximum was done in order to assure that the suitability of the ISM, rather

Robustness test
Various image degradations may alter the intensities and may affect the performance of a similarity measure.
To evaluate the robustness of a similarity measure with respect to additive noise, contrast change or non-linear intensity distortion, each degradation model (described in Section 5.2) was applied to each image from the set (see Section 5.1), and the similarity between the original and the degraded image was calculated for different translation, rotation and scaling factor. For the degradation model, the parameters listed in the Table 1 were used. As can be assumed from the Table 1, only third-order polynomials were used for the robustness test. However, this selection ensures non-linear distortion, as can be seen in Figure 3. Here, we will focus on the robustness of the ISMs with respect to the amount of deformation. If the ISM is robust to the degradation effects, the maximum of the ISM will be achieved for approximately the same transformation for any amount and type of degradation. Therefore, in this test, we assess how the number of images with alignment error less than ξ changes with respect to the amount of degradation. ξ is defined as 5 pixels for translation, 0.05 for scaling (see Equation 11 for error definition), and 9°f or rotation -which represents 5% of the Figure 4 x-axis range (maximal deformation). The experiment was done for each degradation model, and the alignment error was measured for three different types of image transformation: translation, scaling and rotation. Only 1 degree of freedom was allowed in each degradation model, namely k 1 , k 2 and i 1 . Each parameter was changed in ten equidistant steps within the range allowed by Table 1. The experimental results are given in Figure 5. Graphs in Figure 4 are sorted so that each row shows results for a different image degradation, and each column shows results for a different geometric transformation used.
As anticipated, this experiment showed that MI 256 is not robust to noise and that CC is not robust to non-linear intensity distortions. For AJM, one can observe that it is affected by high non-linear intensity distortion. However, for a moderate distortion, it still performs satisfactorily. Therefore, AJM seems robust to noise and to moderate amounts of non-linear intensity distortion. AJM is also fairly robust to contrast inhomogeneity since the performance of AJM is comparable to those of other ISMs. All  Figure 5 Robustness of ISMs. The graphs show change in the performance of each ISM with regard to distortion. For contrast inhomogeneities, the parameter is k 1 ; for noise, k 2 ; and for non-linear distortion, i 1 . For each ISM and for each distortion, the y-axis shows the the number of images with alignment error less than ξ and the x-axis shows the amount of distortion (i.e. k 1 , k 2 and i 1 ). The performance of CC, AJM, MI 8 and MI 256 are plotted using circles, solid line, triangles and dashed line, respectively. of these conclusions hold for translation, rotation and scaling.

Scaling Rotation Translation
The results of the previous experiment could vary if a different error threshold ξ is selected. To estimate how this results would change for different ξ selected, we performed the following test. Again, all three degradation models were applied and the images were aligned after three different transformations. However, the fixed parameters in this case are k 1 , k 2 and i 1 . The parameters are set to represent the largest degradation. The results are presented in Figure 5, where, in each line of the table, a graph of the ISMs for different distortion is plotted, and where, in each column, a different transformation is used to achieve the correct image alignment. The graphs show how many images are aligned with an error lower than a certain amount and therefore give insight in the distribution of the ISM error for the image data set. Figure 5 shows that the order of the ISMs for their relative performance would remain approximately the same no matter what error threshold (ξ ) is selected. For all experiments (with the exception of the combination of rotation and contrast inhomogeneities), AJM is between CC and MI with regard to the overall number of correct alignments and sometimes even between the two different implementations of MI. This was, as might be assumed, from its theoretical properties. http://jivp.eurasipjournals.com/content/2013/1/24 As expected, the experiments showed that no ISM could compete with the performance of MI for large non-linear intensity distortions. However, the use of higher order moments was helpful since AJM performs better than CC for non-linear intensity distortion. From the graphs, it is also clear that in a noiseless environment, MI 256 performs better than MI 8 , but in a noisy environment, MI 256 is not such a good choice.

Accuracy test
This test is used to evaluate the accuracy of localizing the exact position of the maximum of the ISM. The image pairs for the experiments are the original images that form the test set and the degraded images, both of which are explained in Section 5. The degraded images are constructed by applying all three degradation models, where first additive noise is added, then contrast degradation is simulated, and finally, non-linear intensity distortion is done on the pixel value. For degradation of each image, the parameters are selected randomly within the value range listed in the Table 2.
As can be noticed in Table 2, the range of parameters is a bit larger in this test than in the previous one. Thus, higher contrast inhomogeneity is allowed as well as lower noise; similarly, non-linear image distortion is implemented as an nth order polynomial, where n can range from 2 to 6. Notice that, according to the robustness test, the increase of the range of these values will not go in favour of AJM.
The alignment error is measured separately for translation, scaling and rotation and the results are summarizes in Table 3. First column shows the average absolute error calculated for translation, and the average absolute errors for rotation and scaling are given in second and third columns, respectively.
The error for translation and rotation is given in pixel and degrees, and the error for scaling is a unitless value that is calculated as follows: where D i is the scaling factor (deformation amount) for which the maximal ISM value is achieved. Index i stands for the measurement (image) number, and N denotes the total number of measurements (images). This is done so  that the scaling error is symmetrical, i.e. it gives the same error for squeezing and stretching the image by the same factor. Also, it gives no error if the images are scaled by the same factor.

Speed test
Since CC, MI and AJM are implemented as defined in Equations 1, 2 and 6, respectively, we can notice that the computational complexity of all three measures is O (N). However, it is expected that CC will work faster than AJM since it has to calculate only a product and a ratio instead of a product of exponential functions. Similarly, we can expect that AJM is faster than MI since it does not requires the histogram formation. To evaluate this, we measured the execution time of all three algorithms. For this purpose, the Matlab profiler was used. The algorithm was implemented on a standard quadcore PC without parallelization. The computation time required for one evaluation of the similarity measure varies linearly with the number of samples in the overlapping region between the two images. The results are shown in Table 4. The speed performance evaluation is calculated as the average from 1,000 function calls for an image size of 512 × 512.

Overall comparison
At the beginning of the paper (Section 3), some examples were given, where CC, MI 8 and MI 256 are unable to find the exact alignment. To investigate how AJM performs in a similar setting, the same experiments were repeated using AJM. The results are presented in Figure 6, showing the performance of AJM alongside CC, MI 8 , or MI 256 .   As can be noticed, in all three cases, AJM outperforms the other ISMs. To check that this is due to the nature of this particular image, we ran this experiment on all images from the data set. To recall, both images were created from the same original image by simple noise addition, and we tried to align them using translation, rotation and scaling. In each experiment, we counted the number of cases where AJM outperforms the other ISMs. The results are given in Table 5.

Conclusions
The experiments show that AJM is robust to noise, fairly robust to contrast inhomogeneities, more robust than CC and less robust than MI for non-linear intensity distortion. The robustness test also showed that MI 256 in not very robust to noise. Overall, we can say that AJM is positioned between CC and MI in terms of robustness.
The accuracy test shows that AJM is less accurate than MI 8 and overall better than MI 256 . This difference The evaluation was done with respect to different geometric transformations. Image pairs were altered using a noise degradation model. Since the total number of images in the data set is 100, the numbers in the table also reflect the percentage of image pairs for which AJM outperforms other ISMs.
between different MI implementations also emphasizes how MI is affected by the number of bins of the histogram and the interpolation techniques, while AJM is not. Both MI and AJM outperform CC significantly, primarily due to the fact that CC cannot cope with non-linear intensity distortion. An additional strength of AJM is that it does not require a statistically significant number of pixels for the calculation of entropy, so it can be calculated for a smaller region compared to MI. Although all three ISMs have similar complexity, AJM is a faster method than MI since it does not require histogram calculations (nor estimations), but it is still slower than CC. The comparison is qualitatively summarized in Table 6.
As a general conclusion, we can say that the experiments have shown that the proposed ISM is able to determine the correspondence among images with complex relationships between the pixel values and is computationally more efficient and does not have some of the inherent disadvantages of MI. Therefore AJM i , from the AJM framework for ISM construction, is complementary to CC and histogram-based approximation of MI, and the specific application will guide the optimal ISM choice.