
Multi-level contrast filtering in image difference metrics

Abstract

In this paper, we present a new metric to estimate the perceived difference in contrast between an original image and a reproduction. This metric, named weighted-level framework ΔE_E (WLF-DEE), implements multilevel filtering based on the difference of Gaussians model proposed by Tadmor and Tolhurst (2000) and the new Euclidean color difference formula in log-compressed OSA-UCS space proposed by Oleari et al. (2009). Extensive tests and analysis are presented on four categories of the well-known Tampere Image Database and on two databases developed at our institution, which provide different distortions directly related to color and contrast. Comparisons in performance with other state-of-the-art metrics are also reported. The results promote WLF-DEE as a new, stable metric for estimating the perceived magnitude of contrast difference between an original and a reproduction.

1 Introduction

The quality of an image reproduction depends on many factors that cannot be completely separated, since they are mutually dependent. It is generally considered that the quality of color in an image reproduction may be quantified by measuring how accurately the reproduction renders the colors of the original when viewed by a human observer. Color discrimination is determined by many factors, including the spatial pattern of the image and the visual processing, which starts with cone activation and is followed by adaptation to the illumination. Images are constituted by spatial color patterns, which differ from the uniform color patches considered in industrial colorimetry. Color discrimination and appearance in images are a function of spatial pattern, and color differences are harder to detect at higher spatial frequencies, where contrast plays an important role. Moreover, color opponencies have a role in color discrimination, especially for differences along the blue-yellow color direction [1]. As a consequence, measuring the perceptual difference between an original image and its reproduction requires a perceptual image difference metric. Generally, such a metric is obtained by extending color difference formulae to the colors of a complex image. The history of color metrics spans more than a century.

Over the years, many image difference metrics have been proposed [2], some for measuring general image quality and image difference and some for detecting specific distortions. However, at the moment, no universal image difference metric exists. Image difference metrics are based on a number of different ideas but usually follow a general framework. The image and its reproduction are first transformed into a suitable color space, preferably a perceptually uniform one. Then, a simulation of the human visual system (HVS) is carried out, ranging from simplistic methods, such as smoothing of the image over a local neighborhood, to more complex methods, e.g., contrast sensitivity functions (CSFs). Finally, the difference is calculated, usually by a color difference formula. Thus, an efficient image difference metric needs a filter able to mimic the HVS and a suitable color difference formula. This work will mainly focus on the filtering side, particularly taking into account that contrast is an important image attribute playing an important role in image quality discrimination [3]. However, recent improvements in color difference formulae and image processing methods will also be considered for the development of our metric.
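As an illustration, this general framework can be condensed into a short pipeline. The Python sketch below is ours, not part of any cited metric; the three stage functions are placeholders for concrete choices such as an opponent-space transform, CSF filtering, and a ΔE formula:

```python
import numpy as np

def image_difference(original, reproduction, to_color_space, hvs_filter, color_diff):
    """Generic three-stage image difference framework (illustrative sketch).

    original, reproduction: float arrays of shape (H, W, 3).
    to_color_space: transform into a (preferably perceptually uniform) color space.
    hvs_filter: simulation of the human visual system, e.g. CSF-based filtering.
    color_diff: pixelwise color difference formula returning an (H, W) map.
    """
    a = hvs_filter(to_color_space(original))
    b = hvs_filter(to_color_space(reproduction))
    # Pool the pixelwise difference map into a single scalar, here by the mean.
    return float(np.mean(color_diff(a, b)))
```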

After reviewing the development of color difference formulae in Section 2 and of image difference metrics in Section 3, we present our proposal in Section 4: a new image difference metric based on multi-level contrast filtering using the difference of Gaussians (DOG) model proposed by Tadmor and Tolhurst [4]. This metric also uses the new Euclidean color difference formula in log-compressed OSA-UCS color space proposed by Oleari et al. [5] instead of the traditional ΔE_ab that most state-of-the-art metrics use. Section 5 describes the three state-of-the-art databases used to evaluate the performance of the new metric, presents the experimental results, and discusses how well our proposal estimates perceptual image difference. Finally, in Section 6 conclusions are drawn.

2 Color difference metrics

The first studies on color difference metrics date back to the end of the nineteenth century [6]. In color metrics history, a milestone is represented by the CIELAB color system [7, 8], proposed by the International Commission on Illumination (CIE) in 1976 as a uniform color space, in which the color difference between two colors is represented by the Euclidean distance ΔE_ab.

Since 1976, the main effort has been to improve the CIELAB color difference formula, and almost all the proposals are based on the CIELAB space. Consequently, the structure of this space has strongly conditioned the shape and quality of almost all other proposals. The empirical color difference data used for evaluating and improving the color difference formulae were the supra-threshold Bradford University (BFD) ellipses [9], here represented on the (a_10, b_10) coordinates of the CIELAB space (Figure 1).

Figure 1. BFD ellipses represented on the (a_10, b_10) plane. The ellipses show only approximate regularity, with the exclusion of the blue-magenta hues, where the ellipses are distorted. The CIEDE2000 formula is particularly complicated because it must regularize this distortion.

In 1984, the British Colour-Measurement Committee (CMC) of the Society of Dyers and Colourists (UK) recommended a color difference formula, ΔE_CMC, that has been integrated into some ISO standards [10]. The ΔE_CMC formula introduces a local metric tensor in the CIELAB space, with the consequence that color tolerances in the CIELAB space are represented by ellipsoids whose semi-axis lengths depend on the point in the space and whose axes are oriented along lightness, chroma and hue, i.e., the formula is a function of the differences ΔL, ΔC and ΔH. Moreover, the weighting factors of the formula are hue-dependent. The CMC formula has found acceptance in industrial color control applications.

In 1987, Luo and Rigg [11, 12] proposed the BFD color difference formula, providing a correction of the CMC formula in the blue region. In 1994, the CIE proposed the non-Euclidean formula ΔE_94 [13], introducing a local metric tensor in the CIELAB space in analogy with the CMC formula. In 2000, the CIE proposed its latest color difference formula, named CIEDE2000 and denoted by ΔE_00 [14], which is based on an enlarged dataset of empirical color differences, known as the COM dataset [14], obtained by adding to the original BFD color difference data. This formula is only apparently based on the CIELAB space, because a coordinate transformation is made in order to correct the hue distortions typical of this space; the CIEDE2000 tolerances are represented by distorted ellipsoids. Although enlarged, the empirical dataset shows evident noise, and the data from different laboratories are not completely in agreement. The quality of these datasets, combined with the large number of parameters used in their fitting, leads us to suspect a risk of over-fitting.

These color difference formulae have also been applied to natural images, and evaluations of some of them for the measurement of color image reproduction quality can be found in [15-18].

In 2009, a color difference formula for small-medium color differences not based on CIELAB was published. This formula, termed ΔE_E, is Euclidean [5]. The empirical color difference data plotted in the OSA-UCS space show a higher regularity than what had been seen before (Figures 2 and 3), inducing the authors to state a hue-independence hypothesis. On the basis of this hypothesis, the OSA-UCS space has been log-compressed in chroma, and in this new space the color difference formula assumes a Euclidean shape.

Figure 2. BFD ellipses on the (J, G) coordinates of the OSA-UCS space. The ellipses show a general regularity that suggests the hue-independence hypothesis and consequently a simple ellipsoidal color difference formula [5, 21], based on all the existing data (COM dataset [14]). The next step towards the Euclidean formula [21] is the logarithmic compression of the chroma (Figure 3).

Figure 3. OSA-UCS constant lightness plane with log-compression of the chroma (cf. Figure 2). The BFD ellipses are compared with circles of unit radius, showing the good fit of the Euclidean color difference formula.

Since this formula is used in this work, let us recall it completely from Oleari et al. [5]. The transformation from the tristimulus space (X_10, Y_10, Z_10), where Y_10 is the percentage luminance factor, to the OSA-UCS space (L_OSA, G, J) is the following. The lightness is the same as defined by the OSA-UCS committee:

$$L_{\mathrm{OSA}} = \frac{1}{\sqrt{2}}\left\{ 5.9\left[ Y_0^{1/3} - \frac{2}{3} + 0.042\,(Y_0 - 30)^{1/3} \right] - 14.4 \right\}, \tag{1}$$

with

$$Y_0 = Y_{10}\left( 4.4934\,x_{10}^2 + 4.3034\,y_{10}^2 - 4.276\,x_{10}\,y_{10} - 1.3744\,x_{10} - 2.5643\,y_{10} + 1.8103 \right), \tag{2}$$

the tristimulus values in the main reference frame [19, 20] are

$$\begin{pmatrix} A \\ B \\ C \end{pmatrix} = \begin{pmatrix} 0.6597 & 0.4492 & -0.1089 \\ -0.3053 & 1.2126 & 0.0927 \\ -0.0374 & 0.4795 & 0.5579 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \tag{3}$$

(this is an active transformation that takes into account the adaptation to the visual situation used to define the OSA-UCS system), and the OSA-UCS chromatic coordinates obtained by computation, denoted by (G, J), are

$$\begin{pmatrix} J \\ G \end{pmatrix} = \begin{pmatrix} \sqrt{2}\,(0.5735\,L_{\mathrm{OSA}} + 7.0892) & 0 \\ 0 & -\sqrt{2}\,(0.7640\,L_{\mathrm{OSA}} + 9.2521) \end{pmatrix} \begin{pmatrix} 0.1792 & 0.9837 \\ 0.9482 & -0.3175 \end{pmatrix} \begin{pmatrix} \ln\dfrac{A/B}{0.9366} \\[6pt] \ln\dfrac{B/C}{0.9807} \end{pmatrix}, \tag{4}$$

$$h = \arctan\left( -\frac{J}{G} \right), \qquad C_{\mathrm{OSA}} = \sqrt{G^2 + J^2}. \tag{5}$$

Then, the logarithmic compression that transforms (L_OSA, C_OSA, h) into (L_E, G_E, J_E) is

$$L_E = \frac{1}{b_L} \ln\left[ 1 + \frac{b_L}{a_L}\,(10\,L_{\mathrm{OSA}}) \right], \qquad a_L = 2.890, \quad b_L = 0.015, \tag{6}$$

$$G_E = -C_E \cos(h), \qquad J_E = C_E \sin(h), \tag{7}$$

$$C_E = \frac{1}{b_C} \ln\left[ 1 + \frac{b_C}{a_C}\,(10\,C_{\mathrm{OSA}}) \right], \qquad a_C = 1.256, \quad b_C = 0.050. \tag{8}$$

Finally, the Euclidean color difference formula is

$$\Delta E_E = \sqrt{(\Delta L_E)^2 + (\Delta G_E)^2 + (\Delta J_E)^2}. \tag{9}$$

This formula is as good as CIEDE2000 in the prediction of many available empirical datasets, but considerably simpler [21]. The transformation between the tristimulus space and the OSA-UCS psychometric space is simple and shows a clear relationship with visual processing. No hue distortion, as in CIELAB, exists. The chroma log compression is required in the passage from large to small color differences. Because image quality is evaluated in the visual situation of small-medium color differences, we consider this formula a good candidate for color image metrics [22, 23].
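For concreteness, Equations 1 to 9 can be condensed into a short script. The sketch below is our own illustration, assuming 10° tristimulus input with Y_10 given as a percentage luminance factor; np.arctan2 replaces the arctan of Equation 5 so that the hue angle lands in the correct quadrant.

```python
import numpy as np

def _to_log_osa_ucs(xyz):
    """Map 10-degree tristimulus values (X, Y, Z) to the log-compressed
    OSA-UCS coordinates (L_E, G_E, J_E) of Equations 1 to 8."""
    X, Y, Z = xyz
    x, y = X / (X + Y + Z), Y / (X + Y + Z)        # chromaticity coordinates
    Y0 = Y * (4.4934 * x**2 + 4.3034 * y**2 - 4.276 * x * y
              - 1.3744 * x - 2.5643 * y + 1.8103)                        # Eq. (2)
    L_osa = (5.9 * (np.cbrt(Y0) - 2 / 3 + 0.042 * np.cbrt(Y0 - 30))
             - 14.4) / np.sqrt(2)                                        # Eq. (1)
    A, B, C = np.array([[ 0.6597, 0.4492, -0.1089],
                        [-0.3053, 1.2126,  0.0927],
                        [-0.0374, 0.4795,  0.5579]]) @ np.array([X, Y, Z])  # Eq. (3)
    u, v = np.log(A / B / 0.9366), np.log(B / C / 0.9807)
    J = np.sqrt(2) * (0.5735 * L_osa + 7.0892) * (0.1792 * u + 0.9837 * v)  # Eq. (4)
    G = -np.sqrt(2) * (0.7640 * L_osa + 9.2521) * (0.9482 * u - 0.3175 * v)
    h = np.arctan2(J, -G)                     # Eq. (5), quadrant-safe arctan(-J/G)
    C_osa = np.hypot(G, J)
    L_E = np.log(1 + 0.015 / 2.890 * 10 * L_osa) / 0.015                 # Eq. (6)
    C_E = np.log(1 + 0.050 / 1.256 * 10 * C_osa) / 0.050                 # Eq. (8)
    return np.array([L_E, -C_E * np.cos(h), C_E * np.sin(h)])            # Eq. (7)

def delta_e_e(xyz1, xyz2):
    """Euclidean color difference of Eq. (9)."""
    d = _to_log_osa_ucs(xyz1) - _to_log_osa_ucs(xyz2)
    return float(np.sqrt(np.sum(d**2)))
```

For image metrics, the same transformation is applied pixelwise before any spatial filtering.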

3 Image difference metrics

Nowadays, more than 100 image difference metrics are available in the literature [2], some for general image quality and image difference and some for specific distortion detection, and it would be impossible to report all of them. For this reason, in this paper we discuss a small selection of important image difference metrics, focusing on those that are based on contrast filtering or that have been shown to correlate highly with observer perceived difference [2, 24]. In this work, we will consider only full-reference image difference metrics, where both the original and the reproduction are available.

In 1997, Zhang and Wandell [25] proposed a spatial extension to the CIELAB color metric. This metric, named Spatial-CIELAB (S-CIELAB), should fulfill two goals: a spatial filtering to simulate the blurring of the HVS and consistency with the basic CIELAB calculation for large uniform areas. The image is separated into an opponent color space, and each opponent color image is spatially convolved with a kernel defined by the spatial visual sensitivity of that color opponency. Finally, the filtered image is transformed into the CIELAB system, and ΔE_ab is used to calculate the color difference at each pixel; these are averaged to obtain a single quality value for the whole image.
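A heavily simplified stand-in for this pipeline is sketched below; it is not the S-CIELAB implementation. Real S-CIELAB filters in an opponent space with CSF-derived kernels whose widths depend on viewing distance, whereas here a single Gaussian per CIELAB channel is assumed for brevity:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.color import rgb2lab, deltaE_cie76

def spatial_delta_e_ab(rgb_orig, rgb_repro, sigma=2.0):
    """Blur both images to mimic HVS low-pass behaviour, then average the
    pixelwise CIELAB difference. sigma is an assumed stand-in for the
    CSF-derived, viewing-distance-dependent kernels of S-CIELAB."""
    def blur(lab):
        return np.stack([gaussian_filter(lab[..., c], sigma) for c in range(3)],
                        axis=-1)
    lab1, lab2 = blur(rgb2lab(rgb_orig)), blur(rgb2lab(rgb_repro))
    return float(np.mean(deltaE_cie76(lab1, lab2)))
```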

The structural similarity (SSIM) index, proposed by Wang et al. in 2004 [26], attempts to quantify the visible difference between a distorted image and a reference image. This index is based on the universal image quality (UIQ) index of Wang and Bovik [27]. The algorithm defines the structural information in an image as those attributes that represent the structure of the objects in the scene, independent of the average luminance and contrast. The index is based on a combination of luminance, contrast and structure comparisons. The comparisons are done over local windows in the image, and the overall image quality is the mean over all these local windows.
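SSIM is widely implemented; as an example, recent versions of scikit-image expose an implementation directly (the synthetic test images below are only for illustration):

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
reference = rng.random((128, 128, 3))
distorted = np.clip(reference + 0.05 * rng.standard_normal(reference.shape), 0, 1)

# Mean SSIM over local windows; channel_axis treats the last axis as color.
score = structural_similarity(reference, distorted, channel_axis=-1, data_range=1.0)
print(f"SSIM = {score:.3f}")
```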

In 2006, Egiazarian et al. [28] proposed the peak signal-to-noise ratio (PSNR)-HVS metric, based on the HVS and PSNR, the latter being a measure of the peak error between the original and the reproduction. The metric uses a scanning window to remove mean shift and contrast stretching, similar to UIQ [27]. PSNR-HVS is then calculated on the scanned images using PSNR, with the MSE calculated as described by Nill [29]. An extension of this metric that incorporates contrast masking, PSNR-HVS-M, was given by Ponomarenko et al. [30].

In 2007, Chandler and Hemami [31] proposed a new metric, termed visual signal-to-noise ratio (VSNR), based on near-threshold and supra-threshold properties of the human visual system and incorporating both low-level and mid-level features. Low-level features are related to contrast sensitivity, and mid-level features to global precedence, which states that the HVS visually integrates image edges in a coarse-to-fine-scale (global-to-local) fashion. The metric consists of two stages. First, contrast thresholds are used to detect visible distortions in the image; this is done in the wavelet domain by computing the contrast signal-to-noise ratio (CSNR), from which the contrast detection threshold is computed for each octave band. The contrast is then compared with the detection threshold, and if above it, the distortion is assumed supra-threshold (visible). In this case, a second stage is carried out, where a model of global precedence accounts for mid-level properties of the HVS; global precedence requires that the contrasts of distortions be proportioned across spatial frequencies. The final metric is computed as the combination of the perceived contrast of the distortion and the disruption of global precedence. VSNR is an interesting metric since it is based on contrast thresholds and does not modulate the entire image as the CSFs do in, for example, S-CIELAB.

Spatial-DEE (S-DEE), proposed by Simone et al. [32] in 2009, is the first metric to use the Euclidean color difference formula in log-compressed OSA-UCS space (see Section 2). It is based on the modified S-CIELAB workflow developed by Johnson and Fairchild [33] in 2001 and is obtained by substituting ΔE_ab with ΔE_E in the last step of the workflow.

4 From contrast to image difference: WLF-DEE

The history of contrast research is over a century long, and much effort has been devoted to developing accurate contrast measures able to predict observer perceived contrast in natural images [34]. Furthermore, recent studies have shown that contrast is an important attribute that falls under the umbrella of image quality [3].

In this work, we will propose a new image difference metric based on the work on contrast from Tadmor and Tolhurst [4] with two key features:

  1. It uses a multi-level approach (the so-called pyramidal image structure), as first suggested by Frankle and McCann in 1983 [35] and then by Adelson et al. in 1984 [36], in order to perform a full investigation of all the frequencies in the image.

  2. It uses the new Euclidean color difference formula in log-compressed OSA-UCS space proposed by Oleari et al. [5] and described in detail in Section 2.

In 2000, Tadmor and Tolhurst [4] based their analysis of contrast on the DOG model, which they modified and adapted to natural images. Since this model is used in this work, let us recall it completely:

In the conventional model, the spatial sensitivity in the center of receptive fields is described by a bi-dimensional Gaussian function with unit amplitude:

$$\mathrm{Center}(x,y) = \exp\left[ -\left( \frac{x}{r_c} \right)^2 - \left( \frac{y}{r_c} \right)^2 \right], \tag{10}$$

where the radius r_c represents the distance at which the sensitivity decreases to 1/e and (x,y) are the spatial coordinates of the receptive field. The surround component is represented by another Gaussian curve with a larger radius, r_s:

$$\mathrm{Surround}(x,y) = \rho \left( \frac{r_c}{r_s} \right)^2 \exp\left[ -\left( \frac{x}{r_s} \right)^2 - \left( \frac{y}{r_s} \right)^2 \right], \tag{11}$$

where the scaling factor ρ = 0.85 sets the integrated sensitivity of the surround component to be 85% of that of the center. This scaling factor used by Tadmor and Tolhurst [4] is representative of the values reported for retinal ganglion cells and lateral geniculate nucleus (LGN) neurons of cat and monkey [37, 38]. When the central point of the receptive field is placed at the location (x,y), the output of the central component is calculated as

$$R_c(x,y) = \sum_{i=x-3 r_c}^{x+3 r_c} \; \sum_{j=y-3 r_c}^{y+3 r_c} \mathrm{Center}(i-x,\, j-y)\, I(i,j), \tag{12}$$

while the output of the surround component is

$$R_s(x,y) = \sum_{i=x-3 r_s}^{x+3 r_s} \; \sum_{j=y-3 r_s}^{y+3 r_s} \mathrm{Surround}(i-x,\, j-y)\, I(i,j), \tag{13}$$

where in both cases I(i,j) is the image pixel value at position (i,j).

The result of the DOG model is given by

$$\mathrm{DOG}(x,y) = R_c(x,y) - R_s(x,y). \tag{14}$$

The conventional DOG model [39] assumes that the response of a neuron depends uniquely on the local luminance difference (ΔI) between the center and the surround. After the light adaptation process, the gains of the retinal ganglion cells and of the LGN neurons depend on the average local luminance I. Thus, the model response depends on the stimulus contrast. After testing three different contrast outputs,

$$C_{TT}(x,y) = \frac{R_c(x,y) - R_s(x,y)}{R_c(x,y)} \quad \text{(center-only scheme)}, \tag{15a}$$

$$C_{TT}(x,y) = \frac{R_c(x,y) - R_s(x,y)}{R_s(x,y)} \quad \text{(surround-only scheme)}, \tag{15b}$$

$$C_{TT}(x,y) = \frac{R_c(x,y) - R_s(x,y)}{R_c(x,y) + R_s(x,y)} \quad \text{(center-plus-surround scheme)}, \tag{15c}$$

they propose the criterion in Equation 15c for measuring contrast, which is similar to the Michelson definition of grating contrast [40].
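A compact implementation of Equations 10 to 15c might look as follows; this is our sketch, with kernels truncated at 3r as in Equations 12 and 13, and a small epsilon added as an assumption of ours to guard against division by zero on completely dark regions:

```python
import numpy as np
from scipy.ndimage import convolve

def dog_contrast(I, rc=2, rs=4, rho=1.0, eps=1e-12):
    """Pixelwise Tadmor-Tolhurst contrast, center-plus-surround scheme (Eq. 15c).

    I: 2D array of pixel values for one channel.
    rc, rs: center and surround radii; rho: surround scaling factor.
    """
    def kernel(r, scale=1.0):
        n = int(3 * r)                            # truncation as in Eqs. (12)-(13)
        y, x = np.mgrid[-n:n + 1, -n:n + 1]
        return scale * np.exp(-(x / r) ** 2 - (y / r) ** 2)

    Rc = convolve(I, kernel(rc))                                  # Eq. (12)
    Rs = convolve(I, kernel(rs, rho * (rc / rs) ** 2))            # Eq. (13)
    return (Rc - Rs) / (Rc + Rs + eps)                            # Eq. (15c)
```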

In this approach, a new metric is proposed, referred to as weighted-level framework ΔE_E (WLF-DEE). This metric is analogous to the S-CIELAB approach, but the original S-CIELAB spatial filtering is replaced with a multi-level DOG calculation, while ΔE_ab is replaced with the ΔE_E color difference formula. The general workflow of the metric is the following. The images are subsampled to various levels; the subsampling is simple, since the images are reduced to half their size at each level, and antialiasing filtering avoids artifacts at low resolutions. A pixelwise neighborhood contrast calculation is executed at each level by applying the DOG to the three channels separately, yielding local contrast maps for each level and each channel; an example is shown in Figure 4. Local contrast differences are then computed using ΔE_E, described in Section 2. Finally, a weighted recombination of the local contrast maps produces a global image difference value. Once local contrast maps are generated for each level, how to reduce the concept of contrast from local values at each pixel location to a single number representing the global image difference is still subject to debate. The simplest strategy is taking the mean of each level and averaging all the levels together. This new metric instead performs a weighted recombination of the levels, given by the following equation, which leads to the final global image difference:

$$\text{WLF-DEE} = \frac{1}{N_l} \sum_{l=1}^{N_l} \lambda_l \, \bar{c}_l, \tag{16}$$
Figure 4. Original image and local contrast maps generated by WLF-DEE with r_c = 2 and r_s = 4. The first four filtered levels are shown; the total number of levels depends on the image size. The brightness of all images has been increased to make them suitable for a printable version of the article.

where N_l is the number of levels, c̄_l is the mean contrast difference in level l, and λ_l is the weight assigned to level l. With this final measure, WLF-DEE aims at predicting the perceived magnitude of contrast difference between an original and a reproduction. This value can also be used as a quality indicator, as contrast is one of the main quality attributes [3].
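The whole workflow can be sketched as follows; this is our illustration, built on the dog_contrast function above, with the input images assumed to be already converted pixelwise to the log-compressed OSA-UCS channels (L_E, G_E, J_E). Exactly which signal the variance-based weight λ_l is computed from is an assumption here:

```python
import numpy as np
from skimage.transform import rescale

def wlf_dee(orig_e, repro_e, rc=2, rs=4, rho=1.0, weighting="uniform"):
    """Sketch of Eq. (16). orig_e, repro_e: (H, W, 3) arrays holding the
    (L_E, G_E, J_E) channels of the original and the reproduction."""
    terms = []
    a, b = orig_e, repro_e
    while min(a.shape[:2]) > 6 * rs:             # room left for the surround kernel
        ca = np.stack([dog_contrast(a[..., c], rc, rs, rho) for c in range(3)], -1)
        cb = np.stack([dog_contrast(b[..., c], rc, rs, rho) for c in range(3)], -1)
        diff = np.sqrt(np.sum((ca - cb) ** 2, axis=-1))  # Euclidean, as in Eq. (9)
        lam = diff.var() if weighting == "variance" else 1.0  # lambda_l (assumed)
        terms.append(lam * diff.mean())                       # lambda_l * mean contrast
        # Halve the resolution with anti-aliasing for the next level.
        a = rescale(a, 0.5, channel_axis=-1, anti_aliasing=True)
        b = rescale(b, 0.5, channel_axis=-1, anti_aliasing=True)
    return float(np.mean(terms))                 # (1/N_l) * sum, Eq. (16)
```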

4.1 WLF-DEE characteristics

4.1.1 Parameter tuning

As seen in the previous section, the DOG model depends on the radius of the center component r_c, the radius of the surround component r_s and the scaling factor of the surround component ρ, in addition to the three different schemes described in Equation 15. Together with the weighting parameters λ, WLF-DEE is therefore subject to a total of four parameters. Since this metric comes out as a direct application of the research on perceptual contrast of Simone et al. [34], and since computing all possible values of these four parameters would be computationally prohibitive, we have followed their rules of thumb for the choice. Table 1 reports our selection of tested configurations.

Table 1 Selected values of parameters for WLF-DEE

We have chosen four different configurations of r_c and r_s, with the particular choice of r_c = 2 and r_s = 4, which can be considered the most effective in the identification of edges and blocks in relation to contrast [34]; using larger radii would reduce the identification of local contrast. For the scaling factor of the surround component, in addition to the value ρ = 0.85 suggested by Tadmor and Tolhurst [4], we have chosen ρ = 1.0, justified by the fact that with ρ = 0.85, when the central and surround components are placed on a completely uniform area, the DOG model gives a non-zero contrast output even though no contrast would be perceived by an observer. For the level weighting, we have chosen uniform weighting (λ = 1), which is the simplest strategy, and the variance of the level, which means that λ assumes values taken from the image itself; thus, λ becomes level and image dependent. This strategy has shown benefits in correlation with observer perceived contrast [34]. For the three different schemes of the DOG model, we have decided to test all of them. In conclusion, WLF-DEE will be tested for a total of 48 different configurations.
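The 48 configurations arise combinatorially: 4 radius pairs × 2 surround scalings × 2 level weightings × 3 DOG schemes. A sketch of the enumeration is below; the radius pairs other than (2, 4) are hypothetical placeholders here, since the actual values are those listed in Table 1:

```python
from itertools import product

radius_pairs = [(1, 2), (2, 4), (3, 6), (4, 8)]   # hypothetical except (2, 4); see Table 1
surround_scalings = [0.85, 1.0]                   # rho
level_weightings = ["uniform", "variance"]        # lambda strategy
dog_schemes = ["center", "surround", "center-plus-surround"]  # Eq. 15a-c

configs = list(product(radius_pairs, surround_scalings, level_weightings, dog_schemes))
assert len(configs) == 48
```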

4.1.2 Computational complexity

The computational complexity of WLF-DEE is in line with that of other state-of-the-art image difference metrics using a multi-level approach, namely

$$\Theta(N \log N), \tag{17}$$

where N is the number of pixels in the image.

5 Tests and results

5.1 Databases

Many different databases have been used for evaluating the image difference metrics, but in order to extensively test WLF-DEE, we have chosen three databases: the public Tampere Image Database 2008 (TID2008) and two databases developed at ‘Norwegian Colour and Visual Computing Laboratory’.

The first database, the TID2008 database [24], contains a total of 1,700 images, with 25 reference images (Figure 5) and 17 types of distortions over four distortion levels. The mean opinion scores are the results of experiments with 654 observers. No viewing distance is stated for the TID database; therefore, we have used a standard viewing distance of 50 cm for the metrics requiring this setting. We have included only the images whose distortions provide, directly or indirectly, a change in contrast, narrowing TID2008 to a total of 400 images equally divided among the following four categories: masked noise, quantization noise, denoising and contrast change.

Figure 5. TID2008 database [24]: the 25 reference images.

The second database, proposed by Pedersen et al. [41], contains four original images (Figure 6): three portraits and one illustration. The originals were altered in lightness: each image had four versions with global lightness differences and four versions with local lightness changes, where the lightness changes were 3 and 5 ΔE_ab. Four versions were brighter than the original and four darker, for a total of 32 modified images. The psychophysical experiment was done on a calibrated CRT monitor, a LaCie electron 22 blue II (LaCie, Basel, Switzerland), in a grey room. The observers were seated approximately 80 cm from the screen, and the light was measured to approximately 17 lux in front of the monitor. A total of 25 observers were recruited for the experiment, and in a pairwise comparison experiment they were asked to choose the image most similar to the original. This database is of particular interest to us because contrast is directly related to change in luminance [40], which is in turn related to lightness [7, 8].

Figure 6. Pedersen database [41]: the four reference images, changed globally and locally in lightness.

The third database, from Ajagamelle [42], contains a total of 10 original images covering a wide range of characteristics and scenes (Figure 7). The images were modified using Adobe Photoshop on a global scale with separate and simultaneous variations of contrast, lightness and saturation, resulting in a total of 80 test images. The experiment was carried out as a category-judgment experiment with 14 observers. Each pair of images was displayed on an Eizo ColorEdge CG241W LCD display (Eizo Corporation, Ishikawa, Japan). The monitor was calibrated and profiled using a GretagMacbeth Eye-One Match 3. The monitor settings were sRGB with a resolution of 1,600 × 1,200 pixels. The experiment took place in a windowless room with neutral grey walls, ceiling and floor. The ceiling lights were set to provide a level of ambient illumination of around 40 lux, below the upper threshold of 64 lux recommended by the CIE [43]. The white point was set to D65, the gamma to 2.2 and the luminance level to 80 cd/m². The display was placed at a viewing distance of 70 cm. The images presented were 750 × 499 or 499 × 750 pixels, subtending roughly 20° of visual angle when viewed at this distance.

Figure 7. Ajagamelle database [42]. Nine images were captured and provided by an independent photographer, and one image was selected from a standard natural image set provided by the CIE [44].

5.2 Performance measures

Two types of correlation coefficients are computed [45] in order to evaluate the performance of WLF-DEE:

  1. The Pearson product-moment correlation coefficient. It assumes that the variables are ordinal, and it evaluates the linear relationship between two variables. This is a performance measure relating to the prediction accuracy of the metric [46].

  2. The Spearman rank correlation coefficient. It is a non-parametric measure of correlation computed on two sets of ranked data instead of the actual values. It describes the relationship between the variables with no assumptions about their frequency distributions, measuring how tightly the ranked data cluster around a straight line. This is a performance measure relating to the prediction monotonicity of the metric [46].

The relationships between the metrics and the observers are not necessarily linear. In order to remove any non-linearities due to the subjective experimental process and to facilitate comparison of the metrics in a common analysis space, we investigate the relationship between the metrics and observers by using non-linear regression [46]. In this work, we apply the same mapping function as that of Sheikh et al. [47]:

$$f(x) = \theta_1 \left[ \frac{1}{2} - \frac{1}{1 + e^{\theta_2 (x - \theta_3)}} \right] + \theta_4\, x + \theta_5, \tag{18}$$

where θ_i, i = 1,...,5, are parameters to be fitted. The 95% confidence intervals for the correlation values are calculated using Fisher's Z transformation, as described by the Video Quality Experts Group [48].

In order to have a complete analysis, the following measures are also presented (a code sketch covering all of these follows the list):

  • Root-mean-square error (RMSE) [48]. It is a measure of the differences between the values predicted by the metric and the scores actually given by the observers.

  • Significance of the difference between the Pearson correlation coefficients (t-value) [48]. This measure assumes that the observers' quality scores are well fitted by a normal distribution. It tests the H0 hypothesis that there is no significant difference between correlation coefficients against the H1 hypothesis that the difference is significant, without specifying better or worse.
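A sketch of this evaluation protocol in Python follows; the logistic fit of Equation 18 uses curve_fit, and the initial parameter guesses are an assumption of ours:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic_map(x, t1, t2, t3, t4, t5):
    """Mapping function of Eq. (18), after Sheikh et al. [47]."""
    return t1 * (0.5 - 1.0 / (1.0 + np.exp(t2 * (x - t3)))) + t4 * x + t5

def evaluate(metric_scores, observer_scores):
    """Pearson (after logistic fitting), Spearman and RMSE for one metric."""
    x = np.asarray(metric_scores, float)
    y = np.asarray(observer_scores, float)
    p0 = [np.ptp(y), 1.0, np.mean(x), 0.0, np.mean(y)]   # assumed starting point
    params, _ = curve_fit(logistic_map, x, y, p0=p0, maxfev=10000)
    fitted = logistic_map(x, *params)
    pearson = pearsonr(fitted, y)[0]
    spearman = spearmanr(x, y)[0]       # rank-based, unaffected by the mapping
    rmse = float(np.sqrt(np.mean((fitted - y) ** 2)))
    return pearson, spearman, rmse

def fisher_ci(r, n, z=1.96):
    """95% confidence interval for a correlation via Fisher's Z transform."""
    zr, se = np.arctanh(r), 1.0 / np.sqrt(n - 3)
    return float(np.tanh(zr - z * se)), float(np.tanh(zr + z * se))
```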

5.3 Results

As mentioned in Section 4.1.1, WLF-DEE has been tested for a total of 48 configurations, but in order to give a more readable and understandable presentation of the results, we will present only a selection of them. As WLF-DEE using the two DOG schemes in Equations 15a and 15b has lower performance in correlation than WLF-DEE using the DOG scheme in Equation 15c, those results are excluded. This also confirms the statement of Tadmor and Tolhurst that the DOG model, in analogy with the Michelson formula, performs better for contrast assessment [4]. In the same way, all configurations with ρ = 0.85 (Equation 11) are excluded, as they show lower performance in correlation than the configurations with ρ = 1.00. This leaves a total of eight WLF-DEE results, shown in Table 2 for Pearson correlation, Table 3 for Spearman correlation and Table 4 for RMSE. The significance of the difference between the Pearson correlation coefficients is presented for each database in Tables 5, 6, 7, 8, 9 and 10.

Table 2 Pearson correlation for WLF-DEE and the selected state-of-the-art metrics on all databases
Table 3 Spearman correlation for WLF-DEE and the selected state-of-the-art metrics on all databases
Table 4 RMSE for WLF-DEE and the selected state-of-the-art metrics on all databases
Table 5 Significance of the difference between Pearson correlation coefficients on TID-masked noise database
Table 6 Significance of the difference between Pearson correlation coefficients on TID-quantization noise database
Table 7 Significance of the difference between Pearson correlation coefficients on TID-image denoising database
Table 8 Significance of the difference between Pearson correlation coefficients on TID-contrast change database
Table 9 Significance of the difference between Pearson correlation coefficients on Ajagamelle database
Table 10 Significance of the difference between Pearson correlation coefficients on Pedersen database

Considering the TID database, SSIM has the highest Pearson correlation in all four categories. In the masked noise category, SSIM is followed by WLF-DEE K, while in quantization noise, it is followed by S-CIELAB and then WLF-DEE K. In the denoising and contrast change categories, SSIM is followed by WLF-DEE I and then WLF-DEE K. For all four categories, it is possible to notice that all the metrics give higher correlation with observer perceived difference using the logistic fitting. Furthermore, as the confidence intervals (Figures 8, 9, 10 and 11) of WLF-DEE K overlap with those of SSIM, the two metrics can be considered to perform equally. Overall, for the four categories of the TID database, WLF-DEE K proves significantly better than PSNR and VSNR and performs on a par with SSIM and S-CIELAB.

Figure 8. Pearson correlation for WLF-DEE and the selected state-of-the-art metrics on the TID-masked noise database; linear fitting on the left, logistic fitting on the right.

Figure 9. Pearson correlation for WLF-DEE and the selected state-of-the-art metrics on the TID-quantization noise database; linear fitting on the left, logistic fitting on the right.

Figure 10. Pearson correlation for WLF-DEE and the selected state-of-the-art metrics on the TID-image denoising database; linear fitting on the left, logistic fitting on the right.

Figure 11. Pearson correlation for WLF-DEE and the selected state-of-the-art metrics on the TID-contrast change database; linear fitting on the left, logistic fitting on the right.

For the Ajagamelle database, PSNR shows the highest Pearson correlation, followed by PSNR-HVS-M, S-CIELAB, SSIM and then four configurations (I, J, K, L) of WLF-DEE, which give very close results. Since in this case the confidence intervals (Figure 12) of WLF-DEE K overlap with those of the metrics with slightly higher correlation, the metrics can be considered to perform equally. For the Pedersen database, S-CIELAB shows the highest Pearson correlation, followed by PSNR-HVS-M, PSNR, the four configurations (I, J, K, L) of WLF-DEE with very close results, and then SSIM. In this database, the confidence intervals (Figure 13) show that WLF-DEE K has a slightly lower performance than S-CIELAB, but not with respect to SSIM. For these two databases it also holds true that all the metrics give higher correlation using the logistic fitting.

Figure 12. Pearson correlation for WLF-DEE and the selected state-of-the-art metrics on the Ajagamelle database; linear fitting on the left, logistic fitting on the right.

Figure 13. Pearson correlation for WLF-DEE and the selected state-of-the-art metrics on the Pedersen database; linear fitting on the left, logistic fitting on the right.

Considering all six database sets examined, WLF-DEE gives higher correlation using configurations I, J, K, L than using configurations M, N, O, P, and in particular, WLF-DEE K agrees best with observer perceived difference, indicating that large Gaussian radii and uniform weighting of the levels should be used for the estimation of perceived difference. Furthermore, WLF-DEE K with logistic fitting shows a stable trend across the six datasets, with an average correlation of 0.65. This also holds true for other tested metrics such as S-CIELAB and PSNR-HVS-M, but not for SSIM and VSNR, which show very high correlation in one dataset and very low correlation in another.

The analysis with the Spearman correlation follows the same pattern as that with the Pearson correlation, except for the Ajagamelle database, where the highest correlation is shown by S-CIELAB, although it does not outperform most of the other metrics. The results are presented only with linear fitting, as no improvement is found for any metric using the logistic fitting. Also with the Spearman correlation, WLF-DEE K shows its stability, with an average performance of 0.59.

The analysis with the root-mean-square error shows that for all four categories of the TID database, SSIM has the lowest RMSE. As the confidence intervals of SSIM overlap with those of two configurations of WLF-DEE (J, K) (Figures 14, 15, 16 and 17), it cannot be claimed that the metrics are significantly different in performance.

Figure 14. RMSE for WLF-DEE and the selected state-of-the-art metrics on the TID-masked noise database.

Figure 15. RMSE for WLF-DEE and the selected state-of-the-art metrics on the TID-quantization noise database.

Figure 16. RMSE for WLF-DEE and the selected state-of-the-art metrics on the TID-image denoising database.

Figure 17. RMSE for WLF-DEE and the selected state-of-the-art metrics on the TID-contrast change database.

For the Ajagamelle database instead, WLF-DEE N shows the lowest RMSE, followed by S-CIELAB and then by several other configurations of WLF-DEE (M, I, J, K, L) and VSNR. The confidence intervals (Figure 18) show that these metrics are not significantly different from each other, but they outperform other tested metrics such as SSIM, PSNR and PSNR-HVS-M. For the Pedersen database, S-CIELAB shows the lowest RMSE, followed by PSNR, PSNR-HVS-M and then all the configurations of WLF-DEE; SSIM and VSNR have the highest RMSE. The confidence intervals (Figure 19) show that WLF-DEE (I, J, K, L, P) does not differ in performance from the other tested metrics, but its overlap with the S-CIELAB confidence interval is minimal.

Figure 18. RMSE for WLF-DEE and the selected state-of-the-art metrics on the Ajagamelle database.

Figure 19. RMSE for WLF-DEE and the selected state-of-the-art metrics on the Pedersen database.

The analysis of the significance of the difference is presented at the 5% significance level for the Pearson correlation with logistic fitting only. Based on the definition in [48], two metrics are not significantly different if -1.96 < t-value < 1.96. This analysis confirms that WLF-DEE K is not significantly different in performance from the other tested metrics for the TID-masked noise and TID-image denoising databases. For TID-quantization noise, WLF-DEE K is not significantly different in performance from SSIM and S-CIELAB. For TID-contrast change, WLF-DEE K is significantly different in performance only from VSNR. For the Ajagamelle database, WLF-DEE K is not significantly different in performance only from SSIM, while for the Pedersen database it is not significantly different in performance from the other tested metrics, except S-CIELAB.

Overall, WLF-DEE K shows particular strength on those databases where a change in contrast between the original and its reproduction is triggered by a change of color attributes rather than by particular distortions. In conclusion, these results promote WLF-DEE K as a new metric for predicting the perceived magnitude of contrast difference between an original and a reproduction, fulfilling the purpose for which it was developed.

6 Conclusions

Recent studies have shown that contrast is an important image attribute that falls under the umbrella of image quality [3]. In this paper, we have developed a new metric based on recent work on contrast filtering. This metric, called WLF-DEE, combines two key features: multi-level filtering based on the work on contrast by Tadmor and Tolhurst [4] and the new Euclidean color difference formula in log-compressed OSA-UCS space proposed by Oleari et al. [5].

Extensive tests and analysis have been carried out on four categories of the well-known TID database and on two databases providing different distortions directly related to color and contrast. Comparisons with other state-of-the-art metrics have also been presented. The results promote WLF-DEE as a new, efficient metric for estimating the perceived magnitude of contrast difference between an original and a reproduction.

References

  1. Poirson A, Wandell B: Appearance of colored patterns: pattern-color separability. J. Opt. Soc. Am. A 1993, 10(12):2458-2470.

  2. Pedersen M, Hardeberg JY: Survey of full-reference image quality metrics: classification and evaluation. Found. Trends Comput. Graph. Vis. 2012, 7:1-80.

  3. Pedersen M, Bonnier N, Hardeberg JY, Albregtsen F: Attributes of image quality for color prints. J. Electron. Imaging 2010, 19:011016.

  4. Tadmor Y, Tolhurst D: Calculating the contrasts that retinal ganglion cells and LGN neurones encounter in natural scenes. Vis. Res. 2000, 40:3145-3157.

  5. Oleari C, Melgosa M, Huertas R: Euclidean color-difference formula for small-medium color differences in log-compressed OSA-UCS space. J. Opt. Soc. Am. A 2009, 26:121-134.

  6. Von Helmholtz H: Versuch das psychophysische Gesetz auf die Farbenunterschiede trichromatischer Augen anzuwenden. Z. Psychol. Physiol. Sinnesorgane 1892, 3:1-20.

  7. CIE: Colorimetry. Tech. Rep. 15, CIE, Vienna; 2004.

  8. CIE: Colorimetry – Part 4: CIE 1976 L*a*b* colour space. ISO 11664-4:2008 (CIE S 014-4/E:2007), CIE, Vienna.

  9. Luo MR, Rigg B: Chromaticity-discrimination ellipses for surface colours. Color Res. Appl. 1986, 11:25-42.

  10. Clarke FJJ, McDonald R, Rigg B: Modification to the JPC79 colour-difference formula. J. Soc. Dyers Colour. 1984, 100:128-132.

  11. Luo MR, Rigg B: BFD(l:c) colour-difference formula. Part I – development of the formula. J. Soc. Dyers Colour. 1987, 103:86-94.

  12. Luo MR, Rigg B: BFD(l:c) colour-difference formula. Part II – performance of the formula. J. Soc. Dyers Colour. 1987, 103:126-132.

  13. CIE: Industrial colour-difference evaluation. Tech. Rep. 116, Central Bureau of the CIE, Vienna; 1995.

  14. Luo MR, Cui G, Rigg B: The development of the CIE 2000 colour-difference formula: CIEDE2000. Color Res. Appl. 2001, 26(5):340-350.

  15. Bando E, Hardeberg JY, Connah D: Can gamut mapping quality be predicted by color image difference formulae? In Human Vision and Electronic Imaging X, Proc. SPIE vol. 5666. Edited by Rogowitz B, Pappas T, Daly S. Bellingham: SPIE; 2005:180-191.

  16. Hardeberg JY, Bando E, Pedersen M: Evaluating image difference metrics for gamut mapped images. Coloration Technol. 2008, 124(4):243-253.

  17. Kivinen H, Nuutinen M, Oittinen P: Comparison of colour difference methods for natural images. In CGIV 2010 – Fifth European Conference on Colour in Graphics, Imaging, and Vision. Joensuu, Finland; 2010:510-515.

  18. Hong G, Luo M: Perceptually based colour difference for complex images. In 9th Congress of the International Colour Association, Proc. SPIE vol. 4421. Edited by Rodrigues A, Chung R. Bellingham: SPIE; 2002:618-621.

  19. Oleari C: Color opponencies in the system of the uniform color scales of the Optical Society of America. J. Opt. Soc. Am. A 2004, 21:677-682.

  20. Oleari C: Hypotheses for chromatic opponency functions and their performance on classical psychophysical data. Color Res. Appl. 2005, 30:31-41.

  21. Huertas R, Melgosa M, Oleari C: Performance of a color-difference formula based on OSA-UCS space using small-medium color differences. J. Opt. Soc. Am. A 2006, 23(9):2077-2084.

  22. Simone G, Oleari C, Farup I: Performance of the Euclidean color-difference formula in log-compressed OSA-UCS space applied to modified image-difference metrics. In 11th Congress of the International Colour Association (AIC). Sydney, Australia; 2009.

  23. Pant DR, Farup I: Riemannian formulation and comparison of color difference formulas. Color Res. Appl. 2012, 37:429-440.

  24. Ponomarenko N, Lukin V, Egiazarian K, Astola J, Carli M, Battisti F: Color image database for evaluation of image quality metrics. In 10th International Workshop on Multimedia Signal Processing. Cairns, Queensland, Australia; 2008:403-408. http://www.ponomarenko.info/tid2008.htm. Accessed 20 Feb 2008.

  25. Zhang X, Wandell BA: A spatial extension of CIELAB for digital color image reproduction. SID J. 1997. http://white.stanford.edu/~brian/scielab/. Accessed 20 Feb 2008.

  26. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13(4):600-612.

  27. Wang Z, Bovik A: A universal image quality index. IEEE Signal Process. Lett. 2002, 9:81-84.

  28. Egiazarian K, Astola J, Ponomarenko N, Lukin V, Battisti F, Carli M: Two new full-reference quality metrics based on HVS. In Proceedings of the Second International Workshop on Video Processing and Quality Metrics. Scottsdale, Arizona; 2006.

  29. Nill N: A visual model weighted cosine transform for image compression and quality assessment. IEEE Trans. Commun. 1985, 33:551-557.

  30. Ponomarenko N, Silvestri F, Egiazarian K, Carli M, Astola J, Lukin V: On between-coefficient contrast masking of DCT basis functions. In Proceedings of the Third International Workshop on Video Processing and Quality Metrics. Scottsdale, Arizona; 25-26 January 2007.

  31. Chandler D, Hemami S: VSNR: a wavelet-based visual signal-to-noise ratio for natural images. IEEE Trans. Image Process. 2007, 16(9):2284-2298.

  32. Simone G, Oleari C, Farup I: An alternative color difference formula for computing image difference. In Proceedings from Gjøvik Color Imaging Symposium, no. 4 in Høgskolen i Gjøviks rapportserie. Gjøvik, Norway; 2009:8-11.

  33. Johnson GM, Fairchild MD: Darwinism of color image difference models. In The 9th Color Imaging Conference: Color Science and Engineering: Systems, Technologies, Applications. Scottsdale, Arizona; 2001:108-112.

  34. Simone G, Pedersen M, Hardeberg JY: Measuring perceptual contrast in digital images. J. Vis. Commun. Image Represent. 2012, 23:491-506.

  35. Frankle J, McCann JJ: Method and apparatus for lightness imaging. United States Patent No. 4,384,336; 1983.

  36. Adelson EH, Anderson CH, Bergen JR, Burt PJ, Ogden JM: Pyramid methods in image processing. RCA Eng. 1984, 29(6).

  37. Enroth-Cugell C, Robson JG: The contrast sensitivity of retinal ganglion cells of the cat. J. Physiol. 1966, 187:517-552.

  38. Sclar G, Maunsell JH, Lennie P: Coding of image contrast in central visual pathways of the macaque monkey. Vis. Res. 1990, 30:1-10.

  39. Rodieck R: Quantitative analysis of cat retinal ganglion cell response to visual stimuli. Vis. Res. 1965, 5(12):583-601.

  40. Michelson A: Studies in Optics. Chicago: University of Chicago Press; 1927.

  41. Pedersen M, Hardeberg JY: Rank order and image difference metrics. In CGIV 2008 – Fourth European Conference on Colour in Graphics, Imaging and Vision. Terrassa, Spain: IS&T; 2008:120-125.

  42. Ajagamelle S: Analysis of the difference of Gaussians model in perceptual image difference metrics. Master's thesis, Gjøvik University College and Grenoble Institute of Technology; 2009.

  43. Dijk J: In search of an objective measure for the perceptual quality of printed images. PhD thesis, Technische Universiteit Delft; 2004.

  44. ISO: ISO 12640-3:2007 Graphic technology – prepress digital data exchange – Part 3: CIELAB standard colour image data (CIELAB/SCID). International Organization for Standardization; 2007. http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=52497. Accessed 20 Feb 2008.

  45. Kendall MG, Stuart A, Ord JK: Kendall's Advanced Theory of Statistics: Classical Inference and Relationship, vol. 2. London: Hodder Arnold; 1991.

  46. Video Quality Experts Group: Objective perceptual assessment of video quality: full reference television. Tech. rep., International Telecommunication Union, Geneva; 2004.

  47. Sheikh HR, Sabir MF, Bovik AC: A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 2006, 15(11):3440-3451.

  48. Video Quality Experts Group: Final report from the Video Quality Experts Group: validation of reduced-reference and no-reference objective models for standard definition television, Phase I. Tech. rep., International Telecommunication Union, Geneva; 2009.


Acknowledgements

This work has been supported by the Research Council of Norway (NFR) through the SHP project. The authors would like to thank Fritz Albregtsen (University of Oslo), Sebastien Ajagamelle, Valentina Caracciolo and Sara Díaz De Cerio for their contributions to parts of the project.

Author information

Correspondence to Gabriele Simone.

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Simone, G., Pedersen, M., Farup, I., et al.: Multi-level contrast filtering in image difference metrics. J Image Video Proc 2013, 39 (2013). https://doi.org/10.1186/1687-5281-2013-39