
RVSIM: a feature similarity method for full-reference image quality assessment


Image quality assessment (IQA) is an important topic in digital image processing. In this study, a full-reference IQA method called the Riesz transform and Visual contrast sensitivity-based feature SIMilarity index (RVSIM) is proposed. More precisely, a Log-Gabor filter bank is first used to decompose the reference and distorted images, and the Riesz transform is applied to the decomposed images on the basis of monogenic signal theory. Then, a monogenic signal similarity matrix is obtained by calculating the similarity of the local amplitude, phase, and direction of the monogenic signal. Next, the similarity matrices of the different frequency bands are summed with weights derived from visual contrast sensitivity. Because the first-order Riesz transform cannot clearly express corners and intersection points in the image, we also calculate the gradient magnitude similarity between the reference and distorted images as a feature and combine it with the monogenic signal similarity to obtain a local quality map. Finally, we compute the monogenic phase congruency from the Riesz transform feature matrix of the reference image and use it as a weighting function to derive the similarity index. Extensive experiments on five benchmark IQA databases, namely, LIVE, CSIQ, TID2008, TID2013, and Waterloo Exploration, indicate that RVSIM is a robust IQA method.

1 Introduction

Digital images are an essential means of expressing and communicating information. Digital imaging has been applied in many fields, but image quality is inevitably degraded during acquisition, compression [1–3], transmission [4], processing [5], and reconstruction [6, 7]. Accurately assessing image quality has therefore become challenging [8], and image quality assessment (IQA) has been extensively investigated [9–11].

Depending on the availability of reference images, IQA methods can be divided into full-reference (FR), reduced-reference (RR), and no-reference (NR) methods [12]. FR IQA methods take the original, undistorted image as the reference and are mainly used to assess the similarity and fidelity between the distorted image and the original [13, 14]. RR IQA methods are practical when only some features extracted from the original image, rather than the whole image, are available [15]; these features are used to give a reasonable estimate of the distorted image's quality [16]. In some practical applications, no reference image is available for comparison, so NR IQA methods are needed [17]. This study focuses on FR IQA methods.

MSE and PSNR are widely used FR IQA methods. They assess image quality by calculating the pixel-wise error over the whole image and use the average error as the final result. These methods are simple to calculate and easy to implement. However, their model of the image is too simple and therefore superficial: they compute the absolute error between corresponding pixels but ignore the correlation between pixels and the perceptual characteristics of the human visual system (HVS). They also do not describe low-level features such as edge information. As a result, their scores can be seriously inconsistent with human perception, and the assessed quality may not reflect what observers actually see [18, 19].

Many assessment methods have been proposed to account for human visual characteristics. Wang et al. [12] established the Structural SIMilarity (SSIM) model, which builds on the universal image quality index (UQI) [20] and is considered the most common representative of this class. SSIM uses the structural information of images to assess quality, and experiments show that it is more appropriate than previous assessment methods. Although SSIM improves the consistency between assessment results and HVS perception, it treats the structural features of images as scalars, which causes SSIM to lose its validity when images are highly blurred. Numerous methods, such as MS-SSIM [21], ESSIM [22], GSSIM [23], 3-SSIM [24], CW-SSIM [25], and IW-SSIM [26], have been developed on the basis of SSIM, and they improve the assessment results to a certain extent. Sheikh et al. [27, 28] developed IFC and VIF, which are based on natural scene statistics (NSS) and introduce the concept of information fidelity. Zhang et al. [29] proposed the Feature SIMilarity (FSIM) method, which uses phase congruency (PC) and gradient magnitude (GM) similarity as assessment features.

Further research has shown that natural images, as two-dimensional signals with highly structured features, have vector characteristics. The pixels of an image are strongly dependent on one another, and this dependency constitutes the structure of the two-dimensional image. The main function of the HVS is to extract structural information from the field of view. Because the Riesz transform performs well in multidimensional signal processing, Zhang et al. [30] constructed similarity matrices from the feature maps of the first- and second-order Riesz transforms and used edge features as the pooling function to derive the RFSIM index. Luo et al. [31] introduced monogenic phase congruency (MPC), based on PC, and proposed the RMFSIM method. These methods assess the vector characteristics of two-dimensional images more efficiently. However, they simply apply the Riesz transform to construct local features and only partially consider the physical meaning of monogenic signal (MS) theory. Moreover, their assessment factors mainly describe high-frequency information, such as edge features, so the complexity of the HVS has yet to be fully captured. Hence, there is still much room for improvement.

In this study, an FR IQA method called the Riesz transform and Visual contrast sensitivity-based feature SIMilarity index (RVSIM) is proposed by combining the Riesz transform with visual contrast sensitivity. The Log-Gabor filter and the contrast sensitivity function (CSF) are both well established. However, to the best of our knowledge, this is the first work to combine the frequency characteristics of the Log-Gabor filter with the frequency sensitivity of the HVS, so that objective results agree with subjective evaluation as closely as possible. In addition, although the Riesz transform performs well in multidimensional signal processing, the first-order Riesz transform cannot clearly express the corners and intersection points in the image, so RVSIM introduces GM similarity to improve assessment performance. In general, RVSIM takes full advantage of MS theory [32] and the Log-Gabor filter [33], and exploits the visual CSF [34] to allocate the weights of different frequency bands. The similarity matrix is obtained by introducing GM, and the MPC map is used as the pooling function to derive the final IQA score. Two groups of experiments were carried out on two kinds of databases. The first kind comprises the LIVE, CSIQ, TID2008, and TID2013 databases, on which performance is assessed by absolute indicators of each method. The second is the Waterloo Exploration database, on which performance is assessed through competitive ranking among methods. The experimental results demonstrate that the proposed RVSIM method is a robust IQA method.

Notably, RVSIM differs from RFSIM [30] and RMFSIM [31] in four aspects. First, RVSIM applies Log-Gabor band-pass filters to the reference and distorted images to obtain their components in different frequency bands. Second, RVSIM does not use the Riesz transform outputs directly as the feature matrix. Instead, it uses the analytic space obtained by the Riesz transform, namely local amplitude, phase, and direction, which constitute a complete orthogonal basis [35], and then calculates local feature similarities. Third, RVSIM uses the characteristics of the HVS to assign different weights to different frequency bands, which makes the model more consistent with the perceptual characteristics of the HVS. Fourth, because the first-order Riesz transform cannot clearly express the corners and intersection points in images, RVSIM introduces GM similarity.

The remainder of this paper is organized as follows. Section 2 presents MS theory, the Log-Gabor filter, MPC, and visual contrast sensitivity, and details how each is designed and computed in this study. Section 3 introduces the structure of the proposed IQA method and describes how MS, CSF, GM, and MPC are combined to derive the RVSIM index. Section 4 presents the experimental results. Section 5 concludes the paper.

2 Related works

2.1 Riesz transform

In one-dimensional signal processing, the Hilbert transform has proven effective. However, attempts to extend it to two-dimensional images, including the local Hilbert transform, the overall Hilbert transform, and the local and global Hilbert transform [36], have all failed because of a common flaw: they are not isotropic [37]. The Riesz transform extends the Hilbert transform isotropically to higher-dimensional Euclidean spaces, which makes it suitable for image processing applications [38, 39].
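A minimal frequency-domain sketch of the first-order Riesz transform may help make the isotropy point concrete. It multiplies the image spectrum by \(-i\,u/|\omega|\) and \(-i\,v/|\omega|\), the standard Riesz transfer functions, so every direction is treated alike. Assumptions: periodic boundaries, a naive pure-Python DFT (adequate only for tiny images; a practical implementation would use an FFT), and \(R_1\) acting along the first array axis. The function names are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import cmath
import math


def dft2(img, inverse=False):
    """Naive 2D DFT of a list-of-lists; inverse=True applies the 1/(MN) factor."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = 2 * math.pi * (u * x / rows + v * y / cols)
                    acc += img[x][y] * cmath.exp(sign * 1j * angle)
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out


def signed_freq(k, n):
    """Map DFT index k to a signed frequency in cycles per sample."""
    return (k if k <= n // 2 else k - n) / n


def riesz_transform(img):
    """Return (R1, R2), the first-order Riesz components of a real image.

    Transfer functions: -i*u/|w| for R1 (first axis), -i*v/|w| for R2 (second axis).
    """
    rows, cols = len(img), len(img[0])
    spectrum = dft2(img)
    h1 = [[0j] * cols for _ in range(rows)]
    h2 = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            fu, fv = signed_freq(u, rows), signed_freq(v, cols)
            radius = math.hypot(fu, fv)
            if radius == 0:
                continue  # the DC term has no direction, so it stays zero
            h1[u][v] = -1j * fu / radius * spectrum[u][v]
            h2[u][v] = -1j * fv / radius * spectrum[u][v]
    # Imaginary parts are rounding noise (or Nyquist-bin residue), so keep .real
    r1 = [[z.real for z in row] for row in dft2(h1, inverse=True)]
    r2 = [[z.real for z in row] for row in dft2(h2, inverse=True)]
    return r1, r2
```

For an 8 × 8 image whose rows follow \(\cos(2\pi x/8)\), `riesz_transform` returns \(R_1 \approx \sin(2\pi x/8)\) and \(R_2 \approx 0\), the same cosine/sine pair the 1D Hilbert transform produces; rotating the pattern by 90° swaps the roles of \(R_1\) and \(R_2\).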

As Fig. 1 shows, the Riesz transform space can be viewed as a spherical coordinate system in 3D Euclidean space, where \(R\), \(R_1\), and \(R_2\) are the projections of a point onto the three axes [40]. In this space, the local amplitude \(A\), the local direction \(\theta\), and the local phase \(\varphi\) can be expressed as:

$$ \begin{aligned} \left\{\begin{array}{lll} A_{R}(x,y) &= \sqrt{R(x,y)^{2}+R_{1}(x,y)^{2}+R_{2}(x,y)^{2}} \\[0.2cm] \theta_{R}(x,y) &= \tan^{-1}{(-R_{2} (x,y)/R_{1} (x,y))} \\[0.2cm] \varphi_{R} (x,y) &= \tan^{-1} {(R_{12}(x,y)/R(x,y))} \end{array}\right. \end{aligned} $$
Fig. 1 The Riesz transform space

where \(R_{12}(x, y) = \sqrt {R_{1}(x, y)^{2} + R_{2}(x, y)^{2}}, \theta _{R}(x, y) \in [0, \pi), \varphi _{R}(x,y) \in [0, \pi)\).
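Eq. (1) can be evaluated per pixel as in the sketch below. It uses `atan2` rather than a plain arctangent, an implementation choice that avoids dividing by \(R_1 = 0\) and folds the direction into \([0, \pi)\); the phase then lies in \([0, \pi]\) because \(R_{12}\) is non-negative. The function name is hypothetical.

```python
import math


def monogenic_features(r, r1, r2):
    """Return (A, theta, phi) at one pixel, following Eq. (1).

    r is the band-pass response and (r1, r2) its first-order Riesz components.
    """
    amplitude = math.sqrt(r * r + r1 * r1 + r2 * r2)
    # tan^{-1}(-R2/R1), folded into [0, pi); atan2 avoids dividing by R1 = 0
    direction = math.atan2(-r2, r1) % math.pi
    if direction >= math.pi:  # tiny negative angles can round up to pi
        direction = 0.0
    # tan^{-1}(R12/R) with R12 >= 0, so atan2 returns a value in [0, pi]
    r12 = math.hypot(r1, r2)
    phase = math.atan2(r12, r)
    return amplitude, direction, phase
```

For example, a purely even response \((R, R_1, R_2) = (1, 0, 0)\) gives \(A = 1\), \(\theta = 0\), and \(\varphi = 0\), while a purely odd response such as \((0, 3, 4)\) gives \(A = 5\) and \(\varphi = \pi/2\).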

2.2 Log-Gabor filter

Because the image signal has finite length, it is usually band-pass filtered before the Riesz transform, typically with a Log-Gabor filter [41]. Because a single Log-Gabor filter has limited bandwidth, practical applications use multiple Log-Gabor filters to build a complete filter bank in the radial and angular directions [42]. The optimum filter bank for a specific application can be designed using previously described methods [43, 44]. In this study, the number of scales is \(n_r = 5\) and the number of orientations is \(n_\theta = 1\); these splicing parameters are discussed in detail in Section 4.1.

As described in Section 2.4, the center frequencies \(\omega_{0i}~(i=1,\ldots,5)\) of the filter bank are \(\omega _{01}=\frac {1}{3}, \omega _{02}=\frac {1}{3 \times 2.1}, \omega _{03}=\frac {1}{3 \times 2.1 \times 2.1}, \omega _{04}=\frac {1}{3 \times 2.1 \times 2.1 \times 2.1}\), and \(\omega _{05}=\frac {1}{3 \times 2.1 \times 2.1 \times 2.1 \times 2.1}\). The corresponding bands of the Log-Gabor filter bank are [0.4786, 0.2026], [0.2611, 0.0965], [0.1243, 0.0460], [0.0591, 0.0221], and [0.0282, 0.0105]. Filtering the reference image \(R\) with this bank yields its five-scale decomposition \(R^{bi}~(i=1,\ldots,5)\). Applying the Riesz transform to each \(R^{bi}\) gives the MS of the reference image, \(\left [R^{bi}, R_{1}^{bi}, R_{2}^{bi}\right ]~(i=1,\ldots,5)\). Thus, Eq. (1) becomes:

$$ \begin{aligned} \left\{\begin{array}{lll} A_{R}^{bi}(x,y) &= \sqrt{R^{bi}(x,y)^{2}+R_{1}^{bi}(x,y)^{2}+R_{2}^{bi}(x,y)^{2}} \\[0.2cm] \theta_{R}^{bi}(x,y) &= \tan^{-1} {\left(-R_{2}^{bi}(x,y)/R_{1}^{bi}(x,y)\right)} \\[0.2cm] \varphi_{R}^{bi}(x,y) &= \tan^{-1} {\left(R_{12}^{bi}(x,y)/R^{bi}(x,y)\right)} \end{array}\right. \end{aligned} $$

where \(R_{12}^{bi}(x, y) = \sqrt {R_{1}^{bi}(x, y)^{2} + R_{2}^{bi}(x, y)^{2}}, \theta _{R}^{bi}(x, y) \in [0, \pi), \varphi _{R}^{bi}(x, y) \in [0, \pi), i=1,\ldots,5\). Similarly, the MS of the distorted image, \(\left [D^{bi}, D_{1}^{bi}, D_{2}^{bi}\right ]~(i=1,\ldots,5)\), yields the corresponding local amplitude \(A_{D}^{bi}\), local direction \(\theta _{D}^{bi}\), and local phase \(\varphi _{D}^{bi}\), \(i=1,\ldots,5\).

Figure 2 shows the Log-Gabor filter bank used in this study. The center frequencies \(\omega_{0i}~(i=1,\ldots,5)\) in Fig. 2a–e are \(\omega _{01}=\frac {1}{3}, \omega _{02}=\frac {1}{3 \times 2.1}, \omega _{03}=\frac {1}{3 \times 2.1 \times 2.1}, \omega _{04}=\frac {1}{3 \times 2.1 \times 2.1 \times 2.1}\), and \(\omega _{05}=\frac {1}{3 \times 2.1 \times 2.1 \times 2.1 \times 2.1}\). Using this filter bank, two sample images (monarch and sailing2 from the LIVE database [45]) are filtered to obtain their components in the five bands. Note that the sample images are converted to grayscale before filtering.

Fig. 2 Two examples and their Log-Gabor filter banks. a–e Forms of the filter bank, whose center frequencies are \(\omega _{01}=\frac {1}{3}, \omega _{02}=\frac {1}{3 \times 2.1}, \omega _{03}=\frac {1}{3 \times 2.1 \times 2.1}, \omega _{04}=\frac {1}{3 \times 2.1 \times 2.1 \times 2.1}\), and \(\omega _{05}=\frac {1}{3 \times 2.1 \times 2.1 \times 2.1 \times 2.1}\), respectively. f–k The original image monarch and its components in the five bands. l–q The original image sailing2 and its components in the five bands

Figure 2 also shows that the Log-Gabor filter with \(\omega_0 = \frac {1}{3}\) captures the high-frequency components of the image, which represent the finest details of the original image. The filter with \(\omega_0 = \frac {1}{3 \times 2.1}\) captures the sub-high-frequency components. The filter with \(\omega_0 = \frac {1}{3 \times 2.1 \times 2.1 \times 2.1}\) contains mainly low-frequency components, which reflect the contour information of the original image. The detailed information describes small-scale parts of the image such as texture, whereas the remaining large-scale information expresses the basic structure and overall trend of the image.
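The listed bands are consistent with center frequencies forming the geometric progression \(\omega_{0i} = 1/(3 \times 2.1^{\,i-1})\) (a minimum wavelength of 3 pixels and a scale factor of 2.1) and with half-power edges, where the amplitude response drops to \(1/\sqrt{2}\). The sketch below computes both from the standard radial Log-Gabor transfer function. The bandwidth parameter `SIGMA_ON_F = 0.55` is an assumption not stated in the text; with it, the computed edges match the listed bands to within about 0.0003, except for the upper edge of the first band (about 0.548 versus the listed 0.4786). Names are hypothetical.

```python
import math

MIN_WAVELENGTH = 3.0  # wavelength of the finest scale, in pixels
MULT = 2.1            # ratio between successive wavelengths
SIGMA_ON_F = 0.55     # assumed bandwidth parameter (not given in the text)


def center_frequencies(n_scales=5):
    """omega_0i = 1 / (3 * 2.1**(i - 1)) in cycles per pixel, i = 1..n_scales."""
    return [1.0 / (MIN_WAVELENGTH * MULT ** i) for i in range(n_scales)]


def log_gabor_radial(omega, omega0, sigma_on_f=SIGMA_ON_F):
    """Radial Log-Gabor amplitude response; zero at (and below) DC."""
    if omega <= 0:
        return 0.0
    return math.exp(-(math.log(omega / omega0) ** 2) / (2 * math.log(sigma_on_f) ** 2))


def half_power_band(omega0, sigma_on_f=SIGMA_ON_F):
    """(upper, lower) frequencies where the amplitude falls to 1/sqrt(2)."""
    # Solve exp(-(ln r)^2 / (2 s^2)) = 2**-0.5 for r, with s = |ln sigma_on_f|
    ratio = math.exp(abs(math.log(sigma_on_f)) * math.sqrt(math.log(2)))
    return omega0 * ratio, omega0 / ratio
```

For the second scale, `half_power_band(center_frequencies()[1])` gives approximately (0.2611, 0.0965), matching the second listed band.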

2.3 Monogenic phase congruency

The traditional PC model [46] uses the phase information of the image and is widely used to detect edges, key feature points, and symmetry. However, it suffers from noise interference, frequency spread, and other problems [47, 48]. The MPC model, developed from MS theory and PC, better expresses the local phase information of the image and improves computational efficiency and the accuracy of local features [31].

Based on Eq. (2), the local energy of the sum of the band responses is:

$$ E^{'}(x, y) = \sqrt{R^{b}(x, y)^{2} + R_{1}^{b}(x, y)^{2}+R_{2}^{b}(x, y)^{2}} $$

where \(R^{b}(x, y) = \sum _{i=1}^{5}R^{bi}(x, y), R_{1}^{b}(x, y) = \sum _{i=1}^{5}R_{1}^{bi}(x, y)\), and \(R_{2}^{b} (x, y) = \sum _{i=1}^{5}R_{2}^{bi}(x, y)\).

The sum of the local amplitudes is:

$$ A^{'} (x, y) = \sum_{i=1}^{5}A_{R}^{bi}(x, y) $$

The MPC model is expressed as:

$$ MPC(x, y) = W(x, y)\left\lfloor 1-\xi \cdot \arccos\left(\frac{E^{'}(x, y)}{A^{'}(x, y)}\right) \right\rfloor \frac{\left\lfloor E^{'}(x, y) - T \right\rfloor}{A^{'}(x, y)+\varepsilon} $$

where \(\lfloor \cdot \rfloor\) denotes that the enclosed quantity equals itself when positive and zero otherwise, i.e., the difference is not permitted to become negative. \(\xi\) is the gain coefficient, generally chosen in the range \(1 \le \xi \le 2\). \(T\) is the noise compensation factor. \(\varepsilon\) is a small positive constant, set to \(\varepsilon = 0.0001\). \(W(x, y)\) is a weighting function that applies a sigmoid (S-shaped) curve to the spread of the filter responses [49]:

$$ W(x,y)=\frac{1}{1+\exp(g(c-s(x,y)))} $$

where c is the cutoff value of the filter response spread, below which the PC values become penalized, g is the gain factor that controls the sharpness of the cutoff, and s(x,y) is the spread function [31]. Here, we set g=1.8182 and c=1/3.
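The following per-pixel sketch evaluates Eqs. (3)–(6) from the reference image's band responses. The text cites [31] for the spread function \(s(x, y)\) without defining it here, so the sketch assumes a common definition from the phase congruency literature, \(s = \frac{1}{N} A'/(A_{\max} + \varepsilon)\), where \(N\) is the number of scales and \(A_{\max}\) the largest single-scale amplitude. The values `xi = 1.5` and `noise_t = 0` are illustrative only (the text constrains \(\xi\) to [1, 2] and does not give \(T\)). Names are hypothetical.

```python
import math

EPS = 1e-4         # epsilon in Eq. (5)
G_GAIN = 1.8182    # g in Eq. (6)
CUTOFF = 1.0 / 3   # c in Eq. (6)


def mpc_pixel(bands, xi=1.5, noise_t=0.0):
    """Monogenic phase congruency at one pixel.

    bands: list of (r, r1, r2) band-pass and Riesz responses, one per scale.
    xi and noise_t are illustrative values, not taken from the text.
    """
    sum_r = sum(b[0] for b in bands)
    sum_r1 = sum(b[1] for b in bands)
    sum_r2 = sum(b[2] for b in bands)
    energy = math.sqrt(sum_r ** 2 + sum_r1 ** 2 + sum_r2 ** 2)  # E'
    amplitudes = [math.sqrt(r * r + r1 * r1 + r2 * r2) for r, r1, r2 in bands]
    total_amp = sum(amplitudes)  # A'
    if total_amp == 0:
        return 0.0
    # Assumed spread function: (A' / (A_max + eps)) / N
    spread = (total_amp / (max(amplitudes) + EPS)) / len(bands)
    weight = 1.0 / (1.0 + math.exp(G_GAIN * (CUTOFF - spread)))  # W
    ratio = min(1.0, energy / total_amp)  # E' <= A'; guard against rounding
    phase_term = max(0.0, 1.0 - xi * math.acos(ratio))           # first floor bracket
    energy_term = max(0.0, energy - noise_t) / (total_amp + EPS)  # second floor bracket
    return weight * phase_term * energy_term
```

When all scales respond in phase, \(E' = A'\) and the arccos term vanishes, so the result is close to \(W\); responses that cancel across scales drive \(E'\), and hence MPC, to zero.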

To illustrate the weight function more intuitively, Fig. 3 shows the three-dimensional surface of \(W(x, y)\) for two sample images from the LIVE database [45] (Fig. 3a, d, which are the same as Fig. 2f, l). Figure 3b, e shows the three-dimensional surface of \(W(x, y)\), and Fig. 3c, f shows rotated views of the surface.

Fig. 3 Two sample images used for the weight function, extracted from the LIVE database. a, d Reference images. b, e Three-dimensional surfaces of the weight function. c, f Rotated views of the three-dimensional surfaces

Figure 3 shows that the weight function accurately highlights the local characteristics of the sample images, indicating that MPC can express the local phase information of the image.

2.4 Visual contrast sensitivity

Physiological and psychological research has revealed that the HVS has many characteristics, such as band-pass sensitivity, nonlinearity, multichannel processing, and masking [50]. Among them, the CSF characterizes the band-pass sensitivity of the HVS, that is, how its sensitivity differs across spatial frequencies. Because the CSF reflects subjective visual experience, it has been applied in many IQA methods [51, 52]. This study uses the CSF model proposed by Mannos et al. [34]:

$$ A(f_{r}) \approx 2.6(0.0192+0.114f_{r})\exp{\left(-(0.114f_{r})^{1.1}\right)} $$

where \(f_r\) is the spatial frequency in cycles per degree of visual angle. Normalizing this function gives the CSF characteristic curve shown in Fig. 4.

Fig. 4 The visual CSF characteristic curve. The curve is divided into five segments, shown in red, orange, green, cyan, and blue

To facilitate calculation and adapt to the CSF, the center frequencies \(\omega_{0i}~(i=1,\ldots,5)\) of the Log-Gabor filter bank are set as \(\omega _{01} = \frac {1}{3}, \omega _{02} = \frac {1}{3 \times 2.1}, \omega _{03} = \frac {1}{3 \times 2.1 \times 2.1}, \omega _{04} = \frac {1}{3 \times 2.1 \times 2.1 \times 2.1}\), and \(\omega _{05} = \frac {1}{3 \times 2.1 \times 2.1 \times 2.1 \times 2.1}\). The CSF curve is divided into five segments, with the half-power points of each filter taken as its bandwidth limits. The five bands of the Log-Gabor filter bank are then [0.4786, 0.2026], [0.2611, 0.0965], [0.1243, 0.0460], [0.0591, 0.0221], and [0.0282, 0.0105], which correspond to the red, orange, green, cyan, and blue segments in Fig. 4, respectively (the overlap between bands is not shown in the figure). The maximum value of the normalized CSF within each band is used as the weight of the corresponding similarity matrix: \(w_1 = 0.3370\), \(w_2 = 0.8962\), \(w_3 = 0.9809\), \(w_4 = 0.9753\), and \(w_5 = 0.7411\).
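The sketch below evaluates the CSF of Eq. (7), normalizes it to a peak of 1, and takes its maximum over a frequency interval, mirroring how the band weights are described above. Converting the Log-Gabor bands from cycles per pixel to the cycles per degree used by the CSF requires a viewing distance and display resolution that the text does not give, so `band_weight` expects limits already expressed in cycles per degree; the function names are hypothetical, and the sketch does not claim to reproduce \(w_1\)–\(w_5\).

```python
import math


def csf(f_r):
    """Mannos-Sakrison contrast sensitivity of Eq. (7); f_r in cycles/degree."""
    return 2.6 * (0.0192 + 0.114 * f_r) * math.exp(-((0.114 * f_r) ** 1.1))


def normalized_csf_curve(f_max=60.0, step=0.01):
    """Sample the CSF on [0, f_max] and scale it so that its peak equals 1."""
    freqs = [i * step for i in range(int(round(f_max / step)) + 1)]
    values = [csf(f) for f in freqs]
    peak = max(values)
    return freqs, [v / peak for v in values]


def band_weight(freqs, curve, lo, hi):
    """Largest normalized CSF value with lo <= f <= hi (limits in cycles/degree)."""
    return max(v for f, v in zip(freqs, curve) if lo <= f <= hi)
```

The normalized curve peaks near 7.9 cycles per degree, so a band covering that frequency receives a weight close to 1, while bands far above or below it receive smaller weights.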

3 Proposed RVSIM method

3.1 The proposed framework

The framework of the proposed RVSIM method is shown in Fig. 5. The reference image \(R\) and the distorted image \(D\) are filtered by a five-band Log-Gabor filter bank to obtain their components \(R^{bi}\) and \(D^{bi}~(i=1,\ldots,5)\) in five frequency bands. Applying the Riesz transform to these components gives \(\left [R^{bi}, R_{1}^{bi}, R_{2}^{bi}\right ]\) and \(\left [D^{bi}, D_{1}^{bi}, D_{2}^{bi}\right ]~(i=1,\ldots,5)\). For each band, the similarities of the local features (local amplitude \(A\), local phase \(\varphi\), and local direction \(\theta\)) give the MS similarity functions \(\left (S_{A}^{bi}, S_{\varphi }^{bi}, S_{\theta }^{bi}\right)~(i=1,\ldots,5)\), from which the similarity matrices \(S_{Mi}~(i=1,\ldots,5)\) are derived. The five similarity matrices are weighted by \(w_i~(i=1,\ldots,5)\), set using the CSF, and summed to obtain a single similarity matrix \(S_M\). The GM similarity matrix \(S_G\) of \(R\) and \(D\) is also calculated, and \(S_M\) and \(S_G\) are combined to obtain the local feature similarity \(S_L\) of \(R\) and \(D\). Meanwhile, MPC is computed from the MS of the reference image \(R\) to serve as the pooling function. Finally, the local feature similarity map \(S_L\) is pooled with MPC as the weighting function to obtain the proposed similarity index.

Fig. 5 Illustration of the proposed RVSIM method

3.2 RVSIM index

As described previously, the reference image \(R\) and the distorted image \(D\) are passed through a Log-Gabor filter bank and a first-order Riesz transform to obtain five MSs, from which the characteristic indices in the Riesz transform space (amplitude \(A\), phase \(\varphi\), and direction \(\theta\)) are calculated. The MS similarity of \(R\) and \(D\) at pixel \((x, y)\) is then derived as:

$$ \begin{aligned} \left\{\begin{array}{lll} S_{A}^{bi}(x,y)&= \frac{2A_{R}^{bi}A_{D}^{bi}+C_{1}}{\left(A_{R}^{bi}\right)^{2}+\left(A_{D}^{bi}\right)^{2}+C_{1}} \\[0.2cm] S_{\theta}^{bi} (x,y)&= \exp\left(-\left| tan\left(\theta_{R}^{bi}-\theta_{D}^{bi}\right)\right|\right)\\[0.2cm] &= \exp\left(-\left| \frac{R_{1}^{bi} D_{2}^{bi}-R_{2}^{bi} D_{1}^{bi}}{R_{1}^{bi} D_{1}^{bi}+R_{2}^{bi} D_{2}^{bi}} \right|\right) \\[0.2cm] S_{\varphi}^{bi} (x,y) &= \exp\left(-\left| tan\left(\varphi_{R}^{bi}-\varphi_{D}^{bi}\right)\right|\right)\\[0.2cm] &= \exp\left(-\left| \frac{R^{bi} D_{12}^{bi}-R_{12}^{bi} D^{bi}}{R^{bi} D^{bi}+R_{12}^{bi} D_{12}^{bi}} \right|\right) \end{array}\right. \end{aligned} $$

where \(i = 1, \ldots, 5\), and \(C_1\) is a positive constant whose value is determined in Section 4.1.1.

The MS similarity matrix of the \(i\)th band, \(S_{Mi}\), is constructed as:

$$ S_{Mi}=S_{A}^{bi}\cdot S_{\theta}^{bi}\cdot S_{\varphi}^{bi} $$

where i=1,…,5.
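Per pixel and per band, Eqs. (8) and (9) reduce to a few arithmetic operations on the two monogenic triples, as in the sketch below. The direction and phase terms use the cross-product/dot-product forms of Eq. (8), which avoid computing any angles. When a denominator is exactly zero the text does not specify a value; the sketch assumes a score of 0 for a nonzero numerator (an infinite tangent) and 1 when both are zero. Names are hypothetical.

```python
import math


def ratio_similarity(num, den):
    """exp(-|num/den|), i.e. exp(-|tan(angle difference)|)."""
    if den == 0:
        # Assumed convention: infinite tangent scores 0; all-zero inputs score 1
        return 1.0 if num == 0 else 0.0
    return math.exp(-abs(num / den))


def ms_similarity(ref, dist, c1):
    """S_Mi = S_A * S_theta * S_phi at one pixel of one band, Eqs. (8)-(9).

    ref and dist are (f, f1, f2) monogenic triples; c1 is the constant C1.
    """
    r, r1, r2 = ref
    d, d1, d2 = dist
    a_r = math.sqrt(r * r + r1 * r1 + r2 * r2)
    a_d = math.sqrt(d * d + d1 * d1 + d2 * d2)
    s_a = (2 * a_r * a_d + c1) / (a_r ** 2 + a_d ** 2 + c1)
    s_theta = ratio_similarity(r1 * d2 - r2 * d1, r1 * d1 + r2 * d2)
    r12, d12 = math.hypot(r1, r2), math.hypot(d1, d2)
    s_phi = ratio_similarity(r * d12 - r12 * d, r * d + r12 * d12)
    return s_a * s_theta * s_phi
```

Identical triples give \(S_{Mi} = 1\); rotating the odd part of the response, for instance from \((0, 1, 0)\) to \((0, 1, 1)\), leaves the phase term at 1 but reduces both the amplitude and the direction terms.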

The weights \(w_i~(i=1,\ldots,5)\) of the five MS similarity matrices are set using the CSF curve, and their weighted sum gives the MS similarity matrix \(S_M\):

$$ S_{M} = \sum_{i=1}^{5}w_{i} S_{Mi} $$

Similar to previous studies [29, 53], the GM similarity is defined as:

$$ S_{G}(x,y)=\frac{2G_{R}(x,y) G_{D}(x,y)+C_{2}}{(G_{R}(x,y))^{2}+(G_{D}(x,y))^{2}+C_{3}} $$

where \(G_R(x, y)\) and \(G_D(x, y)\) are the gradient magnitudes of \(R\) and \(D\) at pixel \((x, y)\), respectively, and \(C_2\) and \(C_3\) are positive constants whose values are determined in Section 4.1.1.

Provided that \(C_2 \le C_3\), the value of \(S_G(x, y)\) lies in (0, 1]. The smaller the value, the more severe the GM distortion; when \(S_G(x, y) = 1\), \(R\) and \(D\) have no GM distortion at that pixel. \(C_3\) prevents Eq. (11) from becoming singular, and \(C_2\) and \(C_3\) play important roles in adjusting the contrast response in low-gradient regions.
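A minimal sketch of Eq. (11) follows. The text does not name the gradient operator; this sketch assumes the 3 × 3 Scharr kernels (the operator used in FSIM [29]) with replicated borders and a 1/16 normalization, all of which are assumptions. Names are hypothetical.

```python
import math

# Scharr kernels (assumed; the text does not specify the gradient operator)
SCHARR_X = [[3, 0, -3], [10, 0, -10], [3, 0, -3]]
SCHARR_Y = [[3, 10, 3], [0, 0, 0], [-3, -10, -3]]


def gradient_magnitude(img):
    """Per-pixel gradient magnitude with replicated borders, scaled by 1/16.

    The kernels are applied as a correlation; flipping them (convolution)
    only changes the sign of gx and gy, not the magnitude.
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    px = img[min(max(x + i - 1, 0), rows - 1)][min(max(y + j - 1, 0), cols - 1)]
                    gx += SCHARR_X[i][j] * px
                    gy += SCHARR_Y[i][j] * px
            out[x][y] = math.hypot(gx, gy) / 16.0
    return out


def gm_similarity(ref, dist, c2, c3):
    """Pixel-wise S_G of Eq. (11)."""
    g_r, g_d = gradient_magnitude(ref), gradient_magnitude(dist)
    return [[(2 * a * b + c2) / (a * a + b * b + c3) for a, b in zip(row_r, row_d)]
            for row_r, row_d in zip(g_r, g_d)]
```

With \(C_2 = C_3\), identical images give \(S_G = 1\) everywhere, and an edge present only in the reference lowers \(S_G\) along that edge; with \(C_2 > C_3\), identical images would give values slightly above 1.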

Then, \(S_M\) and \(S_G\) are combined to derive the local similarity \(S_L\) of \(R\) and \(D\), defined as:

$$ S_{L} = \left[S_{M} \right]^{\alpha} \cdot \left[S_{G}\right]^{\beta} $$

where \(\alpha\) and \(\beta\) adjust the relative importance of the MS and GM features. In this study, we set \(\alpha = \beta = 1\) for simplicity, which gives:

$$ S_{L} = S_{M} \cdot S_{G} $$
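Eqs. (10) and (12) combine per pixel as sketched below, using the CSF weights listed in Section 2.4. These weights sum to about 3.93 rather than 1, so for identical images \(S_M\) evaluates to that sum; the text does not say whether \(S_M\) is normalized, and the sketch follows the equations as written. Names are hypothetical.

```python
# CSF-derived band weights w_1..w_5 from Section 2.4
CSF_WEIGHTS = [0.3370, 0.8962, 0.9809, 0.9753, 0.7411]


def combine_similarities(s_m_bands, s_g, alpha=1.0, beta=1.0):
    """Per-pixel S_L = (sum_i w_i * S_Mi)**alpha * S_G**beta, Eqs. (10) and (12)."""
    if len(s_m_bands) != len(CSF_WEIGHTS):
        raise ValueError("expected one S_Mi value per frequency band")
    s_m = sum(w * s for w, s in zip(CSF_WEIGHTS, s_m_bands))
    return (s_m ** alpha) * (s_g ** beta)
```

With the default \(\alpha = \beta = 1\), this reduces to Eq. (13).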

Finally, the MPC of the reference image is used as the pooling function to obtain the RVSIM index:

$$ RVSIM=\frac{\sum_{(x,y) \in \Omega}S_{L}(x,y) \cdot MPC(x,y)}{\sum_{(x,y) \in \Omega}MPC(x,y)} $$

where \(\Omega\) denotes the whole spatial domain of the image.
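The pooling in Eq. (14) is an MPC-weighted average of the local similarity map. A minimal sketch over nested lists follows; the function name is hypothetical, and raising an error on an all-zero MPC map is an assumption, since the text does not cover that case.

```python
def rvsim_pool(s_l, mpc):
    """MPC-weighted average of the local similarity map S_L, Eq. (14)."""
    num = den = 0.0
    for row_s, row_w in zip(s_l, mpc):
        for s, w in zip(row_s, row_w):
            num += s * w
            den += w
    if den == 0:
        raise ValueError("MPC is zero everywhere; the index is undefined")
    return num / den
```

For example, `rvsim_pool([[1, 0], [0, 1]], [[3, 1], [0, 0]])` returns 0.75: only the first row carries weight, and the pixel with MPC = 3 dominates.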

4 Experimental results and discussion

To verify the performance of the proposed method, we apply the RVSIM index to five image databases, namely, LIVE [45], CSIQ [54], TID2008 [55], TID2013 [56], and the Waterloo Exploration database [57], for validation and comparison. The characteristics of these databases are summarized in Table 1.

Table 1 Comparison of five IQA databases

For the LIVE, CSIQ, TID2008, and TID2013 databases, the five-parameter nonlinear logistic regression function in Eq. (15) is used to fit the data [58]. Four indicators, namely, the Spearman rank-order correlation coefficient (SROCC), the Kendall rank-order correlation coefficient (KROCC), the Pearson linear correlation coefficient (PLCC), and the root mean square error (RMSE), are used to compare the performance of the indices objectively [59].

$$ f(z) = {{\beta_{1}}}{{\left[{\frac{1}{2}-\frac{1}{{1 + \exp({\beta_{2}}(z-{\beta_{3}}))}}}\right]}}+{{\beta_{4}}}z+{{\beta_{5}}} $$

where \(z\) is the objective IQA score, \(f(z)\) is the regressed score, and \(\beta_i~(i=1,\ldots,5)\) are the parameters of the regression function.
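The sketch below evaluates the logistic mapping of Eq. (15) and the four indicators. It assumes PLCC and RMSE are computed between the mapped scores and the subjective scores, and SROCC and KROCC on the raw scores (rank correlations are unchanged by a monotonic mapping). Fitting \(\beta_1, \ldots, \beta_5\) requires a nonlinear least-squares solver such as SciPy's `scipy.optimize.curve_fit`, which is omitted here; KROCC is implemented as Kendall's tau-a, a simplification that does not correct for ties. Names are hypothetical.

```python
import math


def logistic5(z, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of Eq. (15)."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (z - b3)))) + b4 * z + b5


def pearson(x, y):
    """PLCC between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def srocc(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def krocc(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2 * s / (n * (n - 1))


def rmse(pred, target):
    """Root mean square error between mapped and subjective scores."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))
```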

For the Waterloo Exploration database, the group MAximum Differentiation (gMAD) competition [60], which provides a stringent test by letting IQA models compete directly with each other, is carried out. gMAD automatically selects from the database a subset of image pairs that best differentiate the competing models; the outcomes give a competition ranking that reveals the relative performance of the IQA models.
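The pair-selection step of gMAD can be sketched as follows: the defender model's scores split the images into quality levels, and within each level the attacker model picks the images it rates best and worst, giving pairs that the defender considers similar but the attacker considers maximally different. This is a simplified illustration, not the official framework of [60]: it assumes equal-width levels, covers a single defender/attacker assignment, and omits the subjective tests that decide each pair. Names are hypothetical.

```python
def gmad_pairs(defender, attacker, n_levels=3):
    """Select one image pair per quality level (simplified gMAD selection).

    defender, attacker: dicts mapping image id -> score (higher = better).
    Images are binned into n_levels equal-width levels of defender score;
    in each level, the attacker's best- and worst-rated images form a pair.
    """
    lo, hi = min(defender.values()), max(defender.values())
    width = (hi - lo) / n_levels or 1.0  # avoid zero width if all scores tie
    levels = [[] for _ in range(n_levels)]
    for img, score in defender.items():
        idx = min(int((score - lo) / width), n_levels - 1)
        levels[idx].append(img)
    pairs = []
    for members in levels:
        if len(members) < 2:
            continue  # a single image cannot form a pair
        best = max(members, key=lambda m: attacker[m])
        worst = min(members, key=lambda m: attacker[m])
        pairs.append((best, worst))
    return pairs
```

If human observers judge a selected pair to differ clearly in quality, the attacker has exposed a weakness of the defender; if they see similar quality, the defender has resisted the attack.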

4.1 Determination of parameters

4.1.1 Determination of the constants C1, C2, and C3

Orthogonal experiments were conducted on the LIVE database, with SROCC as the assessment index, to determine the optimal values of the constants \(C_1\), \(C_2\), and \(C_3\). Two rounds of experiments were conducted to balance the complexity of the experiments against the precision of the parameters. Similar to the SSIM model [12], \([C_1, C_2, C_3] = [(K_1 L)^2, (K_2 L)^2, (K_3 L)^2]\), where \(L\) is the dynamic range of the pixel values; for 8-bit grayscale images, \(L = 2^8 - 1 = 255\).

Fig. 6 Determination of the optimal values of K1, K2, and K3. a K2=1.0 and K3=1.0, b K1=1.0 and K3=1.0, c K1=1.0 and K2=1.2, d K2=1.2 and K3=1.0, e K1=1.09 and K3=1.0, and f K1=1.09 and K2=1.16

1. First round: In the first step, K2=1.0 and K3=1.0 are fixed, and the RVSIM index is applied to the LIVE database for different values of K1, giving the K1–SROCC curve. As shown in Fig. 6a, SROCC reaches its maximum when K1=1.0. In the second step, K1=1.0 and K3=1.0 are fixed while K2 is varied, giving the K2–SROCC curve. As shown in Fig. 6b, SROCC reaches its maximum when K2=1.2. In the third step, K1=1.0 and K2=1.2 are fixed while K3 is varied, giving the K3–SROCC curve. As shown in Fig. 6c, SROCC reaches its maximum when K3=1.0. At the end of the first round, the parameters are K1=1.0, K2=1.2, and K3=1.0.

2. Second round: Starting from the parameters obtained in the first round, the same procedure is repeated, giving the results shown in Fig. 6d–f. At the end of the second round, the finalized parameters are K1=1.09, K2=1.16, and K3=1.00.
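The two-round procedure above is a one-parameter-at-a-time (coordinate-wise) search. The sketch below illustrates it with a caller-supplied objective standing in for "SROCC of RVSIM on LIVE", and converts the K values into the constants via \(C_j = (K_j L)^2\). The grid, the objective, and the function names are hypothetical.

```python
def constants_from_k(k1, k2, k3, bit_depth=8):
    """C_j = (K_j * L)**2 with L = 2**bit_depth - 1."""
    L = 2 ** bit_depth - 1
    return (k1 * L) ** 2, (k2 * L) ** 2, (k3 * L) ** 2


def coordinate_search(objective, start, grid, rounds=2):
    """Vary one parameter at a time over grid, keeping the best; repeat.

    objective maps a parameter list to a score to maximize (e.g., SROCC).
    """
    params = list(start)
    for _ in range(rounds):
        for idx in range(len(params)):
            best_val, best_score = params[idx], objective(params)
            for candidate in grid:
                trial = params[:idx] + [candidate] + params[idx + 1:]
                score = objective(trial)
                if score > best_score:
                    best_val, best_score = candidate, score
            params[idx] = best_val  # fix this parameter before moving on
    return params
```

With the finalized values K1 = 1.09, K2 = 1.16, and K3 = 1.00, `constants_from_k` gives C1 ≈ 77256, C2 ≈ 87498, and C3 = 65025.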

4.1.2 Determination of the Log-Gabor filter bank

As described in Section 2.2, the finalized splicing parameters of the Log-Gabor filter bank are \(n_r = 5\) scales and \(n_\theta = 1\) orientation. To justify this choice, Table 2 lists the SROCC/KROCC/PLCC/RMSE values obtained by applying the RVSIM index to the LIVE, CSIQ, TID2008, and TID2013 databases with different splicing parameters. The top performance is highlighted in bold. Table 2 shows that RVSIM performs best when \(n_r = 5\) and \(n_\theta = 1\).

Table 2 SROCC/KROCC/PLCC/RMSE values comparison with different splicing parameters on four benchmark databases

4.2 Two sample examples

To determine whether the proposed RVSIM method agrees with human judgment, two sample images from the LIVE database [45] (Fig. 7a, g, which are the same as Fig. 2f, l) are taken as examples. For these two reference images, we select from the LIVE database five noise-distorted images and five blur-distorted images, respectively, with different degrees of distortion.

Fig. 7 Two groups of images and their corresponding subjective/objective scores. a–f The original image monarch and five noise-distorted images. g–l The original image sailing2 and five blur-distorted images

As shown in Fig. 7, image quality degrades as blur or noise increases from left to right. The LIVE database provides the difference mean opinion score (DMOS) for each image; a smaller DMOS indicates higher quality. The objective scores of these images, calculated using the RVSIM method, are also given in Fig. 7.

Figure 7 shows that the RVSIM scores are consistent with DMOS, indicating that RVSIM agrees with the subjective perception of the HVS and indicates image quality well.

4.3 Performance comparison

Table 3 lists the performance of RVSIM and 11 other state-of-the-art IQA methods (PSNR, SSIM [12], GSSIM [23], MS-SSIM [21], IW-SSIM [26], FSIM [29], RFSIM [30], VSI [61], SCQI [13], MDSI [62], and SRSIM [63]) on the LIVE, CSIQ, TID2008, and TID2013 databases. The top three performances for each indicator are highlighted in bold. The MATLAB source code of all methods except GSSIM was obtained from the authors. Compared with traditional methods such as PSNR, SSIM, GSSIM, and MS-SSIM, RVSIM performs well on the LIVE and CSIQ databases. Because the orthogonal experiments for parameter tuning were conducted only on the LIVE database and not on TID2008 or TID2013, RVSIM performs slightly worse than the best methods on those two databases.

Table 3 Performance comparison of IQA methods on four benchmark databases

Figure 8 shows scatter plots of the subjective DMOS versus the quality/distortion scores predicted by the PSNR, SSIM, MS-SSIM, IW-SSIM, FSIM, SCQI, MDSI, RFSIM, and RVSIM indices on the LIVE database. The RVSIM points are evenly distributed across the plot and show a strong linear relationship with DMOS, indicating that the RVSIM model is highly consistent with the HVS.

Fig. 8 Scatter plots of predicted image quality indices on the LIVE database. a PSNR, b SSIM, c MS-SSIM, d IW-SSIM, e FSIM, f SCQI, g MDSI, h RFSIM, and i RVSIM

Experiments on these four databases (LIVE, CSIQ, TID2008, and TID2013) alone are not sufficient to evaluate the method. This study therefore also conducted the gMAD competition on the Waterloo Exploration database to test the performance of RVSIM objectively and fairly.

Figure 9 shows the competition rankings on the Waterloo Exploration database. In the gMAD competition, the official framework [60] provides the rankings of 16 state-of-the-art methods, and a new algorithm can only be ranked against these 16 methods. The algorithms added in Fig. 9a–f are RVSIM, SRSIM, RFSIM, VSI, MDSI, and SCQI, respectively. Notably, RVSIM ranks first overall. In particular, RVSIM performs consistently well in terms of aggressiveness, which supports its robustness as an IQA method.

Fig. 9 gMAD competition. a RVSIM, b SRSIM, c RFSIM, d VSI, e MDSI, and f SCQI

4.4 Discussion

In Table 3, the six methods most frequently highlighted in bold are MDSI (16 times), SCQI (12 times), VSI (9 times), SRSIM (4 times), FSIM (3 times), and RVSIM (3 times). In Fig. 9, the top six methods of the gMAD competition are RVSIM, SRSIM, MS-SSIM, VSI, MDSI, and RFSIM. The method rank statistics summarized from Table 3 and Fig. 9 are shown in Table 4, in which the proposed RVSIM is highlighted in bold.

Table 4 Summary of the method rank statistics on five databases LIVE, CSIQ, TID2008, TID2013, and Waterloo Exploration

Table 4 shows that the rankings by performance indicators on the LIVE, CSIQ, TID2008, and TID2013 databases and by gMAD competition on the Waterloo Exploration database do not fully agree. MDSI ranks first in indicator performance but fifth in the gMAD competition. SCQI ranks second in indicator performance but performs poorly in the gMAD competition. VSI ranks third in indicator performance but fourth in the gMAD competition. SRSIM ranks fourth in indicator performance but second in the gMAD competition. Although RVSIM, SRSIM, and MS-SSIM are not at the top in indicator performance, they perform well in the gMAD competition, and RVSIM ranks highest.

Which results should be relied on? The performance indices and the gMAD competition ranking are two different bases for judgment. The performance indices objectively reflect the performance of a method, but the benchmark databases contain only a limited number of images because subjective scoring is time-consuming and laborious. gMAD competitions are performed between methods, and the resulting ranking objectively reflects the relative performance of the IQA models. However, subjective testing is still required, because the Waterloo Exploration database is so large that its creators did not provide DMOS values for its images in advance. In other words, each basis has its own merits and limitations. A method that performs well on both the performance indices and the gMAD competition ranking can therefore be regarded as better and more objectively validated. From this point of view, RVSIM exhibits more consistent and stable performance than the other methods.
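In most IQA studies, the performance indices discussed above are correlations between objective scores and subjective DMOS, such as the Spearman rank-order (SROCC) and Pearson linear (PLCC) correlation coefficients. The sketch below assumes these two are among the indices behind Table 3; the exact set is not restated in this section. It uses only the standard library, and the function names and four-image example are hypothetical. Following the VQEG procedure [59], PLCC is usually computed after fitting a nonlinear logistic mapping to the objective scores. That fitting step is omitted here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def srocc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


# Made-up example: a similarity index falls as DMOS (perceived degradation)
# rises, so the correlation is negative; papers usually report its magnitude.
scores = [0.98, 0.91, 0.85, 0.70]
dmos = [5.0, 20.0, 35.0, 60.0]
print(srocc(scores, dmos))  # -1.0
```

SROCC depends only on the ordering of the scores, so no monotonic mapping affects it. PLCC, by contrast, measures linear agreement and is therefore sensitive to the logistic fit.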

5 Conclusion

This study proposes an FR IQA method called RVSIM, which combines the Riesz transform with visual contrast sensitivity. RVSIM exploits monogenic signal (MS) theory and the Log-Gabor filter, using the contrast sensitivity function (CSF) to weight the different frequency bands. In addition, gradient magnitude (GM) similarity is introduced to obtain a gradient similarity matrix. Finally, the monogenic phase congruency (MPC) matrix is used to construct the pooling function that yields the RVSIM index.
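The summarized pipeline can be illustrated with a simplified NumPy sketch. This is not the RVSIM implementation; the authors' MATLAB code is the reference. Each Log-Gabor band is Riesz-transformed in the frequency domain to obtain the monogenic local amplitude, phase, and orientation. Amplitude similarities are then combined across bands using weights from the Mannos-Sakrison CSF [34] and multiplied by a gradient magnitude similarity. The following are all hypothetical choices: the wavelengths, the bandwidth ratio `sigma_on_f`, the viewing-geometry constant `pixels_per_degree`, the stabilizing constants, the use of `np.gradient` for the gradient, whether RVSIM uses exactly this CSF form, and the use of the reference band amplitudes as pooling weights in place of the MPC map.

```python
import numpy as np


def log_gabor_radial(shape, wavelength, sigma_on_f=0.55):
    """Radial Log-Gabor transfer function sampled on the FFT frequency grid."""
    rows, cols = shape
    U, V = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.hypot(U, V)
    radius[0, 0] = 1.0  # placeholder so log() and the Riesz kernels are defined at DC
    lg = np.exp(-np.log(radius * wavelength) ** 2 / (2.0 * np.log(sigma_on_f) ** 2))
    lg[0, 0] = 0.0  # a Log-Gabor filter has no DC response
    return lg, U, V, radius


def monogenic(img, wavelength):
    """Local amplitude, phase and orientation of one Log-Gabor band (Riesz transform)."""
    lg, U, V, radius = log_gabor_radial(img.shape, wavelength)
    band = np.fft.fft2(img) * lg
    even = np.real(np.fft.ifft2(band))
    # First-order Riesz kernels -i*u/|w| and -i*v/|w|; sign conventions vary in the literature
    odd1 = np.real(np.fft.ifft2(band * (-1j * U / radius)))
    odd2 = np.real(np.fft.ifft2(band * (-1j * V / radius)))
    odd = np.hypot(odd1, odd2)
    amplitude = np.hypot(even, odd)
    phase = np.arctan2(odd, even)         # local phase in [0, pi]
    orientation = np.arctan2(odd2, odd1)  # local orientation
    return amplitude, phase, orientation


def mannos_sakrison_csf(f_cpd):
    """Mannos-Sakrison contrast sensitivity at spatial frequency f (cycles/degree)."""
    return 2.6 * (0.0192 + 0.114 * f_cpd) * np.exp(-(0.114 * f_cpd) ** 1.1)


def similarity(x, y, c):
    """(2xy + c) / (x^2 + y^2 + c): 1 where x == y, smaller where they differ."""
    return (2.0 * x * y + c) / (x ** 2 + y ** 2 + c)


def gradient_magnitude(img):
    gy, gx = np.gradient(img)  # central differences; RVSIM's own operator may differ
    return np.hypot(gx, gy)


def toy_index(ref, dist, wavelengths=(3.0, 6.0, 12.0, 24.0),
              pixels_per_degree=32.0, c_amp=1e-3, c_gm=1e-3):
    """Hypothetical CSF-weighted monogenic-amplitude + gradient similarity index."""
    ref = np.asarray(ref, dtype=float)
    dist = np.asarray(dist, dtype=float)
    # CSF weight of each band at its centre frequency, converted to cycles/degree
    w = np.array([mannos_sakrison_csf(pixels_per_degree / lam) for lam in wavelengths])
    w = w / w.sum()
    s_mono = np.zeros_like(ref)
    pool_weight = np.zeros_like(ref)
    for w_band, lam in zip(w, wavelengths):
        amp_r, _, _ = monogenic(ref, lam)
        amp_d, _, _ = monogenic(dist, lam)
        s_mono += w_band * similarity(amp_r, amp_d, c_amp)
        pool_weight += amp_r  # stand-in for the MPC weighting map, NOT phase congruency
    s_gm = similarity(gradient_magnitude(ref), gradient_magnitude(dist), c_gm)
    local_quality = s_mono * s_gm
    return float(np.sum(local_quality * pool_weight) / np.sum(pool_weight))


rng = np.random.default_rng(0)
reference = rng.random((64, 64))
distorted = reference + 0.1 * rng.standard_normal((64, 64))
print(toy_index(reference, reference))  # ~1.0 (identical images)
print(toy_index(reference, distorted))  # below 1.0
```

The similarity term never exceeds 1, so identical images score 1.0 up to rounding, and added noise lowers the score. In RVSIM itself, the local phase and orientation similarities also enter the monogenic similarity matrix, and the pooling weight is the MPC map computed from the reference image.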

This study evaluates the RVSIM index on five benchmark IQA databases. The performance indicators show that RVSIM delivers highly competitive prediction accuracy on the LIVE and CSIQ databases. The scatter plot of subjective DMOS versus RVSIM-predicted scores on the LIVE database suggests that RVSIM is highly consistent with the human visual system (HVS). The gMAD competition ranking on the Waterloo Exploration database indicates that RVSIM outperforms the other state-of-the-art IQA methods compared. The overall performance on all five databases demonstrates that RVSIM is a robust IQA method.


  1. C Yan, H Xie, D Yang, J Yin, Y Zhang, Q Dai, Supervised hash coding with deep neural network for environment perception of intelligent vehicles. IEEE Trans. Intell. Transp. Syst. PP(99), 1–12 (2017).

  2. C Yan, Y Zhang, J Xu, F Dai, L Li, Q Dai, F Wu, A highly parallel framework for HEVC coding unit partitioning tree decision on many-core processors. IEEE Sig. Process Lett. 21(5), 573–576 (2014).

  3. C Yan, Y Zhang, J Xu, F Dai, J Zhang, Q Dai, F Wu, Efficient parallel framework for HEVC motion estimation on many-core processors. IEEE Trans. Circ. Syst. Video Technol. 24(12), 2077–89 (2014).

  4. Z Wang, EP Simoncelli, in Human Vision and Electronic Imaging, 5666. Reduced-reference image quality assessment using a wavelet-domain natural image statistic model (Proceedings of SPIE, San Jose, 2005), pp. 149–59.

  5. C Yan, H Xie, S Liu, J Yin, Y Zhang, Q Dai, Effective Uyghur language text detection in complex background images for traffic prompt identification. IEEE Trans. Intell. Transp. Syst. PP(99), 1–10 (2017).

  6. G Xia, J Delon, Y Gousseau, Accurate junction detection and characterization in natural images. Int. J. Comput. Vis. 106(1), 31–56 (2014).

  7. K Gu, G Zhai, X Yang, W Zhang, M Liu, in 2013 IEEE International Conference on Image Processing. Subjective and objective quality assessment for images with contrast change (IEEE, Melbourne, 2013), pp. 383–87.

  8. J Ma, J Zhao, J Tian, AL Yuille, Z Tu, Robust point matching via vector field consensus. IEEE Trans. Image Process. 23(4), 1706–21 (2014).

  9. W Zhang, A Borji, Z Wang, P Le Callet, H Liu, The application of visual saliency models in objective image quality assessment: a statistical evaluation. IEEE Trans. Neural Netw. Learn. Syst. 27(6), 1266–78 (2016).

  10. W Lin, C-CJ Kuo, Perceptual visual quality metrics: a survey. J. Vis. Commun. Image Represent. 22(4), 297–312 (2011).

  11. Z Wang, AC Bovik, L Lu, in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference On, vol. 4. Why is image quality assessment so difficult? (IEEE, Orlando, 2002), p. 3313.

  12. Z Wang, AC Bovik, HR Sheikh, EP Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–12 (2004).

  13. S-H Bae, M Kim, A novel image quality assessment with globally and locally consilient visual quality perception. IEEE Trans. Image Process. 25(5), 2392–2406 (2016).

  14. K Gu, S Wang, H Yang, W Lin, G Zhai, X Yang, W Zhang, Saliency-guided quality assessment of screen content images. IEEE Trans. Multimed. 18(6), 1098–110 (2016).

  15. A Rehman, Z Wang, Reduced-reference image quality assessment by structural similarity estimation. IEEE Trans. Image Process. 21(8), 3378–89 (2012).

  16. J Farah, M-R Hojeij, J Chrabieh, F Dufaux, in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Full-reference and reduced-reference quality metrics based on SIFT (IEEE, Florence, 2014), pp. 161–165.

  17. S Xu, S Jiang, W Min, No-reference/blind image quality assessment: a survey. IETE Tech. Rev. 34(3), 223–45 (2017).

  18. Z Wang, AC Bovik, Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Proc. Mag. 26(1), 98–117 (2009).

  19. Z Wang, Applications of objective image quality assessment methods [applications corner]. IEEE Signal Proc. Mag. 28(6), 137–42 (2011).

  20. Z Wang, AC Bovik, A universal image quality index. IEEE Sig. Process Lett. 9(3), 81–84 (2002).

  21. Z Wang, EP Simoncelli, AC Bovik, in Signals, Systems and Computers, 2003. Conference Record of the Thirty-Seventh Asilomar Conference On, vol. 2. Multiscale structural similarity for image quality assessment (IEEE, Pacific Grove, 2003), pp. 1398–402.

  22. G-H Chen, C-L Yang, L-M Po, S-L Xie, in Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference On, vol. 2. Edge-based structural similarity for image quality assessment (IEEE, Toulouse, 2006).

  23. G-H Chen, C-L Yang, S-L Xie, in Image Processing, 2006 IEEE International Conference On. Gradient-based structural similarity for image quality assessment (IEEE, Atlanta, 2006), pp. 2929–32.

  24. C Li, AC Bovik, in IS&T/SPIE Electronic Imaging. Three-component weighted structural similarity index (International Society for Optics and Photonics, San Jose, 2009), p. 72420.

  25. MP Sampat, Z Wang, S Gupta, AC Bovik, MK Markey, Complex wavelet structural similarity: a new image similarity index. IEEE Trans. Image Process. 18(11), 2385–401 (2009).

  26. Z Wang, Q Li, Information content weighting for perceptual image quality assessment. IEEE Trans. Image Process. 20(5), 1185–98 (2011).

  27. HR Sheikh, AC Bovik, G De Veciana, An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Trans. Image Process. 14(12), 2117–28 (2005).

  28. HR Sheikh, AC Bovik, Image information and visual quality. IEEE Trans. Image Process. 15(2), 430–44 (2006).

  29. L Zhang, L Zhang, X Mou, D Zhang, FSIM: a feature similarity index for image quality assessment. IEEE Trans. Image Process. 20(8), 2378–86 (2011).

  30. L Zhang, L Zhang, X Mou, in Image Processing (ICIP), 2010 17th IEEE International Conference On. RFSIM: a feature based image quality assessment metric using Riesz transforms (IEEE, Hong Kong, 2010), pp. 321–324.

  31. X-G Luo, H-J Wang, S Wang, Monogenic signal theory based feature similarity index for image quality assessment. AEU-International J. Electron. Commun. 69(1), 75–81 (2015).

  32. P Cerejeiras, U Kähler, Monogenic signal theory. Oper. Theory (Springer, Basel, 2015).

  33. DJ Field, Relations between the statistics of natural images and the response properties of cortical cells. JOSA A. 4(12), 2379–94 (1987).

  34. J Mannos, D Sakrison, The effects of a visual fidelity criterion of the encoding of images. IEEE Trans. Inf. Theory. 20(4), 525–36 (1974).

  35. M Felsberg, G Sommer, The monogenic scale-space: a unifying approach to phase-based image processing in scale-space. J. Math. Imaging Vis. 21(1), 5–26 (2004).

  36. C Zhao, J Wan, L Ren, Image feature extraction based on the two-dimensional empirical mode decomposition. Image Sig. Process Congr. 1, 627–31 (2008).

  37. M Felsberg, G Sommer, The monogenic signal. IEEE Trans. Sig. Process. 49(12), 3136–44 (2001).

  38. K Langley, SJ Anderson, The Riesz transform and simultaneous representations of phase, energy and orientation in spatial vision. Vis. Res. 50(17), 1748–65 (2010).

  39. C Wachinger, T Klein, N Navab, The 2D analytic signal for envelope detection and feature extraction on ultrasound images. Med. Image Anal. 16(6), 1073–84 (2012).

  40. L Wietzke, G Sommer, O Fleischmann, in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference On. The geometry of 2D image signals (IEEE, Miami, 2009), pp. 1690–7.

  41. D Boukerroui, JA Noble, M Brady, On the choice of band-pass quadrature filters. J. Math. Imaging Vis. 21(1-2), 53–80 (2004).

  42. JR Movellan, Tutorial on Gabor filters. Open Source Document (2002). Accessed 24 July 2017.

  43. S Fischer, R Redondo, G Cristóbal, How to construct Log-Gabor filters. Open Access Digit. CSIC Document. 21, 1–9 (2009).

  44. P Kovesi, What are Log-Gabor filters and why are they good? (2006). Accessed 24 July 2017.

  45. HR Sheikh, Z Wang, L Cormack, AC Bovik, LIVE image quality assessment database release 2 (2005). [Online]. Available:

  46. P Kovesi, Phase congruency: a low-level image invariant. Psychol. Res. 64(2), 136–48 (2000).

  47. P Kovesi, Image features from phase congruency. Videre: J. Comput. Vis. Res. 1(3), 1–26 (1999).

  48. P Kovesi, in The Australian Pattern Recognition Society Conference: DICTA 2003. Phase congruency detects corners and edges (The University of Queensland, Sydney, 2003).

  49. MN Gibbs, DJ MacKay, Variational Gaussian process classifiers. IEEE Trans. Neural Netw. 11(6), 1458–64 (2000).

  50. Z Wang, AC Bovik, Modern image quality assessment. Synth. Lect. Image, Video Multimed. Process. 2(1), 1–156 (2006).

  51. X Gao, W Lu, D Tao, X Li, in Visual Communications and Image Processing 2010. Image quality assessment and human visual system (International Society for Optics and Photonics, Huangshan, 2010), p. 77440.

  52. DM Chandler, Seven challenges in image quality assessment: past, present, and future research. ISRN Sig. Process. 2013, 1–53 (2013).

  53. K Gu, G Zhai, X Yang, W Zhang, in 2014 IEEE International Conference on Image Processing (ICIP). An efficient color image quality metric with local-tuned-global model (IEEE, Paris, 2014), pp. 506–510.

  54. EC Larson, DM Chandler, Most apparent distortion: full-reference image quality assessment and the role of strategy. J. Electron. Imaging. 19(1), 011006 (2010).

  55. N Ponomarenko, V Lukin, A Zelensky, K Egiazarian, M Carli, F Battisti, TID2008 – a database for evaluation of full-reference visual quality assessment metrics. Adv. Mod. Radioelectron. 10(4), 30–45 (2009).

  56. N Ponomarenko, L Jin, O Ieremeiev, V Lukin, K Egiazarian, J Astola, B Vozel, K Chehdi, M Carli, F Battisti, et al., Image database TID2013: peculiarities, results and perspectives. Signal Process. Image Commun. 30, 57–77 (2015).

  57. K Ma, Z Duanmu, Q Wu, Z Wang, H Yong, H Li, L Zhang, Waterloo exploration database: New challenges for image quality assessment models. IEEE Trans. Image Process. 26(2), 1004–1016 (2017).

  58. HR Sheikh, MF Sabir, AC Bovik, A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 15(11), 3440–51 (2006).

  59. P Corriveau, A Webster, Final report from the video quality experts group on the validation of objective models of video quality assessment, phase II, Video Quality Experts Group, CO, USA, Tech. Rep. Phase II (2003).

  60. K Ma, Q Wu, Z Wang, Z Duanmu, H Yong, H Li, L Zhang, Group MAD competition – a new methodology to compare objective image quality models.

  61. L Zhang, Y Shen, H Li, VSI: a visual saliency-induced index for perceptual image quality assessment. IEEE Trans. Image Process. 23(10), 4270–81 (2014).

  62. HZ Nafchi, A Shahkolaei, R Hedjam, M Cheriet, Mean deviation similarity index: efficient and reliable full-reference image quality evaluator. IEEE Access. 4, 5579–90 (2016).

  63. L Zhang, H Li, in Image Processing (ICIP), 2012 19th IEEE International Conference On. SR-SIM: a fast and high performance IQA index based on spectral residual (IEEE, Orlando, 2012), pp. 1473–76.

The authors would like to thank Jiahua Cao and Associate Professor Weizheng Jin for the valuable opinions they had offered during our heated discussions.


This study is partially supported by National Natural Science Foundation of China (NSFC) (No. 61571334) and National High Technology Research and Development Program (863 Program) (No. 2014AA09A512).

Author information



GY conducted the experiments and drafted the manuscript. FL and YL implemented the core algorithm and performed the statistical analysis. DL designed the methodology. WY modified the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guangyi Yang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional information

Availability of data and materials

The MATLAB source code of RVSIM can be downloaded at for public use and evaluation. The program may be freely modified and used, provided that its original source is cited.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Yang, G., Li, D., Lu, F. et al. RVSIM: a feature similarity method for full-reference image quality assessment. J Image Video Proc. 2018, 6 (2018).
