RVSIM: a feature similarity method for full-reference image quality assessment

Yang, Guangyi; Li, Deshi; Lu, Fan; Liao, Yue; Yang, Wen

doi:10.1186/s13640-018-0246-1

Research
Open access
Published: 19 January 2018

Guangyi Yang ORCID: orcid.org/0000-0002-1580-0188¹,
Deshi Li¹,
Fan Lu¹,
Yue Liao¹ &
…
Wen Yang¹

EURASIP Journal on Image and Video Processing volume 2018, Article number: 6 (2018)

Abstract

Image quality assessment is an important topic in the field of digital image processing. In this study, a full-reference image quality assessment method called Riesz transform and Visual contrast sensitivity-based feature SIMilarity index (RVSIM) is proposed. More precisely, a Log-Gabor filter is first used to decompose the reference and distorted images, and the Riesz transform is applied to the decomposed images on the basis of monogenic signal theory. Then, the monogenic signal similarity matrix is obtained by calculating the similarity of the local amplitude, phase, and direction characteristics of the monogenic signal. Next, these similarities are combined in a sum weighted by visual contrast sensitivity. Since the first-order Riesz transform cannot clearly express the corners and intersection points in the image, we calculate the gradient magnitude similarity between the reference and distorted images as a feature, which is combined with the monogenic signal similarity to obtain a local quality map. Finally, we compute the monogenic phase congruency from the Riesz transform feature matrix of the reference image and use it as a weighting function to derive the similarity index. Extensive experiments on five benchmark IQA databases, namely, LIVE, CSIQ, TID2008, TID2013, and Waterloo Exploration, indicate that RVSIM is a robust IQA method.

1 Introduction

Digital images are an essential medium for expressing and communicating information. Digital imaging has been applied in many fields, but image quality is inevitably degraded during image collection, compression [1–3], transmission [4], processing [5], and reconstruction [6, 7]. Accurately assessing image quality is also challenging [8]. As such, image quality assessment (IQA) has been extensively investigated [9–11].

IQA can be divided into full-reference (FR), reduced-reference (RR), and no-reference (NR) assessments [12] based on the availability of reference images. FR IQA methods take the original image as the reference image and are mainly used to assess the similarity and fidelity between a distorted image and the original undistorted image [13, 14]. RR IQA methods are practical when only some extracted features, rather than the whole original image, are accessible [15]; these features are used to give a reasonable estimate of the distorted image’s quality [16]. In some practical applications, no reference image is available for comparison, so NR IQA methods are needed [17]. This study focuses on FR IQA methods.

MSE and PSNR are widely used FR IQA methods. In these methods, image quality is assessed by calculating the overall pixel error, and the average error is used as the final assessment result. These methods offer simple calculation and easy implementation. However, because the model is so simple, it captures the image content only superficially: the absolute error between the pixels of two images is calculated, but the correlation between pixels and the perceptual characteristics of the human visual system (HVS) are disregarded, and low-level features such as edge information are not described. As a result, the assessed results can be seriously inconsistent with HVS perception and with the actual image quality [18, 19].

Many representative assessment methods have been proposed to adapt to human visual characteristics. Wang et al. [12] established the Structural SIMilarity (SSIM) model, the most widely used representative, on the basis of the universal image quality index (UQI) [20]. SSIM uses the structural information of images to assess quality. Experiments show that SSIM is more appropriate than previous assessment methods. Although SSIM improves the congruency between assessment results and HVS perception, the structural features it uses remain scalar, which causes SSIM to lose its validity when images are highly blurred. Numerous methods, such as MS-SSIM [21], ESSIM [22], GSSIM [23], 3-SSIM [24], CW-SSIM [25], and IW-SSIM [26], have been developed on the basis of SSIM, and these methods improve the assessment results to a certain degree. Sheikh et al. [27, 28] also developed methods, such as IFC and VIF, based on natural scene statistics (NSS) to introduce the concept of information fidelity. Zhang et al. [29] proposed a Feature SIMilarity (FSIM) method that introduces phase congruency (PC) and gradient magnitude (GM) similarity as assessment features.

Further research indicates that a natural image, as a highly structured two-dimensional signal, has vector characteristics. The pixels of an image show strong dependencies, which constitute the structure of the two-dimensional image, and the main function of the HVS is to obtain structural information from the field of view. Because the Riesz transform performs well in multidimensional signal processing, Zhang et al. [30] constructed similarity matrices from the feature maps of first- and second-order Riesz transforms and used edge features as the pooling function to derive the RFSIM index. Luo et al. [31] introduced monogenic phase congruency (MPC) based on PC and proposed the RMFSIM method. These methods assess the vector characteristics of two-dimensional images more efficiently. However, they simply apply the Riesz transform to construct local features and only partially consider the physical meaning of monogenic signal (MS) theory. Moreover, their assessment factors mainly describe high-frequency information, such as edge features, so the complexity of the HVS has yet to be fully represented. Hence, there is still much room for improvement.

In this study, an FR IQA method called Riesz transform and Visual contrast sensitivity-based feature SIMilarity index (RVSIM) is proposed by combining the Riesz transform with visual contrast sensitivity. The Log-Gabor filter and the contrast sensitivity function (CSF) are both well known. However, to the best of our knowledge, we are the first to combine the frequency characteristics of the Log-Gabor filter with the frequency-sensitive features of the HVS, so that the objective and subjective evaluation results are as consistent as possible. In addition, although the Riesz transform performs well in multidimensional signal processing, the first-order Riesz transform cannot clearly express the corners and intersection points in an image. The proposed RVSIM method therefore introduces the GM similarity, which improves assessment performance. In general, RVSIM takes full advantage of the MS theory [32] and Log-Gabor filter [33] by exploiting visual CSF [34] to allocate the weights of different frequency bands. The similarity matrix is obtained by introducing GM, and the MPC map is utilized as a pooling function to derive the final IQA score. Two groups of experiments were carried out on two kinds of databases. The first kind comprises the LIVE, CSIQ, TID2008, and TID2013 databases, on which performance is assessed by calculating absolute indicators of the method. The second kind is the Waterloo Exploration database, on which performance is assessed through the competitive ranking among methods. The experimental results demonstrate that the proposed RVSIM method is a robust IQA method.

Notably, RVSIM is different from RFSIM [30] and RMFSIM [31] in four aspects. First, RVSIM employs Log-Gabor band-pass filters on the reference and distorted images to obtain the components of images in different frequency bands. Second, RVSIM does not directly use the Riesz transform to determine the feature matrix. Instead, RVSIM utilizes the analytic space obtained by Riesz transform, including local amplitude, phase, and direction, which constitute a complete orthogonal basis [35], and subsequently calculates local feature similarities. Third, RVSIM applies the characteristics of HVS to assign different weights to various frequency bands. In this manner, the RVSIM model has appropriate congruency with the perceptive characteristics of the HVS. Fourth, RVSIM introduces the GM similarity and demonstrates that the first-order Riesz transform cannot clearly express the corners and intersection points in images.

The remainder of this paper is organized as follows. Section 2 presents the MS theory, Log-Gabor filter, MPC, and visual contrast sensitivity, and gives the design ideas and calculation process for their specific application in this study. Section 3 introduces the structure of the proposed IQA method and describes how MS, CSF, GM, and MPC are combined to derive the RVSIM index. Section 4 presents the experimental results. Section 5 draws the conclusion.

2 Related works

2.1 Riesz transform

In one-dimensional signal processing, the Hilbert transform has proven effective. However, attempts to extend it to two-dimensional images, including the local Hilbert transform, the overall Hilbert transform, and the local and global Hilbert transform [36], have all failed because of a common flaw: they are not isotropic [37]. The Riesz transform extends the Hilbert transform to higher-dimensional Euclidean space, which makes it suitable for image processing applications [38, 39].

Figure 1 shows that the Riesz transform space is a spherical coordinate system in a 3D Euclidean space. R, R₁, and R₂ are the projections of a point in the spherical coordinate system onto the three axes [40]. In this space, the local amplitude A, the local direction θ, and the local phase φ can be expressed as:

$$ \begin{aligned} \left\{\begin{array}{lll} A_{R}(x,y) &= \sqrt{R(x,y)^{2}+R_{1}(x,y)^{2}+R_{2}(x,y)^{2}} \\[0.2cm] \theta_{R}(x,y) &= \tan^{-1}{(-R_{2} (x,y)/R_{1} (x,y))} \\[0.2cm] \varphi_{R} (x,y) &= \tan^{-1} {(R_{12}(x,y)/R(x,y))} \end{array}\right. \end{aligned} $$

(1)

where $R_{12}(x, y) = \sqrt {R_{1}(x, y)^{2} + R_{2}(x, y)^{2}}, \theta _{R}(x, y) \in [0, \pi), \varphi _{R}(x,y) \in [0, \pi)$.
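Eq. (1) can be illustrated with a minimal NumPy sketch. The frequency-domain Riesz kernels and their sign convention below are a common textbook choice and are an assumption here, not taken from this paper; the function names `riesz_transform` and `local_features` are hypothetical.

```python
import numpy as np

def riesz_transform(img):
    """Return (R1, R2), first-order Riesz components of a 2-D real array.

    Assumed convention: H1 = -j*u/|w|, H2 = -j*v/|w| in the FFT domain.
    """
    rows, cols = img.shape
    # Frequency grids (cycles per sample) in the layout used by np.fft.fft2.
    u = np.fft.fftfreq(cols)[np.newaxis, :]
    v = np.fft.fftfreq(rows)[:, np.newaxis]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0  # avoid 0/0 at DC; the kernel numerators are 0 there
    spectrum = np.fft.fft2(img)
    r1 = np.real(np.fft.ifft2(spectrum * (-1j * u / radius)))
    r2 = np.real(np.fft.ifft2(spectrum * (-1j * v / radius)))
    return r1, r2

def local_features(r, r1, r2):
    """Local amplitude, direction, and phase as written in Eq. (1)."""
    amplitude = np.sqrt(r ** 2 + r1 ** 2 + r2 ** 2)
    # atan(-R2/R1), folded into [0, pi] (up to floating-point rounding).
    theta = np.mod(np.arctan2(-r2, r1), np.pi)
    r12 = np.sqrt(r1 ** 2 + r2 ** 2)
    # atan(R12/R); arctan2 with a non-negative first argument lies in [0, pi].
    phase = np.arctan2(r12, r)
    return amplitude, theta, phase
```

For a pure horizontal cosine, the first Riesz component under this convention is the matching sine, so the local amplitude is constant, which gives a quick sanity check of the implementation.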

2.2 Log-Gabor filter

Given that the length of the image signal is limited, the image signal is usually band-pass filtered before the Riesz transform, typically with a Log-Gabor filter [41]. In practical applications, multiple Log-Gabor filters should be used to build a complete filter bank in the radial and horizontal directions because of the bandwidth limitation of a single Log-Gabor filter [42]. The optimum filter bank for a specific application can be established on the basis of previously described methods [43, 44]. In this study, the number of scales is n_r=5 and the number of orientations is n_θ=1; the choice of these splicing parameters is discussed in detail in Section 4.1.
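A radial (single-orientation) Log-Gabor bank can be sketched as follows. This is only an illustration under stated assumptions: the transfer function is the standard Log-Gabor form, the bandwidth parameter `sigma_ratio=0.55` is a hypothetical default that this paper does not specify, and the function names are invented for the example.

```python
import numpy as np

def log_gabor_bank(shape, center_freqs, sigma_ratio=0.55):
    """Build isotropic Log-Gabor transfer functions on the FFT grid.

    sigma_ratio is an assumed bandwidth parameter, not a value from the paper.
    """
    rows, cols = shape
    u = np.fft.fftfreq(cols)[np.newaxis, :]
    v = np.fft.fftfreq(rows)[:, np.newaxis]
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0  # placeholder so that log() is defined at DC
    filters = []
    for w0 in center_freqs:
        g = np.exp(-(np.log(radius / w0)) ** 2 / (2 * np.log(sigma_ratio) ** 2))
        g[0, 0] = 0.0  # a band-pass filter has no DC response
        filters.append(g)
    return filters

def band_decompose(img, filters):
    """Filter an image with each transfer function; one band per filter."""
    spectrum = np.fft.fft2(img)
    return [np.real(np.fft.ifft2(spectrum * g)) for g in filters]
```

A call such as `band_decompose(img, log_gabor_bank(img.shape, [0.25, 0.125]))` returns one band-limited image per center frequency; the two frequencies in this call are placeholders, not the values used in this study.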

As set out in Section 2.4, the center frequencies ω_0i (i=1,…,5) of the filter bank are $\omega _{01}=\frac {1}{3}, \omega _{02}=\frac {1}{3^{2.1}}, \omega _{03}=\frac {1}{3^{2.1 \times 2.1}}, \omega _{04}=\frac {1}{3^{2.1 \times 2.1 \times 2.1}}$, and $\omega _{05}=\frac {1}{3^{2.1 \times 2.1 \times 2.1 \times 2.1}}$. The bands of the Log-Gabor filter bank are [0.4786, 0.2026], [0.2611, 0.0965], [0.1243, 0.0460], [0.0591, 0.0221], and [0.0282, 0.0105]. Using this filter bank, the image R is filtered to complete the five-scale decomposition of the image, and the decomposed images R^bi (i=1,…,5) are obtained. The MSs of the reference image $\left [R^{bi}, R_{1}^{bi}, R_{2}^{bi}\right ]~(i=1,\ldots,5)$ are obtained by applying the Riesz transform to R^bi (i=1,…,5). Thus, Eq. (1) becomes:

$$ \begin{aligned} \left\{\begin{array}{lll} A_{R}^{bi}(x,y) &= \sqrt{R^{bi}(x,y)^{2}+R_{1}^{bi}(x,y)^{2}+R_{2}^{bi}(x,y)^{2}} \\[0.2cm] \theta_{R}^{bi}(x,y) &= \tan^{-1} {\left(-R_{2}^{bi}(x,y)/R_{1}^{bi}(x,y)\right)} \\[0.2cm] \varphi_{R}^{bi}(x,y) &= \tan^{-1} {\left(R_{12}^{bi}(x,y)/R^{bi}(x,y)\right)} \end{array}\right. \end{aligned} $$

(2)

where $R_{12}^{bi}(x, y) = \sqrt {R_{1}^{bi}(x, y)^{2} + R_{2}^{bi}(x, y)^{2}}, \theta _{R}^{bi}(x, y) \in [0, \pi), \varphi _{R}^{bi}(x, y) \in [0, \pi), i=1,\ldots,5$. Similarly, the MS of the distorted image is $\left [D^{bi}, D_{1}^{bi}, D_{2}^{bi}\right ]~(i=1,\ldots,5)$, and the corresponding local amplitude, local direction, and local phase are $A_{D}^{bi}$, $\theta _{D}^{bi}$, and $\varphi _{D}^{bi}$ ($i=1,\ldots,5$), respectively.

The Log-Gabor filter bank used in this study is shown in Fig. 2. The center frequencies ω_0i (i=1,…,5) from Fig. 2 a–e are $\omega _{01}=\frac {1}{3}, \omega _{02}=\frac {1}{3^{2.1}}, \omega _{03}=\frac {1}{3^{2.1 \times 2.1}}, \omega _{04}=\frac {1}{3^{2.1 \times 2.1 \times 2.1}}$, and $\omega _{05}=\frac {1}{3^{2.1 \times 2.1 \times 2.1 \times 2.1}}$. Using this Log-Gabor filter bank, two sample images (monarch and sailing2 in the LIVE database [45]) are filtered to obtain the components of the corresponding five bands. Notably, the sample images are converted to grayscale before filtering.

Figure 2 also shows that the Log-Gabor filter whose ω₀ is set as $\frac {1}{3}$ reflects the high-frequency components of the image, mainly representing the most detailed information of the original image. The Log-Gabor filter, whose ω₀ is set as $\frac {1}{3^{2.1}}$, reflects the sub-high frequency components of the image. The Log-Gabor filter whose ω₀ is set as $\frac {1}{3^{2.1 \times 2.1 \times 2.1}}$ contains a large number of low-frequency components, which mainly reflect the contour information of the original image. The detailed information describes the small-scale parts of the image such as texture, and the remaining large-scale information expresses the basic structure and the trend of the image.

2.3 Monogenic phase congruency

The traditional PC model [46] utilizes the phase information of the image and is widely used to detect the edges, key feature points, and symmetry of the image. However, noise interference, frequency spread, and other problems will occur [47, 48]. The MPC model developed based on the MS theory and PC can better express the local phase information of the image and improve computational efficiency and local feature accuracy [31].

According to Eq. (2), the sum of the local energy is:

$$ E^{'}(x, y) = \sqrt{R^{b}(x, y)^{2} + R_{1}^{b}(x, y)^{2}+R_{2}^{b}(x, y)^{2}} $$

(3)

where $R^{b}(x, y) = \sum _{i=1}^{5}R^{bi}(x, y), R_{1}^{b}(x, y) = \sum _{i=1}^{5}R_{1}^{bi}(x, y)$, and $R_{2}^{b} (x, y) = \sum _{i=1}^{5}R_{2}^{bi}(x, y)$.

The sum of the local amplitudes is:

$$ A^{'} (x, y) = \sum_{i=1}^{5}A_{R}^{bi}(x, y) $$

(4)

The MPC model is expressed as:

$$ MPC(x, y) = W(x, y)\left \lfloor 1-\xi \times \arccos \left( \frac{E^{'}(x, y)}{A^{'}(x, y)} \right) \right \rfloor\frac{\left \lfloor E^{'}(x, y) - T \right \rfloor}{A^{'}(x, y)+\varepsilon} $$

(5)

where ⌊ ⌋ indicates that the enclosed quantity is not permitted to become negative, i.e., it is set to zero whenever it would be negative. ξ is the gain coefficient, which is generally chosen in the range 1≤ξ≤2. T is the noise compensation factor. ε is a small positive constant, which is set as ε=0.0001. W(x,y) is the weight function, which applies an S-shaped (sigmoid) growth curve to the spread of the filter responses [49]:

$$ W(x,y)=\frac{1}{1+\exp(g(c-s(x,y)))} $$

(6)

where c is the cutoff value of the filter response spread, below which the PC values become penalized, g is the gain factor that controls the sharpness of the cutoff, and s(x,y) is the spread function [31]. Here, we set g=1.8182 and c=1/3.
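Eqs. (3) to (6) can be combined into a short NumPy sketch. It is a sketch under assumptions: the spread function s(x,y) is taken as an input because its definition is given in [31] rather than here, the defaults `xi=1.5` and `noise_t=0.0` are hypothetical, and ε is also added inside the arccos argument to guard against division by zero, which Eq. (5) does not do.

```python
import numpy as np

def mpc_map(bands, spread, xi=1.5, noise_t=0.0, eps=1e-4, g=1.8182, c=1 / 3):
    """Monogenic phase congruency following Eqs. (3)-(6).

    bands  : list of (R, R1, R2) array tuples, one tuple per scale
    spread : array s(x, y); its definition is outside this sketch
    xi, noise_t : assumed values for the gain and noise compensation
    """
    r = sum(b[0] for b in bands)
    r1 = sum(b[1] for b in bands)
    r2 = sum(b[2] for b in bands)
    energy = np.sqrt(r ** 2 + r1 ** 2 + r2 ** 2)                 # Eq. (3)
    amp_sum = sum(np.sqrt(b[0] ** 2 + b[1] ** 2 + b[2] ** 2)
                  for b in bands)                                # Eq. (4)
    weight = 1.0 / (1.0 + np.exp(g * (c - spread)))              # Eq. (6)
    # eps in this ratio is a numerical guard added for the sketch.
    ratio = np.clip(energy / (amp_sum + eps), 0.0, 1.0)
    phase_term = np.maximum(1.0 - xi * np.arccos(ratio), 0.0)    # floor at 0
    energy_term = np.maximum(energy - noise_t, 0.0) / (amp_sum + eps)
    return weight * phase_term * energy_term                     # Eq. (5)
```

When the band responses are aligned across scales, the local energy approaches the sum of amplitudes and the map approaches W(x,y); when they cancel, the map falls to zero.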

Figure 3 shows the three-dimensional surface of W(x,y) to illustrate the weight function more intuitively. Two sample images (Fig. 3 a, d, which are the same as Fig. 2 f, l) from the LIVE database [45] are taken as examples. Figure 3 b, e shows the three-dimensional surface of W(x,y), and Fig. 3 c, f shows a rotated view of the same surface.

Figure 3 shows that the weight function accurately highlights the local characteristics in the sample image, indicating that the MPC can express the local phase information of the image.

2.4 Visual contrast sensitivity

Physiological and psychological research has revealed that the HVS has many characteristics, such as the visual sensitivity band-pass effect, the visual nonlinearity effect, visual multichannel behavior, and the masking effect [50]. Among them, the CSF characterizes the HVS sensitivity band-pass effect, which reflects the difference in the sensitivity of the HVS to different spatial frequencies. Given that the CSF can be combined with subjective visual experience, it has been applied to many IQA methods [51, 52]. This study uses the CSF model proposed by Mannos et al. [34]:

$$ A(f_{r}) \approx 2.6(0.0192+0.114f_{r})\exp{\left(-(0.114f_{r})^{1.1}\right)} $$

(7)

where f_r is the spatial frequency. The normalized CSF characteristic curve is obtained as shown in Fig. 4.

To facilitate the calculation and adapt to the CSF, the center frequencies ω_0i (i=1,…,5) of the Log-Gabor filter bank are set as $\omega _{01} = \frac {1}{3}, \omega _{02} = \frac {1}{3^{2.1}}, \omega _{03} = \frac {1}{3^{2.1 \times 2.1}}, \omega _{04} = \frac {1}{3^{2.1 \times 2.1 \times 2.1}}$, and $\omega _{05} = \frac {1}{3^{2.1 \times 2.1 \times 2.1 \times 2.1}}$. The CSF curve is divided into five segments, with the half-power points of each filter taken as its band limits. The five bands of the Log-Gabor filter bank are then [0.4786, 0.2026], [0.2611, 0.0965], [0.1243, 0.0460], [0.0591, 0.0221], and [0.0282, 0.0105], which correspond to the red, orange, green, cyan, and blue segments, respectively, in Fig. 4 (the overlap between the bands is not shown in the figure). The maximum value of the CSF within each band is set as the weight of the corresponding similarity matrix: w₁=0.3370, w₂=0.8962, w₃=0.9809, w₄=0.9753, and w₅=0.7411.
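The CSF of Eq. (7) is easy to evaluate directly. The sketch below assumes f_r is expressed in cycles per degree, as in the original Mannos and Sakrison model; the mapping from the normalized filter frequencies to f_r is not stated in this section, so the weights w₁ to w₅ are not reproduced here.

```python
import numpy as np

def csf(f_r):
    """Contrast sensitivity function of Eq. (7); f_r assumed in cycles/degree."""
    f_r = np.asarray(f_r, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f_r) * np.exp(-(0.114 * f_r) ** 1.1)

# Locate the peak of the band-pass curve on a dense frequency grid.
freqs = np.linspace(0.0, 60.0, 6001)
response = csf(freqs)
peak_freq = float(freqs[np.argmax(response)])
peak_value = float(response.max())
```

Given band limits expressed in the same units, a per-band weight in the spirit of this section would be `csf(band_freqs).max()` over the frequencies in that band.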

3 Proposed RVSIM method

3.1 The proposed framework

The framework of the proposed RVSIM method is shown in Fig. 5. The reference image R and the distorted image D are filtered by a five-band Log-Gabor band-pass filter bank to obtain the components R^bi and D^bi (i=1,…,5) in five different frequency bands. The MSs $\left [R^{bi}, R_{1}^{bi}, R_{2}^{bi}\right ]$ and $\left [D^{bi}, D_{1}^{bi}, D_{2}^{bi}\right ]~(i=1,\ldots,5)$ are obtained by applying the Riesz transform to the decomposed images. For each band, three similarity functions $\left (S_{A}^{bi}, S_{\varphi }^{bi}, S_{\theta }^{bi}\right)~(i=1,\ldots,5)$ are computed from the local features (local amplitude A, local phase φ, and local direction θ), and the similarity matrix S_Mi (i=1,…,5) is derived from them. The weights w_i (i=1,…,5) of the five similarity matrices are set using the CSF to obtain a single similarity matrix S_M. The GM similarity matrix S_G of R and D is calculated. Then, S_M and S_G are combined to obtain the local feature similarity S_L of R and D. At the same time, the MPC is computed from the MS of the reference image R and serves as the pooling function. Finally, the local feature similarity map S_L is pooled as an MPC-weighted average to obtain the proposed similarity index.

3.2 RVSIM index

As described previously, the reference image R and the distorted image D are subjected to a Log-Gabor filter bank and a first-order Riesz transform to obtain five MSs to calculate the characteristic indices in the Riesz transform space, including the amplitude A, phase φ, and direction θ. Then, the MS similarity of R and D at the pixel (x,y) is derived as:

$$ \begin{aligned} \left\{\begin{array}{lll} S_{A}^{bi}(x,y)&= \frac{2A_{R}^{bi}A_{D}^{bi}+C_{1}}{\left(A_{R}^{bi}\right)^{2}+\left(A_{D}^{bi}\right)^{2}+C_{1}} \\[0.2cm] S_{\theta}^{bi} (x,y)&= \exp\left(-\left| \tan\left(\theta_{R}^{bi}-\theta_{D}^{bi}\right)\right|\right)\\[0.2cm] &= \exp\left(-\left| \frac{R_{1}^{bi} D_{2}^{bi}-R_{2}^{bi} D_{1}^{bi}}{R_{1}^{bi} D_{1}^{bi}+R_{2}^{bi} D_{2}^{bi}} \right|\right) \\[0.2cm] S_{\varphi}^{bi} (x,y) &= \exp\left(-\left| \tan\left(\varphi_{R}^{bi}-\varphi_{D}^{bi}\right)\right|\right)\\[0.2cm] &= \exp\left(-\left| \frac{R^{bi} D_{12}^{bi}-R_{12}^{bi} D^{bi}}{R^{bi} D^{bi}+R_{12}^{bi} D_{12}^{bi}} \right|\right) \end{array}\right. \end{aligned} $$

(8)

where i=1,…,5, and C₁ is a relatively small positive number.
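As a check on the second form of each expression in Eq. (8), the closed forms follow from the tangent subtraction formula together with the definitions in Eq. (2), which give tan θ = −X₂/X₁ and tan φ = X₁₂/X for X ∈ {R, D}. A worked version, with the band superscript omitted for brevity:

$$ \begin{aligned} \tan(\theta_{R}-\theta_{D}) &= \frac{\tan\theta_{R}-\tan\theta_{D}}{1+\tan\theta_{R}\tan\theta_{D}} = \frac{-\frac{R_{2}}{R_{1}}+\frac{D_{2}}{D_{1}}}{1+\frac{R_{2}D_{2}}{R_{1}D_{1}}} = \frac{R_{1}D_{2}-R_{2}D_{1}}{R_{1}D_{1}+R_{2}D_{2}} \\[0.2cm] \tan(\varphi_{R}-\varphi_{D}) &= \frac{\frac{R_{12}}{R}-\frac{D_{12}}{D}}{1+\frac{R_{12}D_{12}}{RD}} = \frac{R_{12}D-RD_{12}}{RD+R_{12}D_{12}} \end{aligned} $$

The phase numerator obtained this way has the opposite sign to the one written in Eq. (8), which makes no difference inside the absolute value.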

The MS similarity matrix of the ith band, S_Mi, is constructed as the product of the three feature similarities:

$$ S_{Mi}=S_{A}^{bi}\cdot S_{\theta}^{bi}\cdot S_{\varphi}^{bi} $$

(9)

where i=1,…,5.

The weights of five MS similarity matrices are set as w_i (i=1,…,5) using the CSF curve. The weighted sum is calculated to obtain the MS similarity matrix S_M:

$$ S_{M} = \sum_{i=1}^{5}w_{i} S_{Mi} $$

(10)
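Eqs. (8) to (10) can be sketched in NumPy using the closed forms, which avoid evaluating the angles explicitly. The small `eps` added to the denominators is a numerical guard introduced for this sketch and is not part of Eq. (8); the function names are hypothetical.

```python
import numpy as np

def ms_similarity(ref, dist, c1, eps=1e-12):
    """Per-band MS similarity S_Mi, Eqs. (8)-(9).

    ref, dist : (X, X1, X2) array tuples for one frequency band
    eps       : denominator guard added for this sketch only
    """
    r, r1, r2 = ref
    d, d1, d2 = dist
    a_r = np.sqrt(r ** 2 + r1 ** 2 + r2 ** 2)
    a_d = np.sqrt(d ** 2 + d1 ** 2 + d2 ** 2)
    s_a = (2 * a_r * a_d + c1) / (a_r ** 2 + a_d ** 2 + c1)
    r12 = np.sqrt(r1 ** 2 + r2 ** 2)
    d12 = np.sqrt(d1 ** 2 + d2 ** 2)
    tan_theta = (r1 * d2 - r2 * d1) / (r1 * d1 + r2 * d2 + eps)
    tan_phi = (r * d12 - r12 * d) / (r * d + r12 * d12 + eps)
    s_theta = np.exp(-np.abs(tan_theta))
    s_phi = np.exp(-np.abs(tan_phi))
    return s_a * s_theta * s_phi

def combined_ms_similarity(ref_bands, dist_bands, weights, c1):
    """CSF-weighted sum over bands, Eq. (10)."""
    return sum(w * ms_similarity(rb, db, c1)
               for w, rb, db in zip(weights, ref_bands, dist_bands))
```

Because Eq. (10) is a plain weighted sum, identical inputs give S_M equal to the sum of the weights rather than 1 when the weights are used as listed in Section 2.4.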

Similar to previous studies [29, 53], the GM similarity is defined as:

$$ S_{G}(x,y)=\frac{2G_{R}(x,y) G_{D}(x,y)+C_{2}}{(G_{R}(x,y))^{2}+(G_{D}(x,y))^{2}+C_{3}} $$

(11)

where G_R(x,y) and G_D(x,y) are the GMs of R and D at the pixel (x,y), respectively. C₂ and C₃ are relatively small positive numbers.

The value range of S_G(x,y) is (0,1]. The smaller the value, the more severe the GM distortion; S_G(x,y)=1 indicates that there is no GM distortion between R and D at that pixel. C₃ prevents the denominator of Eq. (11) from becoming zero. C₂ and C₃ also play important roles in adjusting the contrast response in low-gradient regions.
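Eq. (11) can be sketched as follows. The gradient operator is not specified in this section, so central differences via `np.gradient` are used purely as a stand-in assumption; the function names are hypothetical.

```python
import numpy as np

def gradient_magnitude(img):
    """Gradient magnitude via central differences (an assumed operator)."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.sqrt(gx ** 2 + gy ** 2)

def gm_similarity(ref, dist, c2, c3):
    """GM similarity map S_G of Eq. (11)."""
    g_r = gradient_magnitude(ref)
    g_d = gradient_magnitude(dist)
    return (2 * g_r * g_d + c2) / (g_r ** 2 + g_d ** 2 + c3)
```

With C₂ = C₃, identical images give S_G = 1 at every pixel, and S_G stays at or below 1 for any pair of images; with unequal constants the value for identical images is shifted away from 1 in regions where the gradient is small relative to the constants.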

Then, S_M and S_G are combined to derive the similarity S_L of R and D. S_L is defined as:

$$ S_{L} = \left[S_{M} \right]^{\alpha} \cdot \left[S_{G}\right]^{\beta} $$

(12)

where α and β are parameters used to adjust the relative importance of the MS and GM features. In this study, α=β=1 is set for simplicity, so that Eq. (12) reduces to:

$$ S_{L} = S_{M} \cdot S_{G} $$

(13)

Finally, the monogenic phase congruency map MPC is used as the pooling function to obtain the RVSIM index:

$$ RVSIM=\frac{\sum_{(x,y) \in \Omega}S_{L}(x,y) \cdot MPC(x,y)}{\sum_{(x,y) \in \Omega}MPC(x,y)} $$

(14)

where Ω denotes the whole image spatial domain.
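The combination and pooling steps of Eqs. (12) to (14) amount to a weighted average, as the following minimal sketch shows. The function name `rvsim_pool` is hypothetical, and the inputs are assumed to be same-shaped arrays with a non-zero MPC sum.

```python
import numpy as np

def rvsim_pool(s_m, s_g, mpc, alpha=1.0, beta=1.0):
    """Combine S_M and S_G (Eq. 12) and pool with MPC weights (Eq. 14)."""
    s_l = (s_m ** alpha) * (s_g ** beta)  # Eq. (13) when alpha = beta = 1
    return float(np.sum(s_l * mpc) / np.sum(mpc))
```

Pixels with a large MPC value contribute more to the final score, so distortions at perceptually salient structures are weighted more heavily than distortions in flat regions.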

4 Experimental results and discussion

To verify the performance of the proposed method, the RVSIM index is evaluated on five image databases, namely, LIVE [45], CSIQ [54], TID2008 [55], TID2013 [56], and the Waterloo Exploration database [57], which are used for algorithm validation and comparison. The characteristics of these five databases are summarized in Table 1.

Table 1 Comparison of five IQA databases

For the LIVE, CSIQ, TID2008, and TID2013 databases, the five-parameter nonlinear logistic regression function in Eq. (15) is used to fit the data [58]. Moreover, four indicators, namely the Spearman rank-order correlation coefficient (SROCC), Kendall rank-order correlation coefficient (KROCC), Pearson linear correlation coefficient (PLCC), and root mean square error (RMSE), are used to compare the performance of the indices objectively [59].

$$ f(z) = {{\beta_{1}}}{{\left[{\frac{1}{2}-\frac{1}{{1 + \exp({\beta_{2}}(z-{\beta_{3}}))}}}\right]}}+{{\beta_{4}}}z+{{\beta_{5}}} $$

(15)

where z is the objective IQA index, f(z) is the IQA regression index, and β_i (i=1,…,5) are the regression function parameters.
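The evaluation protocol around Eq. (15) can be sketched with SciPy. This is an illustration under assumptions: the starting values passed to `curve_fit` are a heuristic chosen for this sketch, the rank correlations are computed on the raw objective scores while PLCC and RMSE are computed after the fit, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import kendalltau, pearsonr, spearmanr

def logistic5(z, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of Eq. (15)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (z - b3)))) + b4 * z + b5

def evaluate(objective, subjective):
    """Return SROCC, KROCC, PLCC, and RMSE for one database."""
    objective = np.asarray(objective, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    srocc, _ = spearmanr(objective, subjective)
    krocc, _ = kendalltau(objective, subjective)
    # Heuristic starting point (an assumption of this sketch).
    start = [subjective.max() - subjective.min(), 1.0,
             objective.mean(), 0.0, subjective.mean()]
    params, _ = curve_fit(logistic5, objective, subjective,
                          p0=start, maxfev=20000)
    fitted = logistic5(objective, *params)
    plcc, _ = pearsonr(fitted, subjective)
    rmse = float(np.sqrt(np.mean((fitted - subjective) ** 2)))
    return {"SROCC": float(srocc), "KROCC": float(krocc),
            "PLCC": float(plcc), "RMSE": rmse}
```

The nonlinear mapping compensates for the fact that an objective index and the subjective scores may live on different, nonlinearly related scales, which would otherwise penalize PLCC and RMSE.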

For the Waterloo Exploration database, the group MAximum Differentiation (gMAD) competition, which provides the strongest test to let the IQA models compete with each other [60], is carried out. The gMAD competition can automatically select a subset of image pairs from the database, which provides the competition ranking and reveals the relative performance of the IQA models.

4.1 Determination of parameters

4.1.1 Determination of the constants C₁, C₂, and C₃

Orthogonal experiments were conducted on the LIVE database using the assessment index SROCC to determine the optimal values of the constants C₁, C₂, and C₃. Two rounds of orthogonal experiments were conducted to achieve a balance between the complexity of the experiment and the determination of the parameters. Similar to the SSIM model [12], [C₁, C₂, C₃]=[(K₁L)², (K₂L)², (K₃L)²], where L is the dynamic range of the pixel values. For an 8-bit grayscale image, L=2⁸−1=255.

1. First round: In the first step, K₂=1.0 and K₃=1.0 are fixed, and the RVSIM index is applied to the LIVE database for different values of K₁ to obtain the K₁−SROCC curve. As shown in Fig. 6 a, SROCC reaches its maximum when K₁=1.0. In the second step, K₁=1.0 and K₃=1.0 are fixed, and K₂ is varied to obtain the K₂−SROCC curve. As shown in Fig. 6 b, SROCC reaches its maximum when K₂=1.2. In the third step, K₁=1.0 and K₂=1.2 are fixed, and K₃ is varied to obtain the K₃−SROCC curve. As shown in Fig. 6 c, SROCC reaches its maximum when K₃=1.0. At this point, the first round of experiments ends with K₁=1.0, K₂=1.2, and K₃=1.0.
2. Second round: Starting from the parameters obtained in the first round, the three steps of the first round are repeated to obtain the results shown in Fig. 6 d–f. At the end of the second round, the finalized parameters are K₁=1.09, K₂=1.16, and K₃=1.00.
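The two rounds described above follow a one-parameter-at-a-time search, which can be sketched in plain Python. The `score` callable is hypothetical: in the setting of this section it would return the SROCC of RVSIM on the LIVE database for given K₁, K₂, K₃, which is not implemented here. The grids are likewise placeholders, since the candidate values that were actually scanned are not listed in the text.

```python
def constants_from_k(k1, k2, k3, dynamic_range=255):
    """[C1, C2, C3] = [(K1*L)^2, (K2*L)^2, (K3*L)^2] with L = 255 for 8-bit."""
    return tuple((k * dynamic_range) ** 2 for k in (k1, k2, k3))

def coordinate_search(score, start, grids, rounds=2):
    """Vary one parameter at a time, keeping the others at their current best.

    score  : hypothetical callable score(k1, k2, k3) -> value to maximize
    start  : initial (k1, k2, k3)
    grids  : one list of candidate values per parameter (placeholders)
    rounds : number of full passes over the three parameters
    """
    best = list(start)
    for _ in range(rounds):
        for i, grid in enumerate(grids):
            def score_with(value, i=i):
                trial = list(best)
                trial[i] = value
                return score(*trial)
            best[i] = max(grid, key=score_with)
    return tuple(best)
```

This kind of search needs far fewer evaluations than a full three-dimensional grid, at the cost of possibly missing interactions between the parameters, which is the trade-off between complexity and accuracy mentioned above.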

4.1.2 Determination of the Log-Gabor filter bank

As described in Section 2.2, the finalized splicing parameters of the Log-Gabor filter bank are the number of scales n_r=5 and the number of orientations n_θ=1. To justify this selection, Table 2 lists the SROCC/KROCC/PLCC/RMSE values obtained by applying the RVSIM index to the LIVE, CSIQ, TID2008, and TID2013 databases with different splicing parameters. The top performance is highlighted in bold. Table 2 shows that the RVSIM index exhibits its best performance when n_r=5 and n_θ=1.

Table 2 SROCC/KROCC/PLCC/RMSE values comparison with different splicing parameters on four benchmark databases

4.2 Two sample examples

In order to determine whether the proposed RVSIM method agrees with human judgment, two sample images (Fig. 7 a,g, which are the same as Fig. 2 f,l) in the LIVE database [45] are taken as examples. Corresponding to these two ground truth images, we select five noise-distorted images and five blur-distorted images in different degrees from the LIVE database.

As shown in Fig. 7, images seem to degrade with increasing blur or noise from left to right. The LIVE database provides the difference mean opinion score (DMOS) for each image. A small DMOS represents a high-quality image. We calculate the objective scores of these images using the RVSIM method. The results can be found in Fig. 7.

Figure 7 shows that the RVSIM index is consistent with DMOS. This indicates that the RVSIM method is in line with the subjective perception of the HVS and works well in indicating image quality.

4.3 Performance comparison

Table 3 lists the performance of RVSIM and 11 other state-of-the-art IQA methods (including PSNR, SSIM [12], GSSIM [23], MS-SSIM [21], IW-SSIM [26], FSIM [29], RFSIM [30], VSI [61], SCQI [13], MDSI [62], and SRSIM [63]) on the LIVE, CSIQ, TID2008, and TID2013 databases. The top 3 performances of the indices are highlighted in bold. Apart from GSSIM, the MATLAB source codes of all of the other methods were obtained from the authors. Compared with traditional methods such as PSNR, SSIM, GSSIM, and MS-SSIM, RVSIM exhibits a good performance on the LIVE and CSIQ databases. Because the orthogonal experiments for parameter selection were conducted only on the LIVE database and not on the TID2008 and TID2013 databases, RVSIM performs slightly worse than the best results on TID2008 and TID2013.

Table 3 Performance comparison of IQA methods on four benchmark databases

Figure 8 shows the scatter distributions of the subjective DMOS versus the quality/distortion predicted scores by PSNR, SSIM, MS-SSIM, IW-SSIM, FSIM, SCQI, MDSI, RFSIM, and RVSIM indices on the LIVE database. Figure 8 shows that the scatter plot of RVSIM is evenly distributed throughout the coordinate system and has a strong linear relationship with DMOS, which indicates that the RVSIM model has a strong congruency with HVS.

Experiments on these four databases (LIVE, CSIQ, TID2008, and TID2013) alone are insufficient to characterize performance. This study therefore conducted the gMAD competition on the Waterloo Exploration database to test the performance of RVSIM objectively and fairly.

Figure 9 shows the competition ranking in the Waterloo Exploration database. In the gMAD competition experiment, the rankings of 16 state-of-the-art methods are provided by the official framework [60], and an experimenter can only add a new algorithm to the ranking of these 16 algorithms. The algorithms added in Fig. 9 a–f are RVSIM, SRSIM, RFSIM, VSI, MDSI, and SCQI, respectively. Notably, the overall performance of RVSIM ranked first. In particular, RVSIM performs consistently well in terms of aggressiveness, validating that it is a robust IQA method.

4.4 Discussion

In Table 3, six methods appear in bold: MDSI (16 times), SCQI (12 times), VSI (9 times), SRSIM (4 times), FSIM (3 times), and RVSIM (3 times). In Fig. 9, the top 6 methods of the gMAD competition are RVSIM, SRSIM, MS-SSIM, VSI, MDSI, and RFSIM. The results from Table 3 and Fig. 9 are summarized as method rank statistics in Table 4, where the proposed RVSIM is highlighted in bold.

Table 4 Summary of the method rank statistics on five databases LIVE, CSIQ, TID2008, TID2013, and Waterloo Exploration

Table 4 shows that the conclusion of indicator performance on the LIVE, CSIQ, TID2008, and TID2013 databases and the conclusion of gMAD competitive ranking on the Waterloo Exploration database are not exactly the same. MDSI ranked first in indicator performance, but ranked fifth in gMAD competition. SCQI ranked second in indicator performance, but performed poorly in gMAD competition. VSI ranked third in indicator performance, but ranked fourth in gMAD competition. SRSIM ranked fourth in indicator performance, but ranked second in gMAD competition. Although RVSIM, SRSIM, and MS-SSIM are not ranked at the top in indicator performance, they exhibited good results in gMAD competition. In particular, RVSIM had the highest rank in gMAD competition.

Which results should be considered? The performance indices and the gMAD competition ranking are two different bases for judgment. The performance indices objectively reflect the performance of a method, but the benchmark databases provide only a limited number of images because subjective scoring is time-consuming and laborious. The gMAD competition is performed between methods, and the resulting ranking objectively reflects the relative performance of the IQA models. However, the Waterloo Exploration database is so large that no DMOS values are provided for its images in advance, so subjective scoring is still needed. In other words, both bases have their merits and limitations. A method that achieves good results in both the performance indices and the gMAD competitive ranking can be considered an excellent and more objective method. From this point of view, RVSIM exhibits a more consistent and stable performance than the other methods.

5 Conclusion

This study proposes an FR IQA method called RVSIM, which combines the Riesz transform and visual contrast sensitivity. RVSIM takes full advantage of the MS theory and Log-Gabor filter by exploiting the CSF to allocate the weights of different frequency bands. At the same time, GM similarity is introduced to obtain the gradient similarity matrix. Then, the MPC matrix is used to construct the pooling function and obtain the RVSIM index.

This study reports experiments with the RVSIM index on five benchmark IQA databases. The indicator performance results show that the RVSIM index delivers highly competitive prediction accuracy on the LIVE and CSIQ databases. The scatter plot of the subjective DMOS versus the scores predicted by RVSIM on the LIVE database suggests that the RVSIM model is strongly consistent with the HVS. The gMAD competition ranking on the Waterloo Exploration database implies that RVSIM performs better than the other advanced IQA methods compared. The overall performance on all five databases demonstrates that RVSIM is a robust IQA method.

Acknowledgements

The authors would like to thank Jiahua Cao and Associate Professor Weizheng Jin for the valuable opinions they offered during our discussions.

Funding

This study is partially supported by the National Natural Science Foundation of China (NSFC) (No. 61571334) and the National High Technology Research and Development Program (863 Program) (No. 2014AA09A512).

Author information

Authors and Affiliations

School of Electronic Information, Wuhan University, Wuhan, 430072, China
Guangyi Yang, Deshi Li, Fan Lu, Yue Liao & Wen Yang

Contributions

GY conducted the experiments and drafted the manuscript. FL and YL implemented the core algorithm and performed the statistical analysis. DL designed the methodology. WY modified the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guangyi Yang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Availability of data and materials

The MATLAB source code of RVSIM can be downloaded at https://sites.google.com/site/jacobygy/ for public use and evaluation. You may modify this program and use it anywhere, but please cite the original source.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Yang, G., Li, D., Lu, F. et al. RVSIM: a feature similarity method for full-reference image quality assessment. J Image Video Proc. 2018, 6 (2018). https://doi.org/10.1186/s13640-018-0246-1

Received: 24 July 2017
Accepted: 08 January 2018
Published: 19 January 2018
DOI: https://doi.org/10.1186/s13640-018-0246-1
