No-reference color image quality assessment: from entropy to perceptual quality

Chen, Xiaoqiao; Zhang, Qingyi; Lin, Manhui; Yang, Guangyi; He, Chu

doi:10.1186/s13640-019-0479-7

Research
Open access
Published: 06 September 2019

Xiaoqiao Chen¹,
Qingyi Zhang¹,
Manhui Lin¹,
Guangyi Yang ORCID: orcid.org/0000-0002-1580-0188¹ &
…
Chu He¹,²

EURASIP Journal on Image and Video Processing volume 2019, Article number: 77 (2019)

Abstract

This paper presents a high-performance general-purpose no-reference (NR) image quality assessment (IQA) method based on image entropy. The image features are extracted from two domains. In the spatial domain, the mutual information between different color channels and the two-dimensional entropy are calculated. In the frequency domain, the statistical characteristics of the two-dimensional entropy and the mutual information of the filtered subband images are computed as the feature set of the input color image. Then, with all the extracted features, a support vector classifier (SVC) for distortion classification and support vector regression (SVR) for quality prediction are used to obtain the final quality assessment score. The proposed method, which we call entropy-based no-reference image quality assessment (ENIQA), can assess the quality of different categories of distorted images, and has a low complexity. The proposed ENIQA method was assessed on the LIVE and TID2013 databases and showed a superior performance. The experimental results confirmed that the proposed ENIQA method has a high consistency of objective and subjective assessment on color images, which indicates the good overall performance and generalization ability of ENIQA. The implementation is available on GitHub at https://github.com/jacob6/ENIQA.

1 Introduction

In this era of information explosion, we are surrounded by an overwhelming amount of information. The diversification of information is dazzling, and images, as the source of visual information, contain a wealth of valuable information. Considering the incomparable advantages of image information over other types of information, it is important to process images appropriately in the different fields [1]. In image acquisition, processing, transmitting, and recording, image distortion and quality degradation are an inevitable result of the imperfection of the imaging system, the processing method, the transmission medium, and the recording equipment, as well as object movement and noise pollution [2–4]. There is a direct effect of image quality on people’s subjective feelings and perception of information. For example, the quality of the collected images directly affects the accuracy and reliability of the recognition results in an image recognition process [5, 6]. Another example is that remote conferencing and video-on-demand systems are affected by such factors as transmission errors, network latency, and so on [7–9]. Online real-time image quality control is thus introduced to ensure that the service provider dynamically adjusts the source location strategy, in order to meet the service quality requirements [10]. It is therefore not surprising that research into image quality assessment (IQA) has received extensive attention during the last two decades [11].

In accordance with the need for human participation, IQA methods can be divided into two classes: subjective image quality assessment methods and objective image quality assessment methods [12]. Subjective assessment is quantified by the human eye. In contrast, an objective IQA method focuses on automatic assessment of the images via a specific automated, computer-assisted method, with the ultimate goal of enabling a computer to model image processing properties of the human visual system (HVS) in viewing and perceiving images [13]. In practice, subjective assessment results are difficult to apply in real-time imaging systems due to their strong randomness. Therefore, objective IQA methods have been widely studied [14]. According to the availability of a reference image, objective IQA methods can be classified as full-reference (FR), reduced-reference (RR), and no-reference (NR) methods [15]. In an FR method, an original "distortion-free" image is assumed to be supplied, and the assessment result is obtained by comparing the two images. With the advances of recent studies, the accuracy of this kind of method is improving, despite its disadvantage of requiring a complete reference image, which is often not available in practical applications [16]. An RR method, which is also known as a partial reference method, does not make a complete comparison between the distorted image and the pristine one, but only compares certain features [17]. Conversely, an NR method, which is also called a blind image quality assessment (BIQA) method, requires no image as reference. Instead, the quality is estimated according to the features of the distorted image [15]. In many practical applications, a reference image will be inaccessible, and thus the NR-IQA methods have the most practical value and a very wide application potential [18].

In general, the current NR-IQA methods can be divided into two categories: application-specific and general-purpose assessment [19]. The former kind of method assesses the image quality of a specific distortion type and calculates the corresponding score. Common types of distortion include JPEG, JPEG2000 compression (JP2K), blur, contrast distortion, and noise. For images with compression degradation, Suthaharan et al. [20] proposed the visually significant blocking artifact metric (VSBAM) to estimate the degradation level caused by compression. For images with blur degradation, Ciancio et al. [21] utilized various spatial features and adopted a neural network model to assess the quality. The maximum local variation (MLV) method proposed by Khosro et al. [22] provides a fast method of blur level estimation. Rony et al. [23] put forward the concept of just noticeable blur (JNB) and the improved version of cumulative probability of blur detection (CPBD) [24]. For images with contrast distortion, Fang et al. [25] extracted features from the statistical characteristics of the 1-D image entropy distribution and developed an assessment model based on natural scene statistics (NSS) [26]. Hossein et al. [27] used higher orders of the Minkowski distance and entropy to apply an accurate measurement of the contrast distortion level. For images with noise, Yang et al. [28] proposed frequency mapping (FM) and introduced it into quality assessment. Gu et al. [29] proposed a training-free blind quality method based on the concept of information maximization. These methods, however, require prior knowledge of the distortion type, which limits their application range. Therefore, general-purpose NR-IQA methods based on training and learning are highly desirable.

General-purpose NR-IQA methods can be further divided into two types: explicit methods and implicit methods [30]. An explicit method usually contains two steps: feature extraction and model mapping [31]. Generally speaking, the features extracted in the first step represent the visual quality, while the mapping model in the second step bridges the gap between the features and the ground-truth quality score. An implicit general-purpose NR-IQA method constructs a mapping model via deep learning. Although deep networks nowadays generally have an independent feature extraction capability, it is difficult for the existing IQA databases to meet the huge demand for training samples, let alone the large amount of redundant data and network parameters. In addition, compared to preselected features, no clear physical meaning can be given by these automatically extracted features. Thus, manual feature extraction is still an effective and accurate way to summarize the whole image distortion.

According to the existing literature, the features extracted by explicit general-purpose NR-IQA methods are mainly concentrated in two categories: (1) The parameters of a certain model are obtained after a preprocessing operation such as mean-subtracted contrast-normalized (MSCN) coefficients [32]. The typical models are the generalized Gaussian distribution (GGD) model [33], the asymmetric GGD (AGGD) model [34], the Weibull distribution (WD) model [35], etc. (2) Physical quantities that reflect the characteristics of the image are obtained after preprocessing such as blocking and transformation. The typical methods are image entropy [36], wavelet subband correlation coefficients [37], etc. The mapping models from features to image quality are divided into three main types: (1) Classical methods such as BIQI [33], DIIVINE [37], DESIQUE [38], and SSEQ [36] follow a two-stage framework. The probability of each type of distortion in the image is gauged by a support vector classifier (SVC) and denoted as p_i in the first stage. The quality of the image along each of these distortions is then assessed by support vector regression (SVR) and denoted as q_i in the second stage. Finally, the quality of the image is expressed as a probability-weighted summation: $\text{Index}=\sum_{i} p_{i}q_{i}$. (2) Methods such as NIQE [39] and IL-NIQE [35] are classified as “distortion-unaware,” and they calculate the distance between a model fitted by features from a distorted image and an ideal model to estimate a final quality score, without identifying the type of distortion. (3) Methods such as BLIINDS-II [40] and BRISQUE [34] implement direct mapping of the image features to obtain a subjective quality score, also without distinguishing the different distortion types.

The existing general-purpose NR-IQA methods are faced with the following problems: (1) The color space of the image is less considered in these methods. (2) Some of the methods take advantage of the statistical features of the pixels only, and they ignore the spatial distribution of the features. Liu et al. [36] calculated the 1-D entropy of image blocks in the spatial and frequency domains, respectively, and used the mean, along with the skewness [41], of all the local entropy values as the image features to implement the SSEQ method. Gabarda et al. [42] approximated the probability density function by the spatial and frequency distribution to calculate the pixel-wise entropy on a local basis. The measured variance of the entropy is a function of orientation, which is used as an anisotropic indicator to estimate the fidelity and quality of the image [43]. Although some aggregated features of image grayscale distribution can be embodied in these one-dimensional entropy-based methods, the spatial features of the distribution cannot be obtained.

In this paper, we introduce an NR-IQA method based on image entropy, namely, ENIQA. Firstly, by using the two-dimensional entropy (TE) [44] instead of the one-dimensional entropy [45], the proposed method better captures the correlation between neighboring pixels. Secondly, we calculate the mutual information (MI) [46] between the different color channels and the TE of the color image in two scales. We split the image into patches in order to exploit the statistical properties of each local region. During this process, visual saliency detection [47] is performed to weight the patches, and the less important ones are then excluded. Thirdly, a log-Gabor filter [48, 49] is applied on the image to simulate the neurons’ selective response to stimulus orientation and frequency. After that, the MI between the different subband images and the TE of the filtered images are computed. The MI, as well as the mean and the skewness of the TE, is then utilized as the structural feature to determine the perceptual quality of the input image. Specifically, SVC and SVR are used to implement a two-stage framework for the final prediction. The experiments undertaken with the LIVE [50] and TID2013 [51] databases confirmed that the proposed ENIQA method performs well and shows a high consistency of subjective and objective assessment.

The rest of this paper is structured as follows. In Section 2, we introduce the structural block diagram of the novel IQA method proposed in this study and present a detailed introduction to image entropy, correlation analysis of the RGB color space, and the log-Gabor filter. Section 3 provides an experimental analysis, and describes the testing and verification of the proposed method from multiple perspectives. Finally, Section 4 concludes with a summary of our work.

2 Methods

In order to describe the local information of the image, the proposed ENIQA method introduces the MI and the TE in both the spatial and frequency domains. Given a color image whose quality is to be assessed, the MI between the three channels R, G, and B is first calculated (f₁– f₃). We convert the input image to grayscale and divide it into patches to calculate patch-wise entropy values. The obtained local entropy values are then pooled to compute the mean and the skewness (f₇– f₈). For the frequency domain features, we apply log-Gabor filtering at two center frequencies and in four orientations to the grayscale image and obtain eight subband images, on which blocking and entropy calculation are applied. The eight pairs of mean and skewness values are obtained from each subband (f₁₁– f₂₆). Furthermore, in order to acquire the relationship of the subband images, the MI between any two of the subband images in the four different orientations (f₄₃– f₄₈) and that between the two center frequencies (f₅₅) are also calculated, respectively. The image is down-sampled using the nearest-neighbor method to capture multiscale behavior, yielding another set of 28 features (f₄– f₆,f₉– f₁₀,f₂₇– f₄₂,f₄₉– f₅₄, and f₅₆). Thus, ENIQA extracts a total of 56 features for an input color image, as listed in Table 1. We group the features for a clearer representation. The right half of Fig. 1 illustrates the extraction process of the five feature groups.
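The feature indexing above can be checked with a short tally. The following is only a bookkeeping sketch: the index ranges are taken from the text, while the group names and the helper `count` are illustrative labels that do not come from the paper or its released code.

```python
# Feature index ranges (1-based, inclusive) as stated in the text.
# Each entry lists [first-scale range, second-scale range].
# The group names are illustrative labels, not the paper's wording.
FEATURE_GROUPS = {
    "MI between R, G, B channels": [(1, 3), (4, 6)],
    "TE mean/skewness, grayscale image": [(7, 8), (9, 10)],
    "TE mean/skewness, eight subbands": [(11, 26), (27, 42)],
    "MI between subband orientations": [(43, 48), (49, 54)],
    "MI between center frequencies": [(55, 55), (56, 56)],
}


def count(ranges):
    """Number of feature indices covered by a list of inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


# Features at the original scale only (first range of every group).
per_scale = sum(count(ranges[:1]) for ranges in FEATURE_GROUPS.values())
# Features over both scales.
total = sum(count(ranges) for ranges in FEATURE_GROUPS.values())
# Every index from 1 to 56 should appear exactly once.
covered = sorted(
    i for ranges in FEATURE_GROUPS.values() for lo, hi in ranges for i in range(lo, hi + 1)
)
```

The tally gives 3 + 2 + 16 + 6 + 1 = 28 features per scale and 56 in total, in agreement with the text.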

Table 1 Features used for ENIQA

After all the features are extracted, the proposed ENIQA method utilizes a two-stage framework to obtain a score index of the test image. In the first stage, the presence of a set of distortions in the image is estimated via SVC, giving the amount or probability of each type of distortion. In the second stage, for each type of distortion we consider, a support vector machine [52] is trained to perform a regression that maps the features to the objective quality. Finally, the quality score of the image is produced by a weighted summation, where the probabilities from the first stage are multiplied by the corresponding regressed scores from the second stage and then added altogether. The left half of Fig. 1 shows the structure of the two-stage framework.
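The final weighted summation can be illustrated with a few lines of Python. This is a minimal sketch of the combination step only; the function name `two_stage_score` and the numerical values are hypothetical, and the SVC and SVR models that would produce the probabilities and the regressed scores are not shown.

```python
def two_stage_score(probabilities, scores):
    """Probability-weighted sum of the per-distortion quality scores."""
    if len(probabilities) != len(scores):
        raise ValueError("need one regressed score per distortion type")
    return sum(p * q for p, q in zip(probabilities, scores))


# Hypothetical outputs for five distortion types (JP2K, JPEG, WN, GBlur, FF).
# These numbers are made up for illustration and are not results from the paper.
p = [0.1, 0.6, 0.1, 0.1, 0.1]       # stage 1: class probabilities from the SVC
q = [40.0, 30.0, 50.0, 45.0, 42.0]  # stage 2: scores from the per-distortion SVRs
index = two_stage_score(p, q)       # approximately 35.7
```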

2.1 Two-dimensional entropy

Image entropy is a statistical feature that reflects the average information content in an image. The one-dimensional entropy of an image represents the information contained in the aggregated features of the grayscale distribution in the image but does not contribute to the extraction of the structural features. In order to characterize the local structure of the image, TE that describes the spatial correlation of the grayscale values is introduced.

After the color image X is converted to grayscale, the neighborhood mean of the grayscale image is selected as the spatial distribution feature. Let p(x) denote the proportion of pixels whose gray value is x in image X. The one-dimensional entropy of a gray image is then defined as follows:

$$ H_{1}({X}) = -\sum\limits_{x=0}^{255}p(x)\log_{2} p(x) $$

(1)

The gray level of the current pixel and the neighborhood mean then form a feature pair, which is denoted as (x₁,x₂), where x₁ is the gray level of the pixel (0≤x₁≤255) and x₂ is the mean value of the neighbors (0≤x₂≤255). The joint probability distribution of x₁ and x₂ is given by the following:

$$ p(x_{1}, x_{2})=\frac{f(x_{1}, x_{2})}{MN} $$

(2)

where f(x₁,x₂) is the frequency at which the feature pair (x₁,x₂) appears, and the size of X is M×N.

In our implementation, x₂ is based on the eight adjacent neighbors of the center pixel, as shown in Fig. 2. The discrete TE is defined as follows:

$$ H_{2}({X})=-\sum\limits_{x_{1}=0}^{255}\sum\limits_{x_{2}=0}^{255}p(x_{1}, x_{2}) \log_{2} p(x_{1}, x_{2}) $$

(3)
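Equations (1) to (3) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: the image is assumed to be a list of rows of integer gray values, border pixels are handled by edge replication, and the neighborhood mean is rounded to an integer. The last two choices are assumptions that the text does not specify.

```python
import math
from collections import Counter


def entropy_1d(img):
    """Eq. (1): Shannon entropy of the gray-level histogram."""
    pixels = [v for row in img for v in row]
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in Counter(pixels).values())


def entropy_2d(img):
    """Eq. (3): entropy of (gray level, mean of the 8 neighbors) pairs."""
    rows, cols = len(img), len(img[0])
    pairs = Counter()
    for i in range(rows):
        for j in range(cols):
            # Assumption: out-of-range neighbors are replaced by the nearest
            # valid pixel (edge replication).
            neighbors = [
                img[min(max(i + di, 0), rows - 1)][min(max(j + dj, 0), cols - 1)]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            ]
            x2 = round(sum(neighbors) / 8)  # assumption: mean quantized to an integer
            pairs[(img[i][j], x2)] += 1     # f(x1, x2) in Eq. (2)
    n = rows * cols                          # M x N in Eq. (2)
    return -sum(c / n * math.log2(c / n) for c in pairs.values())
```

Because the pair (x₁,x₂) contains x₁, the TE computed this way is never smaller than the one-dimensional entropy of the same image.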

The TE based on the above can describe the comprehensive features of the grayscale information of the pixel and the grayscale distribution in the neighborhood of the pixel. We determined the TE for a reference image (monarch.bmp in the LIVE [50] database) and the five corresponding distorted images with the same distortion level but different distortion types. The statistical characteristics are shown in Fig. 3a and b. All the differential mean opinion score (DMOS) [53] values are around 25 in Fig. 3a and 50 in Fig. 3b, and the distortion types span JPEG and JP2K compression, additive white Gaussian noise (WN), Gaussian blur (GBlur), and fast fading (FF) Rayleigh channel distortion. Similarly, the same experiments were carried out on monarch.bmp and the five corresponding distorted images with the same distortion type but different distortion levels (taking WN and GBlur as examples), whose statistical characteristics are shown in Fig. 3c and d. In Fig. 3, the abscissa axis represents the entropy and the vertical axis represents the normalized number of blocks. It can be seen that both the distortion level and the distortion type can be distinguished by TE. Consequently, the TE can be considered a meaningful feature. Inspired by [26, 36, 54], we utilize the mean and skewness as the most typical features to describe the histogram.

The HVS automatically sets different priorities of attention for different regions of the observed image [47]. Thus, before calculating the statistical characteristics of the TE, we conducted visual saliency detection on the image so that only the more important image patches were involved in the subsequent computation. To realize this, we first split the image into patches, ranked the patches according to human vision priority, and retained the more significant ones. Specifically, we sorted the patches by their saliency values and calculated the mean and skewness of the local TE on only the most salient 80% of the patches. In the experiments, we used the spectral residual (SR) method [55] to generate the saliency map of the image to be measured. It is worth noting that the frequencies of different pixel values (integers from 0 to 255) are counted in every important patch to estimate the probability distributions in Eq. (3).
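The saliency-based pooling can be sketched as follows. The function name `pool_entropy` is hypothetical, the per-patch saliency values are assumed to be given (the SR saliency map itself is not computed here), and the skewness is taken to be the third standardized moment, which is an assumption about the exact estimator.

```python
def pool_entropy(entropies, saliencies, keep=0.8):
    """Mean and skewness of the local TE over the most salient patches."""
    # Sort patch indices from most to least salient.
    order = sorted(range(len(entropies)), key=lambda k: saliencies[k], reverse=True)
    n_keep = max(1, int(round(keep * len(order))))
    kept = [entropies[k] for k in order[:n_keep]]
    n = len(kept)
    mean = sum(kept) / n
    m2 = sum((e - mean) ** 2 for e in kept) / n  # second central moment
    m3 = sum((e - mean) ** 3 for e in kept) / n  # third central moment
    # Assumption: skewness = m3 / m2^(3/2), set to 0 for constant input.
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, skewness


# Five hypothetical patches: the least salient one (entropy 100.0) is dropped.
mean_te, skew_te = pool_entropy([1.0, 2.0, 3.0, 4.0, 100.0], [5, 4, 3, 2, 1])
```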

2.2 Mutual information

The application of colors in image display can not only stimulate the eye, but also allows the observer to perceive more information. The human eye has the ability to distinguish between thousands of colors, in spite of the perception of only dozens of gray levels [56]. There is a strong correlation between the RGB components of an image, which is embodied by the fact that the changes of individual color components reflected in the same region tend to be synchronized, i.e. when the color of a certain area of a natural color image changes, the pixel gray values of the corresponding R, G, and B components also change at the same time. Moreover, although the gray value of a pixel varies with the color channels, different RGB components have a high similarity and consistency in textures, edges, phases, and grayscale gradients [57]. Therefore, it is meaningful to characterize the MI between the three channels of R, G, and B.

Taking R and G as an example, x_r and x_g are the gray values of the red and green components of the input color image X, while p(x_r),p(x_g) are the grayscale probability distribution functions in the two channels. p(x_r,x_g) is the joint probability distribution function. The MI between the R and G channels is then formulated as follows:

$$ \begin{aligned} I({X}_{R}; {X}_{G}) &= H_{1}({X}_{R}) + H_{1}({X}_{G}) - H_{2}({X}_{R}, {X}_{G}) \\ &= \sum\limits_{x_{r}=0}^{255}\sum\limits_{x_{g}=0}^{255}p(x_{r}, x_{g}) \log_{2} \frac{p(x_{r}, x_{g})}{p(x_{r}) p(x_{g})} \end{aligned} $$

(4)

where H₁(X_R) and H₁(X_G) are the one-dimensional entropy of the corresponding channel, and H₂(X_R,X_G) represents the two-dimensional entropy between the two images, which is defined as follows:

$$ H_{2}({X}_{R}, {X}_{G})=-\sum\limits_{x_{r}=0}^{255}\sum\limits_{x_{g}=0}^{255}p(x_{r}, x_{g}) \log_{2} p(x_{r}, x_{g}) $$

(5)
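A direct transcription of Eq. (4) is given below as a minimal sketch. Each channel is assumed to be flattened into a sequence of integer gray values of equal length, and the function name `mutual_information` is illustrative.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Eq. (4): MI between two equally sized sequences of gray values."""
    if len(a) != len(b):
        raise ValueError("both channels must have the same number of pixels")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)  # marginal histograms
    count_ab = Counter(zip(a, b))              # joint histogram
    # p(x, y) / (p(x) p(y)) = c * n / (count_a[x] * count_b[y])
    return sum(
        c / n * math.log2(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
```

For two identical channels, the MI equals the one-dimensional entropy of the channel; for statistically independent channels, it is zero.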

2.3 Log-Gabor filtering

It is known that the log-Gabor filter function conforms to the HVS and is consistent with the symmetry of the cellular response of the human eye at logarithmic frequency scales [58]. The log-Gabor filter eliminates the DC component, overcomes the bandwidth limitation of the conventional Gabor filter, and has a typical frequency response with a Gaussian shape [48]. Thus, it is much easier, as well as more efficient, for a log-Gabor filter to extract information on a higher band. The transfer function of a two-dimensional log-Gabor filter can be expressed as follows:

$$ G(f, \theta) = \exp\left(-\frac{(\log(f/f_{0}))^{2}}{2(\log(\sigma_{r}/f_{0}))^{2}} \right) \exp\left(-\frac{(\theta - \theta_{0})^{2}}{2\sigma_{\theta}^{2}} \right) $$

(6)

In Eq. (6), f₀ gives the center frequency and θ₀ represents the center orientation. σ_r and σ_θ are the width parameters for the frequency and the orientation, respectively.

We distill the features in the frequency domain by convolving the image with the log-Gabor filter. The log-Gabor filter bank designed in this study consists of eight filters, with orientations of 0°, 45°, 90°, and 135°, and two frequency bands. Eight subband images in four orientations and two bands are obtained after the input image is filtered.
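The transfer function in Eq. (6) and the layout of the eight-filter bank can be sketched as follows. The center frequencies and width parameters below are hypothetical placeholders, since this section does not state the values that were used. Wrapping the angular difference into (−π, π] is also an assumption added here so that orientations are compared on the circle.

```python
import math


def log_gabor(f, theta, f0, theta0, sigma_r, sigma_theta):
    """Eq. (6): gain of a 2-D log-Gabor filter at polar frequency (f, theta)."""
    if f <= 0:
        return 0.0  # the DC component is eliminated
    radial = math.exp(-math.log(f / f0) ** 2 / (2 * math.log(sigma_r / f0) ** 2))
    # Assumption: wrap the angular difference into (-pi, pi].
    d = math.atan2(math.sin(theta - theta0), math.cos(theta - theta0))
    angular = math.exp(-d ** 2 / (2 * sigma_theta ** 2))
    return radial * angular


# Four orientations and two center frequencies give eight filters.
ORIENTATIONS = [math.radians(deg) for deg in (0, 45, 90, 135)]
CENTER_FREQS = [0.1, 0.2]  # hypothetical values, in cycles per pixel
bank = [(f0, theta0) for f0 in CENTER_FREQS for theta0 in ORIENTATIONS]

# Hypothetical widths: sigma_r = 0.55 * f0 and sigma_theta = 0.4 rad.
peak = log_gabor(0.1, 0.0, f0=0.1, theta0=0.0, sigma_r=0.055, sigma_theta=0.4)
```

The gain is 1 at (f₀, θ₀) and decays as the frequency or the orientation moves away from the center. In practice, the filter would be evaluated on the frequency grid of the image and applied by multiplication in the Fourier domain, which is equivalent to the convolution described above.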

3 Results and discussion

In order to assess the performance of the proposed method, we carried out experiments on the LIVE [50] and TID2013 [51] databases. The LIVE database consists of 29 reference images and 779 distorted images of five distortion types, while the TID2013 database contains 25 reference images and 3000 distorted images of 24 distortion types. Of these 25 reference images, only 24 are natural images, so we used only the 24 natural images in the testing. In order to ensure the consistency of the training and testing, we carried out the cross-database testing on the four distortion types that TID2013 has in common with the LIVE database, namely, JP2K, JPEG, WN, and GBlur.

The indices used to measure the performance of the proposed method are the Spearman’s rank-order correlation coefficient (SROCC), the Pearson linear correlation coefficient (PLCC), and the root-mean-square error (RMSE) between the predicted scores and the ground-truth DMOS [59]. A value close to 1 for SROCC and PLCC and a value close to 0 for RMSE indicates better correlation with human perception. Note that PLCC and RMSE were computed after the predicted scores were fitted by a nonlinear logistic regression function with five parameters [53]:

$$ f(z)=\beta_{1}\left[\frac{1}{2}-\frac{1}{1+\exp(\beta_{2}(z-\beta_{3}))} \right]+\beta_{4}z+\beta_{5} $$

(7)

where z is the objective IQA score, f(z) is the IQA regression fitting score, and β_i(i=1,2,⋯,5) are the parameters of the regression function.
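The mapping in Eq. (7) and the three performance indices can be written compactly in plain Python. This is an illustrative sketch only: the function names are hypothetical, the fitting of the β parameters (typically done by nonlinear least squares) is not shown, and ties in the rank computation are resolved by average ranks, which is an assumption.

```python
import math


def logistic5(z, b1, b2, b3, b4, b5):
    """Eq. (7): five-parameter logistic mapping of an objective score z."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (z - b3)))) + b4 * z + b5


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(_ranks(x), _ranks(y))


def rmse(x, y):
    """Root-mean-square error between two score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

SROCC depends only on the ordering of the scores, so it is unaffected by any monotonic mapping, whereas PLCC and RMSE are sensitive to nonlinearity. This is why the logistic fit is applied before computing the latter two.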

3.1 Correlation of feature vectors with human opinion

In this experiment, we assessed the discriminatory power of different feature combinations. With the feature groups listed in Table 1, we visually illustrate the relationship between image quality and features in the form of two-dimensional and three-dimensional scatter plots. As shown in Fig. 4, the different feature combinations are used as the axes, and each image in the LIVE database corresponds to a scatter point in the coordinate system. Furthermore, we use different markers to distinguish the five types of distortion and map the score of each image to the preset colormap. The ideal case is that the points with different distortion types are well separated. In this paper, we selected only a few representative images as examples. It can be seen from Fig. 4a and b that the scatter points of JPEG and WN have a very different spatial distribution than the other points, which allows them to be better distinguished. From Fig. 4c and d, we can see that GBlur can be distinguished, to some extent, from the other types of distortion. However, GBlur points with lower distortion levels cannot be easily separated from FF and JP2K, since the distributions of the scatter points of these three distortion types are very similar. As can be observed in Fig. 4e and f, images with higher distortion levels of WN, GBlur, and FF are more easily distinguished from images with good quality. Nonetheless, GBlur and FF are indistinguishable. Moreover, JP2K points reduce the distinguishability, as some of them are scattered close to the highly distorted GBlur and FF points. According to Fig. 4, the number of features we selected seems too small to distinguish all the distortion types. Due to the limitation of human spatial cognition, it is not possible to show the discriminative ability of higher-dimensional feature combinations graphically, for example in a four-dimensional scatter plot. In Section 3.6, we show that if more features are selected (we actually use all 56 features), the discriminatory power of the feature vector on the distortion type is further enhanced, which supports the accuracy and reliability of our selection of features.

3.2 Correlation of individual feature vectors with human perception

In order to quantitatively study the predictive ability of each feature vector [60, 61], we performed a recombination of the features in Table 1, separately deployed specific subsets (feature vectors), and designed three limited models: (1) The feature vector f₁– f₆ represents the MI between the three color channels on two scales, denoted as ENIQA₁. (2) The feature vector f₇– f₄₂ represents the mean and skewness of the TE on two scales, denoted as ENIQA₂. (3) The feature vector f₄₃– f₅₆ represents the MI between the subband images on two scales, denoted as ENIQA₃.

We performed the assessment of these three limited models by 1000 train-test iterations of cross-validation. In each iteration, we randomly split the LIVE [50] database into two non-overlapping sets: a training set comprising 80% of the reference images as well as their corresponding distorted counterparts, and a test set composed of the remaining 20%. Finally, the median SROCC, PLCC, and RMSE values over 1000 trials are reported as the final performance indices, as shown in Table 2. It is not difficult to see that each feature vector has a different degree of correlation with the subjective assessment. Among them, the TE contributes the most to the performance of the method, followed by the MI between the subband images. Although the MI between the color channels contributes the least, it is a valuable extension of the TE feature.
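The content-separated split can be sketched as follows. The split is made over the reference images, so that every distorted version of a reference falls on the same side. The function name, the seeding scheme, and the toy data are assumptions made for illustration; training and scoring of the model are omitted.

```python
import random


def split_by_reference(ref_ids, train_fraction=0.8, seed=None):
    """Split image indices so that no reference image spans both sets.

    ref_ids[k] is the identifier of the reference image from which
    distorted image k was generated.
    """
    rng = random.Random(seed)
    refs = sorted(set(ref_ids))
    rng.shuffle(refs)
    n_train = round(train_fraction * len(refs))
    train_refs = set(refs[:n_train])
    train = [k for k, r in enumerate(ref_ids) if r in train_refs]
    test = [k for k, r in enumerate(ref_ids) if r not in train_refs]
    return train, test


# Toy data: 29 hypothetical references with five distorted versions each.
ref_ids = [k % 29 for k in range(145)]
train_idx, test_idx = split_by_reference(ref_ids, seed=0)
```

Repeating this split with a different seed for each of the 1000 trials, and taking the median of the resulting SROCC, PLCC, and RMSE values (for example with `statistics.median`), reproduces the reporting protocol described above.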

Table 2 Median SROCC/PLCC/RMSE values across 1000 train-test trials on the LIVE database

3.3 Variation with window size

As mentioned above, since the local saliency difference of the image is considered, the proposed ENIQA method blocks the image with a window and counts the frequency of the gray values in each block to generate feature pairs before calculating the local TE. Table 3 shows the effect of different window sizes (K×L) on the performance of the proposed method, where the highest SROCC value of each column is italicized. The average time consumption for assessing a single image is also reported in Table 3. All the experiments were performed on a PC with an Intel i7-6700K CPU @ 4.0 GHz, 16 GB of RAM, and MATLAB R2016a. The elapsed time is the mean over 10 evaluations of the same 384×512×3 image.

Table 3 Median SROCC value of ENIQA on the LIVE database with different window sizes

In order to visualize the trend, we also drew two line charts in Fig. 5, which intuitively illustrate the change of the elapsed time and the overall SROCC value with the selected window size.

It can be observed that the performance of the proposed method varies with the size of the window. As the window size increases, the SROCC value first increases and then decreases, reaching a peak at 8×8. At the same time, the runtime of the method mostly decreases monotonically with the increase of the window size. As a compromise, we used K=L=8 in this study. It should be pointed out that the overall SROCC value remains above 0.9 when the window size is 16×16, which implies that the window size can be appropriately increased to trade accuracy for speed in time-critical applications.
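The effect of the window size on the number of local entropy computations can be seen from a simple blocking routine. The sketch below assumes non-overlapping K×L windows and discards incomplete windows at the border. Both choices are assumptions, as the text does not state how the windows are placed.

```python
def split_into_patches(img, k=8, l=8):
    """Split a 2-D image (list of rows) into non-overlapping k x l patches."""
    rows, cols = len(img), len(img[0])
    return [
        [row[c:c + l] for row in img[r:r + k]]
        for r in range(0, rows - k + 1, k)   # incomplete border rows are dropped
        for c in range(0, cols - l + 1, l)   # incomplete border columns are dropped
    ]


# A blank 384 x 512 image, the size used for the timing measurements.
image = [[0] * 512 for _ in range(384)]
patches_8 = split_into_patches(image, 8, 8)      # 48 * 64 = 3072 patches
patches_16 = split_into_patches(image, 16, 16)   # 24 * 32 = 768 patches
```

Under these assumptions, doubling the window side reduces the number of patches by a factor of four, which is consistent with the decreasing runtime reported for larger windows.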

3.4 Comparison with other IQA methods

To further illustrate the superiority of the proposed method, we compared ENIQA with 10 other state-of-the-art IQA methods. The three FR-IQA approaches were the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) [12], and visual information fidelity (VIF) [62], and the seven NR-IQA approaches were BIQI [33], DIIVINE [37], BLIINDS-II [40], BRISQUE [34], NIQE [39], IL-NIQE [35], and SSEQ [36]. To make a fair comparison, we used the same 80% training/20% testing protocol over 1000 iterations on all the models. The source code of all the methods was provided by the authors. In the training of an NR model, the LIBSVM toolkit [63] was used to implement the SVC and SVR, both adopting a radial-basis function (RBF) kernel. We selected an ε-SVR model for regression, and both the cost and γ for the RBF kernel were set to 1e−4. Since the FR approaches do not require a training procedure, they were only performed on distorted images, i.e., the reference images were not included. For the results listed in Table 4, the top performances of the FR-IQA and NR-IQA indices are italicized. The second-best results of the NR-IQA indices are highlighted in bold.

Table 4 Median SROCC/PLCC/RMSE values on the LIVE database

It can be seen that the proposed ENIQA method performs well on the LIVE database. To be specific, ENIQA obtains the highest SROCC values for JPEG and WN, and the second-highest overall SROCC value among the NR methods listed in Table 4. In terms of PLCC and RMSE, ENIQA is superior to all the other NR methods, except BRISQUE, on JPEG and WN, and also ranks second in overall performance. Generally speaking, the overall performance of the proposed ENIQA method is superior to most of the other NR methods and is ahead of some of the classic FR methods such as SSIM. Moreover, ENIQA is rather good at evaluating images with distortions of JPEG and WN.

3.5 Statistical significance testing

In order to compare the performance of the different methods in a more intuitive way, Fig. 6 shows a box plot of the SROCC distributions for the 11 IQA methods (including the proposed ENIQA method) across 1000 train-test trials, which provides key information about the location and dispersion of the data. We also performed a two-sample t test [64] between the methods, and the results are shown in Table 5. The null hypothesis is that the mean correlation value of the row is equal to the mean correlation value of the column at the 95% confidence level. The alternative hypothesis is that the mean correlation value of the row is greater (or less) than the mean correlation value of the column. Table 5 indicates which row is statistically superior (“1”), statistically equivalent (“0”), or statistically inferior (“−1”) to which column. Although BRISQUE and SSEQ are statistically superior to ENIQA in Table 5, it can be seen from Fig. 6 that ENIQA outperforms all the other FR and NR approaches, except BRISQUE, in terms of the median value.
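The 1/0/−1 coding of Table 5 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the paper: the statistic is computed in its unequal-variance (Welch) form, and the p value uses a normal approximation to the t distribution, which is reasonable only for large samples such as 1000 trials per method. The function name `compare_methods` and the toy data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def compare_methods(row_scores, col_scores, alpha=0.05):
    """Return 1, 0, or -1 if the row method is better, equivalent, or worse."""
    standard_error = math.sqrt(
        variance(row_scores) / len(row_scores) + variance(col_scores) / len(col_scores)
    )
    t = (mean(row_scores) - mean(col_scores)) / standard_error
    # Assumption: two-sided p value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    if p_value >= alpha:
        return 0  # the null hypothesis of equal means is not rejected
    return 1 if t > 0 else -1


# Toy SROCC samples for two hypothetical methods (not data from the paper).
method_a = [0.940 + 0.001 * (k % 5) for k in range(100)]
method_b = [0.900 + 0.001 * (k % 5) for k in range(100)]
code = compare_methods(method_a, method_b)
```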

Table 5 Results of the t-tests performed between SROCC values

3.6 Classification performance analysis

We analyzed the classification accuracy of ENIQA on the LIVE database based on the two-stage framework. The average classification accuracies for all the distortion types across 1000 random trials are listed in Table 6. It can be seen that when the feature dimension reaches 56, the classification accuracy for JP2K reaches 71.6369%, which is fairly acceptable. In Section 3.1, however, we showed that it is extremely difficult to distinguish JP2K images by low-dimensional feature vectors. Thus, we can speculate that in the 56-dimensional space composed of the features, the distorted images of the JP2K type are discernible by the hyperplane constructed by the SVC. Furthermore, in order to visualize which distortion types may be confused with each other, we plotted a confusion matrix [65], as shown in Fig. 7. Each value in the confusion matrix indicates the probability of the distortion type on the vertical axis being confused with that on the horizontal axis. The numerical values are averages over the 1000 random trials.

Table 6 Mean classification accuracy across 1000 train-test trials

It can be seen from Table 6 and Fig. 7 that WN is not easily confused with the other distortion types, while the other four distortion types are more easily confused. As FF consists of JP2K followed by packet loss, it is understandable that FF distortion is more easily confused with JP2K compression distortion. From Fig. 3, we can also see that the TE distributions of WN and JPEG are very distinctive, while JP2K, GBlur, and FF have quite similar TE distributions, which results in them being more easily confused.
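The row-normalized confusion matrix described above can be built as in the following Python sketch. The labels follow the five LIVE distortion types named in the text, whereas the function name and the toy predictions are hypothetical and serve only to show the normalization.

```python
from collections import Counter

LABELS = ["JP2K", "JPEG", "WN", "GBlur", "FF"]

def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Return a matrix whose entry [i][j] is the fraction of samples of
    true type labels[i] that were classified as labels[j]."""
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = []
    for true_type in labels:
        row_total = sum(counts[(true_type, p)] for p in labels)
        # Rows without samples are left as zeros to avoid division by zero.
        matrix.append([
            counts[(true_type, p)] / row_total if row_total else 0.0
            for p in labels
        ])
    return matrix

# Toy classification results (not taken from the experiments).
true_labels = ["JP2K", "JP2K", "JP2K", "JP2K", "FF", "FF", "WN", "WN"]
predicted_labels = ["JP2K", "JP2K", "JP2K", "FF", "JP2K", "FF", "WN", "WN"]
for label, row in zip(LABELS, confusion_matrix(true_labels, predicted_labels)):
    print(label, row)
```

Averaging such matrices over repeated train-test trials would give values of the kind plotted in Fig. 7, with the diagonal holding the per-type classification accuracies.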

3.7 Database independence

In order to test the generalization ability of the assessment model to different samples, we trained the model on the whole LIVE (or TID2013) database and tested it on the TID2013 (or LIVE) database. Only the distortion types that the two databases have in common (JP2K, JPEG, WN, and GBlur) were used. The computed performance indices are shown in Table 7, where the top performances for the FR-IQA indices and for the NR-IQA indices are highlighted in italics. For the NR-IQA indices, we have also boldfaced the second-best results. On the one hand, ENIQA achieves the best performance indices when trained on LIVE and tested on TID2013. On the other hand, when trained on TID2013 and tested on LIVE, ENIQA ranks in the top three, though not first, with an SROCC higher than 0.9. It is worth noting that ENIQA outperforms BRISQUE in both cross-database experiments, which reflects the good generalization ability of ENIQA.

Table 7 Performance indices obtained by training on the LIVE or TID2013 database and testing on the TID2013 or LIVE database

Figure 8 shows the results of the scatter plot fitting of ENIQA on the LIVE and TID2013 databases. As in the previous experiments, when performing the scatter plot experiment on the LIVE database, we trained with a random 80% of the images, separated by content, in the LIVE database and then tested with the remaining 20%, for which the results are shown in Fig. 8a. When conducting the experiment on the TID2013 database, we trained the model on the entire LIVE database and then tested it on the selected portion of the TID2013 database, for which the results are given in Fig. 8b. It can be observed from Fig. 8 that the scatter points are evenly distributed over the entire coordinate system and have a strong linear relationship with DMOS/MOS, which further supports the superior overall performance and generalization ability of the proposed ENIQA method.

3.8 Runtime analysis

Table 8 shows the average running time of the 11 IQA methods, measured over 10 assessments of the same 384×512×3 image. The MATLAB source code of every IQA method, apart from PSNR, is the official implementation from the original authors. For ENIQA, the window size is set to 8×8. It can be seen that ENIQA has a moderate computational cost in addition to its superior performance.
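The measurement procedure can be sketched in Python as follows. The timings in Table 8 were obtained with MATLAB implementations, so this is only an illustration of averaging over 10 runs on one image. The function `dummy_assess` is a hypothetical stand-in for an IQA method, and the uniform gray image is synthetic.

```python
import time

def average_runtime(assess, image, repeats=10):
    """Average wall-clock time in seconds of assess(image) over repeats runs."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        assess(image)
        total += time.perf_counter() - start
    return total / repeats

def dummy_assess(image):
    """Hypothetical stand-in for an IQA method: mean intensity of the image."""
    height, width = len(image), len(image[0])
    return sum(sum(sum(pixel) for pixel in row) for row in image) / (height * width * 3)

# Synthetic 384x512x3 image (384 rows, 512 columns, 3 channels).
image = [[(128, 128, 128)] * 512 for _ in range(384)]
seconds = average_runtime(dummy_assess, image)
print(f"average running time: {seconds:.4f} s")
```

Using the same image for every run keeps the comparison between methods independent of image size and content.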

Table 8 The average running time of the 11 IQA methods

4 Conclusions

In this paper, we proposed a general-purpose NR-IQA method called entropy-based no-reference image quality assessment (ENIQA). Based on the concept of image entropy, ENIQA combines log-Gabor filtering and saliency detection for feature extension and accuracy improvement. To construct an effective feature vector, ENIQA extracts the structural information of the input color images, including the MI and the TE in both the spatial and the frequency domains. The image quality score is then predicted by the SVC and SVR. The proposed ENIQA method was assessed on the LIVE and TID2013 databases, and we carried out cross-validation experiments and cross-database experiments to compare it with several other FR- and NR-IQA approaches. From the experiments, ENIQA showed a superior overall performance and generalization ability when compared to the other state-of-the-art methods.

Abbreviations

AGGD:: Asymmetric generalized Gaussian distribution
BIQA:: Blind image quality assessment
CPBD:: Cumulative probability of blur detection
DMOS:: Differential mean opinion score
ENIQA:: Entropy-based no-reference image quality assessment
FF:: Fast fading
FM:: Frequency mapping
FR:: Full reference
GBlur:: Gaussian blur
GGD:: Generalized Gaussian distribution
HVS:: Human visual system
IQA:: Image quality assessment
JNB:: Just noticeable blur
JP2K:: JPEG2000
MI:: Mutual information
MLV:: Maximum local variation
MOS:: Mean opinion score
MSCN:: Mean-subtracted contrast-normalized
NR:: No-reference
NSS:: Natural scene statistics
PLCC:: Pearson linear correlation coefficient
PSNR:: Peak signal-to-noise ratio
RBF:: Radial-basis function
RMSE:: Root-mean-square error
RR:: Reduced-reference
SR:: Spectral residual
SROCC:: Spearman’s rank-order correlation coefficient
SSIM:: Structural similarity index
SVC:: Support vector classifier
SVR:: Support vector regression
TE:: Two-dimensional entropy
VIF:: Visual information fidelity
VSBAM:: Visually significant blocking artifact metric
WD:: Weibull distribution
WN:: Additive white Gaussian noise

References

1. P. Mohammadi, A. Ebrahimimoghadam, S. Shirani, Subjective and objective quality assessment of image: A survey. Majlesi J. Electr. Eng. 9(1), 55–83 (2014).
2. Y. Fang, K. Zeng, Z. Wang, W. Lin, Z. Fang, C.-W. Lin, Objective quality assessment for image retargeting based on structural similarity. IEEE J. Emerg. Sel. Top. Circ. Syst. 4(1), 95–105 (2014).
3. X. Zhang, J. Li, H. Wang, D. Xiong, J. Qu, H. Shin, J. P. Kim, T. Zhang, Realizing transparent OS/apps compression in mobile devices at zero latency overhead. IEEE Trans. Comput. 66(7), 1188–1199 (2017).
4. K. Gu, D. Tao, J.-F. Qiao, W. Lin, Learning a no-reference quality assessment model of enhanced images with big data. IEEE Trans. Neural Netw. Learn. Syst. 29(4), 1301–1313 (2018).
5. H. Fronthaler, K. Kollreider, J. Bigun, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Automatic image quality assessment with application in biometrics (IEEE, New York, 2006), pp. 30–37.
6. C. Yan, H. Xie, J. Chen, Z. Zha, X. Hao, Y. Zhang, Q. Dai, A fast Uyghur text detector for complex background images. IEEE Trans. Multimed. 20(12), 3389–3398 (2018).
7. Q. Li, Z. Wang, Reduced-reference image quality assessment using divisive normalization-based image representation. IEEE J. Sel. Top. Signal Process. 3(2), 202–211 (2009).
8. C. Yan, L. Li, C. Zhang, B. Liu, Y. Zhang, Q. Dai, Cross-modality bridging and knowledge transferring for image understanding. IEEE Trans. Multimed., 1–10 (2019). https://doi.org/10.1109/TMM.2019.2903448.
9. C. Yan, Y. Tu, X. Wang, Y. Zhang, X. Hao, Q. Dai, STAT: Spatial-temporal attention mechanism for video captioning. IEEE Trans. Multimed. (2019). https://doi.org/10.1109/TMM.2019.2924576.
10. S. S. Hemami, A. R. Reibman, No-reference image and video quality estimation: Applications and human-motivated design. Signal Process. Image Commun. 25(7), 469–481 (2010).
11. D. M. Chandler, Seven challenges in image quality assessment: Past, present, and future research. ISRN Signal Process. 2013, 1–53 (2013).
12. Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004).
13. W. Lin, C.-C. J. Kuo, Perceptual visual quality metrics: A survey. J. Visual Commun. Image Represent. 22(4), 297–312 (2011).
14. A. K. Moorthy, A. C. Bovik, Visual quality assessment algorithms: What does the future hold? Multimed. Tools Appl. 51(2), 675–696 (2011).
15. A. C. Bovik, Automatic prediction of perceptual image and video quality. Proc. IEEE 101(9), 2008–2024 (2013).
16. W. Xue, L. Zhang, X. Mou, A. C. Bovik, Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Trans. Image Process. 23(2), 684–695 (2014).
17. Z. Wang, A. C. Bovik, Reduced- and no-reference image quality assessment. IEEE Signal Process. Mag. 28(6), 29–40 (2011).
18. S. Xu, S. Jiang, W. Min, No-reference/blind image quality assessment: A survey. IETE Tech. Rev. 34(3), 223–245 (2017).
19. V. Kamble, K. Bhurchandi, No-reference image quality assessment algorithms: A survey. Optik-Int. J. Light Electron Optics 126(11-12), 1090–1097 (2015).
20. S. Suthaharan, No-reference visually significant blocking artifact metric for natural scene images. Signal Process. 89(8), 1647–1652 (2009).
21. A. Ciancio, A. L. N. T. da Costa, E. A. da Silva, A. Said, R. Samadani, P. Obrador, No-reference blur assessment of digital pictures based on multifeature classifiers. IEEE Trans. Image Process. 20(1), 64–75 (2011).
22. K. Bahrami, A. C. Kot, A fast approach for no-reference image sharpness assessment based on maximum local variation. IEEE Signal Process. Lett. 21(6), 751–755 (2014).
23. R. Ferzli, L. J. Karam, A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB). IEEE Trans. Image Process. 18(4), 717–728 (2009).
24. N. D. Narvekar, L. J. Karam, A no-reference image blur metric based on the cumulative probability of blur detection (CPBD). IEEE Trans. Image Process. 20(9), 2678–2683 (2011).
25. Y. Fang, K. Ma, Z. Wang, W. Lin, Z. Fang, G. Zhai, No-reference quality assessment of contrast-distorted images based on natural scene statistics. IEEE Signal Process. Lett. 22(7), 838–842 (2015).
26. D. L. Ruderman, The statistics of natural images. Network: Comput. Neural Syst. 5(4), 517–548 (1994).
27. H. Z. Nafchi, M. Cheriet, Efficient no-reference quality assessment and classification model for contrast distorted images. IEEE Trans. Broadcast. 64(2), 518–523 (2018).
28. G. Yang, Y. Liao, Q. Zhang, D. Li, W. Yang, No-reference quality assessment of noise-distorted images based on frequency mapping. IEEE Access 5, 23146–23156 (2017).
29. K. Gu, W. Lin, G. Zhai, X. Yang, W. Zhang, C. W. Chen, No-reference quality metric of contrast-distorted images based on information maximization. IEEE Trans. Cybernet. 47(12), 4559–4565 (2017).
30. J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, A. C. Bovik, Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment. IEEE Signal Process. Mag. 34(6), 130–141 (2017).
31. J. Guan, S. Yi, X. Zeng, W. Cham, X. Wang, Visual importance and distortion guided deep image quality assessment framework. IEEE Trans. Multimed. 19(11), 2505–2520 (2017).
32. K. Gu, G. Zhai, X. Yang, W. Zhang, Using free energy principle for blind image quality assessment. IEEE Trans. Multimed. 17(1), 50–63 (2015).
33. A. K. Moorthy, A. C. Bovik, A two-step framework for constructing blind image quality indices. IEEE Signal Process. Lett. 17(5), 513–516 (2010).
34. A. Mittal, A. K. Moorthy, A. C. Bovik, No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 21(12), 4695–4708 (2012).
35. L. Zhang, L. Zhang, A. C. Bovik, A feature-enriched completely blind image quality evaluator. IEEE Trans. Image Process. 24(8), 2579–2591 (2015).
36. L. Liu, B. Liu, H. Huang, A. C. Bovik, No-reference image quality assessment based on spatial and spectral entropies. Signal Process. Image Commun. 29(8), 856–863 (2014).
37. A. K. Moorthy, A. C. Bovik, Blind image quality assessment: From natural scene statistics to perceptual quality. IEEE Trans. Image Process. 20(12), 3350–3364 (2011).
38. Y. Zhang, D. M. Chandler, No-reference image quality assessment based on log-derivative statistics of natural scenes. J. Electron. Imaging 22(4), 043025 (2013).
39. A. Mittal, R. Soundararajan, A. C. Bovik, Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 20(3), 209–212 (2013).
40. M. A. Saad, A. C. Bovik, C. Charrier, Blind image quality assessment: A natural scene statistics approach in the DCT domain. IEEE Trans. Image Process. 21(8), 3339–3352 (2012).
41. I. Motoyoshi, S. Nishida, L. Sharan, E. H. Adelson, Image statistics and the perception of surface qualities. Nature 447(7141), 206 (2007).
42. S. Gabarda, G. Cristóbal, Blind image quality assessment through anisotropy. J. Opt. Soc. Am. A 24(12), 42–51 (2007).
43. L. Dong, Y. Fang, W. Lin, C. Deng, C. Zhu, H. S. Seah, Exploiting entropy masking in perceptual graphic rendering. Signal Process. Image Commun. 33, 1–13 (2015).
44. A. S. Abutaleb, Automatic thresholding of gray-level pictures using two-dimensional entropy. Comput. Vis. Graph. Image Process. 47(1), 22–32 (1989).
45. A. Brink, Using spatial information as an aid to maximum entropy image threshold selection. Patt. Recog. Lett. 17(1), 29–36 (1996).
46. F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, P. Suetens, Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 16(2), 187–198 (1997).
47. W. Zhang, A. Borji, Z. Wang, P. Le Callet, H. Liu, The application of visual saliency models in objective image quality assessment: A statistical evaluation. IEEE Trans. Neural Netw. Learn. Syst. 27(6), 1266–1278 (2016).
48. D. J. Field, Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4(12), 2379–2394 (1987).
49. L. Zhang, L. Zhang, X. Mou, D. Zhang, et al., FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 20(8), 2378–2386 (2011).
50. H. R. Sheikh, Z. Wang, L. Cormack, A. C. Bovik, LIVE image quality assessment database release 2 (2005). http://live.ece.utexas.edu/research/quality.
51. N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, et al., Image database TID2013: Peculiarities, results and perspectives. Signal Process. Image Commun. 30, 57–77 (2015).
52. C. J. Burges, A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998).
53. H. R. Sheikh, M. F. Sabir, A. C. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 15(11), 3440–3451 (2006).
54. L. Liu, Y. Hua, Q. Zhao, H. Huang, A. C. Bovik, Blind image quality assessment by relative gradient statistics and adaboosting neural network. Signal Process. Image Commun. 40, 1–15 (2016).
55. X. Hou, L. Zhang, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Saliency detection: A spectral residual approach (IEEE, Minneapolis, 2007), pp. 1–8.
56. A. Rose, Quantum effects in human vision. Adv. Biol. Med. Phys. 5, 211–242 (1957).
57. J. Ren, J. Jiang, D. Wang, S. Ipson, Fusion of intensity and inter-component chromatic difference for effective and robust colour edge detection. IET Image Process. 4(4), 294–301 (2010).
58. D. Boukerroui, J. A. Noble, M. Brady, On the choice of band-pass quadrature filters. J. Math. Imaging Vis. 21(1-2), 53–80 (2004).
59. P. Corriveau, A. Webster, Final report from the video quality experts group on the validation of objective models of video quality assessment, phase II. Tech. Rep. (2003). https://www.itu.int/md/T01-SG09-C-0060.
60. G. Zhao, S. Liu, Estimation of discriminative feature subset using community modularity. Sci. Rep. 6(25040), 1–16 (2016).
61. H. Tao, C. Hou, F. Nie, Y. Jiao, D. Yi, Effective discriminative feature selection with nontrivial solution. IEEE Trans. Neural Netw. Learn. Syst. 27(4), 796–808 (2015).
62. H. R. Sheikh, A. C. Bovik, Image information and visual quality. IEEE Trans. Image Process. 15(2), 430–444 (2006).
63. C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011).
64. D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, 3rd edn. (CRC Press, London, U.K., 2003).
65. R. G. Congalton, A review of assessing the accuracy of classifications of remotely sensed data. Remote. Sens. Environ. 37(1), 35–46 (1991).

Acknowledgements

The authors would like to thank Prof. Wen Yang and Prof. Hongyan Zhang for the valuable opinions they have offered during our heated discussions.

Funding

This study is partially supported by the National Natural Science Foundation of China (NSFC) (No. 61571334, 61671333), National High Technology Research and Development Program (863 Program) (No. 2014AA09A512), and National Key Research and Development Program of China (No. 2018YFB0504500).

Author information

Authors and Affiliations

School of Electronic Information, Wuhan University, Wuhan, 430072, China
Xiaoqiao Chen, Qingyi Zhang, Manhui Lin, Guangyi Yang & Chu He
Collaborative Innovation Center of Geospatial Technology, Wuhan University, Wuhan, 430079, China
Chu He

Contributions

XC conducted the experiments and drafted the manuscript. QZ and ML implemented the core method and performed the statistical analysis. GY designed the methodology. CH modified the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guangyi Yang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Chen, X., Zhang, Q., Lin, M. et al. No-reference color image quality assessment: from entropy to perceptual quality. J Image Video Proc. 2019, 77 (2019). https://doi.org/10.1186/s13640-019-0479-7

Received: 29 December 2018
Accepted: 21 August 2019
Published: 06 September 2019
DOI: https://doi.org/10.1186/s13640-019-0479-7
