No-reference color image quality assessment: from entropy to perceptual quality

Abstract

This paper presents a high-performance general-purpose no-reference (NR) image quality assessment (IQA) method based on image entropy. The image features are extracted from two domains. In the spatial domain, the mutual information between different color channels and the two-dimensional entropy are calculated. In the frequency domain, the statistical characteristics of the two-dimensional entropy and the mutual information of the filtered subband images are computed as the feature set of the input color image. Then, with all the extracted features, a support vector classifier (SVC) for distortion classification and support vector regression (SVR) for quality prediction are used to obtain the final quality assessment score. The proposed method, which we call entropy-based no-reference image quality assessment (ENIQA), can assess the quality of different categories of distorted images and has a low complexity. The proposed ENIQA method was assessed on the LIVE and TID2013 databases and showed a superior performance. The experimental results confirmed that the proposed ENIQA method has a high consistency of objective and subjective assessment on color images, which indicates the good overall performance and generalization ability of ENIQA. The implementation is available on GitHub at https://github.com/jacob6/ENIQA.

Introduction

In this era of information explosion, we are surrounded by an overwhelming amount of information. The diversification of information is dazzling, and images, as the source of visual information, contain a wealth of valuable information. Considering the incomparable advantages of image information over other types of information, it is important to process images appropriately in the different fields [1]. In image acquisition, processing, transmission, and recording, image distortion and quality degradation are an inevitable result of the imperfection of the imaging system, the processing method, the transmission medium, and the recording equipment, as well as object movement and noise pollution [2–4]. There is a direct effect of image quality on people’s subjective feelings and perception of information. For example, the quality of the collected images directly affects the accuracy and reliability of the recognition results in an image recognition process [5, 6]. Another example is that remote conferencing and video-on-demand systems are affected by such factors as transmission errors, network latency, and so on [7–9]. Online real-time image quality control is thus introduced to ensure that the service provider dynamically adjusts the source location strategy, in order to meet the service quality requirements [10]. It is therefore not surprising that research into image quality assessment (IQA) has received extensive attention during the last two decades [11].

In accordance with the need for human participation, IQA methods can be divided into two classes: subjective image quality assessment methods and objective image quality assessment methods [12]. Subjective assessment is quantified by the human eye. In contrast, an objective IQA method focuses on automatic assessment of the images via a specific automated, computer-assisted method, with the ultimate goal of enabling a computer to model the image processing properties of the human visual system (HVS) in viewing and perceiving images [13]. In practice, subjective assessment results are difficult to apply in real-time imaging systems due to their strong randomness. Therefore, objective IQA methods have been widely studied [14]. According to the availability of a reference image, objective IQA methods can be classified as full-reference (FR), reduced-reference (RR), and no-reference (NR) methods [15]. In an FR method, an original "distortion-free" image is assumed to be supplied, and the assessment result is obtained through the comparison of the two images. With the advances of recent studies, the accuracy of this kind of method is getting better, despite its disadvantage of requiring a complete reference image, which is often not available in practical applications [16]. An RR method, which is also known as a partial reference method, does not make a complete comparison between the distorted image and the pristine one, but only compares certain features [17]. Conversely, an NR method, which is also called a blind image quality assessment (BIQA) method, requires no image as reference. Instead, the quality is estimated according to the features of the distorted image [15]. In many practical applications, a reference image will be inaccessible, and thus the NR-IQA methods have the most practical value and a very wide application potential [18].

In general, the current NR-IQA methods can be divided into two categories: application-specific and general-purpose assessment [19]. The former kind of method assesses the image quality of a specific distortion type and calculates the corresponding score. Common types of distortion include JPEG, JPEG2000 compression (JP2K), blur, contrast distortion, and noise. For images with compression degradation, Suthaharan et al. [20] proposed the visually significant blocking artifact metric (VSBAM) to estimate the degradation level caused by compression. For images with blur degradation, Ciancio et al. [21] utilized various spatial features and adopted a neural network model to assess the quality. The maximum local variation (MLV) method proposed by Khosro et al. [22] provides a fast method of blur level estimation. Rony et al. [23] put forward the concept of just noticeable blur (JNB) and the improved version of cumulative probability of blur detection (CPBD) [24]. For images with contrast distortion, Fang et al. [25] extracted features from the statistical characteristics of the 1-D image entropy distribution and developed an assessment model based on natural scene statistics (NSS) [26]. Hossein et al. [27] used higher orders of the Minkowski distance and entropy to apply an accurate measurement of the contrast distortion level. For images with noise, Yang et al. [28] proposed frequency mapping (FM) and introduced it into quality assessment. Gu et al. [29] proposed a training-free blind quality method based on the concept of information maximization. These methods, however, require prior knowledge of the distortion type, which limits their application range. Therefore, general-purpose NR-IQA methods based on training and learning are highly desirable.

General-purpose NR-IQA methods can be further divided into two types: explicit methods and implicit methods [30]. An explicit method usually contains two steps: feature extraction and model mapping [31]. Generally speaking, the features extracted in the first step represent the visual quality, while the mapping model in the second step bridges the gap between the features and the ground-truth quality score. An implicit general-purpose NR-IQA method constructs a mapping model via deep learning. Although deep networks nowadays generally have an independent feature extraction capability, it is difficult for the existing IQA databases to meet the huge demand for training samples, let alone the large amount of redundant data and network parameters. In addition, compared to preselected features, no clear physical meaning can be given by these automatically extracted features. Thus, manual feature extraction is still an effective and accurate way to summarize the whole image distortion.

According to the existing literature, the features extracted by explicit general-purpose NR-IQA methods are mainly concentrated in two categories: (1) The parameters of a certain model are obtained after a preprocessing operation such as computing mean-subtracted contrast-normalized (MSCN) coefficients [32]. The typical models are the generalized Gaussian distribution (GGD) model [33], the asymmetric GGD (AGGD) model [34], the Weibull distribution (WD) model [35], etc. (2) Physical quantities that reflect the characteristics of the image are obtained after preprocessing such as blocking and transformation. The typical methods are image entropy [36], wavelet subband correlation coefficients [37], etc. The mapping models from features to image quality are divided into three main types: (1) Classical methods such as BIQI [33], DIIVINE [37], DESIQUE [38], and SSEQ [36] follow a two-stage framework. The probability of each type of distortion in the image is gauged by a support vector classifier (SVC) and denoted as \(p_{i}\) in the first stage. The quality of the image along each of these distortions is then assessed by support vector regression (SVR) and denoted as \(q_{i}\) in the second stage. Finally, the quality of the image is expressed as a probability-weighted summation: \(\text {Index}=\sum p_{i}q_{i}\). (2) Methods such as NIQE [39] and IL-NIQE [35] are classified as “distortion-unaware,” and they calculate the distance between a model fitted by features from a distorted image and an ideal model to estimate a final quality score, without identifying the type of distortion. (3) Methods such as BLIINDS-II [40] and BRISQUE [34] implement direct mapping of the image features to obtain a subjective quality score, also without distinguishing the different distortion types.

The existing general-purpose NR-IQA methods are faced with the following problems: (1) The color space of the image is less considered in these methods. (2) Some of the methods take advantage of the statistical features of the pixels only, and they ignore the spatial distribution of the features. Liu et al. [36] calculated the 1-D entropy of image blocks in the spatial and frequency domains, respectively, and used the mean, along with the skewness [41], of all the local entropy values as the image features to implement the SSEQ method. Gabarda et al. [42] approximated the probability density function by the spatial and frequency distribution to calculate the pixel-wise entropy on a local basis. The measured variance of the entropy is a function of orientation, which is used as an anisotropic indicator to estimate the fidelity and quality of the image [43]. Although some aggregated features of image grayscale distribution can be embodied in these one-dimensional entropy-based methods, the spatial features of the distribution cannot be obtained.

In this paper, we introduce an NR-IQA method based on image entropy, namely, ENIQA. Firstly, by using the two-dimensional entropy (TE) [44] instead of the one-dimensional entropy [45], the proposed method better captures the correlation between neighboring pixels. Secondly, we calculate the mutual information (MI) [46] between the different color channels and the TE of the color image in two scales. We split the image into patches in order to exploit the statistical properties of each local region. During this process, visual saliency detection [47] is performed to weight the patches, and the less important ones are then excluded. Thirdly, a log-Gabor filter [48, 49] is applied on the image to simulate the neurons’ selective response to stimulus orientation and frequency. After that, the MI between the different subband images and the TE of the filtered images are computed. The MI, as well as the mean and the skewness of the TE, is then utilized as the structural feature to determine the perceptual quality of the input image. Specifically, SVC and SVR are used to implement a two-stage framework for the final prediction. The experiments undertaken with the LIVE [50] and TID2013 [51] databases confirmed that the proposed ENIQA method performs well and shows a high consistency of subjective and objective assessment.

The rest of this paper is structured as follows. In Section 2, we introduce the structural block diagram of the novel IQA method proposed in this study and present a detailed introduction to image entropy, correlation analysis of the RGB color space, and the log-Gabor filter. Section 3 provides an experimental analysis, and describes the testing and verification of the proposed method from multiple perspectives. Finally, Section 4 concludes with a summary of our work.

Methods

In order to describe the local information of the image, the proposed ENIQA method introduces the MI and the TE in both the spatial and frequency domains. Given a color image whose quality is to be assessed, the MI between the three channels R, G, and B is first calculated (f1–f3). We convert the input image to grayscale and divide it into patches to calculate patch-wise entropy values. The obtained local entropy values are then pooled to compute the mean and the skewness (f7–f8). For the frequency domain features, we apply log-Gabor filtering at two center frequencies and in four orientations to the grayscale image and obtain eight subband images, on which blocking and entropy calculation are applied. Eight pairs of mean and skewness values are obtained, one pair from each subband (f11–f26). Furthermore, in order to capture the relationship between the subband images, the MI between any two of the subband images in the four different orientations (f43–f48) and that between the two center frequencies (f55) are also calculated. The image is down-sampled using the nearest-neighbor method to capture multiscale behavior, yielding another set of 28 features (f4–f6, f9–f10, f27–f42, f49–f54, and f56). Thus, ENIQA extracts a total of 56 features for an input color image, as listed in Table 1. We group the features for a clearer representation. The right half of Fig. 1 illustrates the extraction process of the five feature groups.
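The index layout described above can be written down as a small lookup table, which makes it easy to check that the ten index ranges cover f1 to f56 exactly once. This is only a sketch: the group names are illustrative labels chosen here and are not taken from Table 1.

```python
# Feature index ranges as stated in the text (1-based, f1..f56).
# The dictionary keys are hypothetical labels, not names from the paper.
FEATURE_GROUPS = {
    "color_mi_scale1": range(1, 4),        # f1-f3: MI between R, G, and B
    "color_mi_scale2": range(4, 7),        # f4-f6: same, down-sampled image
    "spatial_te_scale1": range(7, 9),      # f7-f8: mean and skewness of TE
    "spatial_te_scale2": range(9, 11),     # f9-f10
    "subband_te_scale1": range(11, 27),    # f11-f26: 8 subbands x (mean, skewness)
    "subband_te_scale2": range(27, 43),    # f27-f42
    "orientation_mi_scale1": range(43, 49),  # f43-f48: 6 orientation pairs
    "orientation_mi_scale2": range(49, 55),  # f49-f54
    "frequency_mi_scale1": range(55, 56),  # f55: MI between the two center frequencies
    "frequency_mi_scale2": range(56, 57),  # f56
}


def all_indices(groups):
    """Return every feature index covered by the groups, in ascending order."""
    return sorted(i for indices in groups.values() for i in indices)


indices = all_indices(FEATURE_GROUPS)
print(len(indices))  # number of features covered by the layout
```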

Fig. 1 The framework of the proposed ENIQA method

Table 1 Features used for ENIQA

After all the features are extracted, the proposed ENIQA method utilizes a two-stage framework to obtain a score index of the test image. In the first stage, the presence of a set of distortions in the image is estimated via SVC, giving the probability of each type of distortion. In the second stage, for each type of distortion we consider, a support vector machine [52] is trained to perform a regression that maps the features to the objective quality. Finally, the quality score of the image is produced by a weighted summation, where the probabilities from the first stage are multiplied by the corresponding regressed scores from the second stage and then summed. The left half of Fig. 1 shows the structure of the two-stage framework.
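The final combination step can be illustrated with a few lines of Python. The sketch below assumes that the classifier probabilities and the per-distortion regressed scores are already available as plain lists; the function name and the defensive renormalization are choices made here for illustration and are not taken from the ENIQA code.

```python
def two_stage_score(probabilities, scores):
    """Probability-weighted quality index: sum_i p_i * q_i.

    probabilities: stage-one output, one value per distortion type.
    scores: stage-two output, one regressed quality per distortion type.
    """
    if len(probabilities) != len(scores):
        raise ValueError("one score is needed per distortion type")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Dividing by the total is an assumption made here so that inputs which
    # do not sum exactly to 1 still yield a weighted average.
    return sum(p * q for p, q in zip(probabilities, scores)) / total


# Illustrative numbers for five distortion types (not results from the paper).
p = [0.1, 0.6, 0.1, 0.1, 0.1]
q = [40.0, 30.0, 50.0, 45.0, 60.0]
print(two_stage_score(p, q))
```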

Two-dimensional entropy

Image entropy is a statistical feature that reflects the average information content in an image. The one-dimensional entropy of an image represents the information contained in the aggregated features of the grayscale distribution in the image but does not contribute to the extraction of the structural features. In order to characterize the local structure of the image, TE that describes the spatial correlation of the grayscale values is introduced.

After the color image X is converted to grayscale, the neighborhood mean of the grayscale image is selected as the spatial distribution feature. Let p(x) denote the proportion of pixels whose gray value is x in image X; the one-dimensional entropy of a gray image is then defined as follows:

$$ H_{1}({X}) = -\sum\limits_{x=0}^{255}p(x)\log_{2} p(x) $$
(1)

The gray level of the current pixel and the neighborhood mean then form a feature pair, which is denoted as (x1, x2), where x1 is the gray level of the pixel (0 ≤ x1 ≤ 255) and x2 is the mean value of the neighbors (0 ≤ x2 ≤ 255). The joint probability distribution of x1 and x2 is given by the following:

$$ p(x_{1}, x_{2})=\frac{f(x_{1}, x_{2})}{MN} $$
(2)

where f(x1,x2) is the frequency at which the feature pair (x1,x2) appears, and the size of X is M×N.

In our implementation, x2 is based on the eight adjacent neighbors of the center pixel, as shown in Fig. 2. The discrete TE is defined as follows:

$$ H_{2}({X})=-\sum\limits_{x_{1}=0}^{255}\sum\limits_{x_{2}=0}^{255}p(x_{1}, x_{2}) \log_{2} p(x_{1}, x_{2}) $$
(3)

Fig. 2 A pixel and its eight neighbors
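Eqs. (2) and (3) translate directly into a counting procedure. The following pure-Python sketch treats the image as a list of rows of integers in 0 to 255. Two details are assumptions made here because the text does not specify them: border pixels are handled by replicating the edge, and the neighborhood mean is rounded to the nearest integer level.

```python
import math
from collections import Counter


def two_dimensional_entropy(gray):
    """TE of a grayscale image given as a list of rows of ints in 0..255."""
    rows, cols = len(gray), len(gray[0])
    pair_counts = Counter()
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # the center pixel is not its own neighbor
                    # Assumption: replicate the border for out-of-range neighbors.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += gray[rr][cc]
            x2 = int(total / 8 + 0.5)  # neighborhood mean, rounded (assumption)
            pair_counts[(gray[r][c], x2)] += 1
    n = rows * cols  # M * N in Eq. (2)
    return -sum((k / n) * math.log2(k / n) for k in pair_counts.values())


flat = [[7] * 4 for _ in range(4)]  # a constant image carries no information
print(two_dimensional_entropy(flat))
```

Only the feature pairs that actually occur are stored, so the double sum in Eq. (3) is evaluated over non-zero probabilities only.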

The TE defined above describes both the grayscale information of the pixel and the grayscale distribution in the neighborhood of the pixel. We determined the TE for a reference image (monarch.bmp in the LIVE [50] database) and the five corresponding distorted images with the same distortion level but different distortion types. The statistical characteristics are shown in Fig. 3a and b. All the differential mean opinion score (DMOS) [53] values are around 25 in Fig. 3a and 50 in Fig. 3b, and the distortion types span JPEG and JP2K compression, additive white Gaussian noise (WN), Gaussian blur (GBlur), and fast fading (FF) Rayleigh channel distortion. Similarly, the same experiments were carried out on monarch.bmp and the five corresponding distorted images with the same distortion type but different distortion levels (taking WN and GBlur as examples), whose statistical characteristics are shown in Fig. 3c and d. In Fig. 3, the abscissa axis represents the entropy and the vertical axis represents the normalized number of blocks. It can be seen that both the distortion level and the distortion type can be distinguished by TE. Consequently, the TE can be considered a meaningful feature. Inspired by [26, 36, 54], we utilize the mean and skewness as the most typical features to describe the histogram.

Fig. 3 Histograms of TE values. a, b The six curves correspond to an undistorted image and its distorted counterparts with the same distortion level but different distortion types; the DMOS values are around 25 in a and around 50 in b. c, d The six curves correspond to an undistorted image and its distorted counterparts with the same distortion type but different distortion levels; the distortion type is WN in c and GBlur in d

The HVS automatically sets different priorities of attention for different regions of the observed image [47]. Thus, before calculating the statistical characteristics of the TE, we conducted visual saliency detection on the image, so that only the more important image patches were involved in the subsequent computation. To realize this, we first split the image into patches and assigned each patch a saliency value. Then, we sorted the patches by saliency and calculated the mean and skewness of the local TE on the 80% most salient patches only. In the experiments, we used the spectral residual (SR) method [55] to generate the saliency map of the image to be measured. It is worth noting that the frequencies of different pixel values (integers from 0 to 255) are counted within every retained patch to estimate the probability distributions in Eq. (3).
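The pooling step can be sketched as follows, assuming that one TE value and one saliency value per patch have already been computed (for instance, the saliency value could be the patch average of an SR saliency map, which is an assumption made here). The function name is hypothetical, and the skewness uses the population (biased) form, which the text does not specify.

```python
def pooled_mean_and_skewness(patch_entropies, patch_saliency, keep_ratio=0.8):
    """Mean and skewness of local TE over the most salient patches."""
    n = len(patch_entropies)
    if n == 0 or n != len(patch_saliency):
        raise ValueError("need one saliency value per patch entropy")
    # Sort patch indices from most to least salient and keep the top share.
    order = sorted(range(n), key=lambda i: patch_saliency[i], reverse=True)
    keep = max(1, int(round(keep_ratio * n)))
    kept = [patch_entropies[i] for i in order[:keep]]

    mean = sum(kept) / len(kept)
    variance = sum((v - mean) ** 2 for v in kept) / len(kept)
    if variance == 0:
        return mean, 0.0  # skewness is undefined here; 0 is used by convention
    third_moment = sum((v - mean) ** 3 for v in kept) / len(kept)
    return mean, third_moment / variance ** 1.5


# Five patches; the least salient one (entropy 100.0) is discarded.
mean, skew = pooled_mean_and_skewness([1.0, 2.0, 3.0, 4.0, 100.0], [5, 4, 3, 2, 1])
print(mean, skew)
```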

Mutual information

The application of colors in image display can not only stimulate the eye, but also allows the observer to perceive more information. The human eye has the ability to distinguish between thousands of colors, in spite of the perception of only dozens of gray levels [56]. There is a strong correlation between the RGB components of an image, which is embodied by the fact that the changes of individual color components reflected in the same region tend to be synchronized, i.e. when the color of a certain area of a natural color image changes, the pixel gray values of the corresponding R, G, and B components also change at the same time. Moreover, although the gray value of a pixel varies with the color channels, different RGB components have a high similarity and consistency in textures, edges, phases, and grayscale gradients [57]. Therefore, it is meaningful to characterize the MI between the three channels of R, G, and B.

Taking R and G as an example, xr and xg are the gray values of the red and green components of the input color image X, while p(xr),p(xg) are the grayscale probability distribution functions in the two channels. p(xr,xg) is the joint probability distribution function. The MI between the R and G channels is then formulated as follows:

$$ \begin{aligned} I({X}_{R}; {X}_{G}) &= H_{1}({X}_{R}) + H_{1}({X}_{G}) - H_{2}({X}_{R}, {X}_{G}) \\ &= \sum\limits_{x_{r}=0}^{255}\sum\limits_{x_{g}=0}^{255}p(x_{r}, x_{g}) \log_{2} \frac{p(x_{r}, x_{g})}{p(x_{r}) p(x_{g})} \end{aligned} $$
(4)

where H1(XR) and H1(XG) are the one-dimensional entropy of the corresponding channel, and H2(XR,XG) represents the two-dimensional entropy between the two images, which is defined as follows:

$$ H_{2}({X}_{R}, {X}_{G})=-\sum\limits_{x_{r}=0}^{255}\sum\limits_{x_{g}=0}^{255}p(x_{r}, x_{g}) \log_{2} p(x_{r}, x_{g}) $$
(5)
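The first line of Eq. (4) gives a convenient way to compute the MI from three entropies. The sketch below works on two equally long flat sequences of pixel values (for example, the R and G channels read in the same pixel order); the function names are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def mutual_information(channel_a, channel_b):
    """I(A; B) = H1(A) + H1(B) - H2(A, B), as in Eq. (4)."""
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must have the same number of pixels")
    joint = list(zip(channel_a, channel_b))  # co-located value pairs
    return entropy(channel_a) + entropy(channel_b) - entropy(joint)


# Identical channels share all their information; independent ones share none.
print(mutual_information([0, 0, 255, 255], [0, 0, 255, 255]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```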

Log-Gabor filtering

It is known that the log-Gabor filter function conforms to the HVS and is consistent with the symmetry of the cellular response of the human eye at logarithmic frequency scales [58]. The log-Gabor filter eliminates the DC component, overcomes the bandwidth limitation of the conventional Gabor filter, and has a typical frequency response with a Gaussian shape [48]. Thus, it is much easier, as well as more efficient, for a log-Gabor filter to extract information on a higher band. The transfer function of a two-dimensional log-Gabor filter can be expressed as follows:

$$ G(f, \theta) = \exp\left(-\frac{(\log(f/f_{0}))^{2}}{2(\log(\sigma_{r}/f_{0}))^{2}} \right) \exp\left(-\frac{(\theta - \theta_{0})^{2}}{2\sigma_{\theta}^{2}} \right) $$
(6)

In Eq. (6), f0 gives the center frequency and θ0 represents the center orientation. σr and σθ are the width parameters for the frequency and the orientation, respectively.

We extract the features in the frequency domain by convolving the image with the log-Gabor filter. The log-Gabor filter bank designed in this study consists of eight filters, with orientations of 0°, 45°, 90°, and 135°, and two frequency bands. Eight subband images in four orientations and two bands are obtained after the input image is filtered.
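To make Eq. (6) concrete, the sketch below evaluates the transfer function at a single frequency-domain point and enumerates an eight-filter bank. All numeric parameters (the two center frequencies, the ratio σr/f0, and σθ) are placeholder values chosen here for illustration; the text does not state the values used in ENIQA. Wrapping the angular difference into [−π, π] is also an assumption. Applying the filters to an image via an FFT is omitted.

```python
import math


def log_gabor(f, theta, f0, theta0, sigma_ratio, sigma_theta):
    """Eq. (6) at radial frequency f and angle theta (radians).

    sigma_ratio stands for sigma_r / f0 in Eq. (6).
    """
    if f <= 0.0:
        return 0.0  # the log-Gabor filter has no DC response
    radial = math.exp(-(math.log(f / f0) ** 2) / (2.0 * math.log(sigma_ratio) ** 2))
    # Assumption: wrap the angular difference into [-pi, pi].
    d = theta - theta0
    d = math.atan2(math.sin(d), math.cos(d))
    angular = math.exp(-(d ** 2) / (2.0 * sigma_theta ** 2))
    return radial * angular


# Hypothetical parameter values, for illustration only.
CENTER_FREQUENCIES = (0.1, 0.2)      # cycles per pixel
ORIENTATIONS_DEG = (0, 45, 90, 135)  # the four orientations named in the text
FILTER_BANK = [
    (f0, math.radians(deg))
    for f0 in CENTER_FREQUENCIES
    for deg in ORIENTATIONS_DEG
]

for f0, theta0 in FILTER_BANK:
    # The response peaks at the filter's own center frequency and orientation.
    print(f0, round(math.degrees(theta0)), log_gabor(f0, theta0, f0, theta0, 0.55, 0.5))
```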

Results and discussion

In order to assess the performance of the proposed method, we carried out experiments on the LIVE [50] and TID2013 [51] databases. The LIVE database consists of 29 reference images and 779 distorted images of five distortion types, while the TID2013 database contains 25 reference images and 3000 distorted images of 24 distortion types. Of these 25 images, only 24 are natural images, so we only used the 24 natural images in the testing. In order to ensure the consistency of the training and testing, we carried out the cross-database testing on the four of LIVE's five distortion types that also appear in TID2013, namely, JP2K, JPEG, WN, and GBlur.

The indices used to measure the performance of the proposed method are the Spearman’s rank-order correlation coefficient (SROCC), the Pearson linear correlation coefficient (PLCC), and the root-mean-square error (RMSE) between the predicted scores and the ground-truth DMOS [59]. A value close to 1 for SROCC and PLCC and a value close to 0 for RMSE indicates better correlation with human perception. Note that PLCC and RMSE were computed after the predicted scores were fitted by a nonlinear logistic regression function with five parameters [53]:

$$ f(z)=\beta_{1}\left[\frac{1}{2}-\frac{1}{1+\exp(\beta_{2}(z-\beta_{3}))} \right]+\beta_{4}z+\beta_{5} $$
(7)

where z is the objective IQA score, f(z) is the IQA regression fitting score, and βi (i = 1, 2, …, 5) are the parameters of the regression function.
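For reference, the mapping in Eq. (7) and the three performance indices can be written in a few lines of standard-library Python. This is a sketch only: fitting the five parameters (typically done with a nonlinear least-squares routine) is not shown, and SROCC is computed here as the Pearson correlation of average ranks, which is one common way to handle ties.

```python
import math


def logistic5(z, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of Eq. (7)."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (z - b3)))) + b4 * z + b5


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def srocc(x, y):
    """Spearman rank-order correlation via Pearson correlation of ranks."""
    return plcc(ranks(x), ranks(y))


predicted = [20.0, 35.0, 50.0, 80.0]  # illustrative scores, not paper data
dmos = [22.0, 30.0, 55.0, 70.0]
print(srocc(predicted, dmos), plcc(predicted, dmos), rmse(predicted, dmos))
```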

Correlation of feature vectors with human opinion

In this experiment, we assessed the discriminatory power of different feature combinations. With the feature groups listed in Table 1, we visually illustrate the relationship between image quality and features in the form of two-dimensional and three-dimensional scatter plots. As shown in Fig. 4, the different feature combinations are used as the axes, and each image in the LIVE database corresponds to a scatter point in the coordinate system. Furthermore, we use different markings to distinguish the five types of distortion and map the score of each image to the preset colormap. The ideal case is that the points with different distortion types are well separated. In this paper, we selected only a few representative images as examples. It can be seen from Fig. 4a and b that the scatter points of JPEG and WN have a very different spatial distribution than the other points, which allows them to be better distinguished. From Fig. 4c and d, we can see that GBlur can be distinguished, to some extent, from the other types of distortion. However, GBlur points with lower distortion levels cannot be easily separated from FF and JP2K, since the distributions of the scatter points of these three distortion types are very similar. As can be observed in Fig. 4e and f, images with higher distortion levels of WN, GBlur, and FF are more easily distinguished from images with good quality. Nonetheless, GBlur and FF are indistinguishable. In addition, JP2K points reduce the distinguishability, as some of them are scattered close to the highly distorted GBlur and FF points. According to Fig. 4, the number of features we selected seems too small to distinguish all the distortion types. Due to the limitation of human spatial cognition, it is not possible to show the discriminative ability of higher-dimensional feature combinations graphically, for example, in a four-dimensional scatter plot. In Section 3.6, we show that if more features are selected (we actually use the full 56-dimensional feature vector), the discriminatory power of the feature vector on the distortion type is further enhanced, which supports the accuracy and reliability of our selection of features.

Fig. 4 Illustration of the discriminatory power of different feature combinations (zoom in to make the markers easier to distinguish). a Elements 1, 2, and 3. b Elements 9 and 10. c Elements 12, 14, and 16. d Elements 14 and 22. e Elements 52, 53, and 54. f Elements 55 and 56

Correlation of individual feature vectors with human perception

In order to quantitatively study the predictive ability of each feature vector [60, 61], we performed a recombination of the features in Table 1, separately deployed specific subsets (feature vectors), and designed three limited models: (1) The feature vector f1–f6 represents the MI between the three color channels on two scales, denoted as ENIQA1. (2) The feature vector f7–f42 represents the mean and skewness of the TE on two scales, denoted as ENIQA2. (3) The feature vector f43–f56 represents the MI between the subband images on two scales, denoted as ENIQA3.

We performed the assessment of these three limited models by 1000 train-test iterations of cross-validation. In each iteration, we randomly split the LIVE [50] database into two non-overlapping sets: a training set comprising 80% of the reference images as well as their corresponding distorted counterparts, and a test set composed of the remaining 20%. Finally, the median SROCC, PLCC, and RMSE values over 1000 trials are reported as the final performance indices, as shown in Table 2. It is not difficult to see that each feature vector has a different degree of correlation with the subjective assessment. Among them, the TE contributes the most to the performance of the method, followed by the MI between the subband images. Although the MI between the color channels contributes the least, it is a valuable extension of the TE feature.
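The key point of this protocol is that the split is made over reference images, so that no image content appears in both sets. A minimal sketch of such a split and of the median over repeated trials is given below. The function names, the seeding, and the `evaluate` callback (which would train a model on the training indices and return, say, the SROCC on the test indices) are hypothetical.

```python
import random
import statistics


def split_by_reference(reference_ids, train_ratio=0.8, rng=random):
    """Split image indices so that no reference content is shared.

    reference_ids[i] identifies the reference image that image i derives from.
    Returns (train_indices, test_indices).
    """
    references = sorted(set(reference_ids))
    rng.shuffle(references)
    cut = int(round(train_ratio * len(references)))
    train_refs = set(references[:cut])
    train = [i for i, ref in enumerate(reference_ids) if ref in train_refs]
    test = [i for i, ref in enumerate(reference_ids) if ref not in train_refs]
    return train, test


def median_over_trials(reference_ids, evaluate, trials=1000, seed=0):
    """Median of evaluate(train, test) over repeated random splits."""
    rng = random.Random(seed)
    return statistics.median(
        evaluate(*split_by_reference(reference_ids, rng=rng)) for _ in range(trials)
    )


# Toy data set: 10 reference contents with 5 distorted versions each.
ids = [i // 5 for i in range(50)]
train, test = split_by_reference(ids, rng=random.Random(1))
print(len(train), len(test))
```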

Table 2 Median SROCC/PLCC/RMSE values across 1000 train-test trials on the LIVE database

Variation with window size

As mentioned above, since the local saliency difference of the image is considered, the proposed ENIQA method blocks the image with a window and counts the frequency of the gray values in each block to generate feature pairs before calculating the local TE. Table 3 shows the effect of different window sizes (K×L) on the performance of the proposed method, where the highest SROCC value of each column is italicized. The average time consumption for assessing a single image is also reported in Table 3. All the experiments were performed on a PC with an Intel i7-6700K CPU @ 4.0 GHz, 16 GB of RAM, and MATLAB R2016a. The elapsed time is the mean value measured over 10 evaluations of the same 384×512×3 image.

Table 3 Median SROCC value of ENIQA on the LIVE database with different window sizes

In order to visualize the trend, we also drew two line charts in Fig. 5, which intuitively illustrate the change of the elapsed time and the overall SROCC value with the selected window size.

Fig. 5 Line charts of the overall SROCC value and the average time consumed in evaluating a single image against the selected window size, according to Table 3. When the window size is set to 8×8, the method achieves the best SROCC performance

It can be observed that the performance of the proposed method varies with the size of the window. As the window size increases, the SROCC value first increases and then decreases, reaching a peak at 8×8. At the same time, the runtime of the method decreases almost monotonically as the window size increases. As a compromise, we used K=L=8 in this study. It should be pointed out that the overall SROCC value remains above 0.9 when the window size is 16×16, which implies that the window size can be appropriately increased to trade accuracy for speed in time-critical applications.

Comparison with other IQA methods

To further illustrate the superiority of the proposed method, we compared ENIQA with 10 other state-of-the-art IQA methods. The three FR-IQA approaches were the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) [12], and visual information fidelity (VIF) [62], and the seven NR-IQA approaches were BIQI [33], DIIVINE [37], BLIINDS-II [40], BRISQUE [34], NIQE [39], ILNIQE [35], and SSEQ [36]. To make a fair comparison, we used the same 80% training/20% testing protocol over 1000 iterations on all the models. The source code of all the methods was provided by the authors. In the training of an NR model, the LIBSVM toolkit [63] was used to implement the SVC and SVR, both adopting a radial-basis function (RBF) kernel. We selected an ε-SVR model for regression, and both the cost and γ for RBF were set to 1e−4. Since the FR approaches do not require a training procedure, they were only performed on distorted images, i.e., the reference images were not included. For the results listed in Table 4, the top performances of the FR-IQA and NR-IQA indices are italicized. The second-best results of the NR-IQA indices are highlighted in bold.

Table 4 Median SROCC/PLCC/RMSE values on the LIVE database

It can be seen that the proposed ENIQA method performs well on the LIVE database. To be specific, ENIQA obtains the highest SROCC values for JPEG and WN, and the second-highest overall SROCC value among the NR methods listed in Table 4. In terms of PLCC and RMSE, ENIQA is superior to all the other NR methods, except BRISQUE, on JPEG and WN, and also ranks second in overall performance. Generally speaking, the overall performance of the proposed ENIQA method is superior to most of the other NR methods and is ahead of some of the classic FR methods such as SSIM. Moreover, ENIQA is rather good at evaluating images with distortions of JPEG and WN.

Statistical significance testing

In order to compare the performance of the different methods in a more intuitive way, Fig. 6 shows a box plot of the SROCC distributions for the 11 IQA methods (including the proposed ENIQA method) across 1000 train-test trials, which provides key information about the location and dispersion of the data. We also performed a two-sample t test [64] between the methods, and the results are shown in Table 5. The null hypothesis is that the mean correlation value of the row is equal to the mean correlation value of the column, tested at the 95% confidence level. The alternative hypothesis is that the mean correlation value of the row is greater (or less) than the mean correlation value of the column. Table 5 indicates which row is statistically superior (“1”), statistically equivalent (“0”), or statistically inferior (“−1”) to which column. Although BRISQUE and SSEQ are statistically superior to ENIQA in Table 5, it can be seen from Fig. 6 that ENIQA outperforms all the other FR and NR approaches, except BRISQUE, in terms of the median value.
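The row-versus-column coding used in Table 5 can be illustrated with a short sketch. The text does not state which variant of the two-sample t test was used, so the code below makes two assumptions that should be read as such: it uses the unequal-variance (Welch) form of the statistic, and, because each method contributes 1000 SROCC values, it compares the statistic with a normal critical value instead of a Student t quantile. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def compare_methods(row_values, col_values, alpha=0.05):
    """Return 1, 0, or -1 if the row method is superior, equivalent, or inferior.

    Assumptions: Welch-type statistic and a large-sample normal approximation
    to the critical value (two-sided test at level alpha).
    """
    diff = mean(row_values) - mean(col_values)
    standard_error = math.sqrt(
        variance(row_values) / len(row_values)
        + variance(col_values) / len(col_values)
    )
    if standard_error == 0.0:
        return 0 if diff == 0 else (1 if diff > 0 else -1)
    t = diff / standard_error
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    if t > critical:
        return 1
    if t < -critical:
        return -1
    return 0


# Synthetic SROCC samples for two hypothetical methods (not paper data).
method_a = [0.95 + 0.001 * (i % 5) for i in range(1000)]
method_b = [0.90 + 0.001 * (i % 5) for i in range(1000)]
print(compare_methods(method_a, method_b))
```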

Fig. 6 Box plot of SROCC distributions of the compared IQA methods over 1000 trials on the LIVE database

Table 5 Results of the t-tests performed between SROCC values

Classification performance analysis

We analyzed the classification accuracy of ENIQA on the LIVE database based on the two-stage framework. The average classification accuracies for all the distortion types across 1000 random trials are listed in Table 6. It can be seen that when the feature dimension reaches 56, the classification accuracy of JP2K reaches 71.6369%, which is fairly acceptable. In Section 3.1, however, we showed that it is extremely difficult to distinguish JP2K images by low-dimensional feature vectors. Thus, we can speculate that in the 56-dimensional feature space, the distorted images of the JP2K type are separable by the hyperplane constructed by the SVC. Furthermore, in order to visualize which distortion types may be confused with each other, we plotted a confusion matrix [65], as shown in Fig. 7. Each value in the confusion matrix indicates the probability of the distortion type on the vertical axis being confused with that on the horizontal axis. The numerical values are averages over the 1000 random trials.

Fig. 7 Mean confusion matrix of the classification accuracy across 1000 train-test trials

Table 6 Mean classification accuracy across 1000 train-test trials

It can be seen from Table 6 and Fig. 7 that WN cannot easily be confused with the other distortion types, while the other four distortion types are more easily confused. As FF consists of JP2K followed by packet loss, it becomes clear that FF distortion is more easily confused with JP2K compression distortion. From Fig. 3, we can also see that the TE distributions of WN and JPEG are very specific, while JP2K, GBlur, and FF have quite similar TE distributions, which results in them being more easily confused.
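A mean confusion matrix of this kind can be computed as sketched below. This is a minimal illustration, not the code used for Fig. 7: rows are taken to be the true distortion type and columns the predicted type, each row is normalized to sum to one, and the matrices from the individual trials are averaged element-wise. The function names and the toy labels are assumptions.

```python
LABELS = ["JP2K", "JPEG", "WN", "GBlur", "FF"]

def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Row-normalized confusion matrix: rows are true types, columns predicted."""
    index = {name: i for i, name in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A type that is absent from the test set yields an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def mean_matrix(matrices):
    """Element-wise mean of equally sized square matrices (one per trial)."""
    n = len(matrices)
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(size)]
            for i in range(size)]

# Toy trial in which JP2K and FF are confused with each other half of the time.
true_types = ["JP2K", "JP2K", "FF", "FF", "WN", "WN"]
predicted = ["JP2K", "FF", "JP2K", "FF", "WN", "WN"]
cm = confusion_matrix(true_types, predicted)
avg = mean_matrix([cm, cm])
```

The diagonal of the averaged matrix then corresponds to per-type classification accuracies of the kind listed in Table 6.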

Database independence

In order to test the generalization ability of the assessment model to different samples, we trained the model on the whole LIVE or TID2013 database and tested it on the TID2013 or LIVE database, using only the distortion types that the two databases have in common (JP2K, JPEG, WN, and GBlur). The computed performance indices are shown in Table 7, and the top performances for the FR-IQA indices and those for the NR-IQA indices are highlighted in italics. For the NR-IQA indices, we have also boldfaced the second-best results. On the one hand, ENIQA achieves the best performance indices when trained on LIVE and tested on TID2013. On the other hand, when trained on TID2013 and tested on LIVE, ENIQA ranks in the top three, though it is not the best, with an SROCC higher than 0.9. It is worth noting that ENIQA outperforms BRISQUE in both cross-database experiments, which reflects the good generalization ability of ENIQA.

Table 7 Performance indices obtained by training on the LIVE or TID2013 database and testing on the TID2013 or LIVE database

Figure 8 shows the results of the scatter plot fitting of ENIQA on the LIVE and TID2013 databases. As in the previous experiments, when performing the scatter plot experiment on the LIVE database, we trained with a random 80% of the images, separated by content, in the LIVE database and then tested with the remaining 20%, for which the results are shown in Fig. 8a. When conducting the experiment on the TID2013 database, we trained the model on the entire LIVE database and then tested it on the selected portion of the TID2013 database, for which the results are given in Fig. 8b. It can be observed from Fig. 8 that the scatter points are evenly distributed in the entire coordinate system and have a strong linear relationship with DMOS/MOS, which further supports the superior overall performance and generalization ability of the proposed ENIQA method.

Fig. 8 Scatter plots of DMOS/MOS versus prediction of ENIQA on the LIVE and TID2013 databases. a DMOS versus prediction of ENIQA on LIVE. b MOS versus prediction of ENIQA on TID2013

Runtime analysis

Table 8 shows the average running time of the 11 IQA methods, measured over 10 assessments of the same 384×512×3 image. All the MATLAB source code of the IQA methods, apart from that of PSNR, is the official implementation from the original authors. For ENIQA, the window size is set to 8×8. It can be seen that ENIQA has a moderate computational cost in addition to its superior performance.
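The timing procedure (averaging over 10 assessments of one image) can be sketched as follows. The experiments themselves were run with MATLAB implementations; this Python version is only an illustration, and average_runtime, dummy_assess, and the stand-in image data are hypothetical names introduced here.

```python
import time

def average_runtime(assess, image, repeats=10):
    """Average wall-clock time in seconds of assess(image) over several runs."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        assess(image)
        total += time.perf_counter() - start
    return total / repeats

calls = []

def dummy_assess(image):
    """Stand-in for an IQA method; it only records the call and sums the data."""
    calls.append(1)
    return sum(image)

# A flat list stands in for the pixel data of a 384x512x3 image.
image = [0.5] * (384 * 512 * 3)
mean_seconds = average_runtime(dummy_assess, image)
```

Timing each run separately with a monotonic clock, rather than timing the whole loop, keeps loop overhead out of the measurement.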

Table 8 The average running time of the 11 IQA methods

Conclusions

In this paper, we proposed a general-purpose NR-IQA method called entropy-based no-reference image quality assessment (ENIQA). Based on the concept of image entropy, ENIQA combines log-Gabor filtering and saliency detection for feature extension and accuracy improvement. To construct an effective feature vector, ENIQA extracts the structural information of the input color images, including the MI and the TE in both the spatial and the frequency domains. The image quality score is then predicted by the SVC and SVR. The proposed ENIQA method was assessed on the LIVE and TID2013 databases, and we carried out cross-validation experiments and cross-database experiments to compare it with several other FR- and NR-IQA approaches. From the experiments, ENIQA showed a superior overall performance and generalization ability when compared to the other state-of-the-art methods.

Abbreviations

AGGD: Asymmetric generalized Gaussian distribution

BIQA: Blind image quality assessment

CPBD: Cumulative probability of blur detection

DMOS: Differential mean opinion score

ENIQA: Entropy-based no-reference image quality assessment

FF: Fast fading

FM: Frequency mapping

FR: Full reference

GBlur: Gaussian blur

GGD: Generalized Gaussian distribution

HVS: Human visual system

IQA: Image quality assessment

JNB: Just noticeable blur

JP2K: JPEG2000

MI: Mutual information

MLV: Maximum local variation

MOS: Mean opinion score

MSCN: Mean-subtracted contrast-normalized

NR: No-reference

NSS: Natural scene statistics

PLCC: Pearson linear correlation coefficient

PSNR: Peak signal-to-noise ratio

RBF: Radial-basis function

RMSE: Root-mean-square error

RR: Reduced-reference

SR: Spectral residual

SROCC: Spearman’s rank-order correlation coefficient

SSIM: Structural similarity index

SVC: Support vector classifier

SVR: Support vector regression

TE: Two-dimensional entropy

VIF: Visual information fidelity

VSBAM: Visually significant blocking artifact metric

WD: Weibull distribution

WN: Additive white Gaussian noise

References

  1. P. Mohammadi, A. Ebrahimimoghadam, S. Shirani, Subjective and objective quality assessment of image: A survey. Majlesi J. Electr. Eng. 9(1), 55–83 (2014).
  2. Y. Fang, K. Zeng, Z. Wang, W. Lin, Z. Fang, C.-W. Lin, Objective quality assessment for image retargeting based on structural similarity. IEEE J. Emerg. Sel. Top. Circ. Syst. 4(1), 95–105 (2014).
  3. X. Zhang, J. Li, H. Wang, D. Xiong, J. Qu, H. Shin, J. P. Kim, T. Zhang, Realizing transparent OS/apps compression in mobile devices at zero latency overhead. IEEE Trans. Comput. 66(7), 1188–1199 (2017).
  4. K. Gu, D. Tao, J.-F. Qiao, W. Lin, Learning a no-reference quality assessment model of enhanced images with big data. IEEE Trans. Neural Netw. Learn. Syst. 29(4), 1301–1313 (2018).
  5. H. Fronthaler, K. Kollreider, J. Bigun, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Automatic image quality assessment with application in biometrics (IEEE, New York, 2006), pp. 30–37.
  6. C. Yan, H. Xie, J. Chen, Z. Zha, X. Hao, Y. Zhang, Q. Dai, A fast Uyghur text detector for complex background images. IEEE Trans. Multimed. 20(12), 3389–3398 (2018).
  7. Q. Li, Z. Wang, Reduced-reference image quality assessment using divisive normalization-based image representation. IEEE J. Sel. Top. Signal Process. 3(2), 202–211 (2009).
  8. C. Yan, L. Li, C. Zhang, B. Liu, Y. Zhang, Q. Dai, Cross-modality bridging and knowledge transferring for image understanding. IEEE Trans. Multimed., 1–10 (2019). https://doi.org/10.1109/TMM.2019.2903448.
  9. C. Yan, Y. Tu, X. Wang, Y. Zhang, X. Hao, Q. Dai, STAT: Spatial-temporal attention mechanism for video captioning. IEEE Trans. Multimed. (2019). https://doi.org/10.1109/TMM.2019.2924576.
  10. S. S. Hemami, A. R. Reibman, No-reference image and video quality estimation: Applications and human-motivated design. Signal Process. Image Commun. 25(7), 469–481 (2010).
  11. D. M. Chandler, Seven challenges in image quality assessment: Past, present, and future research. ISRN Signal Process. 2013, 1–53 (2013).
  12. Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004).
  13. W. Lin, C.-C. J. Kuo, Perceptual visual quality metrics: A survey. J. Visual Commun. Image Represent. 22(4), 297–312 (2011).
  14. A. K. Moorthy, A. C. Bovik, Visual quality assessment algorithms: What does the future hold? Multimed. Tools Appl. 51(2), 675–696 (2011).
  15. A. C. Bovik, Automatic prediction of perceptual image and video quality. Proc. IEEE 101(9), 2008–2024 (2013).
  16. W. Xue, L. Zhang, X. Mou, A. C. Bovik, Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Trans. Image Process. 23(2), 684–695 (2014).
  17. Z. Wang, A. C. Bovik, Reduced- and no-reference image quality assessment. IEEE Signal Process. Mag. 28(6), 29–40 (2011).
  18. S. Xu, S. Jiang, W. Min, No-reference/blind image quality assessment: A survey. IETE Tech. Rev. 34(3), 223–245 (2017).
  19. V. Kamble, K. Bhurchandi, No-reference image quality assessment algorithms: A survey. Optik-Int. J. Light Electron Optics 126(11–12), 1090–1097 (2015).
  20. S. Suthaharan, No-reference visually significant blocking artifact metric for natural scene images. Signal Process. 89(8), 1647–1652 (2009).
  21. A. Ciancio, A. L. N. T. da Costa, E. A. da Silva, A. Said, R. Samadani, P. Obrador, No-reference blur assessment of digital pictures based on multifeature classifiers. IEEE Trans. Image Process. 20(1), 64–75 (2011).
  22. K. Bahrami, A. C. Kot, A fast approach for no-reference image sharpness assessment based on maximum local variation. IEEE Signal Process. Lett. 21(6), 751–755 (2014).
  23. R. Ferzli, L. J. Karam, A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB). IEEE Trans. Image Process. 18(4), 717–728 (2009).
  24. N. D. Narvekar, L. J. Karam, A no-reference image blur metric based on the cumulative probability of blur detection (CPBD). IEEE Trans. Image Process. 20(9), 2678–2683 (2011).
  25. Y. Fang, K. Ma, Z. Wang, W. Lin, Z. Fang, G. Zhai, No-reference quality assessment of contrast-distorted images based on natural scene statistics. IEEE Signal Process. Lett. 22(7), 838–842 (2015).
  26. D. L. Ruderman, The statistics of natural images. Network: Comput. Neural Syst. 5(4), 517–548 (1994).
  27. H. Z. Nafchi, M. Cheriet, Efficient no-reference quality assessment and classification model for contrast distorted images. IEEE Trans. Broadcast. 64(2), 518–523 (2018).
  28. G. Yang, Y. Liao, Q. Zhang, D. Li, W. Yang, No-reference quality assessment of noise-distorted images based on frequency mapping. IEEE Access 5, 23146–23156 (2017).
  29. K. Gu, W. Lin, G. Zhai, X. Yang, W. Zhang, C. W. Chen, No-reference quality metric of contrast-distorted images based on information maximization. IEEE Trans. Cybernet. 47(12), 4559–4565 (2017).
  30. J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, A. C. Bovik, Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment. IEEE Signal Process. Mag. 34(6), 130–141 (2017).
  31. J. Guan, S. Yi, X. Zeng, W. Cham, X. Wang, Visual importance and distortion guided deep image quality assessment framework. IEEE Trans. Multimed. 19(11), 2505–2520 (2017).
  32. K. Gu, G. Zhai, X. Yang, W. Zhang, Using free energy principle for blind image quality assessment. IEEE Trans. Multimed. 17(1), 50–63 (2015).
  33. A. K. Moorthy, A. C. Bovik, A two-step framework for constructing blind image quality indices. IEEE Signal Process. Lett. 17(5), 513–516 (2010).
  34. A. Mittal, A. K. Moorthy, A. C. Bovik, No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 21(12), 4695–4708 (2012).
  35. L. Zhang, L. Zhang, A. C. Bovik, A feature-enriched completely blind image quality evaluator. IEEE Trans. Image Process. 24(8), 2579–2591 (2015).
  36. L. Liu, B. Liu, H. Huang, A. C. Bovik, No-reference image quality assessment based on spatial and spectral entropies. Signal Process. Image Commun. 29(8), 856–863 (2014).
  37. A. K. Moorthy, A. C. Bovik, Blind image quality assessment: From natural scene statistics to perceptual quality. IEEE Trans. Image Process. 20(12), 3350–3364 (2011).
  38. Y. Zhang, D. M. Chandler, No-reference image quality assessment based on log-derivative statistics of natural scenes. J. Electron. Imaging 22(4), 043025 (2013).
  39. A. Mittal, R. Soundararajan, A. C. Bovik, Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 20(3), 209–212 (2013).
  40. M. A. Saad, A. C. Bovik, C. Charrier, Blind image quality assessment: A natural scene statistics approach in the DCT domain. IEEE Trans. Image Process. 21(8), 3339–3352 (2012).
  41. I. Motoyoshi, S. Nishida, L. Sharan, E. H. Adelson, Image statistics and the perception of surface qualities. Nature 447(7141), 206 (2007).
  42. S. Gabarda, G. Cristóbal, Blind image quality assessment through anisotropy. J. Opt. Soc. Am. A 24(12), 42–51 (2007).
  43. L. Dong, Y. Fang, W. Lin, C. Deng, C. Zhu, H. S. Seah, Exploiting entropy masking in perceptual graphic rendering. Signal Process. Image Commun. 33, 1–13 (2015).
  44. A. S. Abutaleb, Automatic thresholding of gray-level pictures using two-dimensional entropy. Comput. Vis. Graph. Image Process. 47(1), 22–32 (1989).
  45. A. Brink, Using spatial information as an aid to maximum entropy image threshold selection. Patt. Recog. Lett. 17(1), 29–36 (1996).
  46. F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, P. Suetens, Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 16(2), 187–198 (1997).
  47. W. Zhang, A. Borji, Z. Wang, P. Le Callet, H. Liu, The application of visual saliency models in objective image quality assessment: A statistical evaluation. IEEE Trans. Neural Netw. Learn. Syst. 27(6), 1266–1278 (2016).
  48. D. J. Field, Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4(12), 2379–2394 (1987).
  49. L. Zhang, L. Zhang, X. Mou, D. Zhang, et al., FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 20(8), 2378–2386 (2011).
  50. H. R. Sheikh, Z. Wang, L. Cormack, A. C. Bovik, LIVE image quality assessment database release 2 (2005). http://live.ece.utexas.edu/research/quality.
  51. N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, et al., Image database TID2013: Peculiarities, results and perspectives. Signal Process. Image Commun. 30, 57–77 (2015).
  52. C. J. Burges, A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998).
  53. H. R. Sheikh, M. F. Sabir, A. C. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 15(11), 3440–3451 (2006).
  54. L. Liu, Y. Hua, Q. Zhao, H. Huang, A. C. Bovik, Blind image quality assessment by relative gradient statistics and adaboosting neural network. Signal Process. Image Commun. 40, 1–15 (2016).
  55. X. Hou, L. Zhang, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Saliency detection: A spectral residual approach (IEEE, Minneapolis, 2007), pp. 1–8.
  56. A. Rose, Quantum effects in human vision. Adv. Biol. Med. Phys. 5, 211–242 (1957).
  57. J. Ren, J. Jiang, D. Wang, S. Ipson, Fusion of intensity and inter-component chromatic difference for effective and robust colour edge detection. IET Image Process. 4(4), 294–301 (2010).
  58. D. Boukerroui, J. A. Noble, M. Brady, On the choice of band-pass quadrature filters. J. Math. Imaging Vis. 21(1–2), 53–80 (2004).
  59. P. Corriveau, A. Webster, Final report from the video quality experts group on the validation of objective models of video quality assessment, phase II. Tech. Rep. (2003). https://www.itu.int/md/T01-SG09-C-0060.
  60. G. Zhao, S. Liu, Estimation of discriminative feature subset using community modularity. Sci. Rep. 6(25040), 1–16 (2016).
  61. H. Tao, C. Hou, F. Nie, Y. Jiao, D. Yi, Effective discriminative feature selection with nontrivial solution. IEEE Trans. Neural Netw. Learn. Syst. 27(4), 796–808 (2015).
  62. H. R. Sheikh, A. C. Bovik, Image information and visual quality. IEEE Trans. Image Process. 15(2), 430–444 (2006).
  63. C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27 (2011).
  64. D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, 3rd edn. (CRC Press, London, U.K., 2003).
  65. R. G. Congalton, A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 37(1), 35–46 (1991).

Acknowledgements

The authors would like to thank Prof. Wen Yang and Prof. Hongyan Zhang for the valuable opinions they have offered during our heated discussions.

Funding

This study is partially supported by the National Natural Science Foundation of China (NSFC) (No. 61571334, 61671333), National High Technology Research and Development Program (863 Program) (No. 2014AA09A512), and National Key Research and Development Program of China (No. 2018YFB0504500).

Author information

XC conducted the experiments and drafted the manuscript. QZ and ML implemented the core method and performed the statistical analysis. GY designed the methodology. CH modified the manuscript. All authors read and approved the final manuscript.

Correspondence to Guangyi Yang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Keywords

  • Image entropy
  • Mutual information
  • No-reference image quality assessment
  • Support vector classifier
  • Support vector regression