No-Reference Color Image Quality Assessment: From Entropy to Perceptual Quality

This paper presents a high-performance general-purpose no-reference (NR) image quality assessment (IQA) method based on image entropy. The image features are extracted from two domains. In the spatial domain, the mutual information between the color channels and the two-dimensional entropy are calculated. In the frequency domain, the two-dimensional entropy and the mutual information of the filtered sub-band images are computed as the feature set of the input color image. Then, with all the extracted features, a support vector classifier (SVC) is used for distortion classification and support vector regression (SVR) is used for quality prediction, which together yield the final quality assessment score. The proposed method, which we call entropy-based no-reference image quality assessment (ENIQA), can assess the quality of different categories of distorted images and has a low complexity. ENIQA was assessed on the LIVE and TID2013 databases and showed a superior performance. The experimental results confirmed that the objective scores of ENIQA are highly consistent with subjective assessment on color images, which indicates its good overall performance and generalization ability. The source code is available on GitHub at https://github.com/jacob6/ENIQA.


Introduction
In this era of information explosion, we are surrounded by an overwhelming amount of information. The diversification of information is dazzling, and images, as the source of visual information, contain a wealth of valuable information. Considering the incomparable advantages of image information over other types of information, it is important to process images appropriately in the different fields [1]. In image acquisition, processing, transmitting, and recording, image distortion and quality degradation are an inevitable result of the imperfection of the imaging system, the processing method, the transmission medium, and the recording equipment, as well as object movement and noise pollution [2,3,4]. Image quality has a direct effect on people's subjective feelings and information acquisition. For example, the quality of the collected images directly affects the accuracy and reliability of the recognition results in an image recognition process [5]. Another example is that remote conferencing and video-on-demand systems are affected by such factors as transmission errors, network latency, and so on [6]. Online real-time image quality control is thus introduced to ensure that the service provider dynamically adjusts the source location strategy, so as to meet the service quality requirements [7]. It is therefore not surprising that research into image quality assessment (IQA) has received extensive attention during the last two decades [8].
In accordance with the need for human participation, IQA methods can be divided into two classes: subjective image quality assessment methods and objective image quality assessment methods [9]. Subjective assessment is quantified by the human eye. In contrast, an objective IQA method focuses on automatic assessment of the images via a specific method by the use of computing equipment, with the ultimate goal of enabling a computer to act as a substitute for the human visual system (HVS) in viewing and perceiving images [10]. In practice, subjective assessment results are difficult to apply in real-time imaging systems due to their strong randomness. Therefore, objective IQA methods have been widely studied [11]. According to the availability of a reference image, objective IQA methods can be classified as full-reference (FR), reduced-reference (RR), and no-reference (NR) methods [12]. In an FR method, an original "distortion-free" image is assumed to be supplied, and the assessment result is obtained through the comparison of the two images. With the advances of recent studies, the accuracy of this kind of method keeps improving, despite its disadvantage of requiring a complete reference image, which is often not available in practical applications [13]. An RR method, which is also known as a partial reference method, does not make a complete comparison between the distorted image and the pristine one, but only compares certain features [14]. In contrast, an NR method, which is also called a blind image quality assessment (BIQA) method, requires no reference image. Instead, the quality is estimated according to the features of the distorted image [12]. In many practical applications, a reference image is inaccessible, and thus the NR-IQA methods have the most practical value and a very wide application potential [15].
In general, the current NR-IQA methods can be divided into two categories: application-specific and general-purpose assessment [16]. The former kind of method assesses the image quality of a specific distortion type and calculates the corresponding score. Common types of distortion include JPEG, JPEG2000 (JP2K), blur, contrast distortion, and noise. For images with compression degradation, Suthaharan et al. [17] proposed the visually significant blocking artifact metric (VSBAM) to estimate the degradation level caused by compression. For images with blur degradation, Ciancio et al. [18] utilized various spatial features and adopted a neural network model to assess the quality. The maximum local variation (MLV) method proposed by Khosro et al. [19] provides a fast method of blur level estimation. Rony et al. [20] put forward the concept of just noticeable blur (JNB) and the improved version of cumulative probability of blur detection (CPBD) [21]. For images with contrast distortion, Fang et al. [22] extracted features from the statistical characteristics of the 1-D image entropy distribution and developed an assessment model based on natural scene statistics (NSS) [23]. Hossein et al. [24] used higher orders of the Minkowski distance and entropy to apply an accurate measurement of the contrast distortion level. For images with noise, Yang et al. [25] proposed frequency mapping (FM) and introduced it into quality assessment. Gu et al. [26] proposed a training-free blind quality method based on the concept of information maximization. These methods, however, require prior knowledge of the distortion type, which limits their application range. Therefore, general-purpose NR-IQA methods based on training and learning are highly desirable.
General-purpose NR-IQA methods can be further divided into two types: explicit methods and implicit methods [27]. An explicit method usually contains two steps: feature extraction and model mapping [28]. Generally speaking, the features extracted in the first step represent the visual quality, while the mapping model in the second step bridges the gap between the features and the ground-truth quality score. An implicit general-purpose NR-IQA method constructs a mapping model via deep learning. Although deep networks nowadays generally have an independent feature extraction capability, it is difficult for the existing IQA databases to meet the huge demand for training samples, let alone the large amount of redundant data and network parameters. In addition, compared to preselected features, no clear physical meaning can be given by these automatically extracted features. Thus, manual feature extraction is still an effective and accurate way to summarize the whole image distortion.
According to the existing literature, the features extracted by explicit general-purpose NR-IQA methods are mainly concentrated in two categories. 1) The parameters of a certain model are obtained after a preprocessing operation such as mean-subtracted contrast-normalized (MSCN) coefficients [29]. The typical models are the generalized Gaussian distribution (GGD) model [30], the asymmetric GGD (AGGD) model [31], the Weibull distribution (WD) model [32], etc. 2) Physical quantities that reflect the characteristics of the image are obtained after preprocessing such as blocking and transformation. The typical methods are image entropy [33], wavelet sub-band correlation coefficients [34], etc. The mapping models from features to image quality are divided into three main types. 1) Classical methods such as BIQI [30], DIIVINE [34], DESIQUE [35], and SSEQ [33] follow a two-stage framework. The probability of each type of distortion in the image is gauged by a support vector classifier (SVC) and denoted as $p_i$ in the first stage. The quality of the image along each of these distortions is then assessed by support vector regression (SVR) and denoted as $q_i$ in the second stage. Finally, the quality of the image is expressed as a probability-weighted summation: $\mathrm{Index} = \sum_i p_i q_i$. 2) Methods such as NIQE [36] and IL-NIQE [32] are classified as "distortion-unaware", and they calculate the distance between a model fitted by features from a distorted image and an ideal model to estimate a final quality score, without identifying the type of distortion. 3) Methods such as BLIINDS-II [37] and BRISQUE [31] implement direct mapping of the image features to obtain a subjective quality score, also without distinguishing the different distortion types.
The existing general-purpose NR-IQA methods face the following problems. 1) The color space of the image is rarely considered in these methods. 2) Some of the methods take advantage of only the statistical features of the pixels, and they ignore the spatial distribution of the features. Liu et al. [33] calculated the 1-D entropy of image blocks in the spatial and frequency domains, respectively, and used the mean, along with the skewness [38], of all the local entropy values as the image features to implement the SSEQ method. Gabarda et al. [39] approximated the probability density function by the spatial and frequency distribution to calculate the pixel-wise entropy on a local basis. The measured variance of the entropy is a function of orientation, which is used as an anisotropic indicator to estimate the fidelity and quality of the image [40]. Although some aggregated features of the image grayscale distribution can be embodied in these one-dimensional entropy-based methods, the spatial features of the distribution cannot be obtained.
In this paper, we introduce an NR-IQA method based on image entropy, namely, ENIQA. Firstly, by using the two-dimensional entropy (TE) [41] instead of the one-dimensional entropy [42], the proposed method better embodies the correlation between neighboring pixels. Secondly, we calculate the mutual information (MI) [43] between the different color channels and the TE of the color image in two scales. We split the image into patches in order to exploit the statistical laws of each local region. During this process, visual saliency detection [44] is performed to weight the patches, and the less important ones are then excluded. Thirdly, a log-Gabor filter [45,46] is applied on the image to simulate the neurons' selective response to stimulus orientation and frequency. After that, the MI between the different sub-band images and the TE of the filtered images are computed. The MI, as well as the mean and the skewness of the TE, is then utilized as the structural feature to determine the perceptual quality of the input image. Specifically, SVC and SVR are used to implement a two-stage framework for the final prediction. The experiments undertaken with the LIVE [47] and TID2013 [48] databases confirmed that the proposed ENIQA method performs well and shows a high consistency of subjective and objective assessment.
The rest of this paper is structured as follows. In Section 2, we introduce the structural block diagram of the novel IQA method proposed in this study, and we then present a detailed introduction to image entropy, correlation analysis of the RGB color space, and the log-Gabor filter. Section 3 provides an experimental analysis, and describes the testing and verification of the proposed method from multiple perspectives. Finally, Section 4 concludes with a summary of our work.

The Proposed ENIQA Framework
In order to describe the local information of the image, the proposed ENIQA method introduces the MI and the TE in both the spatial and frequency domains. Given a color image whose quality is to be assessed, the MI between the three channels (R, G, and B) is first calculated as feature group 1. To extract feature group 2, we convert the input image to grayscale and divide it into patches to calculate patch-wise entropy values. The obtained local entropy values are then pooled. The mean and the skewness then make up feature group 2. For the frequency domain features, we apply log-Gabor filtering at two center frequencies and in four orientations to the grayscale image and obtain eight sub-band images, on which blocking and entropy calculation are implemented. The eight pairs of mean and skewness values are obtained from each sub-band, and they constitute feature group 3. Furthermore, the MI between the sub-band images in the four different orientations and that between the two center frequencies are also calculated, respectively, as feature group 4 and feature group 5. Together, these five groups give 28 features at the original scale. The image is then down-sampled using the nearest-neighbor method to capture multiscale behavior, yielding another set of 28 features. Thus, ENIQA extracts a total of 56 features for an input color image, as tabulated in Table 1. The right half of Fig. 1 illustrates the extraction process of the five feature groups.
After all the features are extracted, the proposed ENIQA method utilizes a two-stage framework to obtain a score index of the test image. In the first stage, the presence of a set of distortions in the image is estimated via SVC, giving the amount or probability of each type of distortion. In the second stage, for each type of distortion we consider, a support vector machine [49] is trained to perform a regression that maps the features to the objective quality. Finally, the quality score of the image is produced by a weighted summation, where the probabilities from the first stage are multiplied by the corresponding regressed scores from the second stage and then added altogether. The left half of Fig. 1 shows the structure of the two-stage framework.
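The final pooling step of the two-stage framework can be illustrated with a minimal sketch. It assumes that the first stage has already produced one probability per distortion type and the second stage one regressed score per distortion type; the function name `pooled_quality` and all numbers below are hypothetical and serve only to show the weighted summation.

```python
def pooled_quality(probs, scores):
    """Probability-weighted sum of per-distortion quality scores."""
    if len(probs) != len(scores):
        raise ValueError("probs and scores must have the same length")
    return sum(p * q for p, q in zip(probs, scores))


# Hypothetical stage outputs for the five LIVE distortion types
# (JP2K, JPEG, WN, GBlur, FF); the values are made up for illustration.
probs = [0.10, 0.70, 0.05, 0.10, 0.05]    # stage 1: SVC class probabilities
scores = [40.0, 30.0, 55.0, 45.0, 50.0]   # stage 2: per-distortion SVR outputs
index = pooled_quality(probs, scores)     # 34.75 for these inputs
```

Because the probabilities act as soft weights, a misclassification in the first stage does not discard the regressors of the other distortion types; it only shifts their relative contribution.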

Two-Dimensional Entropy
Image entropy is a statistical feature that reflects the average information content in an image. The one-dimensional entropy of an image represents the information contained in the aggregated features of the grayscale distribution in the image, but does not contribute to the extraction of the spatial features. In order to characterize the local structure of the image, the TE, which describes the spatial correlation of the grayscale values, is introduced.

Figure 1 The framework of the proposed ENIQA method
After the color image $X$ is converted to grayscale, the neighborhood mean of the grayscale image is selected as the spatial distribution feature. Let $p(x)$ denote the proportion of pixels whose gray value is $x$ in image $X$. The one-dimensional entropy of a gray image is defined as:
$$H_1(X) = -\sum_{x=0}^{255} p(x) \log_2 p(x) \tag{1}$$
The gray level of the current pixel and the neighborhood mean then form a feature pair, which is denoted as $(x_1, x_2)$, where $x_1$ is the gray level of the pixel ($0 \le x_1 \le 255$) and $x_2$ is the mean value of the neighbors ($0 \le x_2 \le 255$). The joint probability distribution of $x_1$ and $x_2$ is given by:
$$p(x_1, x_2) = \frac{f(x_1, x_2)}{MN} \tag{2}$$
where $f(x_1, x_2)$ is the frequency at which the feature pair $(x_1, x_2)$ appears, and the size of $X$ is $M \times N$. In our implementation, $x_2$ is based on the eight adjacent neighbors of the center pixel, as shown in Fig. 2. The discrete TE is defined as:
$$H_2(X) = -\sum_{x_1=0}^{255} \sum_{x_2=0}^{255} p(x_1, x_2) \log_2 p(x_1, x_2) \tag{3}$$
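The one-dimensional entropy and the TE defined above can be sketched in a few lines of pure Python. This is a minimal illustration rather than the released implementation: the image is a list of rows of integer gray levels, the neighborhood mean is rounded to an integer, and border pixels replicate the nearest edge value, all of which are assumptions that the text does not specify.

```python
import math
from collections import Counter


def entropy_1d(img):
    """One-dimensional entropy (in bits) of the gray-level histogram."""
    pixels = [v for row in img for v in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def neighborhood_mean(img, i, j):
    """Rounded mean of the eight neighbors of pixel (i, j).

    Assumption: indices outside the image are clamped to the nearest edge.
    """
    h, w = len(img), len(img[0])
    total = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # the center pixel is not part of its own neighborhood
            ii = min(max(i + di, 0), h - 1)
            jj = min(max(j + dj, 0), w - 1)
            total += img[ii][jj]
    return int(total / 8 + 0.5)


def entropy_2d(img):
    """Two-dimensional entropy over (gray level, neighborhood mean) pairs."""
    h, w = len(img), len(img[0])
    pairs = Counter(
        (img[i][j], neighborhood_mean(img, i, j))
        for i in range(h)
        for j in range(w)
    )
    n = h * w  # M x N in the notation of the text
    return -sum((c / n) * math.log2(c / n) for c in pairs.values())


flat = [[128] * 4 for _ in range(4)]
checker = [[0 if (i + j) % 2 == 0 else 255 for j in range(4)] for i in range(4)]
h1 = entropy_1d(checker)  # 1.0 bit: two equally likely gray levels
h2 = entropy_2d(checker)  # at least h1, since the pair refines the gray level
```

The flat image has zero entropy under both definitions, whereas the TE of the checkerboard is never smaller than its one-dimensional entropy, which reflects the additional spatial information carried by the neighborhood mean.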

Figure 2 A pixel and its eight neighbors
The TE defined above describes both the grayscale information of a pixel and the grayscale distribution in its neighborhood. We determined the TE for a reference image (monarch.bmp in the LIVE [47] database) and the five corresponding distorted images with the same distortion level but different distortion types. The statistical characteristics are shown in Fig. 3(a). All the differential mean opinion score (DMOS) [50] values are around 25, and the distortion types span JPEG and JP2K compression, additive white Gaussian noise (WN), Gaussian blur (GBlur), and fast fading (FF) Rayleigh channel distortion. The same experiment was also carried out on monarch.bmp and the five corresponding distorted images with the same distortion type but different distortion levels (taking GBlur as an example), whose statistical characteristics are shown in Fig. 3(b). In Fig. 3, the horizontal axis represents the entropy and the vertical axis represents the normalized number of blocks. It can be seen from Fig. 3 that both the distortion level and the distortion type can be distinguished by the TE. Consequently, the TE can be considered a meaningful feature. Inspired by [23,33,51], we utilize the mean and skewness as the most typical features to describe the histogram. The HVS automatically sets different priorities of attention for different regions of the observed image [44]. Thus, before calculating the statistical characteristics of the TE, we conducted visual saliency detection on the image, i.e., only the more important image patches were involved in the subsequent computation. To realize this, we first split the image into patches and obtained a saliency value for each patch. Then, we sorted the patches by saliency and calculated the mean and skewness of the local TE on only the most salient 80% of the patches.
In the experiments, we used the spectral residual (SR) method [52] to generate the saliency map of the image to be measured. It is worth noting that the frequencies of different pixel values (integers from 0 to 255) are counted in every important patch to estimate the probability distributions in Eq. (3).
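The saliency-based pooling described above can be sketched as follows. The sketch assumes that a local TE value and a saliency value are already available for every patch (the saliency detector itself is not shown), and it uses the population skewness; the function names and the toy numbers are hypothetical.

```python
def mean_and_skewness(values):
    """Mean and population skewness (third standardized moment)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mu, skew


def pool_salient(entropies, saliencies, keep=0.8):
    """Pool the local entropies of the most salient `keep` fraction of patches."""
    order = sorted(range(len(entropies)),
                   key=lambda k: saliencies[k], reverse=True)
    n_keep = max(1, round(keep * len(order)))
    kept = [entropies[k] for k in order[:n_keep]]
    return mean_and_skewness(kept)


# Toy example: five patches, the least salient one is dropped.
patch_entropy = [1.0, 2.0, 3.0, 4.0, 100.0]
patch_saliency = [5.0, 4.0, 3.0, 2.0, 1.0]
mu, skew = pool_salient(patch_entropy, patch_saliency)
```

In this toy example the outlying entropy of the least salient patch is excluded, so the pooled statistics describe only the regions that are assumed to attract attention.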

Mutual Information
The application of colors in image display can not only stimulate the eye, but also allows the observer to perceive more information. The human eye has the ability to distinguish between thousands of colors, in spite of the perception of only dozens of gray levels [53]. There is a strong correlation between the RGB components of an image, which is embodied by the fact that the changes of individual color components reflected in the same region tend to be synchronized. That is to say, when the color of a certain area of a natural color image changes, the pixel gray values of the corresponding R, G, and B components also change at the same time. Moreover, although the gray value of a pixel varies with the color channels, different RGB components have quite good similarity and consistency in textures, edges, phases, and grayscale gradients [54]. Therefore, it is meaningful to characterize the MI between the three channels of R, G, and B.
Taking R and G as an example, it is assumed that $x_r$ and $x_g$ are the gray values of the red and green components of the input color image $X$, while $p(x_r)$ and $p(x_g)$ are the grayscale probability distribution functions in the two channels, and $p(x_r, x_g)$ is the joint probability distribution function. The MI between the R and G channels is then formulated as:
$$I(X_R; X_G) = H_1(X_R) + H_1(X_G) - H_2(X_R, X_G) \tag{4}$$
where $H_1(X_R)$ and $H_1(X_G)$ are the one-dimensional entropies of the corresponding channels, and $H_2(X_R, X_G)$ represents the two-dimensional entropy between the two images, which is defined as:
$$H_2(X_R, X_G) = -\sum_{x_r} \sum_{x_g} p(x_r, x_g) \log_2 p(x_r, x_g) \tag{5}$$
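As a small worked example of the MI between two channels, the sketch below computes the two one-dimensional entropies and the joint entropy from co-located pixel values and combines them as the sum of the marginal entropies minus the joint entropy. The channels are given as flat lists of equal length; the function names and the toy channels are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(symbols):
    """Entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(a, b):
    """MI of two equally sized channels: H1(A) + H1(B) - H2(A, B)."""
    if len(a) != len(b):
        raise ValueError("channels must have the same number of pixels")
    joint = list(zip(a, b))  # co-located pixel pairs (x_a, x_b)
    return entropy(a) + entropy(b) - entropy(joint)


r = [0, 0, 255, 255]
g = [0, 0, 255, 255]   # identical to r: MI equals the entropy of r (1 bit)
b = [0, 255, 0, 255]   # statistically independent of r: MI is 0
mi_rg = mutual_information(r, g)
mi_rb = mutual_information(r, b)
```

The two toy cases mark the extremes: fully synchronized channels share all of their information, while independent channels share none.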

Log-Gabor Filtering
It is known that the log-Gabor filter function conforms to the HVS and is consistent with the symmetry of the cellular response of the human eye at logarithmic frequency scales [55]. The log-Gabor filter eliminates the DC component, overcomes the bandwidth limitation of the conventional Gabor filter, and has a frequency response that is Gaussian on a logarithmic frequency axis [45]. Thus, it is much easier, as well as more efficient, for a log-Gabor filter to extract information on a higher band. The transfer function of a two-dimensional log-Gabor filter can be expressed as:
$$G(f, \theta) = \exp\left(-\frac{\left(\log(f/f_0)\right)^2}{2\sigma_r^2}\right) \exp\left(-\frac{(\theta - \theta_0)^2}{2\sigma_\theta^2}\right) \tag{6}$$
In Eq. (6), $f_0$ gives the center frequency and $\theta_0$ represents the center orientation. $\sigma_r$ and $\sigma_\theta$ are the width parameters for the frequency and the orientation, respectively.
We extract the features in the frequency domain by convolving the image with the log-Gabor filters. The log-Gabor filter bank designed in this study covers four orientations (0°, 45°, 90°, and 135°) and two frequency bands. Filtering the input image therefore yields eight sub-band images, one for each combination of orientation and band.
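To make the structure of the filter bank concrete, the following sketch evaluates a log-Gabor transfer function at a single polar frequency coordinate and enumerates the eight (band, orientation) combinations. It assumes the commonly used form G(f, θ) = exp(-(log(f/f0))² / (2σ_r²)) · exp(-(θ - θ0)² / (2σ_θ²)) with the angular difference wrapped to [-π, π]. The center frequencies and width parameters are hypothetical placeholders, since the text does not state their values.

```python
import math


def log_gabor(f, theta, f0, theta0, sigma_r, sigma_theta):
    """Log-Gabor transfer function at radial frequency f and orientation theta."""
    if f <= 0:
        return 0.0  # the filter has no response at DC
    radial = math.exp(-(math.log(f / f0)) ** 2 / (2 * sigma_r ** 2))
    # Wrap the angular difference into [-pi, pi] before applying the Gaussian.
    d = math.atan2(math.sin(theta - theta0), math.cos(theta - theta0))
    angular = math.exp(-(d ** 2) / (2 * sigma_theta ** 2))
    return radial * angular


# Four orientations from the text; the remaining values are placeholders.
orientations = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
center_freqs = [1 / 6, 1 / 3]       # hypothetical, in cycles per pixel
SIGMA_R, SIGMA_THETA = 0.6, 0.7     # hypothetical width parameters

bank = [(f0, th0) for f0 in center_freqs for th0 in orientations]
peaks = [log_gabor(f0, th0, f0, th0, SIGMA_R, SIGMA_THETA) for f0, th0 in bank]
```

Each filter reaches its maximum response of 1 at its own center frequency and orientation. In a full pipeline the transfer function would be sampled on the two-dimensional frequency grid of the image and applied in the Fourier domain, which is omitted here.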

Experimental Results
In order to assess the statistical performance of the proposed method, we carried out experiments on the LIVE [47] and TID2013 [48] databases. The LIVE database consists of 29 reference images and 779 distorted images of five distortion categories, while the TID2013 database contains 25 reference images and 3000 distorted images of 24 distortion categories. Of these 25 images, only 24 are natural images, so we only used the 24 natural images in the testing. At the same time, in order to ensure the consistency of the training and testing, we carried out the cross-database testing only over the four distortion categories in common with the LIVE database, namely, JP2K, JPEG, WN, and GBlur.
The indices used to measure the performance of the proposed method are the Spearman's rank-order correlation coefficient (SROCC), the Pearson linear correlation coefficient (PLCC), and the root-mean-square error (RMSE) between the predicted DMOS and the ground-truth DMOS [56]. A value close to 1 for SROCC and PLCC and a value close to 0 for RMSE indicate better correlation with human perception. It is worth noting that PLCC and RMSE were computed after the predicted DMOS values were fitted by a nonlinear logistic regression function with five parameters [50]:
$$f(z) = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left(\beta_2 (z - \beta_3)\right)} \right) + \beta_4 z + \beta_5 \tag{7}$$
where $z$ is the objective IQA score, $f(z)$ is the IQA regression fitting score, and $\beta_i$ ($i = 1, 2, \cdots, 5$) are the parameters of the regression function.
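The performance indices can be sketched in pure Python as follows. The five-parameter logistic mapping is written in its widely used form, f(z) = β1(1/2 - 1/(1 + exp(β2(z - β3)))) + β4·z + β5; the fitting of the β parameters (a nonlinear least-squares problem) is not shown, and all function names are illustrative.

```python
import math


def logistic5(z, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping applied before PLCC and RMSE."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (z - b3)))) + b4 * z + b5


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))


def rmse(x, y):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

Since SROCC depends only on the rank order, it is unaffected by the logistic mapping whenever the fitted function is monotonic, which is why only PLCC and RMSE are computed after the fit.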

Correlation of Feature Vectors with Human Opinion
In this experiment, we assessed the discriminatory power of different feature combinations. With the feature groups listed in Table 1, we visually illustrate the relationship between image quality and features in the form of two-dimensional/three-dimensional scatter plots. As shown in Fig. 4, the different feature combinations are used as the axes, and each image in the LIVE database corresponds to a scatter point in the coordinate system. Furthermore, we use different markings to distinguish the five types of distortion and map the score of each image to the preset colormap. The ideal case is that the points with different distortion types are well separated. In this paper, we selected only a few representative images as examples. It can be seen from Fig. 4(a) and 4(b) that the scatter points of JPEG and WN have a very different spatial distribution than the other points, which allows them to be better distinguished. From Fig. 4(c) and 4(d), we can see that GBlur can be distinguished, to some extent, from the other types of distortion. However, for GBlur points with lower distortion levels, they cannot be easily separated from FF and JP2K, since the distributions of the scatter points of these three distortion types are very similar. As can be observed in Fig. 4(e) and 4(f), images with higher distortion levels of WN, GBlur, and FF are more easily distinguished from images with good quality. Nonetheless, GBlur and FF are indistinguishable. And still, JP2K points cause the reduction of distinguishability, as some of them are scattered close to the highly-distorted GBlur and FF points. According to Fig. 4, the number of features we selected seems too small to distinguish all the distortion types. Due to the limitation of human spatial cognition, it is difficult for us to show the discriminative ability of the features in a graphical way, such as a four-dimensional scatter plot, by selecting feature combinations of a higher dimension. 
In Section 3.6, we show that when more features are selected (we actually used a 56-dimensional feature vector), the discriminatory power of the feature vector on the distortion type is further enhanced, which supports the accuracy and reliability of our selection of features.

Correlation of Individual Feature Vectors with Human Perception
In order to quantitatively study the predictive ability of each feature vector, we performed a recombination of the features in Table 1, separately deployed specific subsets (feature vectors), and designed three limited models: 1) The feature vector $f_1$ to $f_6$ represents the MI between the three color channels on two scales, denoted as $\mathrm{ENIQA}_1$. 2) The feature vector $f_7$ to $f_{42}$ represents the mean and skewness of the TE on two scales, denoted as $\mathrm{ENIQA}_2$. 3) The feature vector $f_{43}$ to $f_{56}$ represents the MI between the sub-band images on two scales, denoted as $\mathrm{ENIQA}_3$.
We assessed these three limited models by 1000 train-test iterations of cross-validation. In each iteration, we randomly split the LIVE [47] database into two non-overlapping sets: a training set comprising 80% of the reference images as well as their corresponding distorted counterparts, and a test set composed of the remaining 20%. Finally, the median SROCC, PLCC, and RMSE values over the 1000 trials are reported as the final performance indices, as shown in Tables 2-4. It is not difficult to see that each feature vector has a different degree of correlation with the subjective assessment. Among them, the TE contributes the most to the performance of the method, followed by the MI between the sub-band images. Although the MI between the color channels contributes the least, it is a valuable extension of the TE feature.
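The content-separated split used in each iteration can be sketched as below. The split is drawn over reference images, so that all distorted versions of one reference fall on the same side. The mapping `ref_of_image` and the image names are hypothetical stand-ins for the database annotations.

```python
import random


def split_by_reference(ref_of_image, train_frac=0.8, rng=None):
    """Split images so that no reference content appears in both sets.

    `ref_of_image` maps each distorted image name to the identifier of its
    reference image.
    """
    if rng is None:
        rng = random.Random()
    refs = sorted(set(ref_of_image.values()))
    rng.shuffle(refs)
    n_train = round(train_frac * len(refs))
    train_refs = set(refs[:n_train])
    train = [img for img, ref in ref_of_image.items() if ref in train_refs]
    test = [img for img, ref in ref_of_image.items() if ref not in train_refs]
    return train, test


# Hypothetical database: 10 reference images with 3 distorted versions each.
ref_of_image = {f"img{r}_{d}": r for r in range(10) for d in range(3)}
train, test = split_by_reference(ref_of_image, 0.8, random.Random(0))
```

Repeating this split with fresh random draws and taking the median of each index over all trials (for example with `statistics.median`) gives the kind of summary reported in the tables.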

Comparison with Other IQA Methods
To further illustrate the performance of the proposed method, we compared ENIQA with 10 other state-of-the-art IQA methods. The three FR-IQA approaches were the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) [9], and visual information fidelity (VIF) [57], and the seven NR-IQA approaches were BIQI [30], DIIVINE [34], BLIINDS-II [37], BRISQUE [31], NIQE [36], IL-NIQE [32], and SSEQ [33]. To make a fair comparison, we used the same 80% training/20% testing protocol over 1000 iterations on all the models. The source code of all the methods was provided by their respective authors. In the training of an NR model, the LIBSVM toolkit [58] was used to implement SVC and SVR, both adopting a radial basis function (RBF) kernel. We selected an ε-SVR model for regression, and both the cost and the γ for RBF were set to 1e-4. Since the FR approaches do not require a training procedure, they were only performed on distorted images, i.e., the reference images were not included. For the results listed in Tables 5-7, the top performances in the FR-IQA indices and those in the NR-IQA indices are highlighted in bold. For the NR-IQA indices, we have also underlined the second-best results. It can be seen that the proposed ENIQA method performs well on the LIVE database. To be specific, ENIQA obtains the highest SROCC value for JPEG and WN, and the second-highest overall SROCC value among the NR methods listed in Table 5. In terms of PLCC and RMSE, ENIQA is superior to all the other NR methods, except BRISQUE, on JPEG and WN, and also ranks second in overall performance. Generally speaking, the overall performance of the proposed ENIQA method is superior to most of the other NR methods, and is even ahead of some of the classic FR methods such as SSIM. In addition, ENIQA is particularly good at evaluating images with JPEG and WN distortions.

Variation with Window Size
As mentioned above, since the local saliency difference of the image is considered, the proposed ENIQA method blocks the image with a window and counts the frequency of the gray values in each block to generate feature pairs before calculating the local TE. Table 8 shows the effect of different window sizes on the performance of the proposed method, where the highest SROCC value of each column is highlighted in bold. The average time consumption for evaluating a single image is also reported in Table 8. All the experiments were performed on a PC with an Intel i7-6700K CPU @ 4.0 GHz, 16 GB RAM, and MATLAB R2016a. The elapsed time is the mean value measured over 10 evaluations of the same 384×512×3 image. In order to visualize the trend, we also drew two line charts in Fig. 5, which illustrate the change of the elapsed time and the SROCC value with the selected window size.
Figure 5 Line charts between the selected window size and the SROCC value as well as the average time consumed on evaluating a single image according to Table 8. When the window size is set to 8 × 8, the method achieves the best SROCC performance
It can be observed that the performance of the proposed method varies with the size of the window. As the window size increases, the SROCC value first increases and then decreases, reaching a peak at 8 × 8. At the same time, the runtime of the method mostly decreases monotonically with the increase of the window size. As a compromise, we used K = L = 8 in this study. It should be pointed out that the overall SROCC value is still maintained above 0.9 when the window size is 32 × 32, which implies that the window size can be appropriately increased to trade accuracy for real-time performance in time-critical applications.

Statistical Significance Testing
In order to compare the performance of the different methods in a more intuitive way, Fig. 6 shows a box plot of the SROCC distributions for the 11 IQA methods (including the proposed ENIQA method) across 1000 train-test trials, which provides key information about the location and dispersion of the data. Meanwhile, we performed a two-sample t-test [59] between the methods, and the results are shown in Table 9. The null hypothesis is that the mean correlation value of the row is equal to the mean correlation value of the column at the 95% confidence level. The alternative hypothesis is that the mean correlation value of the row is greater (or less) than the mean correlation value of the column. Table 9 indicates which row is statistically superior ('1'), statistically equivalent ('0'), or statistically inferior ('−1') to which column. Although BRISQUE and SSEQ are statistically superior to ENIQA in Table 9, it can be seen from Fig. 6 that ENIQA outperforms all the other FR and NR approaches, except BRISQUE, in terms of the median value.
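The row-versus-column decision reported in such a table can be sketched as below. The text does not state which variant of the two-sample t-test was used, so this sketch makes two assumptions: it uses the unequal-variance (Welch) statistic, and it replaces the t distribution by the standard normal distribution, which is a close approximation for the 1000 trials per method considered here. The sample values are synthetic.

```python
import math
from statistics import NormalDist, mean, variance


def compare_methods(row, col, alpha=0.05):
    """Return 1, 0, or -1 if `row` is superior, equivalent, or inferior.

    Two-sided test on the difference of mean correlation values.
    """
    diff = mean(row) - mean(col)
    se = math.sqrt(variance(row) / len(row) + variance(col) / len(col))
    if se == 0:
        return 0 if diff == 0 else (1 if diff > 0 else -1)
    t = diff / se
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    if t > crit:
        return 1
    if t < -crit:
        return -1
    return 0


# Synthetic SROCC samples for two hypothetical methods over 1000 trials.
a = [0.90 + 0.001 * (i % 10) for i in range(1000)]
b = [0.80 + 0.001 * (i % 10) for i in range(1000)]
result_ab = compare_methods(a, b)
result_ba = compare_methods(b, a)
```

Note that with 1000 trials even a small difference in the mean correlation becomes statistically significant, so the test outcome and the visual impression from the box plot can differ.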

Classification Performance Analysis
We analyzed the classification accuracy of ENIQA on the LIVE database based on the two-stage framework. The average classification accuracies for all the distortion types across 1000 random trials are listed in Table 10. It can be seen from Table 10 that when the feature dimensions reach 56, the classification accuracy of JP2K reaches 71.6369%, which is fairly acceptable. In Section 3.1, however, we showed that it is extremely difficult to distinguish JP2K images by low-dimensional feature vectors. Thus, we can speculate that in the 56-dimensional space composed of the features, the distorted images of the JP2K type are discernible by the hyperplane constructed by SVC. Furthermore, in order to visualize which distortion categories may be confused with each other, we plotted a confusion matrix [60], as shown in Fig. 7. Each value in the confusion matrix indicates the probability of the distortion category on the vertical axis being confused with that on the horizontal axis. The numerical values are the average classification accuracies of the 1000 random trials.
It can be seen from Table 10 and Fig. 7 that WN cannot easily be confused with the other distortion categories, while the other four distortion categories are more easily confused. As FF consists of JP2K followed by packet loss, it is understandable that FF distortion is more easily confused with JP2K compression distortion. From Fig. 3, we can also see that the TE distributions of WN and JPEG are very specific, while JP2K, GBlur, and FF have quite similar TE distributions, which results in them being more easily confused.
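A row-normalized confusion matrix of the kind described above can be computed with the following sketch. Each row corresponds to the true distortion category and each column to the predicted one, so the diagonal holds the per-class accuracies. The labels in the example are made up for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix.

    Entry [i][j] is the fraction of samples of class i predicted as class j.
    """
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


classes = ["JP2K", "FF"]
y_true = ["JP2K", "JP2K", "FF", "FF"]
y_pred = ["JP2K", "FF", "FF", "FF"]   # one JP2K image mistaken for FF
cm = confusion_matrix(y_true, y_pred, classes)
```

Averaging such matrices over all random trials gives the mean confusion rates between categories.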

Database Independence
In order to test the generalization ability of the assessment model to different samples, we trained the model on the whole LIVE database and tested it on the TID2013 database, using only the distortion types in common with the LIVE database (JP2K, JPEG, WN, and GBlur). The computed performance indices are shown in Table 11, and the top performances for the FR-IQA indices and those for the NR-IQA indices are highlighted in bold. For the NR-IQA indices, we have also underlined the second-best results. It is clear that the proposed ENIQA method remains competitive on TID2013, with a superior performance to all the other NR methods, including BRISQUE, which shows an excellent performance on the LIVE database. Fig. 8 shows the results of the scatter plot fitting of ENIQA on the LIVE and TID2013 databases. As in the previous experiments, when performing the scatter plot experiment on the LIVE database, we trained with a random 80% of the images, separated by content, and then tested with the remaining 20%, for which the results are shown in Fig. 8(a). When conducting the experiment on the TID2013 database, we trained the model on the entire LIVE database and then tested it on the selected portion of the TID2013 database, for which the results are given in Fig. 8(b). It can be observed from Fig. 8 that the scatter points are evenly distributed in the entire coordinate system and have a strong linear relationship with DMOS/MOS, which further supports the overall performance and generalization ability of the proposed ENIQA method.

Conclusions
In this paper, we have proposed a general-purpose NR-IQA method called entropy-based no-reference image quality assessment (ENIQA). Based on the concept of image entropy, ENIQA combines log-Gabor filtering and saliency detection for feature extension and accuracy improvement. To construct an effective feature vector, ENIQA extracts the structural information of the input color images, including the MI and the TE in both the spatial and the frequency domains. The image quality score is then predicted by SVC and SVR. The proposed ENIQA method was assessed on the LIVE and TID2013 databases, and we carried out cross-validation experiments and cross-database experiments to compare it with several other FR- and NR-IQA approaches. From the experiments, ENIQA showed a superior overall performance and generalization ability when compared to the other state-of-the-art methods.