A no-reference objective image quality metric based on perceptually weighted local noise

This work proposes a perceptually based no-reference objective image quality metric by integrating perceptually weighted local noise into a probability summation model. Unlike existing objective metrics, the proposed no-reference metric is able to predict the relative amount of noise perceived in images with different content, without a reference. Results are reported on both the LIVE and TID2008 databases. The proposed no-reference metric achieves consistently good performance across noise types and across databases as compared to many of the best very recent no-reference quality metrics. The proposed metric is able to predict with high accuracy the relative amount of perceived noise in images of different content.


Introduction
Reliable assessment of image quality plays an important role in meeting the promised quality of service (QoS) and in improving the end user's quality of experience (QoE). There is a growing interest in developing objective quality assessment algorithms that can predict perceived image quality automatically. These methods are highly useful in various image processing applications, such as image compression, transmission, restoration, enhancement, and display. For example, a quality metric can be used to evaluate and control the performance of individual system components in image/video processing and transmission systems.
One direct way to evaluate image quality is through subjective tests. In these tests, a group of human subjects is asked to judge the quality under a predefined viewing condition. The scores given by the observers are averaged to produce the mean opinion score (MOS). However, subjective tests are time-consuming, laborious, and expensive. Objective image quality assessment (IQA) methods can be categorized as full reference (FR), reduced reference (RR), and no reference (NR) depending on whether a reference, partial information about a reference, or no reference is used for calculation. Quality assessment without a reference is challenging. A no-reference metric is not computed relative to a reference image; rather, an absolute value is computed based on some characteristics of the test image.
Of particular interest to this work is the no-reference noisiness objective metric. Noisiness and blurriness are two key distortions in multiple applications, and typically there is a tradeoff between noisiness and blurriness. For example, in soft-thresholding for image denoising [1], the image could be blurry when the threshold is high, while the image could remain noisy when the threshold is low. Also, in Wiener-based super-resolution [2], too much regularization will result in less noise at the expense of more blur. The reconstructed image could be blurry when the auto-correlation function is modeled to be too flat, while the reconstructed image could be noisy when the auto-correlation function is modeled to be too sharp. No-reference image sharpness/blur metrics have been widely discussed [3,4]. However, these image sharpness/blur metrics typically fail in the presence of noise: the sharpness metric may increase when noise increases. A no-reference noise-immune image sharpness metric was also proposed [5]. Furthermore, all the edge-based sharpness metrics can be easily applied in the wavelet domain as described in [5] to provide resilience to noise. Still, these metrics lack the ability to assess the impairment due to noise. For visual quality assessment of noisiness, many full-reference metrics are presented in [6], such as peak signal-to-noise ratio (PSNR), multi-scale structural similarity (MS-SSIM) [7], noise quality measure (NQM) [8], and information fidelity criterion (IFC) [9]. http://jivp.eurasipjournals.com/content/2014/1/5 However, these full-reference metrics require the reference image for calculation. There is a need to develop a no-reference noisiness quality metric. Furthermore, such a noisiness metric could be combined with the no-reference blur metrics [3,4] to provide a better prediction of image quality for several applications including super-resolution, image restoration, and other multiply distorted images.
A global estimate of image noise variance was used as a no-reference noisiness metric in [10]. The histogram of the local noise variances is used to derive the global estimate. However, the locally perceived visibility of noise is not considered. Similarly in [11], noisiness is expressed by the sum of estimated noise amplitudes and the ratio of noise pixels. Neither of the metrics in [10,11] accounts for the effects of locally varying noise on the perceived noise impairment, and neither exploits the characteristics of the human visual system (HVS).
To tackle this issue, this paper first presents a full-reference image noisiness metric which integrates perceptually weighted local noise into a probability summation model. This proposed metric can predict the perceptual noisiness in images with high accuracy. In addition, a no-reference objective noisiness metric is derived based on local noise standard deviation, local perceptual weighting, and probability summation. The experimental results show that the proposed FR and NR metrics achieve better and more consistent performance across databases and distortion types when compared with several very recent FR and NR metrics.
The remainder of this paper is organized as follows. A perceived noisiness model based on probability summation is presented first followed by details on the contrast sensitivity thresholds computation. A full-reference perceptually weighted noise (FR-PWN) metric is proposed next based on perceptual weighting using the computed contrast sensitivity thresholds and probability summation. After that, a no-reference perceptually weighted noise (NR-PWN) metric is further derived. Performance results and comparison with existing metrics are presented followed by a conclusion.

Perceptual noisiness model based on probability summation
The PSNR simply calculates the difference point by point. However, the human visual system should be taken into consideration since the visual impairment due to the same noise could be perceived differently based on the local characteristics of the visual content. Contrast is a key concept in vision science because the information in the visual system is represented in terms of contrast and not in terms of the absolute level of light. So, the relative changes in luminance are important rather than the absolute ones [3]. The contrast sensitivity threshold measures the smallest contrast or the just-noticeable difference (JND) that yields a visible signal over a uniform background. The proposed metric makes use of JND for calculating the probability of noise detection. Even when the noise is uniform, the impact of the noise will be more visible in image regions with a relatively lower JND. Consider the noisy signal $y'$ as
$$y'(i,j) = y(i,j) + \mathrm{error}(i,j), \tag{1}$$
where $y(i,j)$ is the original undistorted image and $\mathrm{error}(i,j)$ is the noise at location $(i,j)$. The probability of detecting a noise distortion at location $(i,j)$ can be modeled as an exponential having the following form
$$P(i,j) = 1 - \exp\left(-\left|\frac{\mathrm{error}(i,j)}{\mathrm{JND}(i,j)}\right|^{\beta}\right), \tag{2}$$
where $\mathrm{JND}(i,j)$ is the JND value at $(i,j)$ and it depends on the mean intensity in a local neighborhood region surrounding pixel $(i,j)$. $\beta$ is a parameter whose value is chosen to maximize the correspondence of (2) with the experimentally determined psychometric function for noise detection. In psychophysical experiments that examine summation over space, a value of about 4 has been observed to correspond well to probability summation [12]. A less-localized probability of noise detection can be computed by adopting the 'probability summation' hypothesis, which pools the localized detection probabilities over a region of interest, $R$ [13].
The probability summation hypothesis is based on the following two assumptions: (1) a noise distortion is detected if and only if at least one detector senses the presence of a noise distortion; (2) the probabilities of detection are independent, i.e., the probability that a particular detector will signal the presence of a distortion is independent of the probability that any other detector will. The measurement of noise detection in a region $R$ is then given by
$$P_{\mathrm{noise}}(R) = 1 - \prod_{(i,j) \in R} \bigl(1 - P(i,j)\bigr). \tag{3}$$
Substituting (2) into (3) yields
$$P_{\mathrm{noise}}(R) = 1 - \exp\left(-D_R^{\beta}\right), \tag{4}$$
where
$$D_R = \left(\sum_{(i,j) \in R} \left|\frac{\mathrm{error}(i,j)}{\mathrm{JND}(i,j)}\right|^{\beta}\right)^{1/\beta}. \tag{5}$$
From (4), it can be seen that $P_{\mathrm{noise}}(R)$ increases if $D_R$ increases and vice versa. So $D_R$ can be used as a noisiness metric over region $R$. However, the probability of noise detection does not directly translate to noise annoyance level. In this work, the $\beta$ parameter in (4) and (5) is replaced with $\alpha = \beta \times s$, which has the effect of steering the slope of the psychometric function in order to translate noise detection levels into noise annoyance levels. The factor $s$ was found experimentally to be 1/16, resulting in a value of 0.25 for $\alpha$. More details about how $\mathrm{JND}(i,j)$ is computed are given in the section 'Perceptual contrast sensitivity threshold model and JND computation'.
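The pooling step can be checked numerically. The following is a minimal sketch, assuming the exponential detection probability 1 − exp(−|error/JND|^β) and the Minkowski sum D_R described above; the function names and the four-pixel example values are hypothetical and are not taken from the paper.

```python
import math


def detection_probability(error, jnd, beta=4.0):
    """Local probability of detecting noise: 1 - exp(-|error/jnd|^beta)."""
    return 1.0 - math.exp(-abs(error / jnd) ** beta)


def region_distortion(errors, jnds, beta=4.0):
    """Minkowski sum D_R = (sum |error/jnd|^beta)^(1/beta) over a region."""
    total = sum(abs(e / j) ** beta for e, j in zip(errors, jnds))
    return total ** (1.0 / beta)


def pooled_probability(errors, jnds, beta=4.0):
    """Probability summation: 1 - prod(1 - P(i, j)) over the region."""
    prod = 1.0
    for e, j in zip(errors, jnds):
        prod *= 1.0 - detection_probability(e, j, beta)
    return 1.0 - prod


# Hypothetical 4-pixel region: identical noise, two different JND levels.
errors = [1.0, -2.0, 0.5, 1.5]
low_jnd = [2.0] * 4
high_jnd = [6.0] * 4

p_low = pooled_probability(errors, low_jnd)
p_high = pooled_probability(errors, high_jnd)
d_low = region_distortion(errors, low_jnd)
```

With these definitions, the pooled probability equals 1 − exp(−D_R^β) up to rounding, and the same noise yields a higher detection probability in the low-JND region than in the high-JND region.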

Perceptual contrast sensitivity threshold model and JND computation
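Multiple parameters including the screen resolution, the viewing distance, the minimum display luminance, and the maximum display luminance are considered in the contrast sensitivity model [14]. The thresholds are computed locally for each block. First, the contrast sensitivity threshold $t_{128}$ is generated for a region with a mean grayscale value of 128 as follows:
$$t_{128} = \frac{T \, M_g}{L_{\max} - L_{\min}}, \tag{6}$$
where $L_{\min}$ and $L_{\max}$ are the minimum and maximum display luminances, $M_g$ is the total number of grayscale levels, and $T$ is given by the following parabolic approximation [15]:
$$T = \min\left(10^{g_{0,1}}, 10^{g_{1,0}}\right), \tag{7}$$
$$g_{0,1} = \log_{10} T_{\min} + K \left(\log_{10} \frac{1}{2N\omega_y} - \log_{10} f_{\min}\right)^2, \tag{8}$$
$$g_{1,0} = \log_{10} T_{\min} + K \left(\log_{10} \frac{1}{2N\omega_x} - \log_{10} f_{\min}\right)^2. \tag{9}$$
In (8) and (9), $T_{\min}$ is the luminance threshold at the frequency $f_{\min}$ where the threshold is minimum, $\omega_x$ and $\omega_y$ represent, respectively, the horizontal width and the vertical height of a pixel in degrees of visual angle, and $K$ is the steepness of the parabola. $N$ is the local neighborhood size and is set to 8. $T_{\min}$, $f_{\min}$, and $K$ can be computed as [15]:
$$T_{\min} = \begin{cases} \dfrac{L_T}{S_0}\left(\dfrac{L}{L_T}\right)^{\alpha_T}, & L \le L_T \\ \dfrac{L}{S_0}, & L > L_T \end{cases} \tag{10}$$
$$f_{\min} = \begin{cases} f_0 \left(\dfrac{L}{L_f}\right)^{\alpha_f}, & L \le L_f \\ f_0, & L > L_f \end{cases} \tag{11}$$
$$K = \begin{cases} K_0 \left(\dfrac{L}{L_K}\right)^{\alpha_K}, & L \le L_K \\ K_0, & L > L_K \end{cases} \tag{12}$$
The values of the constants in (10) to (12) are [15] $L_T = 13.45$ cd/m², $S_0 = 94.7$, $\alpha_T = 0.649$, $\alpha_f = 0.182$, $f_0 = 6.78$ cycles/deg, $L_f = 300$ cd/m², $K_0 = 3.125$, $\alpha_K = 0.0706$, and $L_K = 300$ cd/m². Equations 10 to 12 give $T_{\min}$, $f_{\min}$, and $K$ as functions of the local background luminance $L$. For a background intensity value of 128, given a gamma-corrected display, the corresponding local background luminance is computed as follows:
$$L = L_{\min} + 128 \, \frac{L_{\max} - L_{\min}}{M_g}, \tag{13}$$
where $L_{\min}$ and $L_{\max}$ denote the minimum and maximum luminances of the display. Once the JND for a region with a mean grayscale value of 128, $t_{128}$, is calculated using (6), the JND for regions with other mean grayscale values is approximated as follows [16]:
$$\mathrm{JND}(i,j) = t_{128} \left(\frac{\sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} I_{n_1,n_2}}{N^2 \cdot 128}\right)^{\alpha_T} = t_{128} \left(\frac{\mathrm{Mean}(I_{n_1,n_2})}{128}\right)^{\alpha_T}, \tag{14}$$
where $I_{n_1,n_2}$ is the intensity level at pixel location $(n_1, n_2)$ in an $N \times N$ region surrounding pixel $(i,j)$.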
It should be noted that the indices $(n_1, n_2)$ are used to denote the location with respect to the top left corner of the $N \times N$ region, while the indices $(i,j)$ are used to denote the location with respect to the top left corner of the whole image. $\mathrm{Mean}(I_{n_1,n_2})$ is the mean value over the considered $N \times N$ region surrounding pixel $(i,j)$. $\alpha_T$ is a correction exponent that controls the degree to which luminance masking occurs and is set to $\alpha_T = 0.649$, as given in [16]. $\mathrm{JND}(i,j)$ in (5) is computed using (14). In our implementation, $N = 8$ was used for the $N \times N$ region.
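The threshold computation can be sketched in a few lines. This is an illustrative sketch only. It assumes that the parabolic model takes the piecewise power-law form commonly associated with [15], that T is the smaller of the horizontal and vertical thresholds, and that the luminance threshold is converted to gray levels by the factor M_g / (L_max − L_min). The display values in the example (L_min = 0 cd/m², 33 pixels per degree) are hypothetical, and all function names are invented for illustration.

```python
import math

# Constants of the parabolic threshold model, as listed in the text.
L_T, S_0, A_T = 13.45, 94.7, 0.649
F_0, A_F, L_F = 6.78, 0.182, 300.0
K_0, A_K, L_K = 3.125, 0.0706, 300.0


def parabola_params(lum):
    """T_min, f_min and K as functions of background luminance (cd/m^2)."""
    t_min = (L_T / S_0) * (lum / L_T) ** A_T if lum <= L_T else lum / S_0
    f_min = F_0 * (lum / L_F) ** A_F if lum <= L_F else F_0
    k = K_0 * (lum / L_K) ** A_K if lum <= L_K else K_0
    return t_min, f_min, k


def threshold_128(l_min, l_max, omega_x, omega_y, n=8, m_g=256):
    """Contrast sensitivity threshold (gray levels) at mean intensity 128."""
    lum = l_min + 128.0 * (l_max - l_min) / m_g
    t_min, f_min, k = parabola_params(lum)

    def g(omega):
        freq = 1.0 / (2.0 * n * omega)  # lowest frequency of the N-pixel block
        return math.log10(t_min) + k * (math.log10(freq) - math.log10(f_min)) ** 2

    t_lum = min(10.0 ** g(omega_y), 10.0 ** g(omega_x))
    return t_lum * m_g / (l_max - l_min)


def jnd_from_mean(mean_intensity, t_128, a_t=A_T):
    """Luminance-masking adjustment of t_128 for another local mean."""
    return t_128 * (mean_intensity / 128.0) ** a_t


# Hypothetical display: 0 to 175 cd/m^2, square pixels of 1/33 degree.
t128 = threshold_128(0.0, 175.0, 1.0 / 33.0, 1.0 / 33.0)
jnd_dark = jnd_from_mean(64.0, t128)
jnd_bright = jnd_from_mean(200.0, t128)
```

Under these assumptions, the JND grows with the local mean intensity, so the same noise is predicted to be more visible in darker neighborhoods. A local mean of 0 gives a JND of 0 and would need separate handling before any division by the JND.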

Full-reference noisiness metric
This work first presents a full-reference noisiness metric based on the probability summation model presented in the previous sections. Figure 1 shows the block diagram of the proposed full-reference FR-PWN metric. The input image is first divided into blocks of size $M \times M$. Each block is a region of interest $R_b$. The block size is chosen to correspond with the foveal region. Let $r$ be the visual resolution of the display in pixels per degree, $v$ the viewing distance in centimeters, and $d$ the display resolution in pixels per centimeter. Then the visual resolution can be calculated as follows [17]:
$$r = d \cdot v \cdot \tan\left(\frac{\pi}{180}\right). \tag{15}$$
In the HVS, the foveal region has the highest visual acuity and corresponds to about 2° of visual angle. The number of pixels contained in the foveal region can be computed as $(2r)^2$ [17]. For example, for a viewing distance of 60 cm and a 31.5 pixels/cm display, the number of pixels contained in the foveal region is approximately $(64)^2$, corresponding to a block size of 64×64. Using (5), the perceived noise distortion within a block $R_b$ is given by
$$D_{R_b} = \left(\sum_{(i,j) \in R_b} \left|\frac{\mathrm{error}(i,j)}{\mathrm{JND}(i,j)}\right|^{\alpha}\right)^{1/\alpha}, \tag{16}$$
where $\mathrm{JND}(i,j)$ is the JND at location $(i,j)$ and is computed using (14). Using the probability summation model as discussed previously, the noisiness measure $D$ for the whole image $I$ is obtained by using a Minkowski metric for inter-block pooling as follows:
$$D = \left(\sum_{R_b} \left|D_{R_b}\right|^{\alpha}\right)^{1/\alpha}. \tag{17}$$
The resulting distortion measure, $D$, normalized by the number of blocks, is adopted as the proposed full-reference metric FR-PWN. This full-reference metric not only works for noisiness, but could also work for other additive distortions.
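The block-based pooling can be illustrated as follows. This is a minimal sketch under stated assumptions: the per-pixel JND values are supplied as a precomputed map, the visual resolution is taken as d · v · tan(1°), and "normalized by the number of blocks" is read as dividing the pooled value by the block count. The names `foveal_block_size` and `fr_pwn` and the tiny 4×4 test images are hypothetical.

```python
import math


def foveal_block_size(view_cm, px_per_cm, fovea_deg=2.0):
    """Side length in pixels of a region spanning `fovea_deg` degrees."""
    px_per_degree = px_per_cm * view_cm * math.tan(math.pi / 180.0)
    return fovea_deg * px_per_degree


def fr_pwn(reference, test, jnd_map, block=64, alpha=0.25):
    """Full-reference noisiness: Minkowski pooling within and across blocks."""
    rows, cols = len(reference), len(reference[0])
    block_terms = []
    for top in range(0, rows, block):
        for left in range(0, cols, block):
            acc = 0.0
            for i in range(top, min(top + block, rows)):
                for j in range(left, min(left + block, cols)):
                    err = test[i][j] - reference[i][j]
                    acc += abs(err / jnd_map[i][j]) ** alpha
            d_block = acc ** (1.0 / alpha)       # distortion within the block
            block_terms.append(d_block ** alpha)  # term for inter-block pooling
    d_image = sum(block_terms) ** (1.0 / alpha)
    return d_image / len(block_terms)


side = foveal_block_size(60.0, 31.5)  # close to the 64-pixel block of the text

# Hypothetical flat 4x4 image with alternating +/- noise of two amplitudes.
ref = [[100.0] * 4 for _ in range(4)]
sign = [[1.0 if (i + j) % 2 else -1.0 for j in range(4)] for i in range(4)]
weak = [[ref[i][j] + 1.0 * sign[i][j] for j in range(4)] for i in range(4)]
strong = [[ref[i][j] + 4.0 * sign[i][j] for j in range(4)] for i in range(4)]
jnd_map = [[4.0] * 4 for _ in range(4)]

d_clean = fr_pwn(ref, ref, jnd_map, block=2)
d_weak = fr_pwn(ref, weak, jnd_map, block=2)
d_strong = fr_pwn(ref, strong, jnd_map, block=2)
```

Because α is small (0.25), the outer exponent 1/α = 4 makes the raw values grow quickly with image size, which is one reason a normalization by the number of blocks is useful when comparing images.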

No-reference noisiness metric
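In the previous section, a full-reference quality metric is presented based on the probability summation model and JND. However, in many cases, the reference image is not available, so $\mathrm{error}(i,j)$ in (16) cannot be computed. Therefore, there is a need to develop a no-reference noisiness quality metric. Figure 2 shows the block diagram which summarizes the proposed no-reference NR-PWN metric. From (14), it can be seen that $\mathrm{JND}(i,j)$ depends on the local mean of the neighborhood surrounding $(i,j)$. For the proposed NR metric, the local mean for a pixel $(i,j)$ belonging to a region $R_N$ is taken to be the mean of region $R_N$ and is denoted by $\mathrm{mean}(R_N)$. Consequently, Equation 14 can be written as follows:
$$\mathrm{JND}(R_N) = t_{128} \left(\frac{\mathrm{mean}(R_N)}{128}\right)^{\alpha_T}. \tag{18}$$
Now only one $\mathrm{JND}(R_N)$ is calculated for all pixels $(i,j)$ belonging to the same $R_N$, and a different $\mathrm{JND}(R_N)$ is calculated separately for each $R_N$ within the considered region of interest block $R_b$. The size of the block $R_b$ is chosen to approximate a foveal region (e.g., 64 × 64 as discussed previously). Using $p, q$ as the indices within a local neighborhood $R_N$, the proposed NR metric is derived from the presented FR metric (16) as follows:
$$D_{R_b} = \left(\sum_{R_N \in R_b} \frac{\sum_{(p,q) \in R_N} \left|\mathrm{error}(p,q)\right|^{\alpha}}{\mathrm{JND}(R_N)^{\alpha}}\right)^{1/\alpha}. \tag{19}$$
In (19), $\sum_{(p,q) \in R_N} |\mathrm{error}(p,q)|^{\alpha} \approx N^2 \, E\left[|\mathrm{error}(p,q)|^{\alpha}\right]$, where $N \times N$ is the size of each local neighborhood $R_N$. Also, if $\mathrm{error}(p,q)$ is a Gaussian process with a mean of 0 and a standard deviation of $\sigma_{R_N}$, using the central absolute moments of a Gaussian distribution [18], it can be shown that
$$E\left[\left|\mathrm{error}(p,q)\right|^{\alpha}\right] = \sigma_{R_N}^{\alpha} \, \frac{2^{\alpha/2} \, \Gamma\left(\frac{\alpha+1}{2}\right)}{\sqrt{\pi}}, \tag{20}$$
where $\Gamma(t)$ is the gamma function
$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x} \, dx. \tag{21}$$
Using (20), $D_{R_b}$ in (19) can be written as follows:
$$D_{R_b} = \left(\sum_{R_N \in R_b} N^2 \, \frac{2^{\alpha/2} \, \Gamma\left(\frac{\alpha+1}{2}\right)}{\sqrt{\pi}} \, \frac{\sigma_{R_N}^{\alpha}}{\mathrm{JND}(R_N)^{\alpha}}\right)^{1/\alpha}. \tag{22}$$
For a given $\alpha$, define a constant $C$ as
$$C = \left(N^2 \, \frac{2^{\alpha/2} \, \Gamma\left(\frac{\alpha+1}{2}\right)}{\sqrt{\pi}}\right)^{1/\alpha}. \tag{23}$$
Then, the proposed NR noisiness metric over the region $R_b$ is given by
$$D_{R_b} = C \left(\sum_{R_N \in R_b} \left(\frac{\sigma_{R_N}}{\mathrm{JND}(R_N)}\right)^{\alpha}\right)^{1/\alpha}. \tag{24}$$
As in (17), the noisiness metric over the image $I$ can be computed as follows:
$$D = \left(\sum_{R_b} \left|D_{R_b}\right|^{\alpha}\right)^{1/\alpha}. \tag{25}$$
The resulting noise measure $D$, normalized by the number of blocks, is adopted as the proposed no-reference NR-PWN metric.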
In (24), the noise standard deviation $\sigma_{R_N}$ is estimated directly from the test image, without the reference image. Multiple methods are available to estimate the noise variance, such as fast noise variance estimation (FNV) [19] and the generalized cross validation (GCV)-based method [20,21]. In our implementation, the GCV method was used for computing the local noise variance. Similar results were also obtained using the FNV [19] noise estimation method.
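The no-reference block measure can be sketched from the Gaussian absolute-moment identity E|X|^α = σ^α · 2^(α/2) · Γ((α+1)/2) / √π. The sketch below is illustrative only. The way the constant is folded into `nr_block_distortion` is one possible arrangement, and `laplacian_sigma` is a simple Laplacian-mask noise estimator in the style of fast noise variance estimation; whether it matches the FNV implementation of [19] in every detail is an assumption, and it is not the GCV method used for the reported results. The synthetic test image is hypothetical.

```python
import math
import random


def abs_moment_factor(alpha):
    """E|X|^alpha / sigma^alpha for a zero-mean Gaussian X."""
    return 2.0 ** (alpha / 2.0) * math.gamma((alpha + 1.0) / 2.0) / math.sqrt(math.pi)


def nr_block_distortion(sigmas, jnds, n=8, alpha=0.25):
    """Block noisiness from per-neighborhood noise std and JND values."""
    c = (n * n * abs_moment_factor(alpha)) ** (1.0 / alpha)
    pooled = sum((s / j) ** alpha for s, j in zip(sigmas, jnds))
    return c * pooled ** (1.0 / alpha)


def laplacian_sigma(img):
    """Noise std estimate from the mean absolute Laplacian-mask response."""
    mask = ((1, -2, 1), (-2, 4, -2), (1, -2, 1))
    rows, cols = len(img), len(img[0])
    total = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            resp = sum(mask[a][b] * img[i - 1 + a][j - 1 + b]
                       for a in range(3) for b in range(3))
            total += abs(resp)
    # The mask has squared-coefficient sum 36, so its response has std 6*sigma.
    return math.sqrt(math.pi / 2.0) * total / (6.0 * (rows - 2) * (cols - 2))


# Hypothetical flat 64x64 image with additive Gaussian noise of std 5.
rng = random.Random(0)
noisy = [[128.0 + rng.gauss(0.0, 5.0) for _ in range(64)] for _ in range(64)]
sigma_hat = laplacian_sigma(noisy)

# Same noise level in every neighborhood, two different JND levels.
d_low_jnd = nr_block_distortion([sigma_hat] * 64, [3.0] * 64)
d_high_jnd = nr_block_distortion([sigma_hat] * 64, [6.0] * 64)
```

The factor reduces to 1 for α = 2 (the variance) and to √(2/π) for α = 1 (the mean absolute deviation), which gives a quick sanity check of the identity.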

Performance results
The performance of the proposed FR-PWN and NR-PWN metrics is assessed using the LIVE [6] and TID2008 [22] databases. The LIVE database [6] consists of 29 RGB color images. The images are distorted using different distortion types: JPEG2000, JPEG, Gaussian blur, white noise, and bit errors. The difference mean opinion score (DMOS) for each image is provided. The white noise part of the LIVE database includes 174 images with a noise standard deviation ranging from 0 to 2. White noise was added to the RGB components of the images after scaling between 0 and 1. All of the white noise images (174 images) from the LIVE database are used in our experiments. The TID2008 database [22] consists of 25 reference images (512 × 384) and 1,700 distorted images. The images are distorted using 17 types of distortions, including additive Gaussian noise, high-frequency noise, JPEG2000, and Gaussian blur. The MOS was obtained using a total of 838 observers with 256,428 comparisons of the visual quality of distorted images. All of the additive Gaussian noise images (100 images) and high-frequency noise images (100 images) from the TID2008 database are used in our experiments. As mentioned in [22], additive zero-mean noise is often present in images and it is commonly modeled as white Gaussian noise. This type of distortion is included in most studies of quality metric effectiveness. High-frequency noise is an additive non-white noise which can be used for analyzing the spatial frequency sensitivity of the HVS [23]. High-frequency noise is typical in lossy image compression and watermarking.
To measure how well the proposed metrics correlate with the provided subjective scores, the correlation coefficients adopted by VQEG [24] are used, including the Pearson's linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC). A four-parameter logistic function as suggested in [24] is used prior to computing the Pearson's linear correlation coefficient:
$$\mathrm{MOS}_{P_i} = \frac{\beta_1 - \beta_2}{1 + \exp\left(-\frac{M_i - \beta_3}{|\beta_4|}\right)} + \beta_2, \tag{26}$$
where $M_i$ is the quality metric for image $i$ and $\mathrm{MOS}_{P_i}$ is the predicted MOS or DMOS. Figure 3 shows the DMOS score and predicted DMOS obtained using NR-PWN for the LIVE database. Table 1 shows the evaluation results for the LIVE database. In addition to the proposed FR-PWN and NR-PWN metrics, the performance results of various existing metrics are presented for comparison, including seven full-reference metrics, DCTune [25], picture quality scale (PQS) [26], NQM [8], Fuzzy S7 [27], blockwise spectral distance measure (BSDM) [28], MS-SSIM [7], and IFC [9]; one reduced-reference metric, quality-aware images (QAI) [29]; and seven no-reference metrics, blind image integrity notator using DCT statistics (BLINDS-II) (SVM) [30], BLINDS-II (Prob.) [30], hybrid no-reference (HNR) [31], blind/referenceless image spatial quality evaluator (BRISQUE) [32], naturalness image quality evaluator (NIQE) [33], blind image quality index (BIQI) [34], and learning a blind measure of perceptual image quality (LBIQ) [35]. The benchmarks of the full-reference metrics are obtained from [6], and the others are obtained from their respective authors or available implementations. An 'N/A' entry in Table 1 means that the value is not provided in the literature.

Figure 3
Correlation between the predicted score of NR-PWN and DMOS using the LIVE database.
Table 2 shows the performance of the proposed FR-PWN and NR-PWN metrics using images with different types of distortion as provided by the TID2008 database [22]. The proposed metric is compared with three full-reference metrics, DCTune [25], NQM [8], and MS-SSIM [7], and six very recent no-reference metrics that reported results for TID2008: BLINDS-II (SVM) [30], BLINDS-II (Prob.) [30], BRISQUE [32], NIQE [33], general regression neural network (GRNN) [36], and Li et al. [37]. The benchmarks of the full-reference metrics are obtained from [22], and the others are obtained from their respective authors or available implementations. An 'N/A' entry in Table 2 means that the value is not provided in the literature. The proposed metric uses the same parameters as used with the LIVE database without any training.
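The evaluation protocol can be sketched as follows. This is a minimal illustration, assuming the common four-parameter logistic form (β1 − β2) / (1 + exp(−(M − β3)/|β4|)) + β2. In practice the four parameters are fitted to the subjective scores by nonlinear least squares; here they are fixed, hypothetical values, and the metric and DMOS numbers are synthetic, not taken from LIVE or TID2008.

```python
import math


def logistic4(m, b1, b2, b3, b4):
    """Four-parameter logistic mapping from metric value to predicted score."""
    return (b1 - b2) / (1.0 + math.exp(-(m - b3) / abs(b4))) + b2


def pearson(x, y):
    """Pearson's linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            out[k] = (pos + end) / 2.0 + 1.0
        pos = end + 1
    return out


def spearman(x, y):
    """Spearman rank-order correlation coefficient (SROCC)."""
    return pearson(ranks(x), ranks(y))


# Synthetic example values (not from any database).
metric = [0.5, 1.0, 2.0, 4.0, 8.0]
dmos = [20.0, 28.0, 41.0, 55.0, 63.0]
params = (70.0, 15.0, 2.5, 1.5)  # hypothetical, would normally be fitted

predicted = [logistic4(m, *params) for m in metric]
plcc = pearson(predicted, dmos)
srocc = spearman(metric, dmos)
```

SROCC depends only on rank order, so it is unaffected by the monotonic logistic mapping; the mapping matters only for PLCC, which is why it is applied before computing the linear correlation.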
From Table 1, it can be observed that the proposed FR-PWN metric outperforms the existing FR metrics for the LIVE database while achieving a performance similar to that of the NQM [8] metric. Table 2 shows that the proposed FR-PWN metric outperforms the existing FR metrics for the TID2008 database, on both Gaussian noise and high-frequency noise. The proposed NR-PWN metric comes close in performance to the proposed FR-PWN metric for both the LIVE and the TID2008 databases. In particular, Table 1 shows that the proposed NR-PWN metric performs better than existing NR metrics except for the BLINDS-II and BRISQUE metrics in terms of PLCC. For the LIVE database, the proposed NR-PWN metric outperforms all the considered NR metrics in terms of SROCC, and even the existing FR metrics except the full-reference NQM [8]. Table 2 shows that the proposed NR-PWN metric surpasses existing NR metrics except BRISQUE [32] for additive Gaussian noise, and that it significantly outperforms existing FR and NR metrics for high-frequency noise. In particular, it should be noted that the performance of BRISQUE [32] drops dramatically on high-frequency noise and is significantly lower than that of the proposed metric. In addition, many of the shown state-of-the-art metrics including BLINDS-II [30], NIQE [33], and BRISQUE [32] use 80% of the data for training [30,32,33]. Consequently, these may not perform well on new distortions outside the training set, such as high-frequency noise (Table 2). In contrast, the proposed NR-PWN does not require training and still performs well on this new distortion.
Furthermore, it is worth indicating that, as shown in Tables 1 and 2, the existing metrics exhibit differences in performance across different databases and types of distortions. It is noted in [38] that the performance of many image quality metrics could be quite different across databases. The difference in performance can be attributed to the differences in quality range, distortions, and contents across databases. Despite this, the results obtained show that the proposed FR-PWN and NR-PWN metrics achieve consistently good performance across noise types (white noise and high-frequency noise) and across databases as compared to the existing quality metrics. For example, the proposed FR-PWN metric exhibits a performance similar to NQM [8] for the LIVE database, while it significantly outperforms NQM [8] for white noise images from TID2008. Also, the existing BLINDS-II [30] performs fairly well for the LIVE database, but its performance significantly decreases when applied to TID2008. It is also interesting to note that although the mathematical derivation of the proposed NR-PWN metric is based on white noise, the metric performs consistently well for high-frequency noise, a non-white noise.
The performance results presented in Tables 1 and 2 for the proposed NR-PWN metric are obtained using the GCV method [20,21] for local variance estimation. If the local variance is estimated using the FNV method [19], the resulting SROCC values are 0.9627 for the LIVE database additive Gaussian noise, 0.7850 for the TID2008 database additive Gaussian noise, and 0.9210 for the TID2008 database high-frequency noise, respectively.
Finally, the calculation of the proposed FR-PWN and NR-PWN metrics involves viewing-condition parameters such as the maximum luminance $L_{\max}$ of the monitor. However, the performance of the proposed metrics is resilient to different $L_{\max}$ values. In Tables 1 and 2, the proposed metrics are calculated using $L_{\max} = 175$ cd/m². The $L_{\max}$ in real viewing conditions may vary from 100 cd/m² for CRT monitors to 300 cd/m² for LCD monitors. Table 3 shows the performance of the proposed metrics in terms of SROCC using different values of $L_{\max}$, for both the LIVE and the TID2008 databases. It can be observed that the proposed metrics are not sensitive to the selection of $L_{\max}$.

Conclusions
This paper proposed both a full-reference and a no-reference noisiness metric. The no-reference noisiness metric is derived from the proposed full-reference metric and integrates noise variance estimation and perceptual contrast sensitivity thresholds into a probability summation model. The proposed metrics can predict the relative noisiness in images based on the probability of noise detection. Results show that the proposed metrics achieve consistently good performance across noise types and across databases as compared to the existing quality metrics. Further work can be performed to develop a no-reference quality metric for multiply distorted images.