DIBR-synthesized image quality assessment based on morphological multiscale approach
EURASIP Journal on Image and Video Processing volume 2017, Article number: 4 (2016)
Abstract
The depth-image-based rendering (DIBR) algorithms used for 3D video applications introduce new types of artifacts, mostly located around the disoccluded regions. As the DIBR algorithms involve geometric transformations, most of them introduce nonuniform geometric distortions affecting the edge coherency in the synthesized images. Such distortions are not handled efficiently by the common image quality assessment metrics, which are primarily designed for other types of distortions. In order to better deal with specific geometric distortions in the DIBR-synthesized images, we propose a full-reference metric based on multiscale image decomposition applying morphological filters. Using nonlinear morphological filters in multiscale image decomposition, important geometric information such as edges is maintained across different resolution levels. Edge distortion between the multiscale representation subbands of the reference image and the DIBR-synthesized image is measured precisely using mean squared error. In this way, areas around edges that are prone to synthesis artifacts are emphasized in the metric score. Two versions of the morphological multiscale metric have been explored: (a) Morphological Pyramid Peak Signal-to-Noise Ratio metric (MPPSNR) based on morphological pyramid decomposition, and (b) Morphological Wavelet Peak Signal-to-Noise Ratio metric (MWPSNR) based on morphological wavelet decomposition. The performance of the proposed metrics has been tested using two databases which contain DIBR-synthesized images: the IRCCyN/IVC DIBR image database and the MCL3D stereoscopic image database. The proposed metrics achieve significantly higher correlation with human judgment compared to the state-of-the-art image quality metrics and compared to the tested metric dedicated to synthesis-related artifacts.
The proposed metrics are computationally efficient given that the morphological operators involve only integer numbers and simple computations like min, max, and sum, as well as a simple calculation of MSE. MPPSNR performs slightly better than MWPSNR. It shows very good agreement with human judgment (Pearson’s 0.894, Spearman 0.77) when tested on the MCL3D stereoscopic image database. We have demonstrated that PSNR has particularly good agreement with human judgment when it is calculated between images at higher scales of morphological multiscale representations. Consequently, simplified and in essence reduced versions of the multiscale metrics are proposed, taking into account only detail images at higher decomposition scales. The reduced version of MPPSNR shows very good agreement with human judgment (Pearson’s 0.904, Spearman 0.863) on the IRCCyN/IVC DIBR image database.
Introduction
Advanced 3D video (3DV) systems are mostly based on the multiview video plus depth (MVD) format [1], the recommended 3D video format adopted by the Moving Picture Experts Group (MPEG). In a 3DV system, a smaller number of captured views is transmitted, and a greater number of views is generated at the receiver side from the transmitted texture views and their associated depth maps using depth-image-based rendering (DIBR) technology. DIBR techniques can be used to generate views for different 3D video applications: free viewpoint television, 3DTV, 3D-technology-based entertainment products, and 3D medical applications. The perceptual quality of the synthesized view is considered the most significant evaluation criterion for the whole 3D video processing system. A reliable quality assessment metric for synthesized views is of great importance for 3D video technology development. Subjective tests are expensive, time consuming, cumbersome, and practically not feasible in systems where a real-time quality score of an image or video sequence is needed. Objective metrics are intended to predict human judgment. The reliability of objective metrics is based on their correlation with subjective assessment results.
The evaluation of a DIBR system depends on the application. The main difference between free viewpoint video (FVV) and 3DTV is the stereopsis phenomenon (fusion of left and right views in the human visual system) existing in 3DTV. FVV does not have to be used in a 3D context; it can also be applied in a 2D context. In this paper, the quality assessment of still images from MVD video sequences in both 2D and 3D contexts is considered as a first step of 3D quality assessment. The evaluation of still images is an important scenario in the case when the user switches the video to pause mode [2].
For the comparison of DIBR algorithms, virtual views synthesized from uncompressed data, which contain only synthesis artifacts, need to be evaluated. When either depth data or color sequences are encoded before performing the synthesis, compression-related artifacts are combined with synthesis artifacts. In this paper, the distortions introduced only by view synthesis algorithms are evaluated using the IRCCyN/IVC DIBR image dataset [3, 4] and part of the MCL3D image dataset [5, 6].
DIBR algorithms introduce new types of artifacts, mostly located around disoccluded regions [2]. Unlike 2D video compression distortions, they are not scattered over the entire image. As DIBR algorithms involve geometric transformations, most of them introduce mainly geometric distortions affecting edge coherency in the synthesized images. These artifacts are consequently challenging for standard quality metrics, which are usually tuned for other types of distortions. In order to better deal with specific geometric distortions in DIBR-synthesized images, we propose a multiscale image quality assessment metric based on morphological filters in multiresolution image decomposition. Due to the multiscale character of the primate visual system [7], the introduction of multiresolution image decomposition in image quality assessment contributes to the improvement of metric performance relative to single-resolution methods. The nonlinear morphological filters introduced in the multiresolution image decomposition maintain important geometric information such as edges at their true positions, neither drifted nor blurred, across different resolution levels [8]. Edge distortion between the appropriate subbands of the multiscale representations of the reference image and the DIBR-synthesized image is precisely measured pixel-by-pixel using mean squared error (MSE). In this way, areas around edges that are prone to synthesis artifacts are emphasized in the metric score. The mean squared errors of the subbands are combined into a multiscale mean squared error, which is transformed into a multiscale peak signal-to-noise ratio measure. More precisely, two types of morphological multiscale decompositions for multiscale image quality assessment (IQA) have been explored: morphological bandpass pyramid decomposition in the Morphological Pyramid Peak Signal-to-Noise Ratio measure (MPPSNR) and morphological wavelet decomposition in the Morphological Wavelet Peak Signal-to-Noise Ratio measure (MWPSNR).
Morphological bandpass pyramid decomposition can be interpreted as a structural image decomposition tending to enhance image features such as edges which are segregated by scale at the various pyramid levels [9]. Using nonlinear morphological wavelet decomposition, geometric structures such as edges are better preserved in the lower resolution images compared to the case when the linear wavelets are used in the decomposition [10]. Both separable and true nonseparable morphological wavelet decompositions using the lifting scheme have been investigated.
Both measures, MPPSNR and MWPSNR, are highly correlated with the judgment of human observers, much more so than standard IQA metrics and their linear counterparts. They also perform better than the tested metric dedicated to synthesis-related artifacts. Since the morphological operators involve only integers and only max, min, and addition in their computation, as well as a simple calculation of MSE, the proposed morphological multiscale metrics are of low computational complexity.
Moreover, it is experimentally shown that PSNR has very good agreement with human judgment when it is calculated for the subbands at higher morphological decomposition scales. We propose reduced versions of the morphological multiscale measures, reduced MPPSNR and reduced MWPSNR, using only detail images from higher decomposition scales. The performance of the reduced versions of the morphological multiscale measures is improved compared to their full versions.
In the next section, the distortion of the DIBR-synthesized view is briefly described. Previous work on the quality assessment of DIBR-synthesized views and on multiscale image quality assessment is also briefly reviewed in Section 2. In Section 3, we describe two versions of the proposed multiscale metric, based on two types of multiresolution decomposition schemes: morphological pyramid and morphological wavelets. The distortion computation stage and the pooling stage of the proposed multiscale measures are also described in Section 3. The performance of MPPSNR and MWPSNR and a discussion of the results are presented in Section 4, while the conclusion is given in Section 5.
Related works
Distortion in the DIBR-synthesized view
The synthesis process changes the pixel positions in the synthesized image and induces new types of distortion in DIBR-synthesized views. View synthesis noise mainly appears along object edges. Typical DIBR artifacts include object shifting, geometric distortions, edge displacements or misalignments, boundary blur, and flickering. An incorrect depth map induces object shifting in the synthesized image. The object shifting artifact, or ghost artifact, manifests as a slight translation or resizing of image regions due to depth map errors. A large number of tiny geometric distortions are caused by depth inaccuracy and the numerical rounding of pixel positions. Geometric distortions appear in the synthesized images because the pixels are projected to wrong positions. Blurry regions appear due to the inpainting method used to fill the disoccluded areas. Incorrect rendering of textured areas appears when the inpainting method fails to fill complex textured areas. When the objects move, the distortion around edges is more noticeable. The view synthesis flickering distortion is located on the edges of moving foreground objects. Flickering can be observed as a significant, high-frequency alternating variation between different luminance levels [11]. The temporal flicker distortion is the most significant difference between traditional 2D video and synthesized video. Some of the typical artifacts due to DIBR synthesis are shown in Fig. 1.
Quality assessment of DIBR-synthesized view
The evaluation of DIBR views synthesized from uncompressed data using standard image quality metrics has been discussed in the literature for still images from FVV in a 2D context [3] using the IRCCyN/IVC DIBR image database. It has been demonstrated that 2D quality metrics originally designed to address image compression distortions are far from effective in assessing the visual quality of synthesized views.
The full-reference objective image quality assessment metrics VSQA [12] and 3DswIM [13] have been proposed to improve on the performance obtained by standard quality metrics in the evaluation of DIBR-synthesized images. Both metrics are dedicated to synthesis-related artifacts without compression-related artifacts, and both are tested using the IRCCyN/IVC DIBR image dataset. The VSQA metric [12], dedicated to view synthesis quality assessment, is aimed at handling areas where disparity estimation may fail. It uses three visibility maps which characterize complexity in terms of textures, diversity of gradient orientations, and presence of high contrast. The SSIM-based VSQA metric achieves a gain of 17.8 % over SSIM in correlation with subjective measurements. 3DswIM [13] relies on a comparison of statistical features of wavelet subbands of the original and DIBR-synthesized images. Only horizontal detail subbands from the first level of the Haar wavelet decomposition are used for the degradation measurement. A registration step is included before the comparison to ensure the shifting-resilience property. A skin detection step weights the final quality score in order to penalize distorted blocks containing skin pixels, based on the assumption that a human observer is most sensitive to impairments affecting human subjects. It was reported that the 3DswIM metric outperforms the conventional 2D metrics and the tested metrics dedicated to DIBR-synthesized views.
An edge-based structural distortion indicator addressing the distortion related to DIBR systems is proposed in [14]. The method relies on the analysis of edges in the synthesized view. It does not assess image quality, but it is able to detect structural distortion. Since it does not take color consistency into account, the method remains a tool for assessing the structural consistency of an image.
Vision-based quality measures for 3D DIBR-based video, both the full-reference FR3VQM [15] and the no-reference NR3VQM [16], are proposed to evaluate the quality of stereoscopic 3D video generated by DIBR. Both measures are a combination of three measures: temporal outliers, temporal inconsistencies, and spatial outliers, using ideal depth. Ideal depth is derived for both the no-reference and the full-reference metric for distortion-free rendered video. The 3VQM metrics perform better than PSNR and SSIM on a database of DIBR-generated video sequences.
The quality metric proposed in [17] is designed for the evaluation of synthesized images which contain artifacts introduced by the rendering process due to depth map errors. It consists of two parts. One part is the calculation of a conventional 2D metric after compensation of the consistent object shifts. After shift compensation, the 2D QA model matches the subjective quality score better. The other part is the calculation of a structural score by the Hausdorff distance. The Hausdorff distance identifies the degree of the inconsistent object shift or ghost-type artifact at object boundaries. The proposed metric performs better than traditional IQA metrics in the evaluation of synthesized stereo images from MVD video sequences.
The SIQE metric [18], proposed to estimate the quality of DIBR-synthesized images, compares the statistical characteristics of the synthesized and the original views estimated using the divisive normalization transform. In the evaluation of compressed MVD video sequences, it achieves high correlation with widely used image and video quality metrics.
A full-reference video quality assessment of the synthesized view with texture/depth compression presented in [11] focuses on the temporal flicker distortion due to depth compression distortion and the view synthesis process. It is based on two quality features which are extracted from both the spatial and temporal domains of the synthesized sequence. The first feature focuses on capturing the temporal flicker distortion, and the second feature is used to measure the change of the spatiotemporal activity in the synthesized sequence due to blurring and blockiness distortion caused by texture compression. The performance of the proposed metric evaluated on the synthesized video quality database SIAT [11] is better than that of the commonly used image/video quality assessment methods.
Multiscale image quality assessment
As in most other areas of image processing and analysis, multiresolution methods have improved performance relative to single-resolution methods in image quality assessment as well. Pyramids and wavelets are among the most common tools for constructing multiresolution signal decomposition schemes used in image processing and computer vision. Both the redundant image pyramid representation and nonredundant image wavelet representations have been explored for multiscale image quality assessment metrics.
The multiscale structural similarity measure MSSSIM [19] is based on linear low-pass pyramid decomposition. The multiscale image quality measures using information content weighted pooling, IWSSIM and IWPSNR [20], use Laplacian pyramid decomposition [21]. CWSSIM [22], which is simultaneously insensitive to luminance and contrast changes and to small geometric distortions of the image, is based on multi-orientation steerable pyramid decomposition using multiscale band-pass oriented filters.
It has been shown that the local contrast in different resolutions can be easily represented in terms of Haar wavelet transform coefficients and computational models of visual mechanisms were incorporated into a quality measurement system [23]. Experiments have shown that Haar filters have good ability to simulate the human visual system (HVS) and the proposed metric is successful in measuring compressed image artifacts.
An error-based image quality metric using Haar wavelet decomposition has been proposed in [24]. It has been reported that the Haar wavelet provided more accurate quality scores than other wavelet bases. PSNR has been calculated between the edge maps computed from the detail subbands, as well as between the approximation subbands, of the original and the distorted images. These two PSNR values have been linearly combined into the overall quality score. The proposed metric predicts quality scores more accurately than the conventional PSNR and can be used efficiently in real-time applications.
A reduced-reference image quality assessment method has been presented in [25]. It uses multiscale geometric analysis (MGA) to mimic the multichannel structure of the HVS, the contrast sensitivity function to reweight the MGA coefficients to mimic nonlinearities in the HVS, and the just noticeable difference threshold to remove visually insensitive MGA coefficients. The quality of the distorted image was measured by comparing the normalized histograms of the distorted and the reference images. MGA was utilized to decompose images by a series of transforms including wavelet, curvelet, bandelet, contourlet, wavelet-based contourlet, hybrid wavelets, and directional filter banks. MGA can capture the characteristics of an image, e.g., lines, curves, and object contours. IQA based on MGA and the IQ metric using Haar wavelet decomposition [24] have been evaluated on a database which contains compressed, white noisy, Gaussian-blurred, and fast-fading Rayleigh channel noisy images.
Proposed morphological multiscale metric
The multiscale image quality assessment (IQA) framework can be described as a three-stage process. In the first stage, both the reference and the distorted images are decomposed into a set of lower resolution images using multiresolution decomposition. In the second stage, image quality/distortion maps are evaluated for all subbands at all scales. In the third stage, pooling is employed to convert each map into a quality score, and these scores are combined into the final multiscale image quality measure score.
The key question in multiscale image quality assessment may be how to represent images effectively and efficiently, so it is necessary to investigate various kinds of transforms. Most of the current multiscale IQA metrics use linear filters in the multiresolution decomposition. In this paper, we propose to use nonlinear morphological operators in the multiscale decompositions in the first stage of the multiscale IQA framework, Fig. 2, in order to better deal with specific geometric distortions in DIBR-synthesized images. The nonlinear morphological filters used in the multiscale image decomposition maintain important geometric information such as edges at their true positions across different resolution levels [8]. More precisely, we investigate two types of morphological multiscale decompositions in the first stage of the multiscale IQA framework: morphological bandpass pyramid decomposition in MPPSNR and morphological wavelet decomposition in MWPSNR. In the second stage of the multiscale IQA framework, Fig. 2, we propose to calculate squared error maps between the appropriate images of the multiscale representations of the two images, the reference image and the DIBR-synthesized image, in order to measure the edge distortion precisely, pixel-by-pixel. In this way, the areas around edges that are prone to synthesis artifacts are emphasized in the metric score. In the third stage of the multiscale IQA framework, MSE is calculated from each squared error map. The MSE values of all multiscale representation images are combined into a multiscale mean squared error, which is transformed into the morphological multiscale peak signal-to-noise ratio measure.
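The second and third stages described above can be sketched in a few lines. The following is a minimal illustration, not the exact pooling used by MPPSNR or MWPSNR: the function name `multiscale_psnr`, the equal weighting of the subband MSE values, and the peak value of 255 are assumptions made for this example, since the text above only states that the MSE values are "combined".

```python
import numpy as np

def multiscale_psnr(ref_subbands, syn_subbands, peak=255.0):
    """Pool per-subband MSE values into one multiscale PSNR score (in dB).

    Assumption: subband MSEs are averaged with equal weights; the actual
    weighting of the proposed metrics is not specified in this section.
    """
    mses = []
    for ref, syn in zip(ref_subbands, syn_subbands):
        ref = np.asarray(ref, dtype=np.float64)
        syn = np.asarray(syn, dtype=np.float64)
        # Stage 2: pixel-by-pixel squared error map for one subband.
        err_map = (ref - syn) ** 2
        # Stage 3: pool the map into a single MSE value.
        mses.append(err_map.mean())
    ms_mse = float(np.mean(mses))
    if ms_mse == 0.0:
        return float("inf")  # identical representations
    return float(10.0 * np.log10(peak ** 2 / ms_mse))

# Toy subbands standing in for a two-level decomposition.
ref = [np.zeros((4, 4)), np.zeros((2, 2))]
syn = [np.full((4, 4), 3.0), np.full((2, 2), 1.0)]
score = multiscale_psnr(ref, syn)
```

Here the two subband MSE values are 9 and 1, so the pooled multiscale MSE is 5 under the equal-weight assumption.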
Morphological multiscale image decomposition
The importance of analyzing images at many scales arises from the nature of images themselves [26]. Scenes contain objects of many sizes, and these objects contain features of many sizes. Objects can be at various distances from the viewer. Any analysis procedure that is applied only at a single scale may miss information at other scales. The solution is to carry out analysis at all scales simultaneously. Psychophysical and physiological experiments have shown that multiscale transforms seem to appear in the visual cortex of mammals [27].
A multiscale representation is completely specified by the transformation from a finer scale to a coarser scale. In linear scale-spaces, the operator for changing scale is a convolution with a Gaussian kernel. After the convolution with a Gaussian kernel, the images are uniformly blurred, including regions of particular interest such as edges [28]. This is a drawback, as the edges often correspond to the physical boundaries of objects. The edge and contour information may be the most important part of an image’s structure for a human observer to grasp the scene. To overcome this issue, nonlinear multiresolution signal decomposition schemes based on morphological operators have been proposed to maintain edges through scales [8].
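The difference between linear and morphological smoothing at an edge can be illustrated on a one-dimensional step signal. This is a toy sketch under stated assumptions: a length-3 box filter stands in for the Gaussian kernel, the erosion uses a flat structuring element of the same length, and the helper name `sliding_windows` is hypothetical.

```python
import numpy as np

def sliding_windows(x, r):
    """Return all windows of length 2r+1 of x, with edge-replicated borders."""
    padded = np.pad(x, r, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, 2 * r + 1)

step = np.array([0, 0, 0, 0, 0, 10, 10, 10, 10, 10], dtype=float)
windows = sliding_windows(step, r=1)

smoothed = windows.mean(axis=1)  # linear low-pass filter (box stand-in for a Gaussian)
eroded = windows.min(axis=1)     # morphological erosion with a flat element of length 3
```

The linear filter creates intermediate gray levels around the step, whereas the eroded signal still contains only the two original levels, so the transition stays a single abrupt jump. Note that a flat erosion on its own moves the step by r samples toward the bright side; the example only shows that the transition is not smeared.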
In morphological image processing, geometric properties such as size and shape are emphasized rather than the frequency properties of signals. Mathematical morphology [29, 30] is a set-theoretic method for image analysis which provides a quantitative description of the geometric structure of an image. It considers images as sets, which permits geometry-oriented transformations of the images. The structuring element offers flexibility because it can be designed in different shapes and sizes according to the purpose. Morphological filters are nonlinear signal transformations that locally modify geometric signal features.
In the first stage of morphological multiscale IQA framework, we have explored two types of multiscale image decomposition using morphological pyramid and morphological wavelets.
Multiscale image decomposition using morphological pyramid
The image pyramid offers a flexible, convenient multiresolution format that matches the multiple scales found in the visual scenes and mirrors the multiple scales of processing in the human visual system [26]. Pyramid representations have much in common with the way people see the world, i.e., primate visual systems achieve a multiscale character [7].
In this paper, we propose to use morphological bandpass pyramid (MBP) decomposition in the first stage of the morphological multiscale IQA framework. The morphological bandpass pyramid is generated using the Laplacian-type pyramid decomposition scheme [21], but morphological filters are used instead of linear filters. We propose to use the morphological operator erosion (E) for low-pass filtering in the analysis step and the morphological operator dilation (D) for interpolation filtering in the synthesis step, leading to the morphological bandpass pyramid decomposition erosion/dilation (MBP ED) introduced in [31] and reviewed in [32]. One level of the proposed MBP ED pyramid is shown in Fig. 3.
In the MBP ED scheme, Fig. 3, a lower resolution image \( s_{j+1} \) is obtained by applying the morphological operator erosion to the previous pyramid level image \( s_j \) and downsampling the eroded image by a factor of 2 in both image dimensions (\( \sigma^{\downarrow} \)): \( s_{j+1}=\sigma^{\downarrow}\left(E\left(s_j\right)\right) \) (1). We have used a square structuring element of size (2r+1) × (2r+1), r = 1, …, 6, for erosion.
The erosion as the analysis operator removes fine details smaller than the structuring element. A detail image is derived by subtracting from each level an interpolated version of the next coarser level. The image \( s_{j+1} \) of the next pyramid level is upsampled by a factor of 2 in both dimensions (\( \sigma^{\uparrow} \)), leading to the image \( s_{jU} \). The morphological operator dilation is applied to the upsampled image \( s_{jU} \) to produce the expanded image \( \widehat{s}_j \). The detail image \( d_j \) is obtained as the difference between the pyramid image \( s_j \) and the expanded image from the next pyramid level \( \widehat{s}_j \): \( d_j=s_j-\widehat{s}_j=s_j-D\left(\sigma^{\uparrow}\left(s_{j+1}\right)\right) \) (2).
Using a square structuring element, morphological reduce and expand filtering can be implemented more efficiently in a separable manner, by rows and columns, using structuring elements of size \( 1\times \left(2r+1\right) \) for rows and \( \left(2r+1\right)\times 1 \) for columns.
A morphological bandpass pyramid with M decomposition levels consists of detail (error) images of decreasing size \( d_j \), j = 0, …, M−1, and the coarse lowest resolution image \( s_M \) [9]. The MBP ED pyramid of the synthesized frame from the video sequence Newspaper, generated using an SE of size 7 × 7, is shown in Fig. 4.
The MBP ED pyramid based on adjunction satisfies the property that the detail signal is always nonnegative. At any scale change, the maximum luminance at the coarser scale is always lower than the maximum luminance at the finer scale, and the minimum is always higher. Morphological bandpass pyramid decomposition can be interpreted as a structural image decomposition tending to enhance image features such as edges which are segregated by scale at the various pyramid levels [9]. Enhanced features are segregated by size: fine details are prominent in the lower level images, while progressively coarser features are prominent in the higher level images. The MBP ED pyramid using a structuring element of size 2 × 2 is the morphological Haar pyramid [31]. MBP satisfies the pyramid condition [31], which states that synthesis of a signal followed by analysis returns the original signal, meaning that no information is lost by these two consecutive steps and the original image can be perfectly reconstructed from the pyramid representation. Perfect reconstruction, while not mandatory for image quality assessment, is a valuable property for a representation in early vision, not because a visual system needs to literally reconstruct the image from its representation but rather because it guarantees that no information has been lost, i.e., that if two images are different then their representations are also different [7]. There is neurophysiological evidence that the human visual system uses a similar kind of decomposition [33]. There is an inherent congruence between the morphological pyramid decomposition scheme and human visual perception [9].
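The MBP ED construction described in this subsection can be sketched with NumPy alone. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the border handling (edge replication for erosion, zero padding for dilation) and the zero fill of the upsampled image are assumptions that are suitable for non-negative images. Erosion and dilation are applied separably, by rows and then by columns.

```python
import numpy as np

def _filter_1d(img, r, axis, reducer, pad_mode, **pad_kwargs):
    """Flat min/max filter of length 2r+1 along one axis."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (r, r)
    padded = np.pad(img, pad, mode=pad_mode, **pad_kwargs)
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * r + 1, axis=axis)
    return reducer(windows, axis=-1)

def erode(img, r):
    # Separable erosion with a (2r+1) x (2r+1) square structuring element.
    out = _filter_1d(img, r, 1, np.min, "edge")
    return _filter_1d(out, r, 0, np.min, "edge")

def dilate(img, r):
    # Separable dilation; zero padding assumes non-negative pixel values.
    out = _filter_1d(img, r, 1, np.max, "constant", constant_values=0)
    return _filter_1d(out, r, 0, np.max, "constant", constant_values=0)

def mbp_ed_pyramid(image, levels, r):
    """Return (detail images d_0 .. d_{M-1}, coarse image s_M)."""
    s = np.asarray(image, dtype=np.int64)
    details = []
    for _ in range(levels):
        s_next = erode(s, r)[::2, ::2]     # erosion, then downsampling by 2
        up = np.zeros_like(s)              # upsampling by 2 with zero fill (assumption)
        up[::2, ::2] = s_next
        details.append(s - dilate(up, r))  # detail = level image minus expanded image
        s = s_next
    return details, s

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16))
details, coarse = mbp_ed_pyramid(img, levels=3, r=1)
```

With these choices the detail images are non-negative, in line with the property stated above, and each level can be recovered as the detail image plus the dilated, upsampled next level.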
Multiscale image decomposition using morphological wavelets
Most current image quality assessment methods based on the discrete wavelet transform use linear wavelet kernels [23, 24, 34]. In this paper, we propose to use morphological wavelet decomposition in order to better preserve geometric structures such as edges in the lower resolution images. The morphological wavelet transforms introduced in [10] and reviewed in [32] are nonlinear wavelet transforms that use min and max operators. Due to the nonlinear nature of the morphological operators, important geometric information such as edges is well preserved across different resolution levels. A general and flexible approach for the construction of nonlinear morphological wavelets in the spatial domain is provided by the lifting scheme, using morphological lifting operators in the prediction (P) step and the update (U) step [35], Fig. 5. We have explored both separable and true nonseparable morphological wavelet decompositions using the lifting scheme.
The separable 2D discrete wavelet transform (DWT) is implemented by cascading two 1D DWTs along the vertical and horizontal directions [36], producing three detail subbands and an approximation signal. Separable wavelet decompositions using the 1D morphological Haar wavelet (minHaar) and the 1D morphological wavelet using the min-lifting scheme (minLift) [10, 37] are explored. Their linear counterparts, the Haar wavelet and the biorthogonal wavelet of Cohen-Daubechies-Feauveau (cdf (2,2)) [38], are also tested for comparison.
Nonseparable sampling opens the possibility of having schemes better adapted to the human visual system [39]. Nonseparable 2D morphological wavelet decomposition on a quincunx lattice using the min-lifting scheme (minLiftQ) [40] is also explored. Nonseparable wavelet decomposition with the linear wavelet of Cohen-Daubechies-Feauveau on a quincunx lattice (cdf(2,2)Q) [41] is implemented for comparison.

1D Morphological Haar min wavelet transformation (minHaar)
One of the simplest examples of nonlinear morphological wavelets is the morphological Haar wavelet (minHaar) [10]. Its structure is very similar to that of the linear Haar wavelet, but it uses the nonlinear morphological operator erosion (taking the minimum over two samples) in the update step of the lifting scheme [32, 37]. An illustration of one step of the wavelet transform with the minHaar wavelet using the lifting scheme is shown in Fig. 6. Initially, the signal x (the first row in Fig. 6) is split into the even samples array (white nodes) and the odd samples array (black nodes). The detail signal d (middle row in Fig. 6) is calculated as the difference of the odd array and the even array (3). The lower resolution signal s (bottom row in Fig. 6) is calculated from the even array and the detail signal (4).
The morphological Haar wavelet decomposition scheme may do a better job of preserving edges compared to the linear case [10]. The morphological Haar wavelet has some specific invariance properties. Besides being translation invariant in the spatial domain, it is also gray-shift invariant and gray-multiplication invariant [37].
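One lifting step of the minHaar transform can be sketched as follows. The exact forms of (3) and (4) are not reproduced in the text above, so the expressions used here, d[n] = x[2n+1] − x[2n] and s[n] = x[2n] + min(0, d[n]), are an assumption consistent with the verbal description; the function names are hypothetical and an even-length signal is assumed.

```python
import numpy as np

def min_haar_forward(x):
    """One minHaar lifting step on an even-length 1D integer signal."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]        # split
    d = odd - even                      # assumed form of (3)
    s = even + np.minimum(0, d)         # assumed form of (4); equals min(even, odd)
    return s, d

def min_haar_inverse(s, d):
    """Invert the lifting step by undoing update, prediction, and split."""
    even = s - np.minimum(0, d)
    odd = d + even
    x = np.empty(even.size + odd.size, dtype=even.dtype)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([5, 5, 5, 9, 9, 9, 2, 2])
s, d = min_haar_forward(x)
x_rec = min_haar_inverse(s, d)
```

Because only integer subtraction, addition, and minimum are involved, the step is exactly invertible, and the approximation signal takes only values already present in the input.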

1D Morphological wavelet transformation using min-lifting scheme (minLift)
The min-lifting scheme [10] is constructed using two nonlinear lifting steps, nonlinear prediction and nonlinear update, both using the operator erosion (taking the minimum over two or three samples). After splitting the signal x into an odd samples array (black nodes in the first row of Fig. 7) and an even samples array (white nodes in the first row of Fig. 7), each sample of the detail signal d (second row in Fig. 7) is calculated according to (5). The update step is chosen in such a way that a local minimum of the input signal is mapped to the scaled signal, and a sample of the approximation signal s (third row in Fig. 7) is calculated according to (6).
Morphological wavelet decomposition using the minLift wavelet is both gray-shift invariant and gray-multiplication invariant [37]. The min-lifting scheme has the nice property that it preserves local minima of a signal over several scales. It does not generate any new local minima. The detail signal is almost zero in areas of smooth gray level variation, and sharp gray level variations are mapped to positive detail signal values (white). As an illustration of the wavelet decomposition using the morphological minLift wavelet, the oriented wavelet subbands from the first decomposition level, which contain vertical, horizontal, and corner details, are shown in Fig. 8 for the synthesized frame from the video sequence Newspaper.
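A single min-lifting step can be sketched along the same lines. Since (5) and (6) are not reproduced in the text above, the forms used here are an assumption based on the verbal description (minimum over two samples in prediction, over three values in update): d[n] = x[2n+1] − min(x[2n], x[2n+2]) and s[n] = x[2n] + min(0, d[n−1], d[n]). Border samples are replicated, which is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def min_lift_forward(x):
    """One min-lifting step on an even-length 1D integer signal."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    even_right = np.append(even[1:], even[-1])         # x[2n+2], replicated at the border
    d = odd - np.minimum(even, even_right)             # prediction, assumed form of (5)
    d_left = np.concatenate(([d[0]], d[:-1]))          # d[n-1], replicated at the border
    s = even + np.minimum(0, np.minimum(d_left, d))    # update, assumed form of (6)
    return s, d

def min_lift_inverse(s, d):
    """Undo the update step, then the prediction step, then merge."""
    d_left = np.concatenate(([d[0]], d[:-1]))
    even = s - np.minimum(0, np.minimum(d_left, d))
    even_right = np.append(even[1:], even[-1])
    odd = d + np.minimum(even, even_right)
    x = np.empty(even.size + odd.size, dtype=even.dtype)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([4, 4, 4, 1, 4, 4, 9, 9])
s, d = min_lift_forward(x)
x_rec = min_lift_inverse(s, d)
```

In this example the input has a local minimum at an odd position (value 1), which the prediction step alone would discard from the even samples; the update step carries it into the approximation signal.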

Nonseparable morphological wavelet transformation with quincunx sampling using min-lifting scheme (minLiftQ)
Two-dimensional nonseparable morphological wavelet decomposition on a quincunx lattice using the min-lifting scheme, minLiftQ [40], is analogous to the separable morphological wavelet decomposition using the minLift wavelet. The nonseparable 2D wavelet transform on a quincunx lattice using the lifting scheme is performed through alternating odd and even steps, each producing a detail subband and an approximation image which is decomposed further. Each step, odd and even, is implemented using the lifting scheme, which consists of three parts: splitting, prediction and update. In the odd step, the image pixels are split into two subsets, both on a quincunx lattice (Fig. 9, upper row): one subset with white pixels, x, and the other subset with black pixels, y. Each pixel of the error signal d is calculated using the minimum of the four nearest pixels in the horizontal and vertical directions (7) (Fig. 9, bottom row left), and the lower resolution signal s is updated from the four nearest detail signal pixels (8) (Fig. 9, bottom row right).
In the even step, the signal on the quincunx lattice is separated into two subsets, both on a Cartesian lattice: one subset with white pixels x and the other with gray pixels y (Fig. 10, upper row). Each pixel of the error signal d is calculated from the four nearest pixels in the diagonal directions (7) (Fig. 10, bottom row left), and the lower resolution signal s is updated from the four nearest detail signal pixels in the diagonal directions (8) (Fig. 10, bottom row right).
Owing to the symmetry of the quincunx grid, the nonseparable transform is insensitive to edge directions and image orientation. Nonoriented wavelet subbands from the first level of the nonseparable wavelet decomposition with quincunx sampling using the morphological minLiftQ wavelet of the synthesized frame from the video sequence Newspaper are shown in Fig. 11. The detail image from the odd step is rotated by \( 45^{\circ} \) before display. The detail images are almost zero in areas of smooth gray level variation. Sharp gray level variations are mapped to positive (white) detail image values.
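The odd step of the quincunx scheme can be sketched as below. This is a rough sketch under stated assumptions: (7) is taken to be d = y − min(four horizontal/vertical x neighbours) and (8) to be s = x + min(0, four neighbouring d values), by analogy with the 1D min-lifting scheme; pixels outside the image are simply ignored, and the x subset is assumed to be the pixels whose row and column indices have an even sum. The function names are hypothetical.

```python
import numpy as np

def _min_of_4_neighbours(a):
    """Minimum over up/down/left/right neighbours; pixels outside the image are ignored."""
    p = np.pad(a, 1, mode="constant", constant_values=np.inf)
    return np.minimum(np.minimum(p[:-2, 1:-1], p[2:, 1:-1]),
                      np.minimum(p[1:-1, :-2], p[1:-1, 2:]))

def min_lift_q_odd_step(img):
    """Odd step of a quincunx min-lifting transform (illustrative sketch).

    Returns (s, d, is_x): s is defined on the x pixels, d on the y pixels;
    positions where a signal is not defined hold NaN.
    """
    img = np.asarray(img, dtype=np.float64)
    rows, cols = np.indices(img.shape)
    is_x = (rows + cols) % 2 == 0            # assumed "white" subset x
    is_y = ~is_x                             # assumed "black" subset y
    # Prediction: every 4-neighbour of a y pixel is an x pixel.
    d = np.where(is_y, img - _min_of_4_neighbours(img), np.inf)
    # Update: every 4-neighbour of an x pixel is a y pixel carrying a detail value.
    s = img + np.minimum(0.0, _min_of_4_neighbours(d))
    return np.where(is_x, s, np.nan), np.where(is_y, d, np.nan), is_x

img = np.full((3, 3), 10.0)
img[0, 1] = 2.0                              # a local minimum on a y pixel
s, d, is_x = min_lift_q_odd_step(img)
print(s)
print(d)
```

The even step would follow the same pattern on the remaining x pixels, with the four diagonal neighbours in place of the horizontal and vertical ones.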
Distortion computation and pooling stage
Mean squared error (MSE) and peak signal-to-noise ratio (PSNR) are the most widely used objective image distortion/quality metrics. They are probably the simplest way to quantify the similarity between two images. The mean squared error remains the standard criterion for the assessment of signal quality and fidelity. It has many attractive features: it is simple, parameter free and memoryless [42]. The MSE is an excellent metric in the context of optimization. Moreover, competing algorithms have most often been compared using MSE/PSNR [42]. However, MSE has been shown to perform poorly in some cases (contrast stretch, mean luminance shift, contamination by additive white Gaussian noise, impulsive noise distortion, JPEG compression, blur, spatial scaling, spatial shift, rotation) when it is used as a single-scale metric on the full resolution images in the base band [42, 43].
In this paper, we propose to use MSE for distortion measurement between pyramid images in MPPSNR and between wavelet subbands in MWPSNR. In the second stage of the multiscale IQA framework, we use squared error maps between the morphological multiscale representations of the two images: the reference image and the DIBR-synthesized image. Squared error maps calculated pixel by pixel reveal the wrong displacement of object edges induced by the DIBR process at the different scales of the multiscale representations. From the squared error maps, mean squared errors are calculated and combined into the multiscale mean squared error, which is transformed into the multiscale peak signal-to-noise ratio in the third stage of the multiscale IQA framework.
The calculation of MPPSNR
When the morphological pyramid decomposition is used in the first stage of the morphological multiscale IQA framework, Fig. 12, the multiscale pyramid mean squared error MP_MSE is calculated as the weighted product of the \( \mathit{MSE}_j \) values at all pyramid levels (9):
\[ \mathit{MP\_MSE} = \prod_{j=0}^{M} \mathit{MSE}_j^{\,\beta_j} \qquad (9) \]
where equal weights \( \beta_j = \frac{1}{M+1} \) are used, M is the number of decomposition levels and M + 1 is the number of pyramid images. Finally, MP_MSE is transformed into the Morphological Pyramid Peak Signal-to-Noise Ratio MP_PSNR (10):
\[ \mathit{MP\_PSNR} = 10 \log_{10} \frac{R^{2}}{\mathit{MP\_MSE}} \qquad (10) \]
where R is the maximum dynamic range of the image.
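The pooling stage for the pyramid case can be sketched as follows. This is a minimal sketch, assuming that the weighted product in (9) is the product of the per-level MSE values raised to the equal weights 1/(M + 1) (a geometric mean) and that (10) is the usual PSNR expression 10·log10(R²/MP_MSE). The function name `mp_psnr` and the default peak value R = 255 for 8-bit images are illustrative assumptions; the pyramids are assumed to be already computed and passed in as lists of arrays.

```python
import numpy as np

def mp_psnr(ref_pyramid, syn_pyramid, peak=255.0):
    """Pool per-level MSE values of two pyramids into a single PSNR-style score."""
    if len(ref_pyramid) != len(syn_pyramid):
        raise ValueError("both pyramids must have the same number of images")
    mses = []
    for ref, syn in zip(ref_pyramid, syn_pyramid):
        err = np.asarray(ref, dtype=np.float64) - np.asarray(syn, dtype=np.float64)
        mses.append(np.mean(err ** 2))           # mean of the squared error map
    beta = 1.0 / len(mses)                       # equal weights, 1 / (M + 1)
    mp_mse = float(np.prod(np.power(mses, beta)))
    if mp_mse == 0.0:
        return float("inf")                      # at least one level is identical
    return 10.0 * np.log10(peak ** 2 / mp_mse)

# Toy pyramids with two images each: per-level MSE values are 4 and 16.
ref = [np.zeros((2, 2)), np.zeros((1, 1))]
syn = [np.full((2, 2), 2.0), np.full((1, 1), 4.0)]
print(mp_psnr(ref, syn))
```

With a weighted product, a single level with zero error drives the pooled error to zero, which is why the sketch returns infinity in that case; a weighted sum does not have this property.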
The calculation of MWPSNR
When the morphological wavelet decomposition is used in the first stage of the morphological multiscale IQA framework, the multiscale wavelet mean squared error (MW_MSE) is calculated as the weighted sum of the \( \mathit{MSE}_{ji} \) values for all subbands at all scales of the two wavelet representations as final pooling (11):
\[ \mathit{MW\_MSE} = \sum_{j} \sum_{i} \beta_{ji}\, \mathit{MSE}_{ji} \qquad (11) \]
where equal weights \( \beta_{ji} = \frac{1}{M \cdot D + 1} \) are used and the sum runs over all M · D + 1 subbands. M is the number of decomposition levels and D is the number of detail subbands at one decomposition level. In the case of separable wavelet transforms, D = 3, Fig. 13, while for the nonseparable wavelet decomposition, D = 2. \( \mathit{MSE}_{ji} \) is the mean value of the squared error map of subband i at decomposition level j.
Finally, the multiscale metric Morphological Wavelet Peak Signal-to-Noise Ratio, MWPSNR, is calculated as:
\[ \mathit{MW\_PSNR} = 10 \log_{10} \frac{R^{2}}{\mathit{MW\_MSE}} \]
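For the wavelet case, the pooling can be sketched as a weighted sum followed by the usual PSNR expression 10·log10(R²/MW_MSE). This is a minimal sketch, assuming equal weights 1/(M·D + 1) over M·D detail subbands plus one approximation subband; the function names and the default peak value R = 255 are illustrative assumptions, and the subbands are assumed to be already computed.

```python
import numpy as np

def num_subbands(levels, details_per_level):
    """M * D detail subbands plus one approximation subband."""
    return levels * details_per_level + 1

def mw_psnr(ref_subbands, syn_subbands, peak=255.0):
    """Pool per-subband MSE values of two wavelet representations into one score."""
    if len(ref_subbands) != len(syn_subbands):
        raise ValueError("both representations must have the same number of subbands")
    mses = []
    for ref, syn in zip(ref_subbands, syn_subbands):
        err = np.asarray(ref, dtype=np.float64) - np.asarray(syn, dtype=np.float64)
        mses.append(np.mean(err ** 2))           # mean of the squared error map
    mw_mse = float(np.mean(mses))                # equal weights, 1 / (M * D + 1)
    if mw_mse == 0.0:
        return float("inf")                      # identical representations
    return 10.0 * np.log10(peak ** 2 / mw_mse)

print(num_subbands(7, 3))   # separable transform, M = 7
print(num_subbands(7, 2))   # nonseparable quincunx transform, M = 7
```

A reduced variant would pass only the selected higher-scale subbands to the same function, so that the equal weights are taken over the subbands actually used.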
Results
In this section, the experimental setup for the validation of the proposed morphological multiscale measures is described. The performances of the two versions of the proposed morphological multiscale metric, the Morphological Pyramid Peak Signal-to-Noise Ratio measure, MPPSNR, and the Morphological Wavelet Peak Signal-to-Noise Ratio measure, MWPSNR, are presented and discussed. Moreover, the PSNR performances by multiscale decomposition subbands are analyzed. It is shown experimentally that PSNR has very good agreement with human judgment when it is calculated for the images at higher morphological decomposition scales. Therefore, we propose reduced versions of the morphological multiscale measures, reduced MPPSNR and reduced MWPSNR, using only detail images from higher decomposition scales. The performances of the reduced morphological multiscale measures are also presented.
Since the morphological operators used in morphological multiresolution decomposition schemes involve only integers and only max, min, and addition in their computation, the calculation of morphological multiresolution decompositions has low computational complexity. The calculation of MSE also has low computational complexity. Therefore, the calculation of both measures, MPPSNR and MWPSNR, is not computationally demanding.
Experimental setup
To compare the performances of the image quality measures the following evaluation metrics are used: root mean squared error between the subjective and objective scores (RMSE), Pearson’s correlation coefficient with nonlinear mapping between the subjective scores and objective measures (PCC) and Spearman’s rank order correlation coefficient (SCC). The calculation of DMOS from given MOS and nonlinear mapping between the subjective scores and objective measures are done according to test plan for evaluation of video quality models for use with high definition TV content by VQEG HDTV group [44].
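The three evaluation measures can be sketched with NumPy alone. This is a simplified sketch under stated assumptions: the nonlinear mapping is approximated by an unconstrained least-squares cubic polynomial, whereas the VQEG test plan [44] should be consulted for the exact mapping and its constraints; ties in the rank computation are not averaged; and the function names are illustrative.

```python
import numpy as np

def _ranks(values):
    """Ranks 1..n (ties are broken by position, not averaged)."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def evaluate_metric(objective, subjective, degree=3):
    """Return (RMSE, PCC, SCC) of an objective metric against subjective scores."""
    x = np.asarray(objective, dtype=np.float64)
    y = np.asarray(subjective, dtype=np.float64)
    coeffs = np.polyfit(x, y, degree)            # assumed polynomial mapping
    mapped = np.polyval(coeffs, x)
    rmse = float(np.sqrt(np.mean((mapped - y) ** 2)))
    pcc = float(np.corrcoef(mapped, y)[0, 1])    # Pearson, after the mapping
    scc = float(np.corrcoef(_ranks(x), _ranks(y))[0, 1])  # Spearman, on raw scores
    return rmse, pcc, scc

# Synthetic check: subjective scores are an exact cubic function of the metric.
scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
rmse, pcc, scc = evaluate_metric(scores, scores ** 3)
print(rmse, pcc, scc)
```

Spearman's coefficient is computed on the raw scores because it depends only on rank order; for a PSNR-type metric compared against DMOS its sign is negative, and the magnitude is what is usually reported.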
The performances of the metrics MPPSNR and MWPSNR are evaluated using two publicly available databases which contain DIBRsynthesized images: the IRCCyN/IVC DIBR image database [3, 4] and part of the MCL3D stereoscopic image database [5, 6].
The IRCCyN/IVC DIBR image quality database
The IRCCyN/IVC DIBR image quality database contains frames from three multiview video sequences: Book arrival (1024 × 768, 16 cameras with 6.5 cm spacing), Lovebird1 (1024 × 768, 12 cameras with 3.5 cm spacing) and Newspaper (1024 × 768, 9 cameras with 5 cm spacing). The selected contents are representative and are also used by MPEG. For each sequence, four virtual views are generated at positions corresponding to those of the real cameras, using seven depth-image-based rendering algorithms, named A1–A7 [45–50]. One key frame from each synthesized sequence is randomly chosen for the database. For these key frames, subjective assessment in the form of mean opinion scores (MOS) is provided. The difference mean opinion score (DMOS) is calculated as the difference between the reference frame’s MOS and the synthesized frame’s MOS. In the algorithm A1 [45], the depth image is preprocessed by a low-pass filter. Borders are cropped and then the image is interpolated to reach its original size. The algorithm A2 is based on A1 except that the borders are not cropped but inpainted by the method described in [46]. The algorithm A3 [47] uses the inpainting method [46] to fill in the missing parts in the virtual image, which introduces blur in the disoccluded area. This algorithm was adopted as the reference software for MPEG standardization experiments in the 3D Video group. The algorithm A4 performs a hole-filling method aided by depth information [48]. The algorithm A5 uses patch-based texture synthesis as the hole-filling method [49]. The algorithm A6 uses depth temporal information to improve synthesis in the disoccluded areas [50]. The frames generated by algorithm A7 contain unfilled holes. Due to very noticeable object shifting artifacts in the frames generated by algorithm A1, these frames are excluded from the tests.
The focus remains on images synthesized using the A2–A7 DIBR algorithms, without a registration procedure for aligning the synthesized and the original frames. The results presented in Sections 4.2–4.4 for the IRCCyN/IVC DIBR database are based on the mixed statistics of the DIBR algorithms A2–A7.
The MCL3D stereoscopic image quality database
The part of the stereoscopic image quality database MCL3D that contains 36 stereo pairs generated using four DIBR algorithms, together with the associated mean opinion score (MOS) values, is used for testing. These stereoscopic image pairs are rendered from nine image-plus-depth sources: Balloons, Kendo and Lovebird1 of resolution 1024 × 728 and Shark, Microworld, Poznan street, Poznan Hall2, Gt_fly, Undo_dancer of resolution 1920 × 1088.
For each source, three views are used for the calculation of the metric score, Fig. 14. Original textures (T1, T2, T3) and their associated depth maps (D1, D2, D3) are obtained by selecting key frames from each of nine multiview test sequences associated with depth maps. From the middle view (T2, D2), using one of the four DIBR algorithms, the stereoscopic image pair (SL, SR) is generated. The textures from the outer views, (T1, T3) are used as the reference stereo pair. We have calculated IQA metric score between the DIBRsynthesized stereopair (SL, SR) and the reference stereopair (T1,T3). The score for the stereo pair is calculated as the average of the left and right image scores.
In the generation of the MCL3D database, four DIBR algorithms are used: DIBR with filtering, A1 [45]; DIBR with inpainting, A2 [46]; DIBR without hole-filling, A7; and DIBR with hierarchical hole-filling (HHF), A8 [51]. HHF uses a pyramid-like approach to estimate the hole pixels from lower resolution estimates of the 3D warped image, yielding virtual images that are free of any geometric distortions. By adding a depth adaptive preprocessing step before applying the hierarchical hole-filling, the edges and texture around the disoccluded areas can be sharpened and enhanced. The results presented in Sections 4.2–4.4 for the MCL3D database are based on the mixed statistics of the four DIBR algorithms A1, A2, A7, and A8. The original image Shark and the left images from the stereo pairs synthesized using the four DIBR algorithms (A1, A2, A7, A8) are shown in Fig. 15 from top to bottom and from left to right.
Analysis of MPPSNR performances
In this section, the performances of the Morphological Pyramid Peak Signal-to-Noise Ratio measure, MPPSNR, are analyzed. Morphological bandpass pyramid decomposition using the morphological operator erosion for low-pass filtering in the analysis step and the morphological operator dilation for interpolation filtering in the synthesis step (MBP ED) is applied to the reference image and the DIBR-synthesized image. The influence of the size and shape of the structuring element used in the morphological operations, and of the number of decomposition levels in the MBP ED pyramid decomposition, on MPPSNR performances is explored. For comparison with the linear case, MPPSNR performances are calculated using Laplacian pyramid decomposition with linear filters. In addition, PSNR performances calculated between the images of the two pyramids at different pyramid scales are investigated. A reduced version of MPPSNR using only lower resolution images from higher pyramid scales is proposed and its performances are analyzed.
The shape and the size of the structuring element (SE) used in morphological filtering determine which geometrical features are preserved in the filtered image, especially the direction in which objects are enlarged or shrunk. Using a square structuring element, the objects are enlarged or shrunk equally in all directions. A square-shaped structuring element is suitable for detecting straight lines, while a round SE is suitable for detecting circular features. The MPPSNR performances using different shapes of structuring element (square, round, rhomb and cross type structuring elements, Fig. 16) for morphological filtering in the analysis step are evaluated. Better performances of MPPSNR are achieved with square or round type SE than with rhomb or cross type SE. The results are similar for square and round type structuring elements, but the computational complexity is significantly lower when the square structuring element is used. Namely, in that case a separable pyramid decomposition by rows and columns, with downsampling after each step, can be easily implemented. In the images from the two chosen databases, straight lines are dominant, so the square-shaped structuring element is chosen.
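The separability of erosion with a square structuring element can be sketched as follows. This is a minimal illustration and not the full MBP ED decomposition: the border handling (edge replication), the placement of the SE origin for even sizes, and the function names are assumptions made for the example.

```python
import numpy as np

def erode_1d(a, size, axis):
    """Sliding minimum of length `size` along one axis, with edge replication."""
    before = (size - 1) // 2                  # assumed SE origin
    after = size - 1 - before
    pad = [(0, 0)] * a.ndim
    pad[axis] = (before, after)
    p = np.pad(a, pad, mode="edge")
    n = a.shape[axis]
    shifted = [np.take(p, np.arange(k, k + n), axis=axis) for k in range(size)]
    return np.minimum.reduce(shifted)

def erode_square(a, size):
    """Erosion with a size x size square SE as two 1D erosions (rows, then columns)."""
    a = np.asarray(a)
    return erode_1d(erode_1d(a, size, axis=1), size, axis=0)

def erode_and_downsample(a, size):
    """Row erosion, keep every 2nd column, column erosion, keep every 2nd row."""
    a = np.asarray(a)
    half = erode_1d(a, size, axis=1)[:, ::2]
    return erode_1d(half, size, axis=0)[::2, :]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 8))
print(erode_and_downsample(image, 3).shape)   # half the rows and columns
```

Downsampling directly after the row pass means the column pass touches only half of the columns, which is where the saving over a non-separable (for example round) structuring element comes from.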
Moreover, the impact of structuring element size used in morphological operations and the number of decomposition levels in MBP ED pyramid decompositions on MPPSNR performances is investigated. MPPSNR performances are calculated using MBP ED pyramid decomposition with different number of decomposition levels (1–7 for IRCCyN/IVC DIBR database and 1–8 for MCL3D database) and with square structuring elements of different sizes from 2 × 2 to 13 × 13. More features are removed from the image at each decomposition level as larger structuring element is used. The number of decomposition levels for the best MPPSNR performances depends on the size of structuring element.
The performances of MPPSNR using SE of different sizes, with the best number of decomposition levels for each SE size, are shown in the upper part of Table 1. For the IRCCyN/IVC DIBR database, the MPPSNR performances improve as the structuring element is enlarged. The MPPSNR performances are noticeably better for SE of size 5 × 5 and larger. A Matlab implementation of MPPSNR is available online [52].
In the case of MCL3D database, the operation sum is used in the calculation of MPMSE (9) as better performances of MPPSNR are achieved. For the MCL3D database, there is just a slight improvement of MPPSNR performances with the enlargement of the structuring element. Scatter plot of MPPSNR using SE of size 3 × 3 versus MOS for MCL3D database is shown in Fig. 17. Each point represents one stereopair from the database.
For the comparison with the linear case, the image decomposition is performed using the Laplacian pyramid with linear filters. Simple and efficient binomial filters [53] are used as an approximation of Gaussian filters. The binomial filter coefficients are taken from Pascal’s triangle and normalized by their sum. The two-dimensional filter is implemented as a cascade of one-dimensional filters. The MPPSNR performances using pyramid decompositions with linear filters are similar for all filter lengths. For the IRCCyN/IVC DIBR database, Pearson’s correlation varies from 0.771 for the linear filter of length 2 to 0.799 for the linear filter of length 13. For the MCL3D database, Pearson’s correlation varies from 0.322 for the linear filter of length 2 to 0.377 for the linear filter of length 3. Pearson’s correlation coefficients of MPPSNR versus DMOS for different filter lengths used in linear pyramid decomposition and for different sizes of SE used in morphological pyramid decomposition are shown in Fig. 18, left for the IRCCyN/IVC DIBR database and right for the MCL3D database. The results in Fig. 18 are based on the mixed statistics of the DIBR algorithms A2–A7 for the IRCCyN/IVC DIBR database and A1, A2, A7, A8 for the MCL3D database. MPPSNR using pyramid decomposition with morphological filters performs much better than MPPSNR using pyramid decomposition with linear filters.
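The binomial filters used for the linear comparison can be sketched as follows. This is a minimal sketch: the kernel is one row of Pascal’s triangle divided by its sum, and the 2D filter is applied as a row pass followed by a column pass. The zero-padded borders that come with `np.convolve(..., mode="same")` and the function names are assumptions of the example, not details from the source.

```python
import numpy as np
from math import comb

def binomial_kernel(length):
    """Row `length - 1` of Pascal's triangle, normalized to sum to 1."""
    k = np.array([comb(length - 1, i) for i in range(length)], dtype=np.float64)
    return k / k.sum()

def binomial_smooth_2d(img, length):
    """Separable 2D smoothing: 1D binomial filter along rows, then along columns."""
    img = np.asarray(img, dtype=np.float64)
    k = binomial_kernel(length)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

print(binomial_kernel(2))   # [0.5 0.5]
print(binomial_kernel(5))   # [1, 4, 6, 4, 1] / 16
```

Filtering an impulse image returns the outer product of the 1D kernel with itself, which confirms that the cascade is equivalent to a single 2D binomial filter.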

Analysis of PSNR performances by pyramid images
It is shown in [54] that better performances of IQA metrics PSNR and SSIM are achieved when these metrics are calculated for the lower resolution images after lowpass filtering and downsampling than for the full resolution images. The downsampling scale depends on the image size and the viewing distance. We have investigated PSNR performances for the detail images of the morphological bandpass pyramid at different pyramid scales. The reference image and the DIBRsynthesized image are decomposed into a set of lower resolution pyramid images using morphological bandpass erosion/dilation pyramid decomposition. At each pyramid scale, PSNR is calculated between the detail images of the two pyramids, the reference image pyramid and the DIBRsynthesized image pyramid.
For the IRCCyN/IVC DIBR database, Pearson’s correlation coefficients of PSNR versus DMOS for pyramid images by pyramid scales using structuring elements of different sizes are shown on Fig. 19.
The smallest PCC is obtained at the first pyramid scale (\( d_0 \)) for all sizes of SE. Higher PCC values are obtained at the middle and high scales. For the morphological pyramid decomposition using SE of size 2 × 2 and 3 × 3, the highest PCC is at scale 5 (\( d_4 \)). For the SE of size 5 × 5, the best PSNR performances are obtained at pyramid scale 4 (\( d_3 \)). For the pyramid decomposition with larger SE, the best PSNR performances are obtained at scale 3, for the detail images \( d_2 \). Also, PSNR performances at middle and higher pyramid scales are much better than the PSNR performances when the PSNR is calculated between the original and the DIBR-synthesized images without decomposition, in the base band. The best PSNR performances by pyramid images for different sizes of SE used in morphological pyramid decomposition are shown in Table 2. For the morphological pyramid decomposition using SE of size 3 × 3, the best PSNR performances are achieved for the detail image at pyramid level 5, with Pearson correlation coefficient 0.89 and Spearman correlation coefficient 0.867.
For the MCL3D database, Pearson’s correlation coefficients of PSNR versus MOS for pyramid images at all pyramid scales using structuring elements of different sizes are shown in Fig. 20. For this database, the differences between the PCC values for pyramid images at different scales are smaller. The smallest PCC is at the first scale (detail images \( d_0 \)) and the highest PCC is for the approximation images at the highest scale. The best pyramid image PSNR performances for different sizes of SE used in morphological pyramid decomposition are shown in Table 2.
For both databases, it is shown that PSNR is in very good agreement with human quality judgments when it is calculated at higher scales of the MBP ED pyramid, much better than for the full resolution images in the base band. A Matlab implementation of PSNR by morphological pyramid images is available online [52].

The performances of the reduced version of MPPSNR
Based on the results of PSNR performances calculated separately by pyramid scales, we propose reduced version of MPPSNR using only pyramid images with higher PCC values of PSNR towards subjective scores. Reduced version of MP_MSE is calculated as the weighted sum of the used subbands’ MSE (9).
For the IRCCyN/IVC DIBR database, the reduced version of MPPSNR is calculated using only the three detail images with the highest PCC values of PSNR towards DMOS. The performances of the reduced versions of MPPSNR using equal weights are presented in the bottom left part of Table 1. The reduced version of MPPSNR performs better than its full version: the improvement ranges from 1.74 % when the MBP ED pyramid decomposition with SE of size 11 × 11 is used to 6.75 % when the MBP ED pyramid decomposition with SE of size 3 × 3 is used. The HVS visually integrates the edges of an image in a coarse-to-fine-scale (global-to-local) fashion [34]. Visual cortex cells integrate activity across spatial frequency in an effort to enhance the representation of edges. Because the edges are visually integrated in a coarse-to-fine-scale order, the visual fidelity of an image can be maintained by preserving coarse scales at the expense of fine scales. The reduced version of MPPSNR is computationally more efficient than its full version, as the MSE is calculated only for lower resolution pyramid images. Reliable and fast evaluation is obtained with the reduced version of MPPSNR using the MBP ED pyramid with SE of size 5 × 5 (Pearson’s 90.39 %, Spearman 86.3 %). The scatter plot of nonlinearly mapped reduced MPPSNR versus subjective DMOS for that case is shown in Fig. 21. Each point represents one frame from the database. A Matlab implementation of the reduced version of MPPSNR is available online [52].
For the MCL3D database, the reduced version of MPPSNR is calculated without detail images from the first three pyramid scales when the SE of size less than 7 × 7 is used. When the SE of size 7 × 7 and bigger is used, only the pyramid image from the first scale is omitted in the calculation of the reduced version of MPPSNR. The performances of the reduced versions of MPPSNR using equal value weights are presented in the bottom right part of Table 1. Only marginal improvement is achieved using reduced version of MPPSNR for MCL3D database.
Analysis of MWPSNR performances
In this section, the performances of the Morphological Wavelet Peak Signal-to-Noise Ratio measure, MWPSNR, are analyzed. MWPSNR uses morphological wavelet decomposition of the reference and the DIBR-synthesized images. Both separable morphological wavelet decompositions, using the morphological Haar min wavelet (minHaar) and the min-lifting wavelet (minLift), and nonseparable morphological wavelet decomposition with quincunx sampling using the min-lifting wavelet (minLiftQ) are investigated. Separable morphological wavelet decompositions are computationally less expensive than nonseparable wavelet decompositions. They are also less expensive than morphological pyramid decompositions for the same filter length. The influence of the number of wavelet decomposition levels on MWPSNR performances is explored. For the comparison with linear wavelet decompositions, MWPSNR performances are calculated using separable linear wavelet decompositions with the Haar wavelet (Haar) and the Cohen-Daubechies-Feauveau wavelet cdf(2,2), and nonseparable linear wavelet decomposition with quincunx sampling using cdf(2,2)Q. PSNR performances calculated by wavelet subbands through decomposition scales are investigated. The reduced version of MWPSNR using only wavelet subbands with better PSNR performances is analyzed.
The number of decomposition levels has been varied between 1 and 8 and the configurations with the best MWPSNR performances have been chosen. The best MWPSNR performances have been achieved using separable wavelet transformations in M = 7 levels producing 22 subbands. Using nonseparable wavelet transformation with quincunx sampling for the IRCCyN/IVC DIBR database, the best MWPSNR performances have been achieved also with M = 7 levels producing 15 subbands. For the MCL3D database the best MWPSNR performances using nonseparable wavelet transformation have been achieved with M = 4 levels producing nine subbands. Equal value weights are used in the calculation of MWMSE (11). Matlab implementation of MWPSNR is available online [55].
The performances of MWPSNR for different wavelet transformations are presented in the upper part of Table 3. The performances of MWPSNR using morphological wavelet transforms are better than the performances of MWPSNR using linear wavelet transforms. The best MWPSNR performances have been obtained using separable wavelet decomposition with morphological Haar wavelet which is of the lowest computational complexity: for the IRCCyN/IVC DIBR database, Pearson 0.85, Spearman 0.77 and for the MCL3D database, Pearson 0.87, Spearman 0.70. Scatter plot of MWPSNR using separable wavelet decomposition with morphological Haar wavelet versus MOS for MCL3D database is shown on Fig. 22.

Analysis of PSNR performances by wavelet subbands
We have investigated PSNR performances by wavelet subbands at different wavelet decomposition scales. The reference image and the DIBR-synthesized image are decomposed into sets of lower resolution subbands using morphological wavelet decomposition. At each decomposition scale, for each wavelet subband, PSNR is calculated between the corresponding subbands of the two wavelet representations, that of the reference image and that of the DIBR-synthesized image. Pearson’s correlation coefficient (PCC) of PSNR to subjective scores is calculated for each subband for three types of morphological wavelets: minHaar, minLift and minLiftQ. A Matlab implementation of PSNR by morphological wavelet subbands is available online [55].
For the IRCCyN/IVC DIBR database, Fig. 23, Pearson’s correlation coefficients calculated for wavelet subbands on decomposition levels 4–7 are higher than Pearson’s correlation coefficients calculated for wavelet subbands on decomposition levels 1–3. For the MCL3D database, smaller differences by wavelet subbands between Pearson’s correlation coefficients can be noticed, Fig. 24.
Moreover, the best PSNR performances by wavelet subbands for each wavelet decomposition are shown in Table 4. For instance, for the IRCCyN/IVC DIBR database and the separable wavelet decomposition using the morphological minLift wavelet, the best PSNR performances are obtained for the subband at scale 6 with vertical details (d61), PCC 0.887 and SCC 0.828. Also, for all tested wavelets, the PSNR of the wavelet subband with the highest PCC performs much better than the PSNR calculated between the reference image and the DIBR-synthesized image without decomposition, in the base band.

Analysis of the reduced version MWPSNR performances
Based on the PSNR performances by subbands for the IRCCyN/IVC DIBR database given in Fig. 23, it can be concluded that the PSNR performances of wavelet subbands at decomposition levels 4–7 are much better than the subband PSNR performances at levels 1–3. Therefore, we propose a reduced version of MWPSNR using only these higher level subbands. The reduced version of MW_MSE is calculated as the weighted sum of the MSE values of the subbands used. For the separable wavelet decomposition, the reduced version of MWPSNR is calculated using only 11 subbands from levels 4–7, with indices 41–72. For the nonseparable wavelet decomposition with quincunx sampling, the reduced version of MWPSNR is calculated using 6 subbands from decomposition levels 4–7, with indices 42–71. A Matlab implementation of the reduced version of MWPSNR is available online [55]. The performances of the reduced MWPSNR are presented in the bottom left part of Table 3. For each wavelet type, the performances of the reduced version of MWPSNR are better than the performances of the full version: by 3.1 % for minHaar, 1.43 % for minLift and 3.08 % for minLiftQ. The best performances of the reduced version of MWPSNR are obtained using separable wavelet decomposition with the morphological minHaar wavelet, Pearson’s 88.5 %, Spearman 82.98 %. The scatter plot of nonlinearly mapped reduced MWPSNR versus subjective DMOS for that case is shown in Fig. 25.
For the MCL3D database, only marginal improvement is achieved using reduced version of MWPSNR, Table 3 bottom right.
Summary of the results
The performances of the selected proposed metrics, the commonly used 2D image quality assessment metrics and the metric dedicated to synthesisrelated artifacts, 3DswIM [13], are presented in Table 5. The considered commonly used 2D metrics are: PSNR, universal quality index UQI [56], structural similarity index SSIM [57], multiscale structural similarity MSSSIM [19], information weighted IWPSNR [20], and IWSSIM [20]. Singlescale structural similarity SSIM [57] is calculated between the original and the synthesized images using the given matlab code [58]. 3DswIM [13] is calculated using the given matlab pcode [59]. Selected versions of the proposed metrics using morphological pyramid decompositions presented in Table 5 are: PSNR calculated on scale 5 of the MBP ED pyramid representations using SE of size 3 × 3; reduced version of MPPSNR using SE of size 5 × 5 in pyramid MBP ED decomposition; full versions of MPPSNR using SE of size 5 × 5. The selected proposed metrics using morphological wavelet decompositions shown in Table 5 are: PSNR calculated on scale 6 between wavelet subbands with vertical details of the two wavelet representations using minLift wavelet for the IRCCyN/IVC DIBR database and PSNR calculated on scale 7 between approximation wavelet subbands using minLift wavelet for the MCL3D database; reduced and full versions of MWPSNR using minHaar wavelet. The performances of the proposed metrics are much better than the performances of the commonly used 2D metrics and better than the performances of the metric dedicated to synthesisrelated artifacts, 3DswIM. The Pearson’s correlation coefficients of the selected commonly used 2D metrics, the metric dedicated to synthesisrelated artifacts, 3DswIM, and the reduced versions of MPPSNR and of MWPSNR are shown on Fig. 26.
Conclusions
Most of the depth-image-based rendering (DIBR) techniques produce images which contain nonuniform geometric distortions affecting the edge coherency. Distortions of this type are challenging for common image quality assessment (IQA) metrics. We propose a full-reference metric based on multiscale decomposition using morphological filters in order to better deal with the specific geometric distortions in DIBR-synthesized images. The nonlinear morphological filters introduced in the multiresolution image decomposition maintain important geometric information, such as edges, across different resolution scales. The proposed metric is dedicated to artifact detection in DIBR-synthesized images by measuring the edge distortion between the multiscale representations of the reference image and the DIBR-synthesized image using MSE. We have explored two versions of the morphological multiscale metric: the Morphological Pyramid Peak Signal-to-Noise Ratio measure, MPPSNR, based on morphological pyramid decomposition, and the Morphological Wavelet Peak Signal-to-Noise Ratio measure, MWPSNR, based on morphological wavelet decomposition. The proposed metrics are evaluated using two databases which contain images synthesized by DIBR algorithms: the IRCCyN/IVC DIBR image database and the MCL3D stereoscopic image database. Both metric versions demonstrate a large improvement in performance over standard IQA metrics and over the tested metric dedicated to synthesis-related artifacts. They also perform much better than their linear counterparts for the evaluation of DIBR-synthesized images. MPPSNR has slightly better performances than MWPSNR. For the MCL3D database, MPPSNR achieves Pearson 0.888 and Spearman 0.756 using MBP ED pyramid decomposition with a square structuring element of size 7 × 7 in 4 levels. For the same database, MWPSNR achieves Pearson 0.87 and Spearman 0.707 using separable wavelet decomposition with the morphological Haar wavelet in 7 levels.
It is shown that PSNR agrees particularly well with human judgment when it is calculated between the appropriate detail images at higher decomposition scales of the two morphological multi-scale image representations. For the IRCCyN/IVC DIBR image database, PSNR calculated on scale 5 of the MBP ED pyramid image representations using a structuring element of size 3 × 3 performs very well (Pearson's 0.89, Spearman 0.86). For the MCL3D database, PSNR calculated on scale 4 of the MBP ED pyramid image representations using a square structuring element of size 7 × 7 achieves Pearson's 0.88 and Spearman 0.82. For the IRCCyN/IVC DIBR image database, it has been shown that the reduced versions of the multi-scale metrics, reduced MPPSNR and reduced MWPSNR, can be used for the assessment of DIBR-synthesized frames with high reliability. The reduced version of MPPSNR using the morphological pyramid decomposition MBP ED with a square structuring element of size 5 × 5 achieves an improvement of 15.2 % in correlation over PSNR (Pearson's 0.904, Spearman 0.863), and the reduced version of MWPSNR using the morphological wavelet decomposition with the minHaar wavelet achieves an improvement of 13.3 % in correlation over PSNR (Pearson's 0.885, Spearman 0.829).
Because the morphological operators involve only integer arithmetic with min, max, and addition, and the MSE is simple to calculate, the multi-scale metrics MPPSNR and MWPSNR are computationally efficient. They provide reliable quality assessment of DIBR-synthesized images even without any parameter optimization or precise registration procedure.
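To make the integer-only claim concrete, here is a minimal one-dimensional sketch of a single level of a morphological Haar (min) wavelet, following the construction described by Heijmans and Goutsias [10]. Whether it matches the minHaar variant used for MWPSNR in every detail is an assumption, the function names are hypothetical, and the separable 2D decomposition (rows, then columns) is not shown.

```python
def min_haar_analysis(x):
    """One decomposition level on an even-length integer sequence.

    approximation[n] = min(x[2n], x[2n+1])
    detail[n]        = x[2n] - x[2n+1]
    """
    if len(x) % 2:
        raise ValueError("sequence length must be even")
    approx = [min(x[i], x[i + 1]) for i in range(0, len(x), 2)]
    detail = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return approx, detail

def min_haar_synthesis(approx, detail):
    """Perfect reconstruction using only max, min, addition and subtraction."""
    x = []
    for a, d in zip(approx, detail):
        x.append(a + max(d, 0))   # even sample
        x.append(a - min(d, 0))   # odd sample
    return x

signal = [5, 3, 2, 7, 4, 4]
approx, detail = min_haar_analysis(signal)
print(approx, detail)                      # [3, 2, 4] [2, -5, 0]
print(min_haar_synthesis(approx, detail))  # [5, 3, 2, 7, 4, 4]
```

No multiplication or floating-point arithmetic is needed at any step, and the approximation keeps one of the original sample values instead of an average, so a step edge in the input is not smoothed at the coarser scale.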
Abbreviations
cdf(2,2), linear biorthogonal (2,2) Cohen-Daubechies-Feauveau wavelet; cdf(2,2)Q, nonseparable linear (2,2) Cohen-Daubechies-Feauveau wavelet with quincunx sampling; DIBR, Depth-Image-Based Rendering; DMOS, Difference Mean Opinion Score; FVV, Free Viewpoint Video; Haar, Haar wavelet transformation; IQA, Image Quality Assessment; MBP ED, morphological bandpass pyramid erosion/dilation; minHaar, morphological Haar min wavelet transformation; minLift, morphological min-lifting wavelet transformation; minLiftQ, nonseparable morphological min-lifting wavelet transformation with quincunx sampling; MOS, Mean Opinion Score; MPPSNR, Morphological Pyramid Peak Signal-to-Noise Ratio metric; MSE, Mean Squared Error; MWPSNR, Morphological Wavelet Peak Signal-to-Noise Ratio metric; PCC, Pearson's Correlation Coefficient; PSNR, Peak Signal-to-Noise Ratio; SE, structuring element for morphological operations
References
K Mueller, P Merkle, T Wiegand, 3D video representation using depth maps. Proc. IEEE 99(4), 643–656 (2011)
E Bosc, P Le Callet, L Morin, M Pressigout, Visual quality assessment of synthesized views in the context of 3D-TV, in 3D-TV system with depth-image-based rendering, ed. by C Zhu, Y Zhao, L Yu, M Tanimoto (Springer, New York, 2013), pp. 439–473
E Bosc, R Pepion, P Le Callet, M Koppel, P Ndjiki-Nya, M Pressigout, L Morin, Towards a new quality metric for 3D synthesized view assessment. IEEE Journal on Selected Topics in Signal Processing 5(7), 1332–1343 (2011)
IRCCyN/IVC DIBR image quality database. ftp://ftp.ivc.polytech.univnantes.fr/IRCCyN_IVC_DIBR_Images
R Song, H Ko, CCJ Kuo, MCL3D: a database for stereoscopic image quality assessment using 2D-image-plus-depth source, 2014. http://arxiv.org/abs/1405.1403
MCL3D stereoscopic image quality database. http://mcl.usc.edu/mcl3ddatabase
E. Adelson, E. Simoncelli, W. Freeman, Pyramids and multiscale representations. Proc. European Conf. on Visual Perception, Paris (1990)
P Maragos, R Schafer, Morphological systems for multidimensional signal processing. Proc. IEEE 78(4), 690–710 (1990)
A Toet, A morphological pyramidal image decomposition. Pattern Recogn. Lett. 9(4), 255–261 (1989)
H Heijmans, J Goutsias, Multiresolution signal decomposition schemes, Part II: morphological wavelets. IEEE Trans. Image Process. 9(11), 1897–1913 (2000)
X Liu, Y Zhang, S Hu, S Kwong, CCJ Kuo, Q Peng, Subjective and objective video quality assessment of 3D synthesized views with texture/depth compression distortion. IEEE Trans. Image Process. 24(12), 4847–4861 (2015)
P. Conze, P. Robert, L. Morin, Objective view synthesis quality assessment. Proc. SPIE 8288, Stereoscopic Displays and Applications XXIII (2012)
F Battisti, E Bosc, M Carli, P Le Callet, S Perugia, Objective image quality assessment of 3D synthesized views. Signal Processing: Image Communication 30(1), 78–88 (2015)
E Bosc, P Le Callet, L Morin, M Pressigout, An edge-based structural distortion indicator for the quality assessment of 3D synthesized views, Picture Coding Symposium, 2012, pp. 249–252
M. Solh, G. AlRegib, J.M. Bauza, 3VQM: a vision-based quality measure for DIBR-based 3D videos, IEEE Int. Conf. on Multimedia and Expo (ICME) (2011)
M. Solh, G. AlRegib, J.M. Bauza, A no-reference quality measure for DIBR-based 3D videos, IEEE Int. Conf. on Multimedia and Expo (ICME) (2011)
CT Tsai, HM Hang, Quality assessment of 3D synthesized views with depth map distortion, Visual Communications and Image Processing (VCIP) (2013)
M.S. Farid, M. Lucenteforte, M. Grangetto, Objective quality metric for 3d virtual views, IEEE Int. Conf. on Image Processing (ICIP) (2015)
Z. Wang, E. Simoncelli, A.C. Bovik, Multi-scale structural similarity for image quality assessment. Asilomar Conference on Signals, Systems and Computers (2003)
Z Wang, Q Li, Information content weighting for perceptual image quality assessment. IEEE Trans. On Image Processing 20(5), 1185–1198 (2011)
PJ Burt, EH Adelson, The Laplacian pyramid as a compact image code. IEEE Trans. on Communications 31(4), 532–540 (1983)
Z. Wang, E. Simoncelli, Translation insensitive image similarity in complex wavelet domain. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal processing, 573–576 (2005)
Y.K. Lai, C.C.J. Kuo, Image quality measurement using the Haar wavelet. Proc. SPIE 3169, Wavelet Applications in Signal and Image Processing V, 127 (1997)
S Rezazadeh, S Coulombe, A novel wavelet domain error-based image quality metric with enhanced perceptual performance. Int. J. Comput. Electrical Eng. 4(3), 390–395 (2012)
X Gao, W Lu, D Tao, X Li, Image quality assessment based on multiscale geometric analysis. IEEE Trans. Image Process. 18(7), 1409–1423 (2009)
E. Adelson, C. Anderson, J. Bergen, P. Burt, J. Ogden, Pyramid methods in image processing. RCA Engineer (1984)
S Mallat, Wavelets for a vision. Proc. IEEE 84(4), 604–614 (1996)
F Meyer, P Maragos, Nonlinear scale-space representation with morphological levelings. J. Vis. Commun. Image Represent. 11, 245–265 (2000)
G Matheron, Random sets and integral geometry (Wiley, New York, 1975)
J Serra, Introduction to mathematical morphology. J. on Comput. Vision, Graph. Image Process. 35(3), 283–305 (1986)
J Goutsias, H Heijmans, Nonlinear multiresolution signal decomposition schemes—Part I: morphological pyramids. IEEE Trans. Image Process. 9(11), 1862–1876 (2000)
D. Sandić-Stanković, Multiresolution decomposition using morphological filters for 3D volume image decorrelation. European Signal Processing Conf. EUSIPCO, Barcelona (2011)
H. Heijmans, J. Goutsias, Some thoughts on morphological pyramids and wavelets. European Signal Processing Conf. EUSIPCO, Rodos (1998)
D Chandler, S Hemami, VSNR: a wavelet-based visual signal-to-noise ratio for natural images. IEEE Trans. Image Process. 16(9), 2284–2298 (2007)
H. Heijmans, J. Goutsias, Constructing morphological wavelets with the lifting scheme, Int. Conf. on Pattern Recognition and Information Processing, Belarus, 65–72 (1999)
S Mallat, Multifrequency channel decompositions of images and wavelet models. IEEE Trans. on Acoustics, Speech and Signal Process. 37(12), 2091–2110 (1989)
H Heijmans, J Goutsias, Multiresolution signal decomposition schemes, Part 2: morphological wavelets. Tech. Rep. PNAR9905 (CWI, Amsterdam, The Netherlands, 1999)
I Daubechies, W Sweldens, Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl. 4(3), 247–269 (1998)
J Kovacevic, M Vetterli, Nonseparable two- and three-dimensional wavelets. IEEE Trans. on Signal Processing 43(5), 1269–1273 (1995)
H. Heijmans, J. Goutsias, Morphological pyramids and wavelets based on the quincunx lattice. in Mathematical morphology and its applications to image and signal processing, ed. by J Goutsias, L Vincent, D Bloomberg, (Springer US, 2000), 273–281
G. Uytterhoeven, A. Bultheel, The red-black wavelet transform. Proc. of IEEE Benelux Signal Processing Symposium (1997)
Z Wang, A Bovik, Mean squared error: love it or leave it. IEEE Signal Process. Mag. 26(1), 98–117 (2009)
Z. Wang, A. Bovik, L. Lu, Why is image quality assessment so difficult. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ASSP), 4, 3313–3316, Orlando FL, US (2002)
VQEG HDTV Group, Test plan for evaluation of video quality models for use with high definition tv content, 2009
C. Fehn, Depth image based rendering (DIBR), compression and transmission for a new approach on 3DTV. Proc. SPIE, Stereoscopic Displays and Applications XV, 5291, 93–104, San Jose, CA (2004)
A Telea, An image inpainting technique based on the fast marching method. J. Graph, GPU and Game Tools 9(1), 23–34 (2004)
Y Mori, N Fukushima, T Yendo, T Fujii, M Tanimoto, View generation with 3D warping using depth information for FTV. Signal Process. Image Commun. 24(1–2), 65–72 (2009)
K Muller, A Smolic, K Dix, P Merkle, P Kauff, T Wiegand, View synthesis for advanced 3D video systems. EURASIP Journal on Image and Video Processing 2008, 438148 (2008)
P. Ndjiki-Nya, P. Koppel, M. Doshkov, H. Lakshman, P. Merkle, K. Muller, T. Wiegand, Depth image based rendering with advanced texture synthesis. IEEE Int. Conf. on Multimedia & Expo, 424–429, Suntec City (2010)
M. Koppel, P. Ndjiki-Nya, M. Doshkov, H. Lakshman, P. Merkle, K. Muller, T. Wiegand, Temporally consistent handling of disocclusions with texture synthesis for depth-image-based rendering. IEEE Int. Conf. on Image Processing, 1809–1812, Hong Kong (2010)
M Solh, G AlRegib, Depth adaptive hierarchical hole filling for DIBR-based 3D videos, Proceedings of SPIE, 8290, 829004 (Burlingame, CA, US, 2012)
MPPSNR matlab pcode. https://sites.google.com/site/draganasandicstankovic/code/mppsnr
M Aubury, W Luk, Binomial filters. Journal of VLSI Signal Processing for Signal, Image and Video Technology 12(1), 35–50 (1995)
K Gu, M Liu, G Zhai, X Yang, W Zhang, Quality assessment considering viewing distance and image resolution. IEEE Trans. On Broadcasting 61(3), 520–531 (2015)
MWPSNR matlab pcode. https://sites.google.com/site/draganasandicstankovic/code/mwpsnr
Z Wang, AC Bovik, A universal image quality index. IEEE Signal Processing Letters 9(3), 81–84 (2002)
Z Wang, AC Bovik, HR Sheikh, E Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
SSIM matlab code. https://ece.uwaterloo.ca/~z70wang/research/ssim/ssim_index.m
3DSwIM matlab pcode. http://www.comlab.uniroma3.it/3DSwIM.html
Competing interests
The authors declare that they have no competing interests.
Acknowledgements
This work was partially supported by COST Action IC1105 3D-ConTourNet, the Ministry of Education, Science and Technological Development of the Republic of Serbia under Grant TR32034 and by the Secretary of Science and Technology Development of the Province of Vojvodina under Grant 114451813/201503.
Authors’ contributions
DSS proposed the framework of this work, carried out all the experiments, and drafted the manuscript. DK supervised the work, offered useful suggestions, and helped to revise the manuscript. PLC participated in the discussion of this work and helped to polish the manuscript. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
SandićStanković, D., Kukolj, D. & Le Callet, P. DIBRsynthesized image quality assessment based on morphological multiscale approach. J Image Video Proc. 2017, 4 (2016). https://doi.org/10.1186/s1364001601247
DOI: https://doi.org/10.1186/s1364001601247
Keywords
 DIBR-synthesized image quality assessment
 Multiscale IQA metric using morphological operations
 Geometric distortions
 Morphological pyramid
 Morphological wavelets