A novel texture-based asymmetric visibility threshold model for stereoscopic video coding

Du, Baozhen; Yu, Mei; Jiang, Gangyi; Zhang, Yun; Zhu, Tianzhi

doi:10.1186/s13640-018-0265-y

Published: 26 April 2018

EURASIP Journal on Image and Video Processing volume 2018, Article number: 27 (2018)

Abstract

Asymmetric stereoscopic video coding is becoming increasingly popular, as it can reduce the bandwidth required for stereoscopic 3D delivery without degrading the visual quality. Based on the perceptual theory of binocular suppression, the left and right views of a stereoscopic video are coded at different quality levels. However, existing asymmetric perceptual coding approaches for stereoscopic video mainly focus on a distortion threshold for the whole image. Using a single unified perception threshold, rather than an adaptive one, is not well suited to natural stereoscopic images, because texture complexity typically varies from block to block. In this paper, we generated an asymmetrically distorted stereoscopic image set with different texture densities and conducted a large number of subjective perceptual experiments. The subjective experiments reveal a strong correlation between the asymmetrical visibility threshold and texture complexity, and a texture-based asymmetrical visibility threshold (TAVT) model is established. The model is then extended to the hierarchical B picture (HBP) coding architecture, and an asymmetric stereoscopic video coding method based on the TAVT model is proposed. Experimental results show that the proposed method can effectively reduce perceptual redundancy without visual quality degradation, and that it is particularly efficient for high-bitrate configurations.

1 Introduction

Asymmetric coding has attracted wide attention in the field of stereoscopic video coding [1, 2]. This approach is based on the perceptual theory of binocular suppression, a masking effect in human visual perception [3]. In stereoscopic vision, the view with relatively better quality contributes more to the overall stereoscopic image quality [4]. Thus, theoretically, when one view of a stereoscopic image is coded at high quality, the other view can be compressed to a greater extent without inducing visible artifacts in stereoscopic perception. Therefore, the total bitrate of stereoscopic video can be reduced through asymmetric coding.

Asymmetric stereoscopic video coding has mainly been studied from two aspects: mixed resolution coding and asymmetric quantization coding. The mixed resolution concept was first proposed by Perkins, who assumed that stereoscopic perception is not affected when one view is of high quality and the other of lower quality [5]. The concept has since been widely adopted in stereoscopic video. To reduce visual discomfort and super-resolve the low-quality views to the target full resolution, a virtual-view-assisted super-resolution algorithm is proposed in [6]. The algorithm can recover details in regions with edges while maintaining good quality in smooth areas by properly exploiting the high-quality virtual view pixels and the directional correlation of pixels. The experimental results demonstrate the effectiveness of this algorithm, with a peak signal-to-noise ratio (PSNR) gain of up to 3.85 dB. The researchers in [7] present a new depth encoding method called view up-sampling optimization to improve the quality of low-resolution views. In the case of asymmetric blur-impaired images, stereoscopic perception is dominated by the high-quality view.

Based on the above considerations, the success of asymmetric coding depends on the type of encoding distortion. However, most research to date focuses on the qualitative analysis of binocular suppression in stereoscopic perception, and only a few studies have addressed the efficiency of asymmetric coding from the aspect of asymmetric quantization. Based on a subjective perception experiment, the researchers in [8] proposed a quantitative visibility threshold (VT) model and found that 2 dB is a safe bound for asymmetric stereoscopic coding, within which most people cannot perceive the quality degradation. The VT model in [8] can be further applied to asymmetric coding to save 9 to 34% of the total bitrate [9].

Setting up a reasonable threshold model of stereo perception is the key to improving the efficiency of asymmetric coding. However, the visibility threshold proposed in [8] mainly addresses whole-image distortion and does not account for the effect of image content and local features on the visibility threshold. If only a single unified perception threshold is used for natural stereoscopic images, visual perceptual redundancy cannot be maximally removed, because the texture, brightness, and contrast characteristics usually differ among the blocks of an image.

On the other hand, research on just noticeable distortion (JND) for traditional two-dimensional (2D) images and video has shown masking effects of texture [10] and luminance [11] on human perception. Such research has succeeded in characterizing the effects of these factors on perceived thresholds using explicit JND models [12]. However, the JND model in [12] was originally developed to measure the perceptible distortion of conventional 2D video and is not suitable for stereoscopic video. Unlike traditional 2D images and video, stereoscopic images and video have additional spatial characteristics such as parallax [13] and depth [14], which, through stereoscopic masking, also influence the perception of other characteristics such as texture. Therefore, a 2D JND model cannot easily be applied to stereoscopic images and video. To remove visual perceptual redundancy, it is crucial to establish the relationship between the visibility threshold in stereoscopic perception and image factors such as texture and luminance.

Texture is an important factor in image content [15]. In this paper, we focus on revealing the effect of texture on the perceptual visibility threshold in stereoscopic images. With the left view taken as the dominant view, we hypothesize that texture affects the asymmetrical visibility threshold, that is, the human visual system cannot perceive small textural quality changes in a stereoscopic image unless the degradation exceeds a threshold. Based on a subjective perceptual experiment, a TAVT model of quantization parameters (QP) is established to reveal the relationship between the asymmetrical visibility threshold and texture complexity. Finally, an asymmetric coding scheme with adaptive quantization parameters based on the TAVT model is proposed, in which the left view is encoded with normal quality, while the right view is encoded according to the visibility threshold.

The remainder of this paper is organized as follows. Section 2 presents the framework of the proposed stereoscopic video coding system. Section 3 describes the subjective perception experiments designed to verify the hypothesis and establish the TAVT model. Section 4 presents the proposed TAVT-based stereoscopic video encoder in detail, and the experimental results are analyzed in Section 5. Finally, conclusions are drawn.

2 Framework of proposed stereoscopic video coding system

Figure 1 shows the framework of the proposed stereoscopic video coding system. First, the TAVT model is built and validated based on well-designed subjective experiments, and then the proposed TAVT-based stereoscopic video encoder is developed. The original video is jointly coded into a stereoscopic video bit stream by the proposed TAVT-based encoder. After being stored and transmitted to the client, the bit stream is decoded and the reconstructed video sequence is displayed on a stereoscopic display device for viewers.

In this framework, establishing the TAVT model is the primary and essential task. It involves three steps: generation of the test stereoscopic image sets, the subjective perceptual experiment, and non-linear model fitting, which are presented in detail in Section 3. We then apply the proposed TAVT model to a traditional 3D video encoder to derive a novel TAVT-based stereoscopic video encoder, with the aim of maintaining the perceptual quality of the stereoscopic video while achieving high bitrate savings. The proposed TAVT-based stereoscopic video encoder is introduced in detail in Section 4.

3 Texture-based asymmetrical visibility threshold model

To study the texture-based asymmetrical visibility threshold and verify our hypothesis, an appropriate subjective perceptual experiment is needed. We first generate the test stereoscopic image sets and carry out the subjective experiment. Then, a TAVT model is obtained by non-linear fitting of the perceptual thresholds, and some implementation details are presented. Finally, the validity and performance of the TAVT model are verified with natural stereoscopic images.

3.1 Generation of test stereoscopic image set

With natural stereoscopic images, it is difficult to derive a set of stereoscopic images in which only the texture changes while other factors (color, background luminance, disparity, and contrast) remain unchanged. Consequently, instead of using natural images to analyze the stereo-vision perceptual threshold, we use stereoscopic images with different textures generated by computer graphics software, namely the three-dimensional modeling software Maya2015. In the experiments, we adopt a ball as the key object and generate M original stereoscopic images with the same type of texture but M different density levels. The texture density of the ball thus varies from sparse to dense across the M original stereoscopic images, while the background is the same. These Maya-generated stereoscopic images are regarded as the original stereoscopic images in our experiments. The left view images of four original stereoscopic images are shown in Fig. 2.

To generate the distorted stereoscopic images, high efficiency video coding software (HM14.0) [16] was used to encode the stereoscopic images with the intra-frame configuration. Stereoscopic image pairs were formed by asymmetrically encoding the left and right view images. First, N QP grades for the left view (QP_l) were selected to encode the left view of each original stereoscopic image. Second, for each QP_l, we used k QP grades for the right view (QP_r) to encode the right view of each original stereoscopic image, obtaining encoded right views at different quality levels. The values of the k QP_r grades were {QP_l, QP_l + δ, QP_l + 2δ, …, QP_l + (k − 1)δ}, where δ is the interval of the right view quantization parameter, defined as follows:

$$ \delta =\begin{cases}\mathrm{round}\left[\dfrac{{\mathrm{QP}}_{\mathrm{max}}-{\mathrm{QP}}_l}{k}\right], & {\mathrm{QP}}_l<{\mathrm{QP}}_{\mathrm{max}}-k+1\\ 1, & \mathrm{otherwise}\end{cases}, $$

(1)

where QP_max is the maximum quantization parameter. We set the QP_max as 51 in our experiment. A compressed stereoscopic image pair is first generated wherein the left and right views are compressed with the same QP_l (denoted hereafter as the “reference stereoscopic image”). Then, we compressed the left view with a fixed QP_l and the right view with a QP_r larger than the QP_l. The two images were then combined into a new stereoscopic image (denoted hereafter as the “degraded stereoscopic image”). Corresponding to one QP_l grade of each original stereoscopic image, a single reference stereoscopic image and (k-1) degraded stereoscopic images were generated and classified as a stereoscopic image set for the paired comparison subjective experiment. Therefore, M × N groups of stereoscopic image sets were generated for the subjective experiment: M original stereoscopic images corresponding to N QP_l grades. In this experiment, M is set to six, N is set to four, k is set to 13 and QP_l is set to 20, 26, 32, and 38, respectively.
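
The construction of the right view QP grades in Eq. (1) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the rounding convention (round half up) is an assumption, since the text only writes "round"; for the four QP_l values used here no ties occur, so the choice does not affect the result.

```python
import math

QP_MAX = 51  # maximum quantization parameter used in the experiment


def round_half_up(x):
    # Assumed rounding convention; the paper only states "round".
    return int(math.floor(x + 0.5))


def qp_interval(qp_l, k, qp_max=QP_MAX):
    """Interval delta of Eq. (1) between consecutive right view QP grades."""
    if qp_l < qp_max - k + 1:
        return round_half_up((qp_max - qp_l) / k)
    return 1


def right_view_qps(qp_l, k, qp_max=QP_MAX):
    """The k grades {QP_l, QP_l + delta, ..., QP_l + (k - 1) * delta}."""
    delta = qp_interval(qp_l, k, qp_max)
    return [qp_l + i * delta for i in range(k)]


if __name__ == "__main__":
    for qp_l in (20, 26, 32, 38):
        grades = right_view_qps(qp_l, k=13)
        print(qp_l, qp_interval(qp_l, 13), grades[0], grades[-1])
```

With k = 13, this gives δ = 2 for QP_l = 20 and 26, and δ = 1 for QP_l = 32 and 38, so all grades stay at or below QP_max for the settings used in this experiment. The sketch applies no clamping, so other settings could produce grades above QP_max.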

For each original stereoscopic image, the average local variance (ALV) is used to evaluate the texture complexity. We define the texture complexity of an image as follows

$$ {T}_{\mathrm{ALV}}=\frac{1}{N}\left[\sum \limits_{i=1}^{H/8}\sum \limits_{j=1}^{W/8}\left(\operatorname{sign}\left(i,j\right)\times {\sigma}^2\left(i,j\right)\right)\right], $$

(2)

where W and H represent the width and height of the original image, respectively; (i, j) denotes the 8 × 8 block in the ith row and jth column of the original image; σ²(i, j) denotes the variance of that block; N represents the number of pixels in all 8 × 8 blocks of the target region, which is easily obtained by counting; and sign(i, j) is an indicator function whose value is 1 when block (i, j) is in the target area and 0 otherwise. In the subjective experiments, the participants were asked to focus on the small ball as the target region and to disregard the unchanged image background. Because only the texture density of the ball changes, we calculate the ALV of the ball region to represent the texture complexity of the image. Furthermore, the average texture complexity of the stereoscopic image is defined as

$$ {\overline{T}}_{\mathrm{ALV}}=g\left({T}_{\mathrm{ALV}l},{T}_{\mathrm{ALV}r}\right)=\alpha {T}_{\mathrm{ALV}l}+\beta {T}_{\mathrm{ALV}r}, $$

(3)

where T_ALVl and T_ALVr denote the complexity of the textural area in the left and right views, respectively, and α and β are scaling factors.
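
Equations (2) and (3) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the image is a 2D list of luma values, the target region is supplied as a per-block 0/1 map (the sign(i, j) indicator), N is read as the number of pixels in the target blocks, and the default scaling factors α = β = 0.5 are hypothetical because the text does not give their values.

```python
def block_variance(image, top, left, size=8):
    """Population variance of one size x size block of a 2D luma array."""
    pixels = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def texture_alv(image, target_blocks, size=8):
    """T_ALV of Eq. (2).

    target_blocks[i][j] plays the role of sign(i, j): 1 if block (i, j)
    lies in the target region, 0 otherwise.
    """
    rows = len(image) // size
    cols = len(image[0]) // size
    variance_sum = 0.0
    n_pixels = 0  # N, read here as the pixel count of the target blocks
    for i in range(rows):
        for j in range(cols):
            if target_blocks[i][j]:
                variance_sum += block_variance(image, i * size, j * size, size)
                n_pixels += size * size
    return variance_sum / n_pixels if n_pixels else 0.0


def average_texture(t_alv_left, t_alv_right, alpha=0.5, beta=0.5):
    """Average texture complexity of Eq. (3); alpha and beta are assumed."""
    return alpha * t_alv_left + beta * t_alv_right
```

Passing the block map separately keeps the segmentation of the ball region, which the text does not describe, outside the sketch.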

3.2 Subjective test environment and methodology

In this paper, we applied the Paired Comparison Method (PCM) to study the asymmetrical visibility threshold. We simultaneously displayed two stereoscopic images on the projection screen [17], in which the left view images were the same and the right view images differed in quality. Participants were asked to compare the two stereoscopic images of the same content and point out which had the higher subjective stereo quality. In accordance with Recommendation ITU-R BT.500-11 [18], the experiment was conducted in a visual lab in which the ambient illumination, color temperature, and ambient sound could be strictly controlled. We set the room illumination to less than 200 lx and the illumination of the background wall to less than 20 lx. We used Samsung’s UA65F9000AJ Ultra-HD stereo TV (65 in., 16:9, resolution 1920 × 1080) as the stereo display device. When watching the images, the participants wore shutter glasses at a viewing distance of approximately four times the height of the screen (approximately 2.8 m).

Twenty participants took part in this experiment, with an average age of 25 years. The participants were non-experts and had almost no experience in stereo visual subjective scoring. All participants had normal or corrected-to-normal visual acuity, and all passed the color vision test [19]. Each participant scored the M × N groups of stereoscopic image sets. Comparison clips from each group were displayed side by side on the stereo television, as shown in Fig. 3. The clips were displayed in random order. The observers were asked to rank the quality of each comparison clip on the screen. Each clip was displayed for 10 s, and the observers were given a 5 s break between clips.

For the sake of prudence, the observers ranked the images by choosing between three options: the quality of the left stereoscopic image is better (denoted by “left better”), the right stereoscopic image is better (denoted by “right better”), or the quality of two images is the same (denoted by “comparable”). Observers were also instructed that when they could not determine which image was of higher quality within 15 s, it should be considered that the two stereoscopic images are comparable, and that neither left better nor right better should be selected.

3.3 Definition of visibility threshold for asymmetrical coding

When a participant judged the subjective quality of the reference stereoscopic image to be better than that of the test (degraded) stereoscopic image, the participant was considered to have perceived the distortion; otherwise, the distortion was considered imperceptible. We define the critical observation point as the point at which half of the observers perceive the distortion while the other half do not (i.e., the probability of perceiving distortion is 50%). We denote the corresponding QP_r of the right view at the critical observation point as QP_Th. Because QP_Th does not necessarily coincide with a QP_r value used in the experiment, we calculated QP_Th by linear interpolation, as follows:

$$ {\mathrm{QP}}_{\mathrm{Th}}\left|{}_{P=50\%}\right.=\frac{{\mathrm{QP}}_{\alpha}\left({P}_{\beta }-0.5\right)+{\mathrm{QP}}_{\beta}\left(0.5-{P}_{\alpha}\right)}{P_{\beta }-{P}_{\alpha }}, $$

(4)

where point α is the tested case closest to the critical observation point whose probability of perceiving distortion (P_α) is less than 50%, point β is the tested case closest to the critical observation point whose probability of perceiving distortion (P_β) is larger than 50%, and QP_α and QP_β are the corresponding right view quantization parameters for α and β, respectively.

Moreover, we define QP_TAVT as the difference between QP_Th and QP_l at the critical observation point. Here, QP_TAVT refers to the asymmetrical visibility threshold of the quantization parameters, revealing the visibility threshold for the right view with respect to the left view:

$$ {\mathrm{QP}}_{\mathrm{TAVT}}={\mathrm{QP}}_{\mathrm{Th}}-{\mathrm{QP}}_l. $$

(5)
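
As an illustration of Eqs. (4) and (5), the following Python sketch derives QP_TAVT from a list of (QP_r, P) pairs for one image set. The function names and the sample data are hypothetical. Two assumptions are made: "closest to the critical observation point" is read as closest in probability to 50%, and a tested QP_r whose probability is exactly 50% is taken directly as QP_Th.

```python
def interpolate_qp_th(qp_a, p_a, qp_b, p_b):
    """Linear interpolation of Eq. (4) between points alpha and beta."""
    return (qp_a * (p_b - 0.5) + qp_b * (0.5 - p_a)) / (p_b - p_a)


def qp_tavt(samples, qp_l):
    """QP_TAVT = QP_Th - QP_l (Eq. (5)) for one stereoscopic image set.

    samples: list of (qp_r, p) pairs, where p is the fraction of observers
    who perceived the distortion at right view quantization parameter qp_r.
    Raises ValueError if no sample lies below or above 50%.
    """
    for qp_r, p in samples:
        if p == 0.5:  # a tested grade sits exactly at the critical point
            return qp_r - qp_l
    # alpha: largest probability below 50%; beta: smallest above 50%.
    qp_a, p_a = max((s for s in samples if s[1] < 0.5), key=lambda s: s[1])
    qp_b, p_b = min((s for s in samples if s[1] > 0.5), key=lambda s: s[1])
    return interpolate_qp_th(qp_a, p_a, qp_b, p_b) - qp_l


# Made-up votes for illustration only (not data from the experiment).
example = [(20, 0.0), (22, 0.1), (24, 0.3), (26, 0.45), (28, 0.65), (30, 0.9)]
```

For these made-up votes with QP_l = 20, α is (26, 0.45) and β is (28, 0.65), so QP_Th = (26 × 0.15 + 28 × 0.05) / 0.2 = 26.5 and QP_TAVT = 6.5.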

3.4 Experimental data analysis and non-linear fitting model

After conducting the subjective experiment and performing statistical analysis on the results, the M × N QP_TAVT threshold values are obtained for different texture complexities from varying QP_l values. The QP_TAVT values are summarized in Table 1. The value of $ {\overline{T}}_{\mathrm{ALV}} $ refers to the average texture complexity of each original stereoscopic image.

Table 1 QP_TAVT values with varying texture complexities based on QP_l

Given the linear fitting relationship between QP_TAVT and the $ {\overline{T}}_{\mathrm{ALV}} $ under different QP_l values, we have the following three key observations.

1) As shown in Fig. 4, there is a strong correlation between QP_TAVT and the texture complexity. An increase in texture complexity results in a corresponding increase in QP_TAVT.
2) When QP_l is 20, 26, or 32, QP_TAVT increases monotonically and approximately linearly with $ {\overline{T}}_{\mathrm{ALV}} $, and the three curves are nearly parallel. Thus, when the left view quantization parameter is low, the relationship between the perceptual threshold QP_TAVT and the texture complexity is approximately linear. For a given texture complexity, the smaller QP_l is, the greater QP_TAVT becomes; that is, the tolerance threshold is higher when the left view quantization parameter is lower. This is mainly due to the superior quality of the left view image: even when the right view image is distorted more severely, the subjective perception of the stereoscopic image quality is not affected.
3) When QP_l is 38, QP_TAVT varies only between 2 and 4. There is therefore no obvious linear relationship between the perceptual threshold and the texture complexity when the left view image has a large QP (i.e., low quality). This is mainly because the higher quantization parameter significantly degrades the left image, so the effect of stereoscopic masking decreases rapidly. The distortion is then easily detected when the quality of the right view image is greatly reduced.

Finally, by second-order polynomial fitting of the QP_TAVT data in Table 1, the TAVT model describing the relationship among QP_TAVT, $ {\overline{T}}_{\mathrm{ALV}} $, and QP_l is given as follows:

$$ {\mathrm{QP}}_{\mathrm{TAVT}}=f\left({\overline{T}}_{\mathrm{ALV}},{\mathrm{QP}}_l\right)=a+b\times {\overline{T}}_{\mathrm{ALV}}+c\times {\mathrm{QP}}_l+d\times {{\overline{T}}_{\mathrm{ALV}}}^2+e\times {\overline{T}}_{\mathrm{ALV}}\times {\mathrm{QP}}_l+f\times {{\mathrm{QP}}_l}^2. $$

(6)

Furthermore, the recommended coefficient values and their 95% confidence intervals are shown in Table 2, which mathematically describes the TAVT model.

Table 2 Coefficient of the recommended values and 95% confidence intervals

The TAVT model is plotted in Fig. 5, which shows intuitively that QP_TAVT is positively correlated with $ {\overline{T}}_{\mathrm{ALV}} $ for a given QP_l. This means that, for each block in a stereoscopic image, the asymmetrical threshold can be calculated by the TAVT model once the texture complexity is obtained.
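
Evaluating the second-order model of Eq. (6) for one block can be sketched as below. The function name is hypothetical, and the coefficient tuple is a placeholder with made-up numbers; the fitted values belong to Table 2 and are not reproduced here.

```python
def tavt_threshold(t_alv, qp_l, coeffs):
    """Evaluate Eq. (6) for one block.

    coeffs is the tuple (a, b, c, d, e, f) of model coefficients.
    """
    a, b, c, d, e, f = coeffs
    return (a + b * t_alv + c * qp_l
            + d * t_alv ** 2 + e * t_alv * qp_l + f * qp_l ** 2)


# Placeholder coefficients for illustration only, NOT the values of Table 2.
PLACEHOLDER_COEFFS = (1.0, 0.5, -0.1, 0.01, 0.02, 0.001)

if __name__ == "__main__":
    print(tavt_threshold(2.0, 10.0, PLACEHOLDER_COEFFS))
```

In practice, the tuple would be replaced by the recommended coefficients, and `t_alv` by the texture complexity of the block being coded.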

3.5 Verification and evaluation of TAVT model with natural stereoscopic images

To test the validity and evaluate the performance of the TAVT model, we designed three experiments. In the first experiment, the reliability and validity of the TAVT model for natural stereo images were verified by subjective experiments. In the second experiment, the QP thresholds of symmetric and asymmetric coding were compared and analyzed to demonstrate the advantages of the TAVT model. In the third experiment, the coding performance of the TAVT model on stereoscopic images was tested, showing that the model can save more bitrate at the same subjective quality.

3.5.1 The reliability and effectiveness test for TAVT model

To test the validity and effectiveness of the proposed TAVT model on natural stereoscopic images, the same PCM subjective experiment as in Section 3.2 was performed on 30 natural stereoscopic images, as shown in Fig. 6. The 30 natural stereoscopic images include 19 stereoscopic images from the NBU 3D IQA database [20] and 11 randomly selected stereoscopic frames from several stereoscopic test sequences, such as the 1st/100th/158th/247th/493rd frames of the Balloons sequence and the 2nd/47th/92nd/126th/178th/277th frames of the Newspaper sequence.

The reference stereoscopic image was obtained by symmetric intra coding with the base QP_REF. The test stereoscopic image was obtained by asymmetric intra coding, in which the coding QP of the left view was QP_REF, the same as for the reference image, while the coding QP of the right view was determined by QP_REF and the TAVT model. For the right view of the test stereoscopic image, the texture complexity of each largest coding unit (LCU) was calculated, the QP_TAVT threshold of that LCU was obtained from the TAVT model, and the QP of each LCU was then adaptively adjusted based on QP_REF. Corresponding to each reference stereoscopic image, a single test stereoscopic image was generated, and the pair was classified as a comparison clip for the paired comparison subjective experiment. Considering the completeness and workload of the subjective experiment, three base QP_REF values, namely 22, 28, and 34, were selected to measure the model performance at different quality levels. Therefore, 90 groups of comparison clips were generated for the experiment. Details of the selection of QP for the subjective experiment are shown in Table 3.
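
The per-LCU adjustment of the right view QP can be sketched as follows. The exact adjustment rule is not spelled out in the text, so this sketch assumes that each LCU is coded with QP_REF plus its QP_TAVT threshold rounded down, that negative thresholds are treated as zero, and that the result is clipped to QP_max = 51. The function name is hypothetical.

```python
import math


def right_view_lcu_qps(qp_ref, lcu_thresholds, qp_max=51):
    """Map per-LCU QP_TAVT thresholds to per-LCU coding QPs.

    lcu_thresholds: 2D list with one QP_TAVT value per LCU of the right view.
    Assumed rule: QP_LCU = min(qp_max, qp_ref + max(0, floor(QP_TAVT))).
    """
    return [[min(qp_max, qp_ref + max(0, math.floor(t))) for t in row]
            for row in lcu_thresholds]
```

Rounding down keeps each LCU at or below its visibility threshold, which is the conservative choice when the goal is to leave the perceived quality unchanged.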

Table 3 The selection of QP for subjective experiment

Following the subjective experiment standards [17, 18], 50 observers (nine with average three-dimensional subjective scoring experience and the rest naive) were invited to evaluate the image quality subjectively. In each trial, observers watched the two stereoscopic images of a comparison clip and were then given time to vote on their comparative quality using the comparison scale shown in Table 4, where “0” indicates that the two stereoscopic images have the same perceptual quality and “1” indicates that they differ. The subjective experiments were strictly implemented. Figure 7 shows the subjective test screen for one comparison clip, where the reference and test stereoscopic images are randomly placed on the left and right. In addition, the comparison clips were played in random order.

Table 4 Comparison scales for subjective quality evaluation

Table 5 shows the statistical results of this subjective experiment. Fifty observers participated, and each observer scored 90 comparison clips. Each scoring trial is treated as statistically independent. Of the 4500 scoring trials, 4478 received the comparison scale ‘0’, meaning that the observer considered the subjective qualities of the two images to be the same. In only 22 trials did the observers consider the subjective qualities of the reference and test images to be different. That is to say, the effective rate of the TAVT model is 99.51%. Scores of ‘1’ accounted for 0.49% of the trials, which can be regarded as a small-probability event. Furthermore, the statistical mean of all scoring results is 0.005, which is very close to zero. The confidence interval for the statistical mean is [0.0030, 0.0070] at a confidence level of 1 − α, where α is the significance level, set to 0.05 in this paper. The subjective quality of the test stereoscopic images coded asymmetrically with the TAVT model is therefore nearly indistinguishable from that with symmetric coding, thanks to the stereoscopic visual masking effect. The TAVT model thus shows its effectiveness and reliability in this subjective experiment.
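
The summary statistics quoted above can be checked with a short Python sketch. It assumes a normal-approximation confidence interval for the mean of the 0/1 votes, which is one common choice; the text does not state which interval formula was used, and the function name is hypothetical.

```python
import math


def vote_summary(n_trials, n_different, z=1.96):
    """Mean of 0/1 votes and a normal-approximation confidence interval.

    z = 1.96 corresponds to a significance level of 0.05.
    """
    mean = n_different / n_trials           # share of '1' votes
    std = math.sqrt(mean * (1.0 - mean))    # standard deviation of 0/1 data
    half_width = z * std / math.sqrt(n_trials)
    return {
        "same_rate": 1.0 - mean,            # the "effective rate"
        "mean": mean,
        "ci": (mean - half_width, mean + half_width),
    }


summary = vote_summary(n_trials=4500, n_different=22)
```

For 22 differing votes out of 4500, the mean is about 0.0049 and the interval is roughly [0.0029, 0.0069], close to the reported [0.0030, 0.0070]; the small gap may come from rounding or from a different interval formula.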

Table 5 The comparative subjective quality evaluation results

3.5.2 Comparison and analysis of QP threshold for symmetric and asymmetric coding

Traditional JND research has shown that, owing to the masking effects within an image [10,11,12], the bitrate can be reduced without causing subjective quality degradation when the distortion stays within the JND range. This means that, even if the asymmetric stereoscopic masking effect is not taken into account, human eyes cannot observe the subjective quality change of the reconstructed images as long as a larger QP is used to encode both views symmetrically within a certain JND range. Because this effect is always present when an image is observed, we need to conduct experiments to determine the threshold of symmetric encoding, in which the QP is simply increased symmetrically. If the symmetric threshold is less than the proposed asymmetric threshold, the proposed TAVT model is superior to symmetric coding; otherwise, the proposed TAVT model has no obvious advantage or practical value. Therefore, in this experiment, we compare the QP threshold of symmetric coding with the asymmetric threshold.

The subjective experiment was carried out as in Section 3.5.1, with the same experimental environment and the same natural stereoscopic image set shown in Fig. 6; 35 observers were invited. Both the reference images and the test images were obtained by symmetric intra coding. For each original image, we encoded the original image with QP_REF to obtain a reference image of a certain quality grade. Corresponding to this reference stereo image, we obtained 10 degraded stereoscopic images by encoding the original image with 10 QP levels: QP_REF + 1, QP_REF + 2, ..., QP_REF + 10. Thus, for each original stereoscopic image, a single reference stereoscopic image and 10 degraded stereoscopic images were generated and classified as a stereoscopic image set for the paired comparison subjective experiment, giving 30 groups of stereoscopic image sets. To allow comparison with the TAVT model while taking into account the completeness and workload of the subjective experiments, three QP_REF values, 22, 28, and 34, were adopted to evaluate the JND of the symmetric encoding quantization parameter (QP_{SE_JND}). The selection of QP for the subjective experiment is shown in Table 6.

Table 6 The selection of QP for subjective experiment

The procedure of the subjective experiment was also similar to that in Section 3.2. Comparison clips of each group were displayed side by side on the stereo television. Each observer compared the reference image with the test image of each comparison clip and made his/her decision by choosing among three options: “left better,” “right better,” or “comparable.” To avoid visual inertia, the reference image and the test image were displayed in random order. Through statistical analysis of all subjective scoring data, we finally obtained the QP_{SE_JND} of each stereoscopic image set, as shown in Table 7.

Table 7 QP_{SE_JND} of each group of stereoscopic image set for symmetric coding

As can be seen from Table 7, there is indeed a masking effect within the image. When the base QP_REF is small, QP_{SE_JND} is relatively large, and as QP_REF increases, QP_{SE_JND} decreases. Taking the fifth stereoscopic image set as an example, for the reference image coded with QP_REF = 22, QP_{SE_JND} for the test image can reach 3.94. This means that the observers cannot perceive a difference in subjective quality between the reference image and the corresponding test image if the QP of the test image is less than 25.94 (22 + 3.94).

In addition, we calculated the average QP_TAVT of the right image of each test stereo image using the TAVT model. As mentioned above, the TAVT model yields a QP_TAVT value for each LCU. To facilitate the analysis, the average QP_TAVT is taken as the mean of the QP_TAVT values of all LCUs in the right image of the test stereo image. The resulting average QP_TAVT of each stereoscopic image set is shown in Table 8.

Table 8 Average QP_TAVT of each group stereoscopic image set

As can be seen from Table 8, when base QP_REF is small and images are of high quality, the average QP_TAVT is large. As the base QP_REF increases, the average QP_TAVT decreases. For the fifth test stereo image, when using asymmetric encoding, if QP of the left view is 22, the average QP_TAVT for the right image can reach 10.13, which means the QP of the right image can be up to 32.13 on average. Even so, the subjective quality degradation can hardly be perceived according to the TAVT model, benefiting from the stereoscopic visual masking effect.

Comparing the two kinds of thresholds in Tables 7 and 8 shows that, for each stereoscopic image set, the symmetric threshold (statistical mean of QP_{SE_JND} in Table 7) is on the whole less than the corresponding asymmetric threshold (average QP_TAVT in Table 8). When QP_REF is relatively small, the gap between the two thresholds is large. As QP_REF increases, the gap becomes smaller, but the average QP_TAVT is still larger than QP_{SE_JND}. This means that, when encoding a stereoscopic image, asymmetric encoding with the TAVT model can tolerate greater distortion than symmetric coding without degrading the perceptual quality of the stereoscopic image. Therefore, the proposed TAVT model has obvious advantages over the symmetric coding method.

3.5.3 Coding performance using TAVT model for stereoscopic image

To further evaluate the TAVT model, we tested its coding performance on the natural stereoscopic image set shown in Fig. 6 using intra-frame coding. Four QP_BASE values, 22, 27, 32, and 37, were selected for coding the stereoscopic images. The two coding schemes compared are as follows:

Sym: symmetric coding.
Asym-TAVT: asymmetric coding in which the QP of left view is QP_BASE while the coding QP of each LCU for right view is determined by QP_BASE and the TAVT model.

Following the subjective experiment standards [17, 18], a subjective experiment was carried out to validate and evaluate the TAVT model. The experimental environment was the same as in Section 3.5.1, and 35 observers were invited. The observers scored each stereoscopic image according to its displayed quality, and the Mean Opinion Score (MOS) was then obtained for each test stereoscopic image. The subjective scoring criteria are presented in Table 9. To evaluate the efficiency of the proposed scheme, we used the Bjontegaard delta bitrate based on MOS (BDBR_MOS) to compare bitrates at the same subjective quality. Finally, for all of the test stereoscopic images, the bitrates of the two schemes, the statistical mean of MOS, and BDBR_MOS are listed in Table 10.

Table 9 Subjective scoring criteria

Table 10 Rate-MOS performance of two schemes

Table 10 shows that, at the same QP_BASE, the asymmetric coding scheme with TAVT requires a much lower bitrate than the symmetric coding scheme, especially for small QP_BASE. BDBR_MOS indicates the bitrate saving of the TAVT asymmetric coding method relative to the symmetric coding method at the same subjective quality. According to Table 10, the TAVT asymmetric coding method achieves a 24.1% bitrate saving on average.
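The text does not spell out how BDBR_MOS is computed. A common way to obtain a Bjontegaard delta bitrate is to fit a cubic to the logarithm of the bitrate as a function of quality for each scheme, average both curves over the shared quality interval, and convert the difference to a percentage. The Python sketch below follows that recipe under the assumption of exactly four rate points per curve with distinct quality values, in which case the cubic passes through the points; the function names and the sample points are hypothetical.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial interpolating the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def bd_rate(anchor, test):
    """Bjontegaard-style delta bitrate in percent (negative = saving).

    anchor, test: lists of four (bitrate, quality) pairs with distinct
    quality values, e.g. quality = MOS. Assumes overlapping quality ranges.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))

    def mean_log_rate(points):
        qs = [q for _, q in points]
        log_rates = [math.log10(r) for r, _ in points]
        mid = 0.5 * (lo + hi)
        # Simpson's rule integrates a cubic polynomial exactly.
        integral = (hi - lo) / 6.0 * (
            lagrange_eval(qs, log_rates, lo)
            + 4.0 * lagrange_eval(qs, log_rates, mid)
            + lagrange_eval(qs, log_rates, hi)
        )
        return integral / (hi - lo)

    diff = mean_log_rate(test) - mean_log_rate(anchor)
    return (10.0 ** diff - 1.0) * 100.0


# Hypothetical rate/MOS points: the test scheme needs 20% less bitrate
# than the anchor at every quality level.
anchor = [(1000.0, 2.0), (2000.0, 3.0), (4000.0, 4.0), (8000.0, 4.5)]
test = [(0.8 * r, q) for r, q in anchor]
print(round(bd_rate(anchor, test), 3))  # -20.0
```

A negative result corresponds to a bitrate saving at equal subjective quality, which is the sense in which the 24.1% figure above is reported.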

The Rate-MOS curves in Fig. 8 more intuitively illustrate the improvement of coding performance of TAVT asymmetric coding method. It can be seen that, compared with the symmetric coding method, the TAVT asymmetric coding method can achieve better Rate-MOS coding performance.

Therefore, even though the TAVT model is derived from unnatural stereoscopic images, it proves effective in extensive tests on natural stereoscopic images. The TAVT model provides a more accurate JND profile of the human visual system (HVS), since it exploits the stereo masking properties of human eyes without jeopardizing the visual quality. By analyzing the texture complexity of image blocks, the perceptual threshold for asymmetric coding can be calculated accurately, so that the bitrate of the right view is greatly reduced while the subjective quality of the stereoscopic image is maintained. Perceptual redundancy is further removed, mainly because the compound masking effect is taken into account. Consequently, the TAVT model can be applied to further improve the coding efficiency of stereoscopic video encoders.

4 The proposed TAVT-based stereoscopic video encoder

It is well known that stereoscopic video coding adopts the classical HBP architecture, which improves temporal scalability and compression efficiency [21, 22]. In addition, a variable-quantization asymmetric coding strategy is used in the HBP coding architecture. However, in the HBP architecture, the entire frame is encoded with a single QP, and the characteristics of the video content, such as local differences in texture, brightness, and contrast within a frame, are not taken into consideration [23]. In this section, we propose a novel TAVT-based stereoscopic video encoder, which adjusts the QP value adaptively based on the texture complexity of each LCU.

4.1 TAVT-based stereoscopic video encoder

As shown in Fig. 9, the left view frame adopts the traditional independent view coding method, while the right view frame is encoded by the proposed asymmetric coding scheme. First, the texture complexity of each LCU is calculated on-line and the maximum tolerable QP_TAVT threshold is obtained; then, the QP for right view coding is adaptively adjusted; finally, dependent view coding is executed. The key extension of the TAVT model to HBP used in this framework is derived in Section 4.2.
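The per-LCU flow for the right view can be sketched in Python as below. This is only an outline under stated assumptions: plain pixel variance stands in for the texture measure (it is not the ALV definition used by the model), `tavt_offset` is a caller-supplied placeholder for the fitted threshold function of Eq. (6), and the result is rounded and clamped to the HEVC maximum QP of 51 for 8-bit video. All names are hypothetical.

```python
def block_variance(block):
    """Pixel variance of one LCU; a stand-in texture measure, not the ALV."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def right_view_lcu_qps(lcus, qp_left, tavt_offset, qp_max=51):
    """Return one integer QP per right-view LCU.

    tavt_offset(texture, qp_left) is a placeholder for the TAVT model and
    must return a non-negative QP offset for the given texture measure.
    """
    qps = []
    for block in lcus:
        texture = block_variance(block)          # step 1: texture complexity
        offset = tavt_offset(texture, qp_left)   # step 2: tolerable offset
        qps.append(min(qp_max, round(qp_left + offset)))  # step 3: adjust QP
    return qps


def toy_offset(texture, qp_left):
    """Toy threshold function, purely illustrative; NOT the fitted model."""
    return min(10.0, texture / 100.0)


flat = [[128] * 4 for _ in range(4)]              # no texture
textured = [[0, 255, 0, 255] for _ in range(4)]   # strong texture
print(right_view_lcu_qps([flat, textured], 22, toy_offset))  # [22, 32]
```

In this toy setting the flat block keeps the left-view QP while the textured block is allowed a coarser quantizer, which is the qualitative behaviour the encoder relies on.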

4.2 Extension of TAVT model for HBP

The TAVT model is derived from intra-coded images, which means that the model is suitable for intra-frame coding. However, inter-frame coding based on the HBP architecture is more common and practical in stereoscopic video coding. Unlike intra-frame coding, the HBP coding framework makes extensive use of motion estimation and disparity estimation. In this section, we extend the TAVT model to inter-frame coding, which makes the model more suitable for the HBP architecture.

Assuming a uniform quantizer with step size Q, the coding distortion caused by quantization can be theoretically modeled [24] as

$$ D(Q)=\sum \limits_{i=-\infty}^{+\infty}\underset{\left(i-0.5\right)Q}{\overset{\left(i+0.5\right)Q}{\int }}{\left|u-C(i)\right|}^2{f}_U(u) du, $$

(7)

where u is the original input signal, Q is the quantization step size, C(i) is the reconstructed value of u after quantization and inverse quantization, and f_U(u) denotes the probability density function of u.

The transformed residuals in HEVC are assumed to follow a zero-mean Laplacian distribution [25], which can be expressed as

$$ {f}_U(u)=\frac{\lambda }{2}{e}^{-\lambda \left|u\right|}, $$

(8)

where $ \lambda =\sqrt{2}/\sigma $ is the parameter of the Laplacian distribution and σ is the standard deviation of the transformed residuals.

From Eqs. (7) and (8), the coding distortion D(Q) is [26]

$$ D(Q)=\sum \limits_{i=-\infty}^{+\infty}\underset{\left(i-0.5\right)Q}{\overset{\left(i+0.5\right)Q}{\int }}{\left|u-C(i)\right|}^2\frac{1}{\sqrt{2}\sigma }{e}^{-\frac{\sqrt{2}}{\sigma}\left|u\right|} du, $$

(9)

In [26], Eq. (9) is further simplified as

$$ D(Q)\approx g\left({\sigma}^2,Q\right)=\frac{\sigma^2{Q}^2}{12{\sigma}^2+{Q}^2}=\frac{Q^2}{12+{Q}^2/{\sigma}^2}. $$

(10)

Thus, D(Q) is a function of σ² and the quantization step Q.
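Two limiting cases help in reading Eq. (10): for fine quantization the expression tends to Q²/12, the familiar noise power of a uniform quantizer, and for very coarse quantization it saturates at the residual variance σ². The short Python check below evaluates the closed form at both extremes; the variance value is an arbitrary placeholder.

```python
def distortion_model(sigma2, q):
    """Eq. (10): g(sigma^2, Q) = sigma^2 * Q^2 / (12 * sigma^2 + Q^2)."""
    return sigma2 * q * q / (12.0 * sigma2 + q * q)


sigma2 = 100.0  # placeholder residual variance

# Fine quantization (Q much smaller than sigma): close to Q^2 / 12.
print(distortion_model(sigma2, 1.0), 1.0 / 12.0)

# Coarse quantization (Q much larger than sigma): approaches sigma^2.
print(distortion_model(sigma2, 1000.0), sigma2)
```

The saturation at σ² matters later: a block whose residual variance is small can never reach a distortion above that variance, however large Q becomes.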

Therefore, for I frame intra coding, distortion can be expressed as

$$ {D}_I\left({Q}_I\right)\approx g\left({\sigma_I}^2,{Q}_I\right)=\frac{{Q_I}^2}{12+{Q_I}^2/{\sigma_I}^2}, $$

(11)

where Q_I is the quantization step size in I frame intra coding and σ_I² is the variance of the transformed residuals in I frame intra coding.

For B/P frame inter coding, distortion can be expressed as

$$ {D}_{\mathrm{BP}}\left({Q}_{\mathrm{BP}}\right)\approx g\left({\sigma_{\mathrm{BP}}}^2,{Q}_{\mathrm{BP}}\right)=\frac{{Q_{\mathrm{BP}}}^2}{12+{Q_{\mathrm{BP}}}^2/{\sigma_{\mathrm{BP}}}^2}, $$

(12)

where Q_BP is the quantization step size in B/P frame inter coding and σ_BP² is the variance of the transformed residuals in B/P frame inter coding.

If the same distortion is to be maintained across different coding types (I/B/P frame coding), the following equality must hold:

$$ {D}_I\left({Q}_I\right)={D}_{\mathrm{BP}}\left({Q}_{\mathrm{BP}}\right). $$

(13)

The conversion relationship between Q_BP and Q_I can be obtained as follows:

$$ {Q_{\mathrm{BP}}}^2=f\left({Q}_I,{\sigma_I}^2,{\sigma_{\mathrm{BP}}}^2\right)=\frac{12{Q_I}^2}{12-{Q_I}^2\left(1/{\sigma_{\mathrm{BP}}}^2-1/{\sigma_I}^2\right)}. $$

(14)
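Eq. (14) follows from substituting Eqs. (11) and (12) into Eq. (13) and cross-multiplying. The intermediate step, written out here as a reader aid rather than taken from the source, is

$$ {Q_I}^2\left(12+\frac{{Q_{\mathrm{BP}}}^2}{{\sigma_{\mathrm{BP}}}^2}\right)={Q_{\mathrm{BP}}}^2\left(12+\frac{{Q_I}^2}{{\sigma_I}^2}\right)\quad \Longrightarrow \quad 12{Q_I}^2={Q_{\mathrm{BP}}}^2\left[12-{Q_I}^2\left(\frac{1}{{\sigma_{\mathrm{BP}}}^2}-\frac{1}{{\sigma_I}^2}\right)\right]. $$

Dividing both sides by the bracketed term gives Eq. (14). The conversion is only meaningful when that term is positive, i.e., when Q_I²(1/σ_BP² − 1/σ_I²) < 12.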

The values of σ_I² and σ_BP² can be easily obtained from coding statistics. The quantization step Q is related to the quantization parameter QP by

$$ Q={2}^{\left( QP-4\right)/6}. $$

(15)

From Eqs. (14) and (15), the conversion between the B/P frame quantization parameter and the I frame quantization parameter can be rewritten as

$$ {QP}_{\mathrm{BP}}=h\left({QP}_I,{\sigma_I}^2,{\sigma_{\mathrm{BP}}}^2\right)=3{\log}_2\frac{12\times {2}^{\left({QP}_I-4\right)/3}}{12-{2}^{\left({QP}_I-4\right)/3}\left(1/{\sigma_{\mathrm{BP}}}^2-1/{\sigma_I}^2\right)}+4, $$

(16)

where QP_BP is the quantization parameter used for B/P frame inter encoding and QP_I is the quantization parameter used for I frame intra encoding. From Eq. (16), the perceptual coding threshold QP_BP of a B/P frame can be obtained from the corresponding perceptual threshold QP_I used in intra-frame encoding.
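Eq. (16) can be checked numerically with a few lines of Python. The sketch below converts an intra QP to the inter QP that yields the same modeled distortion and then verifies the equality of Eq. (13). The residual variances are hypothetical placeholders (in practice they would come from coding statistics), and the function names are not from the source.

```python
import math


def qp_to_step(qp):
    """Eq. (15): Q = 2^((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4.0) / 6.0)


def distortion_model(sigma2, q):
    """Eq. (10): D(Q) ~ Q^2 / (12 + Q^2 / sigma^2)."""
    return q * q / (12.0 + q * q / sigma2)


def qp_intra_to_inter(qp_i, sigma2_i, sigma2_bp):
    """Eq. (16): inter QP giving the same modeled distortion as qp_i."""
    q_i_sq = 2.0 ** ((qp_i - 4.0) / 3.0)  # Q_I^2 from Eq. (15)
    denom = 12.0 - q_i_sq * (1.0 / sigma2_bp - 1.0 / sigma2_i)
    if denom <= 0.0:
        # The intra distortion is at or above sigma_BP^2, which the inter
        # model cannot reach for any finite quantization step.
        raise ValueError("no finite QP_BP matches this distortion")
    return 3.0 * math.log2(12.0 * q_i_sq / denom) + 4.0


# Hypothetical variances: inter residuals carry less energy than intra ones.
qp_i, var_i, var_bp = 32.0, 400.0, 150.0
qp_bp = qp_intra_to_inter(qp_i, var_i, var_bp)

d_i = distortion_model(var_i, qp_to_step(qp_i))
d_bp = distortion_model(var_bp, qp_to_step(qp_bp))
print(qp_bp > qp_i, abs(d_i - d_bp) < 1e-9)  # True True
```

With equal variances the function returns QP_I unchanged, and with a smaller inter variance it returns a larger QP, since a coarser quantizer is needed to reach the same modeled distortion.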

From Eqs. (6) and (16), the I frame quantization threshold QP_TAVT can be converted to the B/P frame quantization threshold, which can be written as

$$ {\boldsymbol{QP}}_{\mathrm{TAVT}\_\mathrm{BP}}=h\left({\boldsymbol{QP}}_{\mathrm{TAVT}},{\sigma_I}^2,{\sigma_{\mathrm{BP}}}^2\right)=3{\log}_2\frac{12\times {2}^{\left(f\left({\overline{T}}_{\mathrm{ALV}},{QP}_l\right)-4\right)/3}}{12-{2}^{\left(f\left({\overline{T}}_{\mathrm{ALV}},{QP}_l\right)-4\right)/3}\left(1/{\sigma_{\mathrm{BP}}}^2-1/{\sigma_I}^2\right)}+4, $$

(17)

where QP_{TAVT_BP} is the quantization threshold used for B/P frame inter encoding, and QP_TAVT is the quantization threshold for I frame intra encoding, which can be calculated from Eq. (6). Therefore, when inter-frame coding is performed, the inter coding threshold QP_{TAVT_BP} can be computed easily from Eq. (17), which is more suitable for the HBP coding architecture.

5 Experimental results and discussions on TAVT-based stereoscopic video encoder

The HEVC-based video coding reference software HTM13.0 was used to evaluate the proposed asymmetric stereoscopic coding scheme. Six stereoscopic video sequences with various motion properties and camera arrangements were adopted: Kendo, GT_Fly, Poznan_Street, Poznan_Hall2, Shark, and Undo_Dancer. Detailed information on the test sequences is provided in Table 11. Each sequence contains 200 to 300 frames, and two views of each sequence were selected for encoding. All experiments were conducted under the common test conditions (CTC) with the random access configuration [27]. The test conditions were set as follows: the HEVC codec was configured with 8-bit data processing and the HBP coding architecture; the largest CU had a fixed size of 64 × 64 pixels with a maximum CU depth of 4, resulting in a minimum CU size of 8 × 8 pixels; the intra frame period was 24 and the GOP length was 8; the motion estimation search range was 64; and four base QP values, 22, 27, 32, and 37, were used. For each sequence, 100 frames were encoded for each view.

Table 11 Detailed information of test sequences

5.1 Performance on bitrate saving

To measure objectively how the proposed method affects 3D-HEVC performance, the following three schemes were compared:

Scheme I: original HTM HBP-based stereoscopic video coding.
Scheme II: stereoscopic video coding with VT model in Wang [8].
Scheme III: proposed TAVT-based stereoscopic video coding.

The performance of the three schemes in terms of right view bitrate saving is compared in Table 12. Taking scheme I as the benchmark, the bitrate saving of schemes II and III relative to the original HTM encoder is computed as

$$ \Delta R=\left({R}_{\mathrm{HTM}}-R\right)/{R}_{\mathrm{HTM}}\times 100\left[\%\right], $$

(18)

where R is the bitrate of the compared scheme and R_HTM is the bitrate of the original HTM, i.e., scheme I.
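Eq. (18) amounts to a one-line function. In the Python sketch below the function name is hypothetical and the bitrates are made-up placeholders, not values from Table 12.

```python
def bitrate_saving(r_htm, r):
    """Eq. (18): percentage bitrate saving relative to the HTM anchor."""
    return (r_htm - r) / r_htm * 100.0


# Placeholder bitrates in kbps (illustrative only).
print(bitrate_saving(1000.0, 750.0))  # 25.0
```

A positive value means the compared scheme uses less bitrate than scheme I; a value of zero means no saving.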

Table 12 The comparison of the bitrate saving performance of the three coding schemes

From Table 12, scheme II saves 1.90 to 26.81% of the bitrate (12.43% on average), while scheme III saves 0.06 to 33.95% (14.30% on average). The average bitrate saving of scheme III is thus 1.87 percentage points higher than that of scheme II. Therefore, the proposed scheme III achieves a higher bitrate saving, especially for smaller QP. When the base QP is 22, scheme III achieves bitrate savings ranging from 26.63 to 46.53% (33.95% on average), while scheme II achieves savings ranging from 19.84 to 39.64% (26.81% on average). This is because, when the base QP is smaller, the reconstruction quality of the left view is better, so more deterioration of the right view reconstruction quality can be tolerated. In contrast to the single unified QP threshold of scheme II, scheme III analyzes the texture complexity of each CU and adaptively adjusts the QP of the right view based on the TAVT model. According to the results, as QP increases, the left view coding quality decreases and the stereoscopic masking effect is no longer significant; the stereo visual perception redundancy is reduced accordingly. The quantization threshold obtained by scheme III gradually approaches that obtained by scheme II, so the bitrate savings of the two schemes converge.

Further analysis of Table 12 shows that the bitrate saving differs considerably between test sequences. For the Poznan_Street sequence (abundant texture, static background, and slow-moving foreground) and the Undo_Dancer sequence (abundant texture, little parallax, and slow scene motion), the bitrate savings are 46.53 and 42.31%, respectively (with base QP = 22). For the Poznan_Hall2 sequence (abundant flat regions, large parallax) and the Kendo sequence (violent scene motion), the bitrate savings are 27.40 and 26.63%, respectively (with base QP = 22). Therefore, the proposed scheme III is more suitable for sequences with abundant texture, slow scene motion, and little parallax.

5.2 RD performance evaluation of objective quality

Setting scheme I as the benchmark, the rate-distortion (RD) performances of schemes II and III are shown in Table 13. In addition to the classical objective quality metric PSNR, the structural similarity index measurement (SSIM) is used to evaluate video quality in this paper. SSIM has been found to be a better indicator of perceived image quality than the mean-squared error, which is the theoretical basis of PSNR [28, 29]. The PSNR and SSIM values of the six reconstructed video sequences at different QPs are shown in Table 13. The Bjontegaard delta bitrate based on PSNR (BDBR_PSNR) and the Bjontegaard delta bitrate based on SSIM (BDBR_SSIM) are adopted to characterize RD performance.
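To make the link between mean-squared error and PSNR concrete, the following Python sketch computes both for 8-bit samples using the standard definition PSNR = 10·log10(peak²/MSE) with a peak value of 255. Frames are represented as flat lists of luma samples for simplicity, and the sample values are placeholders.

```python
import math


def mse(ref, rec):
    """Mean-squared error between two equally sized sample lists."""
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)


def psnr(ref, rec, peak=255.0):
    """PSNR in dB; returns infinity for identical inputs."""
    err = mse(ref, rec)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)


reference = [100, 100, 100, 100]
reconstructed = [101, 99, 101, 99]   # every sample is off by one
print(mse(reference, reconstructed))              # 1.0
print(round(psnr(reference, reconstructed), 2))   # 48.13
```

Because PSNR depends only on the per-sample error energy, it cannot distinguish errors hidden in textured regions from errors in flat regions, which is why a structural metric is reported alongside it.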

Table 13 SSIM value of six reconstructed video sequences under different QP

As can be seen from Table 13, the BDBR_PSNR of scheme II ranges from −0.8 to −9.5%, while the BDBR_PSNR of scheme III ranges from 1.0 to −13.4%. The average BDBR_PSNR of scheme III is −5.7%, which is slightly inferior to the −5.8% of scheme II. However, the BDBR_SSIM of scheme II ranges from −0.9 to −9.4%, while that of scheme III ranges from −0.7 to −12.0%. In terms of BDBR_SSIM, scheme III is superior to scheme II, with averages of −6.7 and −5.9%, respectively. Further analysis of the RD data in Table 13 shows that the poor RD performance (BDBR_PSNR and BDBR_SSIM) on the Poznan_Hall2 sequence strongly affects the overall RD performance of scheme III. The degradation of BDBR_PSNR and BDBR_SSIM for scheme III on the Poznan_Hall2 and Kendo sequences is mainly due to their characteristics (large parallax and abundant flat regions in Poznan_Hall2, and violent scene motion in Kendo), which make the estimated TAVT value inaccurate and smaller. Taking both metrics into account, the RD performance of scheme III is better than that of scheme II. Unlike single-view video, stereoscopic video quality is also affected by binocular perception. Because the human eye is the final receiver of the video, subjective evaluation reflects the quality of reconstructed stereoscopic video more accurately than objective evaluation. For a more accurate comparison of the three schemes, a perceptual evaluation experiment is carried out in the following section.

5.3 Rate-MOS performance of subjective perception quality assessment

The Double Stimulus Continuous Quality Scale (DSCQS) subjective evaluation method is used to assess the reconstructed video quality [17]. Following the subjective experiment standard [18], 35 observers (six with experience in 3D video subjective scoring and the rest naive) were invited to assess the perceptual quality of the reconstructed stereoscopic video. In the scoring process, five labels, “excellent,” “good,” “fair,” “poor,” and “bad,” corresponding to scores 5 to 1, were used to quantify the quality of each test video. The observers scored each video according to its displayed quality, and the MOS was then obtained for each test video. The statistical mean of MOS, the 95% confidence interval of the mean, and BDBR_MOS are listed in Table 14.
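The MOS and its 95% confidence interval can be obtained from the raw scores as sketched below in Python. The exact interval formula used for Table 14 is not stated here; this sketch assumes the usual normal approximation, mean ± 1.96·s/√N with the sample standard deviation s. The scores are placeholders, and the function name is hypothetical.

```python
import math
import statistics


def mos_with_ci(scores, z=1.96):
    """Return (MOS, (lower, upper)) using a normal-approximation interval."""
    n = len(scores)
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)


# Placeholder scores from five observers on the 5-to-1 scale.
mos, (low, high) = mos_with_ci([4, 5, 4, 3, 4])
print(mos)                           # 4
print(round(low, 2), round(high, 2)) # 3.38 4.62
```

With 35 observers, as in the experiment above, the interval narrows in proportion to 1/√N for the same score spread.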

Table 14 The MOS and BDBR_MOS for each test video sequence

The proposed scheme III achieves better BDBR_MOS performance, ranging from −6.6 to −11.4%, while the BDBR_MOS of scheme II ranges from 0.4 to −3.0%. Without subjective quality degradation for these sequences, the average BDBR_MOS of scheme III is −8.7%, which is superior to the −1.3% of scheme II. In particular, for the Poznan_Street and Undo_Dancer sequences, high gains of −9.8 and −11.4% are obtained. Therefore, the proposed scheme III achieves a more considerable bitrate saving for these test sequences than schemes II and I. The Rate-MOS curves in Fig. 10 illustrate the improvement in coding performance of scheme III intuitively: scheme III achieves better Rate-MOS performance than the other two schemes.

6 Conclusions

Stereoscopic video coding is one of the most important technologies in three-dimensional video applications. In this paper, the influence of texture features on the stereoscopic perceptual threshold is revealed, and an asymmetric stereoscopic video coding scheme with a texture-based asymmetrical visibility threshold model is proposed. In the proposed coding scheme, a hypothesis is first proposed and a subjective perception experiment is executed to verify it. Based on the subjective perceptual experiment, we reveal the relationship between the asymmetrical visibility threshold and texture complexity and build our asymmetric stereoscopic video coding scheme. The proposed method takes the perceptual capabilities of the human visual system into account and achieves significant bitrate saving while maintaining perceptual quality very well, especially for small QP and high bitrate. Future work on stereoscopic perceptual video coding should focus on two aspects. One is a stereoscopic TAVT model that accounts for more stereoscopic perceptual features. The other is the application of the proposed model to rate control and super-high-resolution video coding.

Abbreviations

ALV:: Average local variance
BDBR_MOS:: Bjontegaard delta bitrate based on MOS
BDBR_PSNR:: Bjontegaard delta bitrate based on PSNR
BDBR_SSIM:: Bjontegaard delta bitrate based on SSIM
CTC:: Common test conditions
DSCQS:: Double Stimulus Continuous Quality Scale
HBP:: Hierarchical B picture
HVS:: Human visual systems
JND:: Just noticeable distortion
LCU:: Largest Coding Unit
MOS:: Mean Opinion Score
PCM:: Paired Comparison Method
PSNR:: Peak signal-to-noise ratio
QP:: Quantization parameters
SSIM:: Structural Similarity Index Measurement
TAVT:: Texture-based Asymmetrical Visibility Threshold model
VT:: Visibility threshold

References

JC Chiang, JR Wu, Asymmetrically frame-compatible depth video coding. Electron. Lett. 51, 1780–1782 (2015)
C-H Lin, K-L Chung, J-J Chen, Y-H Chiu, Y-N Chen, Fast and quality-efficient scheme for asymmetric multi-view video plus depth coding under the bitrate constraint. J. Vis. Commun. Image Represent. 30, 350–362 (2015)
AE Rey, B Riou, D Muller, S Dabic, R Versace, “The mask who wasn’t there”: Visual masking effect with the perceptual absence of the mask. J. Exp. Psychol. Learn. Mem. Cogn. 41, 567–573 (2015)
P. Aflaki, M. M. Hannuksela, J. Häkkinen, P. Lindroos, and M. Gabbouj, Subjective study on compressed asymmetric stereoscopic video, 2010 IEEE International Conference on Image Processing (Hong Kong, 2010), pp. 4021–4024. doi: https://doi.org/10.1109/ICIP.2010.5650661.
MG Perkins, Data compression of stereopairs. IEEE Trans. Commun. 40, 684–696 (1992)
Z Jin, T Tillo, C Yao, J Xiao, Y Zhao, Virtual-view-assisted video super-resolution and enhancement. IEEE Transactions on Circuits and Systems for Video Technology 26, 467–478 (2016)
M Joachimiak, MM Hannuksela, M Gabbouj, View upsampling optimization for mixed resolution 3D video coding. Multidim. Syst. Sign. Process. 27, 763–783 (2016)
X Wang, G Jiang, J Zhou, Y Zhang, F Shao, Z Peng, et al., Visibility threshold of compressed stereoscopic image: Effects of asymmetrical coding. The Imaging Science Journal 61, 172–182 (2013)
F Shao, G Jiang, X Wang, M Yu, K Chen, Stereoscopic video coding with asymmetric luminance and chrominance qualities. IEEE Trans. Consum. Electron. 56, 2460–2468 (2010)
X Shang, Y Wang, L Luo, Y Zuo, Z Zhang, Fast mode decision for multiview video coding based on just noticeable distortion profile. Circuits, Systems & Signal Processing 34, 301–320 (2015)
Z. Chen and H. Liu, JND modeling: Approaches and applications, 2014 19th International Conference on Digital Signal Processing (Hong Kong, 2014), pp. 827–830. doi: https://doi.org/10.1109/ICDSP.2014.6900782.
J Kim, S Bae, M Kim, An HEVC-compliant perceptual video coding scheme based on JND models for variable block-sized transform kernels. IEEE Transactions on Circuits and Systems for Video Technology 25, 1786–1800 (2015)
S-P Lu, B Ceulemans, A Munteanu, P Schelkens, Spatio-temporally consistent color and structure optimization for multiview video color correction. IEEE Transactions on Multimedia 17, 577–590 (2015)
F Shao, W Lin, W Lin, G Jiang, M Yu, R Fu, Stereoscopic visual attention guided seam carving for stereoscopic image retargeting. J. Disp. Technol. 12, 22–30 (2016)
G. Georgiadis, A. Chiuso, and S. Soatto, Texture representations for image and video synthesis, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Boston, MA, 2015), pp. 2058–2066, doi: https://doi.org/10.1109/CVPR.2015.7298817.
K. McCann, B. Bross, W.-J. Han, I. K. Kim, K. Sugimoto, G. J. Sullivan, High efficiency video coding (HEVC) test model 14 (HM14) encoder description, http://phenix.int-evry.fr/jct/. Accessed 26 Mar 2014.
ITU-T Recommendation P.910, Subjective video quality assessment methods for multimedia applications, International Telecommunication Union, Geneva, Switzerland, 1999
ITU-R Recommendation BT.500-11, Methodology for the subjective assessment of the quality of television pictures, International Telecommunication Union, Geneva, Switzerland, 2002
P Aflaki, MM Hannuksela, M Gabbouj, Subjective quality assessment of asymmetric stereoscopic 3D video. Signal, Image and Video Processing 9, 331–345 (2015)
J. Zhou, G. Jiang, X. Mao, M. Yu, F. Shao, Z. Peng, et al., Subjective quality analyses of stereoscopic images in 3DTV system, Visual Communications and Image Processing (VCIP), 2011 IEEE, (Tainan, 2011), pp. 1–4. doi: https://doi.org/10.1109/VCIP.2011.6115913.
W-J Tsai, Y-C Sun, P-J Chiu, Robust video coding based on hybrid hierarchical B pictures. IEEE Transactions on Circuits and Systems for Video Technology 24, 878–888 (2014)
M Paul, W Lin, C-T Lau, BS Lee, A long-term reference frame for hierarchical B-picture-based video coding. IEEE Transactions on Circuits and Systems for Video Technology 24, 1729–1742 (2014)
Q Zhang, M Chen, H Zhu, X Wang, Y Gan, An efficient depth map filtering based on spatial and texture features for 3D video coding. Neurocomputing 188, 82–89 (2016)
N Kamaci, Y Altunbasak, RM Mersereau, Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models. IEEE Transactions on Circuits and Systems for Video Technology 15, 994–1006 (2005)
J. Si, S. Ma, S. Wang, and W. Gao, Laplace distribution based CTU level rate control for HEVC, in Visual Communications and Image Processing (VCIP) (Kuching, 2013), pp. 1–6. doi: https://doi.org/10.1109/VCIP.2013.6706333.
L. Xu, X. Ji, W. Gao, and D. Zhao, Laplacian distortion model (LDM) for rate control in video coding, in Advances in Multimedia Information Processing–PCM 2007: 8th Pacific Rim Conference on Multimedia, (Hong Kong, China, 2007). pp. 638–646.
A. V. K. Mueller, Common test conditions of 3DV core experiments, Joint Collaborative Team on 3D Video Coding Extensions (JCT-3V) Document JCT3V-G1100, 7th Meeting (San Jose, CA, USA, 2014).
Z Kotevski, P Mitrevski, in ICT Innovations 2009, ed. by D Davcev, J M Gómez. Experimental comparison of PSNR and SSIM metrics for video quality estimation, vol 2010 (Springer, Berlin), pp. 357–366
A Sankisa, K Pandremmenou, PV Pahalawatta, LP Kondi, AK Katsaggelos, SSIM-based distortion estimation for optimized video transmission over inherently noisy channels. International Journal of Multimedia Data Engineering and Management (IJMDEM) 7, 34–52 (2016)

Acknowledgements

The authors would like to thank the editors and anonymous reviewers for their valuable suggestions and comments.

Funding

This work was supported by the Natural Science Foundation of China (61671258, U1301257, 61471348), the K.C. Wong Magna Fund of Ningbo University, the General Scientific Research Project of the Education Department of Zhejiang Province (Y201636754), the Guangdong Natural Science Foundation for Distinguished Young Scholar under Grant 2016A030306022, the Shenzhen Science and Technology Development Project under Grant JSGG20160229202345378, and the Shenzhen International Collaborative Research Project under Grant GJHZ20170314155404913.

Availability of data and materials

Data would not be shared and available right now. Reason for not sharing the data and materials is that the work submitted for review is not completed. The research is still ongoing, and those data and materials are still required by the author and co-authors for further investigations. We will share the data online after the paper is published. People with any questions are welcome to contact us.

Author information

Authors and Affiliations

Faculty of Information Science and Engineering, Ningbo University, Ningbo, China
Baozhen Du, Mei Yu, Gangyi Jiang & Tianzhi Zhu
Electronic and Information College, Ningbo Polytechnic, Ningbo, China
Baozhen Du
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Yun Zhang
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Yun Zhang

Contributions

BD designed the proposed algorithm and drafted the manuscript. MY designed and conducted the subjective experiments. GJ offered useful suggestions and helped to modify the manuscript. YZ participated in the algorithm design and tested the proposed algorithm. TZ conducted the subjective experiment and performed the statistical analysis. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mei Yu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Du, B., Yu, M., Jiang, G. et al. A novel texture-based asymmetric visibility threshold model for stereoscopic video coding. J Image Video Proc. 2018, 27 (2018). https://doi.org/10.1186/s13640-018-0265-y

Received: 01 December 2016
Accepted: 10 April 2018
Published: 26 April 2018
DOI: https://doi.org/10.1186/s13640-018-0265-y
