Research Article
Video Enhancement from Multiple Compressed Copies in Transform Domain
EURASIP Journal on Image and Video Processing, volume 2010, Article number: 404137 (2010)
Increasingly, we can obtain more than one compressed copy of the same video content with different levels of visual quality over the Internet. As the original source video is not always available, how to choose or derive a video of the best quality from these copies becomes a challenging and interesting problem. In this paper, we address this new research problem by blindly enhancing the quality of the video reconstructed from such multiple compressed copies. The aim is to reconstruct a video that achieves a better quality than any of the available copies. Specifically, we propose to reconstruct each coefficient of the video in the transform domain by using a narrow quantization constraint set derived from the multiple compressed copies, together with a Laplacian or Cauchy distribution model for each AC transform coefficient to minimize the distortion. Analytical and experimental results show the effectiveness of the proposed method.
Over the past few decades, transform-based coding has been widely used in lossy image and video compression to exploit the spatial correlation of visual signals. Because it achieves good energy compaction over a wide class of visual signals, the block-based discrete cosine transform (DCT) is adopted in most popular video compression standards, including H.261/3/4 and MPEG-1/2/4 [1, 2]. While block-based coding can attain good quality at high bit rates, it often suffers from undesirable coding artifacts (such as blocking artifacts, ringing noise, and corner outliers) at moderate-to-low bit rates. These coding artifacts are mainly due to the error introduced by the quantization/dequantization process, which may result in a severe loss in the visual quality and fidelity of the reconstructed video.
To alleviate this problem, postprocessing is one of the most promising solutions, as it can improve the video quality without changing the encoder structure. Many postprocessing techniques have been proposed to reduce the quantization artifacts of block-based coding. These include block-boundary postfiltering techniques, such as adaptive filtering and wavelet-based filtering, that smooth the discontinuities in either the spatial domain [3–7] or the transform domain [8–13]. More sophisticated methods have also been proposed that enhance the reconstructed video by using image/video restoration techniques. Examples are iterative methods based on the theory of projection onto convex sets (POCS) or constrained minimization [14–18], maximum a posteriori (MAP) estimation [19–21], and regularized image/video restoration [22–27]. These methods treat the compressed images/videos as signals distorted by a codec system and apply restoration techniques to reduce the quantization noise and coding artifacts.
With the development of network and communication techniques as well as the popularity of video-centric websites such as YouTube, Facebook, and Google Video, delivery of visual signals over the network has become more and more popular. Given the phenomenal rate at which image and video contents are being generated and distributed, we can now easily obtain many copies of the same video content with different levels of visual quality. For example, different people may record the same interesting soccer match or a piece of news from a television channel and encode it in different formats or using different coding parameters to meet their constraints (e.g., transmission bandwidth, storage capacity, etc.) before sharing it over the network. Similarly, one can gain access to many copies of movie trailers or video clips extracted from DVDs, which have exactly the same content but different visual quality.
Using existing postprocessing techniques, one can possibly enhance the quality of each of these compressed copies independently of the other copies. However, as the original source video or information on the video quality is not always available, how to obtain the best video from these multiple compressed copies becomes an interesting problem. The problem shares some similarity with the well-known superresolution (SR) restoration problem, which has been studied intensively in the literature. For example, Gunturk et al. [28, 29] and Segall et al. proposed to reconstruct high-resolution images by using multiple neighboring low-resolution frames of compressed videos. It should be noted that the restoration or enhancement of high-resolution images in SR requires a set of low-resolution observations, which usually contain different but related views of the scene (e.g., images taken from different cameras, view angles, or illumination conditions, or even a sequence of frames from a video). What we consider here, however, is enhancing the video quality from multiple compressed copies of the same content (i.e., with no spatial variations) that have different levels of quantization noise.
In this paper, we address this new research problem by blindly enhancing the quality of the video reconstructed from multiple compressed copies of the same visual content. Existing postprocessing techniques may be neither suitable nor effective here, as they usually consider only a single compressed video. Our aim is to reconstruct a video that achieves better quality than any of the available copies. The proposed method is a "blind" approach because the original source video is not available. This makes the problem particularly challenging: we do not know definitively which of the multiple copies, which frame of a copy, or which region of a frame has the best quality.
In our previous work, we proposed a scheme based on the theory of POCS to improve the video quality from multiple video copies. However, that method iteratively projects the reconstructed video onto the quantization constraint sets in the transform domain and onto the smoothness constraint sets in the spatial domain, so it incurs a high computational complexity. Here, we take a different approach and propose a fast method that reconstructs the enhanced video in the transform domain. The proposed method exploits only the quantization constraint sets and the transform coefficient statistics in the transform domain. It can provide an enhanced video with better quality than any of the available copies, at a much lower computational complexity than the previous method.
Specifically, we propose to reconstruct each coefficient of the video in the transform domain by using a narrow quantization constraint set, derived from the multiple compressed copies, in which the exact value of the coefficient should lie. In addition, a Laplacian or Cauchy distribution model is used to further reduce the distortion of each AC transform coefficient. Analytical and experimental results show that the video reconstructed by the proposed method generally has a smaller distortion than any of the available compressed copies. In many scenarios, the proposed method attains a notable gain in average peak signal-to-noise ratio (PSNR) compared with the best video among the multiple compressed copies.
The remainder of the paper is organized as follows. Section 2 briefly reviews the key features of transform-based coding in the latest H.264/AVC standard that are exploited in this paper. Section 3 formulates the problem of blindly enhancing the video content from multiple compressed copies. It then describes the proposed method under the assumption that the available copies are temporally well aligned. An effective method for the temporal registration of multiple compressed copies is presented in Section 4. The mathematical analysis and the experimental results that justify the performance of the proposed method are presented in Sections 5 and 6, respectively. In Section 7, we conclude the paper by summarizing our main contributions.
2. Brief Overview of H.264/AVC Transform
Many video coding standards, such as H.261/3/4 and MPEG-1/2/4, have been developed and standardized to address the need for efficient storage and delivery of video content. The recent video coding standard H.264/AVC incorporates a number of advanced coding features to achieve its high coding efficiency, but it still employs the so-called block-based hybrid video coding approach. A hybrid video coding approach uses motion-compensated prediction to exploit the temporal statistical dependency and transform coding to exploit the spatial statistical dependency. In this section, we review the key features of transform coding in the state-of-the-art video coding standard H.264/AVC, which is used in this paper to demonstrate the effectiveness of the proposed method.
In essence, the existing video coding standards support intracoding and intercoding. Intercoding employs temporal prediction (motion compensation) from previously encoded pictures, whereas intracoding uses only the information contained in the picture itself. The (intra- or inter-) prediction residue, which is the difference between the original and the predicted picture, is then transformed, quantized, and coded. Previous standards (e.g., H.263 and MPEG-1/2/4) use the discrete cosine transform (DCT). H.264/AVC instead applies a 4 × 4 integer transform, which has essentially the same properties as a DCT, to avoid mismatch between encoders and decoders.
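Let $X$ denote a $4 \times 4$ block. The integer transform coefficients $Y$ of $X$ are defined in H.264/AVC as

$$Y = \left(C_f X C_f^{T}\right) \otimes E_f, \qquad (1)$$

where ⊗ represents point-to-point multiplication (i.e., each element of $C_f X C_f^{T}$ is multiplied by the element in the same position of matrix $E_f$). $C_f$ is the forward transformation matrix and $E_f$ is the forward postscaling factor matrix, which are defined as

$$C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad E_f = \begin{bmatrix} a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \\ a^2 & ab/2 & a^2 & ab/2 \\ ab/2 & b^2/4 & ab/2 & b^2/4 \end{bmatrix}, \qquad (2)$$

with $a = 1/2$ and $b = \sqrt{2/5}$.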
Note that if the macroblock is coded in the Intra 16 × 16 prediction mode, the DC coefficients of the 4 × 4 luma residual blocks are transformed again using a Hadamard transform to decorrelate them before the quantization process.
Let $Z$ denote the quantized coefficient value and $W$ denote the dequantized coefficient value of $Y$. The quantization process in H.264/AVC is defined as follows:

$$Z = \operatorname{sgn}(Y)\left\lfloor \frac{|Y|}{Q_{\text{step}}} + f \right\rfloor, \qquad W = Z \cdot Q_{\text{step}}, \qquad (3)$$

where $Q_{\text{step}}$ is the quantization step size and $f$ is the rounding control parameter. $\lfloor \cdot \rfloor$ is the floor operator, which rounds to the nearest integer towards minus infinity, and $\operatorname{sgn}(\cdot)$ returns the sign of a signal. In the implementation of the H.264/AVC reference software, $f = 1/3$ for intra blocks and $f = 1/6$ for inter blocks. The reference software also combines the postscaling operation with the forward quantizer to avoid rounding errors in the transform process. The quantization step size is determined by a quantization parameter (QP), which is calculated by the rate control algorithm and may be different for each macroblock. The quantization step size doubles for every increment of 6 in QP and increases by roughly 12% for each increment of 1 in QP. Mathematically, $Q_{\text{step}}$ can be computed from QP as

$$Q_{\text{step}} = \text{QP2QSTEP}(\text{QP} \bmod 6) \cdot 2^{\lfloor \text{QP}/6 \rfloor}, \qquad (4)$$

where the QP2QSTEP function is defined in Table 1, and $\text{QP} \bmod 6$ is the remainder of the integer division of QP by 6.
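As a quick illustration of (4), the Python sketch below maps QP to the quantization step size. The base values for QP = 0 to 5 are the standard H.264/AVC step sizes (0.625 to 1.125). They are written out here as an assumption because Table 1 itself is not reproduced in this section. The name `qp_to_qstep` is hypothetical.

```python
# Standard H.264/AVC base step sizes for QP = 0..5 (the role played by Table 1).
QP2QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qp_to_qstep(qp: int) -> float:
    """Quantization step size for QP in [0, 51], following (4)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in [0, 51]")
    return QP2QSTEP[qp % 6] * 2 ** (qp // 6)


for qp in (22, 28, 34):
    print(qp, qp_to_qstep(qp))  # step size doubles every 6 QP: 8.0, 16.0, 32.0
```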
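The inverse transform, applied to the dequantized coefficients $W$, is given by

$$X' = C_i^{T}\left(W \otimes E_i\right) C_i, \qquad (5)$$

where $X'$ is the reconstructed block, $C_i$ is the inverse transformation matrix, and $E_i$ is the inverse postscaling factor matrix, which are defined as

$$C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1/2 & -1/2 & -1 \\ 1 & -1 & -1 & 1 \\ 1/2 & -1 & 1 & -1/2 \end{bmatrix}, \qquad E_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}, \qquad (6)$$

with $a = 1/2$ and $b = \sqrt{2/5}$.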
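To make (1), (2), (5), and (6) concrete, the following Python sketch applies the forward integer transform with its postscaling to a 4 × 4 block and then applies the inverse transform. It is illustrative only. Quantization is omitted, so the inverse is applied directly to $Y$ rather than to dequantized values $W$, and the helper names are hypothetical. Because the scaled transform is orthonormal, the round trip reproduces the block and energy is preserved. This is the property behind the Parseval-based MSE computation in Section 3.

```python
import math

A, B = 0.5, math.sqrt(2 / 5)  # the constants a and b in (2) and (6)

CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]
_EF_EVEN = [A * A, A * B / 2, A * A, A * B / 2]
_EF_ODD = [A * B / 2, B * B / 4, A * B / 2, B * B / 4]
EF = [_EF_EVEN, _EF_ODD, _EF_EVEN, _EF_ODD]  # forward postscaling matrix E_f
_EI_EVEN = [A * A, A * B, A * A, A * B]
_EI_ODD = [A * B, B * B, A * B, B * B]
EI = [_EI_EVEN, _EI_ODD, _EI_EVEN, _EI_ODD]  # inverse scaling matrix E_i


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(P):
    return [list(row) for row in zip(*P)]


def elementwise(P, Q):
    """Point-to-point multiplication (the ⊗ operator)."""
    return [[P[i][j] * Q[i][j] for j in range(4)] for i in range(4)]


def forward(X):
    """Y = (C_f X C_f^T) ⊗ E_f, as in (1)."""
    return elementwise(matmul(matmul(CF, X), transpose(CF)), EF)


def inverse(W):
    """X' = C_i^T (W ⊗ E_i) C_i, as in (5)."""
    return matmul(matmul(transpose(CI), elementwise(W, EI)), CI)


X = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
Y = forward(X)
X_rec = inverse(Y)
energy_x = sum(v * v for row in X for v in row)
energy_y = sum(v * v for row in Y for v in row)
print(round(energy_x, 6), round(energy_y, 6))  # equal up to floating-point rounding
```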
3. Problem Formulation and Proposed Method
We can formulate the problem as follows. Suppose we are given $K$ different compressed copies of the same video content. Let $z_i^k$ and $q_i^k$ denote, respectively, the quantized (residual) value and the corresponding quantization step size of the $i$th integer transform coefficient in a video frame of the $k$th copy. Where a prediction exists, let $p_i^k$ be the corresponding prediction value in the integer transform domain: the intraprediction value for intramode or the motion-compensated prediction value for intermode. The prediction values from the compressed videos are available only in the spatial domain. Hence, $p_i^k$ is obtained by applying the integer transform defined in (1). The decoded value of the $i$th integer transform coefficient in the $k$th copy is computed as

$$\hat{y}_i^k = p_i^k + z_i^k q_i^k. \qquad (7)$$
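The distortion of the decoded frame of the $k$th copy, measured as the mean square error (MSE), can be computed in the transform domain according to Parseval's theorem as follows:

$$D^k = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i^k\right)^2, \qquad (8)$$

where $N$ is the total number of pixels in the video frame, $y_i$ denotes the value of the $i$th integer transform coefficient of the original frame, and $\hat{y}_i^k$ is the decoded value given by (7).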
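Let $\tilde{y}_i$ be the estimated value of the coefficient $y_i$ in the enhanced reconstructed video. The objective is to find $\tilde{y}_i$, given the $z_i^k$'s and $q_i^k$'s (together with the prediction values $p_i^k$) from multiple compressed copies of the same video content. The reconstructed video must have a distortion no larger than that of any given copy, that is,

$$\tilde{D} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \tilde{y}_i\right)^2 \le D^k, \qquad k = 1, \dots, K. \qquad (9)$$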
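As quantization is a many-to-one mapping, the quantized value and quantization step size of each coefficient specify an interval in which the exact value of the coefficient must lie. This interval is referred to as the quantization constraint set (QCS). Different compressed copies may therefore give different QCSs for the same coefficient in the video frame. For the simple dead-zone scalar quantization defined by (3), with $f$ denoting its rounding control parameter, the QCS of the $i$th integer transform coefficient in the $k$th copy reveals that $\ell_i^k \le y_i \le u_i^k$, where

$$\left[\ell_i^k, u_i^k\right] = \begin{cases} \left[p_i^k + (z_i^k - f)\,q_i^k,\; p_i^k + (z_i^k + 1 - f)\,q_i^k\right], & z_i^k > 0, \\ \left[p_i^k - (1 - f)\,q_i^k,\; p_i^k + (1 - f)\,q_i^k\right], & z_i^k = 0, \\ \left[p_i^k + (z_i^k - 1 + f)\,q_i^k,\; p_i^k + (z_i^k + f)\,q_i^k\right], & z_i^k < 0. \end{cases} \qquad (10)$$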
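Given $K$ compressed copies, we can have up to $K$ different QCSs $[\ell_i^k, u_i^k]$ in which the exact value of the $i$th integer transform coefficient should lie. Thus, if the frames of these compressed copies are well aligned, we can obtain a narrow QCS $[\ell_i, u_i]$ for $y_i$ by taking the intersection of these sets:

$$[\ell_i, u_i] = \bigcap_{k=1}^{K}\left[\ell_i^k, u_i^k\right] = \left[\max_{k}\ell_i^k,\; \min_{k} u_i^k\right]. \qquad (11)$$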
Equation (11) shows that, with multiple compressed copies, the size of the QCS for each integer transform coefficient can often be reduced. A smaller QCS allows each integer transform coefficient to be estimated more accurately, which yields a reconstructed video frame with lower distortion. Ideally, we would obtain the exact integer transform coefficient if the QCS shrank to a single point, but this rarely occurs. Hence, we propose to reconstruct each integer transform coefficient by using the corresponding narrow QCS, and we justify that doing so allows the constraints in (9) to be satisfied.
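The following Python sketch illustrates (3), (10), and (11) for a single coefficient. Each copy quantizes its own residual $y - p^k$, the per-copy QCS is recovered from the quantized level, and the narrow QCS is the intersection of the per-copy sets. The original value $y$, the prediction values, and the step sizes are hypothetical numbers chosen for illustration. The value $f = 1/6$ assumes inter blocks, as in the reference software.

```python
import math


def quantize(residual, q, f):
    """Dead-zone scalar quantizer of (3): sgn(r) * floor(|r| / q + f)."""
    level = math.floor(abs(residual) / q + f)
    return level if residual >= 0 else -level


def qcs(z, q, f, p=0.0):
    """QCS [lower, upper] of (10): the interval that must contain y when y - p was quantized to z."""
    if z > 0:
        return p + (z - f) * q, p + (z + 1 - f) * q
    if z < 0:
        return p + (z - 1 + f) * q, p + (z + f) * q
    return p - (1 - f) * q, p + (1 - f) * q


def narrow_qcs(intervals):
    """Intersection of the per-copy QCSs, as in (11)."""
    return max(lo for lo, _ in intervals), min(hi for _, hi in intervals)


# Hypothetical example: one original coefficient y seen through three copies,
# each with its own step size q and transform-domain prediction p.
f = 1 / 6
y = 37.3
copies = [(16.0, 30.0), (18.0, 41.0), (20.0, 35.5)]  # (q, p) pairs
per_copy = [qcs(quantize(y - p, q, f), q, f, p) for q, p in copies]
lower, upper = narrow_qcs(per_copy)
print(per_copy)
print((lower, upper), "width", upper - lower)
```

Here the narrow QCS is noticeably shorter than any single-copy QCS. The reason is that the per-copy intervals are shifted by the different prediction values, not only scaled by different step sizes.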
Each integer transform coefficient is quantized independently with its own quantization step size. Minimizing the MSE subject to the constraints in (9) is therefore equivalent to minimizing the distortion of each integer transform coefficient individually. The distribution of the integer transform coefficients over the QCS is nonuniform. To fit it better, H.264/AVC uses the rounding control parameter $f$ in (3) to control the position of the reconstructed value inside the QCS interval. As a result, the reconstructed value is not located at the center of the corresponding QCS, unlike in previous coding standards such as H.263 and MPEG-1/2/4. A fixed value of $f$ smaller than 1/2 (i.e., a reconstruction offset smaller than half of the QCS size) is used to reduce the quantization error. However, to minimize the quantization error, the reconstructed value should be chosen adaptively, based on the probability distribution of the integer transform coefficient over the corresponding QCS.
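Consider the $i$th integer transform coefficient $y_i$, and let $g(y)$ denote the probability density function (pdf) of that coefficient. Given a QCS $[\ell, u]$ of $y_i$, the average distortion incurred by reconstructing the coefficient as $\tilde{y}$ can be computed as

$$D(\tilde{y}) = \int_{\ell}^{u}(y - \tilde{y})^2\, g_{[\ell,u]}(y)\,dy, \qquad (12)$$

where $g_{[\ell,u]}(y)$ is the pdf of $y_i$ over the QCS $[\ell, u]$. It has the form $g_{[\ell,u]}(y) = \kappa\, g(y)$ for $\ell \le y \le u$, where $\kappa$ is the constant such that $\int_{\ell}^{u} g_{[\ell,u]}(y)\,dy = 1$. It follows that $\kappa = 1/\int_{\ell}^{u} g(y)\,dy$ and

$$D(\tilde{y}) = \frac{\int_{\ell}^{u}(y - \tilde{y})^2\, g(y)\,dy}{\int_{\ell}^{u} g(y)\,dy}. \qquad (13)$$

To minimize (13), it is easy to show (by setting $dD/d\tilde{y} = 0$) that the reconstructed value should be chosen as the centroid of the QCS $[\ell, u]$, given by

$$\tilde{y} = \frac{\int_{\ell}^{u} y\, g(y)\,dy}{\int_{\ell}^{u} g(y)\,dy}. \qquad (14)$$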
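For the two AC models considered below, the centroid (14) has a closed form. These are the Laplacian pdf $\frac{\lambda}{2}e^{-\lambda|y|}$ and the Cauchy pdf $\frac{1}{\pi}\frac{\mu}{\mu^2+y^2}$. The Python sketch below evaluates the centroid for both; it is our own illustration, not the authors' code. For intervals on one side of zero, the Laplacian case is rewritten as a shifted, truncated exponential so that intervals far in the tail do not underflow.

```python
import math


def _trunc_exp_mean(width, lam):
    """Mean of an Exponential(lam) variable truncated to [0, width]."""
    t = lam * width
    if t > 700:  # exp(t) would overflow; the truncation is then irrelevant
        return 1.0 / lam
    return 1.0 / lam - width / math.expm1(t)


def laplacian_centroid(lower, upper, lam):
    """Centroid (14) of the Laplacian pdf (lam/2) exp(-lam |y|) over [lower, upper]."""
    if upper <= lower:
        return lower
    if lower >= 0:  # the pdf is a decaying exponential on the interval
        return lower + _trunc_exp_mean(upper - lower, lam)
    if upper <= 0:  # mirror image of the case above
        return upper - _trunc_exp_mean(upper - lower, lam)

    # Interval straddles 0: closed-form CDF F and first-moment antiderivative H.
    def F(y):
        return 0.5 * math.exp(lam * y) if y < 0 else 1.0 - 0.5 * math.exp(-lam * y)

    def H(y):
        if y < 0:
            return 0.5 * (y - 1.0 / lam) * math.exp(lam * y)
        return -0.5 * (y + 1.0 / lam) * math.exp(-lam * y)

    return (H(upper) - H(lower)) / (F(upper) - F(lower))


def cauchy_centroid(lower, upper, mu):
    """Centroid (14) of the Cauchy pdf mu / (pi (mu^2 + y^2)) over [lower, upper]."""
    if upper <= lower:
        return lower
    # Numerator and denominator are both scaled by pi, which cancels.
    num = 0.5 * mu * math.log((mu * mu + upper * upper) / (mu * mu + lower * lower))
    den = math.atan2((upper - lower) * mu, mu * mu + lower * upper)  # atan(u/mu) - atan(l/mu)
    return num / den


# Both centroids fall below the midpoint (2.0) of [1, 3] because both pdfs decay there.
print(laplacian_centroid(1.0, 3.0, lam=1.0), cauchy_centroid(1.0, 3.0, mu=1.0))
```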
Previous studies have shown that, in the DCT domain of a natural image, the DC coefficients can be approximated by a uniform distribution. The AC coefficient distribution, in contrast, can be modeled by a generalized Gaussian [35, 36] or Laplacian [37, 38] probability density function. The generalized Gaussian model gives the most accurate representation of the AC coefficient distribution, but the Laplacian model is more commonly employed because it is more tractable both mathematically and computationally. More recently, Kamaci et al. proposed the Cauchy model and showed it to be a better choice than the Laplacian model for estimating the actual probability distribution of AC coefficients in H.264/AVC. In this paper, both the Laplacian and Cauchy models are examined for the estimation of the reconstructed video. In what follows, we present a method to estimate the parameters of both distribution models by using the decoded values from the compressed videos.
Laplacian Model Parameter Estimation
The Laplacian probability density function can be described by

$$g(y) = \frac{\lambda}{2}e^{-\lambda|y|}, \qquad (15)$$

where $\lambda$ is the distribution parameter and $y$ is the coefficient value for a given AC frequency. If the original coefficient values were known, $\lambda$ could be estimated using the maximum-likelihood (ML) method as

$$\hat{\lambda} = \frac{M}{\sum_{j=1}^{M}|y_j|}, \qquad (16)$$

where $y_j$ is the original $j$th coefficient value for a given AC frequency and $M$ is the number of coefficients.
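To estimate the Laplacian distribution of each AC coefficient from the dequantized values, we adopt a previously proposed ML method. Let $\hat{y}_j$, $j = 1, \dots, M$, be the dequantized values for a given AC frequency from all the integer transform blocks of the video frames in the $K$ compressed copies, and let $q_j$ be the corresponding quantization step sizes. Note that $M$ is the number of integer transform blocks in a video frame multiplied by the number of compressed copies $K$. Let $[\ell_j, u_j]$ be the QCS in which the $j$th original coefficient lies, computed using (10). The ML estimate of the parameter $\lambda$ is given by

$$\hat{\lambda} = \arg\max_{\lambda}\sum_{j=1}^{M}\ln P_j(\lambda), \qquad (17)$$

where $P_j(\lambda)$ is the probability that the reconstructed AC coefficient is $\hat{y}_j$, which can be computed as

$$P_j(\lambda) = \int_{\ell_j}^{u_j}\frac{\lambda}{2}e^{-\lambda|y|}\,dy = F_\lambda(u_j) - F_\lambda(\ell_j), \qquad F_\lambda(y) = \begin{cases}\tfrac{1}{2}e^{\lambda y}, & y < 0,\\ 1 - \tfrac{1}{2}e^{-\lambda y}, & y \ge 0.\end{cases} \qquad (18)$$

Substituting (18) into (17), we have

$$\hat{\lambda} = \arg\max_{\lambda}\sum_{j=1}^{M}\ln\left[F_\lambda(u_j) - F_\lambda(\ell_j)\right]. \qquad (19)$$

Differentiating (19) with respect to $\lambda$ and using $\partial F_\lambda(y)/\partial\lambda = \tfrac{y}{2}e^{-\lambda|y|}$, we obtain

$$\sum_{j=1}^{M}\frac{u_j e^{-\lambda|u_j|} - \ell_j e^{-\lambda|\ell_j|}}{2\left[F_\lambda(u_j) - F_\lambda(\ell_j)\right]} = 0, \qquad (20)$$

whose solution can be found by using an iterative root-finding algorithm. In our implementation, we used the Newton-Raphson root-finding method.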
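A minimal Python sketch of the Laplacian estimate follows. It evaluates the left-hand side of (20) in a numerically stable form. As a deliberate substitution for the Newton-Raphson iteration used in the paper, it finds the root by bisection on a logarithmic scale, which needs only the sign of the score. The simulated data (λ = 0.5, $q = 2$, $f = 1/6$, zero prediction) are assumptions for illustration.

```python
import math
import random


def quantize(residual, q, f):
    level = math.floor(abs(residual) / q + f)
    return level if residual >= 0 else -level


def qcs(z, q, f):
    """QCS of (10) with zero prediction."""
    if z > 0:
        return (z - f) * q, (z + 1 - f) * q
    if z < 0:
        return (z - 1 + f) * q, (z + f) * q
    return -(1 - f) * q, (1 - f) * q


def score_term(lo, hi, lam):
    """(dP_j/dlam) / P_j for one interval, i.e., one summand of (20)."""
    if hi <= 0:  # [lo, hi] and [-hi, -lo] have the same likelihood
        lo, hi = -hi, -lo
    if lo >= 0:  # numerator and denominator divided by exp(-lam * lo)
        w = hi - lo
        return (hi * math.exp(-lam * w) - lo) / -math.expm1(-lam * w)
    p = -0.5 * (math.expm1(lam * lo) + math.expm1(-lam * hi))  # F(hi) - F(lo)
    dp = 0.5 * (hi * math.exp(-lam * hi) - lo * math.exp(lam * lo))
    return dp / p


def score(lam, intervals):
    return sum(score_term(lo, hi, lam) for lo, hi in intervals)


def estimate_lambda(intervals, lo=1e-6, hi=1.0, iters=60):
    """Root of (20) by log-scale bisection (used here instead of Newton-Raphson)."""
    while score(hi, intervals) > 0:  # widen the bracket until the score turns negative
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("no finite root: all quantized values are zero")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid, intervals) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


random.seed(0)
true_lam, q, f = 0.5, 2.0, 1 / 6
samples = [random.choice((-1, 1)) * random.expovariate(true_lam) for _ in range(10000)]
intervals = [qcs(quantize(y, q, f), q, f) for y in samples]
lam_hat = estimate_lambda(intervals)
print(f"true lambda = {true_lam}, ML estimate from quantized data = {lam_hat:.3f}")
```

Bisection is slower than Newton-Raphson, but it needs no second derivative and cannot step outside the bracket. If every quantized value is zero, all intervals straddle zero and the likelihood keeps increasing with λ. In that case the sketch raises an error instead of returning an estimate.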
Cauchy Model Parameter Estimation
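The Cauchy probability density function can be described by

$$g(y) = \frac{1}{\pi}\,\frac{\mu}{\mu^2 + y^2}, \qquad (21)$$

where $\mu$ is the distribution parameter. As with the Laplacian model, the parameter $\mu$ of the Cauchy model can be estimated by the ML method using (17), with $\lambda$ replaced by $\mu$ and $P_j$ computed as

$$P_j(\mu) = \int_{\ell_j}^{u_j} g(y)\,dy = \frac{1}{\pi}\left[\arctan\frac{u_j}{\mu} - \arctan\frac{\ell_j}{\mu}\right]. \qquad (22)$$

Substituting (22) into (17), we have

$$\hat{\mu} = \arg\max_{\mu}\sum_{j=1}^{M}\ln\left\{\frac{1}{\pi}\left[\arctan\frac{u_j}{\mu} - \arctan\frac{\ell_j}{\mu}\right]\right\}. \qquad (23)$$

Differentiating (23) with respect to $\mu$, we obtain

$$\sum_{j=1}^{M}\frac{\dfrac{\ell_j}{\mu^2 + \ell_j^2} - \dfrac{u_j}{\mu^2 + u_j^2}}{\arctan(u_j/\mu) - \arctan(\ell_j/\mu)} = 0, \qquad (24)$$

whose solution can also be found by using an iterative root-finding algorithm. As in the case of the Laplacian model, the Newton-Raphson root-finding method was used in our implementation.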
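The Cauchy estimate can be sketched the same way. The Python code below evaluates the left-hand side of (24) and again solves it by log-scale bisection rather than Newton-Raphson. The arctangent difference is computed with `math.atan2`, via the identity $\arctan x - \arctan y = \operatorname{atan2}(x - y,\, 1 + xy)$. This avoids cancellation for intervals far in the heavy Cauchy tail. The simulation parameters (μ = 1, $q = 1$, $f = 1/6$, zero prediction) are assumptions.

```python
import math
import random


def quantize(residual, q, f):
    level = math.floor(abs(residual) / q + f)
    return level if residual >= 0 else -level


def qcs(z, q, f):
    """QCS of (10) with zero prediction."""
    if z > 0:
        return (z - f) * q, (z + 1 - f) * q
    if z < 0:
        return (z - 1 + f) * q, (z + f) * q
    return -(1 - f) * q, (1 - f) * q


def score(mu, intervals):
    """Left-hand side of (24); the common factor 1/pi cancels."""
    total = 0.0
    for lo, hi in intervals:
        num = lo / (mu * mu + lo * lo) - hi / (mu * mu + hi * hi)
        den = math.atan2((hi - lo) * mu, mu * mu + lo * hi)  # atan(hi/mu) - atan(lo/mu)
        total += num / den
    return total


def estimate_mu(intervals, lo=1e-6, hi=1.0, iters=60):
    """Root of (24) by log-scale bisection (used here instead of Newton-Raphson)."""
    if score(lo, intervals) <= 0:
        raise ValueError("no positive root: all quantized values are zero")
    while score(hi, intervals) > 0:  # widen the bracket until the score turns negative
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("bracket search failed")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid, intervals) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


random.seed(1)
true_mu, q, f = 1.0, 1.0, 1 / 6
samples = [true_mu * math.tan(math.pi * (random.random() - 0.5)) for _ in range(10000)]
intervals = [qcs(quantize(y, q, f), q, f) for y in samples]
mu_hat = estimate_mu(intervals)
print(f"true mu = {true_mu}, ML estimate from quantized data = {mu_hat:.3f}")
```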
In short, our proposed method for enhancing the video reconstructed from multiple compressed copies can be summarized as follows.
Step 1: Estimate the parameters of the Laplacian and Cauchy distributions for each AC coefficient using (20) and (24), respectively.
Step 2: Obtain the narrow QCS for each integer transform coefficient from the multiple copies using (11).
Step 3: Reconstruct each integer transform coefficient as the centroid of the narrow QCS obtained in Step 2 using (14).
Note that although H.264/AVC was used to evaluate the performance of the proposed method, the method can readily be extended to other video coding standards such as H.263 or MPEG-1/2/4. The quantization methods differ among coding standards such as H.263 and H.264/AVC, but only a minor modification of (10) is required to obtain the quantization constraint sets for each standard. Nevertheless, the proposed method may not be readily applicable when the copies come from a mix of codecs that includes both H.264/AVC and older coding standards such as H.263. This is because different coding standards use different transforms (i.e., the integer transform in H.264/AVC and the DCT in H.263 and MPEG-1/2/4). However, when the codecs involved are only the older coding standards (e.g., multiple compressed videos encoded by H.263 and MPEG-1/2/4), the proposed method is still applicable.
The main computations specific to the proposed method are constructing the narrow QCS and estimating the distribution model parameter for each AC integer transform frequency. In addition to the quantization parameters and quantized values available in the compressed bitstream, computing the narrow QCS requires the prediction values in the integer transform domain. Obtaining these requires fully decoding every available compressed copy. Computing the narrow QCS using (10) and (11), and the reconstructed coefficient as its centroid using (14), involves only simple, straightforward calculations, so this cost is negligible. With root-finding algorithms such as the Newton-Raphson method, estimating the model parameter for each AC integer transform frequency also requires little computation compared with full decoding. Thus, the complexity of the proposed method is approximately equal to that of decoding all the available compressed input copies.
In addition, the proposed method generally requires much less computation than our previous method or any relevant SR and postprocessing methods for quantization error reduction. These methods widely employ constraint-based techniques built on the popular theory of projection onto convex sets (POCS). One of the necessary constraint sets is the smoothness constraint set (SCS), computed in the spatial domain. It also requires fully decoding all the compressed input copies, not to mention the computational load of evaluating the smoothness criteria. Furthermore, the iterative projection process requires repeated conversions between the SCS and the other constraint sets (e.g., between the spatial domain for the SCS and the transform domain for the QCS), which results in a heavy computational load. Since several iterations are generally required for convergence, the computational complexity of these methods is significantly higher than that of the proposed method.
4. Video Alignment
In Section 3, we proposed an effective method to enhance the reconstructed video from multiple compressed copies of the same video content, under the assumption that the frames of the available copies are well aligned. However, this assumption may not always hold in practice. For example, different people may encode the same broadcast video starting at slightly different time instants. The same video may also be edited, converted to a different frame rate (e.g., by 3:2 pulldown), or subjected to frame dropping during compression.
We propose in this section a simple method to align the given compressed video sequences. Without loss of generality, we focus on the alignment of two video sequences. Let $V^1 = \{V^1_1, \dots, V^1_{L_1}\}$ and $V^2 = \{V^2_1, \dots, V^2_{L_2}\}$, where $V^1_m$ and $V^2_n$ represent the $m$th and $n$th video frames, and $L_1$ and $L_2$ are the total numbers of frames in the two video sequences. Our objective is to find alignment functions $\phi_1(l)$ ($1 \le \phi_1(l) \le L_1$) and $\phi_2(l)$ ($1 \le \phi_2(l) \le L_2$) such that frame $V^1_{\phi_1(l)}$ is similar to frame $V^2_{\phi_2(l)}$ for $l = 1, \dots, L$, where $L$ is the total number of possible matching frame pairs. Mathematically, finding the optimal alignment functions $\phi_1$ and $\phi_2$ is equivalent to minimizing the matching cost function defined as

$$C(\phi_1, \phi_2) = \sum_{l=1}^{L} w(l)\, d\!\left(V^1_{\phi_1(l)}, V^2_{\phi_2(l)}\right), \qquad (25)$$

where $d(\cdot,\cdot)$ is the distance function representing the difference or dissimilarity between two frames. $w(l)$ is a weighting function that can place different emphasis on different aligned frame pairs. Frame $V^1_m$ is considered similar to frame $V^2_n$ if their frame distance is sufficiently small. In addition, the minimization is subject to a causal constraint on $\phi_1$ and $\phi_2$, that is, $\phi_1(l) \le \phi_1(l+1)$ and $\phi_2(l) \le \phi_2(l+1)$ for $l = 1, \dots, L-1$.
The accuracy of the alignment depends partly on how well the frame distance measure differentiates dissimilar frames. Many sophisticated frame distance measures have been proposed in the literature for image/video matching, such as color histograms and image signatures. Compressed copies of the same video content, however, exhibit none of the spatial variations found in existing image/video matching problems, such as different view angles or illumination conditions. We therefore use a simple but effective frame distance measure based on side information extracted from the compressed videos.
Let $[\ell_i, u_i]$ be the narrow QCS of the $i$th integer transform coefficient obtained from frames $V^1_m$ and $V^2_n$ using (11). The proposed frame distance measure $d(V^1_m, V^2_n)$ between frames $V^1_m$ and $V^2_n$ is defined as
where $N$ is the total number of pixels in the video frame. Extensive simulations show that the distance between aligned frame pairs is generally small compared with that of misaligned pairs. To illustrate, we obtained two compressed copies of a short video segment from a popular situation comedy by encoding the original with different coding parameters. Figure 1 shows the proposed distance measure from a given frame of one copy (frame 30 or frame 50) to all frames of the other copy. As the figure shows, the distances between aligned frame pairs are much smaller than those between misaligned pairs. In our work, we define a frame pair to be similar if its distance measure is smaller than a threshold. The threshold is obtained empirically from simulations on a large number of video sequences.
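To solve (25), we use a forward dynamic programming technique previously proposed for video retrieval. Let $\Gamma(m, n)$ be the minimum matching cost between the two subsequences $\{V^1_1, \dots, V^1_m\}$ and $\{V^2_1, \dots, V^2_n\}$. The minimum cost $\Gamma(m,n)$ for all $1 \le m \le L_1$ and $1 \le n \le L_2$ can be computed with the recursive formula

$$\Gamma(m,n) = \min\left\{\Gamma(m-1,n) + w_1\,d(m,n),\; \Gamma(m-1,n-1) + w_2\,d(m,n),\; \Gamma(m,n-1) + w_3\,d(m,n)\right\}, \qquad (27)$$

where $d(m,n) = d(V^1_m, V^2_n)$, and $w_1$, $w_2$, and $w_3$ are the weights of (25) associated with the three step directions. The boundary conditions are $\Gamma(0,0) = 0$ and $\Gamma(m,n) = \infty$ otherwise for $m < 1$ or $n < 1$. The matching frame pairs between video sequences $V^1$ and $V^2$ can then be found by determining the optimal path, i.e., the path with the minimum final matching cost $\Gamma(L_1, L_2)$. Only frame pairs on the optimal path whose distance measures are smaller than a predefined threshold are used to enhance the reconstructed video with the proposed method.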
The most computationally intensive part of the proposed alignment method is computing the minimum matching cost $\Gamma(m,n)$ using (27). Each $\Gamma(m,n)$ requires only three algebraic operations (each consisting of one addition and one multiplication) and two numerical comparisons. Therefore, the optimal path, and hence the matching frame pairs between video sequences $V^1$ and $V^2$, can be obtained with $3L_1L_2$ algebraic operations and $2L_1L_2$ numerical comparisons. Thus, the complexity of the proposed alignment method is $O(L_1 L_2)$.
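The recursion (27) and the thresholding step can be sketched in Python as below. The step weights (1, 2, 1) for vertical, diagonal, and horizontal moves are an assumption: they are a common symmetric choice in dynamic time warping, and the paper's weights are not given here. Each frame is represented by a single hypothetical scalar feature, and the absolute difference stands in for the QCS-based frame distance measure.

```python
import math


def align(dist, weights=(1.0, 2.0, 1.0)):
    """Forward dynamic programming of (27) on a frame-distance matrix dist[m][n].

    weights = (w1, w2, w3) for the vertical, diagonal, and horizontal steps; the
    symmetric (1, 2, 1) default is an assumption, not a value from the paper.
    Returns the final cost Gamma(L1, L2) and the optimal path as 0-based frame pairs.
    """
    L1, L2 = len(dist), len(dist[0])
    w1, w2, w3 = weights
    gamma = [[math.inf] * (L2 + 1) for _ in range(L1 + 1)]
    gamma[0][0] = 0.0
    back = {}
    for m in range(1, L1 + 1):
        for n in range(1, L2 + 1):
            d = dist[m - 1][n - 1]
            # Three weighted additions and a three-way minimum per cell.
            gamma[m][n], back[(m, n)] = min(
                (gamma[m - 1][n] + w1 * d, (m - 1, n)),
                (gamma[m - 1][n - 1] + w2 * d, (m - 1, n - 1)),
                (gamma[m][n - 1] + w3 * d, (m, n - 1)),
            )
    path, cell = [], (L1, L2)
    while cell != (0, 0):  # backtrack from (L1, L2) to the origin
        path.append((cell[0] - 1, cell[1] - 1))
        cell = back[cell]
    return gamma[L1][L2], path[::-1]


def matched_pairs(dist, threshold, weights=(1.0, 2.0, 1.0)):
    """Keep only the path pairs whose frame distance is below the threshold."""
    _, path = align(dist, weights)
    return [(m, n) for m, n in path if dist[m][n] < threshold]


# Hypothetical scalar frame features; copy 2 has lost frame 3 of copy 1.
seq1 = [0, 1, 2, 3, 4, 5, 6, 7]
seq2 = [0, 1, 2, 4, 5, 6, 7]
dist = [[abs(a - b) for b in seq2] for a in seq1]
cost, path = align(dist)
print(cost, matched_pairs(dist, threshold=0.5))
```

In this toy case the optimal path maps the dropped frame to a neighbor, and the threshold then removes that pair. The remaining pairs are the correctly aligned ones.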
To evaluate the performance of the proposed alignment method, we conducted experiments on a large number of test sequences. To create misalignment among the compressed video inputs, the original video sequence was encoded starting from different time instants. Furthermore, to obtain one of the compressed copies, we deliberately dropped randomly chosen frames from the original test sequence before encoding. The experimental results show that the proposed method found the matching frame pairs among these misaligned compressed copies with 100% accuracy.
5. Analytical Justification
We justify in this section that reconstructing integer transform coefficients using the narrow QCS can generally yield a lower distortion than that of using only the QCS of any single copy.
Let $Y$ be a random variable representing an integer transform coefficient, which is uniform for a DC coefficient and Laplacian or Cauchy for an AC coefficient. Let $\tilde{y}$ denote the reconstructed value of $Y$ as the centroid of a QCS $[\ell, u]$. The resulting estimated mean squared error, denoted $D_{[\ell,u]}$, can be obtained by (13).
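Lemma 1. Consider a quantization constraint set $[\ell, u]$ and its subset $[\alpha, \beta]$, where $[\alpha,\beta] \subseteq [\ell,u]$ (i.e., $\ell \le \alpha$ and $\beta \le u$). Then, for any such subset $[\alpha,\beta]$, we have

$$D_{[\alpha,\beta]} \le D_{[\ell,u]}. \qquad (28)$$

Proof. Let $g(\cdot)$ be the pdf of $Y$. Fix $\alpha$ and define

$$P(\beta) = \int_{\alpha}^{\beta} g(y)\,dy, \qquad m(\beta) = \int_{\alpha}^{\beta} y\, g(y)\,dy, \qquad S(\beta) = \int_{\alpha}^{\beta} y^2 g(y)\,dy$$

as functions of β. The centroid $\tilde{y}(\beta)$ is then also a function of β, namely $\tilde{y}(\beta) = m(\beta)/P(\beta)$. We first prove that $D(\beta) = D_{[\alpha,\beta]}$ is an increasing function of β by showing that $D'(\beta) \ge 0$. Rearranging $D(\beta)$ as $D(\beta) = S(\beta)/P(\beta) - \tilde{y}(\beta)^2$ and taking the derivative with respect to β, we have

$$D'(\beta) = \frac{S'(\beta)}{P(\beta)} - \frac{S(\beta)P'(\beta)}{P(\beta)^2} - 2\tilde{y}(\beta)\,\tilde{y}'(\beta), \qquad \tilde{y}'(\beta) = \frac{m'(\beta)}{P(\beta)} - \frac{m(\beta)P'(\beta)}{P(\beta)^2}. \qquad (29)$$

Using the Leibniz integral rule, it is easy to see that

$$P'(\beta) = g(\beta), \qquad m'(\beta) = \beta\, g(\beta), \qquad S'(\beta) = \beta^2 g(\beta). \qquad (30)$$

Substituting (30) into (29), we obtain

$$D'(\beta) = \frac{g(\beta)}{P(\beta)}\left[\left(\beta - \tilde{y}(\beta)\right)^2 - D(\beta)\right]. \qquad (31)$$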
Showing is equivalent to show . Taking the first and the second derivatives of with respect to β, we have
Since is a symmetric function, we only need to consider . Let . It is easy to see that . Hence, is a decreasing function with the increase of β. It follows that and , hence . Thus, increases with β, and . This leads to increasing with β too, and . Similarly, we can prove that is a decreasing function of α. Hence, the assertion in Lemma 1 holds.
Lemma 1 implies that reconstructing quantized coefficients as the centroid over a narrow QCS can yield a lower distortion on average. Since the proposed method reconstructs the value of each integer transform coefficient by using a narrow QCS that is a subset of the QCSs obtained from the multiple copies, we can reconstruct a video which has a lower distortion, on average, than the video decoded from any given compressed copy.
Furthermore, one would expect a larger decrease in distortion as the size of the narrow QCS decreases. However, how narrow the intersection of the multiple QCSs is depends on two factors. The first is the relative sizes of the QCSs from the multiple compressed copies, which are determined by the corresponding quantization step sizes. The second is the positions of the QCS intervals. In particular, we are unlikely to obtain a narrower QCS through intersection when the quantization step sizes are far apart. For example, suppose the quantization step size of one copy is much larger than that of another. There is then a high probability that the QCS interval determined by the smaller step size lies entirely inside the other QCS interval. In this case, intersection cannot yield a QCS narrower than that of the copy with the smaller quantization step size, so the distortion is not reduced relative to that copy. Figure 2(a) illustrates this scenario, where the quantization step sizes of Copies 1 and 2 are much larger than that of Copy 3. This happens when Copy 3 is compressed at a much higher quality than the other copies. As a result, we cannot obtain a QCS narrower than that of Copy 3, and the reconstructed video is no better than Copy 3. However, as Figure 2(b) shows, the relative positions of multiple QCSs can significantly reduce the size of the narrow QCS. This is because the position of each QCS is partly determined by the prediction value (see (7)), which can differ considerably among the compressed copies. In addition, one would intuitively expect the size of the intersected QCS to decrease as more compressed copies become available.
To further illustrate this insight, we provide a simple example. Consider two compressed copies of an integer transform coefficient $y$, coded with two different quantization step sizes $q_1$ and $q_2$, respectively. We drew a large number of samples of $y$ from a Laplacian distribution and quantized them with $q_1$ and with two different values of $q_2$, namely 19 and 53. As $q_1$ is not larger than $q_2$, the coefficient reconstructed from the first copy clearly has a lower distortion than that reconstructed from the second copy. Figure 3 shows, for each value of $q_2$, the probability histograms of the sizes of the QCS from the first copy and of the narrow QCS obtained through intersection. When $q_2$ is too large compared with $q_1$ (e.g., $q_2 = 53$), most narrow QCSs obtained through intersection have the same size as the QCS from the first copy, so the distortion is reduced only slightly. Narrow QCSs that are smaller than the QCS of the first copy occur much more often when $q_2$ is close to $q_1$. This explains why $q_2 = 19$ yields a lower distortion than $q_2 = 53$ (see Figure 3).
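The effect described above can be checked with a small Monte Carlo sketch in Python. It assumes zero prediction in both copies, $f = 1/6$, a Laplacian coefficient with λ = 0.05, and a hypothetical $q_1 = 16$; the actual value of $q_1$ in this experiment is not given here. The sketch compares the MSE of centroid reconstruction from the first copy alone with the MSE obtained from the narrow QCS for $q_2$ = 19 and 53. The same samples are reused for every setting so that sampling noise does not dominate the comparison. The printed numbers are illustrative and are not the paper's results.

```python
import math
import random


def quantize(r, q, f):
    level = math.floor(abs(r) / q + f)
    return level if r >= 0 else -level


def qcs(z, q, f):
    """QCS of (10) with zero prediction."""
    if z > 0:
        return (z - f) * q, (z + 1 - f) * q
    if z < 0:
        return (z - 1 + f) * q, (z + f) * q
    return -(1 - f) * q, (1 - f) * q


def laplacian_centroid(lo, hi, lam):
    """Centroid (14) of the Laplacian pdf (lam/2) exp(-lam |y|) over [lo, hi]."""
    if hi <= lo:
        return lo
    if lo >= 0 or hi <= 0:  # one-sided interval: shifted, truncated exponential
        near, width = (lo, hi - lo) if lo >= 0 else (-hi, hi - lo)
        t = lam * width
        mean = 1 / lam if t > 700 else 1 / lam - width / math.expm1(t)
        return near + mean if lo >= 0 else -(near + mean)

    def F(y):
        return 0.5 * math.exp(lam * y) if y < 0 else 1 - 0.5 * math.exp(-lam * y)

    def H(y):
        if y < 0:
            return 0.5 * (y - 1 / lam) * math.exp(lam * y)
        return -0.5 * (y + 1 / lam) * math.exp(-lam * y)

    return (H(hi) - H(lo)) / (F(hi) - F(lo))


random.seed(0)
LAM, F_ROUND, Q1 = 0.05, 1 / 6, 16.0  # hypothetical parameters
samples = [random.choice((-1, 1)) * random.expovariate(LAM) for _ in range(20000)]


def mse(q2=None):
    """MSE of centroid reconstruction from copy 1 alone, or from the narrow QCS with copy 2."""
    total = 0.0
    for y in samples:
        lo, hi = qcs(quantize(y, Q1, F_ROUND), Q1, F_ROUND)
        if q2 is not None:
            lo2, hi2 = qcs(quantize(y, q2, F_ROUND), q2, F_ROUND)
            lo, hi = max(lo, lo2), min(hi, hi2)
        total += (y - laplacian_centroid(lo, hi, LAM)) ** 2
    return total / len(samples)


single = mse()
narrow = {q2: mse(q2) for q2 in (19.0, 53.0)}
print(f"copy 1 alone: {single:.2f}")
for q2, value in narrow.items():
    print(f"narrow QCS with q2 = {q2:g}: {value:.2f}")
```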
6. Experimental Results
We conducted a series of experiments to evaluate the performance of the proposed enhancement method. Our test sequences include ten popular CIF-resolution (352 × 288) sequences, listed in Table 2. These sequences contain different amounts of motion and spatial detail, and they have been widely used in the video compression literature.
The experiments were conducted using an encoder for H.264/AVC, the state-of-the-art transform-coding-based video compression standard. The multiple copies of the input video were obtained by encoding the same video content with different target bit rates and coding parameters, such as the group of pictures (GOP) structure. In what follows, we discuss various scenarios in which the multiple video copies were compressed in different ways, resulting in different performance gains.
6.1. Laplacian and Cauchy Probability Distribution Model
In the first set of experiments, we evaluate the performance of the proposed method using the Laplacian and Cauchy models, respectively, to model the probability distribution of the AC coefficients. We obtained two compressed copies of the Foreman test sequence by encoding it at target bit rates of 900 kbits/s and 1000 kbits/s. The GOP of the first copy consists of ten frames with one bidirectionally predictive-coded (B) frame between the I and P frames, while the GOP of the second copy consists of twelve frames with two B frames between the I and P frames. Figure 4 shows the PSNR results of the best input copy, which in this case is the copy compressed at 1000 kbits/s, and of the videos reconstructed by the proposed method using the Laplacian and Cauchy models. As can be seen from the figure, the proposed method consistently reconstructs a video with better quality than the best input copy. Moreover, the proposed method with the Cauchy model provides slightly better reconstructed video quality than with the Laplacian model. The superiority of the Cauchy model was also observed in the simulation results for the other test sequences. Thus, the Cauchy model is selected to approximate the probability density function of the AC coefficients in our work.
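This section does not spell out the reconstruction rule, but one standard way to minimize the expected squared error for a coefficient known only to lie in a QCS $[a, b]$ is the conditional mean (centroid) of the assumed density over that interval. The sketch below assumes that rule and hypothetical parameters (Laplacian rate λ = 0.05, Cauchy scale μ = 10); it computes the centroid for both models by Simpson's rule and checks the Cauchy case against its closed form $\frac{\mu}{2}\ln\frac{\mu^2+b^2}{\mu^2+a^2} \big/ \left(\arctan\frac{b}{\mu} - \arctan\frac{a}{\mu}\right)$.

```python
import math


def laplacian_pdf(x, lam):
    """Zero-mean Laplacian density with rate lam (assumed parameter)."""
    return 0.5 * lam * math.exp(-lam * abs(x))


def cauchy_pdf(x, mu):
    """Zero-centered Cauchy density with scale mu (assumed parameter)."""
    return mu / (math.pi * (mu * mu + x * x))


def centroid(pdf, a, b, n=2000):
    """E[X | a <= X <= b] via composite Simpson's rule.

    This is the value minimizing the expected squared error when X is
    only known to lie in [a, b] and follows `pdf`.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    num = den = 0.0
    for i in range(n + 1):
        x = a + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        p = pdf(x)
        num += w * x * p
        den += w * p
    return num / den  # the common factor h/3 cancels


def cauchy_centroid_closed_form(a, b, mu):
    """Closed form of the same conditional mean for the Cauchy model."""
    num = 0.5 * mu * math.log((mu * mu + b * b) / (mu * mu + a * a))
    den = math.atan(b / mu) - math.atan(a / mu)
    return num / den


lam, mu = 0.05, 10.0   # hypothetical model parameters
a, b = 35.0, 39.0      # a narrow QCS, e.g. obtained by intersection
print("midpoint ", (a + b) / 2)
print("Laplacian", centroid(lambda x: laplacian_pdf(x, lam), a, b))
print("Cauchy   ", centroid(lambda x: cauchy_pdf(x, mu), a, b))
```

Both centroids fall slightly below the midpoint because the densities decrease away from zero; how far they move depends on the model and its parameters, which is where the choice between the Laplacian and Cauchy models enters.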
6.2. Multiple Copies Compressed at Different Target Bit Rates
In the second set of experiments, the test sequences were encoded at different target bit rates. We considered two input sets covering different bit rate ranges, each consisting of three compressed copies of the same content (see Table 3). For comparison, we consider two cases: the frames of the available copies at the same time instant are encoded with the same picture coding type, I, P, or B (Case 1), or with different picture coding types (Case 2). The compressed video copies have different GOP structures, as shown in Table 3.
Tables 4 and 5 show the average PSNR results of the video reconstructed from multiple video inputs using the proposed method for Case 1 and Case 2, respectively. Note that the first two copies in each set were used for the case of two video inputs. As expected from the analysis in Section 5, the experimental results show that, without the original source video or any information on the quality of each input video, the proposed method consistently reconstructs a video with better quality (in terms of average PSNR) than any input copy. The improvement becomes more significant when the input copies are encoded in the higher bit rate range or when more copies are available. Specifically, using all three copies in Set 2, the video reconstructed by the proposed method achieves a PSNR improvement of about 1.0 dB or more over the best input copy. For some test sequences, such as Stefan and Coastguard, the PSNR gain exceeds 2.0 dB (see Table 5).
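The results above are reported as average PSNR and as gains over the best input copy. A minimal sketch of that bookkeeping follows, assuming 8-bit luma samples and an average taken over per-frame PSNR values (the exact averaging convention is not stated in this section). The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence

Frame = Sequence[int]  # one flattened 8-bit luma frame


def psnr(ref: Frame, test: Frame, peak: int = 255) -> float:
    """PSNR in dB between two frames of equal size."""
    if len(ref) != len(test) or not ref:
        raise ValueError("frames must be non-empty and equally sized")
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def average_psnr(ref_frames: List[Frame], test_frames: List[Frame]) -> float:
    """Mean of the per-frame PSNR values over a sequence."""
    values = [psnr(r, t) for r, t in zip(ref_frames, test_frames)]
    return sum(values) / len(values)


def gain_over_best_copy(ref_frames, reconstructed, copies) -> float:
    """PSNR gain (dB) of `reconstructed` over the best of the decoded `copies`.

    This needs the original frames, so it is an evaluation tool only: the
    blind enhancement itself never learns which copy is best.
    """
    best = max(average_psnr(ref_frames, c) for c in copies)
    return average_psnr(ref_frames, reconstructed) - best


# Toy usage with two 4-pixel frames (hypothetical data).
original = [[100, 100, 100, 100], [50, 50, 50, 50]]
copy_a = [[p + 2 for p in f] for f in original]   # MSE 4 per frame
copy_b = [[p + 4 for p in f] for f in original]   # MSE 16 per frame
recon = [[p + 1 for p in f] for f in original]    # MSE 1 per frame
print(round(gain_over_best_copy(original, recon, [copy_a, copy_b]), 2))  # 6.02
```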
The experimental results also show that the PSNR improvement obtained from the low-bit-rate input set is smaller than that from the high-bit-rate set. This is because coarse quantization step sizes are generally used at low bit rates, resulting in a large QCS for each integer transform coefficient. Furthermore, the QCSs of the low-quality copies (e.g., Copies 1 and 2 in Set 1) contribute little to reducing the size of the narrow QCS obtained by the proposed method, because the quantization step sizes used in these copies are generally much larger than that of the best copy. As a result, the narrow QCS cannot be made significantly smaller and usually remains the same size as that of the best copy, so little quality improvement over the best copy is obtained (see the results of Set 1 in Tables 4 and 5 and the discussion in Section 5). In addition, we generally obtain a larger PSNR gain when corresponding frames of the available copies are coded using different picture coding types (Case 2, Table 5) than when they are coded using the same picture coding type (Case 1, Table 4). Recall that the size of the narrow QCS depends not only on the relative sizes of the QCSs of the compressed copies, but also on the relative positions of the QCS intervals. As explained in Section 5, the position of each individual QCS is partly determined by the prediction value, which can differ considerably when different picture types are used to code corresponding frames of the available copies. This helps to reduce the size of the narrow QCS obtained by the proposed method significantly, resulting in a larger distortion reduction.
In addition, it can be seen from Figure 4 that the PSNR gains are quite consistent and uniformly distributed over the entire sequence. For visual comparison, Figures 5 and 7 show the 63rd and 77th frames of the Stefan sequence, respectively, as obtained by the proposed method and as reconstructed directly from each of the three input copies. The figures show that the proposed method achieves better perceived quality, in terms of sharpness and detail, than the frames reconstructed directly from the input copies. The perceptual differences are most noticeable in the regions around the player, which are marked by rectangular boxes in Figures 5 and 7. For better visualization, these regions are enlarged in Figures 6 and 8, respectively.
Note that the copy that is best in terms of average PSNR does not always provide the best reconstructed frame at every time instant, as shown in Figure 7. Even so, the frame obtained by the proposed method still achieves better quality, in terms of both PSNR and visual quality, than the best frame reconstructed from the available input copies (the frame reconstructed from Copy 2 in this case).
6.3. Multiple Copies Compressed at the Same Target Bit Rates
In another set of experiments, the input copies were obtained by encoding the test sequences at the same target bit rate. For simplicity, the same GOP structure was used for all copies, but each copy started at a different frame. This is likely to occur in practice, for example, when different people encode the same broadcast video starting at slightly different times and upload the compressed videos to websites such as YouTube and Google Video. Thus, the picture type (I, P, or B) of a particular frame may differ among the compressed copies (e.g., it can be an I frame in one copy and a B or P frame in the others).
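To make the effect of different starting frames concrete, the sketch below assigns a picture type to each broadcast frame for copies that share one GOP structure but start at different frames. The fixed display-order pattern I B..B P B..B P (with a new I frame every GOP) is a simplifying assumption, since real encoders may terminate or reorder GOPs differently, and the function name is hypothetical.

```python
def picture_type(frame: int, gop_size: int, b_frames: int, start: int = 0) -> str:
    """Picture type of broadcast frame `frame` in a copy whose encoding began at `start`.

    Assumes a hypothetical display-order pattern I B..B P B..B P ..., with
    `b_frames` B frames between anchors and a new I frame every `gop_size` frames.
    """
    if frame < start:
        raise ValueError("this copy does not contain the frame")
    k = (frame - start) % gop_size
    if k == 0:
        return "I"
    return "P" if k % (b_frames + 1) == 0 else "B"


# Three copies with the same 12-frame GOP (two B frames between anchors),
# started one frame apart.
for frame in (12, 14, 15):
    types = [picture_type(frame, 12, 2, start) for start in (0, 1, 2)]
    print(frame, types)
```

Even a one-frame shift in the starting point is enough for the same broadcast frame to be an I frame in one copy and a B frame in the others.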
Figures 9 and 10 show the PSNR gain of the video reconstructed by the proposed method over the best input copy for different target bit rates and different numbers of input copies, for the Foreman and Mobile sequences. The results show that the proposed method provides a higher PSNR gain than in the case of Section 6.2. Specifically, processing the Foreman sequence using three copies encoded at 400 kbits/s, 600 kbits/s, and 800 kbits/s yields only a 1.04-dB PSNR gain over the best copy (see Table 5), whereas three copies all compressed at 400 kbits/s, or all at 800 kbits/s, yield PSNR gains of about 1.49 dB and 1.99 dB, respectively. This is because, unlike the case of different bit rates, the quantization step sizes used to code each copy at the same bit rate are quite close to each other. Furthermore, the same video frame may be encoded with different picture types in different copies, resulting in different motion-compensated prediction values. As the quantization interval of a predicted integer transform coefficient is obtained by adding the integer transform value of the prediction from the reference frame(s), this can effectively reduce the size of the QCS intersection obtained by the proposed method, leading to a large reduction in distortion. The experimental results also show that the gain increases with the bit rate and the number of input copies.
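The role of differing prediction values at the same bit rate can be illustrated with a small simulation. It assumes a uniform quantizer without a dead zone applied to the residual, a hypothetical step size of 10 shared by all copies, and a prediction error for each copy drawn uniformly from [-spread, spread]; this noise model is a stand-in for different motion-compensated predictions, not a model taken from the paper.

```python
import math
import random


def qcs(x, step, prediction):
    """QCS of x under an assumed model: the residual x - prediction is
    rounded to the nearest multiple of `step` (no dead zone)."""
    level = math.floor((x - prediction) / step + 0.5)
    return (prediction + (level - 0.5) * step,
            prediction + (level + 0.5) * step)


def mean_intersection_width(step, n_copies, spread, n=20_000, seed=1):
    """Average width of the intersected QCS when all copies share `step`.

    Each copy predicts the coefficient with an error drawn uniformly from
    [-spread, spread] (hypothetical stand-in for different motion-compensated
    predictions); spread = 0 means every copy has the same prediction.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-100.0, 100.0)
        lo, hi = -math.inf, math.inf
        for _ in range(n_copies):
            a, b = qcs(x, step, x + rng.uniform(-spread, spread))
            lo, hi = max(lo, a), min(hi, b)
        total += hi - lo
    return total / n


STEP = 10.0
for copies in (1, 2, 3):
    same = mean_intersection_width(STEP, copies, spread=0.0)
    diff = mean_intersection_width(STEP, copies, spread=STEP / 2)
    print(f"{copies} copies: same prediction {same:.2f}, "
          f"different predictions {diff:.2f}")
```

With identical predictions the copies produce identical QCSs, so the intersection keeps the full step size no matter how many copies are used. With spread equal to half the step size, the position of the coefficient inside each QCS is uniformly distributed, and the expected intersection width for n copies is 2Δ/(n+1), because the expected range of n independent uniform variables on [0, 1] is (n-1)/(n+1).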
6.4. Multiple Copies Compressed at Variable and Constant Bit Rates
In this set of experiments, we obtained the first compressed copy of the Tennis sequence by encoding the original video with a constant quantization parameter (QP). The second compressed copy was obtained by encoding at the same bit rate as that achieved by the first copy. Unlike the case in Section 6.3, although both copies have the same bit rate, the first copy, which uses a constant QP for the entire sequence, typically performs better in terms of both average PSNR and quality consistency. Figure 11 shows the PSNR results of both compressed copies and of the video reconstructed by the proposed method. It can be seen that, although the two copies differ in average PSNR, the proposed method still provides a notable PSNR gain over the first copy, as in Section 6.3. The gain obtained by the proposed method is consistent and uniformly distributed over the entire sequence.
6.5. Application to Real Video Sequences
In the last set of experiments, we evaluate the performance of the proposed method on real video content. The real video test sequences were extracted from featured episodes of a well-known situation comedy with the resolution of pixels. Each real video test sequence is about 10 seconds long and consists of about 250 frames. Sample frames of these test sequences are shown in Figure 12. The multiple copies of the input video were obtained by encoding these extracted sequences with different target bit rates and coding parameters, which are listed in Table 6.
Table 7 shows the average PSNR results of the best input copy and of the video reconstructed from multiple video inputs using the proposed method. As in Section 6.2, the first two copies in each set were used for the case of two video inputs. Similar to the results for the standard test sequences, the proposed method consistently reconstructs a video with better quality than the best input copy. Specifically, with all three compressed copies available, the video reconstructed by the proposed method achieves average PSNR gains of about 0.7 dB and 1.2 dB for the test sequences in Set 1 and Set 2, respectively.
We have addressed a new and interesting research problem: blindly enhancing the video reconstructed from multiple compressed copies of the same video content with different levels of quality. Without reference to the original source video or to information on the quality of the compressed copies, the proposed method effectively exploits the compressed information of the different copies to reconstruct a video whose quality, in terms of PSNR, is better than that of the best compressed copy. Specifically, each coefficient of the reconstructed video in the transform domain is estimated using a narrow quantization constraint set obtained from the multiple compressed copies together with a Laplacian or Cauchy distribution model for each AC frequency to minimize the distortion. By reconstructing the enhanced video in the transform domain, the proposed method incurs much lower computational complexity than the previous method. In addition, analytical and experimental results show that the video reconstructed by the proposed method not only yields a lower distortion than any given compressed copy but also achieves a significant PSNR gain over the best copy. Furthermore, a similar approach can be easily extended to other transform-based coding schemes, such as DCT-based or wavelet-based transform coding.
Musmann HG, Pirsch P, Grallert H-J: Advances in picture coding. Proceedings of the IEEE 1985,73(4):523-548.
Netravali AN, Haskell BG: Digital Pictures: Representation and Compression. 2nd edition. Plenum Press, New York, NY, USA; 1995.
Reeves HC III, Lim JS: Reduction of blocking effects in image coding. Optical Engineering 1984,23(1):34-37.
Ramamurthi B, Gersho A: Nonlinear space-variant postprocessing of block coded images. IEEE Transactions on Acoustics, Speech, and Signal Processing 1986,34(5):1258-1268. 10.1109/TASSP.1986.1164961
Shen M-Y, Kuo C-CJ: Review of postprocessing techniques for compression artifact removal. Journal of Visual Communication and Image Representation 1998,9(1):2-14. 10.1006/jvci.1997.0378
Kwon K-K, Im S-H, Lim D-S: Deblocking algorithm in MPEG-4 video coding using block boundary characteristics and adaptive filtering. Proceedings of the IEEE International Conference on Image Processing (ICIP '05), September 2005 541-544.
Kong H-S, Nie Y, Vetro A, Sun H, Barner KE: Coding artifacts reduction using edge map guided adaptive and fuzzy filtering. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), June 2004 1135-1138.
Chen T, Wu HR, Qiu B: Adaptive postfiltering of transform coefficients for the reduction of blocking artifacts. IEEE Transactions on Circuits and Systems for Video Technology 2001,11(5):594-602. 10.1109/76.920189
Liu S, Bovik AC: Efficient DCT-domain blind measurement and reduction of blocking artifacts. IEEE Transactions on Circuits and Systems for Video Technology 2002,12(12):1139-1149. 10.1109/TCSVT.2002.806819
Liew AW-C, Yan H: Blocking artifacts suppression in block-coded images using overcomplete wavelet representation. IEEE Transactions on Circuits and Systems for Video Technology 2004,14(4):450-461. 10.1109/TCSVT.2004.825555
Luo Y, Ward RK: Removing the blocking artifacts of block-based DCT compressed images. IEEE Transactions on Image Processing 2003,12(7):838-842. 10.1109/TIP.2003.814252
Kim S, Jeong J: Enhancement of wavelet-coded images via novel directional filtering. Proceedings of the International Conference on Neural Networks and Signal Processing, December 2003 2: 1062-1065.
Ismaeil IR, Ward RK: Removal of DCT blocking artifacts using DC and AC filtering. Proceedings of the IEEE Pacific Rim Conference on Communications Computers and Signal Processing (PACRIM '03), August 2003 229-232.
Rosenholtz RE, Zakhor A: Iterative procedures for reduction of blocking effects in transform image coding. Image Processing Algorithms and Techniques II, February 1991, San Jose, Calif, USA, Proceedings of SPIE 1452: 116-126.
Zakhor A: Iterative procedures for reduction of blocking effects in transform image coding. IEEE Transactions on Circuits and Systems for Video Technology 1992,2(1):91-95. 10.1109/76.134377
Yang Y, Galatsanos NP, Katsaggelos AK: Projection-based spatially adaptive reconstruction of block-transform compressed images. IEEE Transactions on Image Processing 1995,4(7):896-908. 10.1109/83.392332
Paek H, Kim R-C, Lee S-U: On the POCS-based postprocessing technique to reduce the blocking artifacts in transform coded images. IEEE Transactions on Circuits and Systems for Video Technology 1998,8(3):358-367. 10.1109/76.678636
Guleryuz OG: Linear, worst-case estimators for denoising quantization noise in transform coded images. IEEE Transactions on Image Processing 2006,15(10):2967-2986.
Mateos J, Katsaggelos AK, Molina R: A Bayesian approach for the estimation and transmission of regularization parameters for reducing blocking artifacts. IEEE Transactions on Image Processing 2000,9(7):1200-1215. 10.1109/83.847833
Li Z, Delp EJ: MAP-based post processing of video sequences using 3-D Huber-Markov random field model. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '02), August 2002 1: 153-156.
Li J, Kuo C-CJ: Coding artifact removal with multiscale postprocessing. Proceedings of the International Conference on Image Processing, October 1997, Santa Barbara, Calif, USA 1: 45-48.
Yang Y, Galatsanos NP, Katsaggelos AK: Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images. IEEE Transactions on Circuits and Systems for Video Technology 1993,3(6):421-432. 10.1109/76.260198
Choi MG, Yang Y, Galatsanos NP: Multichannel regularized recovery of compressed video sequences. IEEE Transactions on Circuits and Systems II 2001,48(4):376-387. 10.1109/82.933797
Li Z, Delp EJ: Block artifact reduction using a transform-domain Markov random field model. IEEE Transactions on Circuits and Systems for Video Technology 2005,15(12):1583-1593.
Zou JJ, Yan H: A deblocking method for BDCT compressed images based on adaptive projections. IEEE Transactions on Circuits and Systems for Video Technology 2005,15(3):430-435.
Kartalov T, Ivanovski ZA, Panovski L, Karam LJ: An adaptive POCS algorithm for compression artifacts removal. Proceedings of the 9th International Symposium on Signal Processing and Its Applications (ISSPA '07), February 2007 1-6.
Liew AW-C, Yan H, Law N-F: POCS-based blocking artifacts suppression using a smoothness constraint set with explicit region modeling. IEEE Transactions on Circuits and Systems for Video Technology 2005,15(6):795-800.
Gunturk BK, Altunbasak Y, Mersereau RM: Multiframe resolution-enhancement methods for compressed video. IEEE Signal Processing Letters 2002,9(6):170-174. 10.1109/LSP.2002.800503
Gunturk BK, Altunbasak Y, Mersereau RM: Multiframe blocking-artifact reduction for transform-coded video. IEEE Transactions on Circuits and Systems for Video Technology 2002,12(4):276-282. 10.1109/76.999205
Segall CA, Molina R, Katsaggelos AK: High-resolution images from low-resolution compressed video. IEEE Signal Processing Magazine 2003,20(3):37-48. 10.1109/MSP.2003.1203208
Wang C, Yang G, Tan Y-P: Reconstructing videos from multiple compressed copies. IEEE Transactions on Circuits and Systems for Video Technology 2009,19(9):1342-1351.
Richardson I: H.264 and MPEG-4 Video Compression. John Wiley & Sons, New York, NY, USA; 2003.
JM 12.4-H.264 reference software http://iphome.hhi.de/suehring/tml/
de Queiroz RL: Processing JPEG-compressed images and documents. IEEE Transactions on Image Processing 1998,7(12):1661-1672. 10.1109/83.730378
Müller F: Distribution shape of two-dimensional DCT coefficients of natural images. Electronics Letters 1993,29(22):1935-1936. 10.1049/el:19931288
Eude T, Grisel R, Cherifi H, Debrie R: On the distribution of the DCT coefficients. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '94), April 1994, Adelaide, Australia 5: 365-368.
Lam EY, Goodman JW: A mathematical analysis of the DCT coefficient distributions for images. IEEE Transactions on Image Processing 2000,9(10):1661-1666. 10.1109/83.869177
Smoot SR, Rowe LA: DCT coefficient distributions. Human Vision and Electronic Imaging, February 1996, San Jose, Calif, USA, Proceedings of SPIE 403-411.
Kamaci N, Altunbasak Y, Mersereau RM: Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models. IEEE Transactions on Circuits and Systems for Video Technology 2005,15(8):994-1006.
Brandão T, Queluz MP: No-reference image quality assessment based on DCT domain statistics. Signal Processing 2008,88(4):822-833. 10.1016/j.sigpro.2007.09.017
Kelley CT: Solving Nonlinear Equations with Newton's Method, Fundamentals of Algorithms. SIAM, Philadelphia, Pa, USA; 2003.
Tan YP, Kulkarni SR, Ramadge PJ: A framework for measuring video similarity and its application to video query. Proceedings of the IEEE International Conference on Image Processing (ICIP '99), 1999, Kobe, Japan 2: 106-110.
This research is partially supported by a research grant awarded by The Agency for Science, Technology and Research (A*STAR), Singapore, under the Mobile Media Thematic Strategic Research Programme of the Science and Engineering Research Council.