Distributed video coding supporting hierarchical GOP structures with transmitted motion vectors
© Min et al.; licensee Springer. 2015
Received: 14 May 2014
Accepted: 29 April 2015
Published: 10 May 2015
In this paper, we propose a new distributed video coding (DVC) method with a hierarchical group of pictures (GOP) structure. The coding gain of DVC can be significantly improved by enlarging the GOP size for slow-moving frames. The proposed DVC decoder estimates a side information (SI) frame and transmits the motion vectors (MVs) of the SI to the proposed encoder. Using the MVs received from the decoder, the proposed encoder can generate a predicted SI (PSI), which is the same as the SI in the decoder, and estimate the quality of the PSI with minimal computational complexity. The proposed method decides the best coding mode among key, Wyner-Ziv (WZ), and skip modes by estimating rate-distortion costs. Based on the selected coding mode, the best GOP size is determined automatically. As the GOP size is adaptively decided depending on the SI quality, entropy and parity bits can be spent efficiently. Experimental results show that the proposed algorithm is around 0.80 dB better in Bjøntegaard delta (BD) bitrate than an existing conventional DVC system.
Keywords: Video coding; Distributed video coding; Side information; GOP; RD competition
As many portable multimedia devices have been developed, such as mobile phones, electronic pads, and laptops, many people enjoy using them to take videos, transmit them to friends or websites such as YouTube and Facebook, and in turn, view them. These days, video sensor networks are also used to monitor very large outdoor areas for environment surveillance and safety. Therefore, demand for low-cost, powerful encoders is continuously increasing. However, conventional video coding standards, such as MPEG-x and H.26x, cannot satisfy these requirements, because their encoders have high computational complexity, while their decoders have low complexity. Distributed video coding (DVC) methods have been researched to meet these requirements. DVC technology is based on the migration of computational complexity from encoders to decoders and achieves coding gain through prediction on the decoder side.
DVC was developed as a new video-coding paradigm derived from Slepian-Wolf information theory [1]. Slepian and Wolf proved that two correlated input signals can be encoded while disregarding the correlation between them, and that a decoder exploiting the correlation can come close to the efficiency of conventional coding systems that employ the correlation at the encoder side. Wyner and Ziv [2] extended this work to show information-theoretic bounds for lossy compression with side information at the decoder. Based on the Wyner-Ziv theory, several lossy DVC approaches that do not perform motion estimation in the encoder have been proposed in order to reduce the computational complexity of the DVC encoder [3-6]. To reduce temporal redundancy, motion estimation is performed on the DVC decoder side, not in the encoder. For DVC based on the Wyner-Ziv approach, the original input frames are coded in two different modes [3-6]. One mode codes a frame with a conventional intra coding technique and is called the key-frame mode. The other mode applies a channel coder after pre-processing and is called the Wyner-Ziv (WZ) mode. Although the channel coder outputs both parity bits and the original data, only a part of the parity bits is sent to the DVC decoder, to improve compression performance. The WZ frame is reconstructed by a channel decoder that applies the transmitted parity bits to a side information (SI) frame. The SI frame is generated in the decoder from the reconstructed key frames so as to be as close as possible to the original frame, which works best when the group of pictures (GOP) size is small, e.g., equal to 2. The error of the SI frame is assumed to be a transmission error caused by a variable channel. The SI frame is thus regarded as a prediction of the original frame (WZ frame) degraded by channel errors, and these errors are corrected by a channel decoder.
A low density parity check accumulate (LDPCA) coder and Turbo coder are often used for DVC systems [7-12].
In general, conventional codecs, such as H.26x and MPEG-x, set the GOP size from 8 to 30, because an increasing number of intra frames degrades the compression rate. However, many conventional DVC systems set the GOP size to the minimum of two, because the performance of DVC is directly related to the accuracy of SI frames. The accuracy of SI frames cannot be known at either the encoder or the decoder side, and it is generally best with the GOP size set to two. In addition, since the accuracy of SI frames varies depending on the features of a sequence, it cannot be correctly predicted. However, SI frames are generated well for slow-moving cases. For these cases, the GOP size can be enlarged to reduce the entropy bits and/or parity bits. Therefore, some conventional DVC algorithms have been proposed to predict the accuracy of SI frames on the encoder side and increase the GOP size [13,14]. Since the purpose of DVC is to reduce the computational complexity of the encoder, these algorithms must generate a predicted SI (PSI) with low delay, whereas the SI frame on the decoder side is generated using several motion estimation algorithms and filters to achieve high quality. Therefore, although the encoder generates a PSI frame on its own from estimated MVs and key frames, the PSI frame is not the same as the SI frame on the decoder side. As a result, the estimated accuracy of the PSI frames differs from that of the SI frames, which leads to a decrease in compression performance.
The proposed DVC performs motion estimation at the decoder side, and the estimated motion vectors (MVs) are transmitted to the corresponding DVC encoder. The proposed encoder can generate a PSI that is identical to the SI frame of the decoder side with minimal computational load, because motion compensation is performed with the received MVs and reference key frames. Therefore, the proposed encoder can correctly estimate the quality of the SI frames. Based on the accuracy of the SI frames, the best coding mode is selected by rate-distortion (RD) optimization; thus, the GOP size can be adaptively and hierarchically set. In this paper, each frame is coded in one of the key, WZ, and skip modes. In order to assess the RD cost of the key mode with minimum computational complexity, the proposed method estimates it with a weighted linear interpolation of the RD costs of neighboring key frames. The distortion of a frame coded in WZ mode can be estimated from the original frame and the PSI frame, and its rate can be estimated from the number of errors. Therefore, the proposed method assesses the RD cost of WZ mode with the compensated frame. The RD cost of skip mode can be estimated with the PSI frame on the encoder side. Based on RD competition, which estimates the rate and distortion of each coding mode in advance, the proposed method can select the best coding mode and GOP size prior to actual encoding. Therefore, the proposed method improves coding performance by enlarging the GOP size for slow-moving frames. Note that the WZ frame is coded in the frequency domain with LDPCA.
The rest of this paper is organized as follows. Section ‘Conventional DVC algorithms’ introduces several conventional DVC algorithms. Section ‘Proposed DVC for hierarchical GOP structure’ presents details of the proposed method. In Section ‘Experimental results’, experimental results are given and discussed. Finally, Section ‘Conclusions’ concludes this paper and gives further work items.
2 Conventional DVC algorithms
[Table: Average PSNRs of SI frames in terms of key frame intervals]
3 Proposed DVC for hierarchical GOP structure
Since videos generally include not only fast-motion but also slow-motion content, the performance of DVC systems can be improved by adaptively modifying the GOP size. When the accuracy of the SI frames is quite high, the encoder is unlikely to need to send parity bits and can reduce the number of key frames. Therefore, the proposed method adaptively modifies the hierarchical GOP size based on RD competition to improve the compression performance of DVC systems.
3.1 Proposed DVC encoder and decoder supporting hierarchical GOP structure
The proposed method sets the initial GOP size (S) and encodes the first frame at t and the last frame at t + S of a GOP range in key mode. The coded bitstreams of the key frames are sent to and reconstructed on the decoder side. The SI frame at t + S/2, located midway between the two key frames, is generated with the key frames; MVs are then estimated from the SI frame at t + S/2 to the key frames, losslessly compressed, and sent to the encoder side. As the encoder generates a PSI frame at t + S/2 with the received MVs and the key frames, the PSI can be the same as the SI in the decoder, without high computational complexity. Based on the PSI frame, the proposed encoder assesses the RD costs of key, WZ, and skip modes and selects one of them. The frame at t + S/2 is coded in the selected mode, and its associated data are sent to the decoder side. Hierarchically, the SI frame at t + S/4 (t + 3S/4) is then generated with the frame at t and the frame at t + S/2 (the frame at t + S/2 and the frame at t + S).
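The hierarchical order described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `hierarchical_order` is hypothetical, S is assumed to be a power of two, and the sketch ignores the fact that the mode chosen for a frame may change which frames are visited afterwards.

```python
def hierarchical_order(t, size):
    """Return (frame, left_ref, right_ref) triples in coding order.

    Assumptions of this sketch: the key frames at t and t + size are
    already coded, and size is a power of two so midpoints are integers.
    """
    order = []
    step = size
    while step >= 2:
        half = step // 2
        # Each pass visits every midpoint of the current hierarchy level.
        for left in range(t, t + size, step):
            order.append((left + half, left, left + step))
        step = half
    return order


for frame, left, right in hierarchical_order(0, 8):
    print(f"frame {frame}: SI from frames {left} and {right}")
```

With t = 0 and S = 8, the first three entries are the frames at 4, 2, and 6, matching the t + S/2, t + S/4, and t + 3S/4 positions in the text.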
3.2 SI frame generation and PSI frames compensation
The quality of an SI frame directly impacts the performance of a DVC system. For the proposed algorithm, it is also important to generate a PSI frame that is the same as the SI frame, so that coding modes are decided properly on the encoder side. In order to make sure that PSI frames are the same as SI frames, SI frames in the proposed algorithm are generated with the existing two-stage algorithm. In the first stage, an initial SI (ISI) frame is estimated with key frames for a target frame; however, any SI frame generation (SIG) algorithm with a gap-filling algorithm can be employed. In the second stage, the proposed DVC decoder performs motion estimation from the ISI frame to the neighboring key frames, and then the final SI frame is reconstructed with the key frames and the estimated MVs. The motion vectors are sent to the encoder side. Regardless of the first-stage motion estimation algorithm, we can guarantee that the SI of the decoder side can be reconstructed at the encoder side with the transmitted MVs and related data, because the motion vectors are defined from the target frame to the key frames. In addition, the proposed DVC encoder does not require any hole-filling algorithm or blending of overlapped blocks. As a result, the PSI frame can be generated with minimum computational load. Note that the first-stage motion estimation is conducted with a conventional algorithm based on an adaptive search range for DVC.
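The second-stage compensation can be illustrated with a short sketch. Because each motion vector is anchored at a block of the target frame, every target pixel is written exactly once, which is why no hole filling or overlap blending is needed. The function name `compensate_psi`, the 4 × 4 block size, the border clamping, and the rounded averaging of the two predictions are assumptions of this sketch, not details given in the text.

```python
def compensate_psi(prev_key, next_key, mvs, block=4):
    """Build a PSI frame block by block from two key frames.

    mvs[by][bx] = ((dy_prev, dx_prev), (dy_next, dx_next)) holds the
    displacements from a target block to the previous and next key
    frames. Averaging both predictions is an assumption of this sketch.
    """
    height, width = len(prev_key), len(prev_key[0])
    psi = [[0] * width for _ in range(height)]

    def sample(frame, y, x):
        # Clamp coordinates so vectors pointing outside the frame stay valid.
        y = min(max(y, 0), height - 1)
        x = min(max(x, 0), width - 1)
        return frame[y][x]

    for y in range(height):
        for x in range(width):
            (dyp, dxp), (dyn, dxn) = mvs[y // block][x // block]
            a = sample(prev_key, y + dyp, x + dxp)
            b = sample(next_key, y + dyn, x + dxn)
            psi[y][x] = (a + b + 1) // 2  # rounded average of both references
    return psi


prev_key = [[10] * 4 for _ in range(4)]
next_key = [[20] * 4 for _ in range(4)]
zero_mvs = [[((0, 0), (0, 0))]]
print(compensate_psi(prev_key, next_key, zero_mvs)[0])
```

The encoder and decoder would run the same routine on the same reconstructed key frames and the same MVs, so both sides obtain identical output.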
3.3 RD competition
where λ is a scaling factor. Conventional encoders, such as H.264/AVC, conduct pre-encoding and calculate rates and distortions for multiple modes. Then they select the best coding mode jointly having minimum rate and distortion. Therefore, they require high computational complexity, although they can select the best coding mode. However, since the purpose of the proposed DVC is to encode videos with low computational complexity, conventional methods to compute RD costs are not suitable for the best mode selection in the DVC encoder.
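The expression that λ belongs to is not reproduced in this text. Encoders such as H.264/AVC commonly use a Lagrangian cost of the following form; it is given here as an assumed reading, and the symbols J, D, R, and the mode set M are illustrative rather than taken from the source.

```latex
J_{m} = D_{m} + \lambda \, R_{m}, \qquad
m^{*} = \arg\min_{m \in \mathcal{M}} J_{m}
```

Here D_m and R_m stand for the distortion and rate of candidate mode m, and m* is the mode with the smallest cost.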
The key feature of DVC codecs is to encode videos with low computational complexity. In the conventional codecs, RD competition generally increases RD performance with high computational complexity. However, conventional DVC encoders do not employ RD competition, due to the computational complexity and non-availability of the reconstructed frames. In the proposed DVC, we employ RD competition for high RD performance with minimum computational complexity. In addition, to reduce encoding computational time, the proposed method determines the candidate modes to conduct RD competition among key, WZ, and skip modes, depending on the coding mode of a previous coded frame. As input frames are hierarchically coded in the proposed method, we can predict which modes are suitable for the consecutive frames, based on the coding mode of the previous frame. In the proposed algorithm, approximate RD costs are computed. To perform RD competition, the proposed method estimates RD costs of the selected candidates depending on the conditions, as shown in the encoder flowchart. However, we need to note that the quality of a frame coded by WZ (skip) mode could be reasonably good, even when the rate is 0. The distortion of the frame is the same as that of the associated SI frame, because SI frames are generated from reference key frames without any explicit data. Therefore, WZ or skip mode is likely to be selected with low rate and high distortion by RD competition. However, a video quality that is too low is not suitable for commercial video applications. For quality control, the best mode is decided not only by RD competition but also by a quality threshold.
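The combination of RD competition with a quality threshold can be sketched as follows. The text does not state how the threshold is applied, so this sketch assumes that modes below a minimum estimated PSNR are excluded before the costs are compared and that key mode is the fallback. The function name `select_mode`, the cost form D + λR, and all numbers are illustrative.

```python
def select_mode(estimates, lam, min_psnr):
    """Pick the coding mode with the lowest approximate RD cost.

    estimates maps a mode name to (rate_bits, distortion_mse, psnr_db).
    Assumption: modes whose estimated PSNR falls below min_psnr are
    rejected first; if none qualifies, key mode is used as a fallback.
    """
    allowed = {m: e for m, e in estimates.items() if e[2] >= min_psnr}
    if not allowed:
        return "key"
    return min(allowed, key=lambda m: allowed[m][1] + lam * allowed[m][0])


# Illustrative estimates only; they are not measurements from the paper.
estimates = {
    "key": (40000, 20.0, 35.1),
    "wz": (12000, 30.0, 33.4),
    "skip": (0, 50.0, 31.1),
}
print(select_mode(estimates, lam=0.002, min_psnr=0.0))   # skip
print(select_mode(estimates, lam=0.002, min_psnr=32.0))  # wz
```

Without the threshold, skip mode wins on cost despite its lower quality; with the threshold, it is excluded and WZ mode is selected, which is the behavior the quality control is meant to enforce.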
When an accurate objective visual quality and its bitrate are known, we can perform accurate RD competition. For such a competition, actual encoding and decoding would have to be conducted at the encoder side. However, DVC-based encoders follow the philosophy of low complexity at the encoder side. In the proposed algorithm, the PSI can be reconstructed with the received motion vectors; thus, it helps to estimate the objective visual quality more accurately. We still cannot reconstruct the decoded frames at the encoder side because of the low-complexity constraint. Nevertheless, the proposed algorithm performs better than the existing algorithms, because its better prediction yields a more accurate approximate RD competition.
3.3.1 Approximate RD cost of key frame mode
where λ_K is a scaling factor between rate and distortion.
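The key-mode estimate is described earlier in the paper as a weighted linear interpolation of the RD costs of neighboring key frames. A minimal sketch of that idea follows, assuming weights proportional to temporal proximity and a per-frame cost of D + λ_K · R. Both choices, as well as the function name `key_rd_cost`, are assumptions, since the original equation is not available in this text.

```python
def key_rd_cost(prev_key, next_key, t, lam_k):
    """Approximate the key-mode RD cost of the frame at time t.

    prev_key and next_key are (time, rate_bits, distortion) tuples of
    the neighbouring key frames. Assumptions: each key frame's cost is
    D + lam_k * R, and the weights are linear in temporal distance.
    """
    t0, r0, d0 = prev_key
    t1, r1, d1 = next_key
    cost0 = d0 + lam_k * r0
    cost1 = d1 + lam_k * r1
    w1 = (t - t0) / (t1 - t0)  # weight of the later key frame
    return (1.0 - w1) * cost0 + w1 * cost1


# Illustrative numbers only.
print(key_rd_cost((0, 100, 10.0), (8, 200, 20.0), t=4, lam_k=0.5))
```

No extra intra coding is needed for this estimate, which is consistent with the low-complexity goal stated for the encoder.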
3.3.2 Approximate RD cost of WZ mode
WZ frames are reconstructed from the SI frame with error correction via a channel decoder. Since channel decoding operation is one of the main sources of computational load, it is not proper to perform the channel decoding on the encoder side for low complexity encoding. Therefore, the proposed method estimates approximate RD costs, by predicting the reconstructed WZ frame with low computation complexity, before actual WZ encoding.
When a frame is selected to be encoded in WZ mode, the proposed method determines the number of parity bits to send from the number of DCT blocks and the number of bits demanded for each block, as shown in Figure 8. Therefore, the proposed method does not need feedback iterations and reduces time delay. If the number of parity bits actually needed for error correction differs from the computed value, the performance of the proposed method could decrease. Once a frame is to be coded in WZ mode, each block is evaluated to determine whether its quality is good enough. Parity bits for blocks that are well predicted by the PSI need not be sent to the decoder side. For the other blocks, the amount of bits given by Figure 8 is sent.
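The per-block decision above can be sketched as follows. The mapping from errors to demanded bits is given by Figure 8 in the paper and is not available here, so the sketch takes it as a caller-supplied function. The name `parity_budget`, the mismatch-count criterion, and the `16 * e` rule in the usage example are placeholders, not values from the source.

```python
def parity_budget(orig_blocks, psi_blocks, bits_for_errors, good_enough=0):
    """Return the parity bits requested for each block of a WZ frame.

    orig_blocks and psi_blocks are lists of blocks, each a flat list of
    quantised coefficients. bits_for_errors maps a mismatch count to a
    bit budget and stands in for the lookup the paper gives in Figure 8.
    """
    budget = []
    for orig, psi in zip(orig_blocks, psi_blocks):
        errors = sum(1 for a, b in zip(orig, psi) if a != b)
        # Well-predicted blocks request no parity bits at all.
        budget.append(0 if errors <= good_enough else bits_for_errors(errors))
    return budget


orig = [[1, 2, 3, 4], [5, 6, 7, 8]]
psi = [[1, 2, 3, 4], [5, 0, 7, 0]]
# The linear rule below is purely illustrative.
print(parity_budget(orig, psi, lambda e: 16 * e))
```

Because the budget is computed once from the PSI, the encoder can send the bits in a single pass, in line with the claim that no feedback iteration is required.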
3.3.3 Approximate RD cost of skip mode
Note that λ_K is empirically computed with six sequences (‘Akko’, ‘Ballroom’, ‘Exit’, ‘Flamenco2’, ‘Race1’, and ‘Rena’). The parameter is set to 1518, 3824, 9636, 24281, and 61185 for QPs of 33, 37, 41, 45, and 49, respectively.
4 Experimental results
For performance evaluation, the RD performance of the proposed and conventional algorithms was compared. Four test sequences (‘Akko’, ‘Ballroom’, ‘Flamenco2’, and ‘Race1’) were used, in 4:0:0 YUV format at a size of 640 × 480. Key frames were coded using JM 17.2, and five QP points (33, 37, 41, 45, and 49) were used. Note that the ‘Akko’, ‘Ballroom’, ‘Flamenco2’, and ‘Race1’ sequences consist of 300, 250, 250, and 250 frames, respectively. The conventional algorithm employs every other frame as the key frame, while the number of key frames is determined depending on the GOP size. The SI frames are reconstructed based on an adaptive search range and an LDPCA channel coder with a matrix length of 6,336.
[Table: Estimated errors of rates and PSNRs with the proposed algorithm in terms of GOP sizes; rows: average rate difference, average PSNR difference]
where S means the time interval for the initial GOP size (=8). α, ε, β, γ, δ, and ξ indicate the encoding, decoding, transmission, PSI generation, RD competition, and SI generation times, respectively. In the experiment for estimating the delay, the encoding, decoding, transmission, PSI generation, RD competition, and SI generation times were 600, 14, 5, 14, 400, and 3,500 ms, respectively. We found in the experiment that the proposed DVC requires 14,233 ms for a GOP structure. Note that the proposed system was implemented on an Intel i5 (2.53 GHz) with 4 GB of memory running Windows 7. The proposed feedback-based DVC incurs this delay, which makes it difficult to apply the proposed algorithm to high frame-rate video applications. However, the proposed algorithm can be considered a trade-off between no-feedback DVC and iterative-feedback DVC algorithms. Note that the RD performance of the proposed algorithm is better than that of the no-feedback algorithms. This evaluation may not reflect practical scenarios and conditions. The network delay can vary depending on traffic. In addition, we employed the JM reference encoding software and an SIG with large computational complexity in the evaluation. Hardwired logic or fast computing platforms can be employed to implement practical applications based on the proposed DVC system. Extensive further research should be performed for practical applications and services in the future.
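The delay figures reported above can be checked with a short calculation. The delay expression itself is not available in this text, so the decomposition below is an assumption: one key-frame stage (encode, transmit, decode) followed by log2(S) hierarchy levels, each consisting of SI generation, MV transmission, PSI generation, RD competition, encoding, transmission, and decoding. It is offered because it reproduces the reported 14,233 ms, not because the source confirms it.

```python
def gop_delay_ms(s, enc, dec, tx, psi, rd, si):
    """Estimate the per-GOP delay under the assumed decomposition.

    s is the initial GOP size (assumed to be a power of two); the other
    arguments are the per-step times in milliseconds.
    """
    levels = s.bit_length() - 1            # log2(s) for a power of two
    key_stage = enc + tx + dec             # key frame: encode, send, decode
    per_level = si + tx + psi + rd + enc + tx + dec
    return key_stage + levels * per_level


print(gop_delay_ms(8, enc=600, dec=14, tx=5, psi=14, rd=400, si=3500))  # 14233
```

Under this reading, SI generation (3 × 3,500 ms) accounts for most of the delay.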
In this paper, a new adaptive distributed video coder with a hierarchical GOP structure has been proposed. In the proposed algorithm, the PSI can be reconstructed in the encoder using reference key frames and MVs, without motion estimation. Therefore, we can estimate the exact accuracy of the SI frames with the PSI frames. With the PSI frames, the proposed method performs RD competition and selects the best coding mode. Based on the decided coding mode, the best GOP structure is decided automatically. As the proposed method reduces the number of key frames when a video has little and/or linear motion, its performance improves. In addition, the proposed method reduces the number of WZ frames when large motion occurs between consecutive frames, because an SI frame of low accuracy requires many parity bits for error correction. Therefore, the proposed method achieves higher performance than several existing methods. However, the proposed method requires high computational complexity, depending on the initial GOP size. In future work, we will optimize the encoding, decoding, and SI generation modules to reduce the delay.
This research was partly supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2014R1A2A1A11052210) and the MSIP (Ministry of Science, ICT & Future Planning), Republic of Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2014-H0301-14-1018).
- D Slepian, J Wolf, Noiseless coding of correlated information sources. IEEE Trans Inf Theory 19(4), 471–480 (1973)
- A Wyner, J Ziv, The rate-distortion function for source coding with side information at the decoder. IEEE Trans Inf Theory 22(1), 1–10 (1976)
- J Micallef, JR Farrugia, C Debono, Low-density parity-check codes for asymmetric distributed source coding. Paper presented at the 2010 1st IEEE International Conference on Information Theory and Information Security (IEEE, Beijing, China, 2010)
- Q Linbo, H Xiaohai, L Rui, D Xiewei, Application of punctured turbo codes in distributed video coding. Paper presented at the 2007 4th IEEE International Conference on Image and Graphics (IEEE, Sichuan, China, 2007)
- A Aaron, R Zhang, B Girod, Wyner-Ziv coding of motion video. Paper presented at the 2002 37th Asilomar Conference on Signals and Systems (IEEE, Pacific Grove, CA, 2002)
- D Varodayan, A Aaron, B Girod, Rate-adaptive codes for distributed source coding. EURASIP Signal Process J Spec Sect Distributed Source Coding 86(11), 3123–3130 (2006)
- C Brites, F Pereira, Encoder rate control for transform domain Wyner-Ziv video coding. Paper presented at the 2007 14th IEEE International Conference on Image Processing (IEEE, San Antonio, TX, 2007)
- F Zhai, IJ Fair, Techniques for early stopping and error detection in turbo decoding. IEEE Trans Commun 51(10), 1617–1623 (2003)
- WJ Chien, LJ Karam, GP Abousleman, Rate-distortion based selective decoding for pixel-domain distributed video coding. Paper presented at the 2008 15th IEEE International Conference on Image Processing (IEEE, San Diego, CA, 2008)
- J Skorupa, J Slowack, S Mys, P Lambert, R Van de Walle, C Grecos, Stopping criterions for turbo coding in a Wyner-Ziv video codec. Paper presented at the 2009 27th IEEE Picture Coding Symposium (IEEE, Chicago, IL, 2009)
- JL Martinez, C Holder, GE Fernandez, H Kalva, F Quiles, DVC using a half-feedback based approach. Paper presented at the 2008 9th IEEE International Conference on Multimedia and Expo (IEEE, Hannover, Germany, 2008)
- B Du, H Shen, Encoder rate control for pixel-domain distributed video coding without feedback channel. Paper presented at the 2009 3rd IEEE International Conference on Multimedia and Ubiquitous Engineering (IEEE, Qingdao, China, 2009)
- C Yaacoub, J Farah, B Pesquet-Popescu, Content adaptive GOP size control with feedback channel suppression in distributed video coding. Paper presented at the 2009 16th IEEE International Conference on Image Processing (IEEE, Cairo, Egypt, 2009)
- J Ascenso, C Brites, F Pereira, Content adaptive Wyner-Ziv video coding driven by motion activity. Paper presented at the 2006 13th IEEE International Conference on Image Processing (IEEE, Atlanta, GA, 2006)
- M Morbee, J Prades-Nebot, A Pizurica, W Philips, Rate allocation algorithm for pixel-domain distributed video coding without feedback channel. Paper presented at the 2007 32nd IEEE International Conference on Acoustic, Speech, and Signal Processing (IEEE, Honolulu, HI, 2007)
- J Kubasov, K Lajnef, C Guillemot, A hybrid encoder/decoder rate control for a Wyner-Ziv video codec with a feedback channel. Paper presented at the 2007 9th IEEE Workshop on Multimedia Signal Processing (IEEE, Crete, Greece, 2007)
- WJ Chien, LJ Karam, GP Abousleman, Block-adaptive Wyner-Ziv coding for transform-domain distributed video coding. Paper presented at the 2007 32nd IEEE International Conference on Acoustic, Speech, and Signal Processing (IEEE, Honolulu, HI, 2007)
- L Limin, L Zhen, EJ Delp, Backward channel aware Wyner-Ziv video coding. Paper presented at the 2006 13th IEEE International Conference on Image Processing (IEEE, Atlanta, GA, 2006)
- W Jia, W Xiaolin, Y Songyu, S Jun, New results on multiple descriptions in the Wyner-Ziv setting. IEEE Trans Inf Theory 55(4), 1708–1710 (2009)
- R Liu, Z Yue, C Chen, Side information generation based on hierarchical motion estimation in distributed video coding. J Aeronaut 22(2), 167–173 (2009)
- Y Shuiming, M Ouaret, F Dufaux, T Ebrahimi, Improved side information generation with iterative decoding and frame interpolation for distributed video coding. Paper presented at the 2008 15th IEEE International Conference on Image Processing (IEEE, San Diego, CA, 2008)
- H Xin, S Forchhammer, Improved side information generation for distributed video coding. Paper presented at the 2008 10th IEEE Workshop on Multimedia Signal Processing (IEEE, Cairns, Australia, 2008)
- KY Min, SN Park, DG Sim, Side information generation using adaptive search range for distributed video coding. Paper presented at the 2009 11th IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (IEEE, B.C., Canada, 2009)
- KY Min, SN Park, JH Nam, DG Sim, SH Kim, Distributed video coding based on adaptive block quantization using received motion vectors. KICS J 35(2), 172–181 (2010)
- I Ahmad, Z Ahmad, I Abou-Faycal, Delay-efficient GOP size control algorithm in Wyner-Ziv video coding. Paper presented at the 2009 7th IEEE International Symposium on Signal Processing and Information Technology (IEEE, Ajman, UAE, 2009)
- C Yaacoub, J Farah, B Pesquet-Popescu, New adaptive algorithms for GOP size control with return channel suppression in Wyner-Ziv video coding. Int J Digit Multimedia Broadcasting 2009, 319021 (2009)
- G Huchet, W Demin, Distributed video coding without channel codes. Paper presented at the 2010 3rd IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (IEEE, Shanghai, China, 2010)
- JL Martinez, G Fernandez-Escribano, H Kalva, WARJ Weerakkody, Feedback-free DVC architecture using machine learning. Paper presented at the 2008 15th IEEE International Conference on Image Processing (IEEE, San Diego, CA, 2008)
- X Artigas, J Ascenso, M Dalai, S Klomp, D Kubasov, M Ouaret, The discover codec, architecture, techniques and evaluation. Paper presented at the 2007 IEEE Picture Coding Symposium (IEEE, Lisbon, Portugal, 2007)
- M Jang, JW Kang, SH Kim, A design of rate-adaptive LDPC codes for distributed source coding using PEG algorithm. Paper presented at the 2010 IEEE Military Communications Conference, San Jose, CA, 31 October-3 November 2010
- SY Shin, M Jang, JW Kang, SH Kim, New distributed source coding scheme based on LDPC codes with source revealing rate-adaptation. Paper presented at the 2011 12th IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (IEEE, Victoria, Canada, 2011)
- CK Kim, DY Suh, Channel adaptive rate control for loss resiliency of distributed video coding. Paper presented at the 2010 International Conference on Electronics, Information, and Communication (IEEK, Cebu, Philippines, 2010)
- JA Park, DY Suh, GH Park, Distributed video coding with multiple side information sets. IEICE Trans Inf Syst E93-D(3), 654–657 (2010)
- JY Lee, CW Seo, DG Sim, JK Han, Efficient ME/MD schemes for Wyner-Ziv codec to VC-1 transcoder. Paper presented at the 2011 International Technical Conference on Circuits/Systems, Computers and Communications (IEEK, Gyeongju, Korea, 2011)
- SY Shim, JK Han, J Bae, Adaptive reconstruction scheme using neighbour pixels in PDWZ coding. Electron Lett 46(9), 626–628 (2010)
- R Oh, JB Park, BW Jeon, Fast implementation of Wyner-Ziv video codec using GPGPU, in Symposium on IEEE BMSB, 2010, pp. 1–5
- X Van Hoang, BW Jeon, Flexible complexity control solution for transform domain Wyner-Ziv video coding. IEEE Trans Broadcasting 58(2), 209–220 (2012)
- KY Min, DG Sim, Adaptive distributed video coding with motion vectors through a back channel. EURASIP J Image Video Process 22, 1–12 (2013)
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.