Joint redundant motion vector and intra macroblock refreshment for video transmission
 Jimin Xiao^{1, 2},
 Tammam Tillo^{2},
 Chunyu Lin^{3} and
 Yao Zhao^{3}
DOI: 10.1186/1687-5281-2011-12
© Xiao et al; licensee Springer. 2011
Received: 7 January 2011
Accepted: 30 September 2011
Published: 30 September 2011
Abstract
This paper proposes a scheme for error-resilient transmission of videos which jointly uses intra macroblock refreshment and redundant motion vectors. The choice between intra refreshment and a redundant motion vector is determined by the rate-distortion optimization procedure. The end-to-end distortion is used for the rate-distortion optimization, and it can be easily calculated with the recursive optimal per-pixel estimate (ROPE) method. Simulation results show that the proposed method significantly outperforms both the intra refreshment approach and the redundant motion vector approach when the two are deployed separately. Specifically, for the Foreman sequence, the average PSNR of the proposed approach can be 1.12 dB higher than that of the intra refreshment approach and 5 dB higher than that of the redundant motion vector approach.
Keywords
H.264/AVC; error resilience; end-to-end distortion; intra refreshment; redundant motion vector
1. Introduction
The H.264/AVC [1] video coding standard provides higher coding efficiency and stronger network adaptation capability than all previously developed video coding standards. However, like previous video compression standards, it is still based on a hybrid coding method, which uses transform coding with motion-compensated prediction (MCP). As a result, when hybrid-coded video is transmitted in packet loss environments, it suffers from error propagation, which leads to the well-known drifting phenomenon [2, 3].
Because the underlying networks are unreliable, the development of error-resilient video coding techniques is a crucial requirement for video communications over lossy networks. Among all the error-resilient video coding techniques, two categories of robust coding approaches are promising. One category is based on intra macroblock refreshment, and the other is redundant coding.
The intra macroblock refreshment approach is standard compatible, and it is a useful tool to combat network packet losses. It can be employed to weaken the inter-picture dependency due to inter prediction and, eventually, cut off error propagation. The early intra macroblock refreshment algorithms are based on randomly inserting intra macroblocks [4] or periodically inserting contiguous intra macroblocks [5]. However, in both [4] and [5] the intra refresh frequency is determined heuristically, and it is costly to code an entire picture by intra coding, so the tradeoff between coding efficiency and error resiliency needs to be balanced. Zhang et al. first treated this problem as an optimization of the coding mode selection for each macroblock in [6], and proposed the well-known recursive optimal per-pixel estimate (ROPE) approach to determine which macroblocks to intra code. In [6], the expected end-to-end distortion for each pixel is calculated recursively; then, in the mode selection step, the expected end-to-end distortion is used in the rate-distortion optimization process. In [7], another flexible intra macroblock update algorithm was investigated to optimize the expected rate-distortion performance. In this approach, the end-to-end distortion is calculated by emulating the real channel behavior; therefore, the computational complexity is very high. Among the methods for obtaining the expected end-to-end distortion, [6] is pixel-based, whereas the block-based approach in [8] generates and recursively updates a block-level distortion map for each frame. The works in [6–8] are loss-aware end-to-end rate-distortion optimized intra macroblock refreshment algorithms, which are currently the best known way of determining both the correct number and placement of intra macroblocks for error resilience.
Redundant coding is another effective tool for robust video communications over lossy networks. In [9], an optimal algorithm is presented to determine whether a picture needs a redundant version. In [10], redundant slices are optimally allocated based on the slice position in the group of pictures (GOP), and the primary and redundant slices are then interleaved to generate two equally important descriptions of the same data using the multiple description coding (MDC) paradigm. In [11], by contrast, the two descriptions are generated by splitting the video pictures into two threads, and redundant pictures are then periodically inserted into the two threads. In both [9] and [11], redundant coding is optimized at the frame level, namely all the macroblocks in one frame are encoded with the same redundant coding parameters, whereas in [10] redundant information is allocated at the slice level. In all three approaches, redundant bitrate is allocated to both motion vectors and residual information. In [12], a new approach with only redundant motion vectors is proposed; as the redundant bitrate for motion vectors is low, this approach improves the bandwidth utilization with limited primary picture quality degradation. In [13], a significant motion vector protection (SMVP) scheme for error-resilient transmission of videos is proposed. This scheme shows how to determine the significant motion vectors (SMVs) and how much rate should be dedicated to them. The idea behind this scheme is to give more protection to SMVs.
Intra macroblock refreshment can stop the propagation of errors from previous frames, while redundant coding is a way of preventing and minimizing propagated errors in future frames. Motivated by the two approaches, in this paper we propose an innovative approach that jointly uses intra macroblock refreshment and redundant motion vectors. For each macroblock, intra coding or a redundant motion vector is chosen based on the rate-distortion optimization procedure. The loss-aware end-to-end expected distortion is used for this RD optimization, and the end-to-end distortion is calculated with the ROPE [6] method.
The rest of the paper is organized as follows. In Section 2, the ROPE method is presented as a preliminary, as it is the method we adopt to calculate the end-to-end distortion. In Section 3, the proposed joint redundant motion vector and intra macroblock refreshment (JRVIR) approach is introduced. In Section 4, extensive simulation results are given, which validate our approach. Finally, some conclusions are drawn in Section 5.
2. Preliminary and the ROPE approach
where λ_{mode} is the Lagrange multiplier, and D_{MB} and R_{MB} are the encoding distortion and the bitrate in the different encoding modes, respectively. This optimization is tailored for error-free environments, and no packet loss is considered here.
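As a reference point for the symbols defined above, conventional rate-distortion optimized mode selection is usually written as a Lagrangian minimization over the candidate modes. The form below is the standard one and is assumed here from the definitions of $\lambda_{mode}$, $D_{MB}$ and $R_{MB}$; the candidate set $\Gamma$ is a symbol introduced only for this sketch.

```latex
\min_{mode \in \Gamma} J_{MB}(mode), \qquad
J_{MB}(mode) = D_{MB}(mode) + \lambda_{mode}\, R_{MB}(mode)
```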
However, when compressed videos are transmitted over error-prone networks, traditional schemes cannot adaptively insert intra refresh macroblocks to efficiently stop the propagation of channel errors. The ROPE approach uses the end-to-end distortion in the RD optimization, which takes into account the channel packet losses. With the ROPE approach, intra macroblocks are optimally used to stop error propagation, and it is defined as follows.
The overall expected mean-squared-error (MSE) distortion of a pixel is ${d}_{n}^{i}$; it is determined by the first and second moments of the decoder reconstruction. ROPE provides an optimal recursive algorithm to accurately calculate the two moments for each pixel in a frame.
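The dependence on the two moments follows from expanding the squared error. In the sketch below, $f_n^i$ is assumed to denote the original value of pixel $i$ in frame $n$ and $\tilde{f}_n^i$ its random decoder reconstruction, following the usual ROPE formulation [6]. Since $f_n^i$ is known at the encoder, only the first and second moments of $\tilde{f}_n^i$ remain to be estimated.

```latex
d_n^i = E\left\{ \left( f_n^i - \tilde{f}_n^i \right)^2 \right\}
      = \left( f_n^i \right)^2
        - 2 f_n^i \, E\left\{ \tilde{f}_n^i \right\}
        + E\left\{ \left( \tilde{f}_n^i \right)^2 \right\}
```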
For simplicity, let us assume that packet loss events are independent and that the packet loss rate (PLR) p is available at the encoder side. To keep the setting general, there is no limitation on the slice shape and size, so the motion vectors from neighboring macroblocks are not always available in the error concealment stage. Therefore, the decoder may not be able to use motion vectors from neighboring macroblocks for concealment. Accordingly, we assume the decoder copies reconstructed pixels from the previous frame for concealment. The prediction at the encoder only employs the previous reconstructed frame. The recursive formulas of ROPE are as follows.

Pixel in the intra macroblock:
$E\left\{\tilde{f}_{n}^{i}\right\} = (1-p)\,\hat{f}_{n}^{i} + p\,E\left\{\tilde{f}_{n-1}^{i}\right\}$ (4)
$E\left\{\left(\tilde{f}_{n}^{i}\right)^{2}\right\} = (1-p)\left(\hat{f}_{n}^{i}\right)^{2} + p\,E\left\{\left(\tilde{f}_{n-1}^{i}\right)^{2}\right\}$ (5)

Pixel in the inter macroblock:
$E\left\{\tilde{f}_{n}^{i}\right\} = (1-p)\left(\hat{e}_{n}^{i} + E\left\{\tilde{f}_{n-1}^{i+mv}\right\}\right) + p\,E\left\{\tilde{f}_{n-1}^{i}\right\}$ (6)
$E\left\{\left(\tilde{f}_{n}^{i}\right)^{2}\right\} = (1-p)\left(\left(\hat{e}_{n}^{i}\right)^{2} + 2\hat{e}_{n}^{i}\,E\left\{\tilde{f}_{n-1}^{i+mv}\right\} + E\left\{\left(\tilde{f}_{n-1}^{i+mv}\right)^{2}\right\}\right) + p\,E\left\{\left(\tilde{f}_{n-1}^{i}\right)^{2}\right\}$ (7)
where inter coded pixel i is predicted from pixel i+mv in the previous frame. The prediction residual $e_{n}^{i}$ is quantized to $\hat{e}_{n}^{i}$.
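The recursion in Equations 4 to 7 can be sketched per pixel as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the moments are passed as plain floats, and `expected_distortion` assumes the usual expansion of the expected squared error into the two moments.

```python
def intra_moments(p, f_hat, prev_m1, prev_m2):
    """Equations 4 and 5: first and second moments of an intra-coded pixel.

    p       -- packet loss rate
    f_hat   -- encoder reconstruction of the pixel
    prev_m1 -- first moment of pixel i in frame n-1
    prev_m2 -- second moment of pixel i in frame n-1
    """
    m1 = (1 - p) * f_hat + p * prev_m1
    m2 = (1 - p) * f_hat ** 2 + p * prev_m2
    return m1, m2


def inter_moments(p, e_hat, ref_m1, ref_m2, prev_m1, prev_m2):
    """Equations 6 and 7: moments of an inter-coded pixel.

    e_hat          -- quantized prediction residual
    ref_m1, ref_m2 -- moments of pixel i+mv in frame n-1
    prev_m1/m2     -- moments of pixel i in frame n-1 (used for concealment)
    """
    m1 = (1 - p) * (e_hat + ref_m1) + p * prev_m1
    m2 = (1 - p) * (e_hat ** 2 + 2 * e_hat * ref_m1 + ref_m2) + p * prev_m2
    return m1, m2


def expected_distortion(f, m1, m2):
    """Expected squared error of one pixel with original value f (assumed expansion)."""
    return f ** 2 - 2 * f * m1 + m2


# Toy example: the previous frame is known exactly (value 100), p = 0.1.
m1, m2 = intra_moments(0.1, 110.0, 100.0, 100.0 ** 2)
d = expected_distortion(110.0, m1, m2)
```

In the toy example the only error source is a lost packet, which replaces 110 by 100 with probability 0.1, so the expected distortion should be about 0.1 × 100 = 10.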
It is important to note that, for simplicity, we apply ROPE in its simple setting, where the motion estimation is evaluated at pixel-level accuracy and constrained intra prediction is used, so there is no error propagation in the intra prediction. Recent advances in ROPE further expand its capability to accommodate sub-pixel prediction [15] and bursty packet loss [16], but they are not incorporated here so as to avoid diluting the focus. In the ROPE approach, the end-to-end distortion is only used in the mode selection stage. Recently, however, the end-to-end distortion has been applied in the motion estimation and motion prediction stages in [17, 18], which is called loss-aware motion estimation and loss-aware motion prediction. With this extension, the error-resilience capability of ROPE is improved further. Loss-aware motion estimation and loss-aware motion prediction are not used in our approach, because we extend ROPE in a different direction; in fact, the gains could be accumulated if both were applied.
3. The proposed JRVIR approach
In general, intra coding is more expensive in terms of rate than a redundant motion vector. Therefore, for macroblocks with smooth texture and/or macroblocks with slow and translational movements, providing a redundant motion vector leads to better utilization of resources, i.e., bitrate, than intra coding. Whether to encode a macroblock in intra mode, or in inter mode with or without a redundant motion vector, is determined by our JRVIR rate-distortion optimization process.
3.1 The JRVIR rate-distortion optimization
where D_{MB}(o) is the expected end-to-end distortion for mode o, R_{MB}(o) is the rate for this mode and λ_{mode} is the Lagrangian multiplier. Γ_{JRVIR} is the set of encoding options, which includes all encoding modes. For the original ROPE approach, the available encoding modes include Intra mode, SKIP mode and Inter mode, so Γ_{ROPE} = {Intra, SKIP, Inter16 × 16, Inter16 × 8, Inter8 × 16, Inter8 × 8}. In our JRVIR approach, however, there are five new modes: SKIP, Inter16 × 16, Inter16 × 8, Inter8 × 16 and Inter8 × 8, all with redundant motion vector. For simplicity, let us use Skip_dup, Inter_dup16 × 16, Inter_dup16 × 8, Inter_dup8 × 16 and Inter_dup8 × 8 to denote the five new modes, with dup standing for duplicated motion vector. Therefore, for the JRVIR approach, the set of encoding options becomes Γ_{JRVIR} = {Intra, SKIP, Inter16 × 16, Inter16 × 8, Inter8 × 16, Inter8 × 8, Skip_dup, Inter_dup16 × 16, Inter_dup16 × 8, Inter_dup8 × 16, Inter_dup8 × 8}.
3.2 The JRVIR end-to-end distortion
For inter macroblocks with redundant motion vector, the probability of receiving the primary information is 1 - p. The probability of receiving the redundant motion vector while losing the primary information is p(1 - p), and the probability that both the primary information and the redundant motion vector are lost is p^{2}. With these probabilities, we can easily get Equations 9 and 10 for inter macroblocks with redundant motion vector.
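Weighting the three outcomes by these probabilities gives the moments of a pixel coded in one of the redundant motion vector modes. The sketch below is derived only from the probabilities stated above, under two assumptions: when just the redundant motion vector arrives, the decoder copies pixel i+mv from the previous frame without adding a residual, and when both are lost it copies pixel i. For the first moment this reads $E\{\tilde{f}_n^i\} = (1-p)(\hat{e}_n^i + E\{\tilde{f}_{n-1}^{i+mv}\}) + p(1-p)E\{\tilde{f}_{n-1}^{i+mv}\} + p^2 E\{\tilde{f}_{n-1}^{i}\}$. The function name is hypothetical.

```python
def inter_dup_moments(p, e_hat, ref_m1, ref_m2, prev_m1, prev_m2):
    """Moments of an inter pixel whose macroblock carries a redundant motion vector.

    ref_m1, ref_m2   -- moments of pixel i+mv in frame n-1
    prev_m1, prev_m2 -- moments of pixel i in frame n-1
    """
    p_primary = 1 - p        # primary slice received
    p_mv_only = p * (1 - p)  # primary lost, redundant motion vector received
    p_none = p * p           # both lost: fall back to the co-located pixel

    m1 = (p_primary * (e_hat + ref_m1)
          + p_mv_only * ref_m1
          + p_none * prev_m1)
    m2 = (p_primary * (e_hat ** 2 + 2 * e_hat * ref_m1 + ref_m2)
          + p_mv_only * ref_m2
          + p_none * prev_m2)
    return m1, m2
```

Compared with a plain inter mode, the weight on the co-located pixel drops from p to p², which is where the gain comes from when the motion vector points to a better match than the co-located region.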
3.3 The JRVIR rate
In the RD optimization procedure, the rate of the redundant motion vector should be taken into account. Encoding the redundant motion vectors without exploiting the correlation among them can cost a significant number of bits. Motion vectors of neighboring macroblocks are often highly correlated, so each motion vector is predicted from the vectors of nearby, previously coded macroblocks. Therefore, the motion vector encoding procedure of the H.264 standard [1], which includes motion vector prediction, is adopted to encode the redundant motion vectors and reduce the number of bits. However, it is worth noting that in our JRVIR approach we do not provide redundant motion vectors for all the inter macroblocks. For example, in Figure 1 only three macroblocks have a redundant motion vector; when encoding the redundant motion vector of the macroblock in row 3, there are no motion vectors to predict from, because its upper and left macroblocks do not have redundant motion vectors. As a result, the compression performance is compromised. In the future, more sophisticated prediction methods will be investigated; for example, the redundant motion vector could be predicted from the concealment-emulated motion vector.
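The cost of a missing predictor can be illustrated with a rough bit count. The sketch below is a simplification under stated assumptions: the predictor is a component-wise median of whichever neighboring redundant motion vectors exist (a simplified stand-in for the H.264 predictor), and the signed Exp-Golomb code length is used as a proxy for the bits spent, although the simulations in this paper use CABAC. All function names are hypothetical.

```python
from statistics import median


def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb code for integer v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def predict_mv(neighbours):
    """Component-wise median of the available neighboring redundant motion vectors.

    Falls back to (0, 0) when no neighbor carries a redundant motion vector.
    """
    if not neighbours:
        return (0, 0)
    xs, ys = zip(*neighbours)
    return (int(median(xs)), int(median(ys)))


def redundant_mv_bits(mv, neighbours):
    """Proxy bit cost of coding mv differentially against its predictor."""
    px, py = predict_mv(neighbours)
    return se_golomb_bits(mv[0] - px) + se_golomb_bits(mv[1] - py)


# Same vector, with and without usable neighbors.
with_neighbours = redundant_mv_bits((8, -3), [(7, -3), (8, -2), (9, -3)])
without_neighbours = redundant_mv_bits((8, -3), [])
```

With three correlated neighbors the difference is (0, 0) and costs 2 bits under this proxy, whereas without any neighbor the same vector costs 14 bits.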
3.4 Lagrange multiplier selection
where QP is the quantization parameter.
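For reference, H.264/AVC reference encoders commonly tie the mode-decision multiplier to the quantization parameter as shown below. Treating this as the relation intended in this subsection is an assumption.

```latex
\lambda_{mode} = 0.85 \times 2^{(QP - 12)/3}
```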
3.5 The pseudo code of JRVIR algorithm
The whole mode selection process of the proposed JRVIR approach is described in Algorithm 1. It is important to note that, in the proposed JRVIR approach, the end-to-end distortion is used in the rate-distortion optimization process, and five new encoding modes are adopted. Once the optimal encoding mode is selected, the first and second moments of all the pixels in the current macroblock are recorded based on the selected encoding mode, and those values are used recursively in the rate-distortion optimization process of the next frame. At the decoder side, if the primary slice is available, the redundant motion vector is discarded. When the primary slice is lost but the redundant motion vector is available, the motion vector is used to conceal the lost region by copying the region it indicates into the lost macroblock. In general, with the correct motion vector, the concealed pixels will be much more accurate than those generated by Temporal Replacement (TR), which copies the pixels from the same positions in the previous frame.
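The decoder behavior described above can be sketched for a single pixel. This is a minimal illustration assuming a one-dimensional frame, so that the motion vector is a plain index offset; the function name and argument layout are hypothetical.

```python
def decode_pixel(prev_frame, i, mv, e_hat, primary_received, redundant_mv_received):
    """Reconstruct pixel i of the current frame from the previous decoded frame."""
    if primary_received:
        # Normal decoding: motion-compensated prediction plus residual.
        # Any redundant motion vector that also arrived is discarded.
        return prev_frame[i + mv] + e_hat
    if redundant_mv_received:
        # Motion-compensated concealment: copy the pixel the vector points to.
        return prev_frame[i + mv]
    # Temporal Replacement (TR): copy the co-located pixel.
    return prev_frame[i]


prev_frame = [10, 20, 30, 40]
decoded = decode_pixel(prev_frame, 1, 2, 3, True, True)
concealed = decode_pixel(prev_frame, 1, 2, 3, False, True)
replaced = decode_pixel(prev_frame, 1, 2, 3, False, False)
```

In this toy frame the motion-compensated concealment differs from the correct value only by the residual, while TR returns the unrelated co-located pixel.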
4. Simulation results
Our simulation setting builds on the JM14.0 H.264 codec [19], with constrained intra prediction and CABAC entropy coding. The rate control mechanism in the JM codec is used, with one common quantization scale for all the macroblocks of one row. Pixel-level accuracy motion estimation and prediction are used. Each slice contains one row of macroblocks (22 macroblocks for the CIF video sequences) for both primary and redundant frames, and one slice per network packet is adopted; therefore, the terms packet and slice are used interchangeably. The IPPP... GOP structure is used, and it is assumed that the I-frame is transmitted over a secure channel. A random packet-loss generator is used to drop packets according to the required packet loss rate, unless burst packet loss is specified explicitly. The luminance PSNR (Y-PSNR) is averaged over 200 trials to obtain statistically meaningful results. To evaluate the proposed JRVIR approach, we use conventional ROPE [6] and redundant motion vector (RMV) [12] as benchmarks.
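The loss simulation and the averaging over trials can be sketched as follows. This is an illustration under assumptions, not the test harness used for the reported results: losses are drawn independently per packet, and `mse_of_trial` is a hypothetical callback standing in for decoding one loss pattern and measuring the luminance MSE (which must be positive).

```python
import math
import random


def psnr(mse, peak=255.0):
    """PSNR in dB for 8-bit video; mse must be strictly positive."""
    return 10.0 * math.log10(peak * peak / mse)


def drop_packets(num_packets, plr, rng):
    """Return a list of flags, True marking a lost packet (independent losses)."""
    return [rng.random() < plr for _ in range(num_packets)]


def average_psnr(mse_of_trial, num_packets, plr, trials=200, seed=0):
    """Average the PSNR over several random loss patterns."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        lost = drop_packets(num_packets, plr, rng)
        total += psnr(mse_of_trial(lost))
    return total / trials


# Toy decoder model: every lost packet adds a fixed amount of MSE.
toy_avg = average_psnr(lambda lost: 4.0 + 10.0 * sum(lost), num_packets=18, plr=0.1)
```

Fixing the seed makes the loss patterns reproducible, which is convenient when the same patterns should be applied to every method being compared.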
Algorithm 1 The whole algorithm of mode selection in JRVIR
RD_cost ⇐ ∞
best_mode ⇐ none
for each mode o ∈ Γ_{JRVIR} do
  if o ∈ {Intra} then
    calculate D_{MB} using Equations 2-5
    calculate R_{MB} using Equation 11
  else if o ∈ {SKIP, Inter16 × 16, Inter16 × 8, Inter8 × 16, Inter8 × 8} then
    calculate D_{MB} using Equations 2, 3, 6 and 7
    calculate R_{MB} using Equation 11
  else if o ∈ {Skip_dup, Inter_dup16 × 16, Inter_dup16 × 8, Inter_dup8 × 16, Inter_dup8 × 8} then
    calculate D_{MB} using Equations 2, 3, 9 and 10
    calculate R_{MB} using Equation 12
  end if
  calculate J_{MB} using Equations 1 and 13
  if J_{MB} < RD_cost then
    RD_cost ⇐ J_{MB}
    best_mode ⇐ o
    record the value of $E\left\{\tilde{f}_{n}^{i}\right\}$ and $E\left\{\left(\tilde{f}_{n}^{i}\right)^{2}\right\}$
  end if
end for
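The selection loop of Algorithm 1 reduces to a minimum search over Lagrangian costs once the expected distortion and rate of each mode are known. The sketch below assumes those two quantities have already been computed per mode; the function name, the dictionary layout and the numbers in the example are illustrative only.

```python
import math


def select_mode(candidates, lam):
    """Pick the mode with the smallest cost D + lam * R.

    candidates -- dict mapping a mode name to (expected_distortion, rate_in_bits)
    lam        -- Lagrange multiplier
    """
    best_mode, best_cost = None, math.inf
    for mode, (distortion, rate) in candidates.items():
        cost = distortion + lam * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost


# Made-up numbers: intra is robust but expensive, plain inter is cheap but
# fragile, and the redundant motion vector mode sits in between.
candidates = {
    "Intra": (40.0, 900),
    "Inter16x16": (120.0, 150),
    "Inter_dup16x16": (70.0, 180),
}
mode_low_lambda, _ = select_mode(candidates, 0.0)
mode_mid_lambda, _ = select_mode(candidates, 1.0)
mode_high_lambda, _ = select_mode(candidates, 10.0)
```

As the multiplier grows, rate dominates the cost and the choice moves from Intra to the redundant motion vector mode and finally to plain inter coding.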
Video quality of joint redundant motion vector and intra macroblock refreshment (JRVIR), redundant motion vector (RMV) and recursive optimal per-pixel estimate (ROPE) for different bitrate and packet loss rate

| Sequence | Rate (Kbps) | Method | PLR 5% | PLR 10% | PLR 15% | PLR 20% |
|---|---|---|---|---|---|---|
| News | 256 | RMV | 30.89 dB | 29.70 dB | 28.71 dB | 27.70 dB |
| News | 256 | ROPE | 32.27 dB (2.68%) | 31.42 dB (3.56%) | 30.78 dB (4.15%) | 30.20 dB (4.62%) |
| News | 256 | JRVIR | 32.29 dB (1.54, 2.52)% | 31.70 dB (1.82, 3.90)% | 31.18 dB (2.11, 4.41)% | 30.73 dB (2.46, 4.94)% |
| Silent | 384 | RMV | 30.75 dB | 29.00 dB | 28.22 dB | 27.52 dB |
| Silent | 384 | ROPE | 33.56 dB (4.70%) | 32.75 dB (5.96%) | 32.10 dB (6.76%) | 31.60 dB (7.29%) |
| Silent | 384 | JRVIR | 33.76 dB (2.93, 3.49)% | 33.05 dB (3.43, 5.08)% | 32.45 dB (3.95, 5.76)% | 31.96 dB (4.54, 6.04)% |
| Foreman | 512 | RMV | 29.32 dB | 27.42 dB | 25.89 dB | 24.56 dB |
| Foreman | 512 | ROPE | 31.48 dB (6.78%) | 30.29 dB (9.40%) | 29.42 dB (11.32%) | 28.66 dB (13.06%) |
| Foreman | 512 | JRVIR | 32.03 dB (3.84, 18.94)% | 31.16 dB (4.84, 25.40)% | 30.32 dB (5.54, 27.52)% | 29.56 dB (6.90, 27.97)% |
| Highway | 1024 | RMV | 34.73 dB | 32.54 dB | 30.07 dB | 29.48 dB |
| Highway | 1024 | ROPE | 37.71 dB (11.68%) | 36.64 dB (15.09%) | 35.76 dB (16.52%) | 35.01 dB (18.10%) |
| Highway | 1024 | JRVIR | 38.06 dB (7.10, 9.87)% | 37.20 dB (8.80, 12.34)% | 36.50 dB (10.25, 13.85)% | 35.74 dB (11.70, 13.67)% |
| Stefan | 2048 | RMV | 25.02 dB | 21.81 dB | 19.69 dB | 18.22 dB |
| Stefan | 2048 | ROPE | 28.31 dB (15.27%) | 26.63 dB (19.53%) | 25.49 dB (22.13%) | 24.60 dB (23.65%) |
| Stefan | 2048 | JRVIR | 29.54 dB (6.38, 19.85)% | 27.50 dB (9.59, 18.81)% | 26.09 dB (12.86, 16.82)% | 24.99 dB (15.24, 14.71)% |
Time spent encoding 30 frames for various video sequences and bitrates with joint redundant motion vector and intra macroblock refreshment (JRVIR) and with the JM software, assuming a packet loss rate of 10%

| Sequence | Bitrate (Kbps) | JRVIR (s) | JM 14.0 (s) |
|---|---|---|---|
| News | 256 | 41.19 | 40.79 |
| Silent | 384 | 40.51 | 39.35 |
| Foreman | 512 | 42.86 | 40.97 |
| Highway | 1024 | 42.61 | 41.63 |
| Stefan | 2048 | 42.25 | 40.87 |
5. Conclusions
In this paper, a novel joint redundant motion vector and intra macroblock refreshment approach has been proposed to combat packet loss. Besides the traditional skip, inter and intra modes, we add a set of new modes, which are inter coding modes with redundant motion vector. Given the packet loss rate and the channel bitrate, the reconstructed distortion at the decoder side and the total bitrate for each mode are estimated at the encoder during the mode selection process. Based on the estimated end-to-end RD cost, the optimal encoding mode is selected. Equipped with the two tools, namely intra macroblock refreshment and redundant motion vector, the proposed approach outperforms the other error-resilient approaches in extensive experiments. Our future work will investigate more sophisticated prediction methods to compress the redundant motion vector. In addition, this paper uses pixel-accuracy motion estimation and motion prediction; extending the current approach to quarter-pixel accuracy motion estimation and motion prediction would also be promising.
Declarations
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 60972085, No. 60903066), the Sino-Singapore JRP (No. 2010DFA11010) and the National Science Foundation of China for Distinguished Young Scholars (No. 61025013).
References
[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), document JVT-G050.doc, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG 2003.
[2] Wenger S: H.264/AVC over IP. IEEE Trans Circ Syst Video Technol 2003, 13(7):645-656. 10.1109/TCSVT.2003.814966
[3] Stockhammer T, Hannuksela MM, Wiegand T: H.264/AVC in wireless environments. IEEE Trans Circ Syst Video Technol 2003, 13(7):657-673. 10.1109/TCSVT.2003.815167
[4] Côté G, Kossentini F: Optimal intra coding of blocks for robust video communication over the internet. Signal Process Image Commun 1999, 15(1-2):25-34. 10.1016/S0923-5965(99)00022-3
[5] Zhu QF, Kerofsky L: Joint source coding, transport processing and error concealment for H.323-based packet video. In Proc SPIE VCIP'99. Volume 3653. San Jose, CA; 1999:52-62.
[6] Zhang R, Regunathan SL, Rose K: Video coding with optimal inter/intra-mode switching for packet loss resilience. IEEE J Selected Areas Commun 2000, 18(6):966-976. 10.1109/49.848250
[7] Stockhammer T, Kontopodis D, Wiegand T: Rate-distortion optimization for JVT/H.26L coding in packet loss environment. Paper presented at the Packet Video Workshop 2002. Pittsburgh, PA, April 2002.
[8] Zhang Y, Gao W, Lu Y, Huang Q, Zhao D: Joint source-channel rate-distortion optimization for H.264 video coding over error-prone networks. IEEE Trans Multimedia 2007, 9(3):445-454.
[9] Zhu CB, Wang YK, Hannuksela MM, Li HQ: Error resilient video coding using redundant pictures. IEEE Trans Circ Syst Video Technol 2009, 19(1):3-14.
[10] Tillo T, Grangetto M, Olmo M: Redundant slice optimal allocation for H.264 multiple description coding. IEEE Trans Circ Syst Video Technol 2008, 18(1):59-70.
[11] Radulovic I, Frossard P, Wang YK, Hannuksela M, Hallapuro A: Multiple description video coding with H.264/AVC redundant pictures. IEEE Trans Circ Syst Video Technol 2010, 20(1):144-148.
[12] Dissanayake MB, Hewage CTER, Worrall ST, Fernando WAC, Kondoz AM: Redundant motion vectors for improved error resilience in H.264/AVC coded video. Proc IEEE ICME 2008, 25-28.
[13] Chen JR, Lu CS, Fan KC: A significant motion vector protection-based error-resilient scheme in H.264. Proc IEEE 6th Workshop on Multimedia Signal Processing 2004, 287-290.
[14] Sullivan GJ, Wiegand T: Rate-distortion optimization for video compression. IEEE Signal Process Mag 1998, 15(6):74-90. 10.1109/79.733497
[15] Yang H, Rose K: Advances in recursive per-pixel end-to-end distortion estimation for robust video coding in H.264/AVC. IEEE Trans Circ Syst Video Technol 2007, 17(7):845-856.
[16] Heng BA, Apostolopoulos JG, Lim JS: End-to-end rate-distortion optimized mode selection for multiple description video coding. Proc IEEE ICASSP 2005, 5:905-908.
[17] Wan S, Izquierdo E: Rate-distortion optimized motion-compensated prediction for packet loss resilient video coding. IEEE Trans Image Process 2007, 16(5):1327-1338.
[18] Yang H, Rose K: Optimizing motion compensated prediction for error resilient video coding. IEEE Trans Circ Syst Video Technol 2010, 19(1):108-118.
[19] Available online at http://iphome.hhi.de/suehring/tml/download
[20] Schulzrinne H, Casner S, Frederick R, Jacobson V: RTP: A transport protocol for real-time applications. Internet Engineering Task Force, RFC 1889, 1996.
[21] Loguinov D, Radha H: End-to-end internet video traffic dynamics: Statistical study and analysis. Proc of IEEE INFOCOM '02 2002, 723-732.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.