Distributed Video Coding: Trends and Perspectives

Dufaux, Frederic; Gao, Wen; Tubaro, Stefano; Vetro, Anthony

doi:10.1155/2009/508167

Review Article
Open access
Published: 11 April 2010

EURASIP Journal on Image and Video Processing volume 2009, Article number: 508167 (2010)

Abstract

This paper surveys recent trends and perspectives in distributed video coding. More specifically, the status and potential benefits of distributed video coding in terms of coding efficiency, complexity, error resilience, and scalability are reviewed. Multiview video and applications beyond coding are also considered. In addition, recent contributions in these areas, more thoroughly explored in the papers of the present special issue, are also described.

1. Introduction

Tremendous advances in computer and communication technologies have led to a proliferation of digital media content and the successful deployment of new products and services. However, digital video is still demanding in terms of processing power and bandwidth. Therefore, this digital revolution has only been possible thanks to the rapid and remarkable progress in video coding technologies. Additionally, standardization efforts in MPEG and ITU-T have played a key role in ensuring the interoperability and durability of video systems as well as in achieving economies of scale.

For the last two decades, most developments have been based on the two principles of predictive and transform coding. The resulting motion-compensated block-based Discrete Cosine Transform (DCT) hybrid design has been adopted by all MPEG and ITU-T video coding standards to this day. This pathway has culminated with the state-of-the-art H.264/Advanced Video Coding (AVC) standard [1]. H.264/AVC relies on an extensive analysis at the encoder in order to better represent the video signal and thus to achieve a more efficient coding. Among many innovations, it features a transform which allows a better representation of the video signals thanks to localized adaptation. It also supports spatial intraprediction on top of inter prediction. Enhanced inter prediction features include the use of multiple reference frames, variable block-size motion compensation, and quarter-pixel precision.

The above design, which implies complex encoders and lightweight decoders, is well suited for broadcasting-like applications, where a single sender transmits data to many receivers. In contrast to this downstream model, a growing number of emerging applications, such as low-power sensor networks, wireless video surveillance cameras, and mobile communication devices, rely instead on an upstream model. In this case, many clients, often mobile, low-power, and with limited computing resources, transmit data to a central server. In the context of this upstream model, it is usually advantageous to have lightweight encoding with high compression efficiency and resilience to transmission errors. Thanks to the improved performance and falling cost of cameras, another trend is towards multiview systems where a dense network of cameras captures many correlated views of the same scene.

More recently, a new coding paradigm, referred to as Distributed Source Coding (DSC), has emerged based on two information theory theorems from the seventies: Slepian-Wolf (SW) [2] and Wyner-Ziv (WZ) [3]. Basically, the SW theorem states that for lossless coding of two or more correlated sources, the optimal rate achieved when performing joint encoding and decoding (i.e., conventional predictive coding) can theoretically be reached by doing separate encoding and joint decoding (i.e., distributed coding). The WZ theorem shows that this result still holds for lossy coding under the assumptions that the sources are jointly Gaussian and a Mean Square Error (MSE) distortion measure is used. Distributed Video Coding (DVC) applies this paradigm to video coding. In particular, DVC relies on a new statistical framework, instead of the deterministic approach of conventional coding techniques such as MPEG and ITU-T schemes. By exploiting this result, the first practical DVC schemes have been proposed in [4, 5]. Following these seminal works, DVC has attracted considerable interest in the last few years, as evidenced by the large number of publications on this topic in major conferences and journals. Recent overviews are presented in [6, 7].

DVC offers a number of potential advantages which make it well suited for the aforementioned emerging upstream applications. First, it allows for a flexible partitioning of the complexity between the encoder and decoder. Furthermore, due to its intrinsic joint source-channel coding framework, DVC is robust to channel errors. Because it does not rely on a prediction loop, DVC provides codec independent scalability. Finally, DVC is well suited for multiview coding by exploiting correlation between views without requiring communications between the cameras, which may be an important architectural advantage. However, in this case, an important issue is how to generate the joint statistical model describing the multiple views.

In this paper, we offer a survey of recent trends and perspectives in distributed video coding. More specifically, we address some open issues such as coding efficiency, complexity, error resilience, scalability, multiview coding, and applications beyond coding. In addition, we also introduce recent contributions in these areas provided by the papers of this special issue.

2. Background

The foundations of DVC date back to the seventies. The SW theorem [2] establishes lower bounds on the achievable rates for the lossless coding of two or more correlated sources. More specifically, let us consider two statistically dependent random signals X and Y. In conventional coding, the two signals are jointly encoded and it is well known that the lower bound for the rate is given by the joint entropy H(X, Y). Conversely, with distributed coding, these two signals are independently encoded but jointly decoded. In this case, the SW theorem proves that the minimum rate is still H(X, Y), with a residual error probability which tends towards 0 for long sequences. Figure 1 illustrates the achievable rate region. In other words, SW coding allows the same coding efficiency to be asymptotically attained. However, in practice, finite block lengths have to be used. In this case, SW coding entails a coding efficiency loss compared to lossless source coding, and the loss can be sizeable depending on the block length and the source statistics [8].
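As a rough numerical illustration of the rate region discussed above, the following sketch computes the entropies that bound it for a toy pair of binary sources, here called X and Y, and tests whether a given rate pair is achievable. The corner constraints (rate of X at least H(X|Y), rate of Y at least H(Y|X), sum rate at least H(X, Y)) are the standard statement of the SW region. The 10% crossover probability and all function names are assumptions chosen only for this example.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sw_quantities(joint):
    """Entropies of a joint distribution given as joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    h_xy = entropy([p for row in joint for p in row])
    h_x, h_y = entropy(px), entropy(py)
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(X,Y)": h_xy,
        "H(X|Y)": h_xy - h_y,  # chain rule: H(X,Y) = H(Y) + H(X|Y)
        "H(Y|X)": h_xy - h_x,
    }

def in_sw_region(rate_x, rate_y, q, tol=1e-12):
    """True if (rate_x, rate_y) satisfies the three Slepian-Wolf constraints."""
    return (rate_x >= q["H(X|Y)"] - tol
            and rate_y >= q["H(Y|X)"] - tol
            and rate_x + rate_y >= q["H(X,Y)"] - tol)

# Toy sources (assumed): X is a fair bit, Y equals X flipped with probability 0.1.
p = 0.1
joint = [[0.5 * (1 - p), 0.5 * p],
         [0.5 * p, 0.5 * (1 - p)]]
q = sw_quantities(joint)
print(q)
print(in_sw_region(1.0, 0.5, q))  # Y sent at a rate close to H(Y|X)
print(in_sw_region(0.7, 0.7, q))  # sum rate below H(X,Y)
```

In this toy case, separate encoders may operate at any point of the region, including the corner where X is sent at H(X) and Y at only H(Y|X), which is the operating point most DVC schemes target.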

Subsequently, Wyner and Ziv (WZ) extended the Slepian-Wolf theorem by characterizing the achievable rate-distortion region for lossy coding with Side Information (SI). More specifically, WZ showed that there is no rate loss with respect to joint encoding and decoding of the two sources, under the assumptions that the sources are jointly Gaussian and an MSE distortion measure is used [3]. This result has been shown to remain valid as long as the innovation between X and Y is Gaussian [9].
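For the jointly Gaussian case with MSE distortion, the rate-distortion function with SI at the decoder has the well-known closed form R(D) = max(0, 0.5 log2(var(X|Y) / D)), which is the same as when the SI is also available at the encoder. The sketch below evaluates it under the assumption of a zero-mean Gaussian pair with correlation coefficient rho, so that var(X|Y) = var(X) (1 - rho^2). The function and parameter names are illustrative only.

```python
import math

def wz_rate_gaussian(distortion, var_x, rho):
    """Wyner-Ziv rate (bits/sample) for jointly Gaussian X, Y under MSE.

    Assumes X and Y are jointly Gaussian with correlation coefficient rho,
    so the conditional variance of X given Y is var_x * (1 - rho**2).
    """
    cond_var = var_x * (1.0 - rho ** 2)
    if distortion >= cond_var:
        return 0.0  # the SI alone already meets the distortion target
    return 0.5 * math.log2(cond_var / distortion)

# Better side information (higher |rho|) lowers the rate for the same distortion.
for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, round(wz_rate_gaussian(0.01, 1.0, rho), 3))
```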

2.1. PRISM Architecture

PRISM (Power-efficient, Robust, hIgh compression Syndrome-based Multimedia coding) is one of the early practical implementations of DVC [4, 10]. This architecture is shown in Figure 2. For a more detailed description of PRISM, the reader is referred to [10]. More specifically, each frame is split into blocks which are DCT transformed. Concurrently, a zero-motion block difference is used to estimate their temporal correlation level. This information is used to classify blocks into 16 encoding classes. One class corresponds to blocks with very low correlation which are encoded using conventional Intracoding. Another class is made of blocks which have very high correlation and are merely signaled as skipped. Finally, the remaining blocks are encoded based on distributed coding principles. More precisely, syndrome bits are computed from the least significant bits of the transform coefficients, where the number of least significant bits depends on the estimated correlation level. The lower part of the least significant bit planes is entropy coded with a (run, depth, path, last) 4-tuple alphabet. The upper part of the least significant bit planes is coded using a coset channel code. For this purpose, a BCH code is used, as it performs well even with small block-lengths. Conversely, the most significant bits are assumed to be inferred from the block predictor or SI. In parallel, a 16-bit Cyclic Redundancy Check (CRC) is also computed. At the decoder, the syndrome bits are then used to correct predictors, which are generated using different motion vectors. The CRC is used to confirm whether the decoding is successful.
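The coset idea behind this decoding process can be sketched in a few lines. The example below is a deliberate simplification and not the PRISM codec: it sends the k least significant bits of each quantized coefficient directly as the coset index instead of computing BCH syndromes, omits the entropy-coded part, and uses Python's `binascii.crc_hqx` (a 16-bit CRC-CCITT) as a stand-in for the 16-bit CRC. All function names and the test data are hypothetical.

```python
import binascii

def crc16(coeffs):
    """16-bit CRC of a coefficient block (CRC-CCITT, used here as a stand-in)."""
    return binascii.crc_hqx(repr(list(coeffs)).encode("ascii"), 0)

def encode_block(coeffs, k):
    """Keep only the k least significant bits of each coefficient, plus a CRC."""
    mask = (1 << k) - 1
    return [c & mask for c in coeffs], crc16(coeffs)

def decode_coeff(coset, predictor, k):
    """Return the integer congruent to coset (mod 2**k) closest to the predictor."""
    step = 1 << k
    up = predictor + ((coset - predictor) % step)  # nearest coset member >= predictor
    down = up - step                               # nearest coset member < predictor
    return up if up - predictor <= predictor - down else down

def decode_block(cosets, crc, candidate_predictors, k):
    """Try each candidate predictor block; accept the first whose CRC matches."""
    for predictor in candidate_predictors:
        decoded = [decode_coeff(c, p, k) for c, p in zip(cosets, predictor)]
        if crc16(decoded) == crc:
            return decoded
    return None  # no candidate was within the noise margin

coeffs = [12, -7, 30, 3]
cosets, crc = encode_block(coeffs, k=3)
bad_predictor = [20, 1, 22, 10]    # e.g. a wrong motion vector
good_predictor = [13, -5, 28, 2]   # within 2**(k-1) of every coefficient
print(decode_block(cosets, crc, [bad_predictor, good_predictor], k=3))
```

Decoding a coefficient succeeds whenever the predictor is within 2^(k-1) of the true value, which is why the number of transmitted least significant bits has to follow the estimated correlation level.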

2.2. Stanford Architecture

Proposed at the same time as PRISM, another early DVC architecture has been introduced in [5, 11]. A block diagram of this architecture is illustrated in Figure 3, whereas a more detailed description is given in [11]. The video sequence is first divided into Groups of Pictures (GOPs). The first frame of each GOP, also referred to as key frame, is encoded using a conventional intraframe coding technique such as H.264/AVC in intraframe mode [1]. The remaining frames in a GOP are encoded using distributed coding principles and are referred to as WZ frames. In a pixel-domain WZ version, the WZ frames first undergo quantization. Alternatively, in a transform-domain version [12], a DCT transform is applied prior to quantization. The quantized values are then split into bitplanes which go through a Turbo encoder. At the decoder, SI approximating the WZ frames is generated by motion-compensated interpolation or extrapolation of previously decoded frames. The SI is used in the turbo decoder, along with the parity bits of the WZ frames requested via a feedback channel, in order to reconstruct the bitplanes, and subsequently the decoded video sequence. In [13], rate-compatible Low-Density Parity-Check Accumulate (LDPCA) codes, which better approach the capacity of the communication channel, replace the Turbo codes.
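The quantization and bitplane-splitting step of the pixel-domain variant can be sketched as follows. This is a minimal illustration, assuming 8-bit samples and a uniform quantizer with 2^m levels; the channel encoder that would process each bitplane is not shown, and the function names are hypothetical.

```python
def quantize(pixels, m):
    """Uniform quantization of 8-bit samples to 2**m levels (keeps the m MSBs)."""
    shift = 8 - m
    return [p >> shift for p in pixels]

def to_bitplanes(indices, m):
    """Split quantization indices into m bitplanes, most significant first."""
    return [[(q >> b) & 1 for q in indices] for b in range(m - 1, -1, -1)]

def from_bitplanes(planes):
    """Reassemble quantization indices from bitplanes (most significant first)."""
    m = len(planes)
    return [sum(bit << (m - 1 - i) for i, bit in enumerate(bits))
            for bits in zip(*planes)]

pixels = [0, 37, 128, 255]
indices = quantize(pixels, m=2)
planes = to_bitplanes(indices, m=2)
print(indices)  # quantization indices, one per sample
print(planes)   # each inner list would be fed to the channel encoder separately
```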

2.3. Comparison

The two above architectures differ in a number of fundamental ways, as we will discuss hereafter. A more comprehensive analysis is also given in [14].

The block-based nature of PRISM allows for a better local adaptation of the coding mode in order to cope with the nonstationary statistics typical of video data. By performing simple interframe prediction for block classification based on correlation at the encoder, the WZ coding mode is only used when appropriate, namely, when the correlation is sufficient. However, this block partitioning implies a short block-length which is a limiting factor for efficient channel coding. For this reason, a BCH code is used in PRISM. In contrast, in the frame-based Stanford approach, a frame is WZ encoded as a whole. Although this forgoes local adaptation, it enables the successful use of more sophisticated channel codes, such as Turbo or LDPC codes.

The way motion estimation is performed constitutes another important fundamental distinction. In the Stanford architecture, motion estimation is performed prior to WZ decoding, using only information directly available at the decoder. Conversely, in PRISM, motion vectors are estimated during the WZ decoding process. In addition, this process is helped by the transmitted CRC check. Hence, it leads to better performance and robustness to transmission errors.

In the Stanford approach, rate control is performed at the decoder side and a feedback channel is needed. Hence, the SW rate can be better matched to the realization of the source and SI. However, the technique is limited to real-time scenarios whose delay constraints are not too stringent. Since rate control in PRISM is carried out at the encoder, PRISM does not have this restriction. However, in this codec, the SW rate has to be determined based on a priori classification at the encoder, which may result in decreased performance.

Note that some of these shortcomings have been addressed in subsequent research works. For instance, the Stanford architecture has been augmented with hash codes transmitted to enhance motion compensation in [15], a block-based Intracoding mode in [16], and an encoder-driven rate control in order to eliminate the feedback channel in [17].

2.4. State-of-the-Art Performance

The codec developed by the European project DISCOVER, presented in [18], is one of the best performing DVC schemes reported in the literature to date. A thorough performance benchmark of this codec is publicly available in [19]. The DISCOVER codec is based on the Stanford architecture [5, 11] and brings several improvements. It uses the same DCT-like transform as in H.264/AVC. Notably, SI is obtained by motion-compensated interpolation with motion vector smoothing, resulting in enhanced performance. Moreover, the issue of online parameter estimation is tackled, including rate estimation, virtual channel model and soft input calculation, and detection of decoder success or failure.

In [19], the coding efficiency of the DISCOVER DVC scheme is compared to two variants of H.264/AVC with low encoding complexity: H.264/AVC Intra (i.e., all the frames are Intra coded) and H.264/AVC No Motion (i.e., interframe coding with zero motion vectors). It can be observed that DVC consistently matches or outperforms H.264/AVC Intra, except for scenes with complex motion (e.g., the test sequence "Soccer"). For scenes with low motion (e.g., the test sequence "Hall Monitor"), the gain can reach up to 3 dB.

More recently, the performance of the DVC codec developed by the European project VISNET II has been thoroughly assessed [20]. This codec is also based on the Stanford architecture [5, 11]. It makes use of some of the same techniques as in the DISCOVER codec and includes a number of enhancements including better SI generation, an iterative reconstruction process, and a deblocking filter. In [20], it is shown that the VISNET II DVC codec consistently outperforms the DISCOVER scheme. For low-motion scenes, gains up to 5 dB are reported over H.264/AVC Intra. On the other hand, when compared to H.264/AVC No Motion, the performance of the VISNET II DVC codec typically remains significantly lower. However, DVC shows strong performance for scenes with simple and regular global motion (e.g., "Coastguard"), where it outperforms H.264/AVC No Motion.

In terms of complexity, [19] shows that the DVC encoding complexity, expressed in terms of software execution time, is significantly lower than for H.264/AVC Intra and H.264/AVC No Motion.

3. Current Topics of Interest

The DVC paradigm differs from conventional coding in several major ways. First, it is based on a statistical framework. As it does not rely on joint encoding, the content analysis can be performed at the decoder side. In particular, DVC does not need the temporal prediction loop characteristic of past MPEG and ITU-T schemes. As a consequence, the computational complexity can be flexibly distributed between the encoder and the decoder, and in particular, it allows encoding with very low complexity. According to information theory, this can be achieved without loss of coding performance compared to conventional coding, in an asymptotic sense and for long sequences. However, coding efficiency remains a challenging issue for DVC despite considerable improvements over the last few years.

Most of the literature on distributed video coding has addressed the problem of light encoding complexity, by shifting the computationally intensive task of motion estimation from the encoder to the decoder. Given its properties, DVC also offers other advantages and functionalities. The absence of the prediction loop prevents drift in the presence of transmission errors. Along with the built-in joint source-channel coding structure, it implies that DVC has improved error resilience. Moreover, given the absence of the prediction loop, DVC also enables codec-independent scalability. Namely, a DVC enhancement layer can be used to augment a base layer which becomes the SI. DVC is also well suited for camera sensor networks, where the correlation across multiple views can be exploited at the decoder, without communications between the cameras. Finally, the DSC principles have been useful beyond coding applications. For instance, DSC can be used for data authentication, tampering localization, and secure biometrics.

In the following sections, we address each of these topics and review some recent results as well as the contributions of the papers in this special issue.

3.1. Coding Efficiency

Being competitive with conventional schemes in terms of coding efficiency has proved very challenging. Therefore, significant efforts have focused on further improving the compression performance of DVC. As reported in Section 2.4, the best DVC codecs now consistently outperform H.264/AVC Intracoding, except for scenes with complex motion. In some cases, for example, video sequences with simple motion structure, DVC can even outperform H.264/AVC No Motion. Nevertheless, the performance generally remains significantly lower than that of a full-fledged H.264/AVC codec.

Very different tools and approaches have been proposed over the years to increase the performance of DVC.

The compression efficiency of DVC depends strongly on the correlation between the SI and the actual WZ frame. The SI is commonly generated by linear interpolation of the motion field between successive previously decoded frames. While the linear motion assumption holds for sequences with simple motion, the coding performance drops for more complex sequences. In [21, 22], spatial smoothing and refinement of the motion vectors is carried out. By removing some discontinuities and outliers in the motion field, it leads to better prediction. In the same way, in [23], two SIs are generated by extrapolation of the previous and next key frames, respectively, using forward and backward motion vectors. Then, the decoding process makes use of both SI concurrently. Subpixel accuracy, similar to the method in H.264/AVC, is proposed in [24] in order to further improve motion estimation for SI generation.
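The linear motion assumption behind this kind of SI generation can be illustrated with a small sketch. For brevity it works on one-dimensional signals rather than frames, and it is not the method of any cited paper: for each block of the frame to interpolate, it searches for a symmetric displacement v such that the previous frame shifted by v matches the next frame shifted by -v, then averages the two matched samples. The block size, search range, border clamping, and all names are assumptions.

```python
def clamp(i, n):
    """Clamp an index to the valid range [0, n - 1] (border replication)."""
    return min(max(i, 0), n - 1)

def interpolate_si(prev, nxt, block=4, search=2):
    """Motion-compensated interpolation of the frame midway between prev and nxt."""
    n = len(prev)
    si = [0.0] * n
    for start in range(0, n, block):
        idx = range(start, min(start + block, n))
        best_cost, best_v = None, 0
        # Try small displacements first so that ties favor the smallest motion.
        for v in sorted(range(-search, search + 1), key=abs):
            cost = sum(abs(prev[clamp(i - v, n)] - nxt[clamp(i + v, n)]) for i in idx)
            if best_cost is None or cost < best_cost:
                best_cost, best_v = cost, v
        for i in idx:
            si[i] = (prev[clamp(i - best_v, n)] + nxt[clamp(i + best_v, n)]) / 2
    return si

# An object of width 3 moves two samples to the right between the key frames.
prev = [0, 0, 0, 9, 9, 9, 0, 0, 0, 0, 0, 0]
nxt = [0, 0, 0, 0, 0, 9, 9, 9, 0, 0, 0, 0]
print(interpolate_si(prev, nxt))                 # motion-compensated SI
print([(a + b) / 2 for a, b in zip(prev, nxt)])  # plain averaging, for comparison
```

With constant motion, the interpolated object lands halfway between its two key-frame positions, whereas plain averaging produces a blurred ghost. When the true motion is not linear, the same search returns a poor match, which is the failure case the refinements discussed above try to address.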

Another approach to improve coding efficiency is to rely on iterative SI generation and decoding. In [25], motion vectors are refined based on bitplane decoding of the reconstructed WZ frame as well as previously decoded key frames. It also allows for different interpolation modes. However, only minor performance improvements are reported. The approach in [26] shares some similarities. A partially decoded WZ frame is first reconstructed. The latter is then exploited for iteratively enhancing motion-compensated temporal interpolation and SI generation. An iterative method by way of multiple SI with motion refinement is introduced in [27]. The turbo decoder selects for each block which SI stream to use, based on the error probability. Finally, exploiting both spatial and temporal correlations in the sequence, a partially decoded WZ frame is exploited to improve the performance of the whole SI generation in [28]. In addition, an enhanced motion compensated temporal frame interpolation is proposed.

A different alternative is for the encoder to transmit auxiliary information about the WZ frames in order to assist the SI generation in the decoder. For instance, CRCs are transmitted in [4, 10], whereas hash codes are used in [15, 29]. At the decoder, multiple predictors are used, and the CRC or hash is exploited to verify successful decoding. In [30], 3D model-based frame interpolation is used for SI. For this purpose, feature points are extracted from the WZ frames at the encoder and transmitted as supplemental information. The decoder makes use of these feature points to correct misalignments in the 3D model. By taking into account geometric constraints, this method leads to an improved SI, especially for static scenes with moving camera.

Another important factor impacting the performance of DVC is the estimation of the correlation model between SI and WZ frames. In some earlier DVC schemes [5], a Laplacian model is computed offline, under the unrealistic assumption that original frames are available at the decoder. In [31], a method is proposed for online estimation at the decoder of the correlation model. Another technique, proposed in [32], consists in computing the parameters of the correlation model at the encoder by approximating the SI.
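A minimal sketch of the offline variant may help make the correlation model concrete. It assumes a zero-mean Laplacian residual between the original samples and the SI, for which the variance equals 2 / alpha^2, and then uses the fitted model to compute the probability that a sample falls in a given quantization bin, which is the kind of soft input a channel decoder needs. The function names and the toy data are hypothetical, and using the original frame at the decoder is exactly the unrealistic assumption mentioned above.

```python
import math

def laplacian_alpha(original, side_info):
    """Fit the Laplacian parameter alpha from the residual original - SI.

    Assumes a zero-mean residual, so variance = E[r^2] = 2 / alpha^2.
    """
    residual = [x - y for x, y in zip(original, side_info)]
    variance = sum(r * r for r in residual) / len(residual)
    if variance == 0:
        raise ValueError("SI equals the original; alpha is unbounded")
    return math.sqrt(2.0 / variance)

def laplacian_cdf(x, y, alpha):
    """CDF at x of a Laplacian distribution centered on the SI value y."""
    d = x - y
    if d < 0:
        return 0.5 * math.exp(alpha * d)
    return 1.0 - 0.5 * math.exp(-alpha * d)

def bin_probability(lo, hi, y, alpha):
    """Probability that the original sample lies in [lo, hi) given SI value y."""
    return laplacian_cdf(hi, y, alpha) - laplacian_cdf(lo, y, alpha)

alpha = laplacian_alpha([10, 12, 8, 10], [9, 13, 9, 9])
print(alpha)
print(bin_probability(48, 64, 50, alpha))  # bin containing the SI value
print(bin_probability(64, 80, 50, alpha))  # neighboring bin, far less likely
```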

For the blocks of the frame where the SI fails to provide a good predictor, in other words for the regions where the correlation between SI and WZ frame is low, it is advantageous to encode them in Intramode. In [16], a block-based coding mode selection is introduced based on the estimation of SI at the encoder side. Namely, blocks with weak correlation estimation are Intracoded. This method shares some similarities with the mode selection previously described for PRISM [4, 10].

The reconstruction module also plays an important role in determining the quality of the decoded video. In the Stanford architecture [5, 11], the reconstructed pixel is simply calculated from the corresponding side information and boundaries of the quantization interval. Another approach is proposed in [33], which takes advantage of the average statistical distribution of transform coefficients. In [34], the reconstructed value is instead computed as the expectation of the source coefficient given the quantization interval and the side information value, showing improved performance. A novel algorithm is introduced in [35], which exploits the statistical noise distribution of the DVC-decoded output.
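The simple reconstruction rule described for the Stanford architecture amounts to clamping the SI value to the decoded quantization interval. The sketch below assumes a uniform quantizer with a given step size and treats both interval boundaries as attainable values; the function name and parameters are illustrative only.

```python
def reconstruct(si_value, q_index, step):
    """Clamp the SI value to the decoded quantization interval.

    The interval for index q_index is taken as [q_index * step, (q_index + 1) * step].
    If the SI lies inside it, the SI is kept; otherwise the nearest boundary is used.
    """
    lo = q_index * step
    hi = lo + step
    return min(max(si_value, lo), hi)

print(reconstruct(50, 3, 16))  # SI inside the interval [48, 64]
print(reconstruct(40, 3, 16))  # SI below the interval
print(reconstruct(70, 3, 16))  # SI above the interval
```

This rule bounds the reconstruction error by the quantization step even when the SI is poor, but it ignores the shape of the correlation noise, which is what the expectation-based approach of [34] exploits.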

Note that closing the performance gap with conventional coding is not simply a question of finding new and improved DVC techniques. Indeed, as stated in Section 2, some theoretical hurdles exist. First, the Slepian-Wolf theorem states that SW coding can achieve the same coding performance only asymptotically. In practice, using finite block lengths results in a performance loss which can be sizeable [8]. Second, the Wyner-Ziv theorem holds for Gaussian sources, whereas video data statistics are known to be non-Gaussian.

The performance of decoder side motion interpolation is also theoretically analyzed in [36, 37]. In [36], it is shown that the accuracy of the interpolation depends strongly on the temporal coherence of the motion field as well as the distance between successive key frames. A model, based on a state-space model and Kalman filtering, demonstrates that DVC with motion interpolation at the decoder cannot reach the performance of conventional predictive coding. A method to optimize the GOP size is also proposed. In [37], a model is proposed to study the performance of DVC. It is theoretically shown that conventional motion-compensated predictive interframe coding outperforms DVC by 6 dB or more. Subpixel and multireference motion search methods are also examined.

In this special issue, three contributions address different means to improve coding efficiency. In [38], Wu et al. address the shortcoming of the common motion-compensated temporal interpolation which assumes that the motion remains translational and constant between key frames. In this paper, a spatial-aided Wyner-Ziv video coding is proposed. More specifically, auxiliary information is encoded with DPCM at the encoder and transmitted along with WZ bitstream. At the decoder, SI is generated by spatial-aided motion-compensated extrapolation exploiting this auxiliary information. It is shown that the proposed scheme achieves better rate distortion performance than conventional motion-compensated extrapolation-based WZ coding without auxiliary information. It is also demonstrated that the scheme efficiently improves WZ coding performance for low-delay applications.

Sofke et al. [39] consider the problem that current WZ coding schemes do not allow controlling the target quality in an efficient way. Indeed, this may represent a major limitation for some applications. An efficient quality control algorithm is introduced in order to maintain uniform quality through time. It is achieved by dynamically adapting the quantization parameters depending on the desired target quality without any a priori knowledge about the sequence characteristics.

Finally, the contribution [40] by Ye et al. proposes a new SI generation and iterative reconstruction scheme. An initial SI is first estimated using common motion-compensated interpolation, and a partially decoded WZ frame is obtained. Next, the latter is used to generate an improved SI, featuring motion vector refinement and smoothing, a new matching criterion, and several compensation modes. Finally, the reconstruction step is carried out again to get the decoded WZ frame. The same idea is also applied to a new hybrid spatial and temporal error concealment scheme for WZ frames. It is shown that the proposed scheme outperforms a state-of-the-art DVC codec.

3.2. Complexity

Among the claimed benefits of DVC, low-complexity encoding is the most widely cited advantage. Relative to conventional coding schemes that employ motion estimation at the encoder, DVC provides a framework that eliminates this high computational burden altogether as well as the corresponding memory to store reference frames. Encoding complexity was evaluated in [19, 41]. Not surprisingly, DVC encoding (DISCOVER codec based on the Stanford architecture) was shown to provide a substantial speed-up in terms of software execution time when compared to conventional H.264/AVC Intra and H.264/AVC No Motion.

Not only does the DVC decoder need to generate side information, which is often done using computationally intensive motion estimation techniques, but it also incurs the complexity of a typical channel decoding process. When the quality of the side information is very good, the time for channel decoding could be lower. But in general, several iterations are required to converge to a solution. In [19, 41], it is shown that the DVC decoder is several orders of magnitude more complex in terms of software execution time than a conventional H.264/AVC Intraframe decoder and about 10–20 times more complex than an H.264/AVC Intraframe encoder.

Clearly, this issue has to be addressed for DVC to be used in any practical setting. In [42], a hybrid encoder-decoder rate control is proposed with the goal to reduce decoding complexity while having a negligible impact on encoding complexity and coding performance. Decoding execution time reductions of up to 70% are reported.

While the signal processing community has devoted little research effort to reducing the decoder complexity of DVC, there is substantial work on fast and parallel implementations of various channel decoding algorithms, including turbo decoding and belief propagation (BP). For instance, it has been shown that parallelization of the message-passing algorithm used in belief propagation can result in a speed-up factor of approximately 13.5 on a multicore processor relative to single-processor implementations [43]. There also exist decoding methods that use information from earlier-decoded nodes to update the later-decoded nodes in the same iteration, for example, Shuffled BP [44, 45]. It should also be possible to reduce the complexity of the decoding process by simplifying the operations at the variable nodes, for example, replacing complex trigonometric functions by simple majority voting. These and other innovations should help to alleviate some of the complexity issues for DVC decoding. Certainly, more research is needed to achieve desirable performance. Optimized decoder implementations on multicore processors and FPGAs should specifically be considered.

3.3. Robust Transmission

Distributed video coding principles have been extensively applied in the field of robust video transmission over unreliable channels. One of the earliest examples is given by the PRISM coding framework [4, 10, 46], which simultaneously achieves light encoding complexity and robustness to channel losses. In PRISM, each block is encoded without the deterministic knowledge of its motion-compensated predictor, which is made available at the decoder side only. If the predictor obtained at the decoder is within the noise margin for the number of encoded cosets, the block is successfully decoded. The underlying idea is that, by adjusting the number of cosets based on the expected correlation channel, decoding is successfully achieved even if the motion compensated predictor is noisy, for example, due to packet losses affecting the reference frame.

These results were extended to a fully scalable video coding scheme in [47, 48], which is shown to be robust to losses that affect both the enhancement and the base layers. This is due to the fact that the correlation channel that characterizes the dependency between different scalability layers is captured at the encoder in a statistical, rather than deterministic, way.

Apart from PRISM, most of the distributed video coding schemes that focus on error resilience try to increase the robustness of standard encoded video by adding redundant information encoded according to distributed video coding principles. One of the first works along this direction is presented in [49], where auxiliary data is encoded only for some frames, denoted as "peg" frames, in order to stop drift propagation at the decoder. The idea is to achieve the robustness of intrarefresh frames, without the rate overhead due to intraframe coding.

In [50], a layered WZ video coding framework similar to Fine Granularity Scalability (FGS) coding is proposed, in the sense that it considers the standard coded video as the base layer and generates an embedded bitstream as the enhancement layer. However, the key difference with respect to FGS is that, instead of coding the difference between the original video and the base layer reconstruction, the enhancement layer is "blindly" generated, without knowing the base layer. Although the encoder does not know the exact realization of the reconstructed frame, it can try to characterize the effect of channel errors (i.e., packet losses) in statistical terms, in order to perform optimal bit allocation. This idea has been pursued, for example, in [51] where a PRISM-like auxiliary stream is encoded for Forward Error Protection (FEP), and rate-allocation is performed at the encoder by exploiting the information provided by the Recursive Optimal Per-pixel Estimate (ROPE) algorithm.

Distributed video coding has been applied to error resilient MPEG-2 video broadcasting in [52], where a systematic lossy source channel coding framework is proposed, referred to as Systematic Lossy Error Protection (SLEP). An MPEG-2 video bitstream is transmitted over an error-prone channel without error protection. In addition, a supplementary bitstream is generated using distributed video coding tools, which consists of a coarsely quantized video bitstream obtained using a conventional hybrid video coder, applying Reed–Solomon codes, and transmitting only the parity symbols. In the event of channel errors, the decoder decodes these parity symbols using the error-prone conventionally decoded MPEG-2 video sequence as side information. The SLEP scheme has also been extended to the H.264/AVC video coding standard [53]. Based on the SLEP framework, the scheme proposed in [53] performs Unequal Error Protection (UEP) assigning different amounts of parity bits between motion information and transform coefficients. This approach shares some similarities with the one presented in [54] where a more sophisticated rate allocation algorithm, based on the estimated induced channel distortion, is proposed.

To date, robustness to transmission errors has proved to be one of the most promising directions for bringing DVC to a viable and competitive level in the marketplace.

In this special issue, two papers propose the use of DVC for robust video transmission. The contribution by Tonoli et al. [55] evaluates and compares the error resilience performance of two distributed video coding architectures: the DISCOVER codec [18], which is based on the Stanford architecture [5, 11], and a codec based on the PRISM architecture [4, 10]. In particular, a rate-distortion analysis of the impact of transmission errors has been carried out. Moreover, a performance comparison with H.264/AVC, both without error protection and with a simple FEP, is also reported. It is shown that the codecs' behavior strongly depends on the content. More specifically, PRISM performs better on low-motion sequences, whereas DISCOVER is more efficient otherwise.

In [56], Liang et al. propose three schemes based on Wyner-Ziv coding for unequal error protection. These schemes apply different levels of protection to motion information and transform coefficients in an H.264/AVC stream, and are shown to provide better error resilience in the presence of packet loss than equal error protection.

3.4. Scalability

With the emergence of heterogeneous multimedia networks and the variety of client terminals, scalable coding is becoming an attractive feature. With a scalable representation, the video content is encoded once but can be decoded at different spatial and temporal resolutions or quality levels, depending on the network conditions and the capabilities of the terminal. Due to the absence of a closed-loop in its design, DVC supports codec-independent scalability. Namely, WZ enhancement layers can be built upon conventional or DVC base layers which are used as SI.

In [47], a scalable version of PRISM [4, 10] is presented. Namely, an H.264/AVC base layer is augmented with a PRISM enhancement layer, leading to a spatiotemporal scalable video codec. It is shown that the scalable version of PRISM outperforms the nonscalable one as well as H.263+ Intra. However, the performance remains lower when compared to motion compensated H.263+.

In [57], the problem of scalable predictive video coding is posed as a variant of the WZ side information problem. This approach relaxes the conventional constraint that both the encoder and decoder employ the very same prediction loops, hence enabling a more flexible prediction across layers and preventing the occurrence of prediction drift. It is shown that the proposed scheme outperforms a simple scalable codec based on conventional coding.

A framework for efficient and low-complexity scalable coding based on distributed video coding is introduced in [32]. Using an MPEG-4 base layer, a multilayer WZ prediction is introduced which results in improved temporal prediction compared to MPEG-4 FGS [58]. Significant coding gain is achieved over MPEG-4 FGS for sequences with high temporal correlation.

Finally, [59] proposes DVC-based scalable video coding schemes supporting temporal, spatial, and quality scalability. Temporal scalability is realized by using a hierarchical motion-compensated interpolation and SI generation. Conversely, a combination of spatial down- and upsampling filters along with WZ coding is used for spatial scalability. The codec independence is illustrated by using both H.264/AVC Intra and JPEG 2000 [60] base layers, with the same enhancement WZ layer.
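
The hierarchical structure that underlies this kind of temporal scalability can be sketched as a decoding order. The example below is only an illustration under stated assumptions, not the algorithm of [59]: it assumes a dyadic GOP whose size is a power of two, key frames at both ends of the GOP, and WZ frames whose SI is interpolated from the two nearest frames that are already decoded. The function name is hypothetical.

```python
# Sketch of a dyadic hierarchical decoding order inside one GOP.
# Assumption: key frames at positions 0 and gop_size, gop_size a power of two.

def hierarchical_order(gop_size):
    """Return (frame, left_ref, right_ref) triples in decoding order."""
    order = []
    step = gop_size
    while step > 1:
        half = step // 2
        # Each frame halfway between two decoded frames is interpolated next.
        for left in range(0, gop_size, step):
            order.append((left + half, left, left + step))
        step = half
    return order

for frame, left_ref, right_ref in hierarchical_order(8):
    print(f"WZ frame {frame}: SI interpolated from frames {left_ref} and {right_ref}")
```

With a GOP of 8, frame 4 is decoded first, then frames 2 and 6, then the odd frames. Stopping after any level yields a valid sequence at a reduced frame rate, which is the property exploited for temporal scalability.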

While the variety of scalability offered by DVC is intriguing, a strong case remains to be made where its specificities play a critical role in enabling new applications.

In this special issue, two contributions address the use of DVC for scalable coding. In the first one [61] by Macchiavello et al. the rate-distortion performance of different SI estimators is compared for temporal and spatial scalable WZ coding schemes. In the case of temporal scalability, a new algorithm is proposed to generate SI using a linear motion model. For spatial scalability, a superresolution method is introduced for upsampling. The performance of the scalable WZ codec is assessed using H.264/AVC as reference.

In the second contribution [62], Devaux and De Vleeschouwer propose a highly scalable video coding scheme based on WZ, supporting fine-grained scalability in terms of resolution, quality, and spatial access as well as temporal access to individual frames. JPEG 2000 is used to encode Intra information, whereas blocks changing between frames are refreshed using WZ coding. Because parity bits aim at correcting stochastic errors, the proposed approach is able to handle a loss of synchronization between the encoder and decoder. This property is important for content adaptation due to fluctuating network conditions.

3.5. Multiview

With its ability to exploit intercamera correlation at the decoder side, without communication between cameras, DVC is also well suited for multiview video coding where it could offer a noteworthy architectural advantage. Moreover, multiview coding has been attracting a lot of interest lately, as it is attractive for a number of applications such as stereoscopic video, free viewpoint television, multiview 3D television, or camera networks for surveillance and monitoring.

When compared to monoview, the main difference in multiview DVC is that the SI can be computed not only from previously decoded frames in the same view but also from frames in other views. Another important matter concerns the generation of the joint statistical model describing the multiple views.

Disparity Compensation View Prediction (DCVP) [63] is a straightforward extension of motion compensated temporal interpolation, where the prediction is carried out by motion compensation of the frames in other views using disparity vectors. Multiview Motion Estimation (MVME) [64] estimates motion vectors in the side views and then applies them to the view to be WZ encoded. For this purpose, disparity vectors between views must also be estimated. In [65], a homography model, estimated by global motion estimation, is used instead for interview prediction, showing significant improvement in the SI quality. Another approach is View Synthesis Prediction (VSP) [66]. Pixels from one view are projected to the 3D world coordinates using intrinsic and extrinsic camera parameters and then are used to predict another view. The drawback of this approach is that it requires depth information, and the quality of the prediction depends on the accuracy of the camera calibration as well as the depth estimation. Finally, View Morphing (VM) [67], which is commonly used to create a synthesized image for a virtual camera positioned between two real cameras using principles of projective geometry, can also be applied to estimate SI from side views.
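
The geometric step used by view synthesis prediction can be sketched as follows. This is a minimal illustration, not the implementation of [66]: it assumes a pinhole camera model, a known depth for the pixel, identical intrinsics in both views, and a rigid transform (R, t) that maps view-1 camera coordinates directly to view-2 camera coordinates, which folds the intermediate world coordinates into a single step. All function names and numerical values are hypothetical.

```python
# Sketch of pixel reprojection between two calibrated views.
# Assumptions: pinhole model, known depth, shared intrinsics (fx, fy, cx, cy).

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in view-1 coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def transform(point, rotation, translation):
    """Apply the rigid transform X2 = R * X1 + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def project(point, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates onto the image plane."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical setup: a second camera shifted 0.1 units to the right of the first.
intrinsics = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = (-0.1, 0.0, 0.0)

point_3d = backproject(320.0, 240.0, 2.0, **intrinsics)
u2, v2 = project(transform(point_3d, identity, shift), **intrinsics)
```

For this purely horizontal camera shift, the pixel moves along the image row by a disparity of fx times the baseline divided by the depth, so errors in the depth estimate or in the calibration translate directly into misplaced predictions.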

When the SI can be generated either from the view to be WZ encoded, using motion compensated temporal interpolation, or from side views, using one of the methods previously described, the next issue is how to combine these different predictions. For fusion at the decoder side, the challenge lies in the difficulty of determining the best predictor. In [68], a technique is proposed to fuse intraview temporal and interview homography side information. It exploits the previous and next key frames to choose the best predictor on a pixel basis. It is shown that the proposed approach outperforms monoview DVC for video sequences containing significant motion. Two fusion techniques are introduced in [69]. They rely on a binary mask to estimate the reliability of each prediction. The latter is computed on the side views and projected on the view to be WZ encoded. However, depth information is required for intercamera disparity estimation. The technique in [70] combines a discrete wavelet transform and turbo codes. Fusion is performed between intraview temporal and interview homography side information, based on the amplitude of motion vectors. It is shown that this fusion technique surpasses inter-view temporal side information. Moreover, the resulting multiview DVC scheme significantly outperforms H.263+ Intracoding. The method in [71] follows a similar approach but relies on the H.264/AVC mode decision applied on blocks in the side views. Experimental results confirm that this method achieves notably better performance than H.263+ Intracoding and is close to Intercoding efficiency for sequences with complex motion. Taking a different approach, in [63] a binary mask is computed at the encoder and then transmitted to the decoder in order to help the fusion process. Results show that the approach improves coding efficiency when compared to monoview DVC. Finally, video sensors to encode multiview video are described in [72]. The scheme exploits both interview correlation by disparity compensation from other views as well as temporal correlation by motion compensated lifted wavelet transform. The proposed scheme leads to a bit rate reduction by performing joint decoding when compared to separate decoding. Note that in all the above techniques, the cameras do not need to communicate. In particular, the joint statistical model is still derived at the decoder.
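
Pixel-based fusion at the decoder can be sketched as a per-pixel selection between two SI candidates. The selection rule below is an assumption chosen for illustration and is not the criterion of any of the cited works: the absolute difference between the previous and next key frames is used as a crude motion indicator, the temporal candidate is kept where the scene looks static, and the interview candidate is used elsewhere. The function name and the threshold value are hypothetical.

```python
# Sketch of decoder-side fusion of two side information candidates.
# Assumption: key-frame difference acts as a per-pixel motion indicator.

def fuse_side_information(temporal_si, interview_si, prev_key, next_key, threshold=10):
    """Pick, for each pixel, the temporal or the interview SI candidate."""
    fused = []
    for t_row, v_row, p_row, n_row in zip(temporal_si, interview_si, prev_key, next_key):
        fused.append([
            t if abs(p - n) <= threshold else v
            for t, v, p, n in zip(t_row, v_row, p_row, n_row)
        ])
    return fused

# Frames are plain lists of rows of luminance values.
prev_key = [[100, 100, 100]]
next_key = [[102, 180, 100]]   # the middle pixel changes strongly over time
temporal_si = [[101, 140, 100]]
interview_si = [[99, 175, 104]]

fused = fuse_side_information(temporal_si, interview_si, prev_key, next_key)
```

The difficulty mentioned above shows up directly in such a rule: the decoder has no access to the original WZ frame, so any reliability measure has to be inferred from data that is already decoded.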

Two papers address multiview DVC coding in this special issue. In the first one [73], Taguchi and Naemura present a multiview DVC system which combines decoding and rendering to synthesize a virtual view while avoiding full reconstruction. More specifically, disparity compensation and geometric estimation are performed jointly. The coding efficiency of the system is evaluated, along with the decoding and rendering complexity.

The paper by Ouaret et al. [74] explores and compares different intercamera prediction techniques for SI. The assessment is done in terms of prediction quality, complexity, and coding performance. In addition, a new technique, referred to as Iterative Multiview Side Information, is proposed, using an iterative reconstruction process. Coding efficiency is compared to H.264/AVC, H.264/AVC No Motion and H.264/AVC Intra.

3.6. Applications beyond Coding

The DSC paradigm has been widely applied to realize image and video coding systems that shift a significant part of the computational load from the transmitter to the receiver side or allow a joint decoding of images taken by different cameras without any need of information exchange among the coders. Outside the coding scenario, DSC has also found applications in other domains.

For example, watermarks are normally used for media authentication, but one serious limitation of watermarks is the lack of backward compatibility. More specifically, unless the watermark is added to the original media, it is not possible to authenticate it. In [75], an application of the DSC concepts to media hashing is proposed. This method provides a Slepian-Wolf encoded quantized image projection as authentication data, which can be successfully decoded only by using an authentic image as side information. DSC helps in achieving false acceptance rates close to zero for very small authentication data sizes. This scheme has been extended for tampering localization in [76].

The systems presented in [75, 76] can successfully authenticate JPEG-compressed images but do not work correctly if the transmission channel applies any linear transformation to the image, such as contrast and brightness adjustment, in addition to JPEG compression. Some improvements are presented in [77]. In [78], a more sophisticated system for image tampering detection is presented. It combines DVC and Compressive Sensing concepts to realize a system that is able to detect practically any type of image modification and is also robust to geometrical manipulation (cropping, rotation, change of scale, etc.).

In [79, 80], distributed source coding techniques are used for designing a secure biometric system for fingerprints. This system uses a statistical model of the relationship between the enrollment biometric and the noisy biometric measurement taken during authentication.

In [81], a Wyner-Ziv coding technique is applied for multiple bit rate video streaming, which allows the server to dynamically change the transmitted stream according to available bandwidth. More specifically, in the proposed scheme, a switching stream is coded using Wyner-Ziv coding. At the decoder side, the switch-to frame is reconstructed by taking the switch-from frame as side information.

The application of DSC to other domains beyond coding is still a relatively new topic of research. It is not unexpected that further explorations will lead to significant results and opportunities for successful applications.

In this special issue, the paper by Valenzise et al. [82] deals with the application of DSC to audio tampering detection. More specifically, the proposed scheme requires that the audio content provider produces a small hash signature by computing a limited number of random projections of a perceptual, time-frequency representation of the original audio stream; the audio hash is given by the syndrome bits of an LDPC code applied to the projections. At the user side, the hash is decoded using distributed source coding tools, provided that the distortion introduced by tampering is not too high. If the tampering is sparsifiable or compressible in some orthonormal basis or redundant dictionary (e.g., DCT or wavelet), it is possible to identify the time-frequency position of the attack.

4. Perspectives

Based on the above considerations, in this section we offer some thoughts about the most important technical benefits provided by the DVC paradigm and the most promising perspectives and applications.

DVC has brought to the forefront a new coding paradigm, breaking the stronghold of motion-compensated DCT-based hybrid coding such as MPEG and ITU-T standards, and shedding a new light on the field of video coding by opening new research directions.

From a theoretical perspective, the Slepian-Wolf and Wyner-Ziv theorems state that DVC can potentially reach the same performance as conventional coding. However, as discussed in Section 2.4, in practice, this has only been achieved when the additional constraint of low complexity encoding is taken into account. In this case, state-of-the-art DVC schemes nowadays consistently outperform H.264/AVC Intracoding, while encoding is significantly simpler. Additionally, for sequences with simple motion, DVC matches and even in some cases surpasses H.264/AVC No Motion coding. However, the complexity advantage provided by DVC may be very transient: following Moore's law, computing power increases exponentially, so an implementation that is not manageable today may become cost-effective within a couple of years. As a counterargument, the time needed to reach a solution with a competitive cost relative to alternatives could be more than a couple of years, and it typically depends on the volumes sold and the level of customization. Simply stated, we cannot always expect a state-of-the-art coding solution with a certain cost to be the best available option for all systems, especially those with high-resolution video specifications and nontypical configurations. It is also worth noting that there are applications that cannot tolerate high complexity coding solutions and are typically limited to intraframe coding due to platform and power consumption constraints; space and airborne systems are among the class of applications that fall into this category. For these reasons, it is possible that DVC can occupy certain niche applications provided that coding efficiency and complexity are at competitive and satisfactory levels.
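
The theoretical statement that opens this discussion can be made concrete with a small numerical example. The sketch below computes the Slepian-Wolf limit H(X|Y) for a toy source; the binary model (a uniform bit X observed at the decoder as Y, with X flipped with probability 0.1) and the function names are assumptions chosen for illustration and are not taken from the works surveyed here.

```python
# Numerical sketch of the Slepian-Wolf limit for a toy binary source.
# Assumption: X is a uniform bit, Y equals X flipped with probability 0.1.
from math import log2

def entropy(probabilities):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y) for a joint pmf given as {(x, y): probability}."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return entropy(joint.values()) - entropy(p_y.values())

crossover = 0.1
joint = {
    (0, 0): 0.5 * (1 - crossover), (0, 1): 0.5 * crossover,
    (1, 0): 0.5 * crossover, (1, 1): 0.5 * (1 - crossover),
}

rate_without_si = entropy([0.5, 0.5])       # H(X): Y is ignored
rate_with_si = conditional_entropy(joint)   # H(X|Y): Y available at the decoder only
print(f"H(X) = {rate_without_si:.3f} bit, H(X|Y) = {rate_with_si:.3f} bit")
```

In this toy model, X alone costs 1 bit per sample, whereas roughly 0.47 bit suffices when Y is available at the decoder, even though the encoder never sees Y. Practical codecs fall short of this limit, which is the gap discussed in the paragraph above.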

Another domain where DVC has been shown to be appealing is video transmission over error-prone network channels. This follows from the statistical framework on which DVC relies, and especially the absence of a prediction loop in the codec. Moreover, as the field of DVC is still relatively young and the subject of intensive research, it is not unreasonable to expect further significant performance improvements in the near future.

The codec-independent scalability property of DVC is interesting and may bring an additional helpful feature in some applications. However, it is unlikely to be a differentiator by itself. Indeed, scalability is most often a secondary goal, surpassed by more critically important features such as coding efficiency or complexity. Moreover, the codec-independent flavor brought by DVC has not found its killer application yet.

Multiview coding is another domain where DVC shows promise. On top of the above benefits for monoview, DVC allows for an architecture where cameras do not need to communicate, while still enabling the exploitation of interview correlation during joint decoding. This may prove a significant advantage from a system implementation standpoint, avoiding complex and power-consuming networking. However, multiview DVC systems reported to date still reveal a significant rate-distortion performance gap when compared to independent H.264/AVC coding for each camera. Note that the latter has to be preferred as a point of reference instead of Multiview Video Coding (MVC), as MVC requires communication between the cameras. Moreover, the amount of interview correlation, usually significantly lower than intraview temporal correlation, depends strongly on the geometry of the cameras and the scene.

Taking a very different path, it has been proposed in [83] to combine conventional and distributed coding into a single framework in order to move ahead towards the next rate-distortion performance level. Indeed, the significant coding gains of MPEG and ITU-T schemes over the years have mainly been the result of more complex analysis at the encoder. However, these gains have been harder to achieve lately and performance tends to saturate. The question remains whether more advanced analysis at the decoder, borrowing from distributed coding principles, could be the next avenue for further advances. In particular, this new framework could prove appealing for the up-and-coming standardization efforts on High-performance Video Coding (HVC) in MPEG and Next Generation Video Coding (NGVC) in ITU-T, which aim at a new generation of video compression technology.

Finally, while most of the initial interest in distributed source coding principles has been towards video coding, it is becoming clear that these ideas are also helpful for a variety of other applications beyond coding, including media authentication, secure biometrics, and tampering detection.

Based on the above considerations, DVC is most suited for applications which require low complexity and/or low power consumption at the encoder and video transmission over noisy channels, with content characterized by low-motion activity. Under the combination of these conditions, DVC may be competitive in terms of rate-distortion performance when compared to conventional coding approaches.

Following a detailed analysis, 11 promising application scenarios for DVC have been identified in [84]: wireless video cameras, wireless low-power surveillance, mobile document scanner, video conferencing with mobile devices, mobile video mail, disposable video cameras, visual sensor networks, networked camcorders, distributed video streaming, multiview video entertainment, and wireless capsule endoscopy. This inventory represents a mixture of applications covering a wide range of constraints and offering different opportunities, and challenges, for DVC. Only time will tell which of those applications will pan out and successfully deploy DVC-based solutions in the marketplace.

5. Conclusions

This paper briefly reviewed some of the most timely trends and perspectives for the use of DVC in coding applications and beyond. The following papers in this special issue further explore selected topics of interest addressing open issues in coding efficiency, error resilience, multiview coding, scalability, and applications beyond coding. This survey provides a snapshot of significant research activities in the field of DVC but is by no means exhaustive. It is foreseen that this relatively new topic will remain a dynamic area of research in the coming years, which will bring further significant developments and progress.

References

Wiegand T, Sullivan GJ, Bjøntegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 2003,13(7):560-576.
Slepian D, Wolf JK: Noiseless coding of correlated information sources. IEEE Transactions on Information Theory 1973,19(4):471-480. 10.1109/TIT.1973.1055037
Wyner AD, Ziv J: The rate-distortion function for source coding with side information at the decoder. IEEE Transactions on Information Theory 1976,22(1):1-10. 10.1109/TIT.1976.1055508
Puri R, Ramchandran K: PRISM: a new robust video coding architecture based on distributed compression principles. Proceedings of Allerton Conference on Communication, Control and Computing, October 2002, Allerton, Ill, USA
Aaron A, Zhang RUI, Girod B: Wyner-Ziv coding of motion video. Proceedings of the 36th Asilomar Conference on Signals Systems and Computers, November 2002, Pacific Grove, Calif, USA 240-244.
Guillemot C, Pereira F, Torres L, Ebrahimi T, Leonardi R, Ostermann J: Distributed monoview and multiview video coding. IEEE Signal Processing Magazine 2007,24(5):67-76.
Dragotti PL, Gastpar M: Distributed Source Coding: Theory, Algorithms and Applications. Academic Press, New York, NY, USA; 2009.
He DAK, Lastras-Montano LA, Yang ENH: A lower bound for variable rate slepian-wolf coding. Proceedings of IEEE International Symposium on Information Theory (ISIT '06), July 2006, Seattle, Wash, USA 341-345.
Pradhan SS, Chou JIM, Ramchandran K: Duality between source coding and channel coding and its extension to the side information case. IEEE Transactions on Information Theory 2003,49(5):1181-1203. 10.1109/TIT.2003.810622
Puri R, Majumdar A, Ramchandran K: PRISM: a video coding paradigm with motion estimation at the decoder. IEEE Transactions on Image Processing 2007,16(10):2436-2448.
Girod B, Aaron AM, Rane S, Rebollo-Monedero D: Distributed video coding. Proceedings of the IEEE 2005,93(1):71-83.
Aaron A, Rane S, Setton E, Girod B: Transform-domain Wyner-Ziv codec for video. Visual Communications and Image Processing, January 2004, San Jose, Calif, USA, Proceedings of SPIE 5308: 520-528.
Varodayan D, Aaron A, Girod B: Rate-adaptive distributed source coding using low-density parity-check codes. Proceedings of the 39th Asilomar Conference on Signals, Systems and Computers, November 2005, Pacific Grove, Calif, USA 1203-1207.
Pereira F, Brites C, Ascenso J, Tagliasacchi M: Wyner-Ziv video coding: a review of the early architectures and further developments. Proceedings of IEEE International Conference on Multimedia and Expo (ICME '08), June 2008, Hannover, Germany 625-628.
Aaron A, Rane S, Girod B: Wyner-Ziv video coding with hash-based motion compensation at the receiver. Proceedings of International Conference on Image Processing (ICIP '04), October 2004, Singapore 3097-3100.
Tagliasacchi M, Trapanese A, Tubaro S, Ascenso J, Brites C, Pereira F: Intra mode decision based on spatio-temporal cues in pixel domain Wyner-Ziv video coding. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), May 2006, Toulouse, France 2: 57-60.
Brites C, Pereira F: Encoder rate control for transform domain Wyner-Ziv video coding. Proceedings of the 14th IEEE International Conference on Image Processing (ICIP '07), September 2007, San Antonio, Tex, USA 2: 5-8.
Artigas X, Ascenso J, Dalai M, Klomp S, Kubasov D, Ouaret M: The discover codec: architecture, techniques and evaluation. Proceedings of Picture Coding Symposium (PCS '07), November 2007, Lisboa, Portugal
Discover DVC Final Results http://www.img.lx.it.pt/~discover/home.html
Ascenso J, Pereira F: Integrated software tools for distributed video coding. VISNET II Deliverable D1.2.3, June 2009, http://ltswww.epfl.ch/~dufaux/visnet2/dels/d0072.pdf
Ascenso J, Brites C, Pereira F: Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding. Proceedings of the 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services, June-July 2005, Smolenice, Slovak Republic
Brites C, Ascenso J, Pereira F: Improving transform domain Wyner-Ziv video coding performance. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), May 2006, Toulouse, France 2: 525-528.
Misra K, Karande S, Radha H: Multi-hypothesis based distributed video coding using LDPC codes. Proceedings of Allerton Conference on Commun, Control and Computing, September 2005, Allerton, Ill, USA
Wei L, Zhao Y, Wang A: Improved side-information in distributed video coding. Proceedings of International Conference on Innovative Computing, Information and Control, August-September 2006, Beijing, China
Ascenso J, Brites C, Pereira F: Motion compensated refinement for low complexity pixel based distributed video coding. Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS '05), September 2005, Como, Italy 593-598.
Artigas X, Torres L: Iterative generation of motion-compensated side information for distributed video coding. Proceedings of IEEE International Conference on Image Processing (ICIP '05), September 2005, Genova, Italy 833-836.
Weerakkody WARJ, Fernando WAC, Martínez JL, Cuenca P, Quiles F: An iterative refinement technique for side information generation in DVC. Proceedings of IEEE International Conference on Multimedia and Expo (ICME '07), July 2007, Beijing, China 164-167.
Ye S, Ouaret M, Dufaux F, Ebrahimi T: Improved side information generation with iterative decoding and frame interpolation for distributed video coding. Proceedings of IEEE International Conference on Image Processing (ICIP '08), October 2008, San Diego, Calif, USA
Martinian E, Vetro A, Yedidia JS, Ascenso J, Khisti A, Malioutov D: Hybrid distributed video coding using SCA codes. Proceedings of the 8th IEEE Workshop on Multimedia Signal Processing (MMSP '06), October 2006, Victoria, Canada 258-261.
Maitre M, Guillemot C, Morin L: 3-D model-based frame interpolation for distributed video coding of static scenes. IEEE Transactions on Image Processing 2007,16(5):1246-1257.
Brites C, Pereira F: Correlation noise modeling for efficient pixel and transform domain Wyner-Ziv video coding. IEEE Transactions on Circuits and Systems for Video Technology 2008,18(9):1177-1190.
Wang H, Cheung NM, Ortega A: A framework for adaptive scalable video coding using Wyner-Ziv techniques. EURASIP Journal on Applied Signal Processing 2006, 2006:-18.
Vatis Y, Klomp S, Ostermann J: Enhanced reconstruction of the quantised transform coefficients for Wyner-Ziv coding. Proceedings of IEEE International Conference on Multimedia and Expo (ICME '07), July 2007, Beijing, China 172-175.
Kubasov D, Nayak J, Guillemot C: Optimal reconstruction in Wyner-Ziv video coding with multiple side information. Proceedings of the 9th IEEE International Workshop on Multimedia Signal Processing (MMSP '07), October 2007, Crete, Greece 183-186.
Weerakkody WARJ, Fernando WAC, Kondoz AM: An enhanced reconstruction algorithm for unidirectional distributed video coding. Proceedings of the 12th IEEE International Symposium on Consumer Electronics (ISCE '08), April 2008, Algarve, Portugal
Tagliasacchi M, Frigerio L, Tubaro S: Rate-distortion analysis of motion-compensated interpolation at the decoder in distributed video coding. IEEE Signal Processing Letters 2007,14(9):625-628.
Li Z, Liu L, Delp EJ: Rate distortion analysis of motion side estimation in Wyner-Ziv video coding. IEEE Transactions on Image Processing 2007,16(1):98-113.
Wu B, Ji X, Zhao D, Gao W: Spatial-aided low-delay Wyner-Ziv video coding. EURASIP Journal on Image and Video Processing 2009, 2009:-11.
Sofke S, Pereira F, Müller E: Dynamic quality control for transform domain Wyner-Ziv video coding. EURASIP Journal on Image and Video Processing 2009, 2009:-15.
Ye S, Ouaret M, Dufaux F, Ebrahimi T: Improved side information generation for distributed video coding by exploiting spatial and temporal correlations. EURASIP Journal on Image and Video Processing 2009, 2009:-15.
Pereira F, Ascenso J, Brites C: Studying the GOP size impact on the performance of a feedback channel-based Wyner-Ziv video codec. Proceedings of IEEE Pacific-Rim Symposium on Image and Video Technology, December 2007, Santiago, Chile
Areia J, Ascenso J, Brites C, Pereira F: Low complexity hybrid rate control for lower complexity Wyner-Ziv video decoding. Proceedings of European Conference on Signal Processing (EUSIPCO '08), August 2008, Lausanne, Switzerland
Lai C-H, Hsieh K-Y, Lai S-H, Lee J-K: Parallelization of belief propagation method on embedded multicore processors for stereo vision. Proceedings of the 6th IEEE Workshop on Embedded System for Real-Time Multimedia (ESTIMedia '08), October 2008, Atlanta, Ga, USA 39-44.
Zhang J, Fossorier M: Shuffled belief propagation decoding. Proceedings of the 36th Annual Asilomar Conference on Signals Systems and Computers, November 2002, Pacific Grove, Calif, USA 8-15.
Zhang J, Wang Y, Fossorier M, Yedidia JS: Replica shuffled belief propagation decoding of LDPC codes. In Proceedings of the 39th Conference on Information Sciences and Systems (CISS '05), March 2005, Baltimore, Md, USA. The Johns Hopkins University;
Majumdar A, Chou JIM, Ramchandran K: Robust distributed video compression based on multilevel coset codes. Proceedings of the 37th Asilomar Conference on Signals, Systems and Computers, November 2003, Pacific Grove, Calif, USA 845-849.
Tagliasacchi M, Majumdar A, Ramchandran K: A distributed-source-coding based robust spatio-temporal scalable video codec. Proceedings of Picture Coding Symposium (PCS '04), December 2004, San Francisco, Calif, USA 435-440.
Tagliasacchi M, Majumdar A, Ramchandran K, Tubaro S: Robust wireless video multicast based on a distributed source coding approach. Signal Processing 2006,86(11):3196-3211. 10.1016/j.sigpro.2006.03.024
Sehgal A, Jagmohan A, Ahuja N: Wyner-Ziv coding of video: an error-resilient compression framework. IEEE Transactions on Multimedia 2004,6(2):249-258. 10.1109/TMM.2003.822995
Xu Q, Stanković V, Xiong Z: Layered Wyner-Ziv video coding for transmission over unreliable channels. Signal Processing 2006,86(11):3212-3225. 10.1016/j.sigpro.2006.03.017
Fumagalli M, Tagliasacchi M, Tubaro S: Drift reduction in predictive video transmission using a distributed source coded side-channel. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '06), May 2006, Toulouse, France
Rane S, Aaron A, Girod B: Systematic lossy forward error protection for error-resilient digital video broadcasting—a Wyner-Ziv coding approach. Proceedings of International Conference on Image Processing (ICIP '04), October 2004, Singapore 3101-3104.
Liang L, Salama P, Delp EJ: Adaptive unequal error protection based on Wyner–Ziv coding. Proceedings of Picture Coding Symposium (PCS '07), November 2007, Lisbon, Portugal
Bernardini R, Naccari M, Rinaldo R, Tagliasacchi M, Tubaro S, Zontone P: Rate allocation for robust video streaming based on distributed video coding. Signal Processing: Image Communication 2008,23(5):391-403. 10.1016/j.image.2008.04.004
Tonoli C, Migliorati P, Leonardi R: Error resilience in current distributed video coding architectures. EURASIP Journal on Image and Video Processing 2009, 2009:-18.
Liang L, Salama P, Delp EJ: Unequal error protection techniques based on Wyner-Ziv coding. EURASIP Journal on Image and Video Processing 2009, 2009:-13.
Sehgal A, Jagmohan A, Ahuja N: Scalable video coding using Wyner-Ziv codes. Proceedings of Picture Coding Symposium (PCS '04), December 2004, San Francisco, Calif, USA 441-446.
Ebrahimi T, Pereira F: The MPEG-4 Book. Prentice Hall, Englewood Cliffs, NJ, USA; 2002.
Ouaret M, Dufaux F, Ebrahimi T: Codec-independent scalable distributed video coding. Proceedings of the 14th IEEE International Conference on Image Processing (ICIP '07), October 2007, San Antonio, Tex, USA 3: 9-12.
Skodras A, Christopoulos C, Ebrahimi T: The JPEG 2000 still image compression standard. IEEE Signal Processing Magazine 2001,18(5):36-58. 10.1109/79.952804
Macchiavello B, Brandi F, Peixoto E, de Queiroz RL, Mukherjee D: Side-information generation for temporally and spatially scalable Wyner-Ziv codecs. EURASIP Journal on Image and Video Processing 2009, 2009:-11.
Devaux F-O, De Vleeschouwer C: Parity bit replenishment for JPEG 2000-based video streaming. EURASIP Journal on Image and Video Processing 2009, 2009:-18.
Ouaret M, Dufaux F, Ebrahimi T: Multiview distributed video coding with encoder driven fusion. Proceedings of European Conference on Signal Processing (EUSIPCO '07), September 2007, Poznan, Poland
Artigas X, Tarres F, Torres L: Comparison of different side information generation methods for multiview distributed video coding. Proceedings of International Conference on Signal Processing and Multimedia Applications (SIGMAP '07), July 2007, Barcelona, Spain
Dufaux F, Ouaret M, Ebrahimi T: Recent advances in multi-view distributed video coding. Mobile Multimedia/Image Processing for Military and Security Applications, April 2007, Orlando, Fla, USA, Proceedings of SPIE 6579.
Martinian E, Behrens A, Xin J, Vetro A: View synthesis for multiview video compression. Proceedings of the 25th Picture Coding Symposium (PCS '06), April 2006, Beijing, China.
Seitz SM, Dyer CR: View morphing. Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '96), August 1996, New Orleans, La, USA 21-42.
Ouaret M, Dufaux F, Ebrahimi T: Fusion-based multiview distributed video coding. Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks (VSSN '06), October 2006, Santa Barbara, Calif, USA 139-144.
Artigas X, Angeli E, Torres L: Side information generation for multiview distributed video coding using a fusion approach. Proceedings of the 7th Nordic Signal Processing Symposium (NORSIG '06), June 2006, Reykjavik, Iceland 250-253.
Guo X, Lu Y, Wu F, Gao W, Li S: Distributed multi-view video coding. Visual Communications and Image Processing, January 2006, San Jose, Calif, USA, Proceedings of SPIE 6077.
Guo X, Lu Y, Wu F, Zhao D, Gao W: Wyner-Ziv-based multiview video coding. IEEE Transactions on Circuits and Systems for Video Technology 2008,18(6):713-724.
Flierl M, Girod B: Coding of multi-view image sequences with video sensors. Proceedings of IEEE International Conference on Image Processing (ICIP '06), October 2006, Atlanta, Ga, USA
Taguchi Y, Naemura T: Rendering-oriented decoding for a distributed multiview coding system using a coset code. EURASIP Journal on Image and Video Processing 2009, 2009:-12.
Ouaret M, Dufaux F, Ebrahimi T: Iterative multiview side information for enhanced reconstruction in distributed video coding. EURASIP Journal on Image and Video Processing 2009, 2009:-17.
Lin Y-C, Varodayan D, Girod B: Image authentication based on distributed source coding. Proceedings of the 14th IEEE International Conference on Image Processing (ICIP '07), October 2007, San Antonio, Tex, USA 3: 5-8.
Lin Y-C, Varodayan D, Girod B: Image authentication and tampering localization using distributed source coding. Proceedings of the 9th IEEE International Workshop on Multimedia Signal Processing (MMSP '07), October 2007, Chania, Greece 393-396.
Khanna N, Roca A, Chiu GT-C, Allebach JP, Delp EJ: Improvements on image authentication and recovery using distributed source coding. Media Forensics and Security, January 2009, San Jose, Calif, USA, Proceedings of SPIE 7254.
Tagliasacchi M, Valenzise G, Tubaro S: Localization of sparse image tampering via random projections. Proceedings of International Conference on Image Processing (ICIP '08), October 2008, San Diego, Calif, USA 2092-2095.
Draper SC, Khisti A, Martinian E, Vetro A, Yedidia JS: Using distributed source coding to secure fingerprint biometrics. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), April 2007, Honolulu, Hawaii, USA 2: 129-132.
Sutcu Y, Rane S, Yedidia JS, Draper SC, Vetro A: Feature transformation of biometric templates for secure biometric systems based on error correcting codes. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop (CVPR '08), June 2008, Anchorage, Alaska, USA
Guo M, Lu Y, Wu F, Zhao D, Gao W: Wyner-Ziv switching scheme for multiple bit-rate video streaming. IEEE Transactions on Circuits and Systems for Video Technology 2008,18(5):569-581.
Valenzise G, Prandi G, Tagliasacchi M, Sarti A: Identification of sparse audio tampering using distributed source coding and compressive sensing techniques. EURASIP Journal on Image and Video Processing 2009, 2009:-12.
Pereira F: Video compression: still evolution or time for revolution? Proceedings of the 10th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS '09), May 2009, London, UK
Pereira F, Torres L, Guillemot C, Ebrahimi T, Leonardi R, Klomp S: Distributed video coding: selecting the most promising application scenarios. Signal Processing: Image Communication 2008,23(5):339-352. 10.1016/j.image.2008.04.002

Acknowledgments

This work was partially supported by the European Network of Excellence VISNET2 (http://www.visnet-noe.org/) funded under the European Commission IST 6th Framework Program (IST Contract 1-038398) and by National Basic Research of China (973 Program) under contract 2009CB320900. The authors would like to thank the anonymous reviewers for their valuable comments, which have helped improve this manuscript.

Author information

Authors and Affiliations

Multimedia Signal Processing Group, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
Frederic Dufaux
School of Electronic Engineering and Computer Science, Peking University, Beijing, 100871, China
Wen Gao
Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133, Milano, Italy
Stefano Tubaro
Mitsubishi Electric Research Laboratories, Cambridge, MA, 02139, USA
Anthony Vetro

Corresponding author

Correspondence to Frederic Dufaux.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Dufaux, F., Gao, W., Tubaro, S. et al. Distributed Video Coding: Trends and Perspectives. J Image Video Proc 2009, 508167 (2010). https://doi.org/10.1155/2009/508167

Received: 03 July 2009
Revised: 13 December 2009
Accepted: 31 December 2009
Published: 11 April 2010
DOI: https://doi.org/10.1155/2009/508167
