Distributed video coding


So-called classical video compression (classical because it is the more common approach) aims at extracting inter-frame correlation at the encoder. This approach thus relies on techniques that are complex in terms of power requirements, such as motion estimation (or disparity estimation for multiview sequences), in order to reduce the quantity of information transmitted to the decoder. This scheme is perfectly adapted to the following conditions: compression performed on a powerful station, and light decoding on low-power systems (DVD players, TV broadcasting, etc.). However, while these configurations remain the usual ones, new needs have arisen in recent years. Indeed, more and more capture devices need to perform video compression themselves. Furthermore, more and more camera network systems (such as video surveillance) require low-complexity compression algorithms and, above all, coding techniques that do not need communication between cameras (communication that is necessary in classical video coding, since it is needed to extract the inter-camera correlation).

Based on all these arguments, the distributed video coding paradigm appeared in the early 2000s. This new paradigm proposes to shift all of the complex inter-frame comparisons to the decoder side. The idea rests on 30-year-old theoretical results by Slepian and Wolf on the one hand, and Wyner and Ziv on the other, which state that, under some specific conditions, two correlated sources can be encoded independently and transmitted at the same rate and with the same distortion as if they were encoded jointly, as long as the decoding is performed jointly.
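In the lossless case, the Slepian-Wolf result can be stated as a rate region: two correlated sources X and Y, encoded separately but decoded jointly, can be recovered without loss whenever

```latex
R_X \ge H(X \mid Y), \qquad
R_Y \ge H(Y \mid X), \qquad
R_X + R_Y \ge H(X, Y).
```

The achievable sum rate thus equals the joint entropy H(X, Y), which is exactly what a joint encoder would need. Wyner and Ziv extended this result to lossy coding when the side information Y is available only at the decoder.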

These seductive theoretical results have led several research teams to develop distributed video coding schemes with the (theoretically attainable) goal of matching the performance of classical schemes such as MPEG-x, H.263 and later H.264. However, even though distributed video coding was quickly seen as a promising paradigm, the rate-distortion performance of current coders remains far from this initial target. Indeed, several hypotheses of the founding theorems are not strictly verified in practice, which limits the efficiency of existing codecs. Distributed video coding nevertheless has a lot of room for improvement, since many modules can still be enhanced.

The European project Discover enabled several research teams to develop a complete distributed video coding scheme that is nowadays one of the most efficient and popular existing architectures. This scheme is the starting point of most of the work presented in this thesis manuscript, which is why we outline its main characteristics here. The images of the sequence are divided into two types, the key frames and the Wyner-Ziv (WZ) frames, alternating as follows: one key frame, then one WZ frame, another key frame, and so forth. The key frames are independently encoded and decoded using intra codecs such as H.264 Intra or JPEG2000. They are also used at the decoder to generate an estimation of each WZ frame, called the side information. The WZ frames are encoded independently with the classical source coding process: a transformation followed by a quantization. Then, instead of the entropy coder usually adopted in classical source coding schemes, the output of the quantizer is processed with a channel encoder (LDPC or turbo codes), producing a systematic stream (a copy of the input) and a parity stream (the redundancy information normally used to correct channel errors). The idea consists in not transmitting the systematic information and replacing it at the decoder by the side information generated from the key frames. Thereby, the parity information, initially designed to correct channel errors, is transmitted in order to correct the estimation errors. The WZ stream is then reconstructed and inverse transformed.
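The principle of transmitting only parity can be illustrated with a toy example. The sketch below is an illustration, not the Discover codec: it uses a Hamming(7,4) code rather than LDPC or turbo codes. The encoder computes the three parity bits of four WZ bits and transmits only those; the decoder rebuilds the codeword from its side information plus the received parity and corrects a single mis-predicted bit by syndrome decoding:

```python
def hamming74_parity(d):
    """Compute the 3 parity bits of a Hamming(7,4) code for 4 data bits."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]

def wz_decode(side_info, parity):
    """Correct up to one wrong bit in the 4-bit side information,
    using only the 3 received parity bits (syndrome decoding)."""
    p1, p2, p3 = parity
    d1, d2, d3, d4 = side_info
    # Codeword positions (1-based): 1=p1 2=p2 3=d1 4=p3 5=d2 6=d3 7=d4
    word = [p1, p2, d1, p3, d2, d3, d4]
    s1 = word[0] ^ word[2] ^ word[4] ^ word[6]
    s2 = word[1] ^ word[2] ^ word[5] ^ word[6]
    s3 = word[3] ^ word[4] ^ word[5] ^ word[6]
    pos = s1 + 2 * s2 + 4 * s3   # 1-based error position, 0 = no error
    if pos:
        word[pos - 1] ^= 1
    return [word[2], word[4], word[5], word[6]]

# Encoder side: original WZ bits; only the parity is transmitted.
wz_bits = [1, 0, 1, 1]
parity = hamming74_parity(wz_bits)       # 3 bits instead of 4

# Decoder side: the side information mis-predicts the second bit.
side_info = [1, 1, 1, 1]
recovered = wz_decode(side_info, parity)
print(recovered)                          # [1, 0, 1, 1]
```

The compression comes from sending 3 parity bits instead of 4 data bits; the better the side information, the weaker (and cheaper) the channel code can be.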


Generic Stanford architecture. In italics, the corresponding Discover approach.

The original idea of using channel codes for compression is what makes distributed video coding original and attractive, but it is also what raises the largest number of limitations and research problems. Firstly, the system needs to know the correlation between the side information and the original WZ frame, yet these two elements are never available together, neither at the encoder nor at the decoder. Moreover, the encoder needs to know the exact number of parity bits to send. That is why the Discover architecture (like almost all existing ones) performs progressive decoding, using a backward channel to request additional parity information as decoding proceeds. This is one major limitation of the system, because it makes real-time transmission and decoding hard to conceive.
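The request loop can be sketched as follows; this is a minimal illustration, with a hypothetical `try_decode` stub standing in for the real LDPC/turbo decoder and its success check:

```python
def request_parity_until_decoded(parity_chunks, try_decode):
    """Progressive decoding over the backward channel: the decoder
    accumulates parity increments until decoding succeeds.
    Each iteration models one request/response round trip."""
    received = []
    for chunk in parity_chunks:
        received.extend(chunk)         # request one more parity increment
        result = try_decode(received)
        if result is not None:         # e.g. an error-check passes
            return result, len(received)
    raise RuntimeError("decoding failed with all available parity")

# Stub decoder: succeeds once at least 6 parity bits are available.
chunks = [[0, 1], [1, 0], [1, 1], [0, 0]]
decode_stub = lambda bits: "frame" if len(bits) >= 6 else None
frame, bits_used = request_parity_until_decoded(chunks, decode_stub)
print(frame, bits_used)   # frame 6
```

The cost of this loop is visible in the round trips: every extra request adds a full transmission latency, which is why suppressing the backward channel is an active research topic.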

The second key element of this scheme is the side information generation task. Decoding performance strongly depends on the quality of the WZ estimation, which is why many works aim at enhancing the efficiency of motion/disparity estimation techniques.
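As a minimal illustration, the crudest side information for a WZ frame is the pixel-wise average of its two neighbouring key frames. Discover instead uses motion-compensated temporal interpolation, but the averaging baseline already shows the idea, and its failure mode on moving content:

```python
def average_interpolation(key_prev, key_next):
    """Naive side information: pixel-wise average of the two
    neighbouring key frames (no motion compensation)."""
    return [[(a + b) // 2 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(key_prev, key_next)]

# A bright pixel moving two columns to the right between key frames:
key_prev = [[0, 200, 0, 0]]
key_next = [[0, 0, 0, 200]]
print(average_interpolation(key_prev, key_next))  # [[0, 100, 0, 100]]
```

The true intermediate frame would show the pixel at the middle column; the average instead produces two half-intensity "ghosts". This ghosting on moving content is precisely what motion-compensated interpolation is designed to avoid.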

The work conducted during this thesis led us to investigate many aspects of distributed video coding. First of all, we studied in detail the conditions for extending distributed video coding to multiview settings, which raises important new questions, such as the placement of the key and WZ frames in the time-view space, or the way of generating inter-view estimations and merging them with the temporal estimation so that the decoder has a single side information. While proposing solutions to these different problems, we also looked into several general aspects of distributed video coding (not specific to monoview or multiview settings), such as the improvement of temporal interpolation, a refinement of the correlation noise model, the suppression of the backward channel, and a study of side information quality metrics. Moreover, we have also studied other distributed video coding schemes, developing a hash-based scheme and a wavelet-based coding architecture in collaboration with different research groups (LSS, IRISA and I3S).