Authors: Y. Gao, G. Cheung, T. Maugey, P. Frossard, J. Liang
Title: Encoder-Driven Inpainting Strategy in Multiview Video Compression, accepted in IEEE Transactions on Image Processing (J14)
Abstract: In free viewpoint video systems, a user has the freedom to select a virtual view from which an image of the 3D scene is rendered, and the scene is commonly represented by texture and depth images of multiple nearby viewpoints. In such a representation, there exists data redundancy across multiple dimensions: a 3D voxel may be represented by pixels in multiple viewpoint images (inter-view redundancy), a pixel patch may recur in a distant spatial region of the same image due to self-similarity (inter-patch redundancy), and pixels in a local spatial region tend to be similar (inter-pixel redundancy). It is important to exploit these redundancies during inter-view prediction towards effective multiview video compression. In this paper, we propose an encoder-driven inpainting strategy for inter-view predictive coding, where explicit instructions are transmitted minimally, and the decoder is left to independently recover remaining missing data via inpainting, resulting in lower coding overhead. Specifically, after pixels in a reference view are projected to a target view via depth-image-based rendering (DIBR) at the decoder, the remaining holes in the target view are filled via an inpainting process in a block-by-block manner. First, blocks are ordered in terms of difficulty-to-inpaint by the decoder. Then, explicit instructions are only sent for the reconstruction of the most difficult blocks. In particular, the missing pixels are explicitly coded via a graph Fourier transform (GFT) or a sparsification procedure using DCT, leading to low coding cost. For blocks that are easy to inpaint, the decoder independently completes missing pixels via template-based inpainting.
We apply our proposed scheme to frames in a prediction structure defined by JCT-3V where inter-view prediction is dominant, and experimentally we show that our scheme achieves up to 3 dB gain in PSNR in reconstructed image quality over a comparable 3D-HEVC implementation using fixed 16 × 16 block size.
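
Illustration (not from the paper): the DIBR step mentioned in the abstract, and the holes it leaves for inpainting, can be sketched on a single scanline. This is a minimal sketch under simplifying assumptions that the abstract does not state: rectified cameras, a purely horizontal shift by the disparity focal * baseline / depth, and integer rounding of that shift. The function name warp_row, its parameters, and the sample values are hypothetical.

```python
def warp_row(texture, depth, focal, baseline):
    """Forward-warp one scanline from a reference view to a target view.

    Assumption: rectified cameras, so a pixel at column x with depth Z
    moves horizontally by the disparity d = focal * baseline / Z.
    Returns (target_row, hole_columns); holes are None in target_row.
    """
    width = len(texture)
    target = [None] * width
    zbuf = [float("inf")] * width  # nearest depth written so far per target column
    for x in range(width):
        d = round(focal * baseline / depth[x])
        xt = x - d  # target column after the horizontal shift
        # Keep the pixel only if it lands inside the row and is closer
        # to the camera than whatever was written there before.
        if 0 <= xt < width and depth[x] < zbuf[xt]:
            target[xt] = texture[x]
            zbuf[xt] = depth[x]
    holes = [x for x, v in enumerate(target) if v is None]
    return target, holes


# Background (depth 4) with a foreground object (depth 2) in columns 3..5.
texture = [10, 11, 12, 200, 201, 202, 13, 14]
depth = [4.0, 4.0, 4.0, 2.0, 2.0, 2.0, 4.0, 4.0]
row, holes = warp_row(texture, depth, focal=4.0, baseline=1.0)
print(row)    # [11, 200, 201, 202, None, 13, 14, None]
print(holes)  # [4, 7]
```

In this toy case the foreground shifts by two columns and the background by one, so column 4 is a disocclusion hole next to the foreground object and column 7 is a hole at the image border. Holes of this kind are what the abstract's block-by-block inpainting stage would then fill.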