[Jan 2016] New paper accepted in IEEE TIP

Authors: T. Maugey, G. Petrazzuoli, P. Frossard, M. Cagnazzo, B. Pesquet-Popescu

Title: Reference view selection in DIBR-based multiview coding accepted in IEEE Transactions on Image Processing (J15)

Abstract: Augmented reality, interactive navigation in 3D scenes, multiview video and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared to traditional video services. The significant increase of the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience in resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the dataset. In such coding schemes, the following two questions become fundamental: i) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? ii) where to place these key views in the multiview dataset? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.
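To give a feel for how a shortest path can return both the number of reference views and their positions, here is a minimal Python sketch. It is not the algorithm of the paper: the dissimilarity table `dissim`, the per-reference `rate_cost`, the function name and the rule that the first and last views are always references are all assumptions made for this illustration. Views are nodes of a directed acyclic graph, and an edge from view i to view j means "i and j are consecutive references".

```python
def select_reference_views(dissim, rate_cost):
    """Toy reference-view selection as a shortest path on a DAG.

    dissim[i][k] -- hypothetical cost of synthesizing view k from reference i
    rate_cost    -- hypothetical cost of coding one reference view
    Assumption: the first and last views are always references.
    Returns (total_cost, list of reference view indices).
    """
    n = len(dissim)

    def segment_cost(i, j):
        # Views strictly between references i and j are synthesized
        # from whichever of the two references resembles them most.
        return sum(min(dissim[i][k], dissim[j][k]) for k in range(i + 1, j))

    best = [float("inf")] * n   # best[j]: cheapest cost with j as last reference
    prev = [None] * n           # back-pointers to recover the path
    best[0] = rate_cost
    for j in range(1, n):
        for i in range(j):
            cost = best[i] + segment_cost(i, j) + rate_cost
            if cost < best[j]:
                best[j], prev[j] = cost, i

    refs, j = [], n - 1
    while j is not None:
        refs.append(j)
        j = prev[j]
    return best[n - 1], refs[::-1]


# Toy data: five views, dissimilarity grows with the squared index gap.
toy = [[(i - k) ** 2 for k in range(5)] for i in range(5)]
print(select_reference_views(toy, rate_cost=100))  # (206, [0, 4])
print(select_reference_views(toy, rate_cost=0))    # (0, [0, 1, 2, 3, 4])
```

With an expensive reference only the two end views are kept, while with free references every view becomes a reference. The length of the returned path is the number of reference views, so both quantities come out of a single search.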

[Oct 2015] New paper accepted in IEEE TIP

Authors: Y. Gao, G. Cheung, T. Maugey, P. Frossard, J. Liang

Title: Encoder-Driven Inpainting Strategy in Multiview Video Compression accepted in IEEE Transactions on Image Processing (J14)

Abstract: In free viewpoint video systems, a user has the freedom to select a virtual view from which an image of the 3D scene is rendered, and the scene is commonly represented by texture and depth images of multiple nearby viewpoints. In such representation, there exists data redundancy across multiple dimensions: a 3D voxel may be represented by pixels in multiple viewpoint images (inter-view redundancy), a pixel patch may recur in a distant spatial region of the same image due to self-similarity (inter-patch redundancy), and pixels in a local spatial region tend to be similar (inter-pixel redundancy). It is important to exploit these redundancies during inter-view prediction towards effective multiview video compression. In this paper, we propose an encoder-driven inpainting strategy for inter-view predictive coding, where explicit instructions are transmitted minimally, and the decoder is left to independently recover remaining missing data via inpainting, resulting in lower coding overhead. Specifically, after pixels in a reference view are projected to a target view via depth-image-based rendering (DIBR) at the decoder, the remaining holes in the target view are filled via an inpainting process in a block-by-block manner. First, blocks are ordered in terms of difficulty-to-inpaint by the decoder. Then, explicit instructions are only sent for the reconstruction of the most difficult blocks. In particular, the missing pixels are explicitly coded via a graph Fourier transform (GFT) or a sparsification procedure using DCT, leading to low coding cost. For blocks that are easy to inpaint, the decoder independently completes missing pixels via template-based inpainting. We apply our proposed scheme to frames in a prediction structure defined by JCT-3V where inter-view prediction is dominant, and experimentally we show that our scheme achieves up to 3 dB gain in PSNR in reconstructed image quality over a comparable 3D-HEVC implementation using fixed 16 × 16 block size.
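The holes mentioned in the abstract come from the DIBR projection itself. The following Python sketch shows this on a single scanline; it only illustrates generic forward warping, not the coding scheme of the paper. It assumes rectified cameras, a target camera shifted horizontally so that pixels move left by `focal * baseline / depth`, and hypothetical names (`warp_scanline`, `focal`, `baseline`).

```python
def warp_scanline(texture, depth, focal, baseline):
    """Forward-warp one scanline from a reference view to a target view.

    Assumptions for this sketch: rectified cameras, horizontal shift only,
    disparity = focal * baseline / depth, rounded to the nearest pixel.
    Positions that receive no pixel stay None: these are the holes
    that an inpainting step would have to fill.
    """
    width = len(texture)
    target = [None] * width
    target_depth = [float("inf")] * width
    for x in range(width):
        disparity = round(focal * baseline / depth[x])
        xt = x - disparity
        # z-buffer test: when two pixels land on the same position,
        # keep the one closest to the camera.
        if 0 <= xt < width and depth[x] < target_depth[xt]:
            target[xt] = texture[x]
            target_depth[xt] = depth[x]
    return target


# Foreground (depth 2) on the left, background (depth 4) on the right.
texture = [10, 20, 30, 40, 50, 60]
depth = [2, 2, 2, 4, 4, 4]
warped = warp_scanline(texture, depth, focal=4, baseline=1)
print(warped)  # [30, None, 40, 50, 60, None]
print([i for i, v in enumerate(warped) if v is None])  # [1, 5]
```

The hole at position 1 is a disocclusion: the foreground moves further than the background and uncovers an area that the reference view never saw. The hole at position 5 simply lies outside the field of view of the reference camera.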

[Sep 2015] New paper accepted in JVCIR

Authors: A. De Abreu, L. Toni, N. Thomos, T. Maugey, F. Pereira, P. Frossard

Title: Optimal Layered Representation for Adaptive Interactive Multiview Video Streaming accepted in Journal of Visual Communication and Image Representation (Elsevier) (J13)

Abstract: We consider an interactive multiview video streaming (IMVS) system where clients select their preferred viewpoint in a given navigation window. To provide high quality IMVS, many high quality views should be transmitted to the clients. However, this is not always possible due to the limited and heterogeneous capabilities of the clients. In this paper, we propose a novel adaptive IMVS solution based on a layered multiview representation where camera views are organized into layered subsets to match the different clients' constraints. We formulate an optimization problem for the joint selection of the view subsets and their encoding rates. Then, we propose an optimal algorithm and a greedy algorithm of reduced computational complexity, both based on dynamic programming. Simulation results show the good performance of our novel algorithms compared to a baseline algorithm, proving that an effective IMVS adaptive solution should consider the scene content and the client capabilities and their preferences in navigation.

[Jul. 2015] New TIP submission

Authors: Thomas Maugey, Giovanni Petrazzuoli, Pascal Frossard, Marco Cagnazzo, Béatrice Pesquet-Popescu

Title: Reference view selection in DIBR-based multiview coding submitted for review to IEEE Transactions on Image Processing

Abstract: Augmented reality, interactive navigation in 3D scenes, multiview video and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared to traditional video services. The significant increase of the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience in resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the dataset. In such coding schemes, the following two questions become fundamental: i) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? ii) where to place these key views in the multiview dataset? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.

[Jun. 2015] New paper accepted in IEEE TMM

Authors: L. Toni, T. Maugey, P. Frossard

Title: Optimized Packet Scheduling in Multiview Video Navigation Systems accepted in IEEE Transactions on Multimedia (J12)

Abstract: In multiview video systems, multiple cameras generally acquire the same scene from different perspectives, such that users have the possibility to select their preferred viewpoint. This results in large amounts of highly redundant data, which needs to be properly handled during encoding and transmission over resource-constrained channels. In this work, we study coding and transmission strategies in multicamera systems, where correlated sources send data through a bottleneck channel to a central server, which eventually transmits views to different interactive users. We propose a dynamic navigation-path aware packet scheduling optimization under delay, bandwidth, and interactivity constraints aimed at optimizing the quality-of-experience of interactive users. In particular, the scene distortion is minimized while the distortion variations along the most likely navigation paths are also minimized. The optimization relies both on a novel rate-distortion model, which captures the importance of each view in the scene reconstruction, and on an objective function that optimizes resources based on a client navigation model. The latter takes into account the distortion experienced by interactive clients as well as the distortion variations that might be observed by clients during multiview navigation. We solve the scheduling problem with a novel trellis-based solution, which makes it possible to formally decompose the multivariate optimization problem, thereby significantly reducing the computational complexity. Simulation results show the gain of the proposed algorithm compared to baseline scheduling policies. In more detail, we show the gain offered by our dynamic scheduling policy compared to static camera allocation strategies and to schemes with constant coding strategies. Finally, we show that the best scheduling policy consistently adapts to the most likely user navigation path and that it minimizes distortion variations that can be very disturbing for users in traditional navigation systems.
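As a rough illustration of why a trellis helps, the Python sketch below runs a generic Viterbi-style search for the cheapest path through a small trellis. It is not the scheduling algorithm of the paper: the states, `stage_costs` (standing in for distortion) and `transition_cost` (standing in for a penalty on distortion variations) are hypothetical. The point is only that a joint decision over all stages decomposes into one small decision per stage.

```python
def min_cost_path(stage_costs, transition_cost):
    """Cheapest path through a trellis (Viterbi-style dynamic programming).

    stage_costs[t][s]     -- hypothetical cost of being in state s at stage t
    transition_cost(p, s) -- hypothetical cost of moving from state p to s
    Returns (total_cost, list with one state per stage).
    """
    n_states = len(stage_costs[0])
    cost = list(stage_costs[0])
    back = []  # back[t-1][s]: best predecessor of state s at stage t
    for t in range(1, len(stage_costs)):
        new_cost, pointers = [], []
        for s in range(n_states):
            p = min(range(n_states),
                    key=lambda q: cost[q] + transition_cost(q, s))
            new_cost.append(cost[p] + transition_cost(p, s) + stage_costs[t][s])
            pointers.append(p)
        cost = new_cost
        back.append(pointers)

    # Trace the best final state back to the first stage.
    s = min(range(n_states), key=lambda q: cost[q])
    total, path = cost[s], [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    return total, path[::-1]


costs = [[1, 5], [4, 1], [1, 5]]
# Penalizing state changes keeps the path smooth ...
print(min_cost_path(costs, lambda p, s: 2 * abs(p - s)))  # (6, [0, 0, 0])
# ... while without the penalty the path follows the cheapest state per stage.
print(min_cost_path(costs, lambda p, s: 0))               # (3, [0, 1, 0])
```

For T stages and S states the search costs on the order of T·S² operations instead of enumerating all S^T paths, which is the kind of complexity reduction a trellis decomposition provides.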

[May 2015] Two ICIP papers accepted

The following two papers have been accepted for presentation at ICIP 2015 in Quebec City.

Authors: T. Maugey, P. Frossard, C. Guillemot
Title: Guided Inpainting with cluster-based auxiliary information
Abstract: In this paper, we propose a new guided inpainting algorithm based on the exemplar-based approach in order to effectively fill in holes in image synthesis applications. Guided inpainting techniques can be very useful in settings where one has access to the ground truth information, as in most multiview coding applications. We propose a new auxiliary information based on patch clustering, which is used to refine the candidate exemplar set in the inpainting. For that purpose, a new recursive clustering method based on locally linear embedding (LLE) is introduced. We then design the guided inpainting solution based on LLE with clustered patches, which constrains the reconstruction to operate in one patch cluster only. The index of the appropriate cluster is considered as auxiliary information. Experimental results show that our clustering algorithm provides clusters that are well suited to the inpainting problem. They also show that the auxiliary information makes it possible to significantly improve the quality of the inpainted image for a small coding cost. This work is the first study to show that effective inpainting can be performed when the auxiliary information is properly adapted to the characteristics of both the hole and the known texture.

Authors: A. Roumy, T. Maugey
Title: Universal lossless coding with random user access: the cost of interactivity
Abstract: We consider the problem of video compression with free viewpoint interactivity. It is widely believed that allowing the user to choose their view will incur some loss in terms of compression efficiency. Here we derive the complete rate-storage region for universal lossless coding under the constraint of choosing the view at the receiver. Moreover, we show a counterintuitive result: freely choosing the view at the receiver incurs a loss in terms of storage only and not in the transmission rate. The gain of the optimal scheme with respect to interactive schemes proposed so far is derived and a practical scheme that achieves this gain is proposed.

[Apr. 2015] New JVCIR submission

Authors: Ana De Abreu, Laura Toni, Nikolaos Thomos, Thomas Maugey, Fernando Pereira, Pascal Frossard

Title: Optimal Layered Representation for Adaptive Interactive Multiview Video Streaming submitted for review to Journal of Visual Communication and Image Representation (Elsevier)

Abstract: We consider an interactive multiview video streaming (IMVS) system where clients select their preferred viewpoint in a given navigation window. To provide high quality IMVS, many high quality views should be transmitted to the clients. However, this is not always possible due to the limited and heterogeneous capabilities of the clients. In this paper, we propose a novel adaptive IMVS solution based on a layered multiview representation where camera views are organized into layered subsets to match the different clients' constraints. We formulate an optimization problem for the joint selection of the view subsets and their encoding rates. Then, we propose an optimal algorithm and a greedy algorithm of reduced computational complexity, both based on dynamic programming. Simulation results show the good performance of our novel algorithms compared to a baseline algorithm, proving that an effective IMVS adaptive solution should consider the scene content and the client capabilities and their preferences in navigation.

[Mar. 2015] New paper accepted in IEEE TCSVT

Authors: S. Khattak, T. Maugey, R. Hamzaoui, S. Ahmad, P. Frossard

Title: Temporal and Inter-view Consistent Error Concealment Technique for Multiview plus Depth Video Broadcasting accepted in IEEE Transactions on Circuits and Systems for Video Technology (J11)

Abstract: Multiview plus depth (MVD) is an emerging video format with many applications, including 3D television and free viewpoint television. During broadcast of MVD video, transmission errors may cause the loss of whole frames, resulting in significant degradation of video quality. Error concealment techniques have been widely used to deal with transmission errors in video communication. However, the existing solutions do not address the requirement that the reconstructed frames be consistent with other frames. We propose a consistency model for error concealment of MVD video that makes it possible to maintain a high level of consistency between frames of the same view (temporal consistency) and those of the neighbouring views (inter-view consistency). Simulations with the reference software for the Multiview Video Coding (MVC) project of the Joint Video Team (JVT) of the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) show that our technique outperforms two standard error concealment techniques with respect to both reconstruction quality and view consistency.

[Feb. 2015] New paper accepted in IEEE TIP

Authors: Thomas Maugey, Antonio Ortega, Pascal Frossard

Title: Graph-based representation for multiview image geometry accepted in IEEE Transactions on Image Processing (J10)

Abstract: In this paper, we propose a new geometry representation method for multiview image sets. Our approach relies on graphs to describe the multiview geometry information in a compact and controllable way. The links of the graph connect pixels in different images and describe the proximity between pixels in 3D space. These connections are dependent on the geometry of the scene and provide the right amount of information that is necessary for coding and reconstructing multiple views. Our multiview image representation is very compact and adapts the transmitted geometry information as a function of the complexity of the prediction performed at the decoder side. To achieve this, our graph-based representation (GBR) carefully selects the amount of geometry information needed before coding. This is in contrast with depth coding, which directly compresses with losses the original geometry signal, thus making it difficult to quantify the impact of coding errors on geometry-based interpolation. We present the principles of this GBR and we build an efficient coding algorithm to represent it.
We compare our GBR approach to classical depth compression methods in terms of view synthesis quality as a function of the compactness of the geometry description. We show that GBR can achieve significant gains in geometry coding rate over depth-based schemes operating at similar quality. Experimental results demonstrate the potential of this new representation.


[Dec. 2014] New TMM submission

Authors: Laura Toni, Thomas Maugey, Pascal Frossard

Title: Optimized Packet Scheduling in Multiview Video Navigation Systems submitted for review to IEEE Transactions on Multimedia (ArXiv version)

Abstract: In multiview video systems, multiple cameras generally acquire the same scene from different perspectives, such that users have the possibility to select their preferred viewpoint. This results in large amounts of highly redundant data, which needs to be properly handled during encoding and transmission over resource-constrained channels. In this work, we study coding and transmission strategies in multicamera systems, where correlated sources send data through a bottleneck channel to a central server, which eventually transmits views to different interactive users. We propose a dynamic correlation-aware packet scheduling optimization under delay, bandwidth, and interactivity constraints. The optimization relies both on a novel rate-distortion model, which captures the importance of each view in the 3D scene reconstruction, and on an objective function that optimizes resources based on a client navigation model. The latter takes into account the distortion experienced by interactive clients as well as the distortion variations that might be observed by clients during multiview navigation.
We solve the scheduling problem with a novel trellis-based solution, which makes it possible to formally decompose the multivariate optimization problem, thereby significantly reducing the computational complexity. Simulation results show the gain of the proposed algorithm compared to baseline scheduling policies.
In more detail, we show the gain offered by our dynamic scheduling policy compared to static camera allocation strategies and to schemes with constant coding strategies. Finally, we show that the best scheduling policy consistently adapts to the most likely user navigation path and that it minimizes distortion variations that can be very disturbing for users in traditional navigation systems.