[Feb 2017] Two ICIP papers submitted

Two papers have been submitted to ICIP 2017:

Authors: Navid Mahmoudian Bidgoli, Thomas Maugey, Aline Roumy
Title: Correlation Model Selection for interactive video communication
Abstract: Interactive video communication has recently been proposed for multi-view videos. In this scheme, the server has to store the views as compactly as possible, while being able to transmit them independently to the users, who are allowed to navigate interactively among the views, hence requesting a subset of them. To achieve this goal, the compression must be done using model-based coding in which the correlation between the predicted view generated on the user side and the original view is modeled by a statistical distribution. In this paper, we propose a framework for lossless coding that selects, from a candidate set of models, the model that incurs the lowest extra rate cost to the system. Moreover, in cases where the depth image is available, we provide a method to estimate the correlation model.
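The selection idea in this abstract can be illustrated with a minimal sketch. It is not the paper's framework: it assumes integer prediction residuals, a hypothetical candidate family of two-sided geometric distributions, and it scores each candidate only by its ideal code length (the names `two_sided_geometric`, `average_code_length` and `select_model` are invented for this illustration). The candidate with the shortest average code length is the one whose mismatch with the data costs the fewest extra bits.

```python
import math


def two_sided_geometric(theta):
    """Return the pmf p(k) = (1 - theta) / (1 + theta) * theta**|k| on the integers."""
    norm = (1.0 - theta) / (1.0 + theta)
    return lambda k: norm * theta ** abs(k)


def average_code_length(residuals, pmf):
    """Ideal code length, in bits per sample, when the residuals are coded with pmf."""
    return -sum(math.log2(pmf(k)) for k in residuals) / len(residuals)


def select_model(residuals, candidates):
    """Return the name of the cheapest candidate and the cost of every candidate."""
    costs = {name: average_code_length(residuals, pmf)
             for name, pmf in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs


# Toy residuals between a predicted view and the original view (made-up values).
residuals = [0, 1, -1, 0, 2, 0, -1, 0, 0, 1]

# Hypothetical candidate set: the same family with three different spreads.
candidates = {
    "narrow": two_sided_geometric(0.2),
    "medium": two_sided_geometric(0.5),
    "wide": two_sided_geometric(0.9),
}

best_model, model_costs = select_model(residuals, candidates)
print(best_model, model_costs)
```

A real system would also have to account for the cost of signalling the chosen model, which this sketch ignores.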

Authors: Xin Su, Mira Rizkallah, Thomas Maugey, Christine Guillemot
Title: Graph-based light fields representation and coding using geometry information (webpage)
Abstract: This paper describes a graph-based coding scheme for light fields (LF). It first adapts graph-based representations (GBR) to describe color and geometry information of LF. Graph connections describing scene geometry capture inter-view dependencies. They are used as the support of a weighted Graph Fourier Transform (wGFT) to encode disoccluded pixels. The quality of the LF reconstructed from the graph is enhanced by adding extra color information to the representation for a sub-set of sub-aperture images. Experiments show that the proposed scheme yields rate-distortion gains compared with HEVC based compression (directly compressing the LF as a video sequence by HEVC).

 

[Dec. 2016] New TIT submission

Authors: Elsa Dupraz, Thomas Maugey, Aline Roumy, Michel Kieffer

Title: Rate-storage regions for Massive Random Access submitted for review to IEEE Transactions on Information Theory (arXiv version)

Abstract: This paper introduces a new source coding paradigm called Massive Random Access (MRA). In MRA, a set of correlated sources is jointly encoded and stored on a server, and clients want to access only a subset of the sources. Since the number of simultaneous clients can be huge, the server is only authorized to extract a bitstream from the stored data: no re-encoding can be performed before the transmission of the specific client’s request. In this paper, we formally define the MRA framework and we introduce the notion of rate-storage region to characterize the performance of MRA. From an information theoretic analysis, we derive achievable rate-storage bounds for lossless source coding of i.i.d. and non-i.i.d. sources, and rate-storage distortion regions for Gaussian sources. We also show two practical implementations of MRA systems based on rate-compatible LDPC codes. Both the theoretical and the experimental results demonstrate that MRA systems can reach the same transmission rates as traditional point-to-point source coding schemes, while having a reasonable storage cost overhead. These results constitute a breakthrough for many recent data transmission applications in which only a part of the data is requested by the clients.

[Jul. 2016] New TIP submission

Authors: Xin Su, Thomas Maugey, Christine Guillemot

Title: Graph-based representation for multiview images with complex camera configurations submitted for review to IEEE Transactions on Image Processing

Abstract: Graph-Based Representation (GBR) has recently been proposed for describing color and geometry of multiview video content. The graph vertices represent the color information, while the edges represent the geometry information, i.e., the disparity, by connecting corresponding pixels in two camera views. In this paper, we generalize the GBR to multiview images with complex camera configurations. Compared with the existing GBR, the proposed representation can handle not only horizontal displacements of the cameras but also forward/backward translations, rotations, etc. However, contrary to the usual disparity that is a 2-dimensional vector (denoting horizontal and vertical displacements), each edge in GBR is represented by a one-dimensional disparity. This quantity can be seen as the disparity along an epipolar segment. In order to have a sparse (i.e., easy to code) graph structure, we propose a rate-distortion model to select the most meaningful edges. Hence the graph is constructed with “just enough” information for rendering the given predicted view. The experiments show that the proposed GBR allows high reconstruction quality with lower or equivalent coding rate compared with traditional depth-based representations.

[May 2016] New paper accepted at ICIP 2016

The following paper has been accepted for presentation at ICIP 2016 in Phoenix, Arizona, US.

Authors: X. Su, T. Maugey, C. Guillemot
Title: Graph-based Representation for Multiview Images With Complex Camera Configurations
Abstract: Graph-Based Representation (GBR) has recently been proposed for rectified multiview datasets. The core idea of GBR is to use graphs for describing the color and geometry information of a multiview dataset. The color information is represented by the vertices of the graph while the scene geometry is represented by the edges of the graph. In this paper, we generalize the GBR to multi-view images with complex camera configurations. Compared with previous work, the GBR representation introduced in this paper can handle not only horizontal displacements of the cameras but also forward/backward displacements, rotations, etc. In order to have a sparse (i.e., easy to code) graph structure, we further propose to use a distortion metric to select the most meaningful connections. For the graph transmission, each selected connection is then replaced by a disparity-based quantity. Hence, the graph is constructed with just enough information for rendering the given predicted view. The experiments show that the proposed GBR achieves high reconstruction quality with a lower or comparable coding rate compared with traditional depth-based representations, which directly compress the depth signal without considering the rendering task.

[Jan 2016] New paper accepted in IEEE TIP

Authors: T. Maugey, G. Petrazzuoli, P. Frossard, M. Cagnazzo, B. Pesquet-Popescu

Title: Reference view selection in DIBR-based multiview coding accepted in IEEE Transactions on Image Processing (J15)

Abstract: Augmented reality, interactive navigation in 3D scenes, multiview video and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared to traditional video services. The significant increase of the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience in resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the dataset. In such coding schemes, the two following questions become fundamental: i) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? ii) where to place these key views in the multiview dataset? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.
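To give a feel for how a shortest path can decide both the number and the positions of the reference views, here is a toy sketch. It is not the paper's algorithm or similarity metric: it assumes views on a 1D camera line, a constant per-reference rate cost `ref_cost`, a made-up distortion `toy_distortion` that grows with the distance to the nearest reference, and it forces the first and last views to be references. All function names are hypothetical.

```python
def select_reference_views(num_views, ref_cost, synth_distortion):
    """Shortest path on a DAG whose nodes are views 0..num_views-1.

    An edge (i, j) means that i and j are consecutive reference views; its
    weight is the cost of coding reference j plus the distortion of every
    view synthesized strictly between i and j.
    """
    best = [float("inf")] * num_views  # best[j]: cheapest cost ending at reference j
    prev = [None] * num_views          # back-pointers to recover the path
    best[0] = ref_cost                 # assumption: view 0 is always a reference
    for j in range(1, num_views):
        for i in range(j):
            cost = best[i] + ref_cost + sum(
                synth_distortion(i, j, v) for v in range(i + 1, j))
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk back from the last view (assumed to be a reference as well).
    path, node = [], num_views - 1
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], best[-1]


def toy_distortion(left, right, view):
    """Made-up distortion: squared distance to the nearest reference view."""
    return min(view - left, right - view) ** 2


references, total_cost = select_reference_views(7, 3.0, toy_distortion)
print(references, total_cost)
```

Raising `ref_cost` makes references more expensive, so the path uses fewer of them; lowering it has the opposite effect. The number of references is thus an outcome of the optimization rather than an input.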

[Oct 2015] New paper accepted in IEEE TIP

Authors: Y. Gao, G. Cheung, T. Maugey, P. Frossard, J. Liang

Title: Encoder-Driven Inpainting Strategy in Multiview Video Compression accepted in IEEE Transactions on Image Processing (J14)

Abstract: In free viewpoint video systems, a user has the freedom to select a virtual view from which an image of the 3D scene is rendered, and the scene is commonly represented by texture and depth images of multiple nearby viewpoints. In such a representation, there exists data redundancy across multiple dimensions: a 3D voxel may be represented by pixels in multiple viewpoint images (inter-view redundancy), a pixel patch may recur in a distant spatial region of the same image due to self-similarity (inter-patch redundancy), and pixels in a local spatial region tend to be similar (inter-pixel redundancy). It is important to exploit these redundancies during inter-view prediction towards effective multiview video compression. In this paper, we propose an encoder-driven inpainting strategy for inter-view predictive coding, where explicit instructions are transmitted minimally, and the decoder is left to independently recover remaining missing data via inpainting, resulting in lower coding overhead. Specifically, after pixels in a reference view are projected to a target view via depth-image-based rendering (DIBR) at the decoder, the remaining holes in the target view are filled via an inpainting process in a block-by-block manner. First, blocks are ordered in terms of difficulty-to-inpaint by the decoder. Then, explicit instructions are only sent for the reconstruction of the most difficult blocks. In particular, the missing pixels are explicitly coded via a graph Fourier transform (GFT) or a sparsification procedure using DCT, leading to low coding cost. For blocks that are easy to inpaint, the decoder independently completes missing pixels via template-based inpainting. We apply our proposed scheme to frames in a prediction structure defined by JCT-3V where inter-view prediction is dominant, and experimentally we show that our scheme achieves up to 3 dB gain in PSNR in reconstructed image quality over a comparable 3D-HEVC implementation using fixed 16 × 16 block size.

[Sep 2015] New paper accepted in JVCIR

Authors: A. De Abreu, L. Toni, N. Thomos, T. Maugey, F. Pereira, P. Frossard

Title: Optimal Layered Representation for Adaptive Interactive Multiview Video Streaming accepted in Journal of Visual Communication and Image Representation (Elsevier) (J13)

Abstract: We consider an interactive multiview video streaming (IMVS) system where clients select their preferred viewpoint in a given navigation window. To provide high quality IMVS, many high quality views should be transmitted to the clients. However, this is not always possible due to the limited and heterogeneous capabilities of the clients. In this paper, we propose a novel adaptive IMVS solution based on a layered multiview representation where camera views are organized into layered subsets to match the different clients' constraints. We formulate an optimization problem for the joint selection of the view subsets and their encoding rates. Then, we propose an optimal algorithm and a greedy algorithm with reduced computational complexity, both based on dynamic programming. Simulation results show the good performance of our novel algorithms compared to a baseline algorithm, proving that an effective adaptive IMVS solution should consider the scene content as well as the clients' capabilities and navigation preferences.

[Jul. 2015] New TIP submission

Authors: Thomas Maugey, Giovanni Petrazzuoli, Pascal Frossard, Marco Cagnazzo, Béatrice Pesquet-Popescu

Title: Reference view selection in DIBR-based multiview coding submitted for review to IEEE Transactions on Image Processing

Abstract: Augmented reality, interactive navigation in 3D scenes, multiview video and other emerging multimedia applications require large sets of images, hence larger data volumes and increased resources compared to traditional video services. The significant increase of the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience in resource-constrained environments. In order to reduce the size of the data, different multiview video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the dataset. In such coding schemes, the two following questions become fundamental: i) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? ii) where to place these key views in the multiview dataset? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multiview coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms the traditional coding strategy that simply selects key frames based on the distance between cameras.

[Jun. 2015] New paper accepted in IEEE TMM

Authors: L. Toni, T. Maugey, P. Frossard

Title: Optimized Packet Scheduling in Multiview Video Navigation Systems accepted in IEEE Transactions on Multimedia (J12)

Abstract: In multiview video systems, multiple cameras generally acquire the same scene from different perspectives, such that users have the possibility to select their preferred viewpoint. This results in large amounts of highly redundant data, which need to be properly handled during encoding and transmission over resource-constrained channels. In this work, we study coding and transmission strategies in multicamera systems, where correlated sources send data through a bottleneck channel to a central server, which eventually transmits views to different interactive users. We propose a dynamic navigation-path aware packet scheduling optimization under delay, bandwidth, and interactivity constraints aimed at optimizing the quality-of-experience of interactive users. In particular, the scene distortion is minimized while the distortion variations along the most likely navigation paths are also minimized. The optimization relies both on a novel rate-distortion model, which captures the importance of each view in the scene reconstruction, and on an objective function that optimizes resources based on a client navigation model. The latter takes into account the distortion experienced by interactive clients as well as the distortion variations that might be observed by clients during multiview navigation. We solve the scheduling problem with a novel trellis-based solution, which makes it possible to formally decompose the multivariate optimization problem, thereby significantly reducing the computational complexity. Simulation results show the gain of the proposed algorithm compared to baseline scheduling policies. More specifically, we show the gain offered by our dynamic scheduling policy compared to static camera allocation strategies and to schemes with constant coding strategies. Finally, we show that the best scheduling policy consistently adapts to the most likely user navigation path and that it minimizes distortion variations that can be very disturbing for users in traditional navigation systems.

[May 2015] Two ICIP papers accepted

The following two papers have been accepted for presentation at ICIP 2015 in Quebec City.

Authors: T. Maugey, P. Frossard, C. Guillemot
Title: Guided Inpainting with cluster-based auxiliary information
Abstract: In this paper, we propose a new guided inpainting algorithm based on the exemplar-based approach in order to effectively fill in holes in image synthesis applications. Guided inpainting techniques can be very useful in settings where one has access to the ground truth information, as in most multiview coding applications. We propose a new auxiliary information based on patch clustering, which is used to refine the candidate exemplar set in the inpainting. For that purpose, a new recursive clustering method based on locally linear embedding (LLE) is introduced. We then design the guided inpainting solution based on LLE with clustered patches, which constrains the reconstruction to operate in one patch cluster only. The index of the appropriate cluster is considered as auxiliary information. Experimental results show that our clustering algorithm provides clusters that are well suited to the inpainting problem. They also show that the auxiliary information significantly improves the quality of the inpainted image for a small coding cost. This work is the first study to show that effective inpainting can be performed when the auxiliary information is properly adapted to the characteristics of both the hole and the known texture.

Authors: A. Roumy, T. Maugey
Title: Universal lossless coding with random user access: the cost of interactivity
Abstract: We consider the problem of video compression with free viewpoint interactivity. It is widely believed that allowing the user to choose its view will incur some loss in terms of compression efficiency. Here we derive the complete rate-storage region for universal lossless coding under the constraint of choosing the view at the receiver. Moreover, we show a counterintuitive result: freely choosing its view at the receiver incurs a loss in terms of storage only and not in the transmission rate. The gain of the optimal scheme with respect to interactive schemes proposed so far is derived and a practical scheme that achieves this gain is proposed.
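The "storage only" statement can be made concrete with a small numerical sketch. It rests on an assumption that the abstract does not spell out: that for a source X and a set of possible previously seen views Y_j, the transmission rate for a user holding Y_j is the conditional entropy H(X|Y_j), while the server stores enough bits for the worst case, max_j H(X|Y_j). The binary toy sources and the helper names below are invented for this illustration.

```python
import math


def conditional_entropy(joint):
    """H(X|Y) in bits for a joint pmf given as {(x, y): probability}."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return -sum(p * math.log2(p / p_y[y])
                for (_, y), p in joint.items() if p > 0)


def binary_symmetric_joint(crossover):
    """Joint pmf of a uniform bit X and Y equal to X flipped with prob. crossover."""
    return {(x, y): 0.5 * (crossover if x != y else 1.0 - crossover)
            for x in (0, 1) for y in (0, 1)}


# Two hypothetical views the user may already have when requesting X.
side_infos = {
    "close view": binary_symmetric_joint(0.05),
    "distant view": binary_symmetric_joint(0.25),
}

# Assumed form of the region: per-request rate is H(X|Y_j),
# storage is the worst case over all possible requests.
rates = {name: conditional_entropy(joint) for name, joint in side_infos.items()}
storage = max(rates.values())
print(rates, storage)
```

Under this assumption, a user coming from the close view downloads only about 0.29 bit per symbol, exactly what a non-interactive encoder tailored to that view would send, while the server keeps about 0.81 bit per symbol to be ready for the distant view as well.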