[June 2016] Guest Editor for an MMTC letter

Journal: Special Issue on Interactive Multi-view Video Services: from acquisition to Rendering, IEEE Multimedia Communication Technical Committee letters, Vol. 11(2), March 2016
Guest Editors: Erhan Ekmekcioglu, Thomas Maugey, Laura Toni

List of papers:
Merge Frame for Interactive Multiview Video Navigation
Gene Cheung, Ngai-Man Cheung
National Institute of Informatics, Tokyo, Japan
Singapore University of Technology and Design
An Information theoretical problem in interactive Multi-View Video services
Aline Roumy
Inria, Rennes, France
Free Viewpoint Video Streaming: Concepts, Techniques and Challenges
Árpád Huszák
Budapest University of Technology and Economics, Budapest, Hungary
Quality Assessment in the context of FTV: challenges, first answers and open issues
Federica Battisti, Patrick Le Callet
Roma Tre University, Rome, Italy
IRCCyN UMR CNRS, Polytech Nantes, France
3D Visual Attention for Improved Interaction, Quality Evaluation and Enhancement
Chaminda T.E.R. Hewage
Department of Computing & Information Systems, Cardiff Metropolitan University, Cardiff, UK
RE@CT: Immersive Production and Delivery of Interactive 3D Content
Marco Volino, Dan Casas, John Collomosse and Adrian Hilton
Centre for Vision, Speech and Signal Processing, University of Surrey, UK
SceneNet: Crowd Sourcing of Audio Visual Information aiming to create 4D video streams
D. Eilot, Y. Schoenenberger, A. Egozi, E. Hirsch, T. Ben Nun, Y. Appelbaum-Elad, J. Gildenblat, E. Rubin, P. Maass, P. Vandergheynst, C. Sagiv
SagivTech Ltd.
EPFL
University of Bremen

 

[June 2016] New paper accepted at EUSIPCO 2016

The following paper has been accepted for presentation at EUSIPCO 2016 in Budapest, Hungary.

Authors:  M. Rizkallah, T. Maugey, C. Yaacoub, C. Guillemot
Title: Impact of Light Field Compression on Focus Stack and Extended Focus Images
Abstract: Light Fields capturing all light rays at every point in space and in all directions contain very rich information about the scene. This rich description of the scene enables advanced image creation capabilities, such as re-focusing or extended depth of field from a single capture. But, it yields to a very high volume of data which needs compression. This paper studies the impact of Light Fields compression on two key functionalities: refocusing and extended focus. The sub-aperture images forming the Light Field are compressed as a video sequence with HEVC. A focus stack and the scene depth map are computed from the compressed light field and are used to render an image with an extended depth of field (called the extended focus image). It has been first observed that the Light Field could be compressed with a factor up to 700 without significantly affecting the visual quality of both refocused and extended focus images. To further analyze the compression effect, a dedicated quality evaluation method based on contrast and gradient measurements is considered to differentiate the natural geometrical blur from the blur resulting from compression. As a second part of the experiments, it is shown that the texture distortion of the in-focus regions in the focus stacks is the main cause of the quality degradation in the extended focus and that the depth errors do not impact the extended focus quality unless the light field is significantly distorted with a compression ratio of around 2000:1.

[May 2016] New paper accepted at ICIP 2016

The following paper has been accepted for presentation at ICIP 2016 in Phoenix, Arizona, US.

Authors:  X. Su, T. Maugey,  C. Guillemot
Title: Graph-based Representation for Multiview Images With Complex Camera Configurations
Abstract: Graph-Based Representation (GBR) has recently been proposed for rectified multiview dataset. The core idea of GBR is to use graphs for describing the color and geometry information of a multiview dataset. The color information is represented by the vertices of the graph while the scene geometry is represented by the edges of the graph. In this paper, we generalize the GBR to multi-view images with complex camera configurations. Compared with previous work, the GBR representation introduced in this paper can handle not only horizontal displacements of the cameras but also forward/backward displacements, rotations etc. In order to have a sparse (i.e., easy to code) graph structure, we further propose to use a distortion metric to select the most meaningful connections. For the graph transmission, each selected connection is then replaced by a disparity-based quantity. Hence, the graph is constructed with just enough  information for rendering the given predicted view. The experiments show that the proposed GBR achieves high reconstructing quality with less or comparable coding rate compared with traditional depth-based representations, that directly compress the depth signal without considering the rendering task.

[Jan 2016] New paper accepted in IEEE TIP

Authors: T. Maugey, G. Petrazzuoli, P. Frossard, M. Cagnazzo, B. Pesquet-Popescu

Title: Reference view selection in DIBR-based multiview coding accepted in IEEE Transactions on Image Processing (J15)

Abstract: Augmented reality, interactive navigation in 3D scenes, multiview video and other emerging multimedia applications require large sets of images hence larger data volumes and increased resources compared to traditional video services. The significant increase of the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience on resource-constrained environments. In order to reduce the size of the data, different multi view video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the dataset. In such coding schemes, the two following questions become fundamental: i) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? ii) where to place these key views in the multiview dataset? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multi view coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms traditional coding strategy that simply selects key frames based on the distance between cameras.
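
The shortest-path idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper: it assumes the views lie on a 1D camera arrangement, that the first and last views are always references, and that the cost of a solution splits into a per-reference rate cost plus a distortion term for the views synthesized between two consecutive references. The names `select_reference_views`, `ref_cost`, `synth_distortion` and `lam` are hypothetical, and the toy cost functions in the usage example stand in for the similarity metric of the paper.

```python
from typing import Callable, List, Tuple


def select_reference_views(
    num_views: int,
    ref_cost: Callable[[int], float],
    synth_distortion: Callable[[int, int], float],
    lam: float = 1.0,
) -> Tuple[float, List[int]]:
    """Pick reference views by a shortest path over a DAG of view indices.

    Nodes are view indices 0..num_views-1. An edge i -> j (i < j) means
    "i and j are consecutive reference views"; its weight is the rate cost
    of coding view j (scaled by lam) plus the distortion of the views
    strictly between i and j, which are synthesized from i and j.
    Assumption of this sketch: views 0 and num_views-1 are always references.
    """
    inf = float("inf")
    best = [inf] * num_views   # best[j]: cheapest cost with j as last reference
    prev = [-1] * num_views    # back-pointers to rebuild the chosen references
    best[0] = lam * ref_cost(0)

    # Indices are already in topological order, so one forward pass suffices.
    for j in range(1, num_views):
        for i in range(j):
            cost = best[i] + lam * ref_cost(j) + synth_distortion(i, j)
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    # Walk the back-pointers from the last view to the first one.
    refs: List[int] = []
    k = num_views - 1
    while k != -1:
        refs.append(k)
        k = prev[k]
    refs.reverse()
    return best[-1], refs


if __name__ == "__main__":
    # Toy costs (assumed): constant rate per reference, and a distortion
    # that grows quadratically with the number of skipped views.
    total, refs = select_reference_views(
        9,
        ref_cost=lambda j: 10.0,
        synth_distortion=lambda i, j: float((j - i - 1) ** 2),
    )
    print(refs, total)  # prints: [0, 4, 8] 48.0
```

Because the path length is not fixed in advance, the number of reference views and their positions both come out of the same search, which is the property the abstract highlights.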

[Oct 2015] New paper accepted in IEEE TIP

Authors: Y. Gao, G. Cheung, T. Maugey, P. Frossard, J. Liang

Title: Encoder-Driven Inpainting Strategy in Multiview Video Compression accepted in IEEE Transactions on Image Processing (J14)

Abstract: In free viewpoint video systems, a user has the freedom to select a virtual view from which an image of the 3D scene is rendered, and the scene is commonly represented by texture and depth images of multiple nearby viewpoints. In such representation, there exists data redundancy across multiple dimensions: a 3D voxel may be represented by pixels in multiple viewpoint images (inter-view redundancy), a pixel patch may recur in a distant spatial region of the same image due to self-similarity (inter-patch redundancy), and pixels in a local spatial region tend to be similar (inter-pixel redundancy). It is important to exploit these redundancies during inter-view prediction towards effective multiview video compression. In this paper, we propose an encoder-driven inpainting strategy for inter-view predictive coding, where explicit instructions are transmitted minimally, and the decoder is left to independently recover remaining missing data via inpainting, resulting in lower coding overhead. Specifically, after pixels in a reference view are projected to a target view via depth-image-based rendering (DIBR) at the decoder, the remaining holes in the target view are filled via an inpainting process in a block-by-block manner. First, blocks are ordered in terms of difficulty-to-inpaint by the decoder. Then, explicit instructions are only sent for the reconstruction of the most difficult blocks. In particular, the missing pixels are explicitly coded via a graph Fourier transform (GFT) or a sparsification procedure using DCT, leading to low coding cost. For blocks that are easy to inpaint, the decoder independently completes missing pixels via template-based inpainting.
We apply our proposed scheme to frames in a prediction structure defined by JCT-3V where inter-view prediction is dominant, and experimentally we show that our scheme achieves up to 3 dB gain in PSNR in reconstructed image quality over a comparable 3D-HEVC implementation using fixed 16 × 16 block size.

[Sep 2015] New paper accepted in JVCIR

Authors: A. De Abreu, L. Toni, N. Thomos, T. Maugey, F. Pereira, P. Frossard

Title: Optimal Layered Representation for Adaptive Interactive Multiview Video Streaming accepted in Journal of Visual Communication and Image Representation (Elsevier) (J13)

Abstract: We consider an interactive multiview video streaming (IMVS) system where clients select their preferred viewpoint in a given navigation window. To provide high quality IMVS, many high quality views should be transmitted to the clients. However, this is not always possible due to the limited and heterogeneous capabilities of the clients. In this paper, we propose a novel adaptive IMVS solution based on a layered multiview representation where camera views are organized into layered subsets to match the different clients constraints. We formulate an optimization problem for the joint selection of the views subsets and their encoding rates. Then, we propose an optimal and a reduced computational complexity greedy algorithms, both based on dynamic-programming. Simulation results show the good performance of our novel algorithms compared to a baseline algorithm, proving that an effective IMVS adaptive solution should consider the scene content and the client capabilities and their preferences in navigation.

[Sep. 2015] Attending GRETSI and ICIP

In September 2015, I am attending the GRETSI and ICIP conferences, which take place in Lyon (France) and Quebec (Canada), respectively.

I present three papers:

◊ A. Roumy, T. Maugey, Compression et interactivité : étude de la navigation au récepteur, GRETSI, Lyon, France, Sep. 2015.

◊ T. Maugey, P. Frossard, C. Guillemot, Guided inpainting with cluster-based auxiliary information, IEEE ICIP, Quebec, Canada, Sep. 2015.

◊ A. Roumy, T. Maugey, Universal lossless coding with random user access: the cost of interactivity, IEEE ICIP, Quebec, Canada, Sep. 2015. Top 10% papers

[Jul. 2015] New TIP submission

Authors: Thomas Maugey, Giovanni Petrazzuoli, Pascal Frossard, Marco Cagnazzo, Béatrice Pesquet-Popescu

Title: Reference view selection in DIBR-based multiview coding submitted for review to IEEE Transactions on Image Processing

Abstract: Augmented reality, interactive navigation in 3D scenes, multiview video and other emerging multimedia applications require large sets of images hence larger data volumes and increased resources compared to traditional video services. The significant increase of the number of images in multiview systems leads to new challenging problems in data representation and data transmission to provide high quality of experience on resource-constrained environments. In order to reduce the size of the data, different multi view video compression strategies have been proposed recently. Most of them use the concept of reference or key views that are used to estimate other images when there is high correlation in the dataset. In such coding schemes, the two following questions become fundamental: i) how many reference views have to be chosen for keeping a good reconstruction quality under coding cost constraints? ii) where to place these key views in the multiview dataset? As these questions are largely overlooked in the literature, we study the reference view selection problem and propose an algorithm for the optimal selection of reference views in multiview coding systems. Based on a novel metric that measures the similarity between the views, we formulate an optimization problem for the positioning of the reference views such that both the distortion of the view reconstruction and the coding rate cost are minimized. We solve this new problem with a shortest path algorithm that determines both the optimal number of reference views and their positions in the image set. We experimentally validate our solution in a practical multiview distributed coding system and in the standardized 3D-HEVC multi view coding scheme. We show that considering the 3D scene geometry in the reference view positioning problem brings significant rate-distortion improvements and outperforms traditional coding strategy that simply selects key frames based on the distance between cameras.

[Jun. 2015] New paper accepted in IEEE TMM

Authors: L. Toni, T. Maugey, P. Frossard

Title: Optimized Packet Scheduling in Multiview Video Navigation Systems accepted in IEEE Transactions on Multimedia (J12)

Abstract: In multiview video systems, multiple cameras generally acquire the same scene from different perspectives, such that users have the possibility to select their preferred viewpoint. This results in large amounts of highly redundant data, which needs to be properly handled during encoding and transmission over resource-constrained channels. In this work, we study coding and transmission strategies in multicamera systems, where correlated sources send data through a bottleneck channel to a central server, which eventually transmits views to different interactive users. We propose a dynamic navigation-path aware packet scheduling optimization under delay, bandwidth, and interactivity constraints aimed at optimizing the quality-of-experience of interactive users. In particular, the scene distortion is minimized while also the distortion variations along most likely navigation paths is minimized. The optimization relies both on a novel rate-distortion model, which captures the importance of each view in the scene reconstruction, and on an objective function that optimizes resources based on a client navigation model. The latter takes into account the distortion experienced by interactive clients as well as the distortion variations that might be observed by clients during multiview navigation. We solve the scheduling problem with a novel trellis-based solution, which permits to formally decompose the multivariate optimization problem thereby significantly reducing the computation complexity. Simulation results show the gain of the proposed algorithm compared to baseline scheduling policies. More in details, we show the gain offered by our dynamic scheduling policy compared to static camera allocation strategies and to schemes with constant coding strategies.
Finally, we show that the best scheduling policy consistently adapts to the most likely user navigation path and that it minimizes distortion variations that can be very disturbing for users in traditional navigation systems.

[May 2015] Two ICIP papers accepted

The following two papers have been accepted for presentation at ICIP 2015 in Quebec City.

Authors:  T. Maugey,  P. Frossard, C. Guillemot
Title: Guided Inpainting with cluster-based auxiliary information
Abstract: In this paper, we propose a new guided inpainting algorithm based on the exemplar-based approach in order to effectively fill in holes in image synthesis applications. Guided inpainting techniques can be very useful in settings where one has access to the ground truth information like most multiview coding applications. We propose a new auxiliary information based on patch clustering, which is used to refine the candidate exemplar set in the inpainting. For that purpose, a new recursive clustering method based on locally linear embedding (LLE) is introduced. We then design the guided inpainting solution based on LLE with clustered patches, which constrains the reconstruction to operate in one patch cluster only. The index of the appropriate cluster is considered as auxiliary information. Experimental results show that our clustering algorithm provides clusters that are well suited to the inpainting problem. They also show that the auxiliary information enables to significantly improve the quality of the inpainted image for a small coding cost. This work is the first study to show that effective inpainting can be performed when the auxiliary information is properly adapted to the characteristics of both the hole and the known texture.

Authors:  A. Roumy, T. Maugey
Title: Universal lossless coding with random user access: the cost of interactivity
Abstract: We consider the problem of video compression with free viewpoint interactivity. It is well believed that allowing the user to choose its view will incur some loss in terms of compression efficiency. Here we derive the complete rate-storage region for universal lossless coding under the constraint of choosing the view at the receiver. Moreover we show a counterintuitive result: freely choosing its view at the receiver incurs a loss in terms of storage only and not in the transmission rate. The gain of the optimal scheme with respect to interactive schemes proposed so far is derived and a practical scheme that achieves this gain is proposed.