[Nov 2017] New journal paper accepted in IEEE TMM

Authors: R. Ma, T. Maugey, P. Frossard

Title: Optimized Data Representation for Interactive Multiview Navigation, accepted in IEEE Transactions on Multimedia, 2017

Abstract: In contrast to traditional media streaming services where a unique media content is delivered to different users, interactive multiview navigation applications enable users to choose their own viewpoints and freely navigate in a 3-D scene. The interactivity brings new challenges in addition to the classical rate-distortion trade-off, which considers only the compression performance and viewing quality. On the one hand, interactivity necessitates sufficient viewpoints for richer navigation; on the other hand, it requires low bandwidth and delay costs for smooth navigation during view transitions. In this paper, we formally describe the novel trade-offs posed by the navigation interactivity and classical rate-distortion criterion. Based on an original formulation, we look for the optimal design of the data representation by introducing novel rate and distortion models and practical solving algorithms. Experiments show that the proposed data representation method outperforms the baseline solution by providing lower resource consumption and higher visual quality in all navigation configurations, which confirms the potential of the proposed data representation in practical interactive navigation systems.

[Jul 2017] New journal paper accepted in IEEE TIP

Authors: C. Verleysen, T. Maugey, C. De Vleeschouwer, P. Frossard

Title: Wide baseline image-based rendering based on shape prior regularisation, accepted in IEEE Transactions on Image Processing, 2017

Abstract: We consider the synthesis of intermediate views of an object captured by two widely spaced and calibrated cameras. This problem is challenging because foreshortening effects and occlusions induce significant differences between the reference images when the cameras are far apart. That makes the association or disappearance/appearance of their pixels difficult to estimate. Our main contribution lies in disambiguating this ill-posed problem by making the interpolated views consistent with a plausible transformation of the object silhouette between the reference views. This plausible transformation is derived from an object-specific prior that consists of a nonlinear shape manifold learned from multiple previous observations of this object by the two reference cameras. The prior is used to estimate how the epipolar silhouette segments observed in the reference views evolve between those views. This information directly supports the definition of epipolar silhouette segments in the intermediate views, and the synthesis of textures in those segments. It makes it possible to reconstruct the Epipolar Plane Images (EPIs) and the continuum of views associated with the Epipolar Plane Image Volume, obtained by aggregating the EPIs. Experiments on synthetic and natural images show that our method preserves the object topology in intermediate views and deals effectively with the self-occluded regions and the severe foreshortening effect associated with wide-baseline camera configurations.

[Jul 2017] MMSP paper accepted

Authors: Thomas Maugey, Olivier Le Meur, Zhi Liu
Title: Saliency-based navigation in omnidirectional image, accepted in IEEE MMSP 2017
Abstract: Omnidirectional images describe the color information at a given position from all directions. Affordable 360° cameras have recently been developed, leading to an explosion of the 360° data shared on social networks. However, an omnidirectional image does not contain interesting content everywhere. Some parts of the image are indeed more likely to be looked at by users than others. Knowing these regions of interest might be useful for 360° image compression, streaming, retargeting or even editing. In this paper, a new approach based on 2D image saliency is proposed both to model the user navigation within a 360° image, and to detect which parts of an omnidirectional content might draw users’ attention.
Website: http://people.irisa.fr/Olivier.Le_Meur/publi/2017_MMSP/index.html

[Jun 2017] New MMSP submission

Authors: Thomas Maugey, Olivier Le Meur, Zhi Liu
Title: Saliency-based navigation in omnidirectional image, submitted to IEEE MMSP 2017
Abstract: Omnidirectional images describe the color information at a given position from all directions. Affordable 360° cameras have recently been developed, leading to an explosion of the 360° data shared on social networks. However, an omnidirectional image does not contain interesting content everywhere. Some parts of the image are indeed more likely to be looked at by users than others. Knowing these regions of interest might be useful for 360° image compression, streaming, retargeting or even editing. In this paper, a new approach based on 2D image saliency is proposed both to model the user navigation within a 360° image, and to detect which parts of an omnidirectional content might draw users’ attention.

[Mar 2017] New paper accepted in IEEE TIP

Authors: Xin Su, Thomas Maugey, Christine Guillemot

Title: Rate-distortion optimized graph-based representation for multiview images with complex camera configurations, accepted in IEEE Transactions on Image Processing, 2017

Abstract: Graph-Based Representation (GBR) has recently been proposed for describing color and geometry of multiview video content. The graph vertices represent the color information, while the edges represent the geometry information, i.e., the disparity, by connecting corresponding pixels in two camera views. In this paper, we generalize the GBR to multiview images with complex camera configurations. Compared with the existing GBR, the proposed representation can handle not only horizontal displacements of the cameras but also forward/backward translations, rotations, etc. However, contrary to the usual disparity that is a 2-dimensional vector (denoting horizontal and vertical displacements), each edge in GBR is represented by a one-dimensional disparity. This quantity can be seen as the disparity along an epipolar segment. In order to have a sparse (i.e., easy to code) graph structure, we propose a rate-distortion model to select the most meaningful edges. Hence the graph is constructed with “just enough” information for rendering the given predicted view. The experiments show that the proposed GBR allows high reconstruction quality with lower or equivalent coding rate compared with traditional depth-based representations.

[Feb 2017] New SPL submission

Authors: Thomas Maugey, Xin Su, Christine Guillemot
Title: Reference camera selection for virtual view synthesis, submitted to IEEE Signal Processing Letters
Abstract: View synthesis using image-based rendering algorithms relies on one or more reference images. These have to be as close as possible to the virtual view that is generated. The notion of “closeness” is straightforward when the virtual view is parallel to the reference ones. Indeed, the geometrical transformation between the cameras is then a simple translation, whose amplitude can be naturally measured by a norm. However, we show in this paper that when the camera trajectory becomes general (i.e., translation and rotation are involved), no intuitive distance metric exists. In that case, choosing the best reference camera for view synthesis becomes a difficult problem. Some similarity metrics have been proposed in the literature, but they rely on the scene, and are thus complex to calculate. In this paper, we propose a distance metric that only relies on the camera parameters, and that is thus very simple to compute. We then use that distance to formulate and solve a reference camera selection problem in a general camera configuration. The obtained results show that our distance leads to an efficient and accurate choice of the reference views compared to a “naive” Euclidean distance between camera parameters.

[Feb 2017] Two ICIP papers submitted

Two papers have been submitted to ICIP 2017:

Authors: Navid Mahmoudian Bidgoli, Thomas Maugey, Aline Roumy
Title: Correlation Model Selection for interactive video communication
Abstract: Interactive video communication has recently been proposed for multi-view videos. In this scheme, the server has to store the views as compactly as possible, while being able to transmit them independently to the users, who are allowed to navigate interactively among the views, hence requesting a subset of them. To achieve this goal, the compression must be done using model-based coding, in which the correlation between the predicted view generated on the user side and the original view has to be modeled by a statistical distribution. In this paper, we propose a framework for lossless coding to select, among a candidate set of models, the one that incurs the lowest extra rate cost to the system. Moreover, in cases where the depth image is available, we provide a method to estimate the correlation model.
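
Illustration (not taken from the paper): the idea of picking the candidate model with the lowest extra rate can be sketched as follows. The sketch assumes that the extra rate of a model is the gap between the cross-entropy of the residuals under that model and their empirical entropy, which is the usual penalty of an ideal entropy coder driven by a mismatched distribution. The residuals (original minus predicted pixel values), the two candidate families and all function names are hypothetical and only serve the example.

```python
import math
from collections import Counter


def laplace_pmf(scale, support):
    """Discretized Laplacian over an integer support (hypothetical candidate)."""
    weights = {k: math.exp(-abs(k) / scale) for k in support}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def gaussian_pmf(sigma, support):
    """Discretized zero-mean Gaussian over an integer support (hypothetical candidate)."""
    weights = {k: math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in support}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def empirical_entropy(residuals):
    """Entropy (bits/symbol) of the empirical distribution of the residuals."""
    counts = Counter(residuals)
    n = len(residuals)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cross_entropy(residuals, pmf):
    """Average code length (bits/symbol) when coding the residuals with model pmf."""
    return -sum(math.log2(pmf[r]) for r in residuals) / len(residuals)


def select_model(residuals, candidates):
    """Return the candidate name with the lowest extra rate, and all extra rates."""
    h = empirical_entropy(residuals)
    extra = {name: cross_entropy(residuals, pmf) - h
             for name, pmf in candidates.items()}
    best = min(extra, key=extra.get)
    return best, extra


if __name__ == "__main__":
    # Toy residuals between an original view and its prediction (made-up values).
    residuals = [0] * 50 + [1] * 20 + [-1] * 20 + [3] * 5 + [-4] * 5
    support = range(-8, 9)
    candidates = {
        "laplace": laplace_pmf(1.0, support),
        "gaussian": gaussian_pmf(1.0, support),
    }
    best, extra = select_model(residuals, candidates)
    print(best, extra)
```

The extra rate computed here is a Kullback-Leibler divergence, so it is never negative and reaches zero only when the model matches the empirical statistics of the residuals exactly.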

Authors: Xin Su, Mira Rizkallah, Thomas Maugey, Christine Guillemot
Title: Graph-based light fields representation and coding using geometry information (webpage)
Abstract: This paper describes a graph-based coding scheme for light fields (LF). It first adapts graph-based representations (GBR) to describe color and geometry information of LF. Graph connections describing scene geometry capture inter-view dependencies. They are used as the support of a weighted Graph Fourier Transform (wGFT) to encode disoccluded pixels. The quality of the LF reconstructed from the graph is enhanced by adding extra color information to the representation for a sub-set of sub-aperture images. Experiments show that the proposed scheme yields rate-distortion gains compared with HEVC based compression (directly compressing the LF as a video sequence by HEVC).

[Dec 2016] New TIT submission

Authors: Elsa Dupraz, Thomas Maugey, Aline Roumy, Michel Kieffer

Title: Rate-storage regions for Massive Random Access, submitted for review to IEEE Transactions on Information Theory (arXiv version)

Abstract: This paper introduces a new source coding paradigm called Massive Random Access (MRA). In MRA, a set of correlated sources is jointly encoded and stored on a server, and clients want to access only a subset of the sources. Since the number of simultaneous clients can be huge, the server is only authorized to extract a bitstream from the stored data: no re-encoding can be performed before the transmission of the specific client’s request. In this paper, we formally define the MRA framework and we introduce the notion of rate-storage region to characterize the performance of MRA. From an information theoretic analysis, we derive achievable rate-storage bounds for lossless source coding of i.i.d. and non-i.i.d. sources, and rate-storage distortion regions for Gaussian sources. We also show two practical implementations of MRA systems based on rate-compatible LDPC codes. Both the theoretical and the experimental results demonstrate that MRA systems can reach the same transmission rates as in traditional point-to-point source coding schemes, while having a reasonable storage cost overhead. These results constitute a breakthrough for many recent data transmission applications in which only a part of the data is requested by the clients.

[Jul 2016] New TIP submission

Authors: Xin Su, Thomas Maugey, Christine Guillemot

Title: Graph-based representation for multiview images with complex camera configurations, submitted for review to IEEE Transactions on Image Processing

Abstract: Graph-Based Representation (GBR) has recently been proposed for describing color and geometry of multiview video content. The graph vertices represent the color information, while the edges represent the geometry information, i.e., the disparity, by connecting corresponding pixels in two camera views. In this paper, we generalize the GBR to multiview images with complex camera configurations. Compared with the existing GBR, the proposed representation can handle not only horizontal displacements of the cameras but also forward/backward translations, rotations, etc. However, contrary to the usual disparity that is a 2-dimensional vector (denoting horizontal and vertical displacements), each edge in GBR is represented by a one-dimensional disparity. This quantity can be seen as the disparity along an epipolar segment. In order to have a sparse (i.e., easy to code) graph structure, we propose a rate-distortion model to select the most meaningful edges. Hence the graph is constructed with “just enough” information for rendering the given predicted view. The experiments show that the proposed GBR allows high reconstruction quality with lower or equivalent coding rate compared with traditional depth-based representations.