On June 27th, I defended my Habilitation à Diriger des Recherches (HDR). Thanks again to the jury members for their time and feedback.
The defense is available for replay at the following link: https://youtu.be/jQKk6xfEdCo
Title: Visual data compression: beyond conventional approaches
– Frédéric DUFAUX, DR CNRS (reviewer)
– Enrico MAGLI, Prof. Politecnico di Torino (reviewer)
– Marta MRAK, Prof. Queen Mary University & BBC (reviewer)
– Marc ANTONINI, DR CNRS
– Luce MORIN, Prof. INSA
– Dong TIAN, Senior Scientist, InterDigital
Authors: Tom Bachard, Anju Jose Tom, Thomas Maugey
Title: Semantic Alignment for Multi-Item Compression
Abstract: Coding algorithms usually compress the images of a collection independently, in particular when the correlation between them only resides at the semantic level (information related to the high-level image content). In this work, we propose a coding solution able to exploit this semantic redundancy to decrease the storage cost of a data collection. First, we introduce the multi-item compression framework. Then we derive a loss term to shape the latent space of a variational auto-encoder so that the latent vectors of semantically identical images can be aligned. Finally, we experimentally demonstrate that this alignment leads to a more compact representation of the data collection.
Title: Omni-NeRF: Neural Radiance Field from 360° image captures
Authors: K. Gu, T. Maugey, S. Knorr, C. Guillemot
Abstract: This paper tackles the problem of novel view synthesis (NVS) from 360° images with imperfect camera poses or intrinsic parameters. We propose a novel end-to-end framework for training Neural Radiance Field (NeRF) models given only 360° RGB images and their rough poses, which we refer to as Omni-NeRF. We extend the pinhole camera model of NeRF to a more general camera model that better fits omni-directional fisheye lenses. The approach jointly learns the scene geometry and optimizes the camera parameters without knowing the fisheye projection.
Title: Machine learning for multimedia communications
Authors: Nikolaos Thomos, Thomas Maugey, Laura Toni
Abstract: Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, some impressive efficiency/accuracy improvements have been reached across the whole transmission pipeline. For example, the high model capacity of learning-based architectures makes it possible to model image and video behavior accurately, so that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategies, and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes in the way data is represented or consumed, meaning that the overall pipeline can be affected even though only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
Grant: Young scientist ANR funding
Leader: Thomas Maugey
Abstract: Compression algorithms are nowadays overwhelmed by the tsunami of visual data created every day. Despite their growing efficiency, they are always constrained to minimize the compression error, computed in the pixel domain. The Data Repurposing framework, proposed in the MADARE project, will tear down this barrier by allowing the compression algorithm to “reinvent” part of the data at the decoding phase, thus saving a lot of bitrate by not coding it. Concretely, a data collection is only encoded to a compact description that is used to guarantee that the regenerated content is semantically coherent with the initial one. By revisiting the compression problem, the MADARE project aims at gigantic compression ratios that would, among other benefits, reduce the impact of exploding data creation on the cloud servers’ energy consumption.
More info: here
Title: Immersive Video Coding: Should Geometry Information be Transmitted as Depth Maps?
Authors: P. Garus, F. Henry, J. Jung, T. Maugey, C. Guillemot
Abstract: Immersive video often refers to multiple views with texture and scene geometry information, from which different viewports can be synthesized on the client side. To design efficient immersive video coding solutions, it is desirable to minimize bitrate, pixel rate and complexity. We investigate whether the classical approach of sending the geometry of a scene as depth maps is appropriate to serve this purpose. Previous work shows that bypassing depth transmission entirely and estimating depth at the client side improves the synthesis performance while saving bitrate and pixel rate. In order to understand if the encoder side depth maps contain information that is beneficial to be transmitted, we first explore a hybrid approach which enables partial depth map transmission using a block-based RD-based decision in the depth coding process.
This approach reveals that partial depth map transmission may improve the rendering performance but does not present a good compromise in terms of compression efficiency. This led us to address the remaining drawbacks of decoder side depth estimation: complexity and depth map inaccuracy. We propose a novel system that takes advantage of high quality depth maps at the server side by encoding them into lightweight features that support the depth estimator at the client side. These features allow reducing the amount of data that has to be handled during decoder side depth estimation by 88%, which significantly speeds up the cost computation and the energy minimization of the depth estimator. Furthermore, -46.0% and -37.9% average synthesis BD-Rate gains are achieved compared to the classical approach with depth maps estimated at the encoder.
Title: Rate-Distortion Optimized Graph Coarsening and Partitioning for Light Field Coding
Authors: M. Rizkallah, T. Maugey, C. Guillemot
Abstract: Graph-based transforms are powerful tools for signal representation and energy compaction. However, their use for high dimensional signals such as light fields poses obvious problems of complexity. To overcome this difficulty, one can consider local graph transforms defined on supports of limited dimension, which may however not allow us to fully exploit long-term signal correlation. In this paper, we present methods to optimize local graph supports in a rate distortion sense for efficient light field compression. A large graph support can be well adapted for compression efficiency, however at the expense of high complexity. In this case, we use graph reduction techniques to make the graph transform feasible. We also consider spectral clustering to reduce the dimension of the graph supports while controlling both rate and complexity. We derive the distortion and rate models which are then used to guide the graph optimization. We describe a complete light field coding scheme based on the proposed graph optimization tools. Experimental results show rate-distortion performance gains compared to the use of fixed graph support. The method also provides competitive results when compared against HEVC-based and the JPEG Pleno light field coding schemes. We also assess the method against a homography-based low rank approximation and a Fourier disparity layer based coding method.
Title: Rate-distortion optimized motion estimation for on-the-sphere compression of 360 videos
Authors: Alban Marie, Navid Mahmoudian Bidgoli, Thomas Maugey, Aline Roumy
Abstract: On-the-sphere compression of omnidirectional videos is a very promising approach. First, it saves computational complexity because it avoids projecting the sphere onto a 2D map, as is classically done. Second, and more importantly, it achieves a better rate-distortion tradeoff, since neither the visual data nor its domain of definition is distorted. In this paper, the on-the-sphere compression for omnidirectional still images is extended to videos. We first propose a complete review of existing spherical motion models. Then we propose a new one called tangent-linear+t. We finally propose a rate-distortion optimized algorithm to locally choose the best motion model for efficient motion estimation/compensation. For that purpose, we additionally propose a finer search pattern, called spherical-uniform, for the motion parameters, which leads to a more accurate block prediction. The novel algorithm leads to rate-distortion gains compared to methods based on a unique motion model.
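As an illustration of what a rate-distortion optimized choice between motion models can look like, here is a minimal Python sketch based on the standard Lagrangian cost J = D + λR. It is not the algorithm of the paper: the function name `select_motion_model`, the placeholder models `model_a` and `model_b`, the distortion and rate figures, and the values of λ are all assumptions made up for the example.

```python
from typing import Dict, Tuple


def select_motion_model(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the candidate with the smallest Lagrangian cost D + lam * R.

    `candidates` maps a model name to a (distortion, rate_in_bits) pair
    measured for one block.
    """
    return min(
        candidates,
        key=lambda name: candidates[name][0] + lam * candidates[name][1],
    )


# Hypothetical per-block measurements (distortion, rate in bits).
# These numbers are invented for illustration, not taken from the paper.
block_candidates = {
    "model_a": (120.0, 40.0),
    "model_b": (100.0, 70.0),
    "tangent-linear+t": (90.0, 60.0),
}

# A small lambda favors low distortion, a large lambda favors low rate.
low_lambda_choice = select_motion_model(block_candidates, lam=0.5)
high_lambda_choice = select_motion_model(block_candidates, lam=5.0)
print(low_lambda_choice, high_lambda_choice)
```

The decision is taken independently for each block, so different regions of the sphere may end up using different motion models.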
Title: Bit-Plane Coding in Extractable Source Coding: optimality, modeling, and application to 360° data
Authors: Fangping Ye, Navid Mahmoudian Bidgoli, Elsa Dupraz, Aline Roumy, Karine Amis, Thomas Maugey
Abstract: In extractable source coding, multiple correlated sources are jointly compressed but can be individually accessed in the compressed domain. Performance is measured in terms of storage and transmission rates. This problem has multiple applications in interactive video compression such as Free Viewpoint Television or navigation in 360° videos. In this paper, we analyze and improve a practical coding scheme. We consider a binarized coding scheme, which ensures a low decoding complexity. First, we show that binarization does not impact the transmission rate and only slightly impacts the storage rate compared with a symbol-based approach. Second, we propose a Q-ary symmetric model to represent the pairwise joint distribution of the sources instead of the widely used Laplacian model. Third, we introduce a novel pre-estimation strategy, which makes it possible to infer the symbols of some bit planes without any additional data and therefore reduces the storage and transmission rates. In the context of 360° images, the proposed scheme saves 14% and 34% of the storage and transmission rates, respectively.
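To make the binarization step more concrete, the following Python sketch splits integer symbols into bit planes and rebuilds them. It only illustrates the generic notion of bit-plane decomposition; the function names, the 3-bit depth and the sample values are assumptions for the example, and the actual coding of each plane described in the paper is not reproduced here.

```python
from typing import List


def to_bit_planes(symbols: List[int], num_planes: int) -> List[List[int]]:
    """Split non-negative integers into bit planes, most significant plane first."""
    if any(s < 0 or s >= (1 << num_planes) for s in symbols):
        raise ValueError("every symbol must fit in num_planes bits")
    return [
        [(s >> p) & 1 for s in symbols]
        for p in range(num_planes - 1, -1, -1)
    ]


def from_bit_planes(planes: List[List[int]]) -> List[int]:
    """Rebuild the integers from bit planes ordered most significant first."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | b for v, b in zip(values, plane)]
    return values


# Hypothetical 3-bit samples, chosen only for the example.
pixels = [5, 0, 7, 2]
planes = to_bit_planes(pixels, num_planes=3)
restored = from_bit_planes(planes)
print(planes, restored)
```

Each plane is a binary sequence, so it can be handled by a binary code, which is the kind of structure that keeps decoding complexity low.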
Title: Large Database Compression Based on Perceived Information
Authors: Thomas Maugey and Laura Toni
Abstract: Lossy compression algorithms trade bits for quality, aiming at reducing as much as possible the bitrate needed to represent the original source (or set of sources), while preserving the source quality. In this letter, we propose a novel paradigm of compression algorithms, aimed at minimizing the information loss perceived by the final user instead of the actual source quality loss, under compression rate constraints.
As our main contributions, we first introduce the concept of perceived information (PI), which reflects the information perceived by a given user experiencing a data collection, and which is evaluated as the volume spanned by the sources' features in a personalized latent space.
We then formalize the rate-PI optimization problem and propose an algorithm to solve this compression problem. Finally, we validate our algorithm against benchmark solutions with simulation results, showing the gain in taking into account users’ preferences while also maximizing the perceived information in the feature domain.
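The idea of a volume spanned by feature vectors can be pictured with the Gram determinant: the volume of the parallelotope spanned by a set of vectors is the square root of the determinant of their Gram matrix. The Python sketch below computes that quantity. It is only a geometric illustration under that assumption; the exact definition of PI in the letter, including how the latent space is personalized, may differ, and the function names and feature values are hypothetical.

```python
from typing import List


def gram_matrix(features: List[List[float]]) -> List[List[float]]:
    """Matrix of pairwise inner products between feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in features] for u in features]


def determinant(matrix: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix: the vectors are linearly dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def spanned_volume(features: List[List[float]]) -> float:
    """Volume of the parallelotope spanned by the feature vectors."""
    return max(determinant(gram_matrix(features)), 0.0) ** 0.5


# Hypothetical 2-D features: two distinct items versus two redundant items.
diverse = spanned_volume([[2.0, 0.0], [0.0, 3.0]])
redundant = spanned_volume([[1.0, 0.0], [1.0, 0.0]])
print(diverse, redundant)
```

With this measure, a collection whose items have nearly identical features spans almost no volume, which matches the intuition that redundant items add little perceived information.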