On the 27th of June, I defended my Habilitation à Diriger des Recherches (HDR). Thanks again to the jury members for their time and feedback.
The defense is available for replay at the following link: https://youtu.be/jQKk6xfEdCo
Title: Visual data compression: beyond conventional approaches
– Frédéric DUFAUX, DR CNRS (reviewer)
– Enrico MAGLI, Prof. Politecnico di Torino (reviewer)
– Marta MRAK, Prof. Queen Mary University & BBC (reviewer)
– Marc ANTONINI, DR CNRS
– Luce MORIN, Prof. INSA
– Dong TIAN, Senior Scientist, InterDigital
Authors: Tom Bachard, Anju Jose Tom, Thomas Maugey
Title: Semantic Alignment for Multi-Item Compression
Abstract: Coding algorithms usually compress the images of a collection independently, in particular when the correlation between them resides only at the semantic level (information related to the high-level image content). In this work, we propose a coding solution able to exploit this semantic redundancy to decrease the storage cost of a data collection. First, we introduce the multi-item compression framework. Then, we derive a loss term to shape the latent space of a variational auto-encoder so that the latent vectors of semantically identical images can be aligned. Finally, we experimentally demonstrate that this alignment leads to a more compact representation of the data collection.
Title: Omni-NeRF: Neural Radiance Field from 360° image captures
Authors: K. Gu, T. Maugey, S. Knorr, C. Guillemot
Abstract: This paper tackles the problem of novel view synthesis (NVS) from 360° images with imperfect camera poses or intrinsic parameters. We propose a novel end-to-end framework for training Neural Radiance Field (NeRF) models given only 360° RGB images and their rough poses, which we refer to as Omni-NeRF. We extend the pinhole camera model of NeRF to a more general camera model that better fits omni-directional fish-eye lenses.
The approach jointly learns the scene geometry and optimizes the camera parameters without knowing the fisheye projection.
Grant: Young scientist ANR funding
Leader: Thomas Maugey
Abstract: Compression algorithms are nowadays overwhelmed by the tsunami of visual data created every day. Despite their growing efficiency, they are always constrained to minimize the compression error, computed in the pixel domain. The Data Repurposing framework, proposed in the MADARE project, will tear down this barrier by allowing the compression algorithm to “reinvent” part of the data at the decoding phase, thus saving a lot of bit-rate by not coding it. Concretely, a data collection is only encoded into a compact description that is used to guarantee that the regenerated content is semantically coherent with the initial one. By revisiting the compression problem, the MADARE project aims at gigantic compression ratios, making it possible, among other benefits, to reduce the impact of exploding data creation on the energy consumption of cloud servers.
More info: here
Title: Immersive Video Coding: Should Geometry Information be Transmitted as Depth Maps?
Authors: P. Garus, F. Henry, J. Jung, T. Maugey, C. Guillemot
Abstract: Immersive video often refers to multiple views with texture and scene geometry information, from which different viewports can be synthesized on the client side. To design efficient immersive video coding solutions, it is desirable to minimize bitrate, pixel rate and complexity. We investigate whether the classical approach of sending the geometry of a scene as depth maps is appropriate to serve this purpose. Previous work shows that bypassing depth transmission entirely and estimating depth at the client side improves the synthesis performance while saving bitrate and pixel rate. In order to understand if the encoder side depth maps contain information that is beneficial to be transmitted, we first explore a hybrid approach which enables partial depth map transmission using a block-based RD-based decision in the depth coding process.
This approach reveals that partial depth map transmission may improve the rendering performance but does not present a good compromise in terms of compression efficiency. This led us to address the remaining drawbacks of decoder side depth estimation: complexity and depth map inaccuracy. We propose a novel system that takes advantage of high quality depth maps at the server side by encoding them into lightweight features that support the depth estimator at the client side. These features allow reducing the amount of data that has to be handled during decoder side depth estimation by 88%, which significantly speeds up the cost computation and the energy minimization of the depth estimator. Furthermore, -46.0% and -37.9% average synthesis BD-Rate gains are achieved compared to the classical approach with depth maps estimated at the encoder.
Title: Rate-Distortion Optimized Graph Coarsening and Partitioning for Light Field Coding
Authors: M. Rizkallah, T. Maugey, C. Guillemot
Abstract: Graph-based transforms are powerful tools for signal representation and energy compaction. However, their use for high dimensional signals such as light fields poses obvious problems of complexity. To overcome this difficulty, one can consider local graph transforms defined on supports of limited dimension, which may however not allow us to fully exploit long-term signal correlation. In this paper, we present methods to optimize local graph supports in a rate distortion sense for efficient light field compression. A large graph support can be well adapted for compression efficiency, however at the expense of high complexity. In this case, we use graph reduction techniques to make the graph transform feasible. We also consider spectral clustering to reduce the dimension of the graph supports while controlling both rate and complexity. We derive the distortion and rate models which are then used to guide the graph optimization. We describe a complete light field coding scheme based on the proposed graph optimization tools. Experimental results show rate-distortion performance gains compared to the use of fixed graph support. The method also provides competitive results when compared against HEVC-based and the JPEG Pleno light field coding schemes. We also assess the method against a homography-based low rank approximation and a Fourier disparity layer based coding method.
Title: Rate-distortion optimized motion estimation for on-the-sphere compression of 360 videos
Authors: Alban Marie, Navid Mahmoudian Bidgoli, Thomas Maugey, Aline Roumy
Abstract: On-the-sphere compression of omnidirectional videos is a very promising approach. First, it saves computational complexity, as it avoids projecting the sphere onto a 2D map, as is classically done. Second, and more importantly, it achieves a better rate-distortion tradeoff, since neither the visual data nor its domain of definition is distorted. In this paper, the on-the-sphere compression for omnidirectional still images is extended to videos. We first propose a complete review of existing spherical motion models. Then we propose a new one called tangent-linear+t. We finally propose a rate-distortion optimized algorithm to locally choose the best motion model for efficient motion estimation/compensation. For that purpose, we additionally propose a finer search pattern, called spherical-uniform, for the motion parameters, which leads to a more accurate block prediction. The novel algorithm leads to rate-distortion gains compared to methods based on a unique motion model.
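As a rough illustration of what "locally choosing the best motion model in a rate-distortion sense" can mean, here is a minimal sketch of a standard Lagrangian selection, which picks the candidate minimizing J = D + λR for one block. The function name, the model names `model_a` and `model_b`, and all distortion and rate values are hypothetical placeholders, not taken from the paper, and the paper's actual algorithm may differ.

```python
def choose_motion_model(candidates, lam):
    """Return the name of the candidate minimizing D + lam * R.

    candidates maps a (hypothetical) model name to a pair
    (distortion, rate_in_bits) measured for one block.
    """
    return min(
        candidates,
        key=lambda name: candidates[name][0] + lam * candidates[name][1],
    )


# Made-up measurements for a single block: model_b predicts better
# (lower distortion) but costs more bits to signal its parameters.
block_candidates = {
    "model_a": (100.0, 10),
    "model_b": (60.0, 40),
}

low_lambda_choice = choose_motion_model(block_candidates, lam=0.5)
high_lambda_choice = choose_motion_model(block_candidates, lam=2.0)
```

With a small λ the more accurate but more expensive model wins; with a large λ the cheaper model is preferred.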
Title: Bit-Plane Coding in Extractable Source Coding: optimality, modeling, and application to 360° data
Authors: Fangping Ye, Navid Mahmoudian Bidgoli, Elsa Dupraz, Aline Roumy, Karine Amis, Thomas Maugey
Abstract: In extractable source coding, multiple correlated sources are jointly compressed but can be individually accessed in the compressed domain. Performance is measured in terms of storage and transmission rates. This problem has multiple applications in interactive video compression, such as Free Viewpoint Television or navigation in 360° videos. In this paper, we analyze and improve a practical coding scheme. We consider a binarized coding scheme, which ensures a low decoding complexity. First, we show that, with respect to a symbol-based approach, binarization does not impact the transmission rate and only slightly impacts the storage rate. Second, we propose a Q-ary symmetric model to represent the pairwise joint distribution of the sources instead of the widely used Laplacian model. Third, we introduce a novel pre-estimation strategy, which makes it possible to infer the symbols of some bit planes without any additional data and therefore reduces the storage and transmission rates. In the context of 360° images, the proposed scheme saves 14% and 34% bitrate in storage and transmission rates respectively.
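For readers unfamiliar with bit-plane binarization, the sketch below shows only the generic decomposition of non-negative integer symbols into binary planes and the inverse operation. The function names are illustrative assumptions; the channel coding, the Q-ary symmetric model, and the pre-estimation strategy of the paper are not represented here.

```python
def to_bit_planes(symbols, num_planes):
    """Split non-negative integers into bit planes, most significant first."""
    return [
        [(s >> b) & 1 for s in symbols]
        for b in range(num_planes - 1, -1, -1)
    ]


def from_bit_planes(planes):
    """Rebuild the integer symbols from planes ordered most significant first."""
    symbols = [0] * len(planes[0])
    for plane in planes:
        # Shift the partial value left and append the bit of this plane.
        symbols = [(s << 1) | bit for s, bit in zip(symbols, plane)]
    return symbols


symbols = [5, 0, 3, 7]                # 3-bit symbols
planes = to_bit_planes(symbols, 3)    # one binary sequence per plane
recovered = from_bit_planes(planes)
```

Each plane is a binary sequence that can be coded with a binary code, which is what keeps the decoding complexity low in a binarized scheme.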
Title: Fine granularity access in interactive compression of 360-degree images based on rate adaptive channel codes
Authors: N. Mahmoudian-Bidgoli, T. Maugey, A. Roumy
Abstract: In this paper, we propose a new interactive compression scheme for omnidirectional images. This requires two characteristics: efficient compression of data, to lower the storage cost, and random access ability to extract the part of the compressed stream requested by the user (to reduce the transmission rate). For efficient compression, data needs to be predicted by a series of references that have been pre-defined and compressed. This contrasts with the spirit of random accessibility. We propose a solution to this problem based on incremental codes implemented by rate adaptive channel codes. This scheme encodes the image while adapting to any user request and leads to an efficient coding that is flexible in extracting data depending on the available information at the decoder. Therefore, only the information that needs to be displayed at the user’s side is transmitted during the user’s request, as if the request was already known at the encoder. The experimental results demonstrate that our coder achieves a better transmission rate than the state-of-the-art tile-based methods at a small cost in storage. Moreover, the transmission rate grows gradually with the size of the request and avoids a staircase effect, which shows the suitability of our coder for interactive transmission.
Title: Optimal reference selection for random access in predictive coding schemes
Authors: M. Q. Pham, A. Roumy, T. Maugey, E. Dupraz, M. Kieffer
Abstract: Data acquired over long periods of time, such as High Definition (HD) videos or records from a sensor over long time intervals, have to be efficiently compressed to reduce their size. The compression must also allow efficient access to random parts of the data upon request from the users. Efficient compression is usually achieved with prediction between data points at successive time instants. However, this creates dependencies between the compressed representations, which is contrary to the idea of random access. Prediction methods rely in particular on reference data points, used to predict other data points, and the placement of these references balances compression efficiency and random access. Existing solutions to position the references use ad hoc methods. In this paper, we study this joint problem of compression efficiency and random access. We introduce the storage cost as a measure of the compression efficiency and the transmission cost as a measure of the random access ability. We show that the reference placement problem that trades off storage with transmission cost is an integer linear programming problem, which can be solved by a standard optimizer. Moreover, we show that the classical periodic placement of the references is optimal when the encoding costs of each data point are equal and when requests of successive data points are made. In this particular case, a closed form expression of the optimal period is derived. Finally, the proposed optimal placement strategy is compared with an ad hoc method, where the references correspond to sources where the prediction does not significantly help reduce the encoding cost. The proposed optimal algorithm shows a bit saving of 20% with respect to the ad hoc method.
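The storage versus transmission tradeoff behind periodic reference placement can be illustrated with a toy model. The sketch below is not the paper's integer linear program or its closed form: it assumes, purely for illustration, that every reference costs `ref_bits`, every predicted point costs `pred_bits` and is predicted from its immediate predecessor, that each request targets a single uniformly chosen point, and it brute-forces the period instead of solving an optimization problem. All names and numbers are hypothetical.

```python
def storage_cost(n, period, ref_bits, pred_bits):
    """Bits stored for n points with a reference every `period` points."""
    num_refs = -(-n // period)  # ceiling division
    return num_refs * ref_bits + (n - num_refs) * pred_bits


def mean_transmission_cost(n, period, ref_bits, pred_bits):
    """Average bits sent to decode one uniformly chosen point.

    Toy assumption: decoding point i requires its preceding reference
    plus every predicted point between that reference and i.
    """
    total = sum(ref_bits + (i % period) * pred_bits for i in range(n))
    return total / n


def best_period(n, ref_bits, pred_bits, weight):
    """Brute-force the period minimizing storage + weight * transmission."""
    def cost(period):
        return (storage_cost(n, period, ref_bits, pred_bits)
                + weight * mean_transmission_cost(n, period, ref_bits, pred_bits))
    return min(range(1, n + 1), key=cost)


# Hypothetical setting: 12 points, 10-bit references, 2-bit predicted points.
storage_only = best_period(12, 10, 2, weight=0)      # ignores transmission
access_first = best_period(12, 10, 2, weight=1e6)    # transmission dominates
```

When only storage matters, a single reference is best (the longest period); when transmission dominates, every point becomes a reference (period 1). Intermediate weights select a period between these two extremes.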