[Jun 24] New journal article accepted

Title: Can image compression rely on CLIP?

Authors: Tom Bachard, Thomas Maugey

Abstract: Coding algorithms are usually designed to faithfully reconstruct images, which limits the expected gains in compression. A new approach based on generative models allows for new compression algorithms that can reach drastically lower compression rates. Instead of pixel fidelity, these algorithms aim at faithfully generating images that have the same high-level interpretation as their inputs. In that context, the challenge becomes to choose a good representation for the semantics of an image. While text or segmentation maps have been investigated and have shown their limitations, in this paper, we ask the following question: do powerful foundation models such as CLIP provide a semantic description suited for compression? By suited for compression, we mean that this description is robust to traditional compression tools and, in particular, quantization. We show that CLIP fulfills these semantic robustness properties, which makes it an interesting support for generative compression. To make that intuition concrete, we propose a proof-of-concept for a generative codec based on CLIP. Results demonstrate that our CLIP-based coder beats state-of-the-art compression pipelines at extremely low bitrates (0.0012 BPP), both in terms of image quality (65.3 for MUSIQ) and semantic preservation (0.86 for the CLIP score).
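
As a quick, informal illustration of that robustness test (a minimal sketch, not the paper's experimental protocol), one can uniformly quantize an embedding and check how much cosine similarity survives; the 512-dimensional random vector below stands in for a real CLIP image embedding (e.g. from ViT-B/32).

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(512).astype(np.float32)  # stand-in for a CLIP latent

def uniform_quantize(x, n_bits):
    """Uniform scalar quantizer over the vector's dynamic range."""
    lo, hi = x.min(), x.max()
    levels = 2 ** n_bits - 1
    q = np.round((x - lo) / (hi - lo) * levels)
    return (q / levels * (hi - lo) + lo).astype(x.dtype)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for bits in (8, 6, 4, 2):
    print(f"{bits} bits -> cosine {cosine(z, uniform_quantize(z, bits)):.4f}")
```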

[Feb 24] One paper accepted at PCS 2024

Title: CoCliCo: Extremely low bitrate image compression based on CLIP semantic and tiny color map

Authors: Tom Bachard*, Tom Bordin*, Thomas Maugey
* authors contributed equally

Abstract: Coding algorithms are usually designed to reconstruct images pixel-wise, which limits the expected gains in terms of compression. In this work, we introduce a semantic compressed representation for images: CoCliCo. We encode the inputs into a CLIP latent vector and a tiny color map, and we use a conditional diffusion model for reconstruction. When compared to the most recent traditional and generative coders, our approach reaches drastic compression gains while keeping most of the high-level information and a good level of realism.
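
A back-of-the-envelope bit budget shows why such a representation lands at extremely low bitrates; the dimensions and bit depths below are illustrative assumptions, not CoCliCo's actual settings.

```python
# Hypothetical payload: a quantized CLIP vector plus a tiny RGB color map.
clip_dim, bits_per_coeff = 512, 8     # assumed latent size and quantization
cmap_h, cmap_w = 8, 8                 # assumed color-map resolution
img_h, img_w = 768, 512               # assumed image size

payload = clip_dim * bits_per_coeff + cmap_h * cmap_w * 3 * 8
print(f"{payload} bits -> {payload / (img_h * img_w):.4f} bpp")
# 5632 bits -> 0.0143 bpp, far below typical codec operating points
```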

[Jul 23] Two papers accepted at MMSP 2023

Title: Semantic based generative compression of images at extremely low bitrates

Authors: T. Bordin, T. Maugey

Abstract: We propose a framework for image compression in which the fidelity criterion is replaced by a semantic and quality preservation objective. Encoding the image thus becomes a simple extraction of semantics, enabling drastic compression ratios. The decoding side is handled by a generative model relying on the diffusion process for the reconstruction of images. We first propose to describe the semantics using low-resolution segmentation maps as a guide. We further improve the generation by introducing color map guidance without retraining the generative decoder. We show that it is possible to produce images of high visual quality with preserved semantics at extremely low bitrates when compared with classical codecs.
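
To make the "drastic compression ratio" concrete, here is a rough cost estimate for a low-resolution segmentation-map guide; all sizes are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

seg_h, seg_w, n_classes = 32, 32, 8              # hypothetical guide size
guide_bits = seg_h * seg_w * np.log2(n_classes)  # fixed-length class labels
img_h, img_w = 768, 512                          # hypothetical image size
print(f"{guide_bits:.0f} bits -> {guide_bits / (img_h * img_w):.4f} bpp")
# ~3072 bits (~0.0078 bpp) before entropy coding: the semantics are nearly free.
```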

*******************

Title: Towards digital sobriety: why improving the energy efficiency of video streaming is not enough

Author: T. Maugey

Abstract: The IPCC conclusions are unequivocal: we must divide our greenhouse gas emissions by two before 2030 if we want to keep global warming below 1.5°C in 2100. Hence, it becomes urgent to aim for sobriety. Contrary to what is often claimed, digital technologies must also target global emission reduction, as their impact on the climate is huge and exploding every year. Among the greenhouse gas emissions of these digital technologies, those related to video processing and streaming are among the biggest. At the same time, a lot of research effort is currently devoted to reducing the energy consumed by algorithms and infrastructure. In this paper, we demonstrate that, even though such research works are crucial, they are not sufficient to enable global emission reductions. The conclusion is that we must collectively think of other, complementary solutions.

[Sep 22] New IEEE TIP accepted

Title: OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression

Authors: N. Mahmoudian Bidgoli, R. Azevedo, T. Maugey, A. Roumy, P. Frossard

Abstract: State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. First, omnidirectional images have specific spatial and statistical properties that cannot be fully captured by current CNN models. Second, basic mathematical operations composing a CNN architecture, e.g., translation and sampling, are not well-defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) propose the definition of a new convolution operation on the sphere that keeps the high expressiveness and the low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and then iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution leads to better compression gains, saving 13.7% of the bitrate compared to similar learned models applied to equirectangular images. Also, compared to learning models based on graph convolutional networks, our solution supports more expressive filters that can preserve high frequencies and provide a better perceptual quality of the compressed images. Such results demonstrate the efficiency of the proposed framework, which opens new research avenues for other omnidirectional vision tasks to be effectively implemented on the sphere manifold.
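
For intuition, the sketch below implements a toy 9-tap filter on a HEALPix map using healpy's neighbour lookup; it only illustrates the neighbour structure offered by the uniform sampling, not the convolution operator defined in the paper.

```python
import numpy as np
import healpy as hp  # assumes the healpy package for HEALPix sampling

def sphere_filter(signal, weights, nside):
    """Weighted sum of each HEALPix pixel and its 8 neighbours (toy filter)."""
    npix = hp.nside2npix(nside)
    out = np.zeros(npix)
    for ipix in range(npix):
        neigh = hp.get_all_neighbours(nside, ipix)  # 8 neighbours, -1 if absent
        taps = np.concatenate(([ipix], neigh))
        valid = taps >= 0
        out[ipix] = weights[valid] @ signal[taps[valid]]
    return out

nside = 16                                  # 3072 pixels on the sphere
sig = np.random.rand(hp.nside2npix(nside))
smoothed = sphere_filter(sig, np.full(9, 1 / 9), nside)  # averaging kernel
```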

Shared source code: here

[Jun 22] HDR completed

On the 27th of June, I defended my Habilitation à Diriger des Recherches (HDR). Thanks again to the jury members for their time and feedback.

The defense is available for replay here or here.

The slides are available here.
The manuscript is available here.

Title: Visual data compression: beyond conventional approaches

Jury:

– Frédéric DUFAUX, DR CNRS (reviewer)
– Enrico MAGLI, Prof. Politecnico di Torino (reviewer)
– Marta MRAK, Prof. Queen Mary University & BBC (reviewer)
– Marc ANTONINI, DR CNRS
– Luce MORIN, Prof. INSA
– Dong TIAN, Senior Scientist, InterDigital

[Jun 22] New ICIP paper accepted

Authors: Tom Bachard, Anju Jose Tom, Thomas Maugey

Title: Semantic Alignment for Multi-Item Compression

Abstract: Coding algorithms usually compress the images of a collection independently, in particular when the correlation between them only resides at the semantic level (information related to the high-level image content). In this work, we propose a coding solution able to exploit this semantic redundancy to decrease the storage cost of a data collection. First, we introduce the multi-item compression framework. Then we derive a loss term to shape the latent space of a variational auto-encoder so that the latent vectors of semantically identical images can be aligned. Finally, we experimentally demonstrate that this alignment leads to a more compact representation of the data collection.
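
One way to picture such an alignment term (a hypothetical sketch, not the loss derived in the paper) is a penalty that pulls together the latent vectors of items sharing a semantic label:

```python
import torch
import torch.nn.functional as F

def alignment_loss(z, labels):
    """Pull latent vectors with the same semantic label toward their centroid."""
    loss = z.new_zeros(())
    for lab in labels.unique():
        group = z[labels == lab]
        if group.shape[0] < 2:
            continue
        centroid = group.mean(dim=0, keepdim=True)
        loss = loss + F.mse_loss(group, centroid.expand_as(group))
    return loss

z = torch.randn(8, 64)                            # batch of VAE latents
labels = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2])   # semantic item IDs
print(alignment_loss(z, labels))                  # add to the VAE objective
```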

[Mar 22] New ICME paper accepted

Title: Omni-NeRF: Neural Radiance Field from 360° image captures

Authors: K. Gu, T. Maugey, S. Knorr, C. Guillemot

Abstract: This paper tackles the problem of novel view synthesis (NVS) from 360° images with imperfect camera poses or intrinsic parameters. We propose a novel end-to-end framework for training Neural Radiance Field (NeRF) models given only 360° RGB images and their rough poses, which we refer to as Omni-NeRF. We extend the pinhole camera model of NeRF to a more general camera model that better fits omnidirectional fisheye lenses. The approach jointly learns the scene geometry and optimizes the camera parameters without knowing the fisheye projection.
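
For context on the camera-model side, the snippet below shows the standard equirectangular pixel-to-ray mapping for 360° images; the paper goes further by learning a more general fisheye model jointly with the scene.

```python
import numpy as np

def equirect_rays(h, w):
    """Map equirectangular pixel centers to unit ray directions (h, w, 3)."""
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    lon = (u + 0.5) / w * 2 * np.pi - np.pi      # longitude in [-pi, pi)
    lat = np.pi / 2 - (v + 0.5) / h * np.pi      # latitude in [-pi/2, pi/2]
    return np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)

rays = equirect_rays(512, 1024)  # one ray per pixel of a 360° capture
```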

[Jul 21] “MADARE” ANR Project accepted

Grant: Young scientist ANR funding

Leader: Thomas Maugey

Abstract: Compression algorithms are nowadays overwhelmed by the tsunami of visual data created every day. Despite their growing efficiency, they are always constrained to minimize the compression error, computed in the pixel domain. The Data Repurposing framework, proposed in the MADARE project, will tear down this barrier by allowing the compression algorithm to “reinvent” part of the data at the decoding phase, and thus save a lot of bitrate by not coding it. Concretely, a data collection is only encoded into a compact description that is used to guarantee that the regenerated content is semantically coherent with the initial one. By revisiting the compression problem, the MADARE project aims at gigantic compression ratios enabling, among other benefits, a reduction of the impact of exploding data creation on cloud servers’ energy consumption.

More info: here

[Jul 21] New TCSVT accepted

Title: Immersive Video Coding: Should Geometry Information be Transmitted as Depth Maps?

Authors: P. Garus, F. Henry, J. Jung, T. Maugey, C. Guillemot

Abstract: Immersive video often refers to multiple views with texture and scene geometry information, from which different viewports can be synthesized on the client side. To design efficient immersive video coding solutions, it is desirable to minimize bitrate, pixel rate and complexity. We investigate whether the classical approach of sending the geometry of a scene as depth maps is appropriate to serve this purpose. Previous work shows that bypassing depth transmission entirely and estimating depth at the client side improves the synthesis performance while saving bitrate and pixel rate. In order to understand if the encoder-side depth maps contain information that is beneficial to transmit, we first explore a hybrid approach which enables partial depth map transmission using a block-based RD decision in the depth coding process. This approach reveals that partial depth map transmission may improve the rendering performance but does not present a good compromise in terms of compression efficiency. This led us to address the remaining drawbacks of decoder-side depth estimation: complexity and depth map inaccuracy. We propose a novel system that takes advantage of high-quality depth maps at the server side by encoding them into lightweight features that support the depth estimator at the client side. These features allow reducing the amount of data that has to be handled during decoder-side depth estimation by 88%, which significantly speeds up the cost computation and the energy minimization of the depth estimator. Furthermore, -46.0% and -37.9% average synthesis BD-Rate gains are achieved compared to the classical approach with depth maps estimated at the encoder.
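
A toy version of the block-wise decision in the hybrid approach might look as follows; the uniform quantizer and the 8-bits-per-nonzero rate model are placeholders, not the costs used in the paper.

```python
import numpy as np

def rd_decision(depth_block, est_block, qstep=4.0, lam=0.1):
    """Transmit the coded depth block only if J = D + lambda*R beats the
    decoder-side estimate, which costs no bits (toy RD model)."""
    idx = np.round(depth_block / qstep)
    rec = idx * qstep
    rate = 8 * np.count_nonzero(idx)                  # placeholder rate model
    j_tx = np.mean((depth_block - rec) ** 2) + lam * rate
    j_est = np.mean((depth_block - est_block) ** 2)   # zero rate
    return ("transmit", rec) if j_tx < j_est else ("estimate", est_block)

blk = np.random.rand(16, 16) * 255                # encoder-side depth block
est = blk + np.random.randn(16, 16) * 2.0         # decoder-side estimate
mode, rec = rd_decision(blk, est)
```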

[May 21] New TIP journal accepted

Title: Rate-Distortion Optimized Graph Coarsening and Partitioning for Light Field Coding

Authors: M. Rizkallah, T. Maugey, C. Guillemot

Abstract: Graph-based transforms are powerful tools for signal representation and energy compaction. However, their use for high-dimensional signals such as light fields poses obvious problems of complexity. To overcome this difficulty, one can consider local graph transforms defined on supports of limited dimension, which may however not allow us to fully exploit long-term signal correlation. In this paper, we present methods to optimize local graph supports in a rate-distortion sense for efficient light field compression. A large graph support can be well adapted for compression efficiency, however at the expense of high complexity. In this case, we use graph reduction techniques to make the graph transform feasible. We also consider spectral clustering to reduce the dimension of the graph supports while controlling both rate and complexity. We derive the distortion and rate models which are then used to guide the graph optimization. We describe a complete light field coding scheme based on the proposed graph optimization tools. Experimental results show rate-distortion performance gains compared to the use of fixed graph supports. The method also provides competitive results when compared against HEVC-based and the JPEG Pleno light field coding schemes. We also assess the method against a homography-based low-rank approximation and a Fourier disparity layer based coding method.
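
For readers unfamiliar with graph transforms, the minimal sketch below computes a graph Fourier transform on a small support (projection onto the eigenbasis of the combinatorial Laplacian); the paper's contribution is optimizing the supports themselves in a rate-distortion sense.

```python
import numpy as np

def graph_fourier_transform(adjacency, signal):
    """Project a signal onto the eigenvectors of L = D - A (the GFT basis)."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, basis = np.linalg.eigh(laplacian)  # eigenvalues ascending = frequencies
    return basis.T @ signal

# 4-pixel path graph: a smooth signal compacts into low-frequency coefficients.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
coeffs = graph_fourier_transform(A, np.array([1.0, 1.1, 1.2, 1.3]))
print(np.round(coeffs, 3))  # energy concentrated in the first coefficient
```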