[Jun 24] New journal article accepted

Title: Can image compression rely on CLIP?

Authors: Tom Bachard, Thomas Maugey

Abstract: Coding algorithms are usually designed to faithfully reconstruct images, which limits the expected gains in compression. A new approach based on generative models allows for new compression algorithms that can reach drastically lower compression rates. Instead of pixel fidelity, these algorithms aim at faithfully generating images that have the same high-level interpretation as their inputs. In that context, the challenge becomes finding a good representation for the semantics of an image. While text or segmentation maps have been investigated and have shown their limitations, in this paper we ask the following question: do powerful foundation models such as CLIP provide a semantic description suited for compression? By suited for compression, we mean that this description is robust to traditional compression tools and, in particular, to quantization. We show that CLIP fulfills these semantic robustness properties, which makes it an interesting support for generative compression. To make that intuition concrete, we propose a proof-of-concept generative codec based on CLIP. Results demonstrate that our CLIP-based coder beats state-of-the-art compression pipelines at extremely low bitrates (0.0012 BPP), both in terms of image quality (65.3 for MUSIQ) and semantic preservation (0.86 for the CLIP score).
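The quantization-robustness claim can be illustrated with a toy check (a sketch, not the paper's experiment): uniformly quantize a unit-norm vector standing in for a 512-d CLIP embedding, and measure how much cosine similarity survives.

```python
import numpy as np

def quantize(v, n_bits=8):
    """Uniform scalar quantization of an embedding to n_bits per dimension."""
    lo, hi = v.min(), v.max()
    levels = 2 ** n_bits - 1
    q = np.round((v - lo) / (hi - lo) * levels)   # integer codes
    return q / levels * (hi - lo) + lo            # dequantized values

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
e = rng.standard_normal(512)          # stand-in for a 512-d CLIP embedding
e /= np.linalg.norm(e)

# Semantic descriptors are useful for compression only if coarse
# quantization barely moves them in the embedding space.
sim = cosine(e, quantize(e, n_bits=8))
```

With 8 bits per coefficient the cosine similarity stays extremely close to 1, which is the kind of robustness the abstract refers to.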

[Feb 24] One paper accepted at PCS 2024

Title: CoCliCo: Extremely low bitrate image compression based on CLIP semantic and tiny color map

Authors: Tom Bachard*, Tom Bordin*, Thomas Maugey
* authors contributed equally

Abstract: Coding algorithms are usually designed to reconstruct images pixel-wise, which limits the expected gains in terms of compression. In this work, we introduce a semantic compressed representation for images: CoCliCo. We encode the inputs into a CLIP latent vector and a tiny color map, and we use a conditional diffusion model for reconstruction. When compared to the most recent traditional and generative coders, our approach achieves drastic compression gains while keeping most of the high-level information and a good level of realism.
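A back-of-envelope calculation shows why such a representation reaches extremely low bitrates; all sizes below (embedding dimension, color-map resolution, bit depths, image resolution) are illustrative assumptions, not the paper's exact settings.

```python
# Bits spent on the semantic part and on the tiny color map.
clip_bits  = 512 * 8          # 512-d embedding, 8 bits per coefficient
color_bits = 8 * 8 * 3 * 8    # 8x8 RGB color map, 8 bits per channel
total_bits = clip_bits + color_bits

# Bits per pixel for an example image resolution.
width, height = 768, 768
bpp = total_bits / (width * height)
```

Even with these generous assumptions the representation stays below 0.01 BPP, orders of magnitude under typical codec operating points.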

[Jul 23] Two MMSP papers accepted

Title: Semantic based generative compression of images at extremely low bitrates

Authors: T. Bordin, T. Maugey

Abstract: We propose a framework for image compression in which the fidelity criterion is replaced by a semantic and quality preservation objective. Encoding the image thus becomes a simple extraction of its semantics, enabling drastic compression ratios. The decoding side is handled by a generative model that relies on the diffusion process to reconstruct images. We first propose to describe the semantics using low-resolution segmentation maps as a guide. We then further improve the generation by introducing color map guidance, without retraining the generative decoder. We show that it is possible to produce images of high visual quality with preserved semantics at extremely low bitrates when compared with classical codecs.

*******************

Title: Towards digital sobriety: why improving the energy efficiency of video streaming is not enough

Author: T. Maugey

Abstract: IPCC conclusions are unequivocal: we must halve our greenhouse gas emissions before 2030 if we want to keep global warming below 1.5°C by 2100. Hence, it becomes urgent to aim for sobriety. Contrary to what is often claimed, digital technologies must also target global emission reduction, as their impact on the climate is large and growing every year. Among the greenhouse gas emissions of these digital technologies, those related to video processing and streaming are among the biggest. At the same time, a lot of research effort is currently devoted to reducing the energy consumed by algorithms or infrastructure. In this paper, we demonstrate that, even though such research works are crucial, they are not sufficient to enable global emission reductions. The conclusion is that we must collectively think of other, complementary solutions.

[Jun 23] One EUSIPCO paper accepted

Title: A Water-filling Algorithm Maximizing the Volume of Submatrices Above the Rank

Authors: C. Petit, A. Roumy, T. Maugey

Abstract: In this paper, we propose an algorithm to extract, from a given rectangular matrix, a submatrix with maximum volume whose number of extracted columns is greater than the rank of the initial matrix. This problem arises in compression and summarization of databases, recommender systems, learning, numerical analysis, and applied linear algebra. We use a continuous relaxation of the maximum-volume matrix extraction problem, which admits a simple closed-form solution: the nonzero singular values of the extracted matrix must be equal. The proposed algorithm extracts matrices whose singular values are close to equal. It is inspired by the water-filling technique traditionally dedicated to equalization strategies in communication channels. Simulations show that the proposed algorithm performs better than sampling methods based on determinantal point processes (DPPs) and achieves performance similar to the best known algorithm, but with lower complexity.
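For intuition, the volume of an extracted column submatrix is the product of its nonzero singular values. The naive greedy baseline below (a sketch of the problem being solved, not the paper's water-filling algorithm) picks columns one at a time to maximize that product, even once the number of columns exceeds the rank.

```python
import numpy as np

def volume(cols):
    """Product of the nonzero singular values of a column submatrix."""
    s = np.linalg.svd(cols, compute_uv=False)
    return float(np.prod(s[s > 1e-10]))

def greedy_max_volume(A, k):
    """Greedily pick k columns of A that maximize the submatrix volume.
    A naive baseline for the extraction problem, not the paper's method."""
    chosen, remaining = [], list(range(A.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: volume(A[:, chosen + [j]]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 8))        # a rank-3 matrix with 8 columns
picked = greedy_max_volume(A, 4)       # extract more columns than the rank
```

The interesting regime is exactly k greater than the rank: the volume then depends on how evenly the extra columns spread energy across the nonzero singular values, which is what the water-filling relaxation equalizes.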

[Feb 23] Our ICASSP paper accepted

Title: Learning on entropy coded data using CNN

Authors: R. Piau, T. Maugey, A. Roumy

Abstract: We propose an empirical study to see whether learning with convolutional neural networks (CNNs) on entropy coded data is possible. First, we define spatial and semantic closeness, two key properties that we experimentally show to be necessary to guarantee the efficiency of the convolution. Then, we show that these properties are not satisfied by the data processed by an entropy coder. Despite this, our experimental results show that learning in such difficult conditions is still possible, and that the performance is far from a random guess. These results were obtained thanks to the construction of CNN architectures designed for 1D data (one based on VGG, the other on ResNet). Finally, we propose some experiments that explain why CNNs still perform reasonably well on entropy coded data.
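The loss of spatial closeness can be reproduced with a toy experiment: compress a smooth, image-like byte sequence with a generic lossless coder (zlib here, standing in for an entropy coder) and compare the correlation of adjacent bytes before and after coding.

```python
import zlib
import numpy as np

def adjacent_corr(buf):
    """Correlation between consecutive byte values of a buffer."""
    x = np.frombuffer(buf, dtype=np.uint8).astype(float)
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])

# A smooth "image-like" signal: neighboring samples are highly correlated.
t = np.linspace(0, 20, 4096)
signal = ((np.sin(t) * 0.5 + 0.5) * 255).astype(np.uint8).tobytes()

raw_corr   = adjacent_corr(signal)                 # close to 1
coded_corr = adjacent_corr(zlib.compress(signal))  # near random
```

The raw signal has near-perfect neighbor correlation, while the coded stream looks close to random: exactly the regime in which a convolution's locality assumption breaks down.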

[Sep 22] New IEEE TIP accepted

Title: OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression

Authors: N. Mahmoudian Bidgoli, R. Azevedo, T. Maugey, A. Roumy, P. Frossard

Abstract: State-of-the-art 2D image compression schemes rely on the power of convolutional neural networks (CNNs). Although CNNs offer promising perspectives for 2D image compression, extending such models to omnidirectional images is not straightforward. First, omnidirectional images have specific spatial and statistical properties that cannot be fully captured by current CNN models. Second, basic mathematical operations composing a CNN architecture, e.g., translation and sampling, are not well-defined on the sphere. In this paper, we study the learning of representation models for omnidirectional images and propose to use the properties of the HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. In particular, we: i) propose the definition of a new convolution operation on the sphere that keeps the high expressiveness and the low complexity of a classical 2D convolution; ii) adapt standard CNN techniques such as stride, iterative aggregation, and pixel shuffling to the spherical domain; and then iii) apply our new framework to the task of omnidirectional image compression. Our experiments show that our proposed on-the-sphere solution leads to a compression gain that saves 13.7% of the bit rate compared to similar learned models applied to equirectangular images. Also, compared to learning models based on graph convolutional networks, our solution supports more expressive filters that can preserve high frequencies and provide a better perceptual quality of the compressed images. Such results demonstrate the efficiency of the proposed framework, which opens new research avenues for other omnidirectional vision tasks to be effectively implemented on the sphere manifold.
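The core idea of a convolution over an irregular sampling of the sphere can be sketched as a weighted gather over a neighbor table; the 4-pixel ring and adjacency below are illustrative stand-ins for HEALPix neighborhoods, not the paper's operator.

```python
import numpy as np

def neighbor_conv(values, neighbors, weights):
    """One convolution step on an irregular sampling (e.g. a sphere):
    each output pixel is a weighted sum of itself and its listed neighbors.
    The neighbor table replaces the implicit grid of a 2D convolution."""
    gathered = values[neighbors]            # shape (n_pix, n_neigh)
    return gathered @ weights               # shape (n_pix,)

values = np.array([1.0, 2.0, 3.0, 4.0])
# Each row: the pixel itself plus its two neighbors on a 4-pixel ring.
neighbors = np.array([[0, 3, 1],
                      [1, 0, 2],
                      [2, 1, 3],
                      [3, 2, 0]])
weights = np.array([0.5, 0.25, 0.25])       # shared filter taps

out = neighbor_conv(values, neighbors, weights)
```

The filter taps are shared across all pixels, which is what preserves the low parameter count and expressiveness of a classical 2D convolution once a uniform sampling such as HEALPix provides a consistent neighbor structure.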

Shared source code: here

[Jun 22] HDR completed

On the 27th of June, I defended my Habilitation à Diriger des Recherches (HDR). Thanks again to the jury members for their time and feedback.

The defense is available for replay here or here.

The slides are available here
The manuscript is available here

Title: Visual data compression: beyond conventional approaches

Jury:

– Frédéric DUFAUX, DR CNRS, (reviewer)
– Enrico MAGLI, Prof. Politecnico di Torino (reviewer)
– Marta MRAK, Prof. Queen Mary University & BBC (reviewer)
– Marc ANTONINI, DR CNRS
– Luce MORIN, Prof. INSA
– Dong TIAN, Senior Scientist, InterDigital

[Jun 22] New ICIP paper accepted

Authors: Tom Bachard, Anju Jose Tom, Thomas Maugey

Title: Semantic Alignment for Multi-Item Compression

Abstract: Coding algorithms usually compress the images of a collection independently, in particular when the correlation between them resides only at the semantic level (information related to the high-level image content). In this work, we propose a coding solution able to exploit this semantic redundancy to decrease the storage cost of a data collection. First, we introduce the multi-item compression framework. Then, we derive a loss term to shape the latent space of a variational auto-encoder so that the latent vectors of semantically identical images can be aligned. Finally, we experimentally demonstrate that this alignment leads to a more compact representation of the data collection.
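One plausible form of such an alignment term (an illustration, not the paper's exact loss) penalizes the spread of latent vectors that share a semantic label, pulling them toward a common representative that the collection then only needs to store once.

```python
import numpy as np

def alignment_loss(latents, groups):
    """Sum, over each semantic group, of the squared deviations of its
    latent vectors from the group mean. Illustrative form of an
    alignment penalty, not the paper's exact loss term."""
    loss = 0.0
    for g in set(groups):
        z = latents[[i for i, gi in enumerate(groups) if gi == g]]
        loss += float(((z - z.mean(axis=0)) ** 2).sum())
    return loss

z = np.array([[0.0, 0.0],
              [0.2, 0.0],
              [5.0, 5.0]])
# Images 0 and 1 share the same semantics; image 2 differs.
loss = alignment_loss(z, ["cat", "cat", "car"])
```

Added to the usual VAE reconstruction and KL terms, a penalty of this shape would drive semantically identical images toward a shared latent, which is what makes the multi-item representation compact.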

[Mar 22] New ICME paper accepted

Title: Omni-NeRF: Neural Radiance Field from 360° image captures

Authors: K. Gu, T. Maugey, S. Knorr, C. Guillemot

Abstract: This paper tackles the problem of novel view synthesis (NVS) from 360° images with imperfect camera poses or intrinsic parameters. We propose a novel end-to-end framework for training Neural Radiance Field (NeRF) models given only 360° RGB images and their rough poses, which we refer to as Omni-NeRF. We extend the pinhole camera model of NeRF to a more general camera model that better fits omnidirectional fisheye lenses. The approach jointly learns the scene geometry and optimizes the camera parameters without knowing the fisheye projection.
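As an illustration of going beyond the pinhole model, the equidistant fisheye model maps the angle θ from the optical axis to an image radius r = f·θ. This is one common fisheye model, shown only for intuition; it is not necessarily the projection that Omni-NeRF learns.

```python
import numpy as np

def equidistant_project(point, f=1.0):
    """Project a 3D camera-frame point with the equidistant fisheye model:
    image radius r = f * theta, where theta is the angle from the optical
    axis. One common fisheye model, used here purely as an illustration."""
    x, y, z = point
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    phi = np.arctan2(y, x)                  # azimuth around the axis
    r = f * theta
    return r * np.cos(phi), r * np.sin(phi)

# A point on the optical axis lands at the image center...
u, v = equidistant_project((0.0, 0.0, 1.0))
# ...while a point at 90 degrees off-axis is still imaged (r = f * pi/2),
# unlike with a pinhole model, where it would project to infinity.
u2, v2 = equidistant_project((1.0, 0.0, 0.0))
```

The last case shows why fisheye-style models matter for 360° capture: they keep rays far from the optical axis, which a pinhole model cannot represent, at finite image coordinates.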

[Jan 22] New journal article accepted in MDPI Sensors

Title: Machine learning for multimedia communications

Authors: Nikolaos Thomos, Thomas Maugey, Laura Toni

Abstract: Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, impressive efficiency/accuracy improvements have been reached all over the transmission pipeline. For example, the high model capacity of learning-based architectures enables image and video behavior to be modeled so accurately that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategies, and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes in the way data is represented or consumed, meaning that the overall pipeline can be affected even though only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all over the transmission chain, and we discuss their potential impact and the research challenges that they raise.