Objectives

Context

In recent years, omnidirectional cameras have become widely popular, together with applications that let users watch 360° videos online, on their smartphones, tablets, or Virtual Reality glasses. A 360° video captured by an omnidirectional camera describes a scene in all possible directions from a given position. The specificity of the acquired video is that its pixels lie on a sphere, which makes the compression tools adopted in existing codecs inefficient. The development of an efficient 360° video coder is therefore essential to circumvent the explosion in data size. Some coding tools have been proposed in the literature, but they present two main limitations. First, they do not exploit the inherent geometrical structure between the pixels. Second, they have not explored the multi-view compression of 360° cameras, even though it is of great interest, for instance for free-viewpoint applications. The processing and compression of 360° multi-view videos are tackled in the GOP project by exploiting the geometric (intra- and inter-view) relations between pixels.
More precisely, we will define a graph in which the pixels are the vertices and edges represent their neighborhood in 3D space. We will then be able to develop fundamental results in Graph Signal Processing under hypotheses that differ from those usually assumed in the literature: in our case, the graph is fixed (derived from geometric rules), and the goal is to find efficient transform and coding tools for the compression of pixels lying on this irregular structure.
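To make this concrete, here is a minimal sketch in Python (the equirectangular input format, the image dimensions, and the neighborhood size k are illustrative assumptions, not project choices) of how such a graph can be built for a single 360° frame: every pixel becomes a vertex on the unit sphere, and edges connect pixels that are close in 3D.

```python
import numpy as np
from scipy.spatial import cKDTree

def sphere_graph_from_equirectangular(height, width, k=4):
    """Build a k-nearest-neighbor graph over the pixels of an
    equirectangular 360 frame, with distances measured in 3D."""
    # Longitude and latitude at pixel centers.
    lon = (np.arange(width) + 0.5) / width * 2 * np.pi
    lat = np.pi / 2 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Vertices V: each pixel mapped to a 3D point on the unit sphere.
    xyz = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)], axis=-1).reshape(-1, 3)

    # Edges E: link each pixel to its k nearest neighbors in 3D; on the
    # unit sphere, chordal distance is monotone in geodesic distance.
    tree = cKDTree(xyz)
    _, idx = tree.query(xyz, k=k + 1)   # the nearest neighbor is the point itself
    edges = [(i, j) for i in range(xyz.shape[0]) for j in idx[i, 1:]]
    return xyz, edges

vertices, edges = sphere_graph_from_equirectangular(64, 128)
```

Unlike the fixed 4- or 8-pixel neighborhoods of planar codecs, this adjacency follows the spherical geometry: near the poles of the equirectangular grid, pixels that are far apart horizontally in the 2D image become graph neighbors.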

Objectives

The main scientific goal of the project is to investigate new tools for the representation and coding of multi-view omnidirectional video systems.
The precise objectives are the following:

1) Build new graph-based representations for 360° multi-view videos:
We will make the geometrical relationship between two or more omnidirectional cameras explicit and build a graph G=(V,E): the vertices V are the pixels, with inter-view redundancy removed, and the edges E link pixels that represent neighboring 3D points. The edges E thus carry the geometry information and tell us which pixels are neighbors, hence correlated, which is valuable for coding [6]. For that purpose, we first need to better characterize the relationship between pixels through the following two tasks:
– Task 1.1: calibration of one omnidirectional camera
– Task 1.2: calibration of two omnidirectional cameras
Completing these two tasks will enable the creation of a large dataset of calibrated omnidirectional videos. We will then be able to build the graph by making the relationships between pixels explicit through edges (the inter-view case is sketched after the task list below):
– Task 1.3: graph definition on one view
– Task 1.4: graph definition on two views
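As an illustration of Task 1.4, here is a minimal sketch (hypothetical: it assumes that the camera centers from Tasks 1.1 and 1.2 and per-pixel depths along the viewing rays are available; the function name and the tolerance radius are our own) that backprojects the pixels of two calibrated views into a common 3D space and adds an inter-view edge whenever two pixels represent neighboring 3D points.

```python
import numpy as np
from scipy.spatial import cKDTree

def interview_edges(rays_a, depth_a, center_a,
                    rays_b, depth_b, center_b, radius=0.01):
    """Return inter-view edges (i, j) linking pixel i of view A to
    pixel j of view B when their 3D points are within `radius`.

    rays_*   : (N, 3) unit viewing directions on each camera's sphere
    depth_*  : (N,)   per-pixel depth along the ray (assumed known)
    center_* : (3,)   camera center, obtained from calibration
    """
    # Backproject each pixel to a 3D point: p = c + d * r.
    pts_a = center_a + depth_a[:, None] * rays_a
    pts_b = center_b + depth_b[:, None] * rays_b

    # Pixels of the two views that represent neighboring 3D points.
    tree_b = cKDTree(pts_b)
    neighbors = tree_b.query_ball_point(pts_a, r=radius)
    return [(i, j) for i, js in enumerate(neighbors) for j in js]
```

Pixel pairs whose 3D points coincide up to the tolerance are exactly the inter-view redundancy that the vertex set V of objective 1) is meant to eliminate.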

2) Develop new fundamental results in graph signal processing:
The specificity of the graph-based representations developed in objective 1) is that the graph is already fixed when the coding tools are designed. The question then becomes: what is the best graph-based transform to optimally extract the correlation between the pixels? The answer depends on the signal model, which will be studied first:
– Task 2.1: find models for pixel correlation on graphs
Based on the model, we can theoretically derive the optimal transform, i.e., the one achieving the highest energy compaction (see the sketch after this task list):
– Task 2.2: find optimal transforms adapted to signal models
However, these models are not always accurate, and we will study the consequences of errors in the modeling of the signal correlation:
– Task 2.3: determine the performance loss for biased models
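To illustrate Tasks 2.1 and 2.2, the sketch below uses a standard result we expect to build on: if the pixels follow a Gaussian Markov random field whose precision matrix is (a regularized) graph Laplacian, a natural candidate model for Task 2.1, then the Laplacian eigenbasis, i.e., the graph Fourier transform, coincides with the Karhunen-Loève transform and therefore achieves optimal energy compaction. The toy graph and signal are purely illustrative.

```python
import numpy as np

def gft_basis(W):
    """Graph Fourier basis: eigenvectors of the combinatorial
    Laplacian L = D - W of the graph with adjacency matrix W."""
    L = np.diag(W.sum(axis=1)) - W
    eigval, eigvec = np.linalg.eigh(L)   # columns = graph frequencies
    return eigval, eigvec

# Toy example: a path graph on 4 vertices.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
_, U = gft_basis(W)

x = np.array([1.0, 1.1, 0.9, 1.2])   # a smooth signal on the graph
coeffs = U.T @ x                     # energy compacts into the low frequencies
```

On a path graph this eigenbasis reduces to the DCT of classical image codecs, one way to see graph transforms as a generalization of existing tools. Task 2.3 will then quantify how much compaction is lost when the Laplacian assumed by the coder deviates from the true precision matrix.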

3) Develop a coder for omnidirectional multi-camera systems:
Given a graph and the optimal graph transforms, the final objective of the project is to develop a practical coder for multi-view omnidirectional signals. First, we will implement the transforms studied in the previous objective:
– Task 3.1: develop spherical transforms for omnidirectional videos
Then, we will extend 2D video coding tools, such as motion estimation, to omnidirectional videos:
– Task 3.2: build a motion estimation tool
– Task 3.3: gather the developed compression tools into one practical coder
Finally, we will compare the performance of our coder against baseline methods, using the datasets developed in objective 1).
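To fix ideas on Task 3.3, a minimal transform-coding loop could look as follows (a hypothetical sketch: uniform scalar quantization of transform coefficients with step q; entropy coding, motion estimation, and the actual spherical transform of Task 3.1 are omitted, with a random orthonormal basis standing in for the latter).

```python
import numpy as np

def encode_block(x, U, q=0.1):
    """Forward transform, then uniform scalar quantization with step q."""
    return np.round((U.T @ x) / q).astype(int)

def decode_block(levels, U, q=0.1):
    """Dequantize, then inverse transform (U is orthonormal)."""
    return U @ (levels * q)

# Toy run: a random orthonormal basis stands in for the graph transform.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((8, 8)))
x = rng.standard_normal(8)

x_hat = decode_block(encode_block(x, U), U)
mse = np.mean((x - x_hat) ** 2)   # distortion at this quantization step
```

Sweeping the quantization step q then traces out the rate-distortion behavior against which the baseline methods will be compared.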