High Dimensional Statistical Learning (HDL) - Master 2 SIF
Description
This module provides a detailed overview of the mathematical foundations of modern statistical learning, describing the theoretical basis and conceptual tools needed to analyze and justify learning algorithms. The emphasis is on problems involving large volumes of high-dimensional data, and on the dimension reduction techniques that make them tractable. The course includes detailed proofs of the main results, together with associated exercises.
Keywords
PAC (probably approximately correct), random projection, PCA (principal component analysis), concentration inequalities, measures of statistical complexity.
Prerequisites
The prerequisites for this course include previous coursework in linear algebra, multivariate calculus, basic probability (continuous and discrete) and statistics.
Previous coursework in convex analysis, information theory, and optimization theory would be helpful but is not required. Students are expected to be able to follow a rigorous proof.
Content
- The PAC framework (probably approximately correct) for statistical learning
- Measuring the complexity of a statistical learning problem
- Dimension reduction (see the illustrative sketch after this list)
- Sparsity and convex optimization for large-scale learning (time allowing)
- Notion of algorithmic stability (time allowing)
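To give a concrete flavor of the dimension reduction item above, here is a minimal, purely illustrative Python sketch of a Gaussian random projection in the spirit of the Johnson-Lindenstrauss lemma; it is not course material, and the data, dimensions, and variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n points in dimension d, projected down to dimension k.
n, d, k = 100, 10_000, 500
X = rng.standard_normal((n, d))  # synthetic high-dimensional data

# Gaussian random projection, scaled so squared norms are preserved
# in expectation: E[||P x||^2] = ||x||^2.
P = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ P.T  # projected data, shape (n, k)

# Pairwise distances are approximately preserved with high probability;
# concentration inequalities (a course topic) are what make this precise.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"distance before projection: {orig:.2f}, after: {proj:.2f}")
```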
Acquired skills
- Understanding the links between complexity and overfitting
- Knowing the mathematical tools to measure learning complexity
- Understanding the statistical and algorithmic challenges of large-scale learning
- Understanding dimension reduction tools for learning
Teachers
Aline Roumy (course coordinator),
Adrien Saumard,
Maël Le Treust (guest lecturer).
Course schedule (2021-2022)
The course is scheduled as follows (check detailed times and rooms on the ENT, under ISTIC > M2 SIF):
- Tuesday 8:00-10:00
- Wednesday 14:00-16:00
Detailed schedule
- 16/11, 17/11, 23/11, 24/11, 30/11, 01/12 (bat 2B, room E 208): Adrien Saumard
- 07/12, 08/12, 14/12, 15/12: NO COURSE; room available to prepare the oral presentation
- 04/01 (bat 12D-i57), 05/01 (bat 12D-i57), 11/01 (bat 12D-i53): Aline Roumy
- 12/01 (bat 12D-i55): Maël Le Treust
- 13/01 8:00-10:00 (bat 12D-i60): oral presentation
- 19/01 14:00-16:00 (bat 12D-i60): written exam
Evaluation
Modalities: carefully read the recommendations (PDF).
Dates and links to the documents:
- Oral presentation on 13/01/2022 THURSDAY 8:00-10:00, room B12D-i60.
- Student group and chosen article to be sent by email to the teachers before Monday 29/11/21 5:00pm.
- Summary to be sent by email to the teachers before Friday 07/01/2022 5:00pm.
- Slides to be sent by email to the teachers before Wednesday 12/01/2022 5:00pm.
Each group of students will have to present the content of one item from the list below:
- Chapter 11, "Model Selection and Validation", of the book by Shai Shalev-Shwartz & Shai Ben-David, Understanding Machine Learning
- Chapter 14, "Stochastic Gradient Descent", of the book by Shai Shalev-Shwartz & Shai Ben-David, Understanding Machine Learning
- Léon Bottou and Olivier Bousquet, "The Tradeoffs of Large Scale Learning", in Optimization for Machine Learning, pp. 351-368, edited by Suvrit Sra, Sebastian Nowozin and Stephen J. Wright, MIT Press, 2011. (PDF)
- Written exam on 19/01/2022 WEDNESDAY 14:00-16:00, room B02B-E208.
Some references
Course Notes