High Dimensional Statistical Learning (HDL) - Master 2 SIF
Description
This module provides a detailed overview of the mathematical foundations of modern statistical learning, describing the theoretical basis and conceptual tools needed to analyze and justify learning algorithms. The emphasis is on problems involving large volumes of high-dimensional data, and on the dimension reduction techniques that make them tractable. The course includes detailed proofs of the main results, together with associated exercises.
Keywords
PAC (probably approximately correct), random projection, PCA (principal component analysis), concentration inequalities, measures of statistical complexity.
Prerequisites
The prerequisites for this course include previous coursework in linear algebra, multivariate calculus, basic probability (continuous and discrete) and statistics.
Previous coursework in convex analysis, information theory, and optimization theory would be helpful but is not required. Students are expected to be able to follow a rigorous proof.
Content
- The PAC framework (probably approximately correct) for statistical learning
- Measuring the complexity of a statistical learning problem
- Dimension reduction (see the illustrative sketch after this list)
- Sparsity and convex optimization for large-scale learning (time allowing)
- Notion of algorithmic stability (time allowing)
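To give a concrete flavor of the dimension reduction item above, here is a minimal, purely illustrative Python sketch of a Gaussian random projection in the spirit of the Johnson-Lindenstrauss lemma; it is not course material, and the data, dimensions, and variable names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n points in dimension d, projected down to dimension k.
n, d, k = 100, 10_000, 500
X = rng.standard_normal((n, d))  # synthetic high-dimensional data

# Gaussian random projection, scaled so squared norms are preserved
# in expectation: E[||P x||^2] = ||x||^2.
P = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ P.T  # projected data, shape (n, k)

# Pairwise distances are approximately preserved with high probability;
# concentration inequalities (a course topic) are what make this precise.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"distance before projection: {orig:.2f}, after: {proj:.2f}")
```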
Acquired skills
- Understanding the links between complexity and overfitting
- Knowing the mathematical tools to measure learning complexity
- Understanding the statistical and algorithmic challenges of large-scale learning
- Understanding dimension reduction tools for learning
Teachers
Aline Roumy (course coordinator),
Adrien Saumard,
Maël Le Treust (guest lecturer).
Course schedule (2021-2022)
The course is scheduled as follows (check detailed times and rooms on the ENT, under ISTIC > M2 SIF):
- Tuesday 8:00-10:00
- Wednesday 14:00-16:00
Detailed schedule
- 16/11, 17/11, 23/11, 24/11, 30/11, 01/12 (bat 2B, room E 208): Adrien Saumard
- 07/12, 08/12, 14/12, 15/12: NO COURSE; room available to prepare the oral presentation
- 04/01 (bat 12D-i57), 05/01 (bat 12D-i57), 11/01 (bat 12D-i53): Aline Roumy
- 12/01 (bat 12D-i55): Maël Le Treust
- 13/01 8:00-10:00 (bat 12D-i60): oral presentation
- 19/01 14:00-16:00 (bat 12D-i60): written exam
Evaluation
Modalities: carefully read the recommendations (PDF).
Dates and links to the documents:
- Oral presentation on 13/01/2022 THURSDAY 8:00-10:00, room B12D-i60.
- Student group and chosen article to be sent by email to the teachers before Monday 29/11/21 5:00pm.
- Summary to be sent by email to the teachers before Friday 07/01/2022 5:00pm.
- Slides to be sent by email to the teachers before Wednesday 12/01/2022 5:00pm.
Each group of students will have to present the content of one item from the list below:
- Chapter 11, "Model Selection and Validation", of the book by Shai Shalev-Shwartz & Shai Ben-David, Understanding Machine Learning
- Chapter 14, "Stochastic Gradient Descent", of the book by Shai Shalev-Shwartz & Shai Ben-David, Understanding Machine Learning
- Léon Bottou and Olivier Bousquet, "The Tradeoffs of Large Scale Learning", in Optimization for Machine Learning, pp. 351-368, edited by Suvrit Sra, Sebastian Nowozin and Stephen J. Wright, MIT Press, 2011. (PDF)
- Written exam on 19/01/2022 WEDNESDAY 14:00-16:00, room B02B-E208.
Some references
Course Notes