SML

Supervised Machine Learning

Course of master SIF

Instructors: F. Coste, E. Kijak

List of students

Tentative schedule

(see also the ADE UR1 Planning for the actual room; search for M2 SIF)

1 13/09 16h15-18h15 B02B-E208 F. Coste
Learning and machine learning
A video of the DARPA perspective on AI, cited by this blog post
2 15/09 16h15-18h15 B02B-E208 F. Coste Methodology
slides, 8pp for L1 and L2
Notebook: Name gender prediction
3 20/09 16h15-18h15 B02B-E208 F. Coste Text classification (slides, 8pp)
Practice: SMS Spam classification (a minimal pipeline sketch is given after the schedule)
Notebooks: requirements, skeleton, complete
4 22/09 16h15-18h15 B02B-E208 F. Coste Practice: parameter search, natural language representation
Formal languages learning
Learning automata from positive and negative examples
5 27/09 16h15-18h15 B02B-E208 F. Coste Learning languages from positive examples
slides, 8pp for L4 and L5

Article to read and understand before the final exam: Inference of k-Testable Languages in the Strict Sense and Application to Syntactic Pattern Recognition by P. García and E. Vidal.
If you want to use this algorithm in your project, the corresponding C code seems to be in kTs.tgz from this directory.
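
For intuition, here is a hedged Python sketch of the core idea of k-testable (in the strict sense) inference from positive examples, in the spirit of the García and Vidal paper; the function names and the toy sample are illustrative and are not taken from kTs.tgz.

    # Sketch of k-TSS inference from positive strings (assumes k >= 2).
    # Names and the toy sample are illustrative, not from kTs.tgz.

    def infer_ktss(sample, k):
        """Collect the allowed prefixes (I), suffixes (F) and k-grams (T)
        observed in the positive sample."""
        I, F, T, short = set(), set(), set(), set()
        for w in sample:
            if len(w) < k:
                short.add(w)          # strings shorter than k are kept as-is
                continue
            I.add(w[:k - 1])          # prefix of length k-1
            F.add(w[-(k - 1):])       # suffix of length k-1
            T.update(w[i:i + k] for i in range(len(w) - k + 1))  # all k-grams
        return I, F, T, short

    def accepts(model, w, k):
        """A string is accepted iff its prefix/suffix of length k-1 and all
        of its k-grams were observed in the sample (short strings: exact match)."""
        I, F, T, short = model
        if len(w) < k:
            return w in short
        return (w[:k - 1] in I and w[-(k - 1):] in F
                and all(w[i:i + k] in T for i in range(len(w) - k + 1)))

    # Toy usage: learn from a few positive examples over {a, b}
    model = infer_ktss(["ab", "aab", "aaab"], k=2)
    print(accepts(model, "aaaab", 2))   # True: generalises to a+b
    print(accepts(model, "abb", 2))     # False: 2-gram "bb" never observed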

 – 29/09
6 4/10 16h15-18h15 B02B-E208 E. Kijak Decision trees (slides 6pp)
7 6/10 16h15-18h15 B02B-E208 E. Kijak
Logistic regression (slides 6pp)
Neural networks (slides 6pp)
8 11/10 16h15-18h15 B02B-E208 E. Kijak Support Vector Machine (slides 6pp)
9 13/10 16h15-18h15 B02B-E208 E. Kijak Model combination (slides 6pp)
10 18/10 16h15-18h15 B02B-E208 E. Kijak Bayesian learning (slides 6pp)
27/10 16h15-18h15 B02B-E208 E. Kijak, F. Coste Projects
8/11 16h15-18h15 B02B-E208 E. Kijak, F. Coste Exam
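
As a pointer for the practice sessions on text classification and parameter search (sessions 3-4), here is a hedged, self-contained sketch of the kind of scikit-learn pipeline built there; the inline toy messages are placeholders for the actual SMS Spam corpus and the searched parameters are only examples.

    # A minimal text-classification pipeline with parameter search.
    # The tiny inline dataset is only a placeholder for the SMS Spam corpus.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import GridSearchCV

    texts = ["win a free prize now", "call now to claim cash",
             "are we still on for lunch", "see you at the meeting tomorrow"]
    labels = ["spam", "spam", "ham", "ham"]

    pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())

    # Parameter search over representation and smoothing choices
    # (cv=2 only because the toy dataset is tiny).
    grid = GridSearchCV(pipeline,
                        {"tfidfvectorizer__ngram_range": [(1, 1), (1, 2)],
                         "multinomialnb__alpha": [0.1, 1.0]},
                        cv=2)
    grid.fit(texts, labels)
    print(grid.best_params_, grid.best_score_)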

We will use Python 3 with Jupyter notebooks and the numpy, scipy, scikit-learn, nltk and pandas libraries,

and the following datasets:

Run for instance (conda might be an alternative): pip3 install -U numpy scipy scikit-learn nltk pandas
Then download this notebook and run it to check that the required packages are available on your computer and to download useful resources in advance.
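
As a rough idea of what that check involves, here is a minimal sketch (not the actual notebook) that verifies the packages import and fetches a few NLTK resources; the exact NLTK corpora listed here are assumptions, not the notebook's actual list.

    # Quick environment check: packages import, NLTK resources can be fetched.
    import importlib

    for pkg in ["numpy", "scipy", "sklearn", "nltk", "pandas"]:
        try:
            module = importlib.import_module(pkg)
            print(f"{pkg:12s} OK  (version {getattr(module, '__version__', '?')})")
        except ImportError:
            print(f"{pkg:12s} MISSING - install it with pip3 or conda")

    import nltk
    for resource in ["punkt", "stopwords", "names"]:   # assumed resources
        nltk.download(resource, quiet=True)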

Many other datasets are available to play with. See for instance the UCI Machine Learning Repository, Kaggle, the Wikipedia list of datasets for machine learning research, or Google Dataset Search.

Projects:

  • General goal: Implement a learning process using the methodology and the methods seen during the module
  • Subject: Compare and study the influence of the different choices that can be made to tackle one of the machine learning tasks proposed here.
  • Instructions: Perform a rigorous and reproducible comparative study of at least 3 learning approaches on the chosen task (a minimal sketch of such a comparison is given after this list).
    The learning approaches should be evaluated. The study should explain the choice of representations, as well as the parameters of the learning approaches. You should provide a short analysis and discussion of the results. We expect conclusions on the pros and cons (learning quality, required amount of data or computing resources, …) of each learning approach on the chosen task. Master 2 students will be asked to go one step further by exploring one (or several) aspect(s) of their choice, such as the impact of the representation, of noise in the data, or of the number of learning examples, the influence of some parameter, or the study of another learning strategy (like semi-supervised learning)…
  • Deliverable: a Jupyter notebook presenting the experiments and enabling their reproduction (notebook file or link to it + PDF export) + a short report in PDF format + a 15-minute presentation with slides.
    The report and presentation should present the task, the methodology, the experiments and the results (in the form of a table), together with the short analysis and conclusions, in a synthetic, precise and clear manner.
    If you use code from others, don’t forget to acknowledge the source…
  • 2-3 people per group
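
As referenced in the instructions above, here is a hedged sketch of what a minimal comparative protocol can look like with scikit-learn: three learning approaches evaluated with the same cross-validation folds and metric. The dataset, models and scoring below are placeholders to adapt to your chosen task and representations.

    # Placeholder comparative study: three approaches, same folds, same metric.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)   # stand-in for the chosen task

    approaches = {
        "logistic regression": make_pipeline(StandardScaler(),
                                             LogisticRegression(max_iter=1000)),
        "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
    }

    # Same cross-validation and metric for every approach, so that results are
    # comparable and reproducible (fixed random_state where relevant).
    for name, model in approaches.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="f1")
        print(f"{name:20s} F1 = {scores.mean():.3f} +/- {scores.std():.3f}")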

Assessment:

60% written exam (notes taken during lectures and slides allowed), 40% project

Excerpt of 2016 exam
