Learning linguistic models and application to biological sequences

Keywords: Grammatical Inference, Machine Learning, Functions and Structures of Protein Sequences, DNA…


Ph.D. Students

  • Pablo Espana Gutierrez, Learning models with explicit dependencies between residues to predict protein functions, since Sep. 2023
Previous Ph.D. Students

Funded projects

  • Pepper: Towards the next generation of protein alignment methods with Potts models, coordinated by Mathilde Carpentier, Émergence 2021-2022 from Alliance Sorbonne Université
  • IDEALG: Seaweed for the future, ANR Investissements d’avenir, Biotechnology and Bioresources
  • Characterization of desaturases with the Pleiade team, IPL Algae in silico
  • Grammatical inference methods for the classification of amyloidogenic proteins, with Politechnika Wroclawska, Poland, funded by the Polish National Science Center
  • “Omics”-Line of the Chilean CIRIC-Inria Center
  • PEPS project: Characterisation and identification of viral sequences in marine metagenomes
  • ANR Biotempo: Languages, time representations and hybrid models for the analysis of incomplete models in molecular biology
  • ANR LepidOLF: Microgenomics of the pheromone sensillum of a lepidopteran: an innovative approach to understanding olfactory mechanisms and their modulation
  • ANR Pelican: Competing for light in the ocean: An integrative genomic approach of the ecology, diversity and evolution of cyanobacterial pigment types in the marine environment
  • MINCyT (formerly SECyT) – INRIA collaboration with the “Grupo de Procesamiento de Lenguaje Natural” of Gabriel Infante-Lopez: Linguistic modelling of genomic sequences by grammar learning
  • ANR Proteus: Fold recognition and inverse folding: towards large-scale prediction of protein structures
  • ANR Modulome: Deciphering and modelling the structural organization of genomes

Selected publications

Primers and reviews
Looking at long-distance correlations
With Transformers’ attention
Residue coevolution
Protein sequences and structures
Learning context-free grammars
Learning automata (and partial local multiple alignment applications)

Ph.D. Thesis

Grammatical Inference Benchmarks and Competitions

  • I gathered classical grammatical inference benchmarks in this GIB repository. Don’t hesitate to contribute your own data sets, especially real-world ones!
  • I set up the Gowachin server, a continuation of the Abbadingo One DFA learning competition that can generate parametrized problems. I also co-organized Omphalos, the competition on learning context-free languages; it is now over, but its data sets are still available… If you are interested in grammatical inference competitions, look at this page.
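To give an idea of what a parametrized DFA learning problem looks like, here is a minimal sketch that draws a random target automaton and labels a random sample of strings with it. This is a simplified illustration, not the actual Abbadingo One or Gowachin generation procedure: the parameter names (n_states, alphabet_size, n_strings, max_len, seed) and the helper functions are hypothetical. The output layout (a header line with the sample size and alphabet size, then one line per string giving its label, length and symbols) follows the Abbadingo-style text format as we understand it; check it against the official competition description before relying on it.

```python
import random

# Hypothetical generator for a DFA learning problem (illustrative only,
# not the Abbadingo One or Gowachin procedure).


def random_dfa(n_states, alphabet_size, rng):
    """Draw a complete DFA with start state 0: random transitions and
    each state accepting with probability 1/2."""
    transitions = {
        (q, a): rng.randrange(n_states)
        for q in range(n_states)
        for a in range(alphabet_size)
    }
    accepting = {q for q in range(n_states) if rng.random() < 0.5}
    return transitions, accepting


def accepts(transitions, accepting, word):
    """Run the DFA from state 0 and report whether it ends in an accepting state."""
    state = 0
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting


def generate_problem(n_states=8, alphabet_size=2, n_strings=20, max_len=10, seed=0):
    """Return the target DFA and a labeled sample of (label, word) pairs."""
    rng = random.Random(seed)
    transitions, accepting = random_dfa(n_states, alphabet_size, rng)
    sample = []
    for _ in range(n_strings):
        # Length first, then symbols: NOT uniform over all strings up to max_len.
        length = rng.randint(0, max_len)
        word = [rng.randrange(alphabet_size) for _ in range(length)]
        label = 1 if accepts(transitions, accepting, word) else 0
        sample.append((label, word))
    return transitions, accepting, sample


def to_abbadingo(sample, alphabet_size):
    """Format the sample as text: header line, then 'label length symbols...'."""
    lines = [f"{len(sample)} {alphabet_size}"]
    for label, word in sample:
        lines.append(" ".join(str(x) for x in [label, len(word), *word]))
    return "\n".join(lines)


if __name__ == "__main__":
    _, _, sample = generate_problem()
    print(to_abbadingo(sample, alphabet_size=2))
```

The learner only sees the formatted sample; the target automaton is kept hidden to score its predictions. Sampling a length and then its symbols keeps the sketch short, but it over-represents short strings compared with drawing uniformly among all strings up to max_len.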

A more complete list of publications is available here.
This page is updated on an irregular basis: browse HAL for new publications.
