Inria research scientist, Dyliss team
Inria, Univ Rennes, CNRS, IRISA
F-35000 Rennes
email: francois.coste@inria.fr
tel: (+33) 2 99 84 74 91
ORCID: 0000-0001-9134-6557
Research topic: computational learning of linguistic models and application to biological sequences
Keywords: Grammatical Inference, Machine Learning, Sequences, Functions, Structures, Protein, DNA…
News and highlights:
- Nov 2024: slides of my talk Enzymatic annotation of protein sequences with a deep language model, given at the “IA pour l’annotation des génomes” days of the MERIT CNRS network
- Our 2021–2022 work with Nicolas Buton and Yann Le Cunff, showing the interest of Transformers for the prediction of enzymes, has finally been accepted: Predicting enzymatic function of protein sequences with attention, in Bioinformatics, 2023
- Publication where Partial Local Multiple Alignment of Sequences meets Phylogeny: Phylogenetic inference of the emergence of sequence modules and protein-protein interactions in the ADAMTS-TSL family, with Olivier Dennler, Samuel Blanquart, Catherine Belleannée and Nathalie Théret, in PLOS Computational Biology, 2023
- July 10-13, 2023, Rabat, Morocco: ICGI 2023, the conference on research at the intersection of Machine Learning and Formal Language Theory.
- Publication: PPalign, Potts-to-Potts alignment of protein sequences taking residue coevolution into account, with Hugo Talibart, in BMC Bioinformatics, 2021 (supplementary material, github)
- Deep learning languages: a key fundamental shift from probabilities to weights? A short “position” paper submitted to the Deep Learning and Formal Languages: Building Bridges Workshop at ACL 2019. If you know of publications or experiments on the fundamental differences between learning weights and probabilities, or simply want to discuss this, please don’t hesitate to drop me a line…
- Slides of my talk at ICGI’18 on “Learning local substitutable context-free languages from positive examples in polynomial time and data by reduction” (with Jacques Nicolas), introducing a general definition of grammars in Reduced Normal Form (RNF) and ReGLiS, a new reduction-based algorithm to learn them efficiently, with nice theoretical properties.
- Presentation of my team from an Artificial Intelligence perspective (slides for the Artificial Intelligence days organized by Inria RBA and IRISA)
- The chapter Learning the Language of Biological Sequences was published, alongside other nice chapters, in the book Topics in Grammatical Inference edited by Jeffrey Heinz and José Sempere. Following a talk given for the 10th anniversary of ICGI (ICGI’10), it reviews advances in modeling biological sequences, from Pattern/Motif Discovery to Grammatical Inference, trying to build intuition with practical examples. Feedback welcome!