All the tools are freely available in GitHub repositories. I have been involved in their development in various ways, either as a supervisor or as a developer.
Most of them are released under the GNU Affero General Public License.
Data structures for indexation
- kmindex (2023 – development by Téo Lemane)
- Indexing massive Bloom filters. Enabled the creation of the ORA server.
- Fimpera (2023 – development by Lucas Robidou)
- Reduction of false positives and overestimations in counting Bloom filters (or any counting Approximate Membership Query data structure)
- Findere (2021 – development by Lucas Robidou)
- Reduction of false positives in Bloom filters (or any Approximate Membership Query data structure)
- kmtricks (2020 – development by Téo Lemane)
- Counting k-mers, constructing Bloom filters or counted k-mer matrices from large and numerous read sets.
- BBHash (2016)
- Minimal perfect hash function (MPHF) that scales in terms of execution time and of memory and disk footprints. The first able to index 1000 billion elements.
- Quasi-Dictionary (2017)
- Indexing data structure using BBHash, which scales to billions of elements.
- SRC (2017)
- Using the quasi-dictionary, this tool finds similar reads with a controlled alignment-free approach, either between read sets or within a single read set.
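Several of the tools above rely on Bloom filters built over k-mers. As an illustration only, here is a minimal Python sketch of a Bloom filter indexing the k-mers of a sequence. It is not the code of any of the listed tools: the class name `BloomFilter`, the helper `kmers`, and the salted BLAKE2 hashing scheme are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical helper, not taken from any listed tool)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One salted hash per hash function; the salt scheme is an assumption.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Return all overlapping k-mers of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


bloom = BloomFilter(num_bits=1024, num_hashes=3)
for kmer in kmers("ACGTACGTGA", 4):
    bloom.add(kmer)
print("ACGT" in bloom)  # an indexed k-mer is always reported as present
```

A query for a k-mer that was never inserted usually returns `False`, but may return `True`: these false positives are what approaches such as Findere and Fimpera aim to reduce.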
de novo differential studies
- kmdiff (2023 – development by Téo Lemane)
- Large-scale and user-friendly differential k-mer analyses
de novo variant predictions
The following tools are related to the de novo detection of biological variants from raw sequencing data.
- DiscoSnp-RAD (2020)
- DiscoSnp++ adapted to RAD-seq (including specific variant clustering, and ad-hoc downstream analyses of predictions).
- DiscoSnp (2014) and DiscoSnp++ (2016)
- Detection of SNPs (DiscoSnp) and later of SNPs + insertions and deletions (DiscoSnp++).
- Kissplice (2012)
- Detection of alternative splicing events from raw transcriptomics reads.
- TakeABreak (2014)
- Detection of inversion breakpoints from raw transcriptomics reads.
Long reads analyses
- Carnac-LR (2018)
- Clustering of transcriptomics long reads for detection and quantification of gene isoforms
de novo comparative metagenomics
The following three tools were developed for de novo comparative metagenomics computations. Each one is not only an improvement on the previous one but also offers new features. I fully implemented the first two, and I supervised and guided the implementation of Simka.
- Compareads (2012)
- Pairwise comparisons of raw metagenomic read sets. Based on a new data structure extending the Bloom Filter. Outputs similar reads between two read sets.
- Commet (2014)
- Extends Compareads: n × n comparisons of raw metagenomic read sets. Factorization of all redundant computations. Offers a quick and easy way to compute bitwise operations between the obtained results. This makes it possible, for instance, to recover the reads from one set that are similar to two other read sets.
- Simka (2016)
- Provides only similarity metrics (Jaccard, Bray-Curtis, …) between n read sets. Finding similar reads is not possible (unlike with Compareads and Commet), but it scales to much larger datasets: it was applied to all Tara Oceans data within a few days.
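To make the two similarity metrics named above concrete, here is a minimal Python sketch that computes the Jaccard similarity and the Bray-Curtis dissimilarity between two read sets. It assumes each read set is summarized by its k-mer counts; the function names and the tiny input reads are hypothetical, and this is not Simka's implementation, which is designed for far larger inputs.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def jaccard_similarity(a, b):
    """Presence/absence Jaccard: shared distinct k-mers over all distinct k-mers."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0


def bray_curtis_dissimilarity(a, b):
    """Abundance-based Bray-Curtis: 1 - 2 * sum(min counts) / (total counts)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    return 1.0 - 2.0 * shared / total


# Hypothetical toy read sets, for illustration only.
sample_a = kmer_counts(["ACGT"], k=2)  # AC, CG, GT
sample_b = kmer_counts(["ACGA"], k=2)  # AC, CG, GA
print(jaccard_similarity(sample_a, sample_b))         # 0.5
print(bray_curtis_dissimilarity(sample_a, sample_b))  # 1 - 4/6, about 0.333
```

Jaccard only looks at which k-mers are present, whereas Bray-Curtis also takes their abundances into account, which is why the two can rank the same pairs of samples differently.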
NGS data analyses
- BGreat (2015-2017)
- Mapping of single and paired-end reads on the de Bruijn graph. This is a fundamental feature required for various downstream analyses (quantification, read correction, phasing).
- BCool (2017)
- Using BGreat: correction of NGS reads by mapping them on their own de Bruijn graph
- BWise (2018-2019)
- Using BGreat: improves short-read assembly by phasing variants from the same haplotype that belong to the same short reads.
- Mapsembler (2014)
- Local short-read assembler. Given one or several starting sequences, it locally assembles the surrounding sequences. It also provides a way to visualize the assembly graph locally.
RNA methylation
- methMap (2018)
- Seed-and-extend mapping tool, able to map bisulfite-converted sequences. Core tool that has been used for highlighting miRNA methylation (work under review in Genome Biology)