All the tools are freely available in various GitHub repositories. I have been involved in their development in various ways, either as a supervisor or as a developer.
They are all licensed, mainly under the GNU Affero General Public License.
De novo variant predictions
The following tools are related to the de novo detection of biological variants from raw sequencing data.
- DiscoSnp (2014) and DiscoSnp++ (2016)
- Detection of SNPs (DiscoSnp) and later of SNPs + insertions and deletions (DiscoSnp++).
- Kissplice (2012)
- Detection of alternative splicing events from raw transcriptomics reads.
- TakeABreak (2014)
- Detection of inversion breakpoints from raw transcriptomics reads.
Long reads analyses
- Carnac-LR (2018)
- Clustering of transcriptomics long reads for detection and quantification of gene isoforms.
Data structures for indexation
We proposed a new minimal perfect hash function (MPHF) library (BBHash), on top of which we designed an indexing data structure (Quasi-dictionary) and a user-friendly tool (SRC) for general read similarity detection within or across read sets.
- BBHash (2016)
- Scalable MPHF in terms of execution time and of memory and disk footprints. The first able to index 1000 billion elements.
- Quasi-Dictionary (2017)
- Indexing data structure using BBHash that scales to billions of elements.
- SRC (2017)
- Using the quasi-dictionary, this tool finds similar reads, between read sets or inside a read set, with a controlled alignment-free approach.
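To make the notion of a minimal perfect hash function more concrete, here is a minimal Python sketch. It is not the BBHash code: it only illustrates the general idea of placing keys through a cascade of bit arrays, where keys that collide at one level are retried at the next. The function names, the `gamma` parameter and the use of `hashlib.blake2b` are assumptions made for this illustration.

```python
import hashlib


def _hash(key: str, level: int, size: int) -> int:
    """Hash a key into [0, size), with a different hash function per level."""
    salt = level.to_bytes(8, "little")
    digest = hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % size


def build_mphf(keys, gamma=2.0, max_levels=64):
    """Build a cascade of bit arrays; colliding keys move to the next level."""
    levels = []
    remaining = list(set(keys))
    level = 0
    while remaining:
        if level >= max_levels:
            raise RuntimeError("too many levels, try a larger gamma")
        size = max(1, int(gamma * len(remaining)))
        counts = [0] * size
        for key in remaining:
            counts[_hash(key, level, size)] += 1
        # A bit is set only where exactly one key landed: that key is placed.
        levels.append([c == 1 for c in counts])
        remaining = [k for k in remaining if counts[_hash(k, level, size)] > 1]
        level += 1
    return levels


def lookup(levels, key):
    """Return the index of an indexed key in [0, n), n being the number of keys."""
    offset = 0
    for level, bits in enumerate(levels):
        pos = _hash(key, level, len(bits))
        if bits[pos]:
            # Rank of the bit: number of set bits before it, over all levels.
            return offset + sum(bits[:pos])
        offset += sum(bits)
    raise KeyError(key)


keys = ["ACGT", "CGTA", "GTAC", "TACG", "AAAA"]
levels = build_mphf(keys)
indices = sorted(lookup(levels, k) for k in keys)
```

Each indexed key receives a distinct integer between 0 and n-1. As with any MPHF, a key that was not indexed may still receive an arbitrary index, so a structure built on top of it has to handle such false positives itself. The linear-time rank computed with `sum` keeps the sketch short; a practical implementation would use packed bit vectors with precomputed ranks.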
De novo comparative metagenomics
The following three tools were developed for de novo comparative metagenomics computations. Each version not only improves on the previous one but also proposes new features. I fully implemented the first two versions, and I supervised and guided the implementation of Simka.
- Compareads (2012)
- Pairwise comparisons of raw metagenomic read sets. Based on a new data structure extending the Bloom Filter. Outputs similar reads between two read sets.
- Commet (2014)
- Extends Compareads: n × n comparisons of raw metagenomic read sets. Factorization of all redundant computations. Offers a way to quickly and easily compute bitwise operations between the obtained results. This makes it possible, for instance, to recover reads from one set that are similar to those of two other read sets.
- Simka
- Proposes only similarity metrics (Jaccard, Bray-Curtis, …) between n read sets. Finding similar reads is not possible (unlike with Compareads and Commet), but it scales to much larger datasets: it was applied to all Tara Oceans data within a few days.
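As an illustration of the similarity metrics mentioned above, the following Python sketch computes a Jaccard index and a Bray-Curtis dissimilarity between two read sets. It assumes, for the sake of the example, that each read set is summarized by its k-mer counts; the function names, the value of k and the toy reads are hypothetical and say nothing about how the tools above are implemented.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def jaccard(a, b):
    """Presence/absence similarity: shared distinct k-mers over all distinct k-mers."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def bray_curtis_dissimilarity(a, b):
    """Abundance-based dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    return 1.0 - 2.0 * shared / total


set_a = kmer_counts(["ACGTAC"], k=4)  # k-mers: ACGT, CGTA, GTAC
set_b = kmer_counts(["CGTACG"], k=4)  # k-mers: CGTA, GTAC, TACG
print(jaccard(set_a, set_b))                    # 2 shared out of 4 distinct: 0.5
print(bray_curtis_dissimilarity(set_a, set_b))  # 1 - 2*2/6, about 0.33
```

Jaccard only looks at which k-mers are present, whereas Bray-Curtis also takes their abundances into account, which is why the two are often reported together.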
NGS data analyses
- Bgreat
- Read and paired-end mapping on the de Bruijn graph. This is a fundamental feature required for various downstream analyses (quantification, read correction, phasing).
- BCool (2017)
- Using Bgreat: correction of NGS reads by mapping them on their own de Bruijn graph.
- Using Bgreat: improvement of short-read assembly by phasing variants from the same haplotype that occur in the same short reads.
- Local short-read assembler. Given one or several starting sequences, it locally assembles the surrounding sequences. It offers a way to visualize the assembly graph locally.
- Seed-and-extend mapping tool, able to map bisulfite-converted sequences. Core tool that has been used for highlighting miRNA methylation (work under review in Genome Biology).
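Several tools in this list rely on mapping reads on a de Bruijn graph. The toy Python sketch below shows what this means in its simplest, exact form: a read maps on the graph if all its successive k-mers are nodes of the graph, so that the read spells a path. This is only an illustration under that simplifying assumption (no sequencing errors, no reverse complements, no paired-end information); the function names are hypothetical and this is not the algorithm used by the tools above.

```python
def build_de_bruijn(reads, k):
    """Return the node set of a de Bruijn graph: all k-mers seen in the reads.

    Edges are implicit: two nodes are linked when the (k-1)-suffix of one
    equals the (k-1)-prefix of the other.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def maps_on_graph(read, kmers, k):
    """Exact mapping test: every k-mer of the read must be a node of the graph.

    Successive k-mers of a read overlap by k-1 characters, so when all of
    them are nodes, the read spells a path in the graph.
    """
    if len(read) < k:
        return False
    return all(read[i:i + k] in kmers for i in range(len(read) - k + 1))


graph = build_de_bruijn(["ACGTAC", "GTACGG"], k=4)
print(maps_on_graph("CGTACG", graph, k=4))  # True: CGTA, GTAC, TACG are nodes
print(maps_on_graph("ACGTTT", graph, k=4))  # False: CGTT is not a node
```

Note that the first query read maps although it is not a substring of either input read: it follows a path that joins the two reads through their shared k-mer, which is what distinguishes mapping on the graph from mapping on the individual reads.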