Homepage of Claire Lemaitre

All of my bioinformatics software is freely distributed and open source. Most of it is hosted on GitHub; see my GitHub profile page.

List of software:

Variant calling and/or genotyping:

MindTheGap: detection and assembly of insertion variants
https://github.com/GATB/MindTheGap
SVJedi and SVJedi-graph: structural variation genotyping with long-read data
https://github.com/llecompte/SVJedi
https://github.com/SandraLouise/SVJedi-graph
DiscoSnp and discoSnpRAD: reference-free small variant (SNPs, indels) detection
https://github.com/GATB/DiscoSnp
LEVIATHAN: SV caller for linked-read data
https://github.com/morispi/LEVIATHAN
TakeABreak (no longer maintained): reference-free inversion detection
https://github.com/GATB/TakeABreak
DrjBreakpointFinder (no longer maintained): excision site identification in viral sequencing data
https://github.com/stephanierobin/DrjBreakpointFinder

Sequence assembly:

MinYS: targeted genome assembly of bacterial genomes in metagenomics sequencing datasets
https://github.com/cguyomar/MinYS
MTG-Link: local assembly and gap-filling with linked-read data
https://github.com/anne-gcd/MTG-Link

Comparative metagenomics with k-mers:

Simka and SimkaMin: massive multiple comparative metagenomics
https://github.com/GATB/simka
Compareads/COMMET (replaced by Simka, no longer maintained): massive comparative metagenomics
https://github.com/pierrepeterlongo/commet

Libraries or management tools for high throughput sequencing data:
- C++ GATB library: short read sequencing data structures
  https://github.com/GATB/gatb-core
- LRez library and toolkit: barcode-based management and indexation of linked-read datasets
  https://github.com/morispi/LRez
- Leon (now fully integrated in GATB): reference-free short read compression
  https://github.com/GATB/leon
- Bloocoo (no longer maintained): reference-free short read correction
  https://github.com/GATB/bloocoo

Comparative genomics:
- Cassis: synteny block and rearrangement breakpoint refinement
  Version 2 (revival 2023): https://github.com/clemaitre/Cassis
  Version 1 (published, 2010): http://pbil.univ-lyon1.fr/software/Cassis/

MindTheGap

MindTheGap is a tool for the integrated detection and assembly of insertion variants from re-sequencing data. Importantly, it is designed to call insertions of any size, whether they are novel or duplicated, homozygous or heterozygous in the donor genome. MindTheGap uses an efficient k-mer based method to detect insertion sites in a reference genome, and subsequently assembles the inserted sequences from the donor reads.

MindTheGap is described in [6]. Sources (licensed under the A-GPL), binaries, documentation and news are available on the MindTheGap web page.

Simka and SimkaMin

Simka is a comparative metagenomics method dedicated to NGS datasets. It computes a large collection of distances classically used in ecology to compare communities, approximating species counts by k-mer counts. The method is fast enough to be applied to large metagenomics projects such as Tara Oceans or the Human Microbiome Project.

Simka is described in [8]. Sources, binaries, documentation and news are available on GitHub.
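
As an illustration of the idea of replacing species counts by k-mer counts, here is a minimal Python sketch that computes one classical ecological distance (Bray-Curtis) between two read sets. It is not Simka's implementation: the function names `kmer_counts` and `bray_curtis` are hypothetical, everything is held in memory, and reverse complements are ignored for simplicity.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurring in a list of reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two k-mer count tables (0 = identical, 1 = disjoint)."""
    shared = sum(min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0

# Toy example: two tiny "samples" that share two of their four 3-mers.
sample_a = kmer_counts(["ACGTAC"], k=3)
sample_b = kmer_counts(["ACGTTT"], k=3)
print(bray_curtis(sample_a, sample_b))
```

In this sketch each k-mer plays the role a species would play in an ecological abundance table, so any abundance-based distance could be substituted for Bray-Curtis.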

DiscoSnp

DiscoSnp extracts small polymorphisms (SNPs and indels) within or between read datasets, without assembly and without mapping to a reference genome.

DiscoSnp is described in [4]. Sources, binaries, documentation and news are available on the DiscoSnp web page.

Compareads/COMMET

Compareads is a tool designed to compare two datasets and extract the sequences that are similar between them. An important feature of Compareads is its time and memory performance, which makes it possible to process potentially huge sequence datasets (i.e., hundreds of millions of reads per dataset): it is for instance 30 times faster than the popular BLAST. It was notably used for metagenomic analyses.

Compareads is described in [3]. Sources, binaries, documentation and news are available on the Compareads web page.

Note: Compareads is no longer maintained; it has been replaced by COMMET.

TakeABreak

TakeABreak is a tool that detects inversion breakpoints directly from raw NGS reads, without any reference genome and without de novo assembly of the genomes. Its implementation is based on the Genome Assembly Tool Box (GATB) library. It has a small memory footprint and an acceptable runtime, which allows its use on common desktop computers: Illumina reads simulated at 2x40x coverage from human chromosome 22 can be processed in less than two hours, with less than 1 GB of memory.

TakeABreak is described in [5]. Sources (licensed under the A-GPL), binaries, documentation and news are available on the TakeABreak web page.

Leon

Leon is a tool for compressing Next Generation Sequencing data. It can compress files in Fasta or Fastq format. The method does not require any reference genome; instead, a reference is built de novo from the set of reads as a probabilistic de Bruijn graph. It uses the disk-streaming k-mer counting algorithm contained in the GATB library, and inserts solid k-mers into a Bloom filter. Each read is then encoded as a path in this graph, storing only an anchoring k-mer and a list of bifurcations indicating which path to follow in the graph when several are possible.

Leon is described in [7]. Sources (licensed under the A-GPL), binaries, documentation and news are available on the Leon web page.
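
The path encoding described above can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions, not Leon's actual code: a plain Python set stands in for the Bloom filter (so there are no false positives), every k-mer of a read is assumed to be in the graph (no sequencing errors), and the names `solid_kmers`, `successors`, `encode`, `decode` and the `min_count` parameter are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def solid_kmers(reads, k, min_count=1):
    """Return k-mers seen at least min_count times (a set stands in for the Bloom filter)."""
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= min_count}

def successors(kmer, graph):
    """Bases that extend kmer by one position to another k-mer present in the graph."""
    return [b for b in BASES if kmer[1:] + b in graph]

def encode(read, k, graph):
    """Encode a read as (anchor k-mer, read length, bases chosen at bifurcations)."""
    anchor, choices = read[:k], []
    kmer = anchor
    for base in read[k:]:
        nexts = successors(kmer, graph)
        if base not in nexts:
            raise ValueError("read leaves the graph; this sketch does not handle that case")
        if len(nexts) > 1:  # several paths are possible: record which one the read follows
            choices.append(base)
        kmer = kmer[1:] + base
    return anchor, len(read), choices

def decode(anchor, length, choices, graph):
    """Rebuild the read by walking the graph from the anchor, consuming one choice per bifurcation."""
    k = len(anchor)
    read, pending = anchor, iter(choices)
    while len(read) < length:
        nexts = successors(read[-k:], graph)
        read += nexts[0] if len(nexts) == 1 else next(pending)
    return read

# Toy example: two reads that differ at a single position.
reads = ["ACGTACGA", "ACGTTCGA"]
graph = solid_kmers(reads, k=3)
for read in reads:
    anchor, length, choices = encode(read, 3, graph)
    print(read, "->", anchor, length, choices, "->", decode(anchor, length, choices, graph))
```

Only positions where the graph branches cost storage in this sketch; wherever a k-mer has a single successor, the decoder recovers the next base from the graph alone.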

Cassis

The package Cassis implements methods for precise detection of rearrangement breakpoints in a sequenced genome by comparison with a genome of a related species.

The algorithms and methods are described in Lemaitre et al., 2008 [1]. The implementation of the package is described in Baudet et al., 2010 [2].

Cassis is implemented in Perl and R. The package sources are free and licensed under the GNU General Public License.

The package and documentation can be downloaded from the Cassis web page.

References

[1] : Precise detection of rearrangement breakpoints in mammalian genomes. C. Lemaitre, E. Tannier, C. Gautier, M.-F. Sagot. BMC Bioinformatics, 2008 9(1):286.

[2] : Cassis: detection of genomic rearrangement breakpoints. C. Baudet, C. Lemaitre, D. Zanoni, C. Gautier, E. Tannier, M.-F. Sagot. Bioinformatics. 2010 26(15):1897-1898.

[3] : Compareads: comparing huge metagenomic experiments. N. Maillet, C. Lemaitre, R. Chikhi, D. Lavenier, P. Peterlongo. BMC Bioinformatics 2012 13 (Suppl 19):S10.

[4] : Reference-free detection of isolated SNPs. R. Uricaru, G. Rizk, V. Lacroix, E. Quillery, O. Plantard, R. Chikhi, C. Lemaitre, P. Peterlongo. Nucleic Acids Research 2015 43(2):e11.

[5] : Mapping-Free and Assembly-Free Discovery of Inversion Breakpoints from Raw NGS Reads. C. Lemaitre, L. Ciortuz, P. Peterlongo. AlCoB 2014, July 2014, Tarragona, Spain. To appear in LNBI vol. 8542, pp. 119-130.

[6] : MindTheGap : integrated detection and assembly of short and long insertions. G. Rizk, A. Gouin, R. Chikhi, C. Lemaitre. Bioinformatics 2014 30(24):3451-3457.

[7] : Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk. BMC Bioinformatics 2015 16:288.

[8] : Multiple comparative metagenomics using multiset k-mer counting. G. Benoit, P. Peterlongo, M. Mariadassou, E. Drezen, S. Schbath, D. Lavenier, C. Lemaitre. PeerJ Computer Science 2016 2:e94.