MindTheGap is a tool for the integrated detection and assembly of insertion variants from re-sequencing data. Importantly, it is designed to call insertions of any size, whether they are novel or duplicated, homozygous or heterozygous in the donor genome. MindTheGap uses an efficient k-mer based method to detect insertion sites in a reference genome, and subsequently assemble them from the donor reads.
MindTheGap is described in [6]. Sources (A-GPL license), binaries, documentation and news are available on the MindTheGap web page.
Simka is a comparative metagenomics method dedicated to NGS datasets. It computes a large collection of distances classically used in ecology to compare communities, approximating species counts by k-mer counts. The method is ultra-fast and can be applied to large metagenomics projects such as Tara Oceans or the Human Microbiome Project.
Simka is described in [8]. Sources, binaries, documentation and news are available on GitHub.
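To make the idea of replacing species counts with k-mer counts concrete, here is a minimal sketch that computes one classical ecological distance, the Bray-Curtis dissimilarity, between two tiny read sets. It is only an illustration under simplifying assumptions: the function names `kmer_counts` and `bray_curtis` are hypothetical, k-mers are counted in memory on the forward strand only, and none of this reflects Simka's actual implementation or command-line interface.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min counts) / (total_a + total_b)."""
    shared = sum(min(counts_a[kmer], counts_b[kmer])
                 for kmer in counts_a.keys() & counts_b.keys())
    total = sum(counts_a.values()) + sum(counts_b.values())
    # Two empty samples are treated as identical (assumption of this sketch).
    return 1.0 - 2.0 * shared / total if total else 0.0


# Toy datasets (hypothetical): each sample is a list of reads.
sample_a = ["ACGTAC"]
sample_b = ["ACGTTT"]
distance = bray_curtis(kmer_counts(sample_a, 3), kmer_counts(sample_b, 3))
print(distance)
```

The same k-mer count tables could feed other ecological distances (for example Jaccard on presence/absence), which is why counting is the expensive shared step.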
DiscoSnp extracts small polymorphisms (SNPs and indels) within or between read datasets, without assembly or mapping to a reference genome.
DiscoSnp is described in [4]. Sources, binaries, documentation and news are available on the DiscoSnp web page.
Compareads is a tool designed to compare two datasets and extract the sequences that are similar between them. An important feature of Compareads is its time and memory performance, which allows it to handle potentially huge sequence datasets (i.e., hundreds of millions of reads per dataset): it is, for instance, 30 times faster than the popular BLAST. It was notably used for metagenomic analyses.
Compareads is described in [3]. Sources, binaries, documentation and news are available on the Compareads web page.
Note: Compareads is no longer maintained; it has been replaced by COMMET.
TakeABreak is a tool that detects inversion breakpoints directly from raw NGS reads, without any reference genome and without de novo assembly of the genomes. Its implementation is based on the Genome Assembly Tool Box (GATB) library and has a very small memory footprint, which allows it to run on common desktop computers with an acceptable runtime (Illumina reads simulated at 2x40x coverage from human chromosome 22 can be processed in less than two hours, with less than 1 GB of memory).
TakeABreak is described in [5]. Sources (A-GPL license), binaries, documentation and news are available on the TakeABreak web page.
Leon is software for compressing Next Generation Sequencing data in FASTA or FASTQ format. The method does not require any reference genome; instead, a reference is built de novo from the set of reads as a probabilistic de Bruijn graph. It uses the disk-streaming k-mer counting algorithm of the GATB library and inserts solid k-mers into a Bloom filter. Each read is then encoded as a path in this graph, storing only an anchoring k-mer and a list of bifurcations indicating which path to follow in the graph when several are possible.
Leon is described in [7]. Sources (A-GPL license), binaries, documentation and news are available on the Leon web page.
The package Cassis implements methods for precise detection of rearrangement breakpoints in a sequenced genome by comparison with a genome of a related species.
The algorithms and methods are described in Lemaitre et al., 2008 [1]. The implementation of the package is described in Baudet et al., 2010 [2].
Cassis is implemented in Perl and R. The package sources are free and licensed under the GNU General Public License.
The package and documentation can be downloaded from the Cassis web page.
References
[1] : Precise detection of rearrangement breakpoints in mammalian genomes. C. Lemaitre, E. Tannier, C. Gautier, M.-F. Sagot. BMC Bioinformatics 2008 9(1):286.
[2] : Cassis: detection of genomic rearrangement breakpoints. C. Baudet, C. Lemaitre, D. Zanoni, C. Gautier, E. Tannier, M.-F. Sagot. Bioinformatics. 2010 26(15):1897-1898.
[3] : Compareads: comparing huge metagenomic experiments. N. Maillet, C. Lemaitre, R. Chikhi, D. Lavenier, P. Peterlongo. BMC Bioinformatics 2012 13 (Suppl 19):S10.
[4] : Reference-free detection of isolated SNPs. R. Uricaru, G. Rizk, V. Lacroix, E. Quillery, O. Plantard, R. Chikhi, C. Lemaitre, P. Peterlongo. Nucleic Acids Research 2015 43(2):e11.
[5] : Mapping-Free and Assembly-Free Discovery of Inversion Breakpoints from Raw NGS Reads. C. Lemaitre, L. Ciortuz, P. Peterlongo. AlCoB 2014, July 2014, Tarragona, Spain. To appear in LNBI vol. 8542, pp. 119-130.
[6] : MindTheGap: integrated detection and assembly of short and long insertions. G. Rizk, A. Gouin, R. Chikhi, C. Lemaitre. Bioinformatics 2014 30(24):3451-3457.
[7] : Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uricaru, G. Rizk. BMC Bioinformatics 2015 16:288.
[8] : Multiple comparative metagenomics using multiset k-mer counting. G. Benoit, P. Peterlongo, M. Mariadassou, E. Drezen, S. Schbath, D. Lavenier, C. Lemaitre. PeerJ Computer Science 2016 2:e94.