Friday, September 9, 2011
indispensable bioinformatics resources
Essential tools for working with high-throughput sequencing data:
bowtie -- bowtie-bio.sourceforge.net/
my current NGS aligner of choice
samtools -- samtools.sourceforge.net/
essential. the lingua franca for working with alignment output
picard -- picard.sourceforge.net/
also essential for working with alignment files. complements samtools very nicely.
fastx toolkit -- hannonlab.cshl.edu/fastx_toolkit/
this looks like it will be handy... it is not yet part of any of the Ubuntu software repositories I regularly use, but it does look like it will be included in the upcoming Oneiric Ocelot 11.10 release.
cutadapt -- code.google.com/p/cutadapt/
may be made obsolete by fastx toolkit
Other useful resources, not necessarily singularly focused on sequencing data:
Bioinformatics manual at UC Riverside -- manuals.bioinformatics.ucr.edu/
Includes lots of good stuff on R and HTS data analysis
Statistical Computing pages at UCLA -- www.ats.ucla.edu/stat/
Provides very useful, well-organized examples for different stats packages.
Thursday, August 4, 2011
Bio::SeqIO very very slow
While it may be a convenient and flexible solution when reading in tens, hundreds, or thousands of sequences, the SeqIO module of BioPerl is just not a workable solution for reading in next-generation sequencing files.
Consider reading in 100,000 short reads from a fastq file:
It takes 30 seconds to do it the bioperl way:
use Bio::SeqIO;
my %seqhash;
my $seqin = Bio::SeqIO->new(-format => "fastq", -file => $infilename);
while (my $seq = $seqin->next_seq()) { $seqhash{ $seq->seq() }++; }
But only half a second to do it the home-made way (using the basic File::Util module):
use File::Util;
my %seqhash;
my ($counter, $trigger) = (0, 0);
my $f = File::Util->new();
my $ifh = $f->open_handle('file' => $infilename, 'mode' => 'read');
while (<$ifh>)
{
    if ($_ =~ /^@/)    # header line (caveat: quality lines can also start with '@')
    {
        $counter++;
        $trigger = 1;
    }
    elsif ($trigger == 1)    # the line right after a header is the sequence
    {
        $_ =~ /(\w+)/;
        my $seq = $1;
        $seqhash{$seq}++;
        $trigger = 0;
    }
}
Not as elegant, but it works much faster!!
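One caveat with triggering on '@': fastq quality strings can themselves begin with '@', which can throw the trigger logic off on some files. A safer sketch (assuming an unwrapped fastq, four lines per record, and a hypothetical sample file written inline so the example is self-contained) is to read records four lines at a time:

```perl
use strict;
use warnings;

# Write a tiny sample fastq so the example is self-contained.
# Note the quality line of read1 deliberately starts with '@'.
my $infilename = "sample.fastq";
open(my $ofh, '>', $infilename) or die "Cannot write $infilename: $!";
print $ofh "\@read1\nACGTACGT\n+\n\@\@\@\@\@\@\@\@\n";
print $ofh "\@read2\nACGTACGT\n+\nIIIIIIII\n";
close($ofh);

# Read four lines per record: header, sequence, '+', quality.
my %seqhash;
open(my $ifh, '<', $infilename) or die "Cannot open $infilename: $!";
while (my $header = <$ifh>) {
    my $seq  = <$ifh>;    # sequence line
    my $plus = <$ifh>;    # '+' separator line
    my $qual = <$ifh>;    # quality line -- may itself begin with '@'
    last unless defined $qual;    # stop on a truncated final record
    chomp $seq;
    $seqhash{$seq}++;
}
close($ifh);

print "$_\t$seqhash{$_}\n" for keys %seqhash;    # prints: ACGTACGT	2
```

Since the four-line structure is fixed, no state flag is needed and a quality line can never be mistaken for a header.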