Friday, September 9, 2011
indispensable bioinformatics resources
Essential tools for working with high-throughput sequencing data:
bowtie -- bowtie-bio.sourceforge.net/
my current NGS aligner of choice
samtools -- samtools.sourceforge.net/
essential. the lingua franca for working with alignment output
picard -- picard.sourceforge.net/
also essential for working with alignment files. complements samtools very nicely.
fastx toolkit -- hannonlab.cshl.edu/fastx_toolkit/
this looks like it will be handy... it is not yet part of any of the Ubuntu software repositories I regularly use, but it does look like it will be included in the upcoming Oneiric Ocelot 11.10 release.
cutadapt -- code.google.com/p/cutadapt/
may be made obsolete by fastx toolkit
Other useful resources, not necessarily singularly focused on sequencing data:
Bioinformatics manual at UC Riverside -- manuals.bioinformatics.ucr.edu/
Includes lots of good stuff on R and HTS data analysis
Statistical Computing pages at UCLA -- www.ats.ucla.edu/stat/
Provides very useful, well-organized examples for different stats packages.
Thursday, August 4, 2011
Bio::SeqIO very very slow
While it may be a convenient and flexible solution when reading in tens, hundreds, or thousands of sequences, the SeqIO module of BioPerl is just not a workable solution for reading in next-generation sequencing files.
Consider reading in 100,000 short reads from a fastq file:
It takes 30 seconds to do it the bioperl way:
use Bio::SeqIO;
my %seqhash;
my $seqin = Bio::SeqIO->new(-format => "fastq", -file => $infilename);
while (my $seq = $seqin->next_seq()) { $seqhash{ $seq->seq() }++; }
But only half a second to do it the home-made way (using the basic File::Util module):
use File::Util;
my %seqhash;
my ($counter, $trigger) = (0, 0);
my $f = File::Util->new();
my $ifh = $f->open_handle('file' => $infilename, 'mode' => 'read');
while (<$ifh>)
{
    if ($_ =~ /^@/)    # header line (caveat: quality lines can also start with '@')
    {
        $counter++;
        $trigger = 1;
    }
    elsif ($trigger == 1)    # the line right after a header is the sequence
    {
        $_ =~ /(\w+)/;
        my $seq = $1;
        $seqhash{$seq}++;
        $trigger = 0;
    }
}
Not as elegant, but it works much faster!!
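One caveat with triggering on '@': fastq quality strings can themselves begin with '@', which can throw the trigger logic off on some files. A safer sketch (assuming an unwrapped fastq, four lines per record, and a hypothetical sample file written inline so the example is self-contained) is to read records four lines at a time:

```perl
use strict;
use warnings;

# Write a tiny sample fastq so the example is self-contained.
# Note the quality line of read1 deliberately starts with '@'.
my $infilename = "sample.fastq";
open(my $ofh, '>', $infilename) or die "Cannot write $infilename: $!";
print $ofh "\@read1\nACGTACGT\n+\n\@\@\@\@\@\@\@\@\n";
print $ofh "\@read2\nACGTACGT\n+\nIIIIIIII\n";
close($ofh);

# Read four lines per record: header, sequence, '+', quality.
my %seqhash;
open(my $ifh, '<', $infilename) or die "Cannot open $infilename: $!";
while (my $header = <$ifh>) {
    my $seq  = <$ifh>;    # sequence line
    my $plus = <$ifh>;    # '+' separator line
    my $qual = <$ifh>;    # quality line -- may itself begin with '@'
    last unless defined $qual;    # stop on a truncated final record
    chomp $seq;
    $seqhash{$seq}++;
}
close($ifh);

print "$_\t$seqhash{$_}\n" for keys %seqhash;    # prints: ACGTACGT	2
```

Since the four-line structure is fixed, no state flag is needed and a quality line can never be mistaken for a header.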