Wednesday, October 26, 2011

BWA on FPGA architecture

Convey Computing is offering an FPGA-based computational cluster that has been optimized for BWA.

Monday, October 3, 2011

Length filter for fastq files

I wanted to remove all sequences shorter than n bases from a fastq file. This is a solution that uses sed "paragraphs" (see my earlier post). It could probably be done more efficiently, but it works for me, for now.

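Save the script below as fastqlenfilt.sh, make it executable, and then use it in this manner (the first argument is the minimum length, the second is the input file):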

./fastqlenfilt.sh 15 input.fastq >output.filtered.fastq

#!/bin/sh
sed -n '
 # thanks to http://www.grymoire.com/Unix/Sed.html
 #
 # if matching description, check the paragraph
 /^@/ b para
 # else add it to the hold buffer
 H
 # at end of file, check paragraph
 $ b para
 # now branch to end of script
 b
 # this is where a paragraph is checked for the pattern
 :para
 # return the entire paragraph
 # into the pattern space
 x
 # look for the pattern, if there - print
 # (to accept any character instead, use: /.*\n.\{N,\}/ p)
 /.*\n[ACGTURYKMSWBDHVNX]\{'"$1"',\}/ p
' $2
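
One caveat with splitting records on /^@/: in Phred+33 encoded files a quality line can itself begin with @ (quality score 31), and the sed approach would then cut that record short. As a rough alternative sketch, not a replacement tested on real data, the same filter can count lines instead. It assumes strict 4-line fastq records (no wrapped sequences), and the function name fastq_len_filter is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: keep fastq records whose sequence has at least
# MINLEN characters. Assumes every record is exactly 4 lines:
# header, sequence, "+" separator, quality.
fastq_len_filter() {
    minlen=$1
    shift
    awk -v min="$minlen" '
        NR % 4 == 1 { header = $0 }   # line 1 of a record: @description
        NR % 4 == 2 { seq = $0 }      # line 2: the sequence itself
        NR % 4 == 3 { plus = $0 }     # line 3: the + separator
        NR % 4 == 0 {                 # line 4: quality, record is complete
            if (length(seq) >= min)
                print header "\n" seq "\n" plus "\n" $0
        }
    ' "$@"
}
```

After sourcing or pasting the function, usage mirrors the sed script: fastq_len_filter 15 input.fastq >output.filtered.fastq. With no file argument it reads from standard input. Unlike the sed version it does not check the sequence alphabet, only the length.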