bio-gff3
GFF3 plugin for BioRuby, aimed at parsing big data
Features:
# Take GFF (genome browser) information and digest mRNA and CDS sequences
# Options for low memory use and caching of records
# Support for external FASTA files
You can use this plugin in two ways: as a standalone program, or as a plugin library for BioRuby.
For example, fetch mRNA and CDS information from GFF3 files and output to FASTA:
./bin/gff3-fetch mRNA test/data/gff/test.gff3
./bin/gff3-fetch CDS test/data/gff/test.gff3
Or clone this repository, add the ‘lib’ directory to the Ruby search path, and
require 'bio/db/gff/gffdb'
You can also run the RSpec tests with something like the following (the include path assumes a BioRuby checkout in ../bioruby)
rspec -I ../bioruby/lib/ spec/*.rb
This implementation builds on BioRuby’s basic GFF3 parser. The possible advantage of the plugin is that it is faster and does not load everything into memory. The GFF3 specs are based on the output of the WormBase genome browser.
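To illustrate what reading GFF3 without holding the whole file in memory can look like, here is a minimal sketch in plain Ruby. It is not the plugin's code: `GffRecord`, `parse_attributes` and `each_gff3_record` are hypothetical names chosen for this example, and the sketch assumes well-formed nine-column, tab-separated lines.

```ruby
require 'stringio'

# Hypothetical record type for this sketch; not part of bio-gff3 or BioRuby.
GffRecord = Struct.new(:seqid, :source, :type, :start, :stop,
                       :score, :strand, :phase, :attributes)

# Turn "ID=cds1;Parent=mrna1" into { "ID" => "cds1", "Parent" => "mrna1" }.
def parse_attributes(field)
  field.split(';').each_with_object({}) do |pair, attrs|
    key, value = pair.split('=', 2)
    attrs[key] = value if key && value
  end
end

# Yield one record per feature line, so only the current line is in memory.
def each_gff3_record(io)
  return enum_for(:each_gff3_record, io) unless block_given?

  io.each_line do |line|
    line = line.chomp
    break if line.start_with?('##FASTA')          # sequence section follows
    next if line.empty? || line.start_with?('#')  # skip comments and directives
    cols = line.split("\t")
    next unless cols.size == 9                    # ignore malformed lines
    yield GffRecord.new(cols[0], cols[1], cols[2], cols[3].to_i, cols[4].to_i,
                        cols[5], cols[6], cols[7], parse_attributes(cols[8]))
  end
end

# Usage with an in-memory sample; a File object works the same way.
sample = StringIO.new("##gff-version 3\n" \
                      "ctg1\t.\tCDS\t1\t9\t.\t+\t0\tID=cds1;Parent=mrna1\n")
each_gff3_record(sample) do |rec|
  puts "#{rec.type} #{rec.start}..#{rec.stop} #{rec.attributes['ID']}"
end
```

Because the method yields records as it reads, memory use stays roughly constant regardless of file size, as long as the caller does not collect the records.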
For a write-up see thebird.nl/bioruby/BioRuby_GFF3.html
Fetch and assemble mRNA or CDS sequences, and print them in FASTA format.
gff3-fetch [--no-cache] mRNA|CDS [filename.fa] filename.gff
Where:
--no-cache : do not load everything in memory (slower)
mRNA : assemble mRNA
CDS : assemble CDS
Multiple GFF3 files can be used. When external FASTA files are given, each GFF
file uses the last FASTA file listed before it.
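The pairing rule for FASTA and GFF files can be sketched as follows. This is an illustration only, not the code in gff3-fetch: `pair_inputs` and `FASTA_EXTENSIONS` are hypothetical names, and the sketch assumes that FASTA files are recognised by their extension and that a FASTA file stays in effect for every GFF file that follows until another FASTA file is named.

```ruby
# Hypothetical list of extensions treated as FASTA in this sketch.
FASTA_EXTENSIONS = %w[.fa .fas .fasta].freeze

# Return [fasta_or_nil, gff] pairs; nil means the GFF file is expected
# to carry its own sequence data.
def pair_inputs(filenames)
  pairs = []
  current_fasta = nil
  filenames.each do |name|
    if FASTA_EXTENSIONS.include?(File.extname(name).downcase)
      current_fasta = name            # remember the most recent FASTA file
    else
      pairs << [current_fasta, name]  # any other file is treated as GFF
    end
  end
  pairs
end

pair_inputs(%w[a.fa b.fa x.gff3 y.gff3]).each do |fasta, gff|
  puts "#{gff} <- #{fasta || '(embedded sequence)'}"
end
```

With the sample arguments, both GFF files are paired with `b.fa`, because `b.fa` is the last FASTA file named before each of them.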
Examples:
Find mRNA and CDS information from test.gff3 (which includes sequence information)
gff3-fetch mRNA test/data/gff/test.gff3
gff3-fetch CDS test/data/gff/test.gff3
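As a rough picture of what assembling a CDS from a GFF3 file with embedded sequence involves, the sketch below collects CDS features by their Parent attribute, orders them by start position, and joins the matching subsequences. It is not the bio-gff3 implementation: all method names are hypothetical, and it simplifies by ignoring phase, assuming a single Parent per feature, and assuming all parts of one CDS share a strand.

```ruby
# Split GFF3 text into feature hashes and a { seqid => sequence } table
# read from the ##FASTA section.
def read_gff3_with_fasta(text)
  features = []
  seqs = Hash.new { |h, k| h[k] = String.new }
  in_fasta = false
  current = nil
  text.each_line do |line|
    line = line.chomp
    if line.start_with?('##FASTA')
      in_fasta = true
    elsif in_fasta
      if line.start_with?('>')
        current = line[1..-1].split.first  # sequence ID up to first whitespace
      elsif current
        seqs[current] << line.strip
      end
    elsif !line.empty? && !line.start_with?('#')
      cols = line.split("\t")
      next unless cols.size == 9
      features << { seqid: cols[0], type: cols[2], start: cols[3].to_i,
                    stop: cols[4].to_i, strand: cols[6],
                    parent: cols[8][/Parent=([^;]+)/, 1] }
    end
  end
  [features, seqs]
end

def reverse_complement(seq)
  seq.reverse.tr('ACGTacgt', 'TGCAtgca')
end

# Return [parent_id, sequence] pairs, one per assembled CDS.
def assemble_cds(features, seqs)
  cds = features.select { |f| f[:type] == 'CDS' && f[:parent] }
  cds.group_by { |f| f[:parent] }.map do |parent, parts|
    parts = parts.sort_by { |f| f[:start] }
    # GFF3 coordinates are 1-based and inclusive.
    joined = parts.map { |f| seqs[f[:seqid]][(f[:start] - 1)..(f[:stop] - 1)] }.join
    joined = reverse_complement(joined) if parts.first[:strand] == '-'
    [parent, joined]
  end
end

sample = [
  '##gff-version 3',
  ['ctg1', '.', 'CDS', '10', '18', '.', '+', '0', 'ID=c2;Parent=m1'].join("\t"),
  ['ctg1', '.', 'CDS', '1', '6', '.', '+', '0', 'ID=c1;Parent=m1'].join("\t"),
  '##FASTA',
  '>ctg1',
  'ATGAAACCC',
  'GGGTTTTAA'
].join("\n")

features, seqs = read_gff3_with_fasta(sample)
assemble_cds(features, seqs).each { |id, seq| puts ">#{id}\n#{seq}" }
# prints:
# >m1
# ATGAAAGGGTTTTAA
```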
Find CDS from external FASTA file
gff3-fetch CDS test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3
Find mRNA from external FASTA file, without loading everything in RAM
gff3-fetch --no-cache mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
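One general way to trade speed for memory is to keep only the byte offset of each record and re-read the line from disk when it is needed. The sketch below shows that idea in plain Ruby. Whether --no-cache is implemented this way is an assumption made for illustration; `LineIndex` is a hypothetical class, not part of the plugin.

```ruby
# Hypothetical sketch: index feature lines by byte offset instead of caching them.
class LineIndex
  def initialize(path)
    @path = path
    @offsets = {} # feature ID => byte offset of the line that defines it
    File.open(path, 'rb') do |f|
      until f.eof?
        pos = f.pos
        line = f.readline
        break if line.start_with?('##FASTA') # stop before the sequence section
        next if line.start_with?('#')        # skip comments and directives
        id = line[/\bID=([^;\s]+)/, 1]
        @offsets[id] = pos if id
      end
    end
  end

  def ids
    @offsets.keys
  end

  # Re-read a single line from disk; slower than a cache, but uses little memory.
  def fetch(id)
    pos = @offsets[id]
    return nil unless pos

    File.open(@path, 'rb') do |f|
      f.seek(pos)
      f.readline.chomp
    end
  end
end

# Usage, with a path from this repository's test data:
#   index = LineIndex.new('test/data/gff/test-ext-fasta.gff3')
#   puts index.fetch(index.ids.first)
```

Only the ID-to-offset table stays in memory, so the cost per record is a few bytes rather than a full parsed line, at the price of a seek and a read on every lookup.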
If you use this software, please cite http://dx.doi.org/10.1093/bioinformatics/btq475
Copyright
Copyright © 2010, 2011 Pjotr Prins