Module: Mspire::Fasta
Defined in: lib/mspire/fasta.rb
Overview
A convenience module for working with FASTA-formatted sequence databases. The file that defines this module also includes Enumerable in Bio::FlatFile, so you can do things like this:
accessions = Mspire::Fasta.open("file.fasta") do |fasta|
  fasta.map(&:accession)
end
A few aliases are added to Bio::FastaFormat:
entry.header == entry.definition
entry.sequence == entry.seq
Mspire::Fasta.new accepts either an IO object or a String (the FASTA-formatted data itself, not a filename):
# taking an io object:
File.open("file.fasta") do |io|
  fasta = Mspire::Fasta.new(io)
  # ... do something with it
end
# taking a string
string = ">id1 a simple header\nAAASDDEEEDDD\n>id2 header again\nPPPPPPWWWWWWTTTTYY\n"
fasta = Mspire::Fasta.new(string)
(simple, not_simple) = fasta.partition {|entry| entry.header =~ /simple/ }
Class Method Summary
- .foreach(file, &block) ⇒ Object
  yields each Bio::FastaFormat object in turn.
- .new(io) ⇒ Object
  takes an IO object or a string that is the fasta data itself.
- .open(file, &block) ⇒ Object
  opens the flatfile and yields a Bio::FlatFile object.
- .uniprot_id(header) ⇒ Object
  takes the header string and returns the uniprot id.
Class Method Details
.foreach(file, &block) ⇒ Object
yields each Bio::FastaFormat object in turn
# File 'lib/mspire/fasta.rb', line 48
def self.foreach(file, &block)
  block or return enum_for(__method__, file)
  Bio::FlatFile.open(Bio::FastaFormat, file) do |fasta|
    fasta.each(&block)
  end
end
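The `block or return enum_for(__method__, file)` line means that calling .foreach without a block returns an Enumerator instead of raising. The following is a minimal sketch of that idiom in plain Ruby, with no dependency on Mspire or BioRuby. `FastaSketch` and `each_record` are hypothetical names invented for illustration, and the tiny parser is a stand-in for Bio::FlatFile, not a description of how it works.

```ruby
require "stringio"

# Hypothetical stand-in for illustration only; not part of Mspire or BioRuby.
module FastaSketch
  # Yields [header, sequence] pairs parsed from an IO of FASTA text.
  # Without a block, returns an Enumerator over the same pairs.
  def self.each_record(io, &block)
    block or return enum_for(__method__, io)

    header = nil
    sequence = String.new
    io.each_line do |line|
      line = line.chomp
      if line.start_with?(">")
        # A new header closes the previous record, if there was one.
        block.call([header, sequence]) if header
        header = line[1..-1]
        sequence = String.new
      else
        sequence << line
      end
    end
    # Emit the final record once the input is exhausted.
    block.call([header, sequence]) if header
  end
end

fasta_text = ">id1 a simple header\nAAASDDEEEDDD\n>id2 header again\nPPPPPPWWWWWWTTTTYY\n"

# Block form: records are yielded one at a time.
FastaSketch.each_record(StringIO.new(fasta_text)) do |header, sequence|
  puts "#{header}: #{sequence.length} residues"
end

# Blockless form: an Enumerator, so Enumerable methods chain directly.
headers = FastaSketch.each_record(StringIO.new(fasta_text)).map(&:first)
```

With the real method, the blockless form should allow the same kind of chaining, for example calling `map` or `select` on the result of `Mspire::Fasta.foreach("file.fasta")`.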
.new(io) ⇒ Object
takes an IO object or a string that is the fasta data itself
# File 'lib/mspire/fasta.rb', line 56
def self.new(io)
  io = StringIO.new(io) if io.is_a?(String)
  Bio::FlatFile.new(Bio::FastaFormat, io)
end
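Because any String is wrapped in a StringIO, .new always treats a String as FASTA data and never as a path; use .open or .foreach for filenames. The sketch below isolates that normalization step in plain Ruby. The helper name `to_io` is hypothetical and exists only for this illustration.

```ruby
require "stringio"

# Hypothetical helper mirroring the first line of .new:
# Strings become in-memory IO objects, anything else passes through untouched.
def to_io(source)
  source.is_a?(String) ? StringIO.new(source) : source
end

from_string = to_io(">id1 a simple header\nAAASDDEEEDDD\n")
first_line = from_string.gets   # reads from the string as if it were a file

existing_io = StringIO.new(">id2 header again\nPPPPPPWWWWWWTTTTYY\n")
passed_through = to_io(existing_io)  # same object, not a copy
```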
.open(file, &block) ⇒ Object
opens the flatfile and yields a Bio::FlatFile object
# File 'lib/mspire/fasta.rb', line 43
def self.open(file, &block)
  Bio::FlatFile.open(Bio::FastaFormat, file, &block)
end
.uniprot_id(header) ⇒ Object
takes the header string and returns the uniprot id
'sp|Q04917|1433F_HUMAN' #=> 'Q04917'
This can also be found with Bio::FastaFormat#accession (but it may be much slower).
# File 'lib/mspire/fasta.rb', line 66
def self.uniprot_id(header)
  header[/^[^\|]+\|([^\|]+)\|/, 1]
end
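The pattern captures the text between the first and second pipe characters, so it returns nil for headers that do not contain at least two pipes. The following standalone sketch applies the same regular expression outside the module. `UNIPROT_ID` and `uniprot_id_sketch` are hypothetical names used only here, and the second and third headers are made-up inputs chosen to show the nil case.

```ruby
# Same pattern as in the method body, held in a constant for reuse.
UNIPROT_ID = /^[^\|]+\|([^\|]+)\|/

# Hypothetical standalone copy of the lookup, for illustration only.
def uniprot_id_sketch(header)
  header[UNIPROT_ID, 1]
end

with_id     = uniprot_id_sketch("sp|Q04917|1433F_HUMAN")  # => "Q04917"
no_pipes    = uniprot_id_sketch("id1 a simple header")    # => nil
single_pipe = uniprot_id_sketch("sp|Q04917")              # => nil (no closing pipe)
```

Callers that process mixed databases may want to handle the nil result explicitly rather than assume every header follows the UniProt layout.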