Class: FastaFile
- Inherits: File (Object, File, FastaFile)
- Defined in: lib/parse_fasta/fasta_file.rb
Overview
Provides a simple interface for parsing FASTA-format files. Gzipped files are read transparently.
Class Method Summary
- .open(fname, *args) ⇒ FastaFile
  Use it like IO::open.
Instance Method Summary
- #each_record(separate_lines = nil) {|header, sequence| ... } ⇒ Object
  Analogous to IO#each_line, #each_record iterates over a FASTA file record by record.
- #each_record_fast {|header, sequence| ... } ⇒ Object
  Fast version of #each_record.
- #to_hash ⇒ Hash
  Returns the records in the FASTA file as a hash map with the headers as keys and the Sequences as values.
Class Method Details
.open(fname, *args) ⇒ FastaFile
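Use it like IO::open. Before opening, the file is checked (as gzip first, then as plain text), and ParseFasta::DataFormatError is raised unless its first character is '>'.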
    # File 'lib/parse_fasta/fasta_file.rb', line 30
    def self.open(fname, *args)
      begin
        handle = Zlib::GzipReader.open(fname)
      rescue Zlib::GzipFile::Error => e
        handle = File.open(fname)
      end

      unless handle.each_char.peek[0] == '>'
        raise ParseFasta::DataFormatError
      end

      handle.close

      super
    end
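The gzip-then-plain fallback used by .open can be illustrated outside the library. The following is a minimal sketch using only the Ruby standard library; the helper names `open_maybe_gzipped` and `looks_like_fasta?` are hypothetical and are not part of parse_fasta.

```ruby
require 'zlib'
require 'tempfile'

# Hypothetical helper (not part of parse_fasta): returns a readable handle
# for either a gzipped or a plain-text file.
def open_maybe_gzipped(path)
  Zlib::GzipReader.open(path)
rescue Zlib::GzipFile::Error
  # The file is not gzip data, so fall back to a plain file handle.
  File.open(path)
end

# Hypothetical helper: true when the first character of the file is '>'.
def looks_like_fasta?(path)
  handle = open_maybe_gzipped(path)
  handle.read(1) == '>'
ensure
  handle.close if handle
end

# Build one plain and one gzipped demo file with the same content.
plain = Tempfile.new(['demo', '.fa'])
plain.write(">seq1\nACTG\n")
plain.close

gzipped = Tempfile.new(['demo', '.fa.gz'])
gzipped.close
Zlib::GzipWriter.open(gzipped.path) { |gz| gz.write(">seq1\nACTG\n") }

puts looks_like_fasta?(plain.path)
puts looks_like_fasta?(gzipped.path)
```

Both calls should report a FASTA-looking file, since the check reads the decompressed content when the file is gzipped.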
Instance Method Details
#each_record(separate_lines = nil) {|header, sequence| ... } ⇒ Object
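Analogous to IO#each_line, #each_record iterates over a FASTA file record by record. Gzipped files are accepted as well. By default the sequence is yielded wrapped in a Sequence; when separate_lines is truthy, the value returned by parse_line_separately is yielded unchanged.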
    # File 'lib/parse_fasta/fasta_file.rb', line 95
    def each_record(separate_lines=nil)
      begin
        f = Zlib::GzipReader.open(self)
      rescue Zlib::GzipFile::Error => e
        f = self
      end

      if separate_lines
        f.each("\n>") do |line|
          header, sequence = parse_line_separately(line)
          yield(header.strip, sequence)
        end

        # f.each_with_index(">") do |line, idx|
        #   if idx.zero?
        #     if line != ">"
        #       raise ParseFasta::DataFormatError
        #     end
        #   else
        #     header, sequence = parse_line_separately(line)
        #     yield(header.strip, sequence)
        #   end
        # end
      else
        f.each("\n>") do |line|
          header, sequence = parse_line(line)
          yield(header.strip, Sequence.new(sequence || ""))
        end

        # f.each_with_index(sep=/^>/) do |line, idx|
        #   if idx.zero?
        #     if line != ">"
        #       raise ParseFasta::DataFormatError
        #     end
        #   else
        #     header, sequence = parse_line(line)
        #     yield(header.strip, Sequence.new(sequence || ""))
        #   end
        # end
      end

      f.close if f.instance_of?(Zlib::GzipReader)
      return f
    end
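The listing reads the file in chunks separated by "\n>", so each chunk holds one record. The sketch below shows that chunking on an in-memory string. The private helper parse_line is not shown on this page, so `split_record` is a hypothetical stand-in whose behavior (dropping the separator and removing whitespace from the sequence) is an assumption, not the library's code.

```ruby
require 'stringio'

# Hypothetical stand-in for the library's private parsing helper.
def split_record(chunk)
  # Drop the trailing "\n>" separator and the leading '>' of the first record.
  body = chunk.chomp("\n>").sub(/\A>/, '')
  header, *seq_lines = body.split("\n")
  # Assumption: sequence lines are joined and whitespace is removed.
  [header.strip, seq_lines.join.gsub(/\s+/, '')]
end

# Hypothetical iterator: yields [header, sequence] for each chunk of the IO.
def each_fasta_record(io)
  io.each("\n>") do |chunk|
    yield(*split_record(chunk))
  end
end

data = ">seq1 first record\nAC\nTG\n>seq2\nGG\n"
each_fasta_record(StringIO.new(data)) do |header, sequence|
  puts "#{header}: #{sequence}"
end
```

Splitting on "\n>" rather than on ">" alone means a '>' that appears in the middle of a header line does not start a new record.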
#each_record_fast {|header, sequence| ... } ⇒ Object
Fast version of #each_record. Yields the sequence as a String, not a Sequence, and has no separate-lines option. Raises ParseFasta::SequenceFormatError if a sequence contains '>'.
Note: If the FASTA file has spaces in the sequence, they will be retained. If this is a problem, use #each_record instead.
    # File 'lib/parse_fasta/fasta_file.rb', line 157
    def each_record_fast
      begin
        f = Zlib::GzipReader.open(self)
      rescue Zlib::GzipFile::Error => e
        f = self
      end

      f.each("\n>") do |line|
        header, sequence = parse_line(line)

        raise ParseFasta::SequenceFormatError if sequence.include? ">"

        yield(header.strip, sequence)
      end

      f.close if f.instance_of?(Zlib::GzipReader)
      return f
    end
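To make the trade-off concrete, here is a small sketch contrasting a join that keeps spaces with one that removes them. Both helpers are hypothetical illustrations, not the library's implementation, and `ArgumentError` stands in for ParseFasta::SequenceFormatError so the snippet runs without the gem.

```ruby
# Hypothetical "fast" join: concatenates sequence lines as-is, so any
# spaces inside the lines are retained.
def join_fast(seq_lines)
  seq = seq_lines.map(&:chomp).join
  # Mirrors the '>' check in the listing; ArgumentError is a stand-in.
  raise ArgumentError, "unexpected '>' inside sequence" if seq.include?('>')
  seq
end

# Hypothetical "clean" join: same as above, then strips all whitespace.
def join_clean(seq_lines)
  join_fast(seq_lines).gsub(/\s+/, '')
end

lines = ["AC TG\n", "GG  A\n"]
puts join_fast(lines).inspect
puts join_clean(lines).inspect
```

If spaces are expected in the input but the faster path is still preferred, the caller can strip them from each yielded String, at the cost of some of the speed advantage.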
#to_hash ⇒ Hash
Returns the records in the fasta file as a hash map with the headers as keys and the Sequences as values.
    # File 'lib/parse_fasta/fasta_file.rb', line 56
    def to_hash
      hash = {}
      self.each_record do |head, seq|
        hash[head] = seq
      end

      hash
    end
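Because the listing assigns `hash[head] = seq` for every record, a later record with the same header replaces an earlier one. The sketch below shows that effect on a hypothetical list of header/sequence pairs standing in for what #each_record yields; `records_to_hash` and `duplicate_headers` are illustrative names, not library methods.

```ruby
# Hypothetical helper: builds a header => sequence hash the same way the
# listing does, so later duplicates overwrite earlier entries.
def records_to_hash(records)
  records.each_with_object({}) do |(header, sequence), hash|
    hash[header] = sequence
  end
end

# Hypothetical helper: lists headers that occur more than once.
def duplicate_headers(records)
  records.group_by(&:first).select { |_, group| group.size > 1 }.keys
end

# Made-up example data, not taken from any real file.
records = [['seq1', 'ACTG'], ['seq2', 'GG'], ['seq1', 'TTTT']]

puts records_to_hash(records).inspect
puts duplicate_headers(records).inspect
```

Checking for duplicate headers before relying on the hash is one way to avoid silently losing records.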