Class: FastaFile

Inherits:
File
  • Object
Defined in:
lib/parse_fasta/fasta_file.rb

Overview

Provides a simple interface for parsing fasta format files. Gzipped files are read transparently.

Class Method Details

.open(fname, *args) ⇒ FastaFile

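Use it like IO::open. Before the file is opened, its first character is checked (whether the file is gzipped or plain); if that character is not '>', ParseFasta::DataFormatError is raised.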

Parameters:

  • fname (String)

    the name of the file to open

Returns:

  • (FastaFile)

# File 'lib/parse_fasta/fasta_file.rb', line 30

def self.open(fname, *args)
  begin
    handle = Zlib::GzipReader.open(fname)
  rescue Zlib::GzipFile::Error => e
    handle = File.open(fname)
  end

  unless handle.each_char.peek[0] == '>'
    raise ParseFasta::DataFormatError
  end

  handle.close

  super
end
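
The check performed by .open can be sketched with the standard library alone. The helper name looks_like_fasta? below is hypothetical and not part of parse_fasta; it only illustrates the same pattern of trying gzip first, falling back to a plain read, and then inspecting the first character. Unlike the listing above, this sketch returns a boolean rather than raising, and it closes the handle in an ensure clause.

```ruby
require 'zlib'
require 'tempfile'

# Hypothetical helper (not part of parse_fasta): returns true when the
# first character of the file, gzipped or plain, is '>'.
def looks_like_fasta?(fname)
  handle = begin
    Zlib::GzipReader.open(fname)
  rescue Zlib::GzipFile::Error
    # Not gzip data, so fall back to reading the file as plain text.
    File.open(fname)
  end

  begin
    # getc returns nil for an empty file, which compares unequal to '>'.
    handle.getc == '>'
  ensure
    handle.close
  end
end

# Usage: write a small plain-text record to a temporary file and check it.
Tempfile.create('demo') do |f|
  f.write(">sequence_1\nAACTG\n")
  f.flush
  puts looks_like_fasta?(f.path)
end
```

Closing the handle in ensure is a design choice of this sketch: the handle is released whether or not the first-character check passes.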

Instance Method Details

#each_record(separate_lines = nil) {|header, sequence| ... } ⇒ Object

Analogous to IO#each_line, #each_record iterates over a fasta file record by record. It accepts gzipped files as well.

Examples:

Parsing a fasta file (default behavior, gzip files are fine)

FastaFile.open('reads.fna.gz').each_record do |header, sequence|
  puts [header, sequence.gc].join("\t")
end

Parsing a fasta file (with a truthy separate_lines argument)

FastaFile.open('reads.fna').each_record(1) do |header, sequence|
  # header => 'sequence_1'
  # sequence => ['AACTG', 'AGTCGT', ... ]
end

Parameters:

  • separate_lines (Object) (defaults to: nil)

If truthy, yield the sequence of each record as an array of Sequence objects, one per line. If falsy, yield a single Sequence object for the whole record.

Yields:

  • The header and sequence for each record in the fasta file to the block

Yield Parameters:

  • header (String)

    The header of the fasta record without the leading ‘>’

  • sequence (Sequence, Array<Sequence>)

The sequence of the fasta record. If separate_lines is falsy (the default behavior), this is a Sequence; if truthy, it is an Array<Sequence>.



# File 'lib/parse_fasta/fasta_file.rb', line 91

def each_record(separate_lines=nil)
  begin
    f = Zlib::GzipReader.open(self)
  rescue Zlib::GzipFile::Error => e
    f = self
  end

  if separate_lines
    f.each("\n>") do |line|
      header, sequence = parse_line_separately(line)
      yield(header.strip, sequence)
    end
  else
    f.each("\n>") do |line|
      header, sequence = parse_line(line)
      yield(header.strip, Sequence.new(sequence || ""))
    end
  end

  f.close if f.instance_of?(Zlib::GzipReader)
  return f
end
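
The listing above reads the input in chunks separated by "\n>" and hands each chunk to parse_line or parse_line_separately, whose bodies are not shown here. The following is a minimal sketch of that splitting idea, assuming a hypothetical each_fasta_record helper that works on any IO-like object and yields plain Strings instead of Sequence objects. It is not the library's implementation.

```ruby
require 'stringio'

# Hypothetical stand-in for the library's parsing helpers: reads the input
# in chunks separated by "\n>" and yields header and sequence for each record.
def each_fasta_record(io, separate_lines = nil)
  io.each("\n>") do |chunk|
    # Drop the trailing "\n>" separator and the leading '>' of the first record.
    header, *lines = chunk.chomp("\n>").sub(/\A>/, '').split("\n")
    next if header.nil? # skip empty chunks

    lines = lines.map(&:strip).reject(&:empty?)
    # Truthy separate_lines: one String per line; otherwise one joined String.
    yield header.strip, (separate_lines ? lines : lines.join)
  end
end

io = StringIO.new(">sequence_1\nAACTG\nAGTCGT\n>sequence_2\nGG\n")

each_fasta_record(io) do |header, sequence|
  puts [header, sequence].join("\t")
end

io.rewind
each_fasta_record(io, true) do |header, lines|
  puts "#{header}: #{lines.length} line(s)"
end
```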

#to_hash ⇒ Hash

Returns the records in the fasta file as a hash map with the headers as keys and the Sequences as values.

Examples:

Read a fasta file into a hash table.

seqs = FastaFile.open('reads.fa').to_hash

Returns:

  • (Hash)

    A hash with headers as keys, sequences as the values (Sequence objects)



# File 'lib/parse_fasta/fasta_file.rb', line 54

def to_hash
  hash = {}
  self.each_record do |head, seq|
    hash[head] = seq
  end

  hash
end
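
Because the listing above assigns hash[head] = seq for every record, a header that appears more than once keeps only its last sequence. The sketch below illustrates this with a hypothetical records_to_hash helper and hand-written header and sequence pairs standing in for what each_record would yield. Plain Strings are used in place of Sequence objects.

```ruby
# Hypothetical helper mirroring the accumulation loop in to_hash.
def records_to_hash(records)
  hash = {}
  records.each do |head, seq|
    hash[head] = seq # a repeated header overwrites the earlier entry
  end
  hash
end

# Stand-in for the [header, sequence] pairs that each_record would yield.
records = [
  ['sequence_1', 'AACTG'],
  ['sequence_2', 'GG'],
  ['sequence_1', 'TTTT']
]

seqs = records_to_hash(records)
puts seqs.length
puts seqs['sequence_1']
```

If duplicate headers matter for a given file, iterating with each_record and collecting sequences into arrays per header is one alternative to consider.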