Class: SeqFile

Inherits:
File
  • Object
Defined in:
lib/parse_fasta/seq_file.rb

Overview

Provides a class that parses either fastA or fastQ files, depending on what the user provides. It also handles gzipped files.

Instance Method Details

#each_record {|header, sequence| ... } ⇒ Object

Analogous to IO#each_line, #each_record goes through a fastA or fastQ file record by record.

This #each_record is used in much the same way as FastaFile#each_record, except that it yields the header and the sequence regardless of whether the input is a fastA file or a fastQ file.

If the input is a fastQ file, this method yields only the header and the sequence, ignoring the description and the quality string. Use the SeqFile class only when your program needs to work on either fastA or fastQ files; it treats both file types as if they were fastA files.

If you need the description or quality, you should use FastqFile#each_record instead.
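The format-independent behavior described above can be illustrated with a small standalone sketch. It does not use parse_fasta and is not the gem's implementation: the method name each_header_and_sequence is hypothetical, it yields plain Strings rather than Sequence objects, it reads the whole input into memory, and it assumes every fastQ record spans exactly four lines.

```ruby
require 'stringio'

# Hypothetical stand-in for the behavior of SeqFile#each_record.
# Assumption: fastQ records are exactly four lines (no wrapped sequences).
def each_header_and_sequence(io)
  lines = io.each_line.map(&:chomp)
  return if lines.empty?

  case lines.first[0]
  when '>'
    # fastA: a header line followed by one or more sequence lines.
    header = nil
    sequence = +''
    lines.each do |line|
      if line.start_with?('>')
        yield(header, sequence) if header
        header = line[1..-1]
        sequence = +''
      else
        sequence << line
      end
    end
    yield(header, sequence) if header
  when '@'
    # fastQ: header, sequence, description, quality. Only the first
    # two are passed on, so both formats yield the same pair.
    lines.each_slice(4) do |head, seq, _desc, _qual|
      yield(head[1..-1], seq)
    end
  else
    raise ArgumentError, 'Input does not look like FASTA or FASTQ'
  end
end

each_header_and_sequence(StringIO.new("@r1\nACGT\n+\nIIII\n")) do |head, seq|
  puts [head, seq.length].join("\t")
end
```

Slicing the fastQ input into groups of four lines, rather than looking for lines that start with '@', avoids mistaking a quality string that begins with '@' for a new header.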

Examples:

Parse a gzipped fastA file

SeqFile.open('reads.fa.gz').each_record do |head, seq|
  puts [head, seq.length].join "\t"
end

Parse an uncompressed fastQ file

SeqFile.open('reads.fq').each_record do |head, seq|
  puts [head, seq.length].join "\t"
end

Yield Parameters:

  • header (String)

    The header of the record without the leading ‘>’ or ‘@’

  • sequence (Sequence)

    The sequence of the record.



# File 'lib/parse_fasta/seq_file.rb', line 76

def each_record
  first_char = get_first_char(self)

  if first_char == '>'
    FastaFile.open(self).each_record do |header, sequence|
      yield(header, sequence)
    end
  elsif first_char == '@'
    FastqFile.open(self).each_record do |head, seq, desc, qual|
      yield(head, seq)
    end
  else
    raise ArgumentError, "Input does not look like FASTA or FASTQ"
  end
end
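The get_first_char helper is not shown on this page. As a rough illustration of how first-character sniffing could cope with gzipped input, here is a standalone sketch. The method name sniff_format is hypothetical and is not part of parse_fasta; it assumes a rewindable IO-like object and detects gzip by its two-byte magic number (0x1f 0x8b).

```ruby
require 'stringio'
require 'zlib'

# Hypothetical helper, not part of parse_fasta: returns :fasta or :fastq
# based on the first character of a (possibly gzipped) rewindable input.
def sniff_format(io)
  magic = io.read(2)
  io.rewind

  first_char =
    if magic&.bytes == [0x1f, 0x8b]
      # Gzip magic number found: decompress just enough to see one character.
      gz = Zlib::GzipReader.new(io)
      char = gz.read(1)
      gz.finish # releases the reader without closing the underlying io
      char
    else
      io.read(1)
    end
  io.rewind # leave the input positioned at the start for the real parser

  case first_char
  when '>' then :fasta
  when '@' then :fastq
  else raise ArgumentError, 'Input does not look like FASTA or FASTQ'
  end
end

puts sniff_format(StringIO.new(">seq1\nACGT\n"))
puts sniff_format(StringIO.new(Zlib.gzip("@read1\nACGT\n+\nIIII\n")))
```

Checking the magic number instead of the file extension means a misnamed file is still handled, at the cost of requiring an input that can be rewound.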

#to_hash ⇒ Hash

Returns the records in the sequence file as a hash with the headers as keys and the Sequence objects as values. For a fastQ file, this acts the same as FastaFile#to_hash.

Examples:

Read a fastA file into a hash.

seqs = SeqFile.open('reads.fa').to_hash

Returns:

  • (Hash)

    A hash with headers as keys, sequences as the values (Sequence objects)



# File 'lib/parse_fasta/seq_file.rb', line 32

def to_hash
  first_char = get_first_char(self)

  if first_char == '>'
    FastaFile.open(self).to_hash
  elsif first_char == '@'
    FastqFile.open(self).to_hash
  else
    raise ArgumentError, "Input does not look like FASTA or FASTQ"
  end
end