Class: BioDSL::ReadFasta

Inherits:
Object
Defined in:
lib/BioDSL/commands/read_fasta.rb

Overview

Read FASTA entries from one or more files.

read_fasta reads sequence entries from FASTA files. Each entry consists of a header line, a ‘>’ followed by the sequence name, and then one or more lines of sequence that continue until the next entry or the end of the file. Each entry becomes a Biopiece record of the following form:

{:SEQ_NAME=>"test",
 :SEQ=>"AGCATCGACTAGCAGCATTT",
 :SEQ_LEN=>20}
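
As an illustration of how the format maps onto the record above, here is a minimal sketch of a FASTA parser. It is not BioDSL's implementation: parse_fasta is a hypothetical helper name, and the handling of blank lines and of text before the first header is an assumption.

```ruby
require 'stringio'

# Hypothetical helper, not part of BioDSL: parse FASTA text from any
# IO-like object into hashes shaped like the record shown above.
def parse_fasta(io)
  records = []
  name    = nil
  seq     = +''

  # Store the entry collected so far, if a header has been seen.
  flush = lambda do
    records << { SEQ_NAME: name, SEQ: seq, SEQ_LEN: seq.length } if name
  end

  io.each_line do |line|
    line = line.strip
    next if line.empty? # assumption: blank lines are ignored

    if line.start_with?('>')
      flush.call           # a new header closes the previous entry
      name = line[1..-1]   # header text without the leading '>'
      seq  = +''
    else
      seq << line          # sequence may span several lines
    end
  end

  flush.call               # end of input closes the last entry
  records
end

fasta   = ">test\nAGCATCGACT\nAGCAGCATTT\n>test2\nACGT\n"
records = parse_fasta(StringIO.new(fasta))
p records.first
```

The two sequence lines under ‘>test’ are joined into one :SEQ value, and :SEQ_LEN is derived from the joined string rather than read from the file.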

Input files may be compressed with gzip or bzip2.
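
One way transparent decompression can work is to inspect the first two bytes of the file. The sketch below covers gzip only, using Ruby's standard Zlib library; bzip2 is left out because the standard library has no bzip2 reader. open_maybe_gzip is a hypothetical name, and how BioDSL itself detects compression is not shown on this page.

```ruby
require 'zlib'
require 'tempfile'

# The two magic bytes that start every gzip stream.
GZIP_MAGIC = "\x1F\x8B".b

# Hypothetical helper, not part of BioDSL: yield an IO that reads the
# decompressed content when the file is gzipped, or the raw content otherwise.
def open_maybe_gzip(path)
  magic = File.binread(path, 2)

  if magic == GZIP_MAGIC
    Zlib::GzipReader.open(path) { |gz| yield gz }
  else
    File.open(path) { |io| yield io }
  end
end

# Demonstration with a temporary gzipped FASTA file.
Tempfile.create(['demo', '.fna.gz']) do |tmp|
  tmp.close
  Zlib::GzipWriter.open(tmp.path) { |gz| gz.write(">test\nACGT\n") }

  puts open_maybe_gzip(tmp.path) { |io| io.read }
end
```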

For more about the FASTA format:

en.wikipedia.org/wiki/Fasta_format

Usage

read_fasta(input: <glob>[, first: <uint>|last: <uint>])

Options

  • input <glob> - Input file or file glob expression.

  • first <uint> - Only read in the first <uint> entries.

  • last <uint> - Only read in the last <uint> entries.
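
The difference between the two limits can be sketched as follows. With first, reading can stop as soon as enough entries have been seen; with last, every entry has to be seen, so a bounded buffer keeps only the most recent ones. take_first and take_last are hypothetical names used for illustration, not BioDSL methods.

```ruby
# Hypothetical helper: keep the first n records and stop reading early.
def take_first(records, n)
  out = []

  records.each do |rec|
    break if out.size >= n
    out << rec
  end

  out
end

# Hypothetical helper: keep only the n most recent records in a buffer,
# so memory use stays bounded however long the input is.
def take_last(records, n)
  buffer = []

  records.each do |rec|
    buffer << rec
    buffer.shift if buffer.size > n
  end

  buffer
end

records = (1..25).map { |i| { SEQ_NAME: "seq#{i}", SEQ: 'ACGT', SEQ_LEN: 4 } }

puts take_first(records, 10).map { |r| r[:SEQ_NAME] }.join(' ')
puts take_last(records, 10).map { |r| r[:SEQ_NAME] }.join(' ')
```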

Examples

To read all FASTA entries from a file:

read_fasta(input: "test.fna")

To read all FASTA entries from a gzipped file:

read_fasta(input: "test.fna.gz")

To read in only the first 10 records from a FASTA file:

read_fasta(input: "test.fna", first: 10)

To read in the last 10 records from a FASTA file:

read_fasta(input: "test.fna", last: 10)

To read all FASTA entries from multiple files:

read_fasta(input: "test1.fna,test2.fna")

To read FASTA entries from multiple files using a glob expression:

read_fasta(input: "*.fna")
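
The last two examples pass a comma-separated list and a glob expression. A minimal sketch of how such an input string could be expanded into file names is given below, using Ruby's Dir.glob. expand_input is a hypothetical helper, and the sorting and de-duplication are assumptions rather than documented BioDSL behaviour.

```ruby
require 'tmpdir'

# Hypothetical helper, not part of BioDSL: expand a String or Array of
# comma-separated paths and glob expressions into a list of existing files.
def expand_input(input)
  Array(input)
    .flat_map { |expr| expr.split(',') }
    .flat_map { |pattern| Dir.glob(pattern.strip).sort }
    .uniq
end

# Demonstration in a temporary directory.
Dir.mktmpdir do |dir|
  %w[test1.fna test2.fna notes.txt].each do |name|
    File.write(File.join(dir, name), ">test\nACGT\n")
  end

  puts expand_input(File.join(dir, '*.fna'))
end
```

Patterns that match no file expand to nothing in this sketch, so a misspelled file name is silently skipped; a stricter version could raise an error instead.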

Constant Summary

STATS =
%i(records_in records_out sequences_in sequences_out residues_in
residues_out)

Instance Method Summary

Constructor Details

#initialize(options) ⇒ ReadFasta

Constructor for the ReadFasta class.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :input (String, Array)

    String or Array with glob expressions.

  • :first (Integer)

    Number of records to read from the start of the input.

  • :last (Integer)

    Number of records to read from the end of the input.



# File 'lib/BioDSL/commands/read_fasta.rb', line 93

def initialize(options)
  @options = options
  @count   = 0
  @buffer  = []

  check_options
end

Instance Method Details

#lmb ⇒ Proc

Return a lambda for the read_fasta command.

Returns:

  • (Proc)

    Returns the read_fasta command lambda.



# File 'lib/BioDSL/commands/read_fasta.rb', line 104

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    read_input(input, output)

    options_glob(@options[:input]).each do |file|
      BioDSL::Fasta.open(file) do |ios|
        if @options[:first] && read_first(ios, output)
        elsif @options[:last] && read_last(ios)
        else
          read_all(ios, output)
        end
      end
    end

    write_buffer(output) if @options[:last]
  end
end