Class: Bio::Velvet::Sequences

Inherits:
Hash
  • Object
Includes:
Logging
Defined in:
lib/bio-velvet/sequences.rb

Overview

Parser and container class for textual Sequence files

After parsing, the result is a hash of read_id => sequence, where read_id is an Integer and sequence is a String.

The format of this file is defined in the velvet manual, at www.ebi.ac.uk/~zerbino/velvet/Manual.pdf
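To make the shape of the parsed result concrete, here is a minimal, self-contained sketch that does not use the gem. The helper name parse_sequences_text and the example records are hypothetical. The header layout (tab-separated fields, read ID in the second field) is inferred from the split("\t")[1] call in the parse_from_file source, not taken from the velvet manual.

```ruby
# Hypothetical stand-in for the real parser: builds a Hash of
# Integer read_id => String sequence from Sequence-file text.
def parse_sequences_text(text)
  sequences = {}
  current_id = nil
  text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?('>')
      # Header fields are tab-separated; the read ID is assumed to be
      # the second field, as in the parse_from_file source.
      current_id = line[1..-1].split("\t")[1].to_i
      sequences[current_id] = ''.dup
    elsif current_id
      # Sequence lines are wrapped, so join them back together.
      sequences[current_id] << line
    end
  end
  sequences
end

# Made-up example input: two reads, the first wrapped over two lines.
example = ">read_a\t1\t0\nACGT\nTTGA\n>read_b\t2\t0\nGGCC\n"
result = parse_sequences_text(example)
```

The keys of result are Integers and the values are the unwrapped sequence Strings, which is the structure a Bio::Velvet::Sequences object is described as holding.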

Class Method Summary

Methods included from Logging

#log

Class Method Details

.log ⇒ Object



# File 'lib/bio-velvet/sequences.rb', line 17

def self.log
  self.new.log
end

.parse_from_file(path_to_sequence_file, options = {}) ⇒ Object

Options:

  • :interesting_read_ids: If not nil, a Set of read IDs that we are interested in. Reads not of interest will not be parsed in (the NR part of the velvet LastGraph file). Regardless, all nodes and edges are parsed in. Using this option saves both memory and CPU.

  • :apply_grep_hack (written as :grep_hack in the original comment; the method source checks options[:apply_grep_hack]): To make the parsing of read associations go even faster, a grep-based, rather hacky method is applied to the Sequence file, so that only the sequence data of interesting_read_ids is presented to the parser. This can save days of parsing time, but it is a hack and may not be particularly future-proof. The value of this option is the amount of context coming out of grep (the -A flag). In the Sequence file the sequences are wrapped at 60 characters, so you'll need at least (longest_sequence_length / 60) + 2 lines of context. The reason for adding 2 is that the parser can then detect insufficient context and raise an Exception, without raising false-positive Exceptions.
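The context rule above can be sketched as a small calculation. The helper name grep_context_lines is hypothetical, and rounding the division up is an assumption: the text gives only "(longest_sequence_length / 60) + 2" as a minimum, so this errs on the side of more context.

```ruby
# Sequences in the Sequence file are wrapped at 60 characters per line.
WRAP_WIDTH = 60

# Hypothetical helper: number of lines to pass to grep's -A flag.
# Rounding up is an assumption; the documented formula is a lower bound.
def grep_context_lines(longest_sequence_length, wrap_width = WRAP_WIDTH)
  (longest_sequence_length.to_f / wrap_width).ceil + 2
end

context_for_250bp = grep_context_lines(250)
```

Going by the method source, the resulting number would be passed as the value of the :apply_grep_hack option, alongside :interesting_read_ids.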



# File 'lib/bio-velvet/sequences.rb', line 33

def self.parse_from_file(path_to_sequence_file, options={})
  seq_object = Bio::Velvet::Sequences.new

  if options[:apply_grep_hack]
    apply_grep_hack(seq_object, path_to_sequence_file, options[:interesting_read_ids], options[:apply_grep_hack])
  else
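    # Parse all the sequences, keeping only the interesting ones if a filter is given
    Bio::FlatFile.foreach(path_to_sequence_file) do |seq|
      # The definition line is tab-separated; the read ID is its second field
      read_id = seq.definition.split("\t")[1].to_i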
      if options[:interesting_read_ids].nil? or options[:interesting_read_ids].include?(read_id)
        seq_object[read_id] = seq.seq.to_s
      end
    end
  end
  log.info "Read in #{seq_object.length} velvet stored sequences"
  return seq_object
end
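The filtering condition in the else branch can be tried in isolation. This is a minimal sketch with made-up records; keep_read? is a hypothetical helper that mirrors the nil-or-include? test, not a method of the library.

```ruby
require 'set'

# Hypothetical helper mirroring the condition in parse_from_file:
# keep a read when no filter is given, or when its ID is in the filter.
def keep_read?(read_id, interesting_read_ids)
  interesting_read_ids.nil? || interesting_read_ids.include?(read_id)
end

# Made-up (read_id, sequence) pairs standing in for parsed records.
records = [[1, 'ACGT'], [2, 'GGCC'], [3, 'TTAA']]
wanted = Set.new([1, 3])

# With a filter, only the listed read IDs are stored.
filtered = records.each_with_object({}) do |(read_id, seq), kept|
  kept[read_id] = seq if keep_read?(read_id, wanted)
end

# With nil, every read is stored.
unfiltered = records.each_with_object({}) do |(read_id, seq), kept|
  kept[read_id] = seq if keep_read?(read_id, nil)
end
```

A Set is a natural fit for the filter because the membership test runs once per read in the file.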