Class: Bio::Velvet::Sequences
- Inherits:
- Hash
- Inheritance chain: Object, Hash, Bio::Velvet::Sequences
- Includes:
- Logging
- Defined in:
- lib/bio-velvet/sequences.rb
Overview
Parser and container class for textual Sequence files.
After parsing, the result is a hash of read_id => sequence, where read_id is an Integer and sequence is a String.
The format of this file is defined in the velvet manual, at www.ebi.ac.uk/~zerbino/velvet/Manual.pdf
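As a rough illustration of the container's shape, the sketch below uses a stand-in Hash subclass. `ExampleSequences` and the sequences in it are made up for this example and are not part of bio-velvet; the only thing taken from the text above is that the class is a Hash keyed by Integer read IDs with String values.

```ruby
# Hypothetical stand-in for Bio::Velvet::Sequences: a plain Hash subclass.
class ExampleSequences < Hash; end

seqs = ExampleSequences.new
seqs[1] = 'ACGTTTGA' # read_id (Integer) => sequence (String)
seqs[2] = 'GGCC'

# Ordinary Hash methods are available on the subclass.
read_count = seqs.length
first_sequence = seqs[1]
ids_containing_gg = seqs.select { |_id, seq| seq.include?('GG') }.keys
```

Because the container is a Hash, lookups by read ID and the usual iteration methods need no extra API.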
Class Method Summary
- .log ⇒ Object
- .parse_from_file(path_to_sequence_file, options = {}) ⇒ Object
  Options: :interesting_read_ids: If not nil, a Set of read IDs that we are interested in.
Methods included from Logging
Class Method Details
.log ⇒ Object
  # File 'lib/bio-velvet/sequences.rb', line 17
  def self.log
    self.new.log
  end
.parse_from_file(path_to_sequence_file, options = {}) ⇒ Object
Options:
- :interesting_read_ids: If not nil, a Set of read IDs that we are interested in. Reads not of interest will not be parsed in (the NR part of the velvet LastGraph file). Regardless, all nodes and edges are parsed in. Using this option saves both memory and CPU.
- :grep_hack: To make the parsing of read associations go even faster, a grep-based, rather hacky method is applied to the graph file, so that only sequence data of interesting_read_ids is presented to the parser. This can save days of parsing time, but it is a bit of a hack and its usage may not be particularly future-proof. The value of this option is the amount of context coming out of grep (the -A flag). In the Sequence file the sequences are wrapped at 60 characters, so you'll need at least (longest_sequence_length / 60) + 2 lines of context. The extra 2 lets the parser detect insufficient context and raise an Exception without raising false-positive Exceptions. Note that the method body reads this option under the key :apply_grep_hack.
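The context formula above can be sketched as a small helper. `grep_context_lines` and `SEQUENCE_WRAP_WIDTH` are hypothetical names introduced for this illustration and are not part of bio-velvet; the 60-character wrap width and the `+ 2` come from the description above.

```ruby
# Hypothetical helper, not part of bio-velvet.
# Sequences are described as being wrapped at 60 characters per line.
SEQUENCE_WRAP_WIDTH = 60

# Minimum grep -A context suggested by the text:
# (longest_sequence_length / 60) + 2, using integer division.
def grep_context_lines(longest_sequence_length)
  (longest_sequence_length / SEQUENCE_WRAP_WIDTH) + 2
end

context_for_100 = grep_context_lines(100) # 100 / 60 = 1, plus 2
context_for_250 = grep_context_lines(250) # 250 / 60 = 4, plus 2
```

Ruby's `/` on two Integers already truncates, so no explicit rounding is needed to match the formula as written.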
  # File 'lib/bio-velvet/sequences.rb', line 33
  def self.parse_from_file(path_to_sequence_file, options={})
    seq_object = Bio::Velvet::Sequences.new
    if options[:apply_grep_hack]
      apply_grep_hack(seq_object, path_to_sequence_file, options[:interesting_read_ids], options[:apply_grep_hack])
    else
      # Parse all the sequences
      Bio::FlatFile.foreach(path_to_sequence_file) do |seq|
        read_id = seq.definition.split("\t")[1].to_i
        if options[:interesting_read_ids].nil? or options[:interesting_read_ids].include?(read_id)
          seq_object[read_id] = seq.seq.to_s
        end
      end
    end
    log.info "Read in #{seq_object.length} velvet stored sequences"
    return seq_object
  end
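The non-grep branch of the method can be approximated in plain Ruby, without BioRuby. This is only a sketch: `parse_sequences_text` is a hypothetical helper, the example records are invented, and the header layout (a `>` line whose second tab-separated field is the read ID) is assumed from the `split("\t")[1]` call in the method source.

```ruby
require 'set'

# Hypothetical helper, not part of bio-velvet: parses Sequences-style text
# into a Hash of Integer read_id => String sequence.
def parse_sequences_text(text, interesting_read_ids = nil)
  sequences = {}
  current_id = nil
  text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?('>')
      # Assumed header layout: the second tab-separated field is the read ID.
      read_id = line[1..-1].split("\t")[1].to_i
      if interesting_read_ids.nil? || interesting_read_ids.include?(read_id)
        current_id = read_id
        sequences[current_id] = String.new
      else
        current_id = nil # skip the sequence lines of uninteresting reads
      end
    elsif current_id
      # Wrapped sequence lines are joined back into one String.
      sequences[current_id] << line
    end
  end
  sequences
end

# Invented example records, for illustration only.
example = ">read_a\t1\t0\nACGT\nTTGA\n>read_b\t2\t0\nGGCC\n"
all_reads = parse_sequences_text(example)
subset = parse_sequences_text(example, Set.new([2]))
```

Passing `nil` keeps every read, mirroring the `nil?` check in the method; passing a Set restricts the result to the listed read IDs.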