Class: Bio::Velvet::Graph
- Inherits:
-
Object
- Object
- Bio::Velvet::Graph
- Includes:
- Logging
- Defined in:
- lib/bio-velvet/graph.rb
Overview
Parser for a velvet assembler’s graph file (Graph or LastGraph) output from velvetg
The definition of this file is given in the velvet manual, at www.ebi.ac.uk/~zerbino/velvet/Manual.pdf
Defined Under Namespace
Classes: Arc, ArcArray, Node, NodeArray, NodedRead, NodedReadArray
Instance Attribute Summary collapse
-
#arcs ⇒ Object
ArcArray of Arc objects.
-
#hash_length ⇒ Object
Taken directly from the graph, statistics and information about the Graph i.e.
-
#nodes ⇒ Object
NodeArray object of all the graph’s node objects.
-
#number_of_nodes ⇒ Object
Taken directly from the graph, statistics and information about the Graph i.e.
-
#number_of_sequences ⇒ Object
Taken directly from the graph, statistics and information about the Graph i.e.
Class Method Summary collapse
- .log ⇒ Object
-
.parse_from_file(path_to_graph_file, options = {}) ⇒ Object
Parse a graph file from a Graph, Graph2 or LastGraph output file from velvet into a Bio::Velvet::Graph object.
Instance Method Summary collapse
-
#delete_nodes_if(&block) ⇒ Object
Deletes nodes and associated arcs from the graph if the block passed evaluates to true (as in Array#delete_if).
-
#get_arcs_by_node(node1, node2) ⇒ Object
Return an array of Arc objects between two nodes (specified by node objects), or an empty array if none exists.
-
#get_arcs_by_node_id(node_id1, node_id2) ⇒ Object
Return an array of Arc objects between two nodes (specified by integer IDs), or an empty array if none exists.
-
#neighbours_into_start(node) ⇒ Object
Return the adjacent nodes in the graph that connect to the end of a node.
-
#neighbours_off_end(node) ⇒ Object
Return the adjacent nodes in the graph that connect to the end of a node.
-
#parse_additional_noded_reads(path_to_graph_file, options) ⇒ Object
Add more noded reads to this already parsed graph.
Methods included from Logging
Instance Attribute Details
#arcs ⇒ Object
ArcArray of Arc objects
23 24 25 |
# File 'lib/bio-velvet/graph.rb', line 23 def arcs @arcs end |
#hash_length ⇒ Object
Taken directly from the graph, statistics and information about the Graph i.e. from the velvet manual “$NUMBER_OF_NODES $NUMBER_OF_SEQUENCES $HASH_LENGTH”
17 18 19 |
# File 'lib/bio-velvet/graph.rb', line 17 def hash_length @hash_length end |
#nodes ⇒ Object
NodeArray object of all the graph’s node objects
20 21 22 |
# File 'lib/bio-velvet/graph.rb', line 20 def nodes @nodes end |
#number_of_nodes ⇒ Object
Taken directly from the graph, statistics and information about the Graph i.e. from the velvet manual “$NUMBER_OF_NODES $NUMBER_OF_SEQUENCES $HASH_LENGTH”
17 18 19 |
# File 'lib/bio-velvet/graph.rb', line 17 def number_of_nodes @number_of_nodes end |
#number_of_sequences ⇒ Object
Taken directly from the graph, statistics and information about the Graph i.e. from the velvet manual “$NUMBER_OF_NODES $NUMBER_OF_SEQUENCES $HASH_LENGTH”
17 18 19 |
# File 'lib/bio-velvet/graph.rb', line 17 def number_of_sequences @number_of_sequences end |
Class Method Details
.log ⇒ Object
25 26 27 |
# File 'lib/bio-velvet/graph.rb', line 25 def self.log self.new.log end |
.parse_from_file(path_to_graph_file, options = {}) ⇒ Object
Parse a graph file from a Graph, Graph2 or LastGraph output file from velvet into a Bio::Velvet::Graph object
Options:
-
:dont_parse_noded_reads: if true, then parsing of the NR section is skipped
-
:interesting_read_ids: If not nil, is a Set of nodes that we are interested in. Reads
not of interest will not be parsed in (the NR part of the velvet LastGraph file). Regardless all nodes and edges are parsed in. Using this options saves both memory and CPU.
-
:interesting_node_ids: like :interesting_read_ids except it allows targeting of particular nodes
rather than particular reads.
-
:grep_hack: to make the parsing of read associations go even faster, a grep-based, rather
hacky method is applied to the graph file, so only NR data of interesting_read_ids is presented to the parser. This can save days of parsing time, but is a bit of a hack and its usage may not be particularly future-proof. The value of this option is the amount of context coming out of grep (the -A and -B flags). Using 500 should probably work for most circumstances, if not an Exception will be raised.
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 |
# File 'lib/bio-velvet/graph.rb', line 44 def self.parse_from_file(path_to_graph_file, ={}) graph = self.new state = :header current_node = nil graph.nodes = NodeArray.new graph.arcs = ArcArray.new current_node_direction = nil line_number = 0 Hopcsv.foreach(path_to_graph_file,"\t") do |row| line_number += 1 if state == :header raise "parse exception on header line, this line #{line_number}: #{row.inspect}" unless row.length >= 3 graph.number_of_nodes = row[0].to_i graph.number_of_sequences = row[1].to_i graph.hash_length = row[2].to_i #Not quite sure what the function of the 4th column is state = :nodes_0 log.debug "Now parsing velvet graph nodes" if log.debug? next end if state == :nodes_0 # NODE $NODE_ID $COV_SHORT1 $O_COV_SHORT1 $COV_SHORT2 $O_COV_SHORT2 # $ENDS_OF_KMERS_OF_NODE # $ENDS_OF_KMERS_OF_TWIN_NODE if row[0] == 'NODE' raise unless row.length > 2 current_node = Node.new current_node.node_id = row[1].to_i current_node.length = row[2].to_i current_node.coverages = row[3...row.length].collect{|c| c.to_i} current_node.parent_graph = graph state = :nodes_1 raise "Duplicate node name" unless graph.nodes[current_node.node_id].nil? graph.nodes[current_node.node_id] = current_node next else state = :arc log.debug "Now parsing velvet graph arcs" if log.debug? # No next in the loop so that this line gets parsed as an ARC further down the loop end elsif state == :nodes_1 # Sometimes nodes can be empty row[0] ||= '' current_node.ends_of_kmers_of_node = row[0] raise "Unexpected nodes_1 type line on line #{line_number}: #{row.inspect}" if row.length != 1 state = :nodes_2 next elsif state == :nodes_2 # Sometimes nodes can be empty row[0] ||= '' raise if row.length != 1 current_node.ends_of_kmers_of_twin_node = row[0] state = :nodes_0 next end if state == :arc if row[0] == 'ARC' # ARC $START_NODE $END_NODE $MULTIPLICITY arc = Arc.new raise unless row.length == 4 arc.begin_node_id = row[1].to_i.abs arc.end_node_id = row[2].to_i.abs arc.multiplicity = row[3].to_i arc.begin_node_direction = (row[1].to_i > 0) arc.end_node_direction = (row[2].to_i > 0) graph.arcs.push arc next else state = :nr log.debug "Finished parsing velvet graph arcs. Now parsing the rest of the file" if log.debug? end end if state == :nr if row[0] == 'SEQ' log.warn "velvet graph parse warning: SEQ lines in the Graph file parsing not implemented yet, tracking of reads now not parsed either" break end # If short reads are tracked, for every node a block of read identifiers: # NR $NODE_ID $NUMBER_OF_SHORT_READS # $READ_ID $OFFSET_FROM_START_OF_NODE $START_COORD # $READ_ID2 etc. #p row if row[0] == 'NR' break if [:dont_parse_noded_reads] # We are done if NR things aren't parsed if [:grep_hack] unless [:interesting_read_ids] or [:interesting_node_ids] raise "Programming error using bio-velvet: if :grep_hack is specified, then :interesting_read_ids or :interesting_node_ids must also be" end apply_grep_hack graph, path_to_graph_file, [:interesting_read_ids], [:interesting_node_ids], [:grep_hack] break #no more parsing is required else raise unless row.length == 3 node_pm = row[1].to_i current_node_direction = node_pm > 0 current_node = graph.nodes[node_pm.abs] current_node.number_of_short_reads ||= 0 current_node.number_of_short_reads += row[2].to_i next end else raise unless row.length == 3 read_id = row[0].to_i if ([:interesting_node_ids] and ![:interesting_node_ids].include?(current_node.node_id)) or ([:interesting_read_ids] and ![:interesting_read_ids].include?(read_id)) # We have come across an uninteresting read. Ignore it. next end nr = NodedRead.new nr.read_id = read_id nr.offset_from_start_of_node = row[1].to_i nr.start_coord = row[2].to_i nr.direction = current_node_direction current_node.short_reads ||= NodedReadArray.new current_node.short_reads.push nr next end end end log.debug "Finished parsing velvet graph file" if log.debug? return graph end |
Instance Method Details
#delete_nodes_if(&block) ⇒ Object
Deletes nodes and associated arcs from the graph if the block passed evaluates to true (as in Array#delete_if). Actually the associated arcs are deleted first, and then the node, so that the graph remains sane at all times - there is never dangling arcs, as such.
Returns a [deleted_nodes, deleted_arcs] tuple, which are both enumerables, each in no particular order.
225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 |
# File 'lib/bio-velvet/graph.rb', line 225 def delete_nodes_if(&block) deleted_nodes = [] deleted_arcs = [] nodes.each do |node| if yield(node) deleted_nodes.push node # delete associated arcs arcs_to_del = @arcs.get_arcs_by_node_id(node.node_id) deleted_arcs.push arcs_to_del arcs_to_del.each do |arc| @arcs.delete arc end # delete the arc itself nodes.delete node end end return deleted_nodes, deleted_arcs.flatten end |
#get_arcs_by_node(node1, node2) ⇒ Object
Return an array of Arc objects between two nodes (specified by node objects), or an empty array if none exists. There is four possible arcs between two nodes, connecting their beginnings and ends
184 185 186 |
# File 'lib/bio-velvet/graph.rb', line 184 def get_arcs_by_node(node1, node2) @arcs.get_arcs_by_node_id(node1.node_id, node2.node_id) end |
#get_arcs_by_node_id(node_id1, node_id2) ⇒ Object
Return an array of Arc objects between two nodes (specified by integer IDs), or an empty array if none exists. There is four possible arcs between two nodes, connecting their beginnings and ends
177 178 179 |
# File 'lib/bio-velvet/graph.rb', line 177 def get_arcs_by_node_id(node_id1, node_id2) @arcs.get_arcs_by_node_id(node_id1, node_id2) end |
#neighbours_into_start(node) ⇒ Object
Return the adjacent nodes in the graph that connect to the end of a node
204 205 206 207 208 209 210 211 212 213 214 215 |
# File 'lib/bio-velvet/graph.rb', line 204 def neighbours_into_start(node) # Find all arcs that include this node in the right place passable_nodes = [] @arcs.get_arcs_by_node_id(node.node_id).each do |arc| if arc.end_node_id == node.node_id and arc.end_node_direction passable_nodes.push nodes[arc.begin_node_id] elsif arc.begin_node_id == node.node_id and !arc.begin_node_direction passable_nodes.push nodes[arc.end_node_id] end end return passable_nodes.uniq end |
#neighbours_off_end(node) ⇒ Object
Return the adjacent nodes in the graph that connect to the end of a node
189 190 191 192 193 194 195 196 197 198 199 200 201 |
# File 'lib/bio-velvet/graph.rb', line 189 def neighbours_off_end(node) # Find all arcs that include this node in the right place passable_nodes = [] @arcs.get_arcs_by_node_id(node.node_id).each do |arc| if arc.begin_node_id == node.node_id and arc.begin_node_direction # The most intuitive case passable_nodes.push nodes[arc.end_node_id] elsif arc.end_node_id == node.node_id and !arc.end_node_direction passable_nodes.push nodes[arc.begin_node_id] end end return passable_nodes.uniq end |
#parse_additional_noded_reads(path_to_graph_file, options) ⇒ Object
Add more noded reads to this already parsed graph. There is no gaurantee that old NodedRead information is preserved, or removed.
Options:
-
:interesting_read_ids: If not nil, is a Set of nodes that we are interested in. Reads
not of interest will not be parsed in (the NR part of the velvet LastGraph file). Regardless all nodes and edges are parsed in. Using this options saves both memory and CPU.
-
:interesting_node_ids: like :interesting_read_ids except it allows targeting of particular nodes
rather than particular reads.
-
:grep_hack: to make the parsing of read associations go even faster, a grep-based, rather
hacky method is applied to the graph file, so only NR data of interesting_read_ids is presented to the parser. This can save days of parsing time, but is a bit of a hack and its usage may not be particularly future-proof. The value of this option is the amount of context coming out of grep (the -A and -B flags). Using 500 should probably work for most circumstances, if not an Exception will be raised.
260 261 262 263 264 265 266 267 268 269 270 271 |
# File 'lib/bio-velvet/graph.rb', line 260 def parse_additional_noded_reads(path_to_graph_file, ) grep_context = [:grep_hack] if grep_context.nil? raise "Calling Graph#parse_additional_noded_reads without specifying :grep_hack is currently not implemented" end self.class.apply_grep_hack(self, path_to_graph_file, [:interesting_read_ids], [:interesting_node_ids], grep_context ) end |