Class: RGFA

Inherits:
Object
Includes:
Connectivity, Containments, Headers, LinearPaths, Lines, Links, LoggerSupport, Multiplication, Paths, RGL, Segments, RGFATools
Defined in:
lib/rgfa.rb,
lib/rgfatools.rb

Overview

Main class of the RGFA library.

RGFA provides a representation of a GFA graph. It supports creating a graph from scratch, reading and writing files or strings, and several operations on the graph. The examples below show how to create an RGFA object from scratch or from a GFA file, write it to a file, output its string representation or a statistics report, and control the validation level.

Interacting with the graph

Examples:

Creating an empty RGFA object

gfa = RGFA.new

Parsing and writing GFA format

gfa = RGFA.from_file(filename) # parse GFA file
gfa.to_file(filename) # write to GFA file
puts gfa # show GFA representation of RGFA object

Basic statistics report

puts gfa.info # print report
puts gfa.info(true) # compact format, in one line (short is a positional parameter)

Validation

gfa = RGFA.from_file(filename, validate: 1) # default level is 2
gfa.validate = 3 # change validation level
gfa.turn_off_validations # equivalent to gfa.validate = 0
gfa.validate! # run post-validations (e.g. check segment names in links)

Defined Under Namespace

Modules: Connectivity, Containments, FieldParser, FieldValidator, FieldWriter, Headers, LinearPaths, Lines, Links, LoggerSupport, Multiplication, Paths, Segments, Sequence

Classes: ByteArray, CIGAR, DuplicatedLabelError, Error, FieldArray, Line, LineMissingError, Logger, NumericArray, OrientedSegment, SegmentEnd, SegmentEndsPath, SegmentInfo

Constant Summary

Constants included from RGFATools::Multiplication

RGFATools::Multiplication::LINKS_DISTRIBUTION_POLICY

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from RGFATools::PBubbles

#remove_p_bubble, #remove_p_bubbles

Methods included from RGFATools::LinearPaths

#merge_linear_path

Methods included from RGFATools::SuperfluousLinks

#enforce_all_mandatory_links, #enforce_segment_mandatory_links, #remove_self_link, #remove_self_links

Methods included from RGFATools::Multiplication

#multiply_extended, #multiply_with_rgfatools

Methods included from RGFATools::InvertibleSegments

#randomly_orient_invertible, #randomly_orient_invertibles

Methods included from RGFATools::CopyNumber

#apply_copy_number, #apply_copy_numbers, #compute_copy_numbers, #delete_low_coverage_segments, #set_count_unit_length, #set_default_count_tag

Methods included from RGFATools::Artifacts

#remove_dead_ends, #remove_small_components

Methods included from LoggerSupport

#enable_progress_logging, #progress_log, #progress_log_end, #progress_log_init

Methods included from Multiplication

#multiply

Methods included from Connectivity

#connected_components, #connectivity, #cut_link?, #cut_segment?, #segment_connected_component, #split_connected_components

Methods included from LinearPaths

#linear_path, #linear_paths, #merge_linear_path, #merge_linear_paths

Methods included from Paths

#delete_path, #path, #path!, #paths, #paths_with

Methods included from Containments

#contained_in, #containing, #containment, #containment!, #containments, #containments_between, #delete_containment

Methods included from Links

#delete_link, #delete_other_links, #link, #link!, #link_from_to, #link_from_to!, #links, #links_between, #links_from, #links_from_to, #links_of, #links_to, #neighbours

Methods included from Segments

#connected_segments, #delete_segment, #segment, #segment!, #segments, #unconnect_segments

Methods included from Headers

#delete_headers, #header, #headers

Methods included from Lines

#<<, #rename, #rm

Constructor Details

#initialize(validate: 2) ⇒ RGFA

Returns a new instance of RGFA.

Parameters:

  • validate (Integer) (defaults to: 2)

    the validation level; see “Validation level” under RGFA::Line#initialize.



# File 'lib/rgfa.rb', line 103

def initialize(validate: 2)
  @validate = validate
  init_headers
  @segments = {}
  @links = []
  @containments = []
  @paths = {}
  @segments_first_order = false
  @progress = false
  @default = {:count_tag => :RC, :unit_length => 1}
  @extensions_enabled = false
end

Instance Attribute Details

#validate ⇒ Object

Returns the value of attribute validate.



# File 'lib/rgfa.rb', line 97

def validate
  @validate
end

Class Method Details

.from_file(filename, validate: 2) ⇒ RGFA

Creates an RGFA instance by parsing the file with the specified filename.

Parameters:

  • filename (String)
  • validate (Integer) (defaults to: 2)

    the validation level; see “Validation level” under RGFA::Line#initialize.

Returns:

  • (RGFA)

Raises:

  • if file cannot be opened for reading



# File 'lib/rgfa.rb', line 202

def self.from_file(filename, validate: 2)
  gfa = RGFA.new(validate: validate)
  gfa.read_file(filename)
  return gfa
end

Instance Method Details

#==(other) ⇒ Boolean

Compare two RGFA instances.

Returns:

  • (Boolean)

    are the lines of the two instances equivalent?



# File 'lib/rgfa.rb', line 283

def ==(other)
  segments == other.segments and
    links == other.links and
    containments == other.containments and
    headers == other.headers and
    paths == other.paths
end

#clone ⇒ RGFA

Create a copy of the RGFA instance.

Returns:

  • (RGFA)



# File 'lib/rgfa.rb', line 170

def clone
  cpy = to_s.to_rgfa(validate: 0)
  cpy.validate = @validate
  cpy.enable_progress_logging if @progress
  cpy.require_segments_first_order if @segments_first_order
  return cpy
end
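
The listing above copies the graph by serializing it to a string and parsing that string again, so the copy shares no line objects with the original. A minimal sketch of the same idea on a stand-in class; TinyGraph and from_string are hypothetical names used only for illustration and are not part of RGFA:

```ruby
# Hypothetical stand-in for a graph object; this is NOT the RGFA class.
class TinyGraph
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Append one text line; returns self so calls can be chained.
  def <<(line)
    @lines << line.dup
    self
  end

  # One line of text per stored line, each terminated by a newline.
  def to_s
    @lines.map { |l| l + "\n" }.join
  end

  # Build a new graph from its string representation.
  def self.from_string(text)
    graph = new
    text.each_line { |line| graph << line.chomp }
    graph
  end

  # Copy by round-tripping through the string representation.
  def clone
    self.class.from_string(to_s)
  end
end

original = TinyGraph.new
original << "S\tA\tACGT"
copy = original.clone
copy << "S\tB\tGGCC"        # changes the copy only
puts original.lines.size
puts copy.lines.size
```

The round trip costs a full serialization and parse, but it guarantees an independent copy without writing a field-by-field deep copy.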

#disable_extensions ⇒ void

This method returns an undefined value.

Disable RGFATools extensions of RGFA methods



# File 'lib/rgfatools.rb', line 98

def disable_extensions
  @extensions_enabled = false
end

#enable_extensions ⇒ void

This method returns an undefined value.

Enable RGFATools extensions of RGFA methods



# File 'lib/rgfatools.rb', line 92

def enable_extensions
  @extensions_enabled = true
end

#info(short = false) ⇒ String

Output basic statistics about the graph’s sequence and topology information.

Compact output has the following keys:

  • ns: number of segments

  • nl: number of links

  • cc: number of connected components

  • de: number of dead ends

  • tl: total length of segment sequences

  • 50: N50 segment sequence length

Normal output is a table with the same information plus some additional values: the percentage of dead ends, the length of the largest component, the shortest and longest segment sequence lengths, and the 1st, 2nd and 3rd quartiles of segment sequence length.

Parameters:

  • short (boolean) (defaults to: false)

    compact output as a single text line

Returns:

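  • (String)

    sequence and topology information collected from the graph. Note that in the listing below the non-compact branch returns retval, an Array of report lines, which puts prints one per line.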
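
The compact format is a single line of tab-separated key=value fields, which makes it easy to consume from scripts. A minimal sketch of reading such a line back into a Hash; parse_compact_info is a hypothetical helper, not an RGFA method, the sample line is written by hand with made-up numbers, and the sketch assumes every value is an integer:

```ruby
# Hypothetical helper (not part of RGFA): split a compact report line
# into a Hash keyed by the two-character codes (ns, nl, cc, de, tl, 50).
def parse_compact_info(line)
  line.chomp.split("\t").each_with_object({}) do |field, stats|
    key, value = field.split("=", 2)
    stats[key] = Integer(value) # raises ArgumentError on a non-integer value
  end
end

# Hand-written sample in the compact format; the numbers are invented.
sample = "ns=4\tnl=3\tcc=1\tde=2\ttl=1200\t50=400"
stats = parse_compact_info(sample)
puts stats["ns"]
puts stats["50"]
```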



# File 'lib/rgfa.rb', line 237

def info(short = false)
  q, n50, tlen = lenstats
  nde = n_dead_ends()
  pde = "%.2f%%" % ((nde.to_f*100) / (segments.size*2))
  cc = connected_components()
  cc.map!{|c|c.map{|sn|segment!(sn).length!}.inject(:+)}
  if short
    return "ns=#{segments.size}\t"+
           "nl=#{links.size}\t"+
           "cc=#{cc.size}\t"+
           "de=#{nde}\t"+
           "tl=#{tlen}\t"+
           "50=#{n50}"
  end
  retval = []
  retval << "Segment count:               #{segments.size}"
  retval << "Links count:                 #{links.size}"
  retval << "Total length (bp):           #{tlen}"
  retval << "Dead ends:                   #{nde}"
  retval << "Percentage dead ends:        #{pde}"
  retval << "Connected components:        #{cc.size}"
  retval << "Largest component (bp):      #{cc.last}"
  retval << "N50 (bp):                    #{n50}"
  retval << "Shortest segment (bp):       #{q[0]}"
  retval << "Lower quartile segment (bp): #{q[1]}"
  retval << "Median segment (bp):         #{q[2]}"
  retval << "Upper quartile segment (bp): #{q[3]}"
  retval << "Longest segment (bp):        #{q[4]}"
  return retval
end
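
The lenstats helper used above is not shown on this page. As an illustration of the statistics it reports, here is a sketch using the standard N50 definition and a nearest-rank five-number summary; both function names are hypothetical, and the quartile convention is an assumption that may differ from the one RGFA uses:

```ruby
# Standard N50: the largest length L such that segments of length >= L
# together cover at least half of the total length.
def n50(lengths)
  return nil if lengths.empty?
  half = lengths.sum / 2.0
  covered = 0
  lengths.sort.reverse.each do |len|
    covered += len
    return len if covered >= half
  end
end

# Shortest, lower quartile, median, upper quartile, longest, picked by
# nearest rank on the sorted list (assumed convention, see lead-in).
def five_number_summary(lengths)
  sorted = lengths.sort
  return [] if sorted.empty?
  [0.0, 0.25, 0.5, 0.75, 1.0].map { |f| sorted[(f * (sorted.size - 1)).round] }
end

lengths = [100, 200, 300, 400]
puts n50(lengths)
```

With lengths 100, 200, 300 and 400 the total is 1000; the two longest segments already cover 700, so the N50 is 300.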

#n_dead_ends ⇒ Integer

Counts the dead ends.

Dead ends are here defined as segment ends without connections.

Returns:

  • (Integer)

    number of dead ends in the graph



# File 'lib/rgfa.rb', line 274

def n_dead_ends
  segments.inject(0) do |n,s|
    [:E, :B].each {|e| n+= 1 if links_of([s.name, e]).empty?}
    n
  end
end
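
To make the definition concrete, the same count can be sketched on plain data without the library. In this sketch count_dead_ends is a hypothetical function, segments are given as names, and links as [from, from_orient, to, to_orient] tuples in the order used by GFA L lines:

```ruby
require "set"

# Count segment ends (:B = beginning, :E = end) that no link touches.
def count_dead_ends(segment_names, links)
  connected = Set.new
  links.each do |from, from_orient, to, to_orient|
    # A link leaves the end of a forward segment or the beginning of a reversed one,
    connected << [from, from_orient == "+" ? :E : :B]
    # and enters the beginning of a forward segment or the end of a reversed one.
    connected << [to, to_orient == "+" ? :B : :E]
  end
  segment_names.sum do |name|
    [:B, :E].count { |segment_end| !connected.include?([name, segment_end]) }
  end
end

# Linear chain A -> B -> C: only the beginning of A and the end of C are free.
links = [["A", "+", "B", "+"], ["B", "+", "C", "+"]]
puts count_dead_ends(%w[A B C], links)
```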

#path_names ⇒ Array<Symbol>

List all names of path lines in the graph

Returns:

  • (Array<Symbol>)



# File 'lib/rgfa.rb', line 140

def path_names
  @paths.keys.compact
end

#read_file(filename) ⇒ self

Populates the RGFA instance by reading the file with the specified filename.

Parameters:

  • filename (String)

Returns:

  • (self)

Raises:

  • if file cannot be opened for reading



# File 'lib/rgfa.rb', line 182

def read_file(filename)
  if @progress
    linecount = `wc -l #{filename}`.strip.split(" ")[0].to_i
    progress_log_init(:read_file, "lines", linecount,
                      "Parse file with #{linecount} lines")
  end
  File.foreach(filename) do |line|
    self << line.chomp
    progress_log(:read_file) if @progress
  end
  progress_log_end(:read_file) if @progress
  validate! if @validate >= 1
  self
end
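
The listing above reads the file one line at a time with File.foreach, and when progress logging is enabled it first obtains the line count by shelling out to wc -l. A minimal standalone sketch of the same reading pattern with a pure-Ruby line count; count_lines and count_record_types are hypothetical helpers, not RGFA methods, and the sketch only tallies record types instead of parsing lines:

```ruby
require "tempfile"

# Pure-Ruby line count, avoiding the external wc command.
def count_lines(filename)
  File.foreach(filename).count
end

# Tally lines by GFA record type (the first tab-separated field).
def count_record_types(filename)
  counts = Hash.new(0)
  File.foreach(filename) do |line|
    line = line.chomp
    next if line.empty?
    counts[line.split("\t", 2).first] += 1
  end
  counts
end

# Small hand-written GFA file used only for this example.
Tempfile.create("example") do |f|
  f.write("H\tVN:Z:1.0\nS\tA\tACGT\nS\tB\tGGCC\nL\tA\t+\tB\t+\t2M\n")
  f.flush
  puts count_lines(f.path)
  p count_record_types(f.path)
end
```

Reading with File.foreach keeps memory use proportional to one line rather than to the whole file, which matters for large assembly graphs.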

#require_segments_first_order ⇒ void

This method returns an undefined value.

Require that the links, containments and paths referring to a segment are added after the segment. Default: do not require any particular ordering.



# File 'lib/rgfa.rb', line 121

def require_segments_first_order
  @segments_first_order = true
end
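
The method itself only sets a flag; the check is performed elsewhere in the library and is not shown on this page. As an illustration of what a segments-first ordering means, the following sketch scans raw GFA lines and reports the first line that refers to a segment not yet declared. first_order_violation is a hypothetical function and does not reproduce RGFA's own check:

```ruby
# Returns the first L, C or P line that mentions a segment before its
# S line has been seen, or nil if the ordering is segments-first.
def first_order_violation(lines)
  seen = {}
  lines.each do |line|
    fields = line.split("\t")
    case fields[0]
    when "S"
      seen[fields[1]] = true
    when "L", "C"
      # In L and C lines the two segment names are fields 1 and 3.
      return line unless seen[fields[1]] && seen[fields[3]]
    when "P"
      # Field 2 is a comma-separated list of names with a trailing orientation.
      names = fields[2].to_s.split(",").map { |s| s[0...-1] }
      return line unless names.all? { |name| seen[name] }
    end
  end
  nil
end

ordered   = ["S\tA\t*", "S\tB\t*", "L\tA\t+\tB\t+\t*"]
unordered = ["L\tA\t+\tB\t+\t*", "S\tA\t*", "S\tB\t*"]
p first_order_violation(ordered)
p first_order_violation(unordered)
```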

#segment_names ⇒ Array<Symbol>

List all names of segments in the graph

Returns:

  • (Array<Symbol>)



# File 'lib/rgfa.rb', line 134

def segment_names
  @segments.keys.compact
end

#to_file(filename) ⇒ void

This method returns an undefined value.

Write RGFA to file with specified filename; overwrites it if it exists

Parameters:

  • filename (String)

Raises:

  • if file cannot be opened for writing



# File 'lib/rgfa.rb', line 213

def to_file(filename)
  File.open(filename, "w") {|f| each_line {|l| f.puts l}}
end
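
The overwrite behaviour follows from opening the file in "w" mode, which truncates an existing file before writing. A minimal standalone sketch of that pattern; write_lines is a hypothetical helper that takes an array of strings in place of the graph's each_line:

```ruby
require "tmpdir"

# Write one element per line; mode "w" truncates any existing file first.
def write_lines(filename, lines)
  File.open(filename, "w") { |f| lines.each { |l| f.puts l } }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "out.gfa")
  write_lines(path, ["S\tA\tACGT", "S\tB\tGGCC"])
  write_lines(path, ["S\tC\tTTTT"]) # replaces the previous content
  puts File.read(path)
end
```

Callers that must not lose an existing file would need to check for it themselves, for example with File.exist?, before writing.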

#to_rgfa ⇒ self

Return the gfa itself

Returns:

  • (self)


# File 'lib/rgfa.rb', line 164

def to_rgfa
  self
end

#to_s ⇒ String

Creates a string representation of RGFA conforming to the current specifications

Returns:

  • (String)



# File 'lib/rgfa.rb', line 156

def to_s
  s = ""
  each_line {|line| s << line.to_s; s << "\n"}
  return s
end

#turn_off_validations ⇒ void

This method returns an undefined value.

Set the validation level to 0. See “Validation level” under RGFA::Line#initialize.



# File 'lib/rgfa.rb', line 128

def turn_off_validations
  @validate = 0
end

#validate! ⇒ void

This method returns an undefined value.

Post-validation of the RGFA

Raises:

  • if validation fails



# File 'lib/rgfa.rb', line 147

def validate!
  validate_segment_references!
  validate_path_links!
  return nil
end
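
The two helpers called above are not shown on this page. As an illustration of the kind of check a post-validation performs once the whole graph is loaded (the overview mentions checking segment names in links), the following sketch lists segment names that links refer to but that were never declared. missing_segments is a hypothetical function on plain data, not the library's implementation:

```ruby
require "set"

# Links are [from, from_orient, to, to_orient] tuples; returns the
# referenced segment names that are absent from segment_names.
def missing_segments(segment_names, links)
  known = segment_names.to_set
  links.flat_map { |from, _from_orient, to, _to_orient| [from, to] }
       .uniq
       .reject { |name| known.include?(name) }
end

links = [["A", "+", "B", "+"], ["B", "+", "C", "-"]]
p missing_segments(%w[A B], links) # segment C is referenced but not declared
```

Unlike a check made while lines are added, this one does not depend on the order of the lines, which is why it can only run after the graph is complete.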