Module: OboParser::Utilities

Defined in:
lib/utilities.rb

Constant Summary collapse

HOMOLONTO_HEADER =

Two column translation tools

%{
format-version: 1.2
auto-generated-by: obo_parser
default-namespace: fix_me

[Typedef]
id: OGEE:has_member
name: has_member
is_a: OBO_REL:relationship
def: "C has_member C', C is an homology group and C' is a biological object" []
comment: "We leave open the possibility that an homology group is a biological object. Thus, an homology group C may have C' has_member, with C' being an homology group."
is_transitive: true
is_anti_symmetric: true

}

Class Method Summary collapse

Class Method Details

.column_translate(options = {}) ⇒ String

Takes a two column input file, references it to two ontologies, and provides a report.

Example use

file = File.read('HAO_TGMA_list.txt')
col1_obo = File.read('hao.obo')
col2_obo = File.read('tgma.obo')

OboParser::Utilities.column_translate(:data => file, :col1_obo => col1_obo, :col2_obo => col2_obo, :output => :homolonto)

Output types

There are several output report types

:xls - Translates the columns in the data_file to the option passed in :translate_to, the first matching against col1_obo, the second against col2_obo.  Returns an Excel file.
:homolonto - Generates a homolonto compatible file to STDOUT
:cols - Prints a two column format to STDOUT

Parameters:

  • options (Hash) (defaults to: {})

    options.

  • data (Symbol)

    the two column data file.

Returns:

  • (String)

    the transation in tab delimted format.



126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
# File 'lib/utilities.rb', line 126

def self.column_translate(options = {})
  opt = {
    :data => nil,
    :col1_obo => nil,
    :col2_obo => nil,
    :translate_to => :id,        # also :label
    :output => :cols,            # also :xls, :homolonto, :parent_match
    :parent_match_to => :is_a,   # only used when :output == :parent_match
    :output_filename => 'foo',
    :index_start => 0
  }.merge!(options)

  c1obo = parse_obo_file(opt[:col1_obo])
  c2obo = parse_obo_file(opt[:col2_obo])

  case opt[:output]
  when :xls
    Spreadsheet.client_encoding = 'UTF-8'
    book = Spreadsheet::Workbook.new
    sheet = book.create_worksheet
  when :homolonto
    s = HOMOLONTO_HEADER
    opt[:translate_to] = :id # force this in this mode
  end

  i = opt[:index_start]
  v1 = nil # a label like 'head'
  v2 = nil
  c1 = nil # an id 'FOO:123'
  c2 = nil

  opt[:data].split(/\n/).each do |row|
    i += 1
    c1, c2 =  row.split(/\t/).map(&:strip)

    if c1.nil? || c2.nil?
      puts
      next
    end

    # the conversion
    if opt[:translate_to] == :id
      if c1 =~ /.*\:.*/ # it's an id, leave it
        v1 = c1
      else
        v1 = c1obo.term_hash[c1]
      end
      if c2 =~ /.*\:.*/ 
        v2 = c2
      else
        v2 = c2obo.term_hash[c2]
      end
    else
      if c1 =~ /.*\:.*/ 
        v1 = c1obo.id_hash[c1]
      else
        v1 = c1
      end
      if c2 =~ /.*\:.*/ 
        v2 = c2obo.id_hash[c2]
      else
        v2 = c2
      end
    end

    case opt[:output]
    when :cols
      puts "#{v1}\t#{v2}"
    when :xls
      sheet[i,0] = v1
      sheet[i,1] = OboParser::Utilities.term_stanza_from_file(v1, opt[:col1_obo])
      sheet[i,2] = v2
      sheet[i,3] = OboParser::Utilities.term_stanza_from_file(v2, opt[:col2_obo])
    when :homolonto
      s << OboParser::Utilities.homolonto_stanza(i, c1obo.id_hash[v1] , v1, v2) # "#{c1obo.id_hash[v1]} ! #{c2obo.id_hash[v2]}"
      s << "\n\n"
    end
  end

  case opt[:output]
  when :xls
    book.write "#{opt[:output_filename]}.xls"
  when :homolonto 
    puts s + "\n"
  end

  true
end

.cytoscapify(options = {}) ⇒ Object

Takes a Hash of OBO ontology files, an Array of relationships, and writes two input files (a network, and node properties) for Cytoscape

Example use

OboParser::Utilities.cytoscapify(:ontologies => {‘HAO’ => File.read(‘input/hao.obo’), ‘TADS’ => File.read(‘input/tads.obo’), ‘TGMA’ => File.read(‘input/tgma.obo’), ‘FBBT’ => File.read(‘input/fbbt.obo’) }, :properties => [‘is_a’, ‘part_of’])

TODO: @return File1, File2, Filen

Parameters:

  • ontologies (Symbol)

    a Hash of #read files as values, keys as working names

  • properties (Symbol)

    an Array of properties like [‘is_a’, ‘part_of’]



301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
# File 'lib/utilities.rb', line 301

def self.cytoscapify(options = {})
  opt = {
    :ontologies => {},
    :properties => []
  }.merge!(options)

  return false if opt[:properties].empty?
  return false if opt[:ontologies].empty?

  nodes = File.new("nodes.tab", "w+")
  edges = File.new("edges.eda", "w+")

  opt[:ontologies].keys.each do |k|

    obo_file = parse_obo_file(opt[:ontologies][k])

    obo_file.terms.each do |t|
      nodes.puts [t.id.value, t.name.value, k].join("\t") + "\n"

      t.relationships.each do |rel, id|
        edges.puts [t.id.value, "(#{rel})", id].join("\t") + "\n" if opt[:properties].include?(rel)
      end
    end
  end

  nodes.close
  edges.close

  true

end

.dump_comparison_by_id(cutoff = 0, files = []) ⇒ String

Summarizes labels used by id in a two column tab delimited format Providing a cutoff will report only those ids/labels with > 1 label per id Does not (yet) include reference to synonyms, this could be easily extended.

Example use

of1 = File.read(‘foo1.obo’) of2 = File.read(‘foo2.obo’) of3 = File.read(‘foo3.obo’) of4 = File.read(‘foo4.obo’)

OboParser::Utilities.dump_comparison_by_id(0,[of1, of2, of3, of4])

Parameters:

  • cutoff (Integer) (defaults to: 0)

    only Term ids with > cutoff labels will be reported

  • files (Array) (defaults to: [])

    an Array of read files

Returns:

  • (String)

    the transation in tab delimted format



22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
# File 'lib/utilities.rb', line 22

def self.dump_comparison_by_id(cutoff = 0, files = [])
  return '' if files.size < 1

  of = [] 
  files.each_with_index do |f, i|
    of[i] = parse_obo_file(f)	
  end

  all_data = {}

  of.each do |f|
    tmp_hash = f.id_hash
    tmp_hash.keys.each do |id|
      if all_data[id]
        all_data[id].push(tmp_hash[id])
      else
        all_data[id] = [tmp_hash[id]]
      end
    end
  end

  all_data.keys.sort.each do |k|
    if all_data[k].uniq.size > cutoff 
      puts "#{k}\t#{all_data[k].uniq.join(', ')}"
    end
  end
end

.hashify_pairs(options = {}) ⇒ Hash

Takes a two column input file, references it to two ontologies, and returns a hash

Example use

file = File.read('HAO_TGMA_list.txt')
col1_obo = File.read('hao.obo')
col2_obo = File.read('tgma.obo')

OboParser::Utilities.hashify_pairs(:data => file, :col1_obo => col1_obo, :col2_obo => col2_obo)

Parameters:

  • options (Hash) (defaults to: {})

    options.

  • data (Symbol)

    the two column data file.

  • colo1_obo (Symbol)

    the OBO file referenced in the first column

  • colo2_obo (Symbol)

    the OBO file referenced in the second column

Returns:

  • (Hash)

    a hash of string => id string



229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
# File 'lib/utilities.rb', line 229

def self.hashify_pairs(options = {})
  opt = {
    :data => nil,
    :col1_obo => nil,
    :col2_obo => nil,
  }.merge!(options)
  
  c1obo = parse_obo_file(opt[:col1_obo])
  c2obo = parse_obo_file(opt[:col2_obo])

  hash = Hash.new

  i = opt[:index_start]
  v1 = nil # a label like 'head'
  v2 = nil
  c1 = nil # an id 'FOO:123'
  c2 = nil

  opt[:data].split(/\n/).each do |row|
    i += 1
    c1, c2 =  row.split(/\t/).map(&:strip)

    if c1.nil? || c2.nil?
      next
    end

    # the conversion
    if c1 =~ /.*\:.*/ # it's an id, leave it
      v1 = c1
    else
      v1 = c1obo.term_hash[c1]
    end
    if c2 =~ /.*\:.*/ 
      v2 = c2
    else
      v2 = c2obo.term_hash[c2]
    end
 
   hash.merge!(c1 => c2) 
  
  end
  return hash
end

.homolonto_stanza(id, name, *members) ⇒ String

Returns a HomolOnto Stanza

Parameters:

  • id (String)

    an externally tracked id for the id: tag like ‘00001’

  • name (String)

    a name for the name: tag

  • members (Array)

    a Array of 2 or more members for the relationship: has_member tag like [‘FOO:123’, ‘BAR:456’]

Returns:

  • (String)

    the stanza requested



280
281
282
283
284
285
286
287
288
289
290
# File 'lib/utilities.rb', line 280

def self.homolonto_stanza(id, name, *members)
  return 'NOT ENOUGH RELATIONSHIPS' if members.length < 2
  s = []
  s << '[Term]'
  s << "id: HOG:#{id}"
  s << "name: #{name}"
  members.each do |m|
    s << "relationship: has_member #{m}"
  end
  s.join("\n")
end

.parents(options = {}) ⇒ Hash

Takes a two column input file, references it to two ontologies, and returns a report that identifies data pairs that have parents who are also a data pair given a provided property/relation type.

Example use

file = File.read('HAO_TGMA_list.txt')
col1_obo = File.read('hao.obo')
col2_obo = File.read('tgma.obo')

foo = OboParser::Utilities.parents(:data => data, :col1_obo => col1_obo, :col2_obo => col2_obo, :property => ‘is_a’)

puts “– NO (#.size)n” puts foo.join(“n”) puts “– YES (#.size)n” puts foo.join(“n”)

Parameters:

  • options (Hash) (defaults to: {})

    options.

  • data (Symbol)

    the two column data file.

  • colo1_obo (Symbol)

    the OBO file referenced in the first column

  • colo2_obo (Symbol)

    the OBO file referenced in the second column

  • property (Symbol)

    the OBO relationship/property to check against (e.g. ‘is_a’, ‘part_of’)

Returns:

  • (Hash)

    a hash of => {, :no => {}}



356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
# File 'lib/utilities.rb', line 356

def self.parents(options = {})
  opt = {
    :data => nil,
    :col1_obo => nil,
    :col2_obo => nil,
    :property => nil
  }.merge!(options)

  return false if opt[:property].nil? 
  c1obo = parse_obo_file(opt[:col1_obo])
  c2obo = parse_obo_file(opt[:col2_obo])

  result = {:yes => [], :no => [], :unplaced => []}
  # update
  hash =  hashify_pairs(:data => opt[:data], :col1_obo => opt[:col1_obo], :col2_obo =>  opt[:col2_obo])

  obo1_hash = c1obo.id_index
  obo2_hash = c2obo.id_index

  hash.keys.each do |k|
    a = k
    b = hash[a]

    ids_1 = []
    ids_2 = []

    if !obo1_hash[a]
      puts "can't find #{k}\n"
      next
    end

    if !obo2_hash[b]
      puts "can't find #{k}\n"
      next
    end

    obo1_hash[a].relationships.each do |rel, id| 
      if rel == opt[:property] 
        ids_1.push id
      end
    end

    obo2_hash[b].relationships.each do |rel, id|
      if rel == opt[:property] 
        ids_2.push id
      end
    end

    unplaced = true

    ids_1.each do |c|
      ids_2.each do |d|
        t = "#{a} -> #{b}"
        if hash[c] == d
          result[:yes].push(t)
          unplaced = false
          next # don't add again after we find a hit
        else
          result[:no].push(t)
          unplaced = false
        end
      end
    end
    result[:unplaced] 

  end

  result
end

.shared_labels(files = []) ⇒ String

Returns all labels found in all passed ontologies. Does not yet include synonyms.

Example use

of1 = File.read('fly_anatomy.obo')	
of2 = File.read('hao.obo')	
of3 = File.read('mosquito_anatomy.obo')	

OboParser::Utilities.shared_labels([of1, of3])

Parameters:

  • files (Array) (defaults to: [])

    an Array of read files

Returns:

  • (String)

    lables, one per line



61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
# File 'lib/utilities.rb', line 61

def self.shared_labels(files = []) 
  comparison = {}

  files.each do |f|
    o = parse_obo_file(f)
    o.term_hash.keys.each do |k|
      tmp = k.gsub(/adult/, "").strip
      tmp = k.gsub(/embryonic\/larval/, "").strip
      if comparison[tmp]
        comparison[tmp] += 1
      else
        comparison.merge!(tmp => 1)
      end
    end
  end

  match = [] 
  comparison.keys.each do |k|
    if comparison[k] == files.size 
      match.push k
    end
  end

  puts match.sort.join("\n")
  puts "\n#{match.length} total."

end

.term_stanza_from_file(id, file) ⇒ String

Given a Term id and a String representing an OBO file returns that stanza.

Parameters:

  • id (String)

    a Term id like ‘FOO:123’

  • file (String)

    a Obo file as a String like File.read(‘my.obo’)

Returns:

  • (String)

    the stanza requested



436
437
438
439
440
441
# File 'lib/utilities.rb', line 436

def self.term_stanza_from_file(id, file)
  foo = ""
  file =~ /(^\[Term\]\s*?id:\s*?#{id}.*?)(^\[Term\]|^\[Typedef\])/im
  foo = $1 if !$1.nil?
  foo.gsub(/\n\r/,"\n")
end