Class: EuPathDBGeneInformationFileExtractor

Inherits:
Object
Defined in:
lib/eupathdb_gene_information_table.rb

Overview

A class for extracting the information for a particular gene from a gene information file.

Constructor Details

#initialize(filename = nil) ⇒ EuPathDBGeneInformationFileExtractor

Returns a new instance of EuPathDBGeneInformationFileExtractor.



# File 'lib/eupathdb_gene_information_table.rb', line 11

def initialize(filename = nil)
  @filename = filename
end

Instance Attribute Details

#filename ⇒ Object

The path to the gene information file.



# File 'lib/eupathdb_gene_information_table.rb', line 9

def filename
  @filename
end

Instance Method Details

#extract_gene_info(wanted_gene_id, grep_hack_lines = nil) ⇒ Object

Returns the EuPathDBGeneInformation object whose 'Gene Id' field equals wanted_gene_id. If the file contains several matching entries, only the first is returned. If none is found, nil is returned.

If grep_hack_lines is given as a non-zero integer, a shortcut is applied to speed things up: before parsing, grep extracts the “Gene Id: ..” line for the wanted gene together with the grep_hack_lines lines that follow it, and only that excerpt is fed into the parser. A nil or zero value disables the shortcut, and the whole file is parsed.
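To make the shortcut concrete, here is a minimal pure-Ruby sketch of what the excerpt step amounts to. The helper name lines_after_gene_id is hypothetical and is not part of this class. It also simplifies what grep does: it stops at the first matching line, whereas grep -A reports every match. Like grep, it matches on a substring, so an ID that is a prefix of another ID would match both.

```ruby
# Hypothetical helper, not part of the library: returns the first line that
# contains "Gene Id: <wanted_gene_id>" plus up to `lines_after` lines that
# follow it, roughly what `grep -A lines_after` yields for a single match.
def lines_after_gene_id(path, wanted_gene_id, lines_after)
  excerpt = []
  remaining = nil # nil until the "Gene Id: ..." line has been seen

  File.foreach(path) do |line|
    if remaining.nil?
      # Still searching for the start of the wanted record.
      next unless line.include?("Gene Id: #{wanted_gene_id}")
      remaining = lines_after
      excerpt << line
    elsif remaining > 0
      # Inside the window of trailing context lines.
      excerpt << line
      remaining -= 1
    else
      # Window exhausted; no need to read the rest of the file.
      break
    end
  end

  excerpt.join
end
```

The returned string is empty when the ID never appears. If lines_after is smaller than the number of lines in the gene's record, the excerpt ends before the record does, so the value has to be chosen generously.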



# File 'lib/eupathdb_gene_information_table.rb', line 19

def extract_gene_info(wanted_gene_id, grep_hack_lines = nil)
  # Scan an IO of gene information and give back the first matching gene, or nil.
  # `return` inside a lambda only exits the lambda, so its value must be used by the caller.
  first_match = lambda do |io|
    EuPathDBGeneInformationTable.new(io).each do |gene|
      return gene if wanted_gene_id == gene.get_info('Gene Id')
    end
    nil
  end

  if grep_hack_lines and grep_hack_lines.to_i != 0
    raise ArgumentError, "grep_hack_lines should be an integer" unless grep_hack_lines.is_a?(Integer)
    # Tempfile.open yields the tempfile to the block (Tempfile.new does not) and
    # returns the block's value. Needs require 'tempfile'.
    Tempfile.open('reubypathdb_grep_hack') do |tempfile|
      # grep however many lines from past the point. Rather dodgy, but faster.
      `grep -A #{grep_hack_lines} 'Gene Id: #{wanted_gene_id}' '#{@filename}' >#{tempfile.path}`
      File.open(tempfile.path) { |io| first_match.call(io) }
    end
  else
    # no grep hack. Parse the whole gene information file
    File.open(@filename) { |io| first_match.call(io) }
  end
end
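
The listing above uses return inside a lambda, which behaves differently from return inside a plain block: it exits only the lambda and hands the value back to whoever called it. The following sketch illustrates that Ruby rule with two hypothetical methods that are unrelated to this library. It shows the language semantics only and says nothing further about EuPathDBGeneInformationTable.

```ruby
# Hypothetical example methods; only core Ruby behaviour is demonstrated.

# The lambda's return value is discarded, so the method always falls through to nil.
def first_even_ignoring_lambda_result(numbers)
  check = lambda { |n| return n if n.even? } # returns from the lambda only
  numbers.each { |n| check.call(n) }
  nil
end

# The search loop lives inside the lambda and the method passes on its result.
def first_even_using_lambda_result(numbers)
  search = lambda do
    numbers.each { |n| return n if n.even? } # exits the lambda, not the method
    nil
  end
  search.call
end
```

Under these definitions, the first method yields nil even when an even number is present, while the second yields the first even number. Any code that wants the value found inside a lambda has to use the result of call.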