Class: Bio::KEGG::GENES

Inherits:

Bio::KEGGDB

Object
DB
NCBIDB
Bio::KEGGDB
Bio::KEGG::GENES

show all

Includes:: Common::DblinksAsHash, Common::OrthologsAsHash, Common::PathwaysAsHash

Defined in:: lib/bio/db/kegg/genes.rb

Overview

Description

KEGG GENES entry parser.

References

www.genome.jp/kegg/genes.html

Constant Summary collapse

DELIMITER =

RS = "\n///\n"

TAGSIZE =

Instance Method Summary collapse

#aalen ⇒ Object

Returns length of the amino acid sequence described in the AASEQ lines.
#aaseq ⇒ Object

Returns amino acid sequence described in the AASEQ lines.
#chromosome ⇒ Object

Chromosome described in the POSITION line.
#codon_usage(codon = nil) ⇒ Object

Codon usage data described in the CODON_USAGE lines.
#cu_list ⇒ Object

Codon usage data described in the CODON_USAGE lines as an array.
#dblinks_as_hash ⇒ Object (also: #dblinks)

Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field.
#dblinks_as_strings ⇒ Object

Links to other databases described in the DBLINKS lines.
#definition ⇒ Object

Definition of the entry, described in the DEFINITION line.
#division ⇒ Object

Division of the entry, described in the ENTRY line.
#eclinks ⇒ Object

Enzyme’s EC numbers shown in the DEFINITION line.
#entry ⇒ Object

Returns the “ENTRY” line content as a Hash.
#entry_id ⇒ Object

ID of the entry, described in the ENTRY line.
#gbposition ⇒ Object

The position in the genome described in the POSITION line as GenBank feature table location formatted string.
#gene ⇒ Object

The method will be deprecated.
#genes ⇒ Object

The method will be deprecated.
#initialize(entry) ⇒ GENES constructor

Creates a new Bio::KEGG::GENES object.
#keggclass ⇒ Object

Returns CLASS field of the entry.
#keggclasses ⇒ Object

Returns an Array of biological classes in CLASS field.
#locations ⇒ Object

The position in the genome described in the POSITION line as Bio::Locations object.
#motif ⇒ Object

The specification of the method will be changed in the future.
#motifs_as_hash ⇒ Object (also: #motifs)

Motif information described in the MOTIF lines.
#motifs_as_strings ⇒ Object

Motif information described in the MOTIF lines.
#name ⇒ Object

Returns the NAME line.
#names_as_array ⇒ Object (also: #names)

Names of the entry as an Array, described in the NAME line.
#ntlen ⇒ Object (also: #nalen)

Returns nucleic acid sequence length.
#ntseq ⇒ Object (also: #naseq)

Returns nucleic acid sequence described in the NTSEQ lines.
#organism ⇒ Object

Organism name of the entry, described in the ENTRY line.
#orthologs_as_hash ⇒ Object (also: #orthologs)

Returns a Hash of the orthology ID and definition in ORTHOLOGY field.
#orthologs_as_strings ⇒ Object

Orthologs described in the ORTHOLOGY lines.
#pathway ⇒ Object

Returns the PATHWAY lines as a String.
#pathways_as_hash ⇒ Object (also: #pathways)

Returns a Hash of the pathway ID and name in PATHWAY field.
#pathways_as_strings ⇒ Object

Pathways described in the PATHWAY lines.
#position ⇒ Object

The position in the genome described in the POSITION line.
#structure ⇒ Object (also: #structures)

Returns structure ID information described in the STRUCTURE lines.

Methods inherited from DB

#exists?, #fetch, #get, open, #tags

Constructor Details

#initialize(entry) ⇒ `GENES`

Creates a new Bio::KEGG::GENES object.

Arguments:

(required) entry: (String) single entry as a string

Returns: Bio::KEGG::GENES object



115
116
117

# File 'lib/bio/db/kegg/genes.rb', line 115

def initialize(entry)
  super(entry, TAGSIZE)
end

Instance Method Details

#aalen ⇒ `Object`

Returns length of the amino acid sequence described in the AASEQ lines.

Returns: Integer



393
394
395

# File 'lib/bio/db/kegg/genes.rb', line 393

def aalen
  fetch('AASEQ')[/\d+/].to_i
end

#aaseq ⇒ `Object`

Returns amino acid sequence described in the AASEQ lines.

Returns: Bio::Sequence::AA object

# File 'lib/bio/db/kegg/genes.rb', line 383

def aaseq
  unless @data['AASEQ']
    @data['AASEQ'] = Bio::Sequence::AA.new(fetch('AASEQ').gsub(/\d+/, ''))
  end
  @data['AASEQ']
end

#chromosome ⇒ `Object`

Chromosome described in the POSITION line.

Returns: String or nil

# File 'lib/bio/db/kegg/genes.rb', line 264

def chromosome
  if position[/:/]
    position.sub(/:.*/, '')
  elsif ! position[/\.\./]
    position
  else
    nil
  end
end

#codon_usage(codon = nil) ⇒ `Object`

Codon usage data described in the CODON_USAGE lines. (Deprecated: no more exists)

Returns: Hash

# File 'lib/bio/db/kegg/genes.rb', line 350

def codon_usage(codon = nil)
  unless @data['CODON_USAGE']
    hash = Hash.new
    list = cu_list
    base = %w(t c a g)
    base.each_with_index do |x, i|
      base.each_with_index do |y, j|
        base.each_with_index do |z, k|
          hash["#{x}#{y}#{z}"] = list[i*16 + j*4 + k]
        end
      end
    end
    @data['CODON_USAGE'] = hash
  end
  @data['CODON_USAGE']
end

#cu_list ⇒ `Object`

Codon usage data described in the CODON_USAGE lines as an array.

Returns: Array

# File 'lib/bio/db/kegg/genes.rb', line 370

def cu_list
  ary = []
  get('CODON_USAGE').sub(/.*/,'').each_line do |line|	# cut 1st line
    line.chomp.sub(/^.{11}/, '').scan(/..../) do |cu|
      ary.push(cu.to_i)
    end
  end
  return ary
end

#dblinks_as_hash ⇒ `Object` Also known as: dblinks

Returns a Hash of the DB name and an Array of entry IDs in DBLINKS field.

97	# File 'lib/bio/db/kegg/genes.rb', line 97 def dblinks_as_hash; super; end

#dblinks_as_strings ⇒ `Object`

Links to other databases described in the DBLINKS lines.

Returns: Array containing String objects



332
333
334

# File 'lib/bio/db/kegg/genes.rb', line 332

def dblinks_as_strings
  lines_fetch('DBLINKS')
end

#definition ⇒ `Object`

Definition of the entry, described in the DEFINITION line.

Returns: String



199
200
201

# File 'lib/bio/db/kegg/genes.rb', line 199

def definition
  field_fetch('DEFINITION')
end

#division ⇒ `Object`

Division of the entry, described in the ENTRY line.

Returns: String



149
150
151

# File 'lib/bio/db/kegg/genes.rb', line 149

def division
  entry['division']			# CDS, tRNA etc.
end

#eclinks ⇒ `Object`

Enzyme’s EC numbers shown in the DEFINITION line.

Returns: Array containing String

# File 'lib/bio/db/kegg/genes.rb', line 206

def eclinks
  unless defined? @eclinks
    ec_list = 
      definition.slice(/\[EC\:([^\]]+)\]/, 1) ||
      definition.slice(/\(EC\:([^\)]+)\)/, 1)
    ary = ec_list ? ec_list.strip.split(/\s+/) : []
    @eclinks = ary
  end
  @eclinks
end

#entry ⇒ `Object`

Returns the “ENTRY” line content as a Hash. For example,

{"organism"=>"E.coli", "division"=>"CDS", "id"=>"b0356"}

Returns: Hash

# File 'lib/bio/db/kegg/genes.rb', line 125

def entry
  unless @data['ENTRY']
    hash = Hash.new('')
    if get('ENTRY').length > 30
      e = get('ENTRY')
      hash['id']       = e[12..29].strip
      hash['division'] = e[30..39].strip
      hash['organism'] = e[40..80].strip
    end
    @data['ENTRY'] = hash
  end
  @data['ENTRY']
end

#entry_id ⇒ `Object`

ID of the entry, described in the ENTRY line.

Returns: String



142
143
144

# File 'lib/bio/db/kegg/genes.rb', line 142

def entry_id
  entry['id']
end

#gbposition ⇒ `Object`

The position in the genome described in the POSITION line as GenBank feature table location formatted string.

Returns: String



278
279
280

# File 'lib/bio/db/kegg/genes.rb', line 278

def gbposition
  position.sub(/.*?:/, '')
end

#gene ⇒ `Object`

The method will be deprecated. Use entry.names.first instead.

Returns the first gene name described in the NAME line.

Returns: String



192
193
194

# File 'lib/bio/db/kegg/genes.rb', line 192

def gene
  genes.first
end

#genes ⇒ `Object`

The method will be deprecated. Use Bio::KEGG::GENES#names.

Names of the entry as an Array, described in the NAME line.

Returns: Array containing String



182
183
184

# File 'lib/bio/db/kegg/genes.rb', line 182

def genes
  names_as_array
end

#keggclass ⇒ `Object`

Returns CLASS field of the entry.



242
243
244

# File 'lib/bio/db/kegg/genes.rb', line 242

def keggclass
  field_fetch('CLASS')
end

#keggclasses ⇒ `Object`

Returns an Array of biological classes in CLASS field.



247
248
249

# File 'lib/bio/db/kegg/genes.rb', line 247

def keggclasses
  keggclass.gsub(/ \[[^\]]+/, '').split(/\] ?/)
end

#locations ⇒ `Object`

The position in the genome described in the POSITION line as Bio::Locations object.

Returns: Bio::Locations object



286
287
288

# File 'lib/bio/db/kegg/genes.rb', line 286

def locations
  Bio::Locations.new(gbposition)
end

#motif ⇒ `Object`

The specification of the method will be changed in the future. Please use Bio::KEGG::GENES#motifs.

Motif information described in the MOTIF lines.

Returns: Hash



325
326
327

# File 'lib/bio/db/kegg/genes.rb', line 325

def motif
  motifs
end

#motifs_as_hash ⇒ `Object` Also known as: motifs

Motif information described in the MOTIF lines.

Returns: Hash

# File 'lib/bio/db/kegg/genes.rb', line 300

def motifs_as_hash
  unless @data['MOTIF']
    hash = {}
    db = nil
    motifs_as_strings.each do |line|
      if line[/^\S+:/]
        db, str = line.split(/:/, 2)
      else
        str = line
      end
      hash[db] ||= []
      hash[db] += str.strip.split(/\s+/)
    end
    @data['MOTIF'] = hash
  end
  @data['MOTIF']		# Hash of Array of IDs in MOTIF
end

#motifs_as_strings ⇒ `Object`

Motif information described in the MOTIF lines.

Returns: Strings



293
294
295

# File 'lib/bio/db/kegg/genes.rb', line 293

def motifs_as_strings
  lines_fetch('MOTIF')
end

#name ⇒ `Object`

Returns the NAME line.

Returns: String



163
164
165

# File 'lib/bio/db/kegg/genes.rb', line 163

def name
  field_fetch('NAME')
end

#names_as_array ⇒ `Object` Also known as: names

Names of the entry as an Array, described in the NAME line.

Returns: Array containing String



171
172
173

# File 'lib/bio/db/kegg/genes.rb', line 171

def names_as_array
  name.split(', ')
end

#ntlen ⇒ `Object` Also known as: nalen

Returns nucleic acid sequence length.

Returns: Integer



411
412
413

# File 'lib/bio/db/kegg/genes.rb', line 411

def ntlen
  fetch('NTSEQ')[/\d+/].to_i
end

#ntseq ⇒ `Object` Also known as: naseq

Returns nucleic acid sequence described in the NTSEQ lines.

Returns: Bio::Sequence::NA object

# File 'lib/bio/db/kegg/genes.rb', line 400

def ntseq
  unless @data['NTSEQ']
    @data['NTSEQ'] = Bio::Sequence::NA.new(fetch('NTSEQ').gsub(/\d+/, ''))
  end
  @data['NTSEQ']
end

#organism ⇒ `Object`

Organism name of the entry, described in the ENTRY line.

Returns: String



156
157
158

# File 'lib/bio/db/kegg/genes.rb', line 156

def organism
  entry['organism']			# H.sapiens etc.
end

#orthologs_as_hash ⇒ `Object` Also known as: orthologs

Returns a Hash of the orthology ID and definition in ORTHOLOGY field.

107	# File 'lib/bio/db/kegg/genes.rb', line 107 def orthologs_as_hash; super; end

#orthologs_as_strings ⇒ `Object`

Orthologs described in the ORTHOLOGY lines.

Returns: Array containing String



220
221
222

# File 'lib/bio/db/kegg/genes.rb', line 220

def orthologs_as_strings
  lines_fetch('ORTHOLOGY')
end

#pathway ⇒ `Object`

Returns the PATHWAY lines as a String.

Returns: String

# File 'lib/bio/db/kegg/genes.rb', line 227

def pathway
  unless defined? @pathway
    @pathway = fetch('PATHWAY')
  end
  @pathway
end

#pathways_as_hash ⇒ `Object` Also known as: pathways

Returns a Hash of the pathway ID and name in PATHWAY field.

102	# File 'lib/bio/db/kegg/genes.rb', line 102 def pathways_as_hash; super; end

#pathways_as_strings ⇒ `Object`

Pathways described in the PATHWAY lines.

Returns: Array containing String



237
238
239

# File 'lib/bio/db/kegg/genes.rb', line 237

def pathways_as_strings
  lines_fetch('PATHWAY')
end

#position ⇒ `Object`

The position in the genome described in the POSITION line.

Returns: String

# File 'lib/bio/db/kegg/genes.rb', line 254

def position
  unless @data['POSITION']
    @data['POSITION'] = fetch('POSITION').gsub(/\s/, '')
  end
  @data['POSITION']
end

#structure ⇒ `Object` Also known as: structures

Returns structure ID information described in the STRUCTURE lines.

Returns: Array containing String

# File 'lib/bio/db/kegg/genes.rb', line 339

def structure
  unless @data['STRUCTURE']
    @data['STRUCTURE'] = fetch('STRUCTURE').sub(/(PDB: )*/,'').split(/\s+/)
  end
  @data['STRUCTURE'] # ['PDB:1A9X', ...]
end

Class: Bio::KEGG::GENES

Overview

Description

References

Constant Summary collapse

Instance Method Summary collapse

Methods inherited from DB

Constructor Details

#initialize(entry) ⇒ GENES

Instance Method Details

#aalen ⇒ Object

#aaseq ⇒ Object

#chromosome ⇒ Object

#codon_usage(codon = nil) ⇒ Object

#cu_list ⇒ Object

#dblinks_as_hash ⇒ Object Also known as: dblinks

#dblinks_as_strings ⇒ Object

#definition ⇒ Object

#division ⇒ Object

#eclinks ⇒ Object

#entry ⇒ Object

#entry_id ⇒ Object

#gbposition ⇒ Object

#gene ⇒ Object

#genes ⇒ Object

#keggclass ⇒ Object

#keggclasses ⇒ Object

#locations ⇒ Object

#motif ⇒ Object

#motifs_as_hash ⇒ Object Also known as: motifs

#motifs_as_strings ⇒ Object

#name ⇒ Object

#names_as_array ⇒ Object Also known as: names

#ntlen ⇒ Object Also known as: nalen

#ntseq ⇒ Object Also known as: naseq

#organism ⇒ Object

#orthologs_as_hash ⇒ Object Also known as: orthologs

#orthologs_as_strings ⇒ Object

#pathway ⇒ Object

#pathways_as_hash ⇒ Object Also known as: pathways

#pathways_as_strings ⇒ Object

#position ⇒ Object

#structure ⇒ Object Also known as: structures

#initialize(entry) ⇒ `GENES`

#aalen ⇒ `Object`

#aaseq ⇒ `Object`

#chromosome ⇒ `Object`

#codon_usage(codon = nil) ⇒ `Object`

#cu_list ⇒ `Object`

#dblinks_as_hash ⇒ `Object` Also known as: dblinks

#dblinks_as_strings ⇒ `Object`

#definition ⇒ `Object`

#division ⇒ `Object`

#eclinks ⇒ `Object`

#entry ⇒ `Object`

#entry_id ⇒ `Object`

#gbposition ⇒ `Object`

#gene ⇒ `Object`

#genes ⇒ `Object`

#keggclass ⇒ `Object`

#keggclasses ⇒ `Object`

#locations ⇒ `Object`

#motif ⇒ `Object`

#motifs_as_hash ⇒ `Object` Also known as: motifs

#motifs_as_strings ⇒ `Object`

#name ⇒ `Object`

#names_as_array ⇒ `Object` Also known as: names

#ntlen ⇒ `Object` Also known as: nalen

#ntseq ⇒ `Object` Also known as: naseq

#organism ⇒ `Object`

#orthologs_as_hash ⇒ `Object` Also known as: orthologs

#orthologs_as_strings ⇒ `Object`

#pathway ⇒ `Object`

#pathways_as_hash ⇒ `Object` Also known as: pathways

#pathways_as_strings ⇒ `Object`

#position ⇒ `Object`

#structure ⇒ `Object` Also known as: structures