Class: Bio::KEGG::Keggtab
Overview
Description
Parse ‘keggtab’ KEGG database definition file which also includes Taxonomic category of the KEGG organisms.
References
The ‘keggtab’ file is included in
Format
File format is something like
# KEGGTAB
#
# name type directory abbreviation
#
enzyme enzyme $BIOROOT/db/ideas/ligand ec
ec alias enzyme
(snip)
# Human
h.sapiens genes $BIOROOT/db/kegg/genes hsa
H.sapiens alias h.sapiens
hsa alias h.sapiens
(snip)
#
# Taxonomy
#
(snip)
animals alias hsa+mmu+rno+dre+dme+cel
eukaryotes alias animals+plants+protists+fungi
genes alias eubacteria+archaea+eukaryotes
Defined Under Namespace
Classes: DB
Instance Attribute Summary collapse
-
#bioroot ⇒ Object
readonly
Returns a string of the BIOROOT path prefix.
-
#db_names ⇒ Object
readonly
Returns the value of attribute db_names.
Instance Method Summary collapse
-
#alias_list(db_name) ⇒ Object
deprecated.
-
#aliases(db_abbrev) ⇒ Object
Returns an Array containing all alias names for the database.
- #child_nodes(node = 'genes') ⇒ Object
-
#database(db_abbrev = nil) ⇒ Object
Returns a hash containing DB definition section of the keggtab file.
-
#db_by_abbrev(db_abbrev) ⇒ Object
deprecated.
-
#db_path(db_name) ⇒ Object
deprecated.
-
#db_path_by_abbrev(db_abbrev) ⇒ Object
deprecated.
-
#initialize(file_path, bioroot = nil) ⇒ Keggtab
constructor
Path for keggtab file and optionally set bioroot top directory.
-
#korg2taxo(keggorg) ⇒ Object
(also: #keggorg2taxo, #korg2taxonomy, #keggorg2taxonomy)
Returns an array of taxonomy names the organism belongs.
-
#name(db_abbrev) ⇒ Object
Returns a canonical database name for the abbreviation.
-
#name_by_abbrev(db_abbrev) ⇒ Object
deprecated.
-
#path(db_abbrev) ⇒ Object
Returns an absolute path for the flat file database.
-
#taxa_list ⇒ Object
List of all node labels from Taxonomy section.
-
#taxo2korgs(node = 'genes') ⇒ Object
(also: #taxo2keggorgs, #taxon2korgs, #taxon2keggorgs)
Returns an array of organism names included in the specified taxon label.
-
#taxonomy(node = nil) ⇒ Object
Returns a hash containing Taxonomy section of the keggtab file.
Constructor Details
#initialize(file_path, bioroot = nil) ⇒ Keggtab
Path for keggtab file and optionally set bioroot top directory. Environmental variable BIOROOT overrides bioroot.
54 55 56 57 58 59 60 61 62 |
# File 'lib/bio/db/kegg/keggtab.rb', line 54 def initialize(file_path, bioroot = nil) @bioroot = ENV['BIOROOT'] || bioroot @db_names = Hash.new @database = Hash.new @taxonomy = Hash.new File.open(file_path) do |f| parse_keggtab(f.read) end end |
Instance Attribute Details
#bioroot ⇒ Object (readonly)
Returns a string of the BIOROOT path prefix.
65 66 67 |
# File 'lib/bio/db/kegg/keggtab.rb', line 65 def bioroot @bioroot end |
#db_names ⇒ Object (readonly)
Returns the value of attribute db_names.
66 67 68 |
# File 'lib/bio/db/kegg/keggtab.rb', line 66 def db_names @db_names end |
Instance Method Details
#alias_list(db_name) ⇒ Object
deprecated
141 142 143 144 145 |
# File 'lib/bio/db/kegg/keggtab.rb', line 141 def alias_list(db_name) if @db_names[db_name] @db_names[db_name].aliases end end |
#aliases(db_abbrev) ⇒ Object
Returns an Array containing all alias names for the database. (e.g. ‘hsa’ -> [“H.sapiens”, “hsa”], ‘hpj’ -> [“H.pylori_J99”, “hpj”])
112 113 114 115 116 |
# File 'lib/bio/db/kegg/keggtab.rb', line 112 def aliases(db_abbrev) if @database[db_abbrev] @database[db_abbrev].aliases end end |
#child_nodes(node = 'genes') ⇒ Object
196 197 198 |
# File 'lib/bio/db/kegg/keggtab.rb', line 196 def child_nodes(node = 'genes') return @taxonomy[node] end |
#database(db_abbrev = nil) ⇒ Object
Returns a hash containing DB definition section of the keggtab file. If database name is given as an argument, returns a Keggtab::DB object.
102 103 104 105 106 107 108 |
# File 'lib/bio/db/kegg/keggtab.rb', line 102 def database(db_abbrev = nil) if db_abbrev @database[db_abbrev] else @database end end |
#db_by_abbrev(db_abbrev) ⇒ Object
deprecated
157 158 159 160 161 162 |
# File 'lib/bio/db/kegg/keggtab.rb', line 157 def db_by_abbrev(db_abbrev) @db_names.each do |k, db| return db if db.abbrev == db_abbrev end return nil end |
#db_path(db_name) ⇒ Object
deprecated
148 149 150 151 152 153 154 |
# File 'lib/bio/db/kegg/keggtab.rb', line 148 def db_path(db_name) if @bioroot "#{@db_names[db_name].path.sub(/\$BIOROOT/,@bioroot)}/#{db_name}" else "#{@db_names[db_name].path}/#{db_name}" end end |
#db_path_by_abbrev(db_abbrev) ⇒ Object
deprecated
170 171 172 173 |
# File 'lib/bio/db/kegg/keggtab.rb', line 170 def db_path_by_abbrev(db_abbrev) db_name = name_by_abbrev(db_abbrev) db_path(db_name) end |
#korg2taxo(keggorg) ⇒ Object Also known as: keggorg2taxo, korg2taxonomy, keggorg2taxonomy
Returns an array of taxonomy names the organism belongs. (e.g. ‘eco’ -> [‘proteogamma’,‘proteobacteria’,‘eubacteria’,‘genes’]) This method has aliases as keggorg2taxo, korg2taxonomy, keggorg2taxonomy.
225 226 227 228 229 230 231 232 233 234 235 236 237 238 |
# File 'lib/bio/db/kegg/keggtab.rb', line 225 def korg2taxo(keggorg) tmp = Array.new traverse = Proc.new {|keggorg| @taxonomy.each do |k,v| if v.include?(keggorg) tmp.push(k) traverse.call(k) break end end } traverse.call(keggorg) return tmp end |
#name(db_abbrev) ⇒ Object
Returns a canonical database name for the abbreviation. (e.g. ‘ec’ -> ‘enzyme’, ‘hsa’ -> ‘h.sapies’, …)
120 121 122 123 124 |
# File 'lib/bio/db/kegg/keggtab.rb', line 120 def name(db_abbrev) if @database[db_abbrev] @database[db_abbrev].name end end |
#name_by_abbrev(db_abbrev) ⇒ Object
deprecated
165 166 167 |
# File 'lib/bio/db/kegg/keggtab.rb', line 165 def name_by_abbrev(db_abbrev) db_by_abbrev(db_abbrev).name end |
#path(db_abbrev) ⇒ Object
Returns an absolute path for the flat file database. (e.g. ‘/bio/db/kegg/genes’, …)
128 129 130 131 132 133 134 135 136 137 |
# File 'lib/bio/db/kegg/keggtab.rb', line 128 def path(db_abbrev) if @database[db_abbrev] file = @database[db_abbrev].name if @bioroot "#{@database[db_abbrev].path.sub(/\$BIOROOT/,@bioroot)}/#{file}" else "#{@database[db_abbrev].path}/#{file}" end end end |
#taxa_list ⇒ Object
List of all node labels from Taxonomy section. (e.g. [“actinobacteria”, “animals”, “archaea”, “bacillales”, …)
192 193 194 |
# File 'lib/bio/db/kegg/keggtab.rb', line 192 def taxa_list @taxonomy.keys.sort end |
#taxo2korgs(node = 'genes') ⇒ Object Also known as: taxo2keggorgs, taxon2korgs, taxon2keggorgs
Returns an array of organism names included in the specified taxon label. (e.g. ‘proteobeta’ -> [“nme”, “nma”, “rso”]) This method has taxo2keggorgs, taxon2korgs, and taxon2keggorgs aliases.
203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 |
# File 'lib/bio/db/kegg/keggtab.rb', line 203 def taxo2korgs(node = 'genes') if node.length == 3 return node else if @taxonomy[node] tmp = Array.new @taxonomy[node].each do |x| tmp.push(taxo2korgs(x)) end return tmp else return nil end end end |
#taxonomy(node = nil) ⇒ Object
Returns a hash containing Taxonomy section of the keggtab file. If argument is given, returns a List of all child nodes belongs to the label node. (e.g. “eukaryotes” -> [“animals”, “plants”, “protists”, “fungi”], …)
182 183 184 185 186 187 188 |
# File 'lib/bio/db/kegg/keggtab.rb', line 182 def taxonomy(node = nil) if node @taxonomy[node] else @taxonomy end end |