Class: Bio::SPTR
- Includes:
- EMBLDB::Common
- Defined in:
- lib/bio/db/embl/sptr.rb
Overview
Parser class for UniProtKB/SwissProt and TrEMBL database entries.
Constant Summary
- @@entry_regrexp =
/[A-Z0-9]{1,4}_[A-Z0-9]{1,5}/
- @@data_class =
["STANDARD", "PRELIMINARY"]
- @@ac_regrexp =
Accession number pattern. See Bio::EMBLDB::Common#ac -> ary, #accessions -> ary, #accession -> String (accessions.first).
/[OPQ][0-9][A-Z0-9]{3}[0-9]/
- @@cc_topics =
['PHARMACEUTICAL', 'BIOTECHNOLOGY', 'TOXIC DOSE', 'ALLERGEN', 'RNA EDITING', 'POLYMORPHISM', 'BIOPHYSICOCHEMICAL PROPERTIES', 'MASS SPECTROMETRY', 'WEB RESOURCE', 'ENZYME REGULATION', 'DISEASE', 'INTERACTION', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'CAUTION', 'ALTERNATIVE PRODUCTS', 'DOMAIN', 'PTM', 'MISCELLANEOUS', 'TISSUE SPECIFICITY', 'COFACTOR', 'PATHWAY', 'SUBUNIT', 'CATALYTIC ACTIVITY', 'SUBCELLULAR LOCATION', 'FUNCTION', 'SIMILARITY']
- @@dr_database_identifier =
returns database cross-references in the DR lines.
-
Bio::SPTR#dr -> Hash w/in Array
DR Line; database cross-reference (>=0)
DR database_identifier; primary_identifier; secondary_identifier. One cross-reference per line.
-
['EMBL','CARBBANK','DICTYDB','ECO2DBASE', 'ECOGENE', 'FLYBASE','GCRDB','HIV','HSC-2DPAGE','HSSP','INTERPRO','MAIZEDB', 'MAIZE-2DPAGE','MENDEL','MGD','MIM','PDB','PFAM','PIR','PRINTS', 'PROSITE','REBASE','AARHUS/GHENT-2DPAGE','SGD','STYGENE','SUBTILIST', 'SWISS-2DPAGE','TIGR','TRANSFAC','TUBERCULIST','WORMPEP','YEPD','ZFIN']
Constants included from EMBLDB::Common
EMBLDB::Common::DELIMITER, EMBLDB::Common::RS, EMBLDB::Common::TAGSIZE
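The two patterns in the constant summary can be tried out without loading BioRuby. The sketch below copies them into local constants; the constant names and the `entry_name?` / `accession?` helpers are hypothetical and exist only for illustration, and the patterns are anchored here so that the whole string must match.

```ruby
# Standalone copies of the patterns listed in the constant summary.
ENTRY_NAME_PATTERN = /[A-Z0-9]{1,4}_[A-Z0-9]{1,5}/
ACCESSION_PATTERN  = /[OPQ][0-9][A-Z0-9]{3}[0-9]/

# Hypothetical helper: true when the whole string looks like an entry name.
def entry_name?(str)
  /\A#{ENTRY_NAME_PATTERN}\z/.match?(str)
end

# Hypothetical helper: true when the whole string looks like an accession.
def accession?(str)
  /\A#{ACCESSION_PATTERN}\z/.match?(str)
end

puts entry_name?('P53_HUMAN')  # true
puts accession?('P04637')      # true
puts accession?('A12345')      # false: first character is not O, P or Q
```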
Instance Method Summary
-
#cc(topic = nil) ⇒ Object
returns contents in the CC lines.
-
#dr(key = nil) ⇒ Object
Bio::SPTR#dr.
-
#dt(key = nil) ⇒ Object
returns a Hash of information in the DT lines.
-
#embl_dr ⇒ Object
Backup Bio::EMBLDB#dr as embl_dr.
-
#entry_id ⇒ Object
(also: #entry_name, #entry)
returns the ENTRY_NAME in the ID line.
-
#ft(feature_key = nil) ⇒ Object
returns contents in the feature table.
-
#gene_name ⇒ Object
returns a String of the first gene name in the GN line.
-
#gene_names ⇒ Object
returns an Array of gene names in the GN line.
-
#gn ⇒ Object
returns gene names in the GN line.
-
#hi ⇒ Object
The HI line Bio::SPTR#hi #=> hash.
-
#id_line(key = nil) ⇒ Object
returns a Hash of the ID line.
-
#molecule ⇒ Object
(also: #molecule_type)
returns the MOLECULE_TYPE in the ID line.
-
#oh ⇒ Object
The OH line.
-
#os(num = nil) ⇒ Object
returns an Array of Hashes, or a String of the OS line when an index is given.
-
#ox ⇒ Object
returns a Hash of organism taxonomy cross-references.
-
#protein_name ⇒ Object
returns the proposed official name of the protein.
-
#ref ⇒ Object
returns contents in the R lines.
-
#references ⇒ Object
returns a Bio::References object built from Bio::EMBLDB::Common#ref.
-
#seq ⇒ Object
(also: #aaseq)
returns a Bio::Sequence::AA of the amino acid sequence.
-
#sequence_length ⇒ Object
(also: #aalen)
returns the SEQUENCE_LENGTH in the ID line.
-
#set_RN(data) ⇒ Object
-
#sq(key = nil) ⇒ Object
returns a Hash of the contents of the SQ lines.
-
#synonyms ⇒ Object
returns synonyms (unofficial and/or alternative names).
Methods included from EMBLDB::Common
#ac, #accession, #de, #initialize, #kw, #oc, #og
Methods inherited from EMBLDB
Methods inherited from DB
#exists?, #fetch, #get, open, #tags
Instance Method Details
#cc(topic = nil) ⇒ Object
returns contents in the CC lines.
-
Bio::SPTR#cc -> Hash
returns an object of contents in the TOPIC.
-
Bio::SPTR#cc(TOPIC) -> Array w/in Hash, Hash
returns contents of the “ALTERNATIVE PRODUCTS”.
-
Bio::SPTR#cc('ALTERNATIVE PRODUCTS') -> Hash
{'Event' => str,
 'Named isoforms' => int,
 'Comment' => str,
 'Variants' => [{'Name' => str, 'Synonyms' => str, 'IsoId' => str, 'Sequence' => []}]}
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=15;
...
CC placentae isoforms. All tissues differentially splice exon 13;
CC Name=A; Synonyms=no del;
CC IsoId=P15529-1; Sequence=Displayed;
returns contents of the “DATABASE”.
-
Bio::SPTR#cc('DATABASE') -> Array
[{'NAME' => str, 'NOTE' => str, 'WWW' => URI, 'FTP' => URI}, ...]
CC -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"].
returns contents of the “MASS SPECTROMETRY”.
-
Bio::SPTR#cc('MASS SPECTROMETRY') -> Array
[{'MW' => float, 'MW_ERR' => float, 'METHOD' => str, 'RANGE' => str}, ...]
CC -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX].
CC lines (>=0, optional)
CC -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT
CC IN LIVER, KIDNEY, LUNG AND BRAIN.
CC -!- TOPIC: FIRST LINE OF A COMMENT BLOCK;
CC SECOND AND SUBSEQUENT LINES OF A COMMENT BLOCK.
# File 'lib/bio/db/embl/sptr.rb', line 775
def cc(topic = nil)
  unless @data['CC']
    cc = Hash.new
    comment_border = '-' * (77 - 4 + 1)
    dlm = /-!- /

    # 12KD_MYCSM has no CC lines.
    return cc if get('CC').size == 0

    cc_raw = fetch('CC')

    # Removing the copyright statement.
    cc_raw.sub!(/ *---.+---/m, '')

    # Not any CC Lines without the copyright statement.
    return cc if cc_raw == ''

    begin
      cc_raw, copyright = cc_raw.split(/#{comment_border}/)[0]
      cc_raw = cc_raw.sub(dlm, '')
      cc_raw.split(dlm).each do |tmp|
        tmp = tmp.strip

        if /(^[A-Z ]+[A-Z]): (.+)/ =~ tmp
          key = $1
          body = $2
          body.gsub!(/- (?!AND)/, '-')
          body.strip!
          unless cc[key]
            cc[key] = [body]
          else
            cc[key].push(body)
          end
        else
          raise ["Error: [#{entry_id}]: CC Lines", '"', tmp, '"',
                 '', get('CC'), ''].join("\n")
        end
      end
    rescue NameError
      if fetch('CC') == ''
        return {}
      else
        raise ["Error: Invalid CC Lines: [#{entry_id}]: ",
               "\n'#{self.get('CC')}'\n", "(#{$!})"].join
      end
    rescue NoMethodError
    end

    @data['CC'] = cc
  end

  case topic
  when 'ALLERGEN'
    return @data['CC'][topic]
  when 'ALTERNATIVE PRODUCTS'
    return cc_alternative_products(@data['CC'][topic])
  when 'BIOPHYSICOCHEMICAL PROPERTIES'
    return cc_biophysiochemical_properties(@data['CC'][topic])
  when 'BIOTECHNOLOGY'
    return @data['CC'][topic]
  when 'CATALYTIC ACTIVITY'
    return cc_catalytic_activity(@data['CC'][topic])
  when 'CAUTION'
    return cc_caution(@data['CC'][topic])
  when 'COFACTOR'
    return @data['CC'][topic]
  when 'DEVELOPMENTAL STAGE'
    return @data['CC'][topic].join('')
  when 'DISEASE'
    return @data['CC'][topic].join('')
  when 'DOMAIN'
    return @data['CC'][topic]
  when 'ENZYME REGULATION'
    return @data['CC'][topic].join('')
  when 'FUNCTION'
    return @data['CC'][topic].join('')
  when 'INDUCTION'
    return @data['CC'][topic].join('')
  when 'INTERACTION'
    return cc_interaction(@data['CC'][topic])
  when 'MASS SPECTROMETRY'
    return cc_mass_spectrometry(@data['CC'][topic])
  when 'MISCELLANEOUS'
    return @data['CC'][topic]
  when 'PATHWAY'
    return cc_pathway(@data['CC'][topic])
  when 'PHARMACEUTICAL'
    return @data['CC'][topic]
  when 'POLYMORPHISM'
    return @data['CC'][topic]
  when 'PTM'
    return @data['CC'][topic]
  when 'RNA EDITING'
    return cc_rna_editing(@data['CC'][topic])
  when 'SIMILARITY'
    return @data['CC'][topic]
  when 'SUBCELLULAR LOCATION'
    return cc_subcellular_location(@data['CC'][topic])
  when 'SUBUNIT'
    return @data['CC'][topic]
  when 'TISSUE SPECIFICITY'
    return @data['CC'][topic]
  when 'TOXIC DOSE'
    return @data['CC'][topic]
  when 'WEB RESOURCE'
    return cc_web_resource(@data['CC'][topic])
  when 'DATABASE'
    # DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"].
    tmp = Array.new
    db = @data['CC']['DATABASE']
    return db unless db

    db.each do |e|
      db = {'NAME' => nil, 'NOTE' => nil, 'WWW' => nil, 'FTP' => nil}
      e.sub(/.$/,'').split(/;/).each do |line|
        case line
        when /NAME=(.+)/
          db['NAME'] = $1
        when /NOTE=(.+)/
          db['NOTE'] = $1
        when /WWW="(.+)"/
          db['WWW'] = $1
        when /FTP="(.+)"/
          db['FTP'] = $1
        end
      end
      tmp.push(db)
    end
    return tmp
  when nil
    return @data['CC']
  else
    return @data['CC'][topic]
  end
end
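The 'DATABASE' branch of the listing can be illustrated on its own. This is a minimal sketch, not the library code: `parse_cc_database` is a hypothetical helper name, and the input string is a made-up comment body that follows the `NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"].` layout described for the topic.

```ruby
# Hypothetical standalone helper mirroring the 'DATABASE' branch of #cc.
# It takes one comment body (text after "DATABASE: ") and returns a Hash.
def parse_cc_database(body)
  db = { 'NAME' => nil, 'NOTE' => nil, 'WWW' => nil, 'FTP' => nil }
  # Drop the trailing period, then inspect each ';'-separated field.
  body.sub(/\.$/, '').split(/;/).each do |field|
    case field
    when /NAME=(.+)/  then db['NAME'] = $1
    when /NOTE=(.+)/  then db['NOTE'] = $1
    when /WWW="(.+)"/ then db['WWW']  = $1
    when /FTP="(.+)"/ then db['FTP']  = $1
    end
  end
  db
end

# Made-up comment body, for illustration only.
body = 'NAME=Example DB; NOTE=Illustrative entry; WWW="http://example.org/db".'
p parse_cc_database(body)
```

Fields that do not appear in the comment stay `nil`, which matches the initial Hash used in the listing.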
#dr(key = nil) ⇒ Object
Bio::SPTR#dr
# File 'lib/bio/db/embl/sptr.rb', line 1131
def dr(key = nil)
  unless key
    embl_dr
  else
    (embl_dr[key] or []).map {|x|
      {'Accession' => x[0],
       'Version' => x[1],
       ' ' => x[2],
       'Molecular Type' => x[3]}
    }
  end
end
#dt(key = nil) ⇒ Object
returns a Hash of information in the DT lines.
hash keys:
['created', 'sequence', 'annotation']
Symbols are also meant to be acceptable as keys (ASAP):
[:created, :sequence, :annotation]
Since UniProtKB release 7.0 of 07-Feb-2006, the DT line format has changed, and the word “annotation” is no longer used in DT lines. The 'annotation' key is nevertheless kept for compatibility.
returns a String of information in the DT lines by a given key.
DT Line; date (3/entry)
DT DD-MMM-YYYY (integrated into UniProtKB/XXXXX.)
DT DD-MMM-YYYY (sequence version NN)
DT DD-MMM-YYYY (entry version NN)
The format was changed in UniProtKB release 7.0 of 07-Feb-2006. Below is the older format.
Old format of DT Line; date (3/entry)
DT DD-MMM-YYYY (rel. NN, Created)
DT DD-MMM-YYYY (rel. NN, Last sequence update)
DT DD-MMM-YYYY (rel. NN, Last annotation update)
# File 'lib/bio/db/embl/sptr.rb', line 158
def dt(key = nil)
  return dt[key] if key
  return @data['DT'] if @data['DT']

  part = self.get('DT').split(/\n/)
  @data['DT'] = {
    'created'    => part[0].sub(/\w{2} /,'').strip,
    'sequence'   => part[1].sub(/\w{2} /,'').strip,
    'annotation' => part[2].sub(/\w{2} /,'').strip
  }
end
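The mapping from the three DT lines to the 'created', 'sequence' and 'annotation' keys can be sketched outside the class. `parse_dt` is a hypothetical helper, and the dates in the sample block are illustrative rather than taken from a real entry.

```ruby
# Hypothetical standalone helper; Bio::SPTR#dt does the equivalent on the
# raw DT lines of an entry.
def parse_dt(dt_block)
  created, sequence, annotation = dt_block.split("\n").map do |line|
    line.sub(/\ADT\s+/, '').strip # drop the "DT" line code and its padding
  end
  # The third line is stored under 'annotation' for backward compatibility,
  # even though current DT lines say "entry version".
  { 'created' => created, 'sequence' => sequence, 'annotation' => annotation }
end

# Illustrative dates only.
dt_block = <<~DT
  DT   01-JAN-2000, integrated into UniProtKB/Swiss-Prot.
  DT   01-FEB-2001, sequence version 2.
  DT   01-MAR-2002, entry version 30.
DT

dates = parse_dt(dt_block)
puts dates['created']
puts dates['annotation']
```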
#embl_dr ⇒ Object
Backup Bio::EMBLDB#dr as embl_dr
# File 'lib/bio/db/embl/sptr.rb', line 1128
alias :embl_dr :dr
#entry_id ⇒ Object Also known as: entry_name, entry
returns the ENTRY_NAME in the ID line.
# File 'lib/bio/db/embl/sptr.rb', line 99
def entry_id
  id_line('ENTRY_NAME')
end
#ft(feature_key = nil) ⇒ Object
returns contents in the feature table.
Examples
sp = Bio::SPTR.new(entry)
ft = sp.ft
ft.class #=> Hash
ft.keys.each do |feature_key|
ft[feature_key].each do |feature|
feature['From'] #=> 1
feature['To'] #=> 21
feature['Description'] #=> ''
feature['FTId'] #=> ''
feature['diff'] #=> []
feature['original'] #=> [feature_key, '1', '21', '', '']
end
end
-
Bio::SPTR#ft -> Hash
{FEATURE_KEY => [{'From' => int, 'To' => int, 'Description' => aStr, 'FTId' => aStr, 'diff' => [original_residues, changed_residues], 'original' => aAry }],...}
returns an Array of the information about the feature_name in the feature table.
-
Bio::SPTR#ft(feature_name) -> Array of Hash
[{'From' => int, 'To' => int, 'Description' => str, 'FTId' => str, 'diff' => ary, 'original' => ary},...]
FT Line; feature table data (>=0, optional)
Col Data item
----- -----------------
1- 2 FT
6-13 Feature name
15-20 `FROM' endpoint
22-27 `TO' endpoint
35-75 Description (>=0 per key)
----- -----------------
Note: ‘FROM’ and ‘TO’ endpoints may contain non-numerical characters such as ‘<’, ‘>’ or ‘?’ (e.g. ‘<1’, ‘?42’).
# File 'lib/bio/db/embl/sptr.rb', line 1196
def ft(feature_key = nil)
  return ft[feature_key] if feature_key
  return @data['FT'] if @data['FT']

  table = []
  begin
    get('FT').split("\n").each do |line|
      if line =~ /^FT   \w/
        feature = line.chomp.ljust(74)
        table << [feature[ 5..12].strip,   # Feature Name
                  feature[14..19].strip,   # From
                  feature[21..26].strip,   # To
                  feature[34..74].strip ]  # Description
      else
        table.last << line.chomp.sub!(/^FT +/, '')
      end
    end

    # Joining Description lines
    table = table.map { |feature|
      ftid = feature.pop if feature.last =~ /FTId=/
      if feature.size > 4
        feature = [feature[0],
                   feature[1],
                   feature[2],
                   feature[3, feature.size - 3].join(" ")]
      end
      feature << if ftid then ftid else '' end
    }

    hash = {}
    table.each do |feature|
      hash[feature[0]] = [] unless hash[feature[0]]
      hash[feature[0]] << {
        # Removing '<', '>' or '?' in FROM/TO endpoint.
        'From' => feature[1].sub(/\D/, '').to_i,
        'To'   => feature[2].sub(/\D/, '').to_i,
        'Description' => feature[3],
        'FTId' => feature[4].to_s.sub(/\/FTId=/, '').sub(/\.$/, ''),
        'diff' => [],
        'original' => feature
      }

      case feature[0]
      when 'VARSPLIC', 'VARIANT', 'VAR_SEQ', 'CONFLICT'
        case hash[feature[0]].last['Description']
        when /(\w[\w ]*\w*) - ?> (\w[\w ]*\w*)/
          original_res = $1
          changed_res = $2
          original_res = original_res.gsub(/ /,'').strip
          changed_res = changed_res.gsub(/ /,'').strip
        when /Missing/i
          original_res = seq.subseq(hash[feature[0]].last['From'],
                                    hash[feature[0]].last['To'])
          changed_res = ''
        end
        hash[feature[0]].last['diff'] = [original_res, changed_res]
      end
    end
  rescue
    raise "Invalid FT Lines(#{$!}) in #{entry_id}:, \n'#{self.get('FT')}'\n"
  end

  @data['FT'] = hash
end
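The fixed-column layout of an FT line can be explored with a small standalone sketch. `parse_ft_line` and `endpoint_to_i` are hypothetical helper names; the column ranges are the ones from the table above (1-based columns 6-13, 15-20, 22-27 and 35-75, which become 0-based string indices), and the sample line is assembled in code so that the padding is explicit.

```ruby
# Hypothetical helper: slice one FT line into its fixed-width fields.
def parse_ft_line(line)
  padded = line.chomp.ljust(75) # pad so that every slice is in range
  {
    'Feature'     => padded[5..12].strip,  # columns 6-13
    'From'        => padded[14..19].strip, # columns 15-20
    'To'          => padded[21..26].strip, # columns 22-27
    'Description' => padded[34..74].strip  # columns 35-75
  }
end

# Hypothetical helper: drop one leading '<', '>' or '?' and convert to Integer.
def endpoint_to_i(str)
  str.sub(/\D/, '').to_i
end

# Build an illustrative line column by column.
sample = 'FT   ' + 'SIGNAL'.ljust(8) + ' ' + '<1'.rjust(6) + ' ' +
         '21'.rjust(6) + ' ' * 7 + 'Potential.'

fields = parse_ft_line(sample)
p fields
p endpoint_to_i(fields['From']) # 1
p endpoint_to_i('?42')          # 42
```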
#gene_name ⇒ Object
returns a String of the first gene name in the GN line.
# File 'lib/bio/db/embl/sptr.rb', line 438
def gene_name
  gene_names.first
end
#gene_names ⇒ Object
returns an Array of gene names in the GN line.
# File 'lib/bio/db/embl/sptr.rb', line 427
def gene_names
  gn # set @data['GN'] if it hasn't been already done
  if @data['GN'].first.class == Hash then
    @data['GN'].collect { |element| element[:name] }
  else
    @data['GN'].first
  end
end
#gn ⇒ Object
returns gene names in the GN line.
New UniProt/SwissProt format:
-
Bio::SPTR#gn -> [ <gene record>* ]
where <gene record> is:
{ :name => '...',
:synonyms => [ 's1', 's2', ... ],
:loci => [ 'l1', 'l2', ... ],
:orfs => [ 'o1', 'o2', ... ]
}
Old format:
-
Bio::SPTR#gn -> Array # AND
-
Bio::SPTR#gn -> Array # OR
GN Line: Gene name(s) (>=0, optional)
# File 'lib/bio/db/embl/sptr.rb', line 351
def gn
  unless @data['GN']
    case fetch('GN')
    when /Name=/,/ORFNames=/,/OrderedLocusNames=/,/Synonyms=/
      @data['GN'] = gn_uniprot_parser
    else
      @data['GN'] = gn_old_parser
    end
  end
  @data['GN']
end
#hi ⇒ Object
The HI line
Bio::SPTR#hi #=> hash
# File 'lib/bio/db/embl/sptr.rb', line 691
def hi
  unless @data['HI']
    @data['HI'] = []
    fetch('HI').split(/\. /).each do |hlist|
      hash = {'Category' => '', 'Keywords' => [], 'Keyword' => ''}
      hash['Category'], hash['Keywords'] = hlist.split(': ')
      hash['Keywords'] = hash['Keywords'].split('; ')
      hash['Keyword'] = hash['Keywords'].pop
      hash['Keyword'].sub!(/\.$/, '')
      @data['HI'] << hash
    end
  end
  @data['HI']
end
#id_line(key = nil) ⇒ Object
returns a Hash of the ID line.
returns the content (Int or String) of the ID line for a given key. Hash keys: ['ENTRY_NAME', 'DATA_CLASS', 'MOLECULE_TYPE', 'SEQUENCE_LENGTH']
ID Line (since UniProtKB release 9.0 of 31-Oct-2006)
ID P53_HUMAN Reviewed; 393 AA.
#"ID #{ENTRY_NAME} #{DATA_CLASS}; #{SEQUENCE_LENGTH}."
Examples
obj.id_line #=> {"ENTRY_NAME"=>"P53_HUMAN", "DATA_CLASS"=>"Reviewed",
"SEQUENCE_LENGTH"=>393, "MOLECULE_TYPE"=>nil}
obj.id_line('ENTRY_NAME') #=> "P53_HUMAN"
ID Line (older style)
ID P53_HUMAN STANDARD; PRT; 393 AA.
#"ID #{ENTRY_NAME} #{DATA_CLASS}; #{MOLECULE_TYPE}; #{SEQUENCE_LENGTH}."
Examples
obj.id_line #=> {"ENTRY_NAME"=>"P53_HUMAN", "DATA_CLASS"=>"STANDARD",
"SEQUENCE_LENGTH"=>393, "MOLECULE_TYPE"=>"PRT"}
obj.id_line('ENTRY_NAME') #=> "P53_HUMAN"
# File 'lib/bio/db/embl/sptr.rb', line 74
def id_line(key = nil)
  return id_line[key] if key
  return @data['ID'] if @data['ID']

  part = @orig['ID'].split(/ +/)
  if part[4].to_s.chomp == 'AA.' then
    # after UniProtKB release 9.0 of 31-Oct-2006
    # (http://www.uniprot.org/docs/sp_news.htm)
    molecule_type = nil
    sequence_length = part[3].to_i
  else
    molecule_type = part[3].sub(/;/,'')
    sequence_length = part[4].to_i
  end
  @data['ID'] = {
    'ENTRY_NAME' => part[1],
    'DATA_CLASS' => part[2].sub(/;/,''),
    'MOLECULE_TYPE' => molecule_type,
    'SEQUENCE_LENGTH' => sequence_length
  }
end
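How the same code handles both ID line styles may be easier to see in isolation. The sketch below is a standalone rendition of the listing; `parse_id_line` is a hypothetical name, and the two sample lines are the ones quoted in the description above with arbitrary padding.

```ruby
# Hypothetical standalone helper mirroring Bio::SPTR#id_line.
def parse_id_line(line)
  part = line.split(/ +/)
  if part[4].to_s.chomp == 'AA.'
    # Newer style: ID ENTRY_NAME DATA_CLASS; LENGTH AA.
    molecule_type = nil
    sequence_length = part[3].to_i
  else
    # Older style: ID ENTRY_NAME DATA_CLASS; MOLECULE_TYPE; LENGTH AA.
    molecule_type = part[3].sub(/;/, '')
    sequence_length = part[4].to_i
  end
  { 'ENTRY_NAME'      => part[1],
    'DATA_CLASS'      => part[2].sub(/;/, ''),
    'MOLECULE_TYPE'   => molecule_type,
    'SEQUENCE_LENGTH' => sequence_length }
end

p parse_id_line('ID   P53_HUMAN      Reviewed;      393 AA.')
p parse_id_line('ID   P53_HUMAN      STANDARD;      PRT;   393 AA.')
```

The only difference between the two results is 'DATA_CLASS' and whether 'MOLECULE_TYPE' is `nil` or "PRT"; the fifth whitespace-separated token decides which style is assumed.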
#molecule ⇒ Object Also known as: molecule_type
returns the MOLECULE_TYPE in the ID line.
A short-cut for Bio::SPTR#id_line('MOLECULE_TYPE').
# File 'lib/bio/db/embl/sptr.rb', line 109
def molecule
  id_line('MOLECULE_TYPE')
end
#oh ⇒ Object
The OH line (organism host).
OH NCBI_TaxID=TaxID; HostName.
See br.expasy.org/sprot/userman.html#OH_line
# File 'lib/bio/db/embl/sptr.rb', line 521
def oh
  unless @data['OH']
    @data['OH'] = fetch('OH').split("\. ").map {|x|
      if x =~ /NCBI_TaxID=(\d+);/
        taxid = $1
      else
        raise ArgumentError,
              ["Error: Invalid OH line format (#{self.entry_id}):",
               $!, "\n", get('OH'), "\n"].join
      end
      if x =~ /NCBI_TaxID=\d+; (.+)/
        host_name = $1
        host_name.sub!(/\.$/, '')
      else
        host_name = nil
      end
      {'NCBI_TaxID' => taxid, 'HostName' => host_name}
    }
  end
  @data['OH']
end
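A standalone sketch of the OH parsing follows. `parse_oh` is a hypothetical helper, and its argument is assumed to be the OH content already joined into one string with the "OH" line codes removed, which is roughly what `fetch('OH')` is expected to provide.

```ruby
# Hypothetical standalone helper mirroring Bio::SPTR#oh.
def parse_oh(oh_text)
  oh_text.split('. ').map do |item|
    taxid = item[/NCBI_TaxID=(\d+);/, 1]
    raise ArgumentError, "Invalid OH line format: #{item}" unless taxid

    # The host name is optional; strip its trailing period when present.
    host_name = item[/NCBI_TaxID=\d+; (.+)/, 1]
    host_name = host_name.sub(/\.$/, '') if host_name
    { 'NCBI_TaxID' => taxid, 'HostName' => host_name }
  end
end

# Illustrative input with two hosts.
text = 'NCBI_TaxID=9606; Homo sapiens (Human). NCBI_TaxID=10090; Mus musculus (Mouse).'
parse_oh(text).each { |host| p host }
```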
#os(num = nil) ⇒ Object
returns an Array of Hashes, or a String of the OS line when an index is given.
-
Bio::SPTR#os -> Array
[{'name' => '(Human)', 'os' => 'Homo sapiens'},
{'name' => '(Rat)', 'os' => 'Rattus norvegicus'}]
-
Bio::SPTR#os[0] -> Hash
{'name' => "(Human)", 'os' => 'Homo sapiens'}
-
Bio::SPTR#os[0]['name'] -> "(Human)"
-
Bio::SPTR#os(0) -> "Homo sapiens (Human)"
OS Line; organism species (>=1)
OS Genus species (name).
OS Genus species (name0) (name1).
OS Genus species (name0) (name1).
OS Genus species (name0), G s0 (name0), and G s (name0) (name1).
OS Homo sapiens (Human), and Rattus norvegicus (Rat)
OS Hippotis sp. Clark and Watts 825.
OS unknown cyperaceous sp.
# File 'lib/bio/db/embl/sptr.rb', line 460
def os(num = nil)
  unless @data['OS']
    os = Array.new
    fetch('OS').split(/, and|, /).each do |tmp|
      if tmp =~ /(\w+ *[\w\d \:\'\+\-\.]+[\w\d\.])/
        org = $1
        tmp =~ /(\(.+\))/
        os.push({'name' => $1, 'os' => org})
      else
        raise "Error: OS Line. #{$!}\n#{fetch('OS')}\n"
      end
    end
    @data['OS'] = os
  end

  if num
    # EX. "Trifolium repens (white clover)"
    return "#{@data['OS'][num]['os']} #{@data['OS'][num]['name']}"
  else
    return @data['OS']
  end
end
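The OS splitting can be tried on a plain string. In this sketch `parse_os` is a hypothetical helper that reuses the two regular expressions from the listing; the input is assumed to be the OS content joined into one string without the "OS" line code.

```ruby
# Hypothetical standalone helper mirroring the parsing half of Bio::SPTR#os.
def parse_os(os_text)
  os_text.split(/, and|, /).map do |chunk|
    unless chunk =~ /(\w+ *[\w\d \:\'\+\-\.]+[\w\d\.])/
      raise "Error: OS Line.\n#{os_text}\n"
    end
    organism = $1                 # scientific name, e.g. "Homo sapiens"
    name = chunk[/(\(.+\))/, 1]   # parenthesised common name, may be nil
    { 'name' => name, 'os' => organism }
  end
end

species = parse_os('Homo sapiens (Human), and Rattus norvegicus (Rat).')
p species
# Equivalent of os(0): scientific name followed by the common name.
puts "#{species[0]['os']} #{species[0]['name']}"
```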
#ox ⇒ Object
returns a Hash of organism taxonomy cross-references.
-
Bio::SPTR#ox -> Hash
{'NCBI_TaxID' => ['1234','2345','3456','4567'], ...}
OX Line; organism taxonomy cross-reference (>=1 per entry)
OX NCBI_TaxID=1234;
OX NCBI_TaxID=1234, 2345, 3456, 4567;
# File 'lib/bio/db/embl/sptr.rb', line 504
def ox
  unless @data['OX']
    tmp = fetch('OX').sub(/\.$/,'').split(/;/).map { |e| e.strip }
    hsh = Hash.new
    tmp.each do |e|
      db,refs = e.split(/=/)
      hsh[db] = refs.split(/, */)
    end
    @data['OX'] = hsh
  end
  return @data['OX']
end
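As a quick illustration of the OX parsing, the sketch below applies the same splitting to the example line from the description. `parse_ox` is a hypothetical helper; its input is assumed to be the OX content without the "OX" line code.

```ruby
# Hypothetical standalone helper mirroring Bio::SPTR#ox.
def parse_ox(ox_text)
  entries = ox_text.sub(/\.$/, '').split(/;/).map(&:strip)
  entries.each_with_object({}) do |entry, hash|
    db, refs = entry.split(/=/)
    hash[db] = refs.split(/, */) # one String per taxonomy identifier
  end
end

p parse_ox('NCBI_TaxID=1234, 2345, 3456, 4567;')
```

Note that the identifiers stay Strings, as in the Hash shown in the method description.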
#protein_name ⇒ Object
returns the proposed official name of the protein. Returns a String.
Since UniProtKB release 14.0 of 22-Jul-2008, the DE line format has changed. The method returns the full name, taken from the “RecName: Full=” or “SubName: Full=” line, which normally appears at the beginning of the DE lines. Unlike the parser for the old format, it applies no special treatment to fragments or precursors.
For the old format, the method parses the DE lines and returns the protein name as a String.
DE Line; description (>=1)
"DE #{OFFICIAL_NAME} (#{SYNONYM})"
"DE #{OFFICIAL_NAME} (#{SYNONYM}) [CONTAINS: #1; #2]."
OFFICIAL_NAME 1/entry
SYNONYM >=0
CONTAINS >=0
# File 'lib/bio/db/embl/sptr.rb', line 251
def protein_name
  @data['DE'] ||= parse_DE_line_rel14(get('DE'))
  parsed_de_line = @data['DE']
  if parsed_de_line then
    # since UniProtKB release 14.0 of 22-Jul-2008
    name = nil
    parsed_de_line.each do |a|
      case a[0]
      when 'RecName', 'SubName'
        if name_pair = a[1..-1].find { |b| b[0] == 'Full' } then
          name = name_pair[1]
          break
        end
      end
    end
    name = name.to_s
  else
    # old format (before Rel. 13.x)
    name = ""
    if de_line = fetch('DE') then
      str = de_line[/^[^\[]*/] # everything preceding the first [ (the "contains" part)
      name = str[/^[^(]*/].strip
      name << ' (Fragment)' if str =~ /fragment/i
    end
  end
  return name
end
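The old-format branch is compact enough to try on a plain string. This is a sketch only: `old_format_protein_name` is a hypothetical helper, and the DE strings are illustrative inputs in the `OFFICIAL_NAME (SYNONYM) [CONTAINS: ...]` layout described above.

```ruby
# Hypothetical standalone helper mirroring the old-format branch of
# Bio::SPTR#protein_name.
def old_format_protein_name(de_line)
  head = de_line[/^[^\[]*/]     # everything before the first "["
  name = head[/^[^(]*/].strip   # everything before the first "("
  name += ' (Fragment)' if head =~ /fragment/i
  name
end

puts old_format_protein_name('Cellular tumor antigen p53 (Tumor suppressor p53) (Fragment).')
puts old_format_protein_name('Protein A (Alias A) [Contains: Peptide B].')
```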
#ref ⇒ Object
returns contents in the R lines.
-
Bio::EMBLDB::Common#ref -> [ <reference information Hash>* ]
where <reference information Hash> is:
{'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '',
'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''}
R Lines
-
RN RC RP RX RA RT RL RG
# File 'lib/bio/db/embl/sptr.rb', line 557
def ref
  unless @data['R']
    @data['R'] = [get('R').split(/\nRN /)].flatten.map { |str|
      hash = {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '',
              'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''}
      str = 'RN ' + str unless /^RN / =~ str

      str.split("\n").each do |line|
        if /^(R[NPXARLCTG]) (.+)/ =~ line
          hash[$1] += $2 + ' '
        else
          raise "Invalid format in R lines, \n[#{line}]\n"
        end
      end

      hash['RN'] = set_RN(hash['RN'])
      hash['RC'] = set_RC(hash['RC'])
      hash['RP'] = set_RP(hash['RP'])
      hash['RX'] = set_RX(hash['RX'])
      hash['RA'] = set_RA(hash['RA'])
      hash['RT'] = set_RT(hash['RT'])
      hash['RL'] = set_RL(hash['RL'])
      hash['RG'] = set_RG(hash['RG'])

      hash
    }
  end
  @data['R']
end
#references ⇒ Object
returns a Bio::References object built from Bio::EMBLDB::Common#ref.
-
Bio::SPTR#references -> Bio::References
# File 'lib/bio/db/embl/sptr.rb', line 651
def references
  unless @data['references']
    ary = self.ref.map {|ent|
      hash = Hash.new('')
      ent.each {|key, value|
        case key
        when 'RA'
          hash['authors'] = value.split(/, /)
        when 'RT'
          hash['title'] = value
        when 'RL'
          if value =~ /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/
            hash['journal'] = $1
            hash['volume']  = $2
            hash['issue']   = $3
            hash['pages']   = $4
            hash['year']    = $5
          else
            hash['journal'] = value
          end
        when 'RX' # PUBMED, MEDLINE, DOI
          value.each do |tag, xref|
            hash[ tag.downcase ] = xref
          end
        end
      }
      Reference.new(hash)
    }
    @data['references'] = References.new(ary)
  end
  @data['references']
end
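The 'RL' branch splits a location string into journal, volume, issue, pages and year only when it matches one specific pattern; otherwise the whole value is stored as the journal. The sketch below isolates that step. `parse_rl` is a hypothetical helper, and the first sample string is invented to fit the pattern rather than taken from a real entry.

```ruby
# Hypothetical standalone helper mirroring the 'RL' branch of
# Bio::SPTR#references.
def parse_rl(value)
  if value =~ /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/
    { 'journal' => $1, 'volume' => $2, 'issue' => $3,
      'pages' => $4, 'year' => $5 }
  else
    # Anything else is kept verbatim as the journal field.
    { 'journal' => value }
  end
end

p parse_rl('J. Example 12 (3), 45-67 (2001)')
p parse_rl('Submitted to an example database')
```

All captured fields are Strings; no numeric conversion happens at this stage.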
#seq ⇒ Object Also known as: aaseq
returns a Bio::Sequence::AA of the amino acid sequence.
-
Bio::SPTR#seq -> Bio::Sequence::AA
blank Line; sequence data (>=1)
# File 'lib/bio/db/embl/sptr.rb', line 1306
def seq
  unless @data['']
    @data[''] = Sequence::AA.new( fetch('').gsub(/ |\d+/,'') )
  end
  return @data['']
end
#sequence_length ⇒ Object Also known as: aalen
returns the SEQUENCE_LENGTH in the ID line.
A short-cut for Bio::SPTR#id_line('SEQUENCE_LENGTH').
# File 'lib/bio/db/embl/sptr.rb', line 118
def sequence_length
  id_line('SEQUENCE_LENGTH')
end
#set_RN(data) ⇒ Object
# File 'lib/bio/db/embl/sptr.rb', line 588
def set_RN(data)
  data.strip
end
#sq(key = nil) ⇒ Object
returns a Hash of the contents of the SQ lines.
-
Bio::SPTR#sq -> hsh
returns the value for a given key in the SQ lines.
-
Bio::SPTR#sq(key) -> int or str
-
Keys: ['MW', 'mw', 'molecular', 'weight', 'aalen', 'len', 'length', 'CRC64']
SQ Line; sequence header (1/entry)
SQ SEQUENCE 233 AA; 25630 MW; 146A1B48A1475C86 CRC64;
SQ SEQUENCE \d+ AA; \d+ MW; [0-9A-Z]+ CRC64;
MW, Dalton unit. CRC64 (64-bit Cyclic Redundancy Check, ISO 3309).
# File 'lib/bio/db/embl/sptr.rb', line 1278
def sq(key = nil)
  unless @data['SQ']
    if fetch('SQ') =~ /(\d+) AA\; (\d+) MW; (.+) CRC64;/
      @data['SQ'] = { 'aalen' => $1.to_i, 'MW' => $2.to_i, 'CRC64' => $3 }
    else
      raise "Invalid SQ Line: \n'#{fetch('SQ')}'"
    end
  end

  if key
    case key
    when /mw/, /molecular/, /weight/
      @data['SQ']['MW']
    when /len/, /length/, /AA/
      @data['SQ']['aalen']
    else
      @data['SQ'][key]
    end
  else
    @data['SQ']
  end
end
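The SQ header parsing can be sketched as a standalone function. `parse_sq` is a hypothetical helper; it collapses runs of whitespace before matching, on the assumption that `fetch('SQ')` hands the pattern a single-spaced string, and the sample line is the one quoted in the description with column padding added.

```ruby
# Hypothetical standalone helper mirroring the parsing half of Bio::SPTR#sq.
def parse_sq(sq_line)
  text = sq_line.gsub(/\s+/, ' ') # assumption: stands in for fetch('SQ')
  unless text =~ /(\d+) AA; (\d+) MW; (.+) CRC64;/
    raise "Invalid SQ Line: \n'#{sq_line}'"
  end
  { 'aalen' => $1.to_i, 'MW' => $2.to_i, 'CRC64' => $3 }
end

p parse_sq('SQ   SEQUENCE   233 AA;  25630 MW;  146A1B48A1475C86 CRC64;')
```

The length and molecular weight are converted to Integers, while the CRC64 checksum is kept as a String.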
#synonyms ⇒ Object
returns synonyms (unofficial and/or alternative names). Returns an Array containing String objects.
Since UniProtKB release 14.0 of 22-Jul-2008, the DE line format has changed. The method returns the full or short names taken from “RecName: Short=”, “RecName: EC=”, and AltName lines, except those after “Contains:” or “Includes:”. To keep compatibility with the old-format parser, “RecName: EC=N.N.N.N” is reported as “EC N.N.N.N”. In addition, to prevent confusion, “Allergen=” and “CD_antigen=” prefixes are added to the corresponding fields.
For the old format, the method parses the DE lines and returns the synonyms. Each synonym is placed in () following the official name on the DE line.
# File 'lib/bio/db/embl/sptr.rb', line 294
def synonyms
  ary = Array.new
  @data['DE'] ||= parse_DE_line_rel14(get('DE'))
  parsed_de_line = @data['DE']
  if parsed_de_line then
    # since UniProtKB release 14.0 of 22-Jul-2008
    parsed_de_line.each do |a|
      case a[0]
      when 'Includes', 'Contains'
        break #the each loop
      when 'RecName', 'SubName', 'AltName'
        a[1..-1].each do |b|
          if name = b[1] and b[1] != self.protein_name then
            case b[0]
            when 'EC'
              name = "EC " + b[1]
            when 'Allergen', 'CD_antigen'
              name = b[0] + '=' + b[1]
            else
              name = b[1]
            end
            ary.push name
          end
        end
      end #case a[0]
    end #parsed_de_line.each
  else
    # old format (before Rel. 13.x)
    if de_line = fetch('DE') then
      line = de_line.sub(/\[.*\]/,'') # ignore stuff between [ and ]. That's the "contains" part
      line.scan(/\([^)]+/) do |synonym|
        unless synonym =~ /fragment/i then
          ary << synonym[1..-1].strip # index to remove the leading (
        end
      end
    end
  end
  return ary
end
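The old-format branch can be exercised on a plain DE string. `old_format_synonyms` is a hypothetical helper written for this sketch, and the DE strings are illustrative inputs.

```ruby
# Hypothetical standalone helper mirroring the old-format branch of
# Bio::SPTR#synonyms.
def old_format_synonyms(de_line)
  line = de_line.sub(/\[.*\]/, '') # drop the "[Contains: ...]" part
  line.scan(/\([^)]+/)             # each "(text" up to the closing ")"
      .reject { |s| s =~ /fragment/i }
      .map { |s| s[1..-1].strip }  # remove the leading "("
end

de = 'Cellular tumor antigen p53 (Tumor suppressor p53) (Phosphoprotein p53) (Fragment).'
p old_format_synonyms(de)
```

The "(Fragment)" marker is filtered out because it describes the sequence rather than naming the protein.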