Class: Bio::NBRF
Overview
Sequence data class for NBRF/PIR flatfile format.
Constant Summary
- DELIMITER = RS = "\n>"
Delimiter of each entry. Bio::FlatFile uses it.
- DELIMITER_OVERRUN = 1
(Integer) excess read size included in DELIMITER.
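The two constants suggest how a reader can cut a multi-entry file into records: reading up to "\n>" consumes one character (the ">") that already belongs to the next entry, which is what DELIMITER_OVERRUN counts. The following is a minimal standalone sketch of that idea using only the Ruby standard library. The method split_nbrf_entries and the NBRF_* constant names are hypothetical, and this is an assumption about the general technique, not Bio::FlatFile's actual implementation.

```ruby
require 'stringio'

# Hypothetical stand-ins for Bio::NBRF::DELIMITER and DELIMITER_OVERRUN.
NBRF_DELIMITER = "\n>"
NBRF_DELIMITER_OVERRUN = 1

# Splits an IO into entry strings, handing the over-read '>' to the next entry.
def split_nbrf_entries(io)
  entries = []
  carry = ''
  while (chunk = io.gets(NBRF_DELIMITER))
    chunk = carry + chunk
    if chunk.end_with?(NBRF_DELIMITER)
      # The last character read is the '>' that starts the next entry.
      carry = chunk[-NBRF_DELIMITER_OVERRUN, NBRF_DELIMITER_OVERRUN]
      chunk = chunk[0...-NBRF_DELIMITER_OVERRUN]
    else
      carry = ''
    end
    entries << chunk
  end
  entries
end

io = StringIO.new(">P1;A\nfirst\nAAA*\n>P1;B\nsecond\nCCC*\n")
split_nbrf_entries(io).each { |e| puts e.lines.first }
```

Each returned string starts with its own ">" header line, so it could be passed on to an entry parser unchanged.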
Instance Attribute Summary
-
#data ⇒ Object
Raw sequence data of the entry, as stored by the constructor (whitespace and digits are not yet removed).
-
#definition ⇒ Object
Returns the description line of the NBRF/PIR formatted data.
-
#entry_id ⇒ Object
(also: #accession)
Returns ID described in the entry.
-
#entry_overrun ⇒ Object
readonly
Piece of the next entry that was read along with this one.
-
#seq_type ⇒ Object
Returns sequence type described in the entry.
Class Method Summary
-
.to_nbrf(hash) ⇒ Object
Creates NBRF/PIR formatted text.
Instance Method Summary
-
#aalen ⇒ Object
Returns the length of the protein (amino acid) sequence.
-
#aaseq ⇒ Object
Returns the protein (amino acid) sequence.
-
#entry ⇒ Object
(also: #to_s)
Returns the stored entry as NBRF/PIR formatted text.
-
#initialize(str) ⇒ NBRF
constructor
Creates a new NBRF object.
-
#length ⇒ Object
Returns sequence length.
-
#nalen ⇒ Object
Returns the length of the nucleic acid sequence.
-
#naseq ⇒ Object
Returns the nucleic acid sequence.
-
#seq ⇒ Object
Returns sequence data.
-
#seq_class ⇒ Object
Returns Bio::Sequence::AA, Bio::Sequence::NA, or Bio::Sequence, depending on sequence type.
Methods inherited from DB
#exists?, #fetch, #get, open, #tags
Constructor Details
#initialize(str) ⇒ NBRF
Creates a new NBRF object. It stores the comment and sequence information from one entry of the NBRF/PIR format string. If the argument contains more than one entry, only the first entry is used.
# File 'lib/bio/db/nbrf.rb', line 45
def initialize(str)
  str = str.sub(/\A[\r\n]+/, '') # remove first void lines
  line1, line2, rest = str.split(/^/, 3)

  rest = rest.to_s
  rest.sub!(/^>.*/m, '') # remove trailing entries for sure
  @entry_overrun = $&
  rest.sub!(/\*\s*\z/, '') # remove last '*' and "\n"
  @data = rest

  @definition = line2.to_s.chomp
  if /^>?([A-Za-z0-9]{2})\;(.*)/ =~ line1.to_s then
    @seq_type = $1
    @entry_id = $2
  end
end
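To make the parsing steps easier to follow, here is a minimal standalone sketch that applies the same regular expressions to a plain string and returns a hash. The helper parse_nbrf_entry and the sample entry text are hypothetical and exist only for illustration; the sketch does not use or replace Bio::NBRF.

```ruby
# Hypothetical helper that mirrors the steps of the constructor shown above.
def parse_nbrf_entry(str)
  str = str.sub(/\A[\r\n]+/, '')          # drop leading blank lines
  line1, line2, rest = str.split(/^/, 3)  # header line, definition line, remainder
  rest = rest.to_s
  overrun = rest[/^>.*/m]                 # text of any following entries, or nil
  rest = rest.sub(/^>.*/m, '')            # keep only the first entry
  rest = rest.sub(/\*\s*\z/, '')          # drop the terminating '*' and trailing whitespace

  result = { definition: line2.to_s.chomp, data: rest, entry_overrun: overrun }
  if (m = /^>?([A-Za-z0-9]{2});(.*)/.match(line1.to_s))
    result[:seq_type] = m[1]              # two-character type code, e.g. "P1"
    result[:entry_id] = m[2]              # everything after the ';'
  end
  result
end

text = ">P1;EXAMPLE1\nexample protein\nMDITIHNPLI RRPLFSWLAP\nSRIFDQIFGE*\n" \
       ">P1;EXAMPLE2\nsecond entry\nAAAA*\n"
entry = parse_nbrf_entry(text)
puts entry[:seq_type]    # P1
puts entry[:entry_id]    # EXAMPLE1
puts entry[:definition]  # example protein
```

Note that the data field still contains the spaces and newlines of the original entry; only the second entry and the final "*" have been removed.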
Instance Attribute Details
#data ⇒ Object
Raw sequence data of the entry, as stored by the constructor (whitespace and digits are not yet removed).
# File 'lib/bio/db/nbrf.rb', line 77
def data
  @data
end
#definition ⇒ Object
Returns the description line of the NBRF/PIR formatted data.
# File 'lib/bio/db/nbrf.rb', line 74
def definition
  @definition
end
#entry_id ⇒ Object Also known as: accession
Returns ID described in the entry.
# File 'lib/bio/db/nbrf.rb', line 70
def entry_id
  @entry_id
end
#entry_overrun ⇒ Object (readonly)
Piece of the next entry: the part of the input string, starting at the next line that begins with ">", that the constructor cut off. Bio::FlatFile uses it.
# File 'lib/bio/db/nbrf.rb', line 80
def entry_overrun
  @entry_overrun
end
#seq_type ⇒ Object
Returns sequence type described in the entry.
P1 (protein), F1 (protein fragment)
DL (DNA linear), DC (DNA circular)
RL (RNA linear), RC (RNA circular)
N3 (tRNA), N1 (other functional RNA)
# File 'lib/bio/db/nbrf.rb', line 67
def seq_type
  @seq_type
end
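For scripts that need a readable label for each code, the list above can be kept as a small lookup table. This is a standalone sketch: NBRF_SEQ_TYPES and describe_seq_type are hypothetical names, not part of Bio::NBRF, and RL and RC are taken here to be the RNA codes, following the usual NBRF/PIR convention.

```ruby
# Hypothetical lookup table for the two-character NBRF/PIR sequence type codes.
NBRF_SEQ_TYPES = {
  'P1' => 'protein',
  'F1' => 'protein fragment',
  'DL' => 'DNA linear',
  'DC' => 'DNA circular',
  'RL' => 'RNA linear',
  'RC' => 'RNA circular',
  'N3' => 'tRNA',
  'N1' => 'other functional RNA'
}.freeze

# Returns a label for a type code, or 'unknown' for nil and unlisted codes.
def describe_seq_type(code)
  NBRF_SEQ_TYPES.fetch(code.to_s, 'unknown')
end

puts describe_seq_type('F1')  # protein fragment
puts describe_seq_type('XX')  # unknown
```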
Class Method Details
.to_nbrf(hash) ⇒ Object
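Creates NBRF/PIR formatted text from a hash with the keys :seq_type, :seq, :entry_id, :definition and :width. Any key can be omitted. A missing :seq_type is inferred from the class of :seq ('P1' for Bio::Sequence::AA, 'RL' or 'DL' for Bio::Sequence::NA depending on whether it contains "u", and 'XX' otherwise). :width defaults to 70; passing nil or false for :width disables line wrapping.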
# File 'lib/bio/db/nbrf.rb', line 167
def self.to_nbrf(hash)
  seq_type = hash[:seq_type]
  seq = hash[:seq]
  unless seq_type
    if seq.is_a?(Bio::Sequence::AA) then
      seq_type = 'P1'
    elsif seq.is_a?(Bio::Sequence::NA) then
      seq_type = /u/i =~ seq ? 'RL' : 'DL'
    else
      seq_type = 'XX'
    end
  end
  width = hash.has_key?(:width) ? hash[:width] : 70
  if width then
    seq = seq.to_s + "*"
    seq.gsub!(Regexp.new(".{1,#{width}}"), "\\0\n")
  else
    seq = seq.to_s + "*\n"
  end
  ">#{seq_type};#{hash[:entry_id]}\n#{hash[:definition]}\n#{seq}"
end
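The wrapping step is the least obvious part of this method, so here is a minimal standalone sketch of it. The helper format_nbrf is hypothetical and works on plain strings; unlike to_nbrf it does not infer the sequence type from a Bio::Sequence class and simply defaults to 'XX'.

```ruby
# Hypothetical plain-string counterpart of the formatting logic shown above.
def format_nbrf(entry_id:, definition:, seq:, seq_type: 'XX', width: 70)
  body = seq.to_s + '*'                          # NBRF/PIR sequences end with '*'
  if width
    # Cut the sequence (including the '*') into lines of at most `width` characters.
    body = body.gsub(/.{1,#{width}}/, "\\0\n")
  else
    body += "\n"                                 # no wrapping: one long line
  end
  ">#{seq_type};#{entry_id}\n#{definition}\n#{body}"
end

puts format_nbrf(entry_id: 'TEST1', definition: 'example peptide',
                 seq: 'MKVLAAGIVG', seq_type: 'P1', width: 4)
# >P1;TEST1
# example peptide
# MKVL
# AAGI
# VG*
```

Because the "*" is appended before wrapping, it counts toward the line width and can end up alone on the last line when the sequence length is an exact multiple of the width.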
Instance Method Details
#aalen ⇒ Object
Returns the length of the protein (amino acid) sequence. Calling aalen on a nucleic acid sequence raises a RuntimeError, so use this method only when you know the sequence is a protein.
# File 'lib/bio/db/nbrf.rb', line 157
def aalen
  aaseq.length
end
#aaseq ⇒ Object
Returns the protein (amino acid) sequence. Calling aaseq on a nucleic acid sequence raises a RuntimeError, so use this method only when you know the sequence is a protein.
# File 'lib/bio/db/nbrf.rb', line 143
def aaseq
  if seq.is_a?(Bio::Sequence::NA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::AA) then
    seq
  else
    Bio::Sequence::AA.new(seq)
  end
end
#entry ⇒ Object Also known as: to_s
Returns the stored entry as NBRF/PIR formatted text (same as to_s).
# File 'lib/bio/db/nbrf.rb', line 84
def entry
  @entry = ">#{@seq_type or 'XX'};#{@entry_id}\n#{definition}\n#{@data}*\n"
end
#length ⇒ Object
Returns sequence length.
# File 'lib/bio/db/nbrf.rb', line 115
def length
  seq.length
end
#nalen ⇒ Object
Returns the length of the nucleic acid sequence. Calling nalen on a protein sequence raises a RuntimeError, so use this method only when you know the sequence is a nucleic acid.
# File 'lib/bio/db/nbrf.rb', line 135
def nalen
  naseq.length
end
#naseq ⇒ Object
Returns the nucleic acid sequence. Calling naseq on a protein sequence raises a RuntimeError, so use this method only when you know the sequence is a nucleic acid.
# File 'lib/bio/db/nbrf.rb', line 122
def naseq
  if seq.is_a?(Bio::Sequence::AA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::NA) then
    seq
  else
    Bio::Sequence::NA.new(seq)
  end
end
#seq ⇒ Object
Returns sequence data. Returns Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence, according to the sequence type.
# File 'lib/bio/db/nbrf.rb', line 107
def seq
  unless defined?(@seq)
    @seq = seq_class.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up
  end
  @seq
end
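The "lazy clean up" relies on String#tr with an empty replacement, which deletes every listed character. A minimal standalone sketch of just that step follows; clean_sequence_data is a hypothetical helper name, and the sample data is made up.

```ruby
# Hypothetical helper: strips spaces, tabs, line breaks and digits from raw
# entry data, using the same String#tr call as the listing above.
def clean_sequence_data(data)
  data.tr(" \t\r\n0-9", '')
end

raw = "  1 MDITIHNPLI RRPLFSWLAP\n 21 SRIFDQIFGE\n"
puts clean_sequence_data(raw)  # MDITIHNPLIRRPLFSWLAPSRIFDQIFGE
```

Stripping digits as well as whitespace means that position numbers embedded in the sequence block do not leak into the sequence.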
#seq_class ⇒ Object
Returns Bio::Sequence::AA, Bio::Sequence::NA, or Bio::Sequence, depending on sequence type.
# File 'lib/bio/db/nbrf.rb', line 91
def seq_class
  case @seq_type
  when /[PF]1/
    # protein
    Sequence::AA
  when /[DR][LC]/, /N[13]/
    # nucleic
    Sequence::NA
  else
    Sequence
  end
end
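The dispatch above can be tried without BioRuby by returning symbols in place of the sequence classes. This is a standalone sketch under that assumption; sequence_kind is a hypothetical helper, and the symbols stand in for Bio::Sequence::AA, Bio::Sequence::NA and Bio::Sequence.

```ruby
# Hypothetical helper: same case/when patterns as seq_class, with symbols
# standing in for the BioRuby sequence classes.
def sequence_kind(seq_type)
  case seq_type
  when /[PF]1/             then :protein   # would be Bio::Sequence::AA
  when /[DR][LC]/, /N[13]/ then :nucleic   # would be Bio::Sequence::NA
  else                          :generic   # would be Bio::Sequence
  end
end

%w[P1 F1 DL RC N3 XX].each do |code|
  puts "#{code}: #{sequence_kind(code)}"
end
```

A nil type (an entry whose header line did not match) also falls through to the generic branch, because a Regexp never matches nil in a case expression.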