Class: Bio::GCG::Seq
Overview
Bio::GCG::Seq
Parser class for the GCG sequence file format (.seq or .pep).
References
-
Information about GCG Wisconsin Package®
www.accelrys.com/products/gcg_wisconsin_package
-
EMBOSS sequence formats
www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/SequenceFormats.html
-
BioPerl document
Constant Summary
-
DELIMITER = RS = nil
delimiter used by Bio::FlatFile
Instance Attribute Summary
-
#checksum ⇒ Object
readonly
“Check:” field, which holds the checksum of the current sequence.
-
#date ⇒ Object
readonly
Date field of this entry.
-
#definition ⇒ Object
readonly
Description field.
-
#entry_id ⇒ Object
readonly
ID field.
-
#heading ⇒ Object
readonly
Heading line (‘!!NA_SEQUENCE 1.0’ or similar).
-
#length ⇒ Object
readonly
“Length:” field.
-
#seq_type ⇒ Object
readonly
“Type:” field, which indicates sequence type.
Class Method Summary
-
.calc_checksum(str) ⇒ Object
Calculates checksum from given string.
-
.to_gcg(hash) ⇒ Object
Creates a new GCG sequence format text.
Instance Method Summary
-
#aaseq ⇒ Object
If you know the sequence is AA, use this method.
-
#initialize(str) ⇒ Seq
constructor
Creates a new instance of this class.
-
#naseq ⇒ Object
If you know the sequence is NA, use this method.
-
#seq ⇒ Object
Sequence data.
-
#validate_checksum ⇒ Object
Validates checksum.
Constructor Details
#initialize(str) ⇒ Seq
Creates a new instance of this class. str must be a string in GCG sequence format.
# File 'lib/bio/appl/gcg/seq.rb', line 38
def initialize(str)
  @heading = str[/.*/] # '!!NA_SEQUENCE 1.0' or like this
  str = str.sub(/.*/, '')
  str.sub!(/.*\.\.$/m, '')
  @definition = $&.to_s.sub(/^.*\.\.$/, '').to_s
  desc = $&.to_s
  if m = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/.match(desc) then
    @entry_id = m[1].to_s.strip
    @length = (m[2] ? m[2].to_i : nil)
    @date = m[3].to_s.strip
    @seq_type = m[4]
    @checksum = (m[5] ? m[5].to_i : nil)
  end
  @data = str
  @seq = nil
  @definition.strip!
end
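The header-line match in the constructor can be tried in isolation. The following is a minimal sketch, not part of Bio::GCG::Seq: it reuses the same pattern on an invented header line, and the helper name parse_gcg_header and all sample values are assumptions made for illustration.

```ruby
# Standalone sketch: applies the header-line pattern from #initialize
# to a made-up header line. No BioRuby classes are needed.
HEADER_PATTERN =
  /(.+)\s+Length:\s+(\d+)\s+(.+)\s+Type:\s+(\w)\s+Check:\s+(\d+)/

# parse_gcg_header is a hypothetical helper, not a BioRuby method.
# Returns a hash of the five header fields, or nil if the line does not match.
def parse_gcg_header(line)
  m = HEADER_PATTERN.match(line)
  return nil unless m

  {
    entry_id: m[1].strip,
    length: m[2].to_i,
    date: m[3].strip,
    seq_type: m[4],
    checksum: m[5].to_i
  }
end

# Invented sample values, for illustration only.
sample = 'gi-1234567  Length: 24  January 1, 2000 12:00  Type: N  Check: 1234  ..'
fields = parse_gcg_header(sample)
puts fields.inspect
```

Because both (.+) groups are greedy, they can pick up trailing whitespace, which is why the ID and date are stripped after matching.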
Instance Attribute Details
#checksum ⇒ Object (readonly)
“Check:” field, which holds the checksum of the current sequence.
# File 'lib/bio/appl/gcg/seq.rb', line 74
def checksum
  @checksum
end
#date ⇒ Object (readonly)
Date field of this entry.
# File 'lib/bio/appl/gcg/seq.rb', line 67
def date
  @date
end
#definition ⇒ Object (readonly)
Description field.
# File 'lib/bio/appl/gcg/seq.rb', line 60
def definition
  @definition
end
#entry_id ⇒ Object (readonly)
ID field.
# File 'lib/bio/appl/gcg/seq.rb', line 57
def entry_id
  @entry_id
end
#heading ⇒ Object (readonly)
Heading line (‘!!NA_SEQUENCE 1.0’ or similar).
# File 'lib/bio/appl/gcg/seq.rb', line 78
def heading
  @heading
end
#length ⇒ Object (readonly)
“Length:” field. Note that this value may differ from the actual sequence length.
# File 'lib/bio/appl/gcg/seq.rb', line 64
def length
  @length
end
#seq_type ⇒ Object (readonly)
“Type:” field, which indicates sequence type. “N” means nucleic acid sequence, “P” means protein sequence.
# File 'lib/bio/appl/gcg/seq.rb', line 71
def seq_type
  @seq_type
end
Class Method Details
.calc_checksum(str) ⇒ Object
Calculates checksum from given string.
# File 'lib/bio/appl/gcg/seq.rb', line 141
def self.calc_checksum(str)
  # Reference: Bio::SeqIO::gcg of BioPerl-1.2.3
  idx = 0
  sum = 0
  str.upcase.tr('^A-Z.~', '').each_byte do |c|
    idx += 1
    sum += idx * c
    idx = 0 if idx >= 57
  end
  (sum % 10000)
end
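To make the weighting concrete, here is a standalone restatement of the loop above that runs without BioRuby. The name gcg_checksum is hypothetical, and String#delete is used in place of tr with an empty replacement; both drop every character outside A-Z, '.' and '~'.

```ruby
# Standalone restatement of the checksum loop, runnable without BioRuby.
# gcg_checksum is a hypothetical name chosen for this sketch.
def gcg_checksum(str)
  idx = 0
  sum = 0
  # Upcase, then keep only A-Z, '.' and '~'; everything else is dropped.
  str.upcase.delete('^A-Z.~').each_byte do |byte|
    idx += 1
    sum += idx * byte    # weight each byte by its position in the cycle
    idx = 0 if idx >= 57 # weights run 1..57 and then start over
  end
  sum % 10_000
end

# "ACGT": 1*65 + 2*67 + 3*71 + 4*84 = 748
puts gcg_checksum('ACGT')      # prints 748
# Digits, whitespace and letter case do not change the result.
puts gcg_checksum(" 1 acgt\n") # prints 748
```

Since layout characters are ignored, the same value comes out whether the input is the raw sequence or the numbered, space-separated body of a GCG file.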
.to_gcg(hash) ⇒ Object
# File 'lib/bio/appl/gcg/seq.rb', line 161
def self.to_gcg(hash)
  seq = hash[:seq]
  if seq.is_a?(Bio::Sequence::NA) then
    seq_type = 'N'
  elsif seq.is_a?(Bio::Sequence::AA) then
    seq_type = 'P'
  else
    seq_type = (hash[:seq_type] or 'P')
  end
  if seq_type == 'N' then
    head = '!!NA_SEQUENCE 1.0'
  else
    head = '!!AA_SEQUENCE 1.0'
  end
  date = (hash[:date] or Time.now.strftime('%B %d, %Y %H:%M'))
  entry_id = hash[:entry_id].to_s.strip
  len = seq.length
  checksum = self.calc_checksum(seq)
  definition = hash[:definition].to_s.strip
  seq = seq.upcase.gsub(/.{1,50}/, "\\0\n")
  seq.gsub!(/.{10}/, "\\0 ")
  w = len.to_s.size + 1
  i = 1
  seq.gsub!(/^/) { |x| s = sprintf("\n%*d ", w, i); i += 50; s }
  [ head, "\n", definition, "\n\n",
    "#{entry_id} Length: #{len} #{date} " \
    "Type: #{seq_type} Check: #{checksum} ..\n",
    seq, "\n" ].join('')
end
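The three gsub calls that lay out the sequence body are easier to follow when run on their own. This is a minimal sketch of that step only, taking a plain String instead of a Bio::Sequence object; format_gcg_body is a hypothetical helper name, not a BioRuby method.

```ruby
# Standalone sketch of the sequence-body layout step used in .to_gcg.
# format_gcg_body is a hypothetical helper name.
def format_gcg_body(seq)
  len = seq.length
  body = seq.upcase.gsub(/.{1,50}/, "\\0\n") # at most 50 residues per line
  body.gsub!(/.{10}/, "\\0 ")                # a space after every 10 residues
  width = len.to_s.size + 1                  # width of the position column
  pos = 1
  # Prefix each line with an empty line and its 1-based start position.
  body.gsub!(/^/) do
    prefix = format("\n%*d ", width, pos)
    pos += 50
    prefix
  end
  body
end

puts format_gcg_body('acgt' * 6)
```

For the 24-residue input above, the result is a single numbered line starting at position 1, with the residues split into groups of ten.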
Instance Method Details
#aaseq ⇒ Object
If you know the sequence is AA, use this method. Returns a Bio::Sequence::AA object.
Calling naseq on a protein sequence, or aaseq on a nucleic acid sequence, raises RuntimeError.
# File 'lib/bio/appl/gcg/seq.rb', line 108
def aaseq
  if seq.is_a?(Bio::Sequence::AA) then
    @seq
  else
    raise 'seq_type != \'P\''
  end
end
#naseq ⇒ Object
If you know the sequence is NA, use this method. Returns a Bio::Sequence::NA object.
Calling naseq on a protein sequence, or aaseq on a nucleic acid sequence, raises RuntimeError.
# File 'lib/bio/appl/gcg/seq.rb', line 121
def naseq
  if seq.is_a?(Bio::Sequence::NA) then
    @seq
  else
    raise 'seq_type != \'N\''
  end
end
#seq ⇒ Object
Sequence data. The class of the returned object depends on the sequence type: Bio::Sequence::NA for type “N”, Bio::Sequence::AA for type “P”, and a generic sequence otherwise (Bio::Sequence::Generic according to this description; the listing here instantiates Bio::Sequence).
# File 'lib/bio/appl/gcg/seq.rb', line 88
def seq
  unless @seq then
    case @seq_type
    when 'N', 'n'
      k = Bio::Sequence::NA
    when 'P', 'p'
      k = Bio::Sequence::AA
    else
      k = Bio::Sequence
    end
    @seq = k.new(@data.tr('^-a-zA-Z.~', ''))
  end
  @seq
end
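The two steps in this method, choosing a class from the “Type:” field and stripping layout characters from the raw body, can be sketched without BioRuby. Both helper names below are hypothetical, and the returned symbols merely stand in for the BioRuby sequence classes.

```ruby
# Standalone sketch of the two steps in #seq.
# sequence_kind and strip_gcg_layout are hypothetical helper names;
# the symbols stand in for the BioRuby sequence classes.
def sequence_kind(seq_type)
  case seq_type
  when 'N', 'n' then :nucleic_acid
  when 'P', 'p' then :amino_acid
  else :generic
  end
end

# Keeps letters plus '-', '.' and '~' (gap and padding symbols) and
# removes position numbers, spaces and newlines.
def strip_gcg_layout(data)
  data.delete('^-a-zA-Z.~')
end

raw_body = "\n  1 ACGTACGTAC GTAC-..~\n"
puts sequence_kind('N')
puts strip_gcg_layout(raw_body)
```

Unlike the checksum calculation, this cleanup keeps lowercase letters and '-' as they are, so the case of the stored sequence is preserved.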
#validate_checksum ⇒ Object
Validates the checksum. Returns true if the “Check:” value matches the checksum calculated from the sequence, false otherwise.
# File 'lib/bio/appl/gcg/seq.rb', line 132
def validate_checksum
  checksum == self.class.calc_checksum(seq)
end