Class: Bio::GFF::GFF3::Record::Gap
Overview
Bio::GFF::GFF3::Record::Gap is a class that stores the data of a GFF3 “Gap” attribute.
Defined Under Namespace
Classes: Code
Class Method Summary
-
.new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) ⇒ Object
Creates a new Gap object from given sequence alignment.
-
.new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</) ⇒ Object
Creates a new Gap object from given sequence alignment.
-
.parse(str) ⇒ Object
Same as new(str).
Instance Method Summary
-
#==(other) ⇒ Object
Returns true if self == other; otherwise, returns false.
-
#initialize(str = nil) ⇒ Gap
constructor
Creates a new Gap object.
-
#process_sequences_na(reference, target, gap_char = '-') ⇒ Object
Processes nucleotide sequences and returns gapped sequences as an array of sequences.
-
#process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<') ⇒ Object
Processes sequences and returns gapped sequences as an array of sequences.
-
#to_s ⇒ Object
Returns the string representation.
Constructor Details
#initialize(str = nil) ⇒ Gap
Creates a new Gap object.
Arguments:
-
str: a formatted string, or nil.
# File 'lib/bio/db/gff.rb', line 1276
def initialize(str = nil)
  if str then
    @data = str.split(/ +/).collect do |x|
      if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
        Code.new($1.intern, $2.to_i)
      else
        warn "ignored unknown token: #{x.inspect}" if $VERBOSE
        nil
      end
    end
    @data.compact!
  else
    @data = []
  end
end
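The tokenizing step in the constructor can be tried without the library. The following is a minimal sketch, assuming a hypothetical GapCode struct as a stand-in for the nested Code class and a hypothetical parse_gap helper; neither name is part of the library.

```ruby
# Hypothetical stand-in for the nested Code class (not the library's definition).
GapCode = Struct.new(:code, :length) do
  def to_s
    "#{code}#{length}"
  end
end

# Hypothetical helper: split on runs of spaces and keep only tokens
# shaped like one uppercase letter followed by digits, e.g. "M8".
def parse_gap(str)
  return [] if str.nil?
  tokens = str.split(/ +/).map do |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token.strip)
    m ? GapCode.new(m[1].to_sym, m[2].to_i) : nil
  end
  tokens.compact # unknown tokens are dropped
end

codes = parse_gap("M8 D3 M6 I1 M6")
puts codes.map(&:to_s).join(" ") # prints "M8 D3 M6 I1 M6"
```

Tokens that do not match the pattern, such as a lowercase "x3", are dropped rather than raising an error.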
Class Method Details
.new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) ⇒ Object
Creates a new Gap object from given sequence alignment.
Note that sites at which both the reference and the target are gaps are silently removed.
Arguments:
-
reference: reference sequence (nucleotide sequence)
-
target: target sequence (nucleotide sequence)
-
gap_regexp: regexp to identify gap
# File 'lib/bio/db/gff.rb', line 1392
def self.new_from_sequences_na(reference, target,
                               gap_regexp = /[^a-zA-Z]/)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na(reference, target,
                                   gap_regexp)
  }
  gap
end
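The private method __initialize_from_sequences_na is not shown on this page. As an illustration of the idea only, the sketch below classifies each alignment column using the M/I/D meanings from the GFF3 specification (M: match, I: gap in the reference, D: gap in the target) and then run-length encodes the result. The helper name gap_string_from_alignment is hypothetical, and the sketch assumes both sequences have the same length.

```ruby
# Hypothetical helper, not the library's implementation.
# Assumes reference and target are aligned strings of equal length.
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  columns = reference.chars.zip(target.chars).map do |r, t|
    r_gap = gap_regexp.match?(r)
    t_gap = t.nil? || gap_regexp.match?(t)
    if r_gap && t_gap
      nil   # gap in both sequences: the column is dropped
    elsif r_gap
      :I    # gap in the reference
    elsif t_gap
      :D    # gap in the target
    else
      :M    # residue in both
    end
  end.compact

  # Run-length encode consecutive identical codes.
  columns.chunk_while { |a, b| a == b }
         .map { |run| "#{run.first}#{run.length}" }
         .join(" ")
end

puts gap_string_from_alignment("CAAGACCTAAACTGGAT-TCCAAT",
                               "CAAGACCT---CTGGATATCCAAT")
# prints "M8 D3 M6 I1 M6"
```

Because columns that are gaps in both sequences are removed before encoding, the runs on either side of such a column merge into a single code.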
.new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</) ⇒ Object
Creates a new Gap object from given sequence alignment.
Note that sites at which both the reference and the target are gaps are silently removed.
For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gap positions will be moved inside codons, unwanted gaps will be removed, and forward or reverse frameshifts will be inserted as needed.
For example,
atgg-taagac-att
M  V  K  -  I
is treated as:
atggt<aagacatt
M  V  K  >>I
An incorrect combination of a frameshift with another frameshift or with a gap may cause undefined behavior.
It is recommended that forward frameshifts be indicated in the target sequence. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.
Priority of regular expressions:
space > forward/reverse frameshift > gap
Arguments:
-
reference: reference sequence (nucleotide sequence)
-
target: target sequence (amino acid sequence)
-
gap_regexp: regexp to identify gap
-
space_regexp: regexp to identify space characters, which are completely ignored
-
forward_frameshift_regexp: regexp to identify forward frameshift
-
reverse_frameshift_regexp: regexp to identify reverse frameshift
# File 'lib/bio/db/gff.rb', line 1588
def self.new_from_sequences_na_aa(reference, target,
                                  gap_regexp = /[^a-zA-Z]/,
                                  space_regexp = /\s/,
                                  forward_frameshift_regexp = /\>/,
                                  reverse_frameshift_regexp = /\</)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na_aa(reference, target,
                                      gap_regexp,
                                      space_regexp,
                                      forward_frameshift_regexp,
                                      reverse_frameshift_regexp)
  }
  gap
end
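The priority order matters because the default gap_regexp, /[^a-zA-Z]/, also matches whitespace, ">" and "<". The sketch below shows one way such a priority could be applied to a single character; classify_char is a hypothetical helper, and the relative order of the forward and reverse checks is an assumption, since the two are listed at the same priority level.

```ruby
# Hypothetical helper illustrating the documented priority:
# space > forward/reverse frameshift > gap.
def classify_char(ch,
                  gap_regexp = /[^a-zA-Z]/,
                  space_regexp = /\s/,
                  forward_frameshift_regexp = /\>/,
                  reverse_frameshift_regexp = /\</)
  if space_regexp.match?(ch)
    :space
  elsif forward_frameshift_regexp.match?(ch)
    :forward_frameshift
  elsif reverse_frameshift_regexp.match?(ch)
    :reverse_frameshift
  elsif gap_regexp.match?(ch)
    :gap
  else
    :residue
  end
end

"a -><".each_char do |ch|
  puts "#{ch.inspect} => #{classify_char(ch)}"
end
```

If the gap check ran first with the default regexps, every space and frameshift character would be classified as a gap.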
.parse(str) ⇒ Object
Same as new(str).
# File 'lib/bio/db/gff.rb', line 1293
def self.parse(str)
  self.new(str)
end
Instance Method Details
#==(other) ⇒ Object
Returns true if self == other; otherwise, returns false.
# File 'lib/bio/db/gff.rb', line 1616
def ==(other)
  if other.class == self.class and
      @data == other.data then
    true
  else
    false
  end
end
#process_sequences_na(reference, target, gap_char = '-') ⇒ Object
Processes nucleotide sequences and returns gapped sequences as an array of sequences.
Note for forward/reverse frameshifts: a forward frameshift is simply treated as a gap insertion in the target sequence, and a reverse frameshift as a gap insertion in the reference sequence.
Arguments:
-
reference: reference sequence (nucleotide sequence)
-
target: target sequence (nucleotide sequence)
-
gap_char: gap character
# File 'lib/bio/db/gff.rb', line 1716
def process_sequences_na(reference, target, gap_char = '-')
  s_ref, s_tgt = dup_seqs(reference, target)

  s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                     gap_char, gap_char,
                                     1, 1,
                                     gap_char, gap_char)

  if $VERBOSE and s_ref.length != s_tgt.length then
    warn "returned sequences not equal length"
  end
  return s_ref, s_tgt
end
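The private method __process_sequences is not shown on this page. The sketch below illustrates the general idea for the M, I and D codes only: it walks the Gap codes and inserts gap characters into ungapped sequences. The helper name apply_gap is hypothetical, frameshift codes are deliberately skipped, and the input is assumed to be well formed.

```ruby
# Hypothetical helper, not the library's implementation.
# Rebuilds a gapped alignment from ungapped sequences and a Gap string.
def apply_gap(gap_str, reference, target, gap_char = "-")
  ref_out = String.new
  tgt_out = String.new
  ref_pos = 0
  tgt_pos = 0

  gap_str.split(/ +/).each do |token|
    m = /\A([MID])([0-9]+)\z/.match(token)
    next unless m # F and R codes are outside the scope of this sketch
    n = m[2].to_i
    case m[1]
    when "M" # match: consume n residues from both sequences
      ref_out << reference[ref_pos, n].to_s
      tgt_out << target[tgt_pos, n].to_s
      ref_pos += n
      tgt_pos += n
    when "D" # gap in the target: consume the reference only
      ref_out << reference[ref_pos, n].to_s
      tgt_out << gap_char * n
      ref_pos += n
    when "I" # gap in the reference: consume the target only
      ref_out << gap_char * n
      tgt_out << target[tgt_pos, n].to_s
      tgt_pos += n
    end
  end
  [ref_out, tgt_out]
end

ref, tgt = apply_gap("M8 D3 M6 I1 M6",
                     "CAAGACCTAAACTGGATTCCAAT",
                     "CAAGACCTCTGGATATCCAAT")
puts ref # prints "CAAGACCTAAACTGGAT-TCCAAT"
puts tgt # prints "CAAGACCT---CTGGATATCCAAT"
```

As with the method documented above, the result is a two-element array holding the gapped reference and the gapped target.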
#process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<') ⇒ Object
Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.
Note for reverse frameshifts: reverse frameshift characters are inserted in the reference sequence. For example, the alignment for “Gap=M3 R1 M2” is:
atgaagat<aatgtc
M  K  I  N  V
Alignment of “Gap=M3 R3 M3” is:
atgaag<<<attaatgtc
M  K  I  I  N  V
Arguments:
-
reference: reference sequence (nucleotide sequence)
-
target: target sequence (amino acid sequence)
-
gap_char: gap character
-
space_char: space character inserted into the amino acid sequence to match the na-aa alignment
-
forward_frameshift: forward frameshift character
-
reverse_frameshift: reverse frameshift character
# File 'lib/bio/db/gff.rb', line 1753
def process_sequences_na_aa(reference, target,
                            gap_char = '-',
                            space_char = ' ',
                            forward_frameshift = '>',
                            reverse_frameshift = '<')
  s_ref, s_tgt = dup_seqs(reference, target)
  s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
  ref_increment = 3
  tgt_increment = 1 + space_char.length * 2
  ref_gap = gap_char * 3
  tgt_gap = "#{gap_char}#{space_char}#{space_char}"
  return __process_sequences(s_ref, s_tgt,
                             ref_gap, tgt_gap,
                             ref_increment, tgt_increment,
                             forward_frameshift,
                             reverse_frameshift)
end
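The padding step in the source above, which widens each amino acid to the width of a codon, can be run on its own. The sketch below repeats that gsub call and the derived gap strings; the helper name pad_residues is hypothetical.

```ruby
# Hypothetical helper wrapping the gsub call from the source above:
# every residue is followed by two space characters, so that each
# amino acid occupies the same three columns as its codon.
def pad_residues(target, space_char = " ")
  target.gsub(/./, "\\0#{space_char}#{space_char}")
end

gap_char = "-"
space_char = " "

padded        = pad_residues("MKI", space_char)
tgt_increment = 1 + space_char.length * 2            # columns per residue
ref_gap       = gap_char * 3                         # one codon-wide gap
tgt_gap       = "#{gap_char}#{space_char}#{space_char}"

puts padded.inspect   # prints "M  K  I  "
puts tgt_increment    # prints 3
puts ref_gap.inspect  # prints "---"
puts tgt_gap.inspect  # prints "-  "
```

With a single-character space_char, one padded residue and one reference codon both span three columns, which is why ref_increment is 3 and tgt_increment evaluates to 3 as well.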
#to_s ⇒ Object
Returns the string representation.
# File 'lib/bio/db/gff.rb', line 1605
def to_s
  @data.collect { |x| x.to_s }.join(" ")
end