Class: Bio::GFF::GFF3::Record::Gap
- Inherits: Object
- Defined in: lib/bio/db/gff.rb
Overview
Bio::GFF::GFF3::Record::Gap is a class that stores the data of the “Gap” attribute.
Defined Under Namespace
Classes: Code
Class Method Summary
- .new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) ⇒ Object
  Creates a new Gap object from the given sequence alignment.
- .new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</) ⇒ Object
  Creates a new Gap object from the given sequence alignment.
- .parse(str) ⇒ Object
  Same as new(str).
Instance Method Summary
- #==(other) ⇒ Object
  Returns true if self == other.
- #initialize(str = nil) ⇒ Gap (constructor)
  Creates a new Gap object.
- #process_sequences_na(reference, target, gap_char = '-') ⇒ Object
  Processes nucleotide sequences and returns gapped sequences as an array of sequences.
- #process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<') ⇒ Object
  Processes sequences and returns gapped sequences as an array of sequences.
- #to_s ⇒ Object
  Returns the string representation.
Constructor Details
#initialize(str = nil) ⇒ Gap
Creates a new Gap object.
Arguments:
- str: a formatted string such as “M3 R1 M2” (space-separated tokens, each one uppercase letter followed by a number), or nil.
# File 'lib/bio/db/gff.rb', line 1283
def initialize(str = nil)
  if str then
    @data = str.split(/ +/).collect do |x|
      if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
        Code.new($1.intern, $2.to_i)
      else
        # inspect must be called inside the interpolation
        warn "ignored unknown token: #{x.inspect}" if $VERBOSE
        nil
      end
    end
    @data.compact!
  else
    @data = []
  end
end
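The tokenization done by the constructor can be tried without BioRuby installed. The following is a minimal standalone sketch, not the library's code: `GapCode` and `parse_gap_string` are hypothetical names standing in for `Code` and `initialize`, and `GapCode#to_s` is assumed to print the letter followed by the count.

```ruby
# Hypothetical stand-in for Bio::GFF::GFF3::Record::Gap::Code.
GapCode = Struct.new(:code, :length) do
  # Assumed rendering: operation letter followed by its count.
  def to_s
    "#{code}#{length}"
  end
end

# Hypothetical helper mirroring the token handling in #initialize:
# split on spaces, keep tokens shaped like "M8", drop everything else.
def parse_gap_string(str)
  return [] if str.nil?
  codes = str.split(/ +/).map do |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token.strip)
    if m
      GapCode.new(m[1].intern, m[2].to_i)
    else
      warn "ignored unknown token: #{token.inspect}" if $VERBOSE
      nil
    end
  end
  codes.compact
end

codes = parse_gap_string("M8 D3 M6 I1 M6")
puts codes.map(&:to_s).join(" ")
```

Joining the codes with single spaces, as the last line does, reproduces the input string when every token is well formed.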
Class Method Details
.new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) ⇒ Object
Creates a new Gap object from the given sequence alignment.
Note that sites where both the reference and the target are gaps are silently removed.
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (nucleotide sequence)
- gap_regexp: regexp to identify gaps
# File 'lib/bio/db/gff.rb', line 1399
def self.new_from_sequences_na(reference, target,
                               gap_regexp = /[^a-zA-Z]/)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na(reference, target,
                                   gap_regexp)
  }
  gap
end
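The private `__initialize_from_sequences_na` is not shown on this page, so the following is only a sketch of the general idea, assuming the usual GFF3 meaning of the codes (M for a match column, D for a gap in the target, I for a gap in the reference). `gap_string_from_alignment` is a hypothetical helper, not part of BioRuby.

```ruby
# Hypothetical helper: derive a Gap-style string from two aligned
# nucleotide sequences of equal length.
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  ops = []
  reference.chars.zip(target.chars).each do |r, t|
    r_gap = r.nil? || gap_regexp.match?(r)
    t_gap = t.nil? || gap_regexp.match?(t)
    # Columns where both sequences are gaps are silently removed.
    next if r_gap && t_gap
    code = if t_gap then :D elsif r_gap then :I else :M end
    # Merge runs of the same operation into a single code.
    if ops.last && ops.last[0] == code
      ops.last[1] += 1
    else
      ops << [code, 1]
    end
  end
  ops.map { |code, n| "#{code}#{n}" }.join(" ")
end

puts gap_string_from_alignment("CAAGACCTAAACTGGAT-TCCAAT",
                               "CAAGACCT---CTGGATATCCAAT")
```

With the default `gap_regexp`, any non-letter character counts as a gap, so `-` and `.` are treated alike in this sketch.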
.new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</) ⇒ Object
Creates a new Gap object from the given sequence alignment.
Note that sites where both the reference and the target are gaps are silently removed.
For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gaps positioned inside codons will be moved, unwanted gaps will be removed, and forward or reverse frameshifts will be inserted where needed.
For example,
    atgg-taagac-att
    M  V  K  -  I
is treated as:
    atggt<aagacatt
    M  V  K  >>I
Incorrect combinations of a frameshift with another frameshift or with a gap may cause undefined behavior.
It is recommended to indicate forward frameshifts in the target sequence. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.
Priority of regular expressions:
space > forward/reverse frameshift > gap
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (amino acid sequence)
- gap_regexp: regexp to identify gaps
- space_regexp: regexp to identify space characters, which are completely ignored
- forward_frameshift_regexp: regexp to identify forward frameshifts
- reverse_frameshift_regexp: regexp to identify reverse frameshifts
# File 'lib/bio/db/gff.rb', line 1595
def self.new_from_sequences_na_aa(reference, target,
                                  gap_regexp = /[^a-zA-Z]/,
                                  space_regexp = /\s/,
                                  forward_frameshift_regexp = /\>/,
                                  reverse_frameshift_regexp = /\</)
  gap = self.new
  gap.instance_eval {
    __initialize_from_sequences_na_aa(reference, target,
                                      gap_regexp,
                                      space_regexp,
                                      forward_frameshift_regexp,
                                      reverse_frameshift_regexp)
  }
  gap
end
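The stated priority (space > forward/reverse frameshift > gap) matters because the default `gap_regexp` also matches spaces, `>` and `<`. The sketch below illustrates that ordering only; `classify_alignment_char` is a hypothetical helper, and testing forward before reverse is an assumption, since the text gives both the same priority.

```ruby
# Hypothetical helper: classify one alignment character using the
# documented priority of the regular expressions.
def classify_alignment_char(ch,
                            gap_regexp = /[^a-zA-Z]/,
                            space_regexp = /\s/,
                            forward_frameshift_regexp = />/,
                            reverse_frameshift_regexp = /</)
  if space_regexp.match?(ch)
    :space               # highest priority: ignored entirely
  elsif forward_frameshift_regexp.match?(ch)
    :forward_frameshift  # assumption: checked before reverse
  elsif reverse_frameshift_regexp.match?(ch)
    :reverse_frameshift
  elsif gap_regexp.match?(ch)
    :gap                 # lowest priority among the special classes
  else
    :residue
  end
end

[" ", ">", "<", "-", "M"].each do |ch|
  puts "#{ch.inspect} => #{classify_alignment_char(ch)}"
end
```

Without the ordering, a `>` in the target would be read as a plain gap under the default `gap_regexp`.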
.parse(str) ⇒ Object
Same as new(str).
# File 'lib/bio/db/gff.rb', line 1300
def self.parse(str)
  self.new(str)
end
Instance Method Details
#==(other) ⇒ Object
Returns true if self == other; otherwise, returns false.
# File 'lib/bio/db/gff.rb', line 1623
def ==(other)
  if other.class == self.class and
      @data == other.data then
    true
  else
    false
  end
end
#process_sequences_na(reference, target, gap_char = '-') ⇒ Object
Processes nucleotide sequences and returns gapped sequences as an array of sequences.
Note for forward/reverse frameshifts: a forward frameshift is simply treated as a gap insertion into the target sequence, and a reverse frameshift as a gap insertion into the reference sequence.
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (nucleotide sequence)
- gap_char: gap character
# File 'lib/bio/db/gff.rb', line 1723
def process_sequences_na(reference, target, gap_char = '-')
  s_ref, s_tgt = dup_seqs(reference, target)

  s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                     gap_char, gap_char,
                                     1, 1,
                                     gap_char, gap_char)

  if $VERBOSE and s_ref.length != s_tgt.length then
    warn "returned sequences not equal length"
  end
  return s_ref, s_tgt
end
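Since `__process_sequences` is private and not listed here, the following is a simplified sketch of what gapping two nucleotide sequences amounts to, assuming the usual GFF3 meaning of M, D and I. `apply_gap` is a hypothetical helper; it deliberately rejects F and R codes rather than guess at their handling.

```ruby
# Hypothetical helper: insert gap characters into ungapped sequences
# according to a Gap-style string made of M, D and I codes.
def apply_gap(gap_string, reference, target, gap_char = "-")
  ref_out = String.new
  tgt_out = String.new
  r = 0 # read position in the reference
  t = 0 # read position in the target
  gap_string.split(/ +/).each do |token|
    m = /\A([MDI])([0-9]+)\z/.match(token)
    raise ArgumentError, "unsupported token: #{token.inspect}" unless m
    n = m[2].to_i
    case m[1]
    when "M" # match: consume n residues from both sequences
      ref_out << reference[r, n].to_s
      tgt_out << target[t, n].to_s
      r += n
      t += n
    when "D" # gap in the target: consume the reference only
      ref_out << reference[r, n].to_s
      tgt_out << gap_char * n
      r += n
    when "I" # gap in the reference: consume the target only
      ref_out << gap_char * n
      tgt_out << target[t, n].to_s
      t += n
    end
  end
  [ref_out, tgt_out]
end

ref, tgt = apply_gap("M8 D3 M6 I1 M6",
                     "CAAGACCTAAACTGGATTCCAAT",
                     "CAAGACCTCTGGATATCCAAT")
puts ref
puts tgt
```

Both returned strings have the same length when the Gap string accounts for every residue, which is the condition the warning in `process_sequences_na` checks.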
#process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<') ⇒ Object
Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.
Note for reverse frameshifts: reverse frameshift characters are inserted in the reference sequence. For example, the alignment of “Gap=M3 R1 M2” is:
    atgaagat<aatgtc
    M  K  I  N  V
The alignment of “Gap=M3 R3 M3” is:
    atgaag<<<attaatgtc
    M  K  I  I  N  V
Arguments:
- reference: reference sequence (nucleotide sequence)
- target: target sequence (amino acid sequence)
- gap_char: gap character
- space_char: space character inserted into the amino acid sequence so that it lines up with the nucleotide sequence
- forward_frameshift: forward frameshift character
- reverse_frameshift: reverse frameshift character
# File 'lib/bio/db/gff.rb', line 1760
def process_sequences_na_aa(reference, target,
                            gap_char = '-',
                            space_char = ' ',
                            forward_frameshift = '>',
                            reverse_frameshift = '<')
  s_ref, s_tgt = dup_seqs(reference, target)
  s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
  ref_increment = 3
  tgt_increment = 1 + space_char.length * 2
  ref_gap = gap_char * 3
  tgt_gap = "#{gap_char}#{space_char}#{space_char}"
  return __process_sequences(s_ref, s_tgt,
                             ref_gap, tgt_gap,
                             ref_increment, tgt_increment,
                             forward_frameshift,
                             reverse_frameshift)
end
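The padding step in the listing above can be tried in isolation. This sketch reuses the same `gsub` call and gap-unit expressions from the source; the wrapper names `pad_amino_acids` and `gap_units` are hypothetical and exist only for illustration.

```ruby
# Hypothetical wrapper around the padding used in the source:
# each amino acid is followed by two space characters so that one
# residue occupies the same width as a three-nucleotide codon.
def pad_amino_acids(target, space_char = " ")
  target.gsub(/./, "\\0#{space_char}#{space_char}")
end

# Hypothetical wrapper returning the gap units built in the source:
# three gap characters for the reference, and one gap character
# followed by two space characters for the target.
def gap_units(gap_char = "-", space_char = " ")
  [gap_char * 3, "#{gap_char}#{space_char}#{space_char}"]
end

padded = pad_amino_acids("MKINV")
puts padded.inspect
puts gap_units.inspect
```

With the default single-character `space_char`, every padded residue is three characters wide, which matches `tgt_increment = 1 + space_char.length * 2` in the source.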
#to_s ⇒ Object
Returns the string representation.
# File 'lib/bio/db/gff.rb', line 1612
def to_s
  @data.collect { |x| x.to_s }.join(" ")
end