Class: Bio::GFF::GFF3::Record::Gap

Inherits: Object

Defined in: lib/bio/db/gff.rb

Overview

Bio::GFF::GFF3::Record::Gap is a class that stores the data of a “Gap” attribute.

Defined Under Namespace

Classes: Code

Class Method Summary

.new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) ⇒ Object

Creates a new Gap object from given sequence alignment.
.new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</) ⇒ Object

Creates a new Gap object from given sequence alignment.
.parse(str) ⇒ Object

Same as new(str).

Instance Method Summary

#==(other) ⇒ Object

If self == other, returns true.
#initialize(str = nil) ⇒ Gap constructor

Creates a new Gap object.
#process_sequences_na(reference, target, gap_char = '-') ⇒ Object

Processes nucleotide sequences and returns gapped sequences as an array of sequences.
#process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<') ⇒ Object

Processes sequences and returns gapped sequences as an array of sequences.
#to_s ⇒ Object

Returns the string representation.

Constructor Details

#initialize(str = nil) ⇒ `Gap`

Creates a new Gap object.

Arguments:

str: a formatted string, or nil.

# File 'lib/bio/db/gff.rb', line 1283

def initialize(str = nil)
  if str then
    @data = str.split(/ +/).collect do |x|
      if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
        Code.new($1.intern, $2.to_i)
      else
        warn "ignored unknown token: #{x.inspect}" if $VERBOSE
        nil
      end
    end
    @data.compact!
  else
    @data = []
  end
end
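
The tokenizing step of the constructor can be tried without the library. The following is a minimal standalone sketch, not BioRuby code: GapCode is a hypothetical stand-in for Gap::Code, and parse_gap is an illustrative helper that applies the same split and regexp as the constructor.

```ruby
# Hypothetical stand-in for Gap::Code: one operation code plus its length.
GapCode = Struct.new(:code, :length) do
  def to_s
    "#{code}#{length}"
  end
end

# Splits a Gap attribute value on spaces and keeps only tokens shaped
# like "M8" or "D3"; anything else is dropped, as in the constructor.
def parse_gap(str)
  return [] if str.nil?
  str.split(/ +/).map { |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token.strip)
    m ? GapCode.new(m[1].to_sym, m[2].to_i) : nil
  }.compact
end

codes = parse_gap("M8 D3 M6 I1 M6")
puts codes.map(&:to_s).join(" ")
```

Joining the parsed codes with single spaces reproduces the input string, which is also how #to_s is defined.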

Class Method Details

.new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) ⇒ `Object`

Creates a new Gap object from given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.

Arguments:

reference: reference sequence (nucleotide sequence)
target: target sequence (nucleotide sequence)
gap_regexp: regexp to identify gap

# File 'lib/bio/db/gff.rb', line 1399

def self.new_from_sequences_na(reference, target,
                               gap_regexp = /[^a-zA-Z]/)
  gap = self.new
  gap.instance_eval { 
    __initialize_from_sequences_na(reference, target,
                                   gap_regexp)
  }
  gap
end
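
The private method __initialize_from_sequences_na is not shown on this page. As a rough illustration of what deriving a Gap string from a nucleotide alignment involves, here is a standalone sketch under the GFF3 convention that M is a match, I is a gap in the reference, and D is a gap in the target. The helper name gap_string_from_alignment is hypothetical, and it assumes both sequences are already aligned to the same length.

```ruby
# Hypothetical helper, not part of BioRuby: derives a Gap string from two
# aligned nucleotide sequences of equal length.
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  runs = [] # each element is [code, run_length]
  reference.each_char.zip(target.each_char) do |r, t|
    r_gap = r.nil? || gap_regexp.match?(r)
    t_gap = t.nil? || gap_regexp.match?(t)
    next if r_gap && t_gap # sites where both are gaps are dropped
    code = if r_gap
             :I # gap in the reference sequence
           elsif t_gap
             :D # gap in the target sequence
           else
             :M # both sequences have a residue
           end
    if runs.last && runs.last[0] == code
      runs.last[1] += 1 # extend the current run
    else
      runs << [code, 1] # start a new run
    end
  end
  runs.map { |code, len| "#{code}#{len}" }.join(" ")
end

ref = "CAAGACCTAAACTGGAT-TCCAAT"
tgt = "CAAGACCT---CTGGATATCCAAT"
puts gap_string_from_alignment(ref, tgt) # prints "M8 D3 M6 I1 M6"
```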

.new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</) ⇒ `Object`

Creates a new Gap object from given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.

For incorrect alignments that break the 3:1 rule (three nucleotides per amino acid), gap positions will be moved inside codons, unwanted gaps will be removed, and forward or reverse frameshifts will be inserted where needed.

For example,

atgg-taagac-att
M  V  K  -  I

is treated as:

atggt<aagacatt
M  V  K  >>I

Incorrectly combining a frameshift with another frameshift or with a gap may cause undefined behavior.

Forward frameshifts are recommended to be indicated in the target sequence. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.

Priority of regular expressions:

space > forward/reverse frameshift > gap

Arguments:

reference: reference sequence (nucleotide sequence)
target: target sequence (amino acid sequence)
gap_regexp: regexp to identify gap
space_regexp: regexp to identify space characters, which are completely ignored
forward_frameshift_regexp: regexp to identify forward frameshift
reverse_frameshift_regexp: regexp to identify reverse frameshift
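
The priority order matters because the default gap_regexp, /[^a-zA-Z]/, also matches whitespace, '>' and '<'. The sketch below shows one way such a priority could be applied to a single alignment character. classify_alignment_char is a hypothetical helper, not the library's internal code, and the relative order of the two frameshift checks is an assumption (the documentation gives them equal priority).

```ruby
# Hypothetical helper, not part of BioRuby: classifies one alignment
# character, testing the regexps in the documented priority order
# (space, then frameshifts, then gap).
def classify_alignment_char(ch,
                            gap_regexp = /[^a-zA-Z]/,
                            space_regexp = /\s/,
                            forward_frameshift_regexp = />/,
                            reverse_frameshift_regexp = /</)
  return :space if space_regexp.match?(ch)
  return :forward_frameshift if forward_frameshift_regexp.match?(ch)
  return :reverse_frameshift if reverse_frameshift_regexp.match?(ch)
  return :gap if gap_regexp.match?(ch)
  :residue
end

[" ", ">", "<", "-", "a"].each do |ch|
  puts "#{ch.inspect} => #{classify_alignment_char(ch)}"
end
```

Testing the gap regexp first would classify every one of the first four characters as a gap, which is why it comes last.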

# File 'lib/bio/db/gff.rb', line 1595

def self.new_from_sequences_na_aa(reference, target,
                                  gap_regexp = /[^a-zA-Z]/,
                                  space_regexp = /\s/,
                                  forward_frameshift_regexp = /\>/,
                                  reverse_frameshift_regexp = /\</)
  gap = self.new
  gap.instance_eval { 
    __initialize_from_sequences_na_aa(reference, target,
                                      gap_regexp,
                                      space_regexp,
                                      forward_frameshift_regexp,
                                      reverse_frameshift_regexp)
  }
  gap
end

.parse(str) ⇒ `Object`

Same as new(str).

# File 'lib/bio/db/gff.rb', line 1300

def self.parse(str)
  self.new(str)
end

Instance Method Details

#==(other) ⇒ `Object`

Returns true if self == other, that is, if other has the same class and the same data; otherwise returns false.

# File 'lib/bio/db/gff.rb', line 1623

def ==(other)
  if other.class == self.class and
      @data == other.data then
    true
  else
    false
  end
end

#process_sequences_na(reference, target, gap_char = '-') ⇒ `Object`

Processes nucleotide sequences and returns gapped sequences as an array of sequences.

Note on frameshifts: a forward frameshift is simply treated as a gap insertion into the target sequence, and a reverse frameshift as a gap insertion into the reference sequence.

Arguments:

reference: reference sequence (nucleotide sequence)
target: target sequence (nucleotide sequence)
gap_char: gap character

# File 'lib/bio/db/gff.rb', line 1723

def process_sequences_na(reference, target, gap_char = '-')
  s_ref, s_tgt = dup_seqs(reference, target)

  s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                     gap_char, gap_char,
                                     1, 1,
                                     gap_char, gap_char)

  if $VERBOSE and s_ref.length != s_tgt.length then
    warn "returned sequences not equal length"
  end
  return s_ref, s_tgt
end
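
The private method __process_sequences does the actual work and is not shown on this page. To illustrate the idea, the following standalone sketch applies only the M, I and D codes of a Gap string to two ungapped nucleotide sequences. apply_gap_na is a hypothetical helper, frameshift codes are deliberately left out, and it builds new strings rather than modifying copies in place.

```ruby
# Hypothetical helper, not part of BioRuby: inserts gap characters into
# ungapped sequences according to the M, I and D codes of a Gap string.
def apply_gap_na(gap_str, reference, target, gap_char = "-")
  s_ref = String.new
  s_tgt = String.new
  p_ref = 0 # read position in the ungapped reference
  p_tgt = 0 # read position in the ungapped target
  gap_str.split(/ +/).each do |token|
    m = /\A([MID])([0-9]+)\z/.match(token)
    next unless m # other codes (for example F and R) are skipped here
    n = m[2].to_i
    case m[1]
    when "M" # match: copy n residues from both sequences
      s_ref << reference[p_ref, n].to_s
      s_tgt << target[p_tgt, n].to_s
      p_ref += n
      p_tgt += n
    when "D" # gap in the target: copy the reference, pad the target
      s_ref << reference[p_ref, n].to_s
      s_tgt << gap_char * n
      p_ref += n
    when "I" # gap in the reference: pad the reference, copy the target
      s_ref << gap_char * n
      s_tgt << target[p_tgt, n].to_s
      p_tgt += n
    end
  end
  # Residues not covered by the Gap string are appended unchanged.
  s_ref << reference[p_ref..-1].to_s
  s_tgt << target[p_tgt..-1].to_s
  [s_ref, s_tgt]
end

ref, tgt = apply_gap_na("M8 D3 M6 I1 M6",
                        "CAAGACCTAAACTGGATTCCAAT",
                        "CAAGACCTCTGGATATCCAAT")
puts ref # prints "CAAGACCTAAACTGGAT-TCCAAT"
puts tgt # prints "CAAGACCT---CTGGATATCCAAT"
```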

#process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<') ⇒ `Object`

Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.

Note on reverse frameshifts: reverse frameshift characters are inserted into the reference sequence. For example, the alignment for “Gap=M3 R1 M2” is:

atgaagat<aatgtc
M  K  I  N  V

The alignment for “Gap=M3 R3 M3” is:

atgaag<<<attaatgtc
M  K  I  I  N  V

Arguments:

reference: reference sequence (nucleotide sequence)
target: target sequence (amino acid sequence)
gap_char: gap character
space_char: space character inserted into the amino acid sequence so that it lines up with the nucleotide sequence
forward_frameshift: forward frameshift character
reverse_frameshift: reverse frameshift character

# File 'lib/bio/db/gff.rb', line 1760

def process_sequences_na_aa(reference, target,
                            gap_char = '-',
                            space_char = ' ',
                            forward_frameshift = '>',
                            reverse_frameshift = '<')
  s_ref, s_tgt = dup_seqs(reference, target)
  s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
  ref_increment = 3
  tgt_increment = 1 + space_char.length * 2
  ref_gap = gap_char * 3
  tgt_gap = "#{gap_char}#{space_char}#{space_char}"
  return __process_sequences(s_ref, s_tgt,
                             ref_gap, tgt_gap,
                             ref_increment, tgt_increment,
                             forward_frameshift,
                             reverse_frameshift)
end
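
The method pads each amino acid with two space characters so that one residue occupies the same width as one codon, then steps through the reference three nucleotides at a time. A standalone sketch of that idea, restricted to the M, I and D codes, could look like the following. apply_gap_na_aa is a hypothetical helper and does not reproduce the library's frameshift handling.

```ruby
# Hypothetical helper, not part of BioRuby: applies only M, I and D codes,
# counted in codons, to a nucleotide reference and an amino acid target.
def apply_gap_na_aa(gap_str, reference, target, gap_char = "-", space_char = " ")
  pad = space_char * 2
  spaced = target.gsub(/./) { |aa| aa + pad } # "MKN" becomes "M  K  N  "
  step = 1 + pad.length                       # target characters per residue
  ref_gap = gap_char * 3                      # one codon-wide gap
  tgt_gap = gap_char + pad                    # one residue-wide gap
  s_ref = String.new
  s_tgt = String.new
  p_ref = 0
  p_tgt = 0
  gap_str.split(/ +/).each do |token|
    m = /\A([MID])([0-9]+)\z/.match(token)
    next unless m # frameshift codes are not handled in this sketch
    n = m[2].to_i
    case m[1]
    when "M" # n codons aligned with n residues
      s_ref << reference[p_ref, 3 * n].to_s
      s_tgt << spaced[p_tgt, step * n].to_s
      p_ref += 3 * n
      p_tgt += step * n
    when "D" # n codons in the reference with no residue in the target
      s_ref << reference[p_ref, 3 * n].to_s
      s_tgt << tgt_gap * n
      p_ref += 3 * n
    when "I" # n residues in the target with no codon in the reference
      s_ref << ref_gap * n
      s_tgt << spaced[p_tgt, step * n].to_s
      p_tgt += step * n
    end
  end
  [s_ref, s_tgt]
end

ref, tgt = apply_gap_na_aa("M2 D1 M1", "atgaagattaat", "MKN")
puts ref # prints "atgaagattaat"
puts tgt # prints "M  K  -  N  "
```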

#to_s ⇒ `Object`

Returns the string representation.

# File 'lib/bio/db/gff.rb', line 1612

def to_s
  @data.collect { |x| x.to_s }.join(" ")
end
