Module: Bio::Sequence::SequenceMasker

Included in:
Bio::Sequence
Defined in:
lib/bio/sequence/sequence_masker.rb

Overview

Bio::Sequence::SequenceMasker is a mix-in module that provides helper methods for masking a sequence.

It is only expected to be included in Bio::Sequence. In the future, the methods in this module might be moved to Bio::Sequence or to another module, and this module might be removed. Please do not depend on this module.

Instance Method Details

#mask_with_enumerator(enum, mask_char) ⇒ Object

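Masks the sequence according to the values in enum. The enum should be an array or enumerator with one value per sequence position. A block must be given; it is called with each value, and when it returns a true value, the sequence at the corresponding position is replaced with mask_char.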


Arguments:

  • (required) enum : (Array or Enumerator) values passed to the block, one per sequence position

  • (required) mask_char : (String) character used for masking

Returns

Bio::Sequence object



# File 'lib/bio/sequence/sequence_masker.rb', line 39

def mask_with_enumerator(enum, mask_char)
  offset = 0
  unit = mask_char.length - 1
  s = self.seq.class.new(self.seq)
  j = 0
  enum.each_with_index do |item, index|
    if yield item then
      j = index + offset
      if j < s.length then
        s[j, 1] = mask_char
        offset += unit
      end
    end
  end
  newseq = self.dup
  newseq.seq = s
  newseq
end
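
The offset bookkeeping in the listing above is easiest to follow on a plain String. The following is a minimal sketch, not BioRuby code: mask_string is a hypothetical stand-alone helper that mirrors the same loop without Bio::Sequence, and the scores are invented, so that the effect of a multi-character mask_char can be checked in isolation.

```ruby
# Hypothetical helper (not part of BioRuby): applies the same loop as
# mask_with_enumerator to a plain String and returns a new String.
def mask_string(seq, enum, mask_char)
  offset = 0
  unit = mask_char.length - 1   # how far each replacement shifts later positions
  s = seq.dup                   # leave the caller's string untouched
  enum.each_with_index do |item, index|
    next unless yield(item)
    j = index + offset          # position in the already-modified string
    if j < s.length
      s[j, 1] = mask_char
      offset += unit
    end
  end
  s
end

scores = [30, 10, 30, 5, 30]    # invented example values
single = mask_string("ATGCA", scores, "N")  { |q| q < 20 }
double = mask_string("ATGCA", scores, "xx") { |q| q < 20 }
puts single   # ANGNA
puts double   # AxxGxxA
```

With a two-character mask the result grows by one character per masked position, which is why the loop adds unit to offset after every replacement.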

#mask_with_error_probability(threshold, mask_char) ⇒ Object

Masks sequence regions with a high error probability. For each sequence position, if the error probability is greater than the threshold, the residue at that position is replaced with mask_char. If the sequence has no error probabilities, it is returned unmasked.


Arguments:

  • (required) threshold : (Numeric) positions whose error probability exceeds this value are masked

  • (required) mask_char : (String) character used for masking

Returns

Bio::Sequence object



# File 'lib/bio/sequence/sequence_masker.rb', line 86

def mask_with_error_probability(threshold, mask_char)
  values = self.error_probabilities || []
  mask_with_enumerator(values, mask_char) do |item|
    item > threshold
  end
end
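
As a rough illustration of the comparison in the listing above, the sketch below reproduces the result for a single-character mask_char in plain Ruby, without calling BioRuby. The sequence, the probabilities, and the variable names are made up for the example. The comparison is strict, so a probability equal to the threshold is left unmasked.

```ruby
# Plain-Ruby illustration; the data and variable names are assumptions.
seq       = "ATGCAT"
probs     = [0.001, 0.2, 0.05, 0.5, 0.01]   # one value short on purpose
threshold = 0.05

# Positions without a probability (nil from zip) are left as they are.
masked = seq.chars.zip(probs).map { |base, prob|
  prob && prob > threshold ? "N" : base
}.join

puts masked   # ANGNAT
```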

#mask_with_quality_score(threshold, mask_char) ⇒ Object

Masks low-quality sequence regions. For each sequence position, if the quality score is smaller than the threshold, the residue at that position is replaced with mask_char. If the sequence has no quality scores, it is returned unmasked.

Note: This method does not take quality_score_type into account.


Arguments:

  • (required) threshold : (Numeric) positions whose quality score is below this value are masked

  • (required) mask_char : (String) character used for masking

Returns

Bio::Sequence object



# File 'lib/bio/sequence/sequence_masker.rb', line 69

def mask_with_quality_score(threshold, mask_char)
  scores = self.quality_scores || []
  mask_with_enumerator(scores, mask_char) do |item|
    item < threshold
  end
end
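
The selection rule in the listing above can be sketched in plain Ruby as follows. This is only an illustration with invented data, not a call into BioRuby. It lists the positions that would be masked and applies a single-character mask. Because the score type is ignored, choosing a threshold on the right scale is left to the caller.

```ruby
# Plain-Ruby illustration; the data and variable names are assumptions.
seq       = "ATGCATGC"
scores    = [40, 38, 12, 20, 35, 8, 19, 40]
threshold = 20

# Strict comparison: a score equal to the threshold is not masked.
masked_positions = scores.each_index.select { |i| scores[i] < threshold }

masked = seq.dup
masked_positions.each { |i| masked[i] = "n" if i < masked.length }

puts masked_positions.inspect   # [2, 5, 6]
puts masked                     # ATnCAnnC
```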