Class: Bio::Big::OrfEmitter

Inherits:
Object
Defined in:
lib/bigbio/db/emitters/orf_emitter.rb

Instance Method Summary

Constructor Details

#initialize(emit, type, min_size = 30, max_size = nil) ⇒ OrfEmitter

6-frame ORF emitter for (growing) sequences from the emit object. Type can be a symbol or a function. Supported symbols are:

:stopstop   All sequences from STOP to STOP codon
:startstop  All sequences from START to STOP codon

Size control (min_size, max_size) is in nucleotides.
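To make the two type symbols concrete, here is a minimal single-frame sketch. It is not the bigbio implementation: `orfs_in_frame`, the codon tables, and the choice to leave STOP codons out of the returned sequence are assumptions made for illustration. Fragments before the first STOP and after the last STOP are skipped, in the spirit of not including bordering ORFs.

```ruby
# Hypothetical codon tables for this sketch (standard genetic code).
STOP_CODONS  = %w[TAA TAG TGA].freeze
START_CODONS = %w[ATG].freeze

# Hypothetical helper, not part of bigbio: scan one forward frame of +seq+
# and return the ORFs selected by +type+ (:stopstop or :startstop).
# +min_size+ is in nucleotides.
def orfs_in_frame(seq, type, min_size = 0)
  codons  = seq.upcase.scan(/.../) # whole codons only, trailing bases dropped
  orfs    = []
  current = nil                    # nil means "not inside an ORF"
  codons.each do |codon|
    if STOP_CODONS.include?(codon)
      if current
        orf = current.join
        orfs << orf if !orf.empty? && orf.size >= min_size
      end
      # A STOP opens the next :stopstop ORF; :startstop waits for a START.
      current = type == :stopstop ? [] : nil
    elsif current
      current << codon
    elsif type == :startstop && START_CODONS.include?(codon)
      current = [codon]
    end
  end
  orfs
end

seq = 'CCCTAAATGAAACCCTAGGGGTGATTT'
stopstop  = orfs_in_frame(seq, :stopstop)
startstop = orfs_in_frame(seq, :startstop)
```

In this sketch `:stopstop` returns every stretch enclosed by two STOP codons, while `:startstop` only returns stretches that begin with a START codon.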

This differs from most other getorf implementations, including EMBOSS, in that:

1) ORFs get emitted during the reading of large continuous sequences, e.g. chromosomes.
2) This allows processing in parallel to IO, even on a single CPU.
3) ORFs come with splitting CODONs.
4) Bordering ORFs are not included (by default), which is not easy with EMBOSS getorf.

I have carefully designed this code, so that it is easy to reason about the steps and to prove them correct. It is easy to understand, and therefore to parallelize correctly. Some features are:

5) Emit size does not matter for correctness.
6) Reverse strands are positioned according to GFF3 on the parent contig.
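Point 6 can be illustrated with a small coordinate conversion. GFF3 coordinates are 1-based, inclusive, and given on the forward strand of the parent sequence with start <= end, also for features on the minus strand. The sketch below assumes an ORF found at a 0-based offset in the reverse-complemented contig; `reverse_complement` and `gff3_coords` are hypothetical helpers for illustration, not bigbio methods.

```ruby
# Hypothetical helper: reverse complement of a plain ACGT string.
def reverse_complement(seq)
  seq.reverse.tr('ACGTacgt', 'TGCAtgca')
end

# Hypothetical helper: map an ORF at 0-based offset +pos+ with length +len+
# on the reverse-complemented contig to GFF3 start, end and strand on the
# parent contig (1-based, inclusive, start <= end).
def gff3_coords(contig_len, pos, len)
  start = contig_len - (pos + len) + 1
  stop  = contig_len - pos
  [start, stop, '-']
end

contig = 'AAACCCGGGT'
rc     = reverse_complement(contig)

# An ORF-like slice taken from the reverse strand ...
orf = rc[0, 3]
start, stop, strand = gff3_coords(contig.size, 0, 3)

# ... corresponds to this slice of the forward strand of the parent contig.
forward_slice = contig[(start - 1)..(stop - 1)]
puts [orf, start, stop, strand, forward_slice].inspect
```

Reverse-complementing `forward_slice` gives back `orf`, which is a quick way to check that the reported coordinates point at the right bases on the parent contig.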


# File 'lib/bigbio/db/emitters/orf_emitter.rb', line 235

def initialize emit, type, min_size=30, max_size=nil
  @em = emit
  @type = type
  @min_size = min_size
  @max_size = max_size
end

Instance Method Details

#emit_seq ⇒ Object

Concatenates sequences from the emitter and yields the contained ORFs for every resulting frame (-3..-1, 1..3). Note that for the reverse frames, the resulting sequence is complemented! Translate these sequences in a forward frame only.

First :head, then :mid parts get emitted, closed by the :tail part.
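As a rough illustration of the six frames mentioned above, the following sketch derives them from a plain string. The mapping of frame numbers to offsets (frame 1 starts at offset 0, frame -1 starts at offset 0 of the reverse complement) is a common convention and an assumption here; `six_frames` is a hypothetical helper, not a bigbio method.

```ruby
# Hypothetical helper: return a hash mapping frame number (-3..-1, 1..3)
# to the nucleotide sequence to be read in that frame.
def six_frames(seq)
  rc = seq.reverse.tr('ACGTacgt', 'TGCAtgca') # reverse complement
  frames = {}
  (1..3).each do |f|
    frames[f]  = seq[(f - 1)..-1] # forward frames 1, 2, 3
    frames[-f] = rc[(f - 1)..-1]  # reverse frames -1, -2, -3
  end
  frames
end

frames = six_frames('ATGCCC')
frames.sort.each { |frame, s| puts "#{frame}\t#{s}" }
```

Because the reverse frames are already complemented in this sketch, every entry can be translated left to right as a forward frame.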



# File 'lib/bigbio/db/emitters/orf_emitter.rb', line 249

def emit_seq
  @em.emit_seq do | part, index, tag, seq |
    # p [part, seq]
    # case part do
    #   when :head
    #   when :mid
    #   when :tail
    # end
    emit_forward(part, index, tag, seq) { |*x| yield(*x) }
    emit_reverse(part, index, tag, seq) { |*x| yield(*x) }
  end
end
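The listing above shows that the wrapped emit object must respond to `emit_seq` and yield `part, index, tag, seq`. A minimal stand-in that follows this shape could look like the sketch below. `ChunkedEmitter` and `collect_parts` are hypothetical names, and the details are assumptions: each part carries only its own chunk, and a sequence that fits in one chunk is labelled `:head`, which may differ from what the library does.

```ruby
# Hypothetical stand-in for the emit object; not the bigbio implementation.
class ChunkedEmitter
  def initialize(tag, seq, chunk_size)
    @tag = tag
    @seq = seq
    @chunk_size = chunk_size
  end

  # Yields part, index, tag, seq: first :head, then :mid parts, then :tail.
  def emit_seq
    chunks = @seq.scan(/.{1,#{@chunk_size}}/)
    chunks.each_with_index do |chunk, i|
      part = if i.zero? then :head
             elsif i == chunks.size - 1 then :tail
             else :mid
             end
      yield part, i, @tag, chunk
    end
  end
end

# Consume an emitter: record the part labels and rebuild the full sequence.
def collect_parts(emitter)
  parts = []
  seq = String.new
  emitter.emit_seq do |part, _index, _tag, chunk|
    parts << part
    seq << chunk
  end
  [parts, seq]
end

parts, seq = collect_parts(ChunkedEmitter.new('chr1', 'ATGAAATAGCC', 4))
puts parts.inspect
puts seq
```

Rebuilding the same sequence from different chunk sizes is one simple way to check the property that emit size should not matter for correctness.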