Class: Bio::Big::OrfEmitter
- Inherits:
-
Object
- Object
- Bio::Big::OrfEmitter
- Defined in:
- lib/bigbio/db/emitters/orf_emitter.rb
Instance Method Summary
-
#emit_seq ⇒ Object
Concatenates sequences from the emitter and yields the contained ORFs for every resulting frame (-3..-1, 1..3).
-
#initialize(emit, type, min_size = 30, max_size = nil) ⇒ OrfEmitter
constructor
6-frame ORF emitter for (growing) sequences from the emit object.
Constructor Details
#initialize(emit, type, min_size = 30, max_size = nil) ⇒ OrfEmitter
6-frame ORF emitter for (growing) sequences from the emit object. Type can be a symbol or a function. Symbols are:
:stopstop All sequences from STOP to STOP codon
:startstop All sequences from START to STOP codon
Size control is in nucleotides.
The differences from most other getorf implementations, including EMBOSS, are that:
1) ORFs get emitted during the reading of large continuous sequences, e.g. chromosomes.
2) This allows processing in parallel to IO, even on a single CPU.
3) ORFs come with splitting CODONs.
4) Bordering ORFs are not included (by default), which is not easy to achieve with EMBOSS getorf.
I have carefully designed this code so that it is easy to reason about the steps and prove them correct. It is easy to understand, and therefore to parallelize correctly. Some further features are:
5) Emit size does not matter for correctness.
6) Reverse strands are positioned according to GFF3 on the parent contig.
# File 'lib/bigbio/db/emitters/orf_emitter.rb', line 235
def initialize emit, type, min_size=30, max_size=nil
  @em = emit
  @type = type
  @min_size = min_size
  @max_size = max_size
end
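The difference between the :stopstop and :startstop types, and the effect of min_size, can be illustrated on a single forward frame. The following is a minimal sketch, not the bigbio implementation: the helper orfs_in_frame and the constants are hypothetical names, and it assumes that the delimiting STOP codons are left out of the returned ORF and that regions touching either end of the sequence (bordering ORFs) are dropped.

```ruby
# Hypothetical helper for illustration only; not part of bigbio.
STOP_CODONS = %w[TAA TAG TGA].freeze
START_CODON = 'ATG'

# Returns the ORFs found in reading frame 1 of seq.
# type is :stopstop or :startstop; min_size is in nucleotides.
def orfs_in_frame(seq, type, min_size = 30)
  # Split into whole codons; a trailing partial codon is ignored.
  codons = seq.upcase.scan(/.../)
  stops  = codons.each_index.select { |i| STOP_CODONS.include?(codons[i]) }
  orfs   = []
  # Only regions enclosed by two STOP codons are considered, so the
  # bordering regions before the first and after the last STOP are skipped.
  stops.each_cons(2) do |left, right|
    body = codons[(left + 1)...right]
    if type == :startstop
      start = body.index(START_CODON)
      next if start.nil?          # no START codon between the two STOPs
      body = body[start..-1]
    end
    orf = body.join
    orfs << orf if orf.size >= min_size
  end
  orfs
end

seq = 'TAA' + 'ATGAAACCC' + 'TAG' + 'GGGATGCCC' + 'TGA' + 'AAA'
p orfs_in_frame(seq, :stopstop, 9)
p orfs_in_frame(seq, :startstop, 6)
```

With :stopstop both enclosed regions are returned in full, while :startstop trims each region to begin at its first ATG, so the second ORF becomes shorter and may then fall below min_size.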
Instance Method Details
#emit_seq ⇒ Object
Concatenates sequences from the emitter and yields the contained ORFs for every resulting frame (-3..-1, 1..3). Note that for the reverse frame, the resulting sequence is complemented! Translate these sequences in a forward frame only.
First :head, then :mid parts get emitted, closed by the :tail part.
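To make the note on reverse frames concrete, here is a small sketch of how six reading frames can be derived so that every frame is read left to right. It assumes that the negative frames are taken from the reverse complement of the input, which is one common convention and may differ in detail from what bigbio does. The names reverse_complement and six_frames are hypothetical.

```ruby
# Hypothetical helpers for illustration only; not part of bigbio.

# Reverse the sequence and swap each base for its complement.
def reverse_complement(seq)
  seq.upcase.reverse.tr('ACGT', 'TGCA')
end

# Returns a hash mapping frame number (1..3, -1..-3) to the nucleotides
# of that frame. Negative frames are already reverse-complemented, so a
# consumer translates every value in a forward direction.
def six_frames(seq)
  fwd = seq.upcase
  rev = reverse_complement(seq)
  frames = {}
  (0..2).each do |offset|
    frames[offset + 1]    = fwd[offset..-1]
    frames[-(offset + 1)] = rev[offset..-1]
  end
  frames
end

six_frames('ATGCCCTAA').each { |frame, nt| puts "#{frame}\t#{nt}" }
```

Because the negative frames are already oriented, translating them again as reverse frames would undo the complement, which is why the sequences should be translated in a forward frame only.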
# File 'lib/bigbio/db/emitters/orf_emitter.rb', line 249
def emit_seq
  @em.emit_seq do | part, index, tag, seq |
    # p [part, seq]
    # case part do
    #   when :head
    #   when :mid
    #   when :tail
    # end
    emit_forward(part, index, tag, seq) { |*x| yield(*x) }
    emit_reverse(part, index, tag, seq) { |*x| yield(*x) }
  end
end
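The listing relies on an emit object whose emit_seq yields part, index, tag and seq, and it forwards results to the caller's block with { |*x| yield(*x) }. The sketch below imitates that pattern with stand-in classes. ChunkEmitter, Concatenator and handle are hypothetical names, and the meaning given to index (a chunk counter) and tag (a sequence identifier) is an assumption, not something the documentation states.

```ruby
# Hypothetical stand-in for the emit object; not the bigbio emitter.
class ChunkEmitter
  def initialize(tag, seq, chunk_size)
    @tag    = tag
    @chunks = seq.scan(/.{1,#{chunk_size}}/)
  end

  # Yields part, index, tag, seq: first :head, then :mid, closed by :tail.
  def emit_seq
    last = @chunks.size - 1
    @chunks.each_with_index do |chunk, i|
      part = if i == last then :tail
             elsif i.zero? then :head
             else :mid
             end
      yield part, i, @tag, chunk
    end
  end
end

# Hypothetical consumer that grows a sequence from the emitted chunks.
class Concatenator
  def initialize(emit)
    @em = emit
  end

  def emit_seq
    seq = ''
    @em.emit_seq do |part, index, tag, chunk|
      seq += chunk
      # Forward whatever the helper yields to the caller's block.
      handle(part, index, tag, seq) { |*x| yield(*x) }
    end
  end

  private

  # Reports how many nucleotides have been seen so far.
  def handle(part, index, tag, seq)
    yield part, index, tag, seq.size
  end
end

emitter = ChunkEmitter.new('chr1', 'ATGAAATAGCC', 4)
Concatenator.new(emitter).emit_seq { |*event| p event }
```

The splat in { |*x| yield(*x) } lets the helper decide how many values to pass on without the wrapper having to know its arity, which keeps the forward and reverse helpers interchangeable.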