Module: Bio::Sequence::Format

Defined in:
lib/bio/sequence/format.rb

Overview

DESCRIPTION

A mixin of methods used by Bio::Sequence#output to render sequences in common bioinformatics formats. These methods are not meant to be called in isolation.

USAGE

# Given a Bio::Sequence object s,
puts s.output(:fasta)
puts s.output(:genbank)
puts s.output(:embl)

Instance Method Summary

Instance Method Details

#format_embl ⇒ Object

INTERNAL USE ONLY. YOU SHOULD NOT CALL THIS METHOD. (In any case, it would be difficult to call this method successfully outside its expected context.)

Output the EMBL format string of the sequence.

Used in Bio::Sequence#output.


Returns

String object



# File 'lib/bio/sequence/format.rb', line 98

def format_embl
  prefix = 'FT   '
  indent = prefix + ' ' * 16
  fwidth = 80 - indent.length

  format_features(prefix, indent, fwidth)
end

#format_fasta(header = nil, width = nil) ⇒ Object

INTERNAL USE ONLY. YOU SHOULD NOT CALL THIS METHOD. (In any case, it would be difficult to call this method successfully outside its expected context.)

Output the FASTA format string of the sequence.

Unfortunately, the current implementation of Bio::Sequence cannot use either the header or the width argument, so something needs to be changed…

Currently, this method is used in Bio::Sequence#output like so:

s = Bio::Sequence.new('atgc')
puts s.output(:fasta)                   #=> "> \natgc\n"

Arguments:

  • (optional) header: String (default nil)

  • (optional) width: Fixnum (default nil)

Returns

String object



# File 'lib/bio/sequence/format.rb', line 55

def format_fasta(header = nil, width = nil)
  header ||= "#{@entry_id} #{@definition}"

  ">#{header}\n" +
  if width
    @seq.to_s.gsub(Regexp.new(".{1,#{width}}"), "\\0\n")
  else
    @seq.to_s + "\n"
  end
end
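
The header default and the line wrapping above can be tried outside Bio::Sequence with a small sketch. The fasta_string function and its keyword arguments are hypothetical stand-ins for the method and its instance variables (@seq, @entry_id, @definition); they are not part of BioRuby.

```ruby
# Hypothetical standalone version of the logic in format_fasta.
# entry_id and definition stand in for the instance variables.
def fasta_string(seq, header: nil, width: nil, entry_id: nil, definition: nil)
  # With no explicit header, join the two fields with a single space.
  header ||= "#{entry_id} #{definition}"

  body = if width
           # Split into chunks of at most `width` characters, one per line.
           seq.to_s.gsub(/.{1,#{width}}/, "\\0\n")
         else
           seq.to_s + "\n"
         end

  ">#{header}\n" + body
end

puts fasta_string('atgcatgcat', header: 'seq1 demo', width: 4)
# >seq1 demo
# atgc
# atgc
# at
```

With neither a header nor the two fields set, the header collapses to a single space, which is why the documented call prints "> " on its first line.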

#format_genbank ⇒ Object

INTERNAL USE ONLY. YOU SHOULD NOT CALL THIS METHOD. (In any case, it would be difficult to call this method successfully outside its expected context.)

Output the GenBank format string of the sequence.

Used in Bio::Sequence#output.


Returns

String object



# File 'lib/bio/sequence/format.rb', line 82

def format_genbank
  prefix = ' ' * 5
  indent = prefix + ' ' * 16
  fwidth = 79 - indent.length

  format_features(prefix, indent, fwidth)
end
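
The width arithmetic in format_genbank and format_embl differs only in the prefix and the total line width. The sketch below reproduces that calculation. feature_layout is a hypothetical helper written for illustration, not a BioRuby method, and it makes no claim about what format_features does with the values.

```ruby
# Hypothetical helper (not part of BioRuby): repeats the prefix/indent/fwidth
# arithmetic from format_genbank and format_embl for a given line width.
def feature_layout(prefix, line_width)
  indent = prefix + ' ' * 16
  { indent_length: indent.length, fwidth: line_width - indent.length }
end

embl    = feature_layout('FT   ', 80)  # values taken from format_embl
genbank = feature_layout(' ' * 5, 79)  # values taken from format_genbank

p embl     # {:indent_length=>21, :fwidth=>59} (Ruby 3.4+ prints {indent_length: 21, fwidth: 59})
p genbank  # {:indent_length=>21, :fwidth=>58} (Ruby 3.4+ prints {indent_length: 21, fwidth: 58})
```

Both formats therefore indent continuation lines by 21 characters. The only difference is that EMBL leaves 59 columns for feature text and GenBank leaves 58.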

#format_gff ⇒ Object

Not yet implemented :) Remove the nodoc command after implementation!


Returns

String object

Raises:

  • (NotImplementedError)


# File 'lib/bio/sequence/format.rb', line 70

def format_gff #:nodoc:
  raise NotImplementedError
end
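
A caller that wants to handle the unimplemented GFF output has to rescue NotImplementedError by name. In Ruby it descends from ScriptError rather than StandardError, so a bare rescue does not catch it. The sketch below uses hypothetical stand-ins (GffStub, DemoSequence, try_gff) instead of the real BioRuby classes.

```ruby
# Hypothetical stand-in mirroring the stub above; not the BioRuby module.
module GffStub
  def format_gff
    raise NotImplementedError
  end
end

class DemoSequence
  include GffStub
end

# Rescue the exception class explicitly; a bare `rescue` only covers
# StandardError and would let NotImplementedError propagate.
def try_gff(seq)
  seq.format_gff
rescue NotImplementedError => e
  "unsupported (#{e.class})"
end

puts try_gff(DemoSequence.new)                    # unsupported (NotImplementedError)
puts NotImplementedError.ancestors.include?(StandardError)  # false
```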