Class: Bio::NBRF

Inherits:

Object
DB
Bio::NBRF


Defined in: lib/bio/db/nbrf.rb

Overview

Sequence data class for NBRF/PIR flatfile format.

Constant Summary

DELIMITER = RS = "\n>"

Delimiter of each entry. Bio::FlatFile uses it.

DELIMITER_OVERRUN

(Integer) Excess read size included in DELIMITER.

Instance Attribute Summary

#data ⇒ Object

Raw sequence data of the entry (whitespace and digits are removed lazily by #seq).
#definition ⇒ Object

Returns the description line of the NBRF/PIR formatted data.
#entry_id ⇒ Object (also: #accession)

Returns ID described in the entry.
#entry_overrun ⇒ Object readonly

Piece of the next entry.
#seq_type ⇒ Object

Returns sequence type described in the entry.

Class Method Summary

.to_nbrf(hash) ⇒ Object

Creates NBRF/PIR formatted text.

Instance Method Summary

#aalen ⇒ Object

Returns the length of the protein (amino acid) sequence.
#aaseq ⇒ Object

Returns the protein (amino acid) sequence.
#entry ⇒ Object (also: #to_s)

Returns the stored entry as NBRF/PIR formatted text.
#initialize(str) ⇒ NBRF constructor

Creates a new NBRF object.
#length ⇒ Object

Returns sequence length.
#nalen ⇒ Object

Returns the length of the nucleic acid sequence.
#naseq ⇒ Object

Returns the nucleic acid sequence.
#seq ⇒ Object

Returns sequence data.
#seq_class ⇒ Object

Returns Bio::Sequence::AA, Bio::Sequence::NA, or Bio::Sequence, depending on sequence type.

Methods inherited from DB

#exists?, #fetch, #get, open, #tags

Constructor Details

#initialize(str) ⇒ `NBRF`

Creates a new NBRF object. It stores the comment and sequence information from one entry of the NBRF/PIR format string. If the argument contains more than one entry, only the first entry is used.

# File 'lib/bio/db/nbrf.rb', line 45

def initialize(str)
  str = str.sub(/\A[\r\n]+/, '') # remove first void lines
  line1, line2, rest = str.split(/^/, 3)

  rest = rest.to_s
  rest.sub!(/^>.*/m, '') # remove trailing entries for sure
  @entry_overrun = $&
  rest.sub!(/\*\s*\z/, '') # remove last '*' and "\n"
  @data = rest

  @definition = line2.to_s.chomp
  if /^>?([A-Za-z0-9]{2})\;(.*)/ =~ line1.to_s then
    @seq_type = $1
    @entry_id = $2
  end
end
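
The header parsing done by the constructor can be tried in isolation. The following is a minimal sketch that reuses the regular expression from the source above and runs without BioRuby; the helper name `parse_nbrf_header` and the sample entry text are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper (not part of Bio::NBRF): splits the first two lines
# of an entry the same way the constructor shown above does.
def parse_nbrf_header(str)
  str = str.sub(/\A[\r\n]+/, '')           # drop leading blank lines
  line1, line2, _rest = str.split(/^/, 3)  # header, description, sequence
  result = { definition: line2.to_s.chomp }
  if /^>?([A-Za-z0-9]{2})\;(.*)/ =~ line1.to_s
    result[:seq_type] = $1                 # two-character type code
    result[:entry_id] = $2                 # everything after the ';'
  end
  result
end

# Made-up sample entry, for illustration only.
sample = "\n>P1;EXAMPLE1\nexample protein - made-up organism\nMKV LLA\nGGT*\n"
header = parse_nbrf_header(sample)
puts header.inspect
```

If the first line does not match the pattern, the sketch leaves `:seq_type` and `:entry_id` unset, just as the constructor leaves `@seq_type` and `@entry_id` as nil.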

Instance Attribute Details

#data ⇒ `Object`

Raw sequence data of the entry (whitespace and digits are removed lazily by #seq).




# File 'lib/bio/db/nbrf.rb', line 77

def data
  @data
end

#definition ⇒ `Object`

Returns the description line of the NBRF/PIR formatted data.




# File 'lib/bio/db/nbrf.rb', line 74

def definition
  @definition
end

#entry_id ⇒ `Object` Also known as: accession

Returns ID described in the entry.




# File 'lib/bio/db/nbrf.rb', line 70

def entry_id
  @entry_id
end

#entry_overrun ⇒ `Object` (readonly)

Piece of the next entry, captured when the input string contains more than one entry. Bio::FlatFile uses it.




# File 'lib/bio/db/nbrf.rb', line 80

def entry_overrun
  @entry_overrun
end

#seq_type ⇒ `Object`

Returns sequence type described in the entry.

P1 (protein), F1 (protein fragment)
DL (DNA linear), DC (DNA circular)
RL (RNA linear), RC (RNA circular)
N3 (tRNA), N1 (other functional RNA)




# File 'lib/bio/db/nbrf.rb', line 67

def seq_type
  @seq_type
end

Class Method Details

.to_nbrf(hash) ⇒ `Object`

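Creates NBRF/PIR formatted text from a hash with the keys :seq_type, :seq, :entry_id, :definition and :width. Any key can be omitted: :seq_type is inferred from the class of :seq (falling back to 'XX'), and :width defaults to 70. A nil or false :width disables line wrapping.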

# File 'lib/bio/db/nbrf.rb', line 167

def self.to_nbrf(hash)
  seq_type = hash[:seq_type]
  seq = hash[:seq]
  unless seq_type
    if seq.is_a?(Bio::Sequence::AA) then
      seq_type = 'P1'
    elsif seq.is_a?(Bio::Sequence::NA) then
      seq_type = /u/i =~ seq ? 'RL' : 'DL'
    else
      seq_type = 'XX'
    end
  end
  width = hash.has_key?(:width) ? hash[:width] : 70
  if width then
    seq = seq.to_s + "*"
    seq.gsub!(Regexp.new(".{1,#{width}}"), "\\0\n")
  else
    seq = seq.to_s + "*\n"
  end
  ">#{seq_type};#{hash[:entry_id]}\n#{hash[:definition]}\n#{seq}"
end
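
The line-wrapping step of to_nbrf can be illustrated on a plain string. This is a minimal sketch, not the library method: the helper name `wrap_nbrf_sequence` is hypothetical, and it covers only the wrapping and the terminating `*`, not the header lines.

```ruby
# Hypothetical helper mirroring the wrapping step of to_nbrf shown above.
# A width of nil (or false) disables wrapping, as in the original.
def wrap_nbrf_sequence(seq, width = 70)
  text = seq.to_s + "*"                 # '*' terminates the sequence
  return text + "\n" unless width
  # Break the text into chunks of at most `width` characters,
  # each followed by a newline.
  text.gsub(Regexp.new(".{1,#{width}}"), "\\0\n")
end

wrapped = wrap_nbrf_sequence("ACDEFGHIKL", 4)
puts wrapped
```

Because the `*` is appended before wrapping, it counts toward the line width and can end up on a line of its own when the sequence length is an exact multiple of the width.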

Instance Method Details

#aalen ⇒ `Object`

Returns the length of the protein (amino acid) sequence. Calling aalen on a nucleic acid sequence raises RuntimeError, so use this method only when you know the sequence is a protein.




# File 'lib/bio/db/nbrf.rb', line 157

def aalen
  aaseq.length
end

#aaseq ⇒ `Object`

Returns the protein (amino acid) sequence. Calling aaseq on a nucleic acid sequence raises RuntimeError, so use this method only when you know the sequence is a protein.

# File 'lib/bio/db/nbrf.rb', line 143

def aaseq
  if seq.is_a?(Bio::Sequence::NA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::AA) then
    seq
  else
    Bio::Sequence::AA.new(seq)
  end
end

#entry ⇒ `Object` Also known as: to_s

Returns the stored entry as NBRF/PIR formatted text (same as to_s).




# File 'lib/bio/db/nbrf.rb', line 84

def entry
  @entry = ">#{@seq_type or 'XX'};#{@entry_id}\n#{definition}\n#{@data}*\n"
end

#length ⇒ `Object`

Returns sequence length.




# File 'lib/bio/db/nbrf.rb', line 115

def length
  seq.length
end

#nalen ⇒ `Object`

Returns the length of the nucleic acid sequence. Calling nalen on a protein sequence raises RuntimeError, so use this method only when you know the sequence is a nucleic acid.




# File 'lib/bio/db/nbrf.rb', line 135

def nalen
  naseq.length
end

#naseq ⇒ `Object`

Returns the nucleic acid sequence. Calling naseq on a protein sequence raises RuntimeError, so use this method only when you know the sequence is a nucleic acid.

# File 'lib/bio/db/nbrf.rb', line 122

def naseq
  if seq.is_a?(Bio::Sequence::AA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::NA) then
    seq
  else
    Bio::Sequence::NA.new(seq)
  end
end

#seq ⇒ `Object`

Returns the sequence data as a Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence object, according to the sequence type.

# File 'lib/bio/db/nbrf.rb', line 107

def seq
  unless defined?(@seq)
    @seq = seq_class.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up
  end
  @seq
end
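
The "lazy clean up" in #seq relies on String#tr with an empty replacement string, which deletes every listed character. A minimal sketch on a plain string, using a made-up sample in place of `@data`:

```ruby
# Raw sequence text as it might be stored in @data (made-up sample with
# position numbers, spaces and mixed line endings).
raw = "  1 MKVLL AAGGT\r\n 11 PLQRS\n"

# Same character set as in #seq: delete spaces, tabs, CR, LF and digits.
cleaned = raw.tr(" \t\r\n0-9", '')
puts cleaned
```

In the method above, the cleaned string is then wrapped in the class returned by #seq_class and cached in `@seq`, so the cleanup runs at most once per object.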

#seq_class ⇒ `Object`

Returns Bio::Sequence::AA, Bio::Sequence::NA, or Bio::Sequence, depending on sequence type.

# File 'lib/bio/db/nbrf.rb', line 91

def seq_class
  case @seq_type
  when /[PF]1/
    # protein
    Sequence::AA
  when /[DR][LC]/, /N[13]/
    # nucleic
    Sequence::NA
  else
    Sequence
  end
end
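
The mapping from type code to sequence class can be sketched without BioRuby by returning symbols in place of the Bio::Sequence classes. The helper name `nbrf_sequence_kind` and the symbols are hypothetical stand-ins; the patterns are the ones from the source above.

```ruby
# Hypothetical stand-in for #seq_class that returns a symbol instead of a
# Bio::Sequence class, so it runs without BioRuby.
def nbrf_sequence_kind(seq_type)
  case seq_type
  when /[PF]1/             then :protein   # P1, F1
  when /[DR][LC]/, /N[13]/ then :nucleic   # DL, DC, RL, RC, N1, N3
  else                          :generic   # unknown code or nil
  end
end

%w[P1 F1 DL RC N3 XX].each do |code|
  puts "#{code}: #{nbrf_sequence_kind(code)}"
end
```

A nil type, as left by the constructor when the header line does not match, falls through to the generic case because a Regexp never matches nil in a `case` expression.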
