Class: Bio::Locations
- Includes:
- Enumerable
- Defined in:
- lib/bio/location.rb
Overview
Description
The Bio::Locations class is a container for Bio::Location objects: creating a Bio::Locations object from a GenBank-style position string builds an array of Bio::Location objects.
Usage
locations = Bio::Locations.new('join(complement(500..550), 600..625)')
locations.each do |loc|
  puts "class = " + loc.class.to_s
  puts "range = #{loc.from}..#{loc.to} (strand = #{loc.strand})"
end
# Output would be:
# class = Bio::Location
# range = 500..550 (strand = -1)
# class = Bio::Location
# range = 600..625 (strand = 1)

# For the following three location strings, print the span and range
['one-of(898,900)..983',
 'one-of(5971..6308,5971..6309)',
 '8050..one-of(10731,10758,10905,11242)'].each do |loc|
  location = Bio::Locations.new(loc)
  puts location.span
  puts location.range
end
GenBank location descriptor classification
Definition of the position notation of the GenBank location format
According to the GenBank manual ‘gbrel.txt’, position notations were classified into 10 patterns - (A) to (J).
3.4.12.2 Feature Location
The second column of the feature descriptor line designates the
location of the feature in the sequence. The location descriptor
begins at position 22. Several conventions are used to indicate
sequence location.
Base numbers in location descriptors refer to numbering in the entry,
which is not necessarily the same as the numbering scheme used in the
published report. The first base in the presented sequence is numbered
base 1. Sequences are presented in the 5 to 3 direction.
Location descriptors can be one of the following:
(A) 1. A single base;
(B) 2. A contiguous span of bases;
(C) 3. A site between two bases;
(D) 4. A single base chosen from a range of bases;
(E) 5. A single base chosen from among two or more specified bases;
(F) 6. A joining of sequence spans;
(G) 7. A reference to an entry other than the one to which the feature
belongs (i.e., a remote entry), followed by a location descriptor
referring to the remote sequence;
(H) 8. A literal sequence (a string of bases enclosed in quotation marks).
Description commented with pattern IDs.
(C) A site between two residues, such as an endonuclease cleavage site, is
indicated by listing the two bases separated by a carat (e.g., 23^24).
(D) A single residue chosen from a range of residues is indicated by the
number of the first and last bases in the range separated by a single
period (e.g., 23.79). The symbols < and > indicate that the end point
(I) of the range is beyond the specified base number.
(B) A contiguous span of bases is indicated by the number of the first and
last bases in the range separated by two periods (e.g., 23..79). The
(I) symbols < and > indicate that the end point of the range is beyond the
specified base number. Starting and ending positions can be indicated
by base number or by one of the operators described below.
Operators are prefixes that specify what must be done to the indicated
sequence to locate the feature. The following are the operators
available, along with their most common format and a description.
(J) complement (location): The feature is complementary to the location
indicated. Complementary strands are read 5' to 3'.
(F) join (location, location, .. location): The indicated elements should
be placed end to end to form one contiguous sequence.
(F) order (location, location, .. location): The elements are found in the
specified order in the 5' to 3' direction, but nothing is implied about
the rationality of joining them.
(F) group (location, location, .. location): The elements are related and
should be grouped together, but no order is implied.
(E) one-of (location, location, .. location): The element can be any one,
but only one, of the items listed.
Reduction strategy of the position notations
- (A) Location n
- (B) Location n..m
- (C) Location n^m
- (D) (n.m) => Location n
- (E)
  - one-of(n,m,..) => Location n
  - one-of(n..m,..) => Location n..m
- (F)
  - order(loc,loc,..) => join(loc, loc,..)
  - group(loc,loc,..) => join(loc, loc,..)
  - join(loc,loc,..) => Sequence
- (G) ID:loc => Location with ID
- (H) “atgc” => Location only with Sequence
- (I)
  - <n => Location n with lt flag
  - >n => Location n with gt flag
  - <n..m => Location n..m with lt flag
  - n..>m => Location n..m with gt flag
  - <n..>m => Location n..m with lt, gt flag
- (J) complement(loc) => Sequence
- (K) replace(loc, str) => Location with replacement Sequence
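The reductions for patterns (D) and (E) can be illustrated with plain string substitution. The following is only a minimal sketch under stated assumptions: `reduce_position` is a hypothetical helper written for this example, not a BioRuby method, and it does not claim to reproduce what the library's internal preprocessing does.

```ruby
# Hypothetical helper (not part of BioRuby): reduces patterns (D) and (E)
# of a GenBank-style position string by plain string substitution.
def reduce_position(str)
  reduced = str.dup
  # (E) one-of(a,b,..) => keep only the first alternative
  reduced.gsub!(/one-of\(([^,()]+),[^()]*\)/, '\1')
  # (D) (n.m) => keep only the first base n; the lookbehind skips
  # parentheses that belong to an operator such as complement(...)
  reduced.gsub!(/(?<![\w-])\((\d+)\.(\d+)\)/, '\1')
  reduced
end

puts reduce_position('one-of(898,900)..983')
puts reduce_position('one-of(5971..6308,5971..6309)')
puts reduce_position('8050..one-of(10731,10758,10905,11242)')
puts reduce_position('(23.79)..100')
```

Each reduced string is an ordinary n..m span, which is why the three one-of examples in the Usage section can still report a span and a range.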
Instance Attribute Summary
- #locations ⇒ Object
  (Array) An Array of Bio::Location objects.
- #operator ⇒ Object
  (Symbol or nil) Operator.
Instance Method Summary
- #==(other) ⇒ Object
  Returns true if other is equal to self.
- #[](n) ⇒ Object
  Returns the nth Bio::Location object.
- #absolute(n, type = nil) ⇒ Object
  Converts a relative position in the locus to a position in the whole of the DNA sequence.
- #each ⇒ Object
  Iterates over each Bio::Location object.
- #equals?(other) ⇒ Boolean
  Evaluates equality of Bio::Locations objects.
- #first ⇒ Object
  Returns the first Bio::Location object.
- #initialize(position) ⇒ Locations
  constructor
  Parses a GenBank style position string and returns a Bio::Locations object, which contains a list of Bio::Location objects.
- #last ⇒ Object
  Returns the last Bio::Location object.
- #length ⇒ Object
  (also: #size)
  Returns the length of the spliced RNA.
- #range ⇒ Object
  Similar to span, but returns a Range object min..max.
- #relative(n, type = nil) ⇒ Object
  Converts an absolute position in the whole of the DNA sequence to a relative position in the locus.
- #span ⇒ Object
  Returns an Array containing the overall min and max position [min, max] of this Bio::Locations object.
- #to_s ⇒ Object
  String representation.
Constructor Details
#initialize(position) ⇒ Locations
# File 'lib/bio/location.rb', line 346
def initialize(position)
  @operator = nil
  if position.is_a? Array
    @locations = position
  else
    position = gbl_cleanup(position)      # preprocessing
    @locations = gbl_pos2loc(position)    # create an Array of Bio::Location objects
  end
end
Instance Attribute Details
#locations ⇒ Object
(Array) An Array of Bio::Location objects
# File 'lib/bio/location.rb', line 357
def locations
  @locations
end
#operator ⇒ Object
(Symbol or nil) Operator. nil (means :join), :order, or :group (obsolete).
# File 'lib/bio/location.rb', line 361
def operator
  @operator
end
Instance Method Details
#==(other) ⇒ Object
Returns true if other is equal to self. Otherwise, returns false.
Arguments:
- (required) other: any object
Returns:
- true or false
# File 'lib/bio/location.rb', line 381
def ==(other)
  return true if super(other)
  return false unless other.instance_of?(self.class)
  if self.locations == other.locations and
      self.operator == other.operator then
    true
  else
    false
  end
end
#[](n) ⇒ Object
Returns nth Bio::Location object.
# File 'lib/bio/location.rb', line 400
def [](n)
  @locations[n]
end
#absolute(n, type = nil) ⇒ Object
Converts relative position in the locus to position in the whole of the DNA sequence.
This method can be used, for example, to relate positions in RNA to those in a DNA sequence. With the optional ‘:aa’ flag, the given position is treated as an amino acid position, and the absolute position of the first nucleotide of its codon is returned.
loc = Bio::Locations.new('complement(12838..13533)')
puts loc.absolute(10) # => 13524
puts loc.absolute(10, :aa) # => 13506
Arguments:
- (required) position: nucleotide position within the locus
- :aa: flag to be used if position is an amino acid position rather than a nucleotide position
Returns:
- position within the whole of the sequence
# File 'lib/bio/location.rb', line 490
def absolute(n, type = nil)
  case type
  when :location
    ;
  when :aa
    n = (n - 1) * 3 + 1
    rel2abs(n)
  else
    rel2abs(n)
  end
end
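The arithmetic behind the documented example values can be checked by hand for a single contiguous span. The sketch below is an illustration only: `rel_to_abs` and `codon_start` are hypothetical helpers written for this example, and the strand handling is an assumption that reproduces the documented results, not a copy of the library's rel2abs.

```ruby
# Hypothetical helper (not part of BioRuby): maps a 1-based relative
# position to an absolute position within ONE contiguous span.
def rel_to_abs(from, to, strand, n)
  return nil if n < 1 || n > to - from + 1
  # On the complement strand, relative position 1 is the highest base.
  strand == -1 ? to - (n - 1) : from + (n - 1)
end

# Same conversion as the :aa branch above: the first nucleotide
# (relative, 1-based) of the codon for amino acid number aa.
def codon_start(aa)
  (aa - 1) * 3 + 1
end

puts rel_to_abs(12838, 13533, -1, 10)               # 13524
puts rel_to_abs(12838, 13533, -1, codon_start(10))  # 13506
```

Amino acid 10 starts at relative nucleotide 28, and 13533 - 27 = 13506, which matches the `:aa` example.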
#each ⇒ Object
Iterates on each Bio::Location object.
# File 'lib/bio/location.rb', line 393
def each
  @locations.each do |x|
    yield(x)
  end
end
#equals?(other) ⇒ Boolean
Evaluate equality of Bio::Locations object.
# File 'lib/bio/location.rb', line 364
def equals?(other)
  if ! other.kind_of?(Bio::Locations)
    return nil
  end
  if self.sort == other.sort
    return true
  else
    return false
  end
end
#first ⇒ Object
Returns first Bio::Location object.
# File 'lib/bio/location.rb', line 405
def first
  @locations.first
end
#last ⇒ Object
Returns last Bio::Location object.
# File 'lib/bio/location.rb', line 410
def last
  @locations.last
end
#length ⇒ Object Also known as: size
Returns the length of the spliced RNA.
# File 'lib/bio/location.rb', line 429
def length
  len = 0
  @locations.each do |x|
    if x.sequence
      len += x.sequence.size
    else
      len += (x.to - x.from + 1)
    end
  end
  len
end
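As a rough illustration of the summation above, the following standalone sketch uses a hypothetical `Span` struct in place of Bio::Location. The struct and the `spliced_length` helper are assumptions made for this example, not BioRuby API.

```ruby
# Hypothetical stand-in for Bio::Location, carrying only the fields
# that the length calculation reads.
Span = Struct.new(:from, :to, :sequence)

# Sums the inclusive width of each span; a span that carries a
# replacement sequence contributes the size of that sequence instead.
def spliced_length(spans)
  spans.sum { |s| s.sequence ? s.sequence.size : s.to - s.from + 1 }
end

spans = [Span.new(500, 550), Span.new(600, 625)]
puts spliced_length(spans)    # 77

spans << Span.new(700, 702, 'atgca')
puts spliced_length(spans)    # 82
```

The two spans from 'join(complement(500..550), 600..625)' contribute 51 and 26 bases, so the spliced length is 77 even though the overall span covers 126 positions.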
#range ⇒ Object
Similar to span, but returns a Range object min..max
# File 'lib/bio/location.rb', line 423
def range
  min, max = span
  min..max
end
#relative(n, type = nil) ⇒ Object
Converts absolute position in the whole of the DNA sequence to relative position in the locus.
This method can be used, for example, to relate positions in a DNA sequence to those in RNA. With the optional ‘:aa’ flag, the position of the associated amino acid is returned rather than that of the nucleotide.
loc = Bio::Locations.new('complement(12838..13533)')
puts loc.relative(13524) # => 10
puts loc.relative(13506, :aa) # => 10
Arguments:
- (required) position: nucleotide position within the whole of the sequence
- :aa: flag that lets the method return the position in amino acid coordinates
Returns:
- position within the location
# File 'lib/bio/location.rb', line 458
def relative(n, type = nil)
  case type
  when :location
    ;
  when :aa
    if n = abs2rel(n)
      (n - 1) / 3 + 1
    else
      nil
    end
  else
    abs2rel(n)
  end
end
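The inverse conversion can be worked through for a single contiguous span in the same way. This is a sketch under stated assumptions: `abs_to_rel` and `residue_index` are hypothetical helpers for illustration, and the strand handling is assumed rather than taken from the library's abs2rel.

```ruby
# Hypothetical helper (not part of BioRuby): maps an absolute position
# to a 1-based relative position within ONE contiguous span.
def abs_to_rel(from, to, strand, pos)
  return nil unless (from..to).cover?(pos)
  strand == -1 ? to - pos + 1 : pos - from + 1
end

# Same integer arithmetic as the :aa branch above; nil is passed through
# when the position lies outside the span.
def residue_index(rel)
  rel && (rel - 1) / 3 + 1
end

rel = abs_to_rel(12838, 13533, -1, 13506)
puts rel                  # 28
puts residue_index(rel)   # 10
```

Integer division means that all three nucleotides of a codon map to the same amino acid index: relative positions 28, 29 and 30 all give 10.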
#span ⇒ Object
Returns an Array containing overall min and max position [min, max] of this Bio::Locations object.
# File 'lib/bio/location.rb', line 416
def span
  span_min = @locations.min { |a,b| a.from <=> b.from }
  span_max = @locations.max { |a,b| a.to <=> b.to }
  return span_min.from, span_max.to
end
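The relationship between span and range can be seen in a small standalone sketch. The `Segment` struct and `overall_span` helper are hypothetical stand-ins introduced for this example, not BioRuby API.

```ruby
# Hypothetical stand-in for Bio::Location with only from and to.
Segment = Struct.new(:from, :to)

# Overall [min, max]: smallest start and largest end across all
# segments, regardless of the order in which they are listed.
def overall_span(segments)
  [segments.map(&:from).min, segments.map(&:to).max]
end

segments = [Segment.new(600, 625), Segment.new(500, 550)]
min, max = overall_span(segments)
p [min, max]    # [500, 625]
p(min..max)     # 500..625
```

The result includes the gap between the segments (551..599), so it describes the extent of the feature rather than the bases it actually covers.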
#to_s ⇒ Object
String representation.
Note: In some cases, it cannot distinguish “complement(join(…))” from “join(complement(..))”, or “complement(order(…))” from “order(complement(..))”.
Returns:
- String
# File 'lib/bio/location.rb', line 511
def to_s
  return '' if @locations.empty?
  complement_join = false
  locs = @locations
  if locs.size >= 2 and locs.inject(true) do |flag, loc|
      # check if each location is complement
      (flag && (loc.strand == -1) && !loc.xref_id)
    end and locs.inject(locs[0].from) do |pos, loc|
      if pos then
        (pos >= loc.from) ? loc.from : false
      else
        false
      end
    end then
    locs = locs.reverse
    complement_join = true
  end
  locs = locs.collect do |loc|
    lt = loc.lt ? '<' : ''
    gt = loc.gt ? '>' : ''
    str = if loc.from == loc.to then
            "#{lt}#{gt}#{loc.from.to_i}"
          elsif loc.carat then
            "#{lt}#{loc.from.to_i}^#{gt}#{loc.to.to_i}"
          else
            "#{lt}#{loc.from.to_i}..#{gt}#{loc.to.to_i}"
          end
    if loc.xref_id and !loc.xref_id.empty? then
      str = "#{loc.xref_id}:#{str}"
    end
    if loc.strand == -1 and !complement_join then
      str = "complement(#{str})"
    end
    if loc.sequence then
      str = "replace(#{str},\"#{loc.sequence}\")"
    end
    str
  end
  if locs.size >= 2 then
    op = (self.operator || 'join').to_s
    result = "#{op}(#{locs.join(',')})"
  else
    result = locs[0]
  end
  if complement_join then
    result = "complement(#{result})"
  end
  result
end
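The per-location formatting step can be sketched in isolation. This is a simplified illustration only: `Piece` and `format_piece` are hypothetical names, and the sketch leaves out the carat, xref_id, replace and complement-of-join handling shown in the listing above.

```ruby
# Hypothetical stand-in for Bio::Location with the fields used here.
Piece = Struct.new(:from, :to, :strand, :lt, :gt)

# Formats one location: optional < and > markers, a single base or an
# n..m span, wrapped in complement(...) on the reverse strand.
def format_piece(piece)
  lt = piece.lt ? '<' : ''
  gt = piece.gt ? '>' : ''
  str = if piece.from == piece.to
          "#{lt}#{gt}#{piece.from}"
        else
          "#{lt}#{piece.from}..#{gt}#{piece.to}"
        end
  piece.strand == -1 ? "complement(#{str})" : str
end

pieces = [Piece.new(500, 550, -1, false, false),
          Piece.new(600, 625, 1, false, false)]
puts format_piece(pieces[0])                              # complement(500..550)
puts format_piece(Piece.new(1, 100, 1, true, true))       # <1..>100
puts "join(#{pieces.map { |x| format_piece(x) }.join(',')})"
```

The last line prints join(complement(500..550),600..625), the same location as the Usage example but without the space after the comma.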