Class: Bio::Blat::Report::Hit
Overview
Hit class for the BLAT result parser. Similar to Bio::Blast::Report::Hit but lacks many of its methods. A Hit object may contain several Bio::Blat::Report::SegmentPair objects.
Instance Attribute Summary
-
#data ⇒ Object
readonly
Raw data of the hit.
Instance Method Summary
-
#block_count ⇒ Object
Number of blocks (exons, segment pairs).
-
#block_sizes ⇒ Object
Sizes of all blocks (exons, segment pairs).
-
#blocks ⇒ Object
(also: #exons, #hsps)
Returns the blocks (exons, segment pairs) of the hit.
-
#each(&x) ⇒ Object
Iterates over each block (exon, segment pair) of the hit.
-
#initialize(str) ⇒ Hit
constructor
Creates a new Hit object from a piece of BLAT result text.
-
#match ⇒ Object
Number of matching nucleotides.
-
#milli_bad ⇒ Object
Calculates the pslCalcMilliBad value defined in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
-
#mismatch ⇒ Object
Number of mismatching nucleotides.
-
#n_s ⇒ Object
“N’s”: number of ‘N’ bases.
-
#percent_identity ⇒ Object
Calculates the percent identity compatible with the BLAT web server as described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
-
#protein? ⇒ Boolean
Returns true when the output data comes from a protein query.
-
#query ⇒ Object
Returns sequence information for the query.
-
#query_def ⇒ Object
(also: #query_id)
Returns the name of the query sequence.
-
#query_len ⇒ Object
Returns the length of the query sequence.
-
#rep_match ⇒ Object
“rep. match”: number of bases that match but are part of repeats.
-
#score ⇒ Object
Calculates the score compatible with the BLAT web server as described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
-
#strand ⇒ Object
Returns strand information of the hit.
-
#target ⇒ Object
Returns sequence information for the target (hit).
-
#target_def ⇒ Object
(also: #target_id, #definition)
Returns the name of the target (subject) sequence.
-
#target_len ⇒ Object
(also: #len)
Returns the length of the target (subject) sequence.
Constructor Details
#initialize(str) ⇒ Hit
Creates a new Hit object from a piece of BLAT result text. It is designed to be called internally by a Bio::Blat::Report object; users should not call it directly.
# File 'lib/bio/appl/blat/report.rb', line 293
def initialize(str)
  @data = str.chomp.split(/\t/)
end
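As a rough illustration of what the constructor stores, the sketch below splits one PSL-style record the same way. The sample record, the constant `SAMPLE_PSL_LINE`, and the helper `split_psl_line` are invented for this example and are not part of BioRuby; the column meanings in the comments assume the standard 21-column PSL layout.

```ruby
# Hypothetical PSL record with 21 tab-separated columns (values are invented).
SAMPLE_PSL_LINE = %w[
  95 5 0 0 1 10 1 20 + query1 120 0 110
  chr1 5000 1000 1120 2 50,50, 0,60, 1000,1070,
].join("\t") + "\n"

# Illustrative helper (not a BioRuby method): the same split the
# constructor applies to its argument.
def split_psl_line(str)
  str.chomp.split(/\t/)
end

data = split_psl_line(SAMPLE_PSL_LINE)
puts data.size        # number of columns
puts data[0].to_i     # match count (column 0)
puts data[8]          # strand (column 8)
puts data[17].to_i    # block count (column 17)
```

Every element stays a string after the split, which is why the accessor methods in this class call `to_i` on the fields they read.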
Instance Attribute Details
#data ⇒ Object (readonly)
Raw data of the hit. (Note that position numbers are stored as they appear in the output; 1 is not added to them.)
# File 'lib/bio/appl/blat/report.rb', line 299
def data
  @data
end
Instance Method Details
#block_count ⇒ Object
Number of blocks (exons, segment pairs).
# File 'lib/bio/appl/blat/report.rb', line 350
def block_count; @data[17].to_i; end
#block_sizes ⇒ Object
Sizes of all blocks (exons, segment pairs). Returns an array of integers.
# File 'lib/bio/appl/blat/report.rb', line 354
def block_sizes
  unless defined?(@block_sizes) then
    @block_sizes = split_comma(@data[18]).collect { |x| x.to_i }
  end
  @block_sizes
end
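The private `split_comma` helper is not shown in this section. The sketch below assumes it drops the trailing comma that PSL list fields carry and then splits on commas; `split_comma_list` and `parse_block_sizes` are hypothetical stand-ins written for this example, not BioRuby methods.

```ruby
# Assumed behaviour of the private helper: strip trailing commas,
# then split on commas (tolerating surrounding whitespace).
def split_comma_list(str)
  str.to_s.sub(/\s*,+\s*\z/, '').split(/\s*,\s*/)
end

# Same conversion as block_sizes, applied to a raw field value.
def parse_block_sizes(field)
  split_comma_list(field).collect { |x| x.to_i }
end

p parse_block_sizes("50,50,")   # => [50, 50]
p parse_block_sizes(nil)        # => []
```

Calling `to_s` first means a missing field yields an empty array instead of raising an error.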
#blocks ⇒ Object Also known as: exons, hsps
Returns the blocks (exons, segment pairs) of the hit as an array of Bio::Blat::Report::SegmentPair objects.
# File 'lib/bio/appl/blat/report.rb', line 363
def blocks
  unless defined?(@blocks)
    bs = block_sizes
    qst = query.starts
    tst = target.starts
    qseqs = query.seqs
    tseqs = target.seqs
    pflag = self.protein?
    @blocks = (0...block_count).collect do |i|
      SegmentPair.new(query.size, target.size, strand, bs[i],
                      qst[i], tst[i], qseqs[i], tseqs[i],
                      pflag)
    end
  end
  @blocks
end
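The method above walks several parallel arrays by index and builds one object per block. A minimal sketch of that pattern, using a made-up `BlockSketch` struct in place of SegmentPair and invented coordinates, might look like this:

```ruby
# Hypothetical stand-in for SegmentPair, holding only three values.
BlockSketch = Struct.new(:size, :query_start, :target_start)

# Build one record per block from index-aligned arrays.
def build_blocks(block_count, sizes, query_starts, target_starts)
  (0...block_count).collect do |i|
    BlockSketch.new(sizes[i], query_starts[i], target_starts[i])
  end
end

blocks = build_blocks(2, [50, 50], [0, 60], [1000, 1070])
blocks.each do |b|
  puts "#{b.size} bases: query #{b.query_start}, target #{b.target_start}"
end
```

Indexing by the block count rather than zipping the arrays keeps the result length tied to the blockCount column, even if one of the list fields is shorter than expected (the missing entries would then be nil).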
#each(&x) ⇒ Object
Iterates over each block (exon, segment pair) of the hit. Yields a Bio::Blat::Report::SegmentPair object.
# File 'lib/bio/appl/blat/report.rb', line 404
def each(&x) #:yields: segmentpair
  exons.each(&x)
end
#match ⇒ Object
Number of matching nucleotides.
# File 'lib/bio/appl/blat/report.rb', line 332
def match; @data[0].to_i; end
#milli_bad ⇒ Object
Calculates the pslCalcMilliBad value, using the algorithm given in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
# File 'lib/bio/appl/blat/report.rb', line 418
def milli_bad
  w = (self.protein? ? 3 : 1)
  qalen = w * (self.query.end - self.query.start)
  talen = self.target.end - self.target.start
  alen = (if qalen < talen then qalen; else talen; end)
  return 0 if alen <= 0
  d = qalen - talen
  d = 0 if d < 0
  total = w * (self.match + self.rep_match + self.mismatch)
  return 0 if total == 0
  return (1000 * (self.mismatch * w + self.query.gap_count +
                    (3 * Math.log(1 + d)).round) / total)
end
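To make the arithmetic easier to follow, here is a standalone sketch of the same calculation that takes plain integers instead of a Hit object. The method name `milli_bad_sketch`, its keyword arguments, and the sample values are assumptions made for illustration only.

```ruby
# Standalone restatement of the calculation shown above (illustrative only).
def milli_bad_sketch(match:, rep_match:, mismatch:, q_start:, q_end:,
                     t_start:, t_end:, q_gap_count:, protein: false)
  w = protein ? 3 : 1                  # protein residues span 3 bases
  qalen = w * (q_end - q_start)        # aligned span on the query
  talen = t_end - t_start              # aligned span on the target
  return 0 if [qalen, talen].min <= 0
  d = [qalen - talen, 0].max           # only a longer query span is penalised
  total = w * (match + rep_match + mismatch)
  return 0 if total == 0
  # Integer division, as in the method above.
  1000 * (mismatch * w + q_gap_count + (3 * Math.log(1 + d)).round) / total
end

bad = milli_bad_sketch(match: 95, rep_match: 0, mismatch: 5,
                       q_start: 0, q_end: 110,
                       t_start: 1000, t_end: 1120, q_gap_count: 1)
puts bad                # => 60
puts 100.0 - bad * 0.1  # corresponding percent identity
```

With these numbers the query span (110) is shorter than the target span (120), so the size-difference term is zero and the result is 1000 * (5 + 1) / 100.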
#mismatch ⇒ Object
Number of mismatching nucleotides.
# File 'lib/bio/appl/blat/report.rb', line 334
def mismatch; @data[1].to_i; end
#n_s ⇒ Object
“N’s”. Number of ‘N’ bases.
# File 'lib/bio/appl/blat/report.rb', line 342
def n_s; @data[3].to_i; end
#percent_identity ⇒ Object
Calculates the percent identity compatible with the BLAT web server, using the algorithm given in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
# File 'lib/bio/appl/blat/report.rb', line 438
def percent_identity
  100.0 - self.milli_bad * 0.1
end
#protein? ⇒ Boolean
Returns true when the output data comes from a protein query and false for a nucleotide query. Returns nil if this cannot be determined.
The algorithm is taken from the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
Note: it seems to return true only for a protein query against a nucleotide database (blat options: -q=prot -t=dnax).
# File 'lib/bio/appl/blat/report.rb', line 451
def protein?
  return nil if self.block_sizes.empty?
  case self.strand[1,1]
  when '+'
    if self.target.end == self.target.starts[-1] +
        3 * self.block_sizes[-1] then
      true
    else
      false
    end
  when '-'
    if self.target.start == self.target.size -
        self.target.starts[-1] - 3 * self.block_sizes[-1] then
      true
    else
      false
    end
  else
    nil
  end
end
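The test above asks whether the last block, measured in target bases, is three times its stated size, which is what one would expect when block sizes count amino acids. A standalone sketch with plain values follows; the method name `protein_psl_sketch?`, its positional parameters, and the coordinates are made up for illustration.

```ruby
# Illustrative restatement of the check above using plain values.
def protein_psl_sketch?(strand, t_size, t_start, t_end, t_starts, block_sizes)
  return nil if block_sizes.empty?
  # End of the last block if each unit of block size covers 3 target bases.
  last_end = t_starts[-1] + 3 * block_sizes[-1]
  case strand[1, 1]                 # second strand character, if present
  when '+' then t_end == last_end
  when '-' then t_start == t_size - last_end
  else nil
  end
end

# One block of 40 residues covering 120 target bases on the '+' strand.
p protein_psl_sketch?('++', 5000, 1000, 1120, [1000], [40])
# Same span, but the block size already counts bases.
p protein_psl_sketch?('++', 5000, 1000, 1120, [1000], [120])
# Single-character strand: the second character is missing.
p protein_psl_sketch?('+', 5000, 1000, 1120, [1000], [120])
```

The third call shows why the method can return nil: with a one-character strand field there is no second character to inspect, so neither branch applies.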
#query ⇒ Object
Returns sequence information for the query as a Bio::Blat::Report::SeqDesc object. This would be a Bio::Blat-specific method.
# File 'lib/bio/appl/blat/report.rb', line 310
def query
  unless defined?(@query)
    d = @data
    @query = SeqDesc.new(d[4], d[5], d[9], d[10], d[11], d[12],
                         split_comma(d[19]), split_comma(d[21]))
  end
  @query
end
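The column indices passed to SeqDesc.new can be hard to read in isolation. The sketch below maps the same query-side indices to named hash keys, assuming the standard PSL column order; the method `query_fields_sketch`, the key names, and the sample row are hypothetical and do not reflect the actual SeqDesc interface.

```ruby
# Illustrative mapping of query-side columns to names (not the SeqDesc API).
def query_fields_sketch(d)
  {
    gap_count: d[4].to_i,    # qNumInsert
    gap_bases: d[5].to_i,    # qBaseInsert
    name:      d[9],         # qName
    size:      d[10].to_i,   # qSize
    start_pos: d[11].to_i,   # qStart
    end_pos:   d[12].to_i,   # qEnd
    starts:    d[19].to_s.split(',').map { |x| x.to_i }  # qStarts
  }
end

# Invented 21-column row, already split on tabs.
row = %w[95 5 0 0 1 10 1 20 + query1 120 0 110
         chr1 5000 1000 1120 2 50,50, 0,60, 1000,1070,]
q = query_fields_sketch(row)
puts q[:name]
puts q[:end_pos] - q[:start_pos]   # aligned span on the query
p q[:starts]
```

Column 21, which the method above also passes along, only exists in output that includes sequences, so it is left out of this sketch.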
#query_def ⇒ Object Also known as: query_id
Returns the name of the query sequence.
# File 'lib/bio/appl/blat/report.rb', line 390
def query_def; query.name; end
#query_len ⇒ Object
Returns the length of the query sequence.
# File 'lib/bio/appl/blat/report.rb', line 387
def query_len; query.size; end
#rep_match ⇒ Object
“rep. match”. Number of bases that match but are part of repeats. Note that the current version of BLAT always sets this to 0.
# File 'lib/bio/appl/blat/report.rb', line 339
def rep_match; @data[2].to_i; end
#score ⇒ Object
Calculates the score compatible with the BLAT web server, using the algorithm given in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
# File 'lib/bio/appl/blat/report.rb', line 479
def score
  w = (self.protein? ? 3 : 1)
  w * (self.match + (self.rep_match >> 1)) -
    w * self.mismatch - self.query.gap_count - self.target.gap_count
end
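A worked example may help here. The sketch below restates the formula with plain integers; `blat_score_sketch`, its keyword arguments, and the sample counts are assumptions made for this illustration.

```ruby
# Standalone restatement of the score formula shown above (illustrative only).
def blat_score_sketch(match:, rep_match:, mismatch:,
                      q_gap_count:, t_gap_count:, protein: false)
  w = protein ? 3 : 1
  # Repeat matches count half (integer shift); each gap opening costs 1.
  w * (match + (rep_match >> 1)) - w * mismatch - q_gap_count - t_gap_count
end

puts blat_score_sketch(match: 95, rep_match: 0, mismatch: 5,
                       q_gap_count: 1, t_gap_count: 1)   # => 88
puts blat_score_sketch(match: 95, rep_match: 0, mismatch: 5,
                       q_gap_count: 1, t_gap_count: 1, protein: true)
```

In the protein case matches and mismatches are weighted by 3 while the gap counts are not, so the second call gives a different score for the same counts.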
#strand ⇒ Object
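Returns strand information of the hit. Returns ‘+’ or ‘-’. For translated searches the field can hold two characters (query strand followed by target strand), which is what #protein? inspects. This would be a Bio::Blat-specific method.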
# File 'lib/bio/appl/blat/report.rb', line 347
def strand; @data[8]; end
#target ⇒ Object
Returns sequence information for the target (hit) as a Bio::Blat::Report::SeqDesc object. This would be a Bio::Blat-specific method.
# File 'lib/bio/appl/blat/report.rb', line 322
def target
  unless defined?(@target)
    d = @data
    @target = SeqDesc.new(d[6], d[7], d[13], d[14], d[15], d[16],
                          split_comma(d[20]), split_comma(d[22]))
  end
  @target
end
#target_def ⇒ Object Also known as: target_id, definition
Returns the name of the target (subject) sequence.
# File 'lib/bio/appl/blat/report.rb', line 398
def target_def; target.name; end
#target_len ⇒ Object Also known as: len
Returns the length of the target (subject) sequence.
# File 'lib/bio/appl/blat/report.rb', line 394
def target_len; target.size; end