Class: Bio::Blat::Report::Hit
Overview
Hit class for the BLAT result parser. It is similar to Bio::Blast::Report::Hit but lacks many of its methods. A Hit object may contain several Bio::Blat::Report::SegmentPair objects.
Instance Attribute Summary
-
#data ⇒ Object
readonly
Raw data of the hit.
Instance Method Summary
-
#block_count ⇒ Object
Number of blocks (exons, segment pairs).
-
#block_sizes ⇒ Object
Sizes of all blocks (exons, segment pairs).
-
#blocks ⇒ Object
(also: #exons, #hsps)
Returns blocks (exons, segment pairs) of the hit.
-
#each(&x) ⇒ Object
Iterates over each block (exon, segment pair) of the hit.
-
#initialize(str) ⇒ Hit
constructor
Creates a new Hit object from a piece of BLAT result text.
-
#match ⇒ Object
Number of matching nucleotides.
-
#milli_bad ⇒ Object
Calculates the pslCalcMilliBad value defined in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
-
#mismatch ⇒ Object
Number of mismatched nucleotides.
-
#n_s ⇒ Object
“N’s”. Number of ‘N’ bases.
-
#percent_identity ⇒ Object
Calculates the percent identity compatible with the BLAT web server as described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
-
#protein? ⇒ Boolean
Returns true when the output data comes from a protein query.
-
#query ⇒ Object
Returns sequence information for the query.
-
#query_def ⇒ Object
(also: #query_id)
Returns the name of the query sequence.
-
#query_len ⇒ Object
Returns the length of the query sequence.
-
#rep_match ⇒ Object
“rep. match”. Number of bases that match but are part of repeats.
-
#score ⇒ Object
Calculates the score compatible with the BLAT web server as described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
-
#strand ⇒ Object
Returns strand information of the hit.
-
#target ⇒ Object
Returns sequence information for the target (hit).
-
#target_def ⇒ Object
(also: #target_id, #definition)
Returns the name of the target (subject) sequence.
-
#target_len ⇒ Object
(also: #len)
Returns the length of the target (subject) sequence.
Constructor Details
#initialize(str) ⇒ Hit
Creates a new Hit object from a piece of BLAT result text. It is designed to be called internally by a Bio::Blat::Report object. Users should not call it directly.
# File 'lib/bio/appl/blat/report.rb', line 252
def initialize(str)
  @data = str.chomp.split(/\t/)
end
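As a minimal sketch of what the constructor does, the snippet below splits one tab-separated PSL line into fields and reads a few columns by index. The sample line is made up for illustration and is not real BLAT output; the column indices follow the accessors documented for this class.

```ruby
# A made-up PSL line with 21 tab-separated columns (not real BLAT output).
psl_line = [
  59, 1, 0, 0,              # match, mismatch, rep. match, N's
  0, 0, 0, 0,               # query gap count/bases, target gap count/bases
  "+",                      # strand
  "query1", 60, 0, 60,      # query name, size, start, end
  "chr1", 1000, 100, 160,   # target name, size, start, end
  1,                        # block count
  "60,", "0,", "100,"       # block sizes, query starts, target starts
].join("\t") + "\n"

# Same split as Hit#initialize: drop the newline, then split on tabs.
fields = psl_line.chomp.split(/\t/)

match       = fields[0].to_i
mismatch    = fields[1].to_i
strand      = fields[8]
block_count = fields[17].to_i

puts "match=#{match} mismatch=#{mismatch} strand=#{strand} blocks=#{block_count}"
```

Every field is kept as a string in the array; the accessors convert to integers only when they are called.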
Instance Attribute Details
#data ⇒ Object (readonly)
Raw data of the hit. Note that position numbers are left exactly as they appear in the BLAT output; 1 is not added to them.
# File 'lib/bio/appl/blat/report.rb', line 258
def data
  @data
end
Instance Method Details
#block_count ⇒ Object
Number of blocks (exons, segment pairs).
# File 'lib/bio/appl/blat/report.rb', line 309
def block_count; @data[17].to_i; end
#block_sizes ⇒ Object
Sizes of all blocks (exons, segment pairs). Returns an array of numbers.
# File 'lib/bio/appl/blat/report.rb', line 313
def block_sizes
  unless defined?(@block_sizes) then
    @block_sizes = split_comma(@data[18]).collect { |x| x.to_i }
  end
  @block_sizes
end
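The private helper split_comma is not shown on this page, so the following is only a sketch of the parsing step under the assumption that it splits a comma-terminated list such as "60,25,110," into its elements. The name split_comma_sketch is hypothetical.

```ruby
# Hypothetical stand-in for the private split_comma helper; its real
# implementation is not shown here, so this behavior is an assumption.
def split_comma_sketch(str)
  # String#split drops the trailing empty element left by the final comma.
  str.to_s.split(",")
end

# Mirrors block_sizes: split the raw column, then convert each item to Integer.
sizes = split_comma_sketch("60,25,110,").collect { |x| x.to_i }
p sizes
```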
#blocks ⇒ Object Also known as: exons, hsps
Returns blocks (exons, segment pairs) of the hit. Returns an array of Bio::Blat::Report::SegmentPair objects.
# File 'lib/bio/appl/blat/report.rb', line 322
def blocks
  unless defined?(@blocks)
    bs    = block_sizes
    qst   = query.starts
    tst   = target.starts
    qseqs = query.seqs
    tseqs = target.seqs
    pflag = self.protein?
    @blocks = (0...block_count).collect do |i|
      SegmentPair.new(query.size, target.size, strand, bs[i],
                      qst[i], tst[i], qseqs[i], tseqs[i],
                      pflag)
    end
  end
  @blocks
end
#each(&x) ⇒ Object
Iterates over each block (exon, segment pair) of the hit. Yields a Bio::Blat::Report::SegmentPair object.
# File 'lib/bio/appl/blat/report.rb', line 363
def each(&x) #:yields: segmentpair
  exons.each(&x)
end
#match ⇒ Object
Number of matching nucleotides.
# File 'lib/bio/appl/blat/report.rb', line 291
def match; @data[0].to_i; end
#milli_bad ⇒ Object
Calculates the pslCalcMilliBad value defined in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
The algorithm is taken from the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
# File 'lib/bio/appl/blat/report.rb', line 377
def milli_bad
  w = (self.protein? ? 3 : 1)
  qalen = w * (self.query.end - self.query.start)
  talen = self.target.end - self.target.start
  alen = (if qalen < talen then qalen; else talen; end)
  return 0 if alen <= 0
  d = qalen - talen
  d = 0 if d < 0
  total = w * (self.match + self.rep_match + self.mismatch)
  return 0 if total == 0
  return (1000 * (self.mismatch * w + self.query.gap_count +
                    (3 * Math.log(1 + d)).round) / total)
end
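To make the arithmetic easier to follow, here is a standalone sketch of the same calculation that takes plain numbers instead of a Hit object. The method name milli_bad_sketch, its keyword arguments, and the sample values are all hypothetical; only the formula is taken from the listing above.

```ruby
# Standalone restatement of the milli_bad formula (hypothetical helper).
def milli_bad_sketch(match:, rep_match:, mismatch:, q_start:, q_end:,
                     t_start:, t_end:, q_gap_count:, protein: false)
  w = protein ? 3 : 1                  # protein residues span 3 bases
  qalen = w * (q_end - q_start)        # aligned query length, in bases
  talen = t_end - t_start              # aligned target length
  return 0 if [qalen, talen].min <= 0
  d = [qalen - talen, 0].max           # query excess over target, never negative
  total = w * (match + rep_match + mismatch)
  return 0 if total == 0
  # Integer division, as in the original listing.
  1000 * (mismatch * w + q_gap_count + (3 * Math.log(1 + d)).round) / total
end

# Made-up hit: 59 matches, 1 mismatch, no gaps, 60 bases aligned on both sides.
mb = milli_bad_sketch(match: 59, rep_match: 0, mismatch: 1,
                      q_start: 0, q_end: 60, t_start: 100, t_end: 160,
                      q_gap_count: 0)
puts mb
```

For this input the size-difference term is zero, so the result is simply 1000 * 1 / 60 with integer division, which is 16.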
#mismatch ⇒ Object
Number of mismatched nucleotides.
# File 'lib/bio/appl/blat/report.rb', line 293
def mismatch; @data[1].to_i; end
#n_s ⇒ Object
“N’s”. Number of ‘N’ bases.
# File 'lib/bio/appl/blat/report.rb', line 301
def n_s; @data[3].to_i; end
#percent_identity ⇒ Object
Calculates the percent identity compatible with the BLAT web server as described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
The algorithm is taken from the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
# File 'lib/bio/appl/blat/report.rb', line 397
def percent_identity
  100.0 - self.milli_bad * 0.1
end
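The conversion from the milli-bad value to a percentage is a single step, sketched below as a free function. The name percent_identity_sketch and the input value are hypothetical.

```ruby
# milli_bad is expressed in parts per thousand, so multiplying by 0.1
# converts it to percentage points of "badness".
def percent_identity_sketch(milli_bad)
  100.0 - milli_bad * 0.1
end

# A milli_bad of 16 corresponds to an identity of roughly 98.4 percent.
pid = percent_identity_sketch(16)
puts pid.round(1)
```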
#protein? ⇒ Boolean
Returns true when the output data comes from a protein query, and false otherwise (nucleotide query). Returns nil if this cannot be determined.
The algorithm is taken from the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
Note: it seems to return true only for a protein query against a nucleotide database (blat options: -q=prot -t=dnax).
# File 'lib/bio/appl/blat/report.rb', line 410
def protein?
  return nil if self.block_sizes.empty?
  case self.strand[1,1]
  when '+'
    if self.target.end == self.target.starts[-1] +
        3 * self.block_sizes[-1] then
      true
    else
      false
    end
  when '-'
    if self.target.start == self.target.size -
        self.target.starts[-1] - 3 * self.block_sizes[-1] then
      true
    else
      false
    end
  else
    nil
  end
end
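The check can be tried in isolation with the sketch below, which takes plain values instead of a Hit object. The method name protein_sketch, its keyword arguments, and all sample numbers are hypothetical. The idea is that, for a protein query, the last block covers three target bases per residue, so the target end (or start, on the minus strand) must line up with three times the last block size.

```ruby
# Standalone restatement of the protein? test (hypothetical helper).
def protein_sketch(strand:, t_start:, t_end:, t_size:, t_starts:, block_sizes:)
  return nil if block_sizes.empty?
  # The second strand character is the target strand in two-character strands.
  case strand[1, 1]
  when '+'
    t_end == t_starts[-1] + 3 * block_sizes[-1]
  when '-'
    t_start == t_size - t_starts[-1] - 3 * block_sizes[-1]
  else
    nil   # e.g. a one-character strand such as "+": "+"[1, 1] is ""
  end
end

plus  = protein_sketch(strand: "++", t_start: 100, t_end: 160, t_size: 1000,
                       t_starts: [100], block_sizes: [20])
minus = protein_sketch(strand: "+-", t_start: 840, t_end: 900, t_size: 1000,
                       t_starts: [100], block_sizes: [20])
dna   = protein_sketch(strand: "++", t_start: 100, t_end: 120, t_size: 1000,
                       t_starts: [100], block_sizes: [20])
single = protein_sketch(strand: "+", t_start: 100, t_end: 120, t_size: 1000,
                        t_starts: [100], block_sizes: [20])
p [plus, minus, dna, single]
```

Note that with a one-character strand the case expression falls through to the else branch, so the result is nil rather than false; both are treated as "not protein" by callers that use the value in a conditional.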
#query ⇒ Object
Returns sequence information for the query. Returns a Bio::Blat::Report::SeqDesc object. This is a Bio::Blat-specific method.
# File 'lib/bio/appl/blat/report.rb', line 269
def query
  unless defined?(@query)
    d = @data
    @query = SeqDesc.new(d[4], d[5], d[9], d[10], d[11], d[12],
                         split_comma(d[19]), split_comma(d[21]))
  end
  @query
end
#query_def ⇒ Object Also known as: query_id
Returns the name of the query sequence.
# File 'lib/bio/appl/blat/report.rb', line 349
def query_def; query.name; end
#query_len ⇒ Object
Returns the length of the query sequence.
# File 'lib/bio/appl/blat/report.rb', line 346
def query_len; query.size; end
#rep_match ⇒ Object
“rep. match”. Number of bases that match but are part of repeats. Note that the current version of BLAT always sets this value to 0.
# File 'lib/bio/appl/blat/report.rb', line 298
def rep_match; @data[2].to_i; end
#score ⇒ Object
Calculates the score compatible with the BLAT web server as described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
The algorithm is taken from the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).
# File 'lib/bio/appl/blat/report.rb', line 438
def score
  w = (self.protein? ? 3 : 1)
  w * (self.match + (self.rep_match >> 1)) -
    w * self.mismatch - self.query.gap_count - self.target.gap_count
end
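The score formula can be checked by hand with a standalone sketch such as the one below. The method name score_sketch, its keyword arguments, and the sample values are hypothetical; the expression itself follows the listing above. Repeat matches count for half (integer shift), each mismatch costs one match, and each gap on either side costs one point.

```ruby
# Standalone restatement of the score formula (hypothetical helper).
def score_sketch(match:, rep_match:, mismatch:, q_gap_count:, t_gap_count:,
                 protein: false)
  w = protein ? 3 : 1
  # rep_match >> 1 halves the repeat matches, rounding down.
  w * (match + (rep_match >> 1)) - w * mismatch - q_gap_count - t_gap_count
end

# Made-up values for illustration.
simple  = score_sketch(match: 59, rep_match: 0, mismatch: 1,
                       q_gap_count: 0, t_gap_count: 0)
gapped  = score_sketch(match: 50, rep_match: 5, mismatch: 2,
                       q_gap_count: 1, t_gap_count: 1)
protein = score_sketch(match: 59, rep_match: 0, mismatch: 1,
                       q_gap_count: 0, t_gap_count: 0, protein: true)
p [simple, gapped, protein]
```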
#strand ⇒ Object
Returns strand information of the hit. Returns ‘+’ or ‘-’. This is a Bio::Blat-specific method.
# File 'lib/bio/appl/blat/report.rb', line 306
def strand; @data[8]; end
#target ⇒ Object
Returns sequence information for the target (hit). Returns a Bio::Blat::Report::SeqDesc object. This is a Bio::Blat-specific method.
# File 'lib/bio/appl/blat/report.rb', line 281
def target
  unless defined?(@target)
    d = @data
    @target = SeqDesc.new(d[6], d[7], d[13], d[14], d[15], d[16],
                          split_comma(d[20]), split_comma(d[22]))
  end
  @target
end
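The query and target accessors pick different columns out of the same field array. The sketch below shows that mapping with a plain Struct. SeqDescSketch, seq_desc_from, and the column-index constants are hypothetical names, the member name gap_bases is a guess, and the integer conversion is an assumption about what the real SeqDesc class does internally. The sample fields are made up.

```ruby
# Hypothetical stand-in for Bio::Blat::Report::SeqDesc.
SeqDescSketch = Struct.new(:gap_count, :gap_bases, :name, :size,
                           :start, :end, :starts, :seqs)

# Column indices as used by Hit#query and Hit#target in the listings above.
QUERY_COLS  = [4, 5, 9, 10, 11, 12, 19, 21]
TARGET_COLS = [6, 7, 13, 14, 15, 16, 20, 22]

def seq_desc_from(fields, cols)
  gc, gb, name, size, st, en, starts, seqs = cols.map { |i| fields[i] }
  SeqDescSketch.new(gc.to_i, gb.to_i, name, size.to_i, st.to_i, en.to_i,
                    starts.to_s.split(",").map(&:to_i),  # assumed conversion
                    seqs.to_s.split(","))                # empty when column absent
end

# Made-up 21-column record; the optional sequence columns 21 and 22 are absent.
fields = %w[59 1 0 0 0 0 0 0 + query1 60 0 60 chr1 1000 100 160 1 60, 0, 100,]

query  = seq_desc_from(fields, QUERY_COLS)
target = seq_desc_from(fields, TARGET_COLS)
puts "#{query.name}: #{query.start}..#{query.end} of #{query.size}"
puts "#{target.name}: #{target.start}..#{target.end} of #{target.size}"
```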
#target_def ⇒ Object Also known as: target_id, definition
Returns the name of the target (subject) sequence.
# File 'lib/bio/appl/blat/report.rb', line 357
def target_def; target.name; end
#target_len ⇒ Object Also known as: len
Returns the length of the target (subject) sequence.
# File 'lib/bio/appl/blat/report.rb', line 353
def target_len; target.size; end