Class: Bio::Blat::Report::Hit

Inherits:
Object
Defined in:
lib/bio/appl/blat/report.rb

Overview

Hit class for the BLAT result parser. It is similar to Bio::Blast::Report::Hit but provides fewer methods. A Hit object may contain one or more Bio::Blat::Report::SegmentPair objects.

Constructor Details

#initialize(str) ⇒ Hit

Creates a new Hit object from one line of BLAT result text. It is designed to be called internally by a Bio::Blat::Report object. Users should not call it directly.



# File 'lib/bio/appl/blat/report.rb', line 252

def initialize(str)
  @data = str.chomp.split(/\t/)
end
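
As a minimal standalone sketch of what the constructor does, the snippet below splits one PSL line on tabs without using BioRuby. The sample line and the names `psl_line` and `fields` are made up for illustration.

```ruby
# A hypothetical PSL line (21 tab-separated columns), built here for illustration.
psl_line = [
  59, 9, 0, 0, 1, 823, 1, 96, '+', 'query1', 1775, 0, 891,
  'chr1', 10000, 2000, 2164, 2, '30,38,', '0,853,', '2000,2126,'
].join("\t") + "\n"

# Same operation as Hit#initialize: drop the line ending, then split on tabs.
fields = psl_line.chomp.split(/\t/)

puts fields.size      # number of columns
puts fields[9]        # query name column
puts fields[17].to_i  # block count column
```

Every element of `fields` is still a String at this point, which is why the accessors below call `to_i` on the numeric columns.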

Instance Attribute Details

#data ⇒ Object (readonly)

Raw data of the hit, as an array of column strings. Position numbers are kept exactly as they appear in the BLAT output; 1 is not added to them.



# File 'lib/bio/appl/blat/report.rb', line 258

def data
  @data
end
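
A small sketch of what the note about position numbers implies, assuming the usual PSL convention of 0-based start positions with an exclusive end. The raw values are hypothetical.

```ruby
# Hypothetical raw PSL values for qStart and qEnd (columns 11 and 12 of Hit#data).
raw_q_start = '0'
raw_q_end   = '891'

# With a 0-based start and an exclusive end, only the start shifts when
# converting to a 1-based inclusive range.
one_based_start = raw_q_start.to_i + 1
one_based_end   = raw_q_end.to_i

puts "#{one_based_start}..#{one_based_end}"
```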

Instance Method Details

#block_count ⇒ Object

Number of blocks (exons, segment pairs).



# File 'lib/bio/appl/blat/report.rb', line 309

def block_count; @data[17].to_i; end

#block_sizes ⇒ Object

Sizes of all blocks (exons, segment pairs). Returns an array of integers.



# File 'lib/bio/appl/blat/report.rb', line 313

def block_sizes
  unless defined?(@block_sizes) then
    @block_sizes = split_comma(@data[18]).collect { |x| x.to_i }
  end
  @block_sizes
end
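
PSL list columns such as the block sizes end with a trailing comma. The following is a minimal sketch of the conversion; `parse_int_list` is a hypothetical stand-in for the private `split_comma` helper (whose source is not shown on this page) followed by `to_i`.

```ruby
# Hypothetical stand-in for split_comma + to_i; not the library's own helper.
def parse_int_list(str)
  # String#split drops trailing empty fields, so "30,38," yields two items.
  str.to_s.split(',').map { |x| x.to_i }
end

p parse_int_list('30,38,')
```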

#blocks ⇒ Object Also known as: exons, hsps

Returns the blocks (exons, segment pairs) of the hit as an array of Bio::Blat::Report::SegmentPair objects.



# File 'lib/bio/appl/blat/report.rb', line 322

def blocks
  unless defined?(@blocks)
    bs    = block_sizes
    qst   = query.starts
    tst   = target.starts
    qseqs = query.seqs
    tseqs = target.seqs
    pflag = self.protein?
    @blocks = (0...block_count).collect do |i|
      SegmentPair.new(query.size, target.size, strand, bs[i],
                      qst[i], tst[i], qseqs[i], tseqs[i],
                      pflag)
    end
  end
  @blocks
end
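
The core of this method is combining parallel arrays by index. A minimal sketch of that idea, using plain hashes in place of SegmentPair objects; `build_blocks` and the sample values are hypothetical.

```ruby
# Hypothetical helper: builds one hash per block from parallel arrays.
# The real method builds SegmentPair objects and passes more arguments.
def build_blocks(block_count, sizes, q_starts, t_starts)
  (0...block_count).map do |i|
    { size: sizes[i], query_start: q_starts[i], target_start: t_starts[i] }
  end
end

blocks = build_blocks(2, [30, 38], [0, 853], [2000, 2126])
blocks.each do |b|
  puts "size=#{b[:size]} query=#{b[:query_start]} target=#{b[:target_start]}"
end
```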

#each(&x) ⇒ Object

Iterates over each block (exon, segment pair) of the hit. Yields a Bio::Blat::Report::SegmentPair object.



# File 'lib/bio/appl/blat/report.rb', line 363

def each(&x) #:yields: segmentpair
  exons.each(&x)
end

#match ⇒ Object

Number of matching nucleotides.



# File 'lib/bio/appl/blat/report.rb', line 291

def match;       @data[0].to_i;  end

#milli_bad ⇒ Object

Calculates the pslCalcMilliBad value, using the algorithm given in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).



# File 'lib/bio/appl/blat/report.rb', line 377

def milli_bad
  w = (self.protein? ? 3 : 1)
  qalen = w * (self.query.end - self.query.start)
  talen = self.target.end - self.target.start
  alen = (if qalen < talen then qalen; else talen; end)
  return 0 if alen <= 0
  d = qalen - talen
  d = 0 if d < 0
  total = w * (self.match + self.rep_match + self.mismatch)
  return 0 if total == 0
  return (1000 * (self.mismatch * w + self.query.gap_count +
                    (3 * Math.log(1 + d)).round) / total)
end
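
To make the arithmetic easier to follow, here is a standalone sketch that takes plain integers instead of reading them from a Hit. The function name, keyword arguments, and sample values are hypothetical; the steps mirror the method above.

```ruby
# Standalone sketch of the calculation; all arguments are plain integers.
# protein: true for a protein query aligned against translated DNA.
def milli_bad_for(match:, rep_match:, mismatch:, q_gap_count:,
                  q_start:, q_end:, t_start:, t_end:, protein: false)
  w = protein ? 3 : 1
  q_ali = w * (q_end - q_start)   # aligned span on the query
  t_ali = t_end - t_start         # aligned span on the target
  return 0 if [q_ali, t_ali].min <= 0
  size_dif = [q_ali - t_ali, 0].max
  total = w * (match + rep_match + mismatch)
  return 0 if total == 0
  # Integer division, as in the method above.
  1000 * (mismatch * w + q_gap_count + (3 * Math.log(1 + size_dif)).round) / total
end

# 95 matches and 5 mismatches over equal 100-base spans, no gaps.
puts milli_bad_for(match: 95, rep_match: 0, mismatch: 5, q_gap_count: 0,
                   q_start: 0, q_end: 100, t_start: 1000, t_end: 1100)
# prints 50
```

With equal spans and no query gaps, the result reduces to mismatches per thousand aligned bases.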

#mismatch ⇒ Object

Number of mismatching nucleotides.



# File 'lib/bio/appl/blat/report.rb', line 293

def mismatch;    @data[1].to_i;  end

#n_s ⇒ Object

“N’s”. Number of ‘N’ bases.



# File 'lib/bio/appl/blat/report.rb', line 301

def n_s;         @data[3].to_i;  end

#percent_identity ⇒ Object

Calculates the percent identity compatible with the BLAT web server, using the algorithm described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).



# File 'lib/bio/appl/blat/report.rb', line 397

def percent_identity
  100.0 - self.milli_bad * 0.1
end
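
Because `milli_bad` is expressed in parts per thousand, multiplying by 0.1 converts it to a percentage. A tiny sketch with a hypothetical helper that takes the `milli_bad` value directly:

```ruby
# Hypothetical helper mirroring the formula above for a given milli_bad value.
def percent_identity_from(milli_bad)
  100.0 - milli_bad * 0.1
end

puts percent_identity_from(0)   # a hit with no penalties
puts percent_identity_from(50)  # 50 parts per thousand
```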

#protein? ⇒ Boolean

Returns true when the output data comes from a protein query and false when it comes from a nucleotide query. Returns nil if this cannot be determined.

The algorithm is taken from the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).

Note: It appears to return true only for a protein query against a nucleotide database (blat options: -q=prot -t=dnax).

Returns:

  • (Boolean)


# File 'lib/bio/appl/blat/report.rb', line 410

def protein?
  return nil if self.block_sizes.empty?
  case self.strand[1,1]
  when '+'
    if self.target.end == self.target.starts[-1] +
        3 * self.block_sizes[-1] then
      true
    else
      false
    end
  when '-'
    if self.target.start == self.target.size -
        self.target.starts[-1] - 3 * self.block_sizes[-1] then
      true
    else
      false
    end
  else
    nil
  end
end
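
The check relies on the last block covering three target bases per unit of block size when the query is a protein. Below is a standalone sketch of the same test on plain values; the function name, keyword arguments, and sample numbers are hypothetical.

```ruby
# Standalone sketch of the check; arguments are plain values from one PSL line.
def protein_psl?(strand:, t_size:, t_start:, t_end:, t_starts:, block_sizes:)
  return nil if block_sizes.empty?
  last_start = t_starts[-1]
  last_size  = block_sizes[-1]
  case strand[1, 1]  # second strand character (target strand)
  when '+'
    t_end == last_start + 3 * last_size
  when '-'
    t_start == t_size - last_start - 3 * last_size
  else
    nil  # no second strand character: cannot be determined
  end
end

# Last block starts at 2060 with size 10; 2060 + 3 * 10 equals the target end.
p protein_psl?(strand: '++', t_size: 10_000, t_start: 2000, t_end: 2090,
               t_starts: [2000, 2060], block_sizes: [10, 10])
```

A single-character strand such as '+' falls through to the `else` branch, so this sketch returns nil for it rather than false.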

#query ⇒ Object

Returns sequence information for the query as a Bio::Blat::Report::SeqDesc object. This method is specific to Bio::Blat.



# File 'lib/bio/appl/blat/report.rb', line 269

def query
  unless defined?(@query)
    d = @data
    @query = SeqDesc.new(d[4], d[5], d[9], d[10], d[11], d[12],
                         split_comma(d[19]), split_comma(d[21]))
  end
  @query
end

#query_def ⇒ Object Also known as: query_id

Returns the name of the query sequence.



# File 'lib/bio/appl/blat/report.rb', line 349

def query_def;  query.name;  end

#query_len ⇒ Object

Returns the length of the query sequence.



# File 'lib/bio/appl/blat/report.rb', line 346

def query_len;  query.size;  end

#rep_match ⇒ Object

“rep. match”. Number of bases that match but are part of repeats. Note that the current version of BLAT always sets this to 0.



# File 'lib/bio/appl/blat/report.rb', line 298

def rep_match;   @data[2].to_i;  end

#score ⇒ Object

Calculates the score compatible with the BLAT web server, using the algorithm described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).



# File 'lib/bio/appl/blat/report.rb', line 438

def score
  w = (self.protein? ? 3 : 1)
  w * (self.match + (self.rep_match >> 1)) -
    w * self.mismatch - self.query.gap_count - self.target.gap_count
end
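
The same formula as a standalone sketch on plain integers. The function name, keyword arguments, and sample values are hypothetical.

```ruby
# Standalone sketch of the score formula; all arguments are plain integers.
# protein: true for a protein query, which weights matches and mismatches by 3.
def psl_score(match:, rep_match:, mismatch:, q_gap_count:, t_gap_count:,
              protein: false)
  w = protein ? 3 : 1
  # rep_match >> 1 halves the repeat matches (integer shift).
  w * (match + (rep_match >> 1)) - w * mismatch - q_gap_count - t_gap_count
end

puts psl_score(match: 59, rep_match: 0, mismatch: 9,
               q_gap_count: 1, t_gap_count: 1)
# prints 48
```

Each gap costs one point regardless of its length, while mismatches cost as much as a match earns.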

#strand ⇒ Object

Returns the strand information of the hit as it appears in the BLAT output: ‘+’ or ‘-’ (translated searches report two characters, such as ‘+-’). This method is specific to Bio::Blat.



# File 'lib/bio/appl/blat/report.rb', line 306

def strand;      @data[8];       end

#target ⇒ Object

Returns sequence information for the target (hit) as a Bio::Blat::Report::SeqDesc object. This method is specific to Bio::Blat.



# File 'lib/bio/appl/blat/report.rb', line 281

def target
  unless defined?(@target)
    d = @data
    @target = SeqDesc.new(d[6], d[7], d[13], d[14], d[15], d[16],
                          split_comma(d[20]), split_comma(d[22]))
  end
  @target
end
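
The column indices passed to SeqDesc in `query` and `target` can be read against the standard PSL column order. The sketch below lists those column names (as given in the UCSC PSL description) and looks up the indices used by the two methods; the variable names are hypothetical. Indices 21 and 22 are not in this list and are assumed to be the extra sequence columns of the extended (pslx) output.

```ruby
# PSL column names in file order (index 0 to 20).
psl_columns = %w[
  matches misMatches repMatches nCount
  qNumInsert qBaseInsert tNumInsert tBaseInsert
  strand qName qSize qStart qEnd
  tName tSize tStart tEnd
  blockCount blockSizes qStarts tStarts
]

# Indices read by #query and #target above (without the sequence columns).
query_idx  = [4, 5, 9, 10, 11, 12, 19]
target_idx = [6, 7, 13, 14, 15, 16, 20]

puts psl_columns.values_at(*query_idx).join(', ')
puts psl_columns.values_at(*target_idx).join(', ')
```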

#target_def ⇒ Object Also known as: target_id, definition

Returns the name of the target (subject) sequence.



# File 'lib/bio/appl/blat/report.rb', line 357

def target_def; target.name; end

#target_len ⇒ Object Also known as: len

Returns the length of the target (subject) sequence.



# File 'lib/bio/appl/blat/report.rb', line 353

def target_len; target.size; end