Class: Bio::Ngs::Cufflinks::Compare
- Inherits: Object (Object > Bio::Ngs::Cufflinks::Compare)
- Includes: Command::Wrapper
- Defined in: lib/bio/appl/ngs/cufflinks.rb
Overview
cuffcompare v1.0.2 (2335)
Usage: cuffcompare [-r <reference_mrna.gtf>] [-R] [-T] [-V] [-s <seq_path>]
[-o <outprefix>] [-p <cprefix>]
{-i <input_gtf_list> | <input1.gtf> [<input2.gtf> .. <inputN.gtf>]}
Cuffcompare provides classification, reference annotation mapping and various
statistics for Cufflinks transfrags.
Cuffcompare clusters and tracks transfrags across multiple samples, writing
matching transcripts (intron chains) into <outprefix>.tracking, and a GTF
file <outprefix>.combined.gtf containing a nonredundant set of transcripts
across all input files (with a single representative transfrag chosen
for each clique of matching transfrags across samples).
Options:
-i  provide a text file with a list of Cufflinks GTF files to process instead
    of expecting them as command line arguments (useful when a large number
    of GTF files should be processed)
-r  a set of known mRNAs to use as a reference for assessing
    the accuracy of mRNAs or gene models given in <input.gtf>
-R  for -r option, reduce the set of reference transcripts to
    only those found to overlap any of the input loci
-M  discard (ignore) single-exon transfrags and reference transcripts
-N  discard (ignore) single-exon reference transcripts
-s <seq_path> can be a multi-fasta file with all the genomic sequences or
    a directory containing multiple single-fasta files (one file per contig);
    lower case bases will be used to classify input transcripts as repeats
-d  max distance (range) for grouping transcript start sites (100)
-p  the name prefix to use for consensus transcripts in the
    <outprefix>.combined.gtf file (default: 'TCONS')
-C  include the “contained” transcripts in the .combined.gtf file
-G  generic GFF input file(s) (do not assume Cufflinks GTF)
-T  do not generate .tmap and .refmap files for each input file
-V  verbose processing mode (showing all GFF parsing warnings)
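As an illustration of the usage line above, the Ruby sketch below assembles a cuffcompare argument vector. It is not the Command::Wrapper interface of this class: the helpers `cuffcompare_argv` and `write_gtf_list` and their keyword arguments are hypothetical, only -r, -R, -T, -V, -o and -i are covered, and the -i list file is assumed to hold one GTF path per line.

```ruby
require "tmpdir"

# Hypothetical helper (not the bio-ngs Command::Wrapper API): build the
# argument vector for cuffcompare from a subset of the options above.
def cuffcompare_argv(gtfs, reference: nil, reduce_reference: false,
                     no_tmap: false, verbose: false, outprefix: nil,
                     list_file: nil)
  argv = ["cuffcompare"]
  argv += ["-r", reference] if reference
  argv << "-R" if reduce_reference # only meaningful together with -r
  argv << "-T" if no_tmap          # skip the per-input .tmap/.refmap files
  argv << "-V" if verbose
  argv += ["-o", outprefix] if outprefix
  # Either pass a list file with -i, or the GTF files themselves.
  list_file ? argv + ["-i", list_file] : argv + gtfs
end

# Hypothetical helper: write the -i list file, assuming one GTF path per line.
def write_gtf_list(gtfs, path)
  File.write(path, gtfs.map { |g| "#{g}\n" }.join)
  path
end

argv = cuffcompare_argv(%w[s1/transcripts.gtf s2/transcripts.gtf],
                        reference: "genes.gtf", reduce_reference: true,
                        outprefix: "cmp")
# argv could then be run with system(*argv) if cuffcompare is on PATH.

Dir.mktmpdir do |dir|
  list = write_gtf_list(%w[s1/transcripts.gtf s2/transcripts.gtf],
                        File.join(dir, "gtfs.txt"))
  p cuffcompare_argv([], outprefix: "cmp", list_file: list)
end
```

Passing an array to `system(*argv)` avoids shell quoting issues with unusual file names.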
Class Method Summary
- .build_compare_kb(gtf) ⇒ Object
  Builds a hash of associations (gene_id, transcript_id, gene_name, oid, nearest_ref) from a GTF file generated by Cuffcompare and dumps it to a .kb file. Exons and coordinates are not saved.
- .exists_kb?(gtf) ⇒ Boolean
- .kb_name(gtf) ⇒ Object
- .load_compare_kb(gtf) ⇒ Object
  Returns the hash of associations (gene_id, transcript_id, gene_name, oid, nearest_ref) previously dumped by .build_compare_kb.
Methods included from Command::Wrapper
#class_name, #default_options, included, #initialize, #normalize_params, #options, #options=, #output, #params, #params=, #path, #path=, #pipe_ahead, #pipe_ahead=, #pipe_ahead?, #program, #reset_params, #run, #sub_program, #thor_task, #to_cmd_ary, #use_aliases?
Class Method Details
.build_compare_kb(gtf) ⇒ Object
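Builds a hash of associations from a GTF file generated by Cuffcompare, dumps it with Marshal to kb_name(gtf), and returns it. If the GTF file does not exist, prints a message to STDERR and returns nil.
The hash is keyed by every identifier found on each GTF line, and all identifiers are stored as Symbols:
- gene_id: gene_name and a :transcripts hash mapping each transcript_id to its oid and nearest_ref, e.g.
  :XLOC_000001=>{:gene_name=>:"RP11-304M2.1", :transcripts=>{:TCONS_00000001=>{:oid=>:ENST00000519787, :nearest_ref=>:ENST00000519787}}}
- transcript_id: gene_id, gene_name, oid, nearest_ref
- gene_name: gene_id, transcript_id, oid, nearest_ref
- oid: gene_id, transcript_id, gene_name, nearest_ref
- nearest_ref: gene_id, transcript_id, gene_name, oid
Apart from gene_id, these entries are flat hashes that are overwritten on every line, so each key keeps the values from the last GTF line that mentions it. Note: exons and coordinates are not saved.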
# File 'lib/bio/appl/ngs/cufflinks.rb', line 555
def build_compare_kb(gtf)
  unless File.exist?(gtf)
    STDERR.puts "File #{gtf} doesn't exist."
    return nil
  end
  # Value of a GTF attribute as a Symbol (quotes stripped), or nil if absent.
  attribute = lambda do |line, key|
    value = line[/#{key} (.*?);/, 1]
    value && value.delete('"').to_sym
  end
  dict = {}
  # build a hash with the combinations of data extracted from the GTF file: XLOC, TCONS, ENST, SYMBOL
  File.open(gtf, 'r') do |f|
    f.each_line do |line|
      gene_id       = attribute.call(line, 'gene_id')
      transcript_id = attribute.call(line, 'transcript_id')
      next unless gene_id && transcript_id # skip header or comment lines
      gene_name   = attribute.call(line, 'gene_name')
      oid         = attribute.call(line, 'oId')
      nearest_ref = attribute.call(line, 'nearest_ref')
      dict[gene_id] ||= {:gene_name=>gene_name, :transcripts=>{}}
      dict[gene_id][:transcripts][transcript_id] ||= {:oid=>oid, :nearest_ref=>nearest_ref}
      dict[transcript_id] = {:gene_id=>gene_id, :gene_name=>gene_name, :oid=>oid, :nearest_ref=>nearest_ref}
      # gene_name, oId and nearest_ref can be missing, so only index them when present
      dict[gene_name] = {:gene_id=>gene_id, :transcript_id=>transcript_id, :oid=>oid, :nearest_ref=>nearest_ref} if gene_name
      dict[oid] = {:gene_id=>gene_id, :transcript_id=>transcript_id, :gene_name=>gene_name, :nearest_ref=>nearest_ref} if oid
      dict[nearest_ref] = {:gene_id=>gene_id, :transcript_id=>transcript_id, :oid=>oid, :gene_name=>gene_name} if nearest_ref
    end
  end
  # Marshal output is binary, so write the .kb file in binary mode
  File.open(kb_name(gtf), 'wb') do |fkb|
    Marshal.dump(dict, fkb)
  end
  dict
end
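The indexing step can be sketched in isolation as below, assuming Cuffcompare writes GTF attributes as `key "value";` pairs. This sketch swaps the method's per-attribute regexes for a single String#scan and builds only the gene_id entries; `parse_gtf_attributes` and `index_gtf_lines` are hypothetical names, and the sample line is synthetic, reusing the identifiers from the docstring example with made-up coordinates. Missing attributes become nil instead of raising.

```ruby
# Hypothetical helper: collect key "value"; pairs of a GTF line as Symbols.
# Attributes that are absent simply do not appear in the result.
def parse_gtf_attributes(line)
  attrs = {}
  line.scan(/(\w+) "([^"]*)";/) { |key, value| attrs[key.to_sym] = value.to_sym }
  attrs
end

# Hypothetical helper: build gene_id => {gene_name, transcripts} entries
# in the layout described for build_compare_kb.
def index_gtf_lines(lines)
  dict = {}
  lines.each do |line|
    a = parse_gtf_attributes(line)
    gene_id = a[:gene_id]
    transcript_id = a[:transcript_id]
    next if gene_id.nil? || transcript_id.nil? # e.g. header lines
    gene = (dict[gene_id] ||= { gene_name: a[:gene_name], transcripts: {} })
    gene[:transcripts][transcript_id] ||= { oid: a[:oId], nearest_ref: a[:nearest_ref] }
  end
  dict
end

line = %(chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t) +
       %(gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; ) +
       %(gene_name "RP11-304M2.1"; oId "ENST00000519787"; nearest_ref "ENST00000519787";)
kb = index_gtf_lines([line])
p kb
```

Collecting all attributes in one pass means a missing gene_name or nearest_ref yields nil rather than a NoMethodError on an unmatched capture.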
.exists_kb?(gtf) ⇒ Boolean
# File 'lib/bio/appl/ngs/cufflinks.rb', line 542
def exists_kb?(gtf)
  File.exist?(kb_name(gtf))
end
.kb_name(gtf) ⇒ Object
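# File 'lib/bio/appl/ngs/cufflinks.rb', line 538
# Replace the trailing extension (e.g. ".gtf") with ".kb".
# A path that does not end in an alphanumeric extension is returned unchanged.
def kb_name(gtf)
  gtf.sub(/\.[a-zA-Z0-9]*$/, ".kb")
end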
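The extension rule in kb_name is easy to misread, so here is a stand-alone copy of it exercised on a few paths. The `checked_kb_name` guard is a hypothetical addition, not part of the class: because .build_compare_kb opens kb_name(gtf) for writing, a path that kb_name leaves unchanged would have its GTF content replaced by the Marshal dump.

```ruby
# Stand-alone copy of kb_name, for illustration only.
def kb_name(gtf)
  gtf.sub(/\.[a-zA-Z0-9]*$/, ".kb")
end

# Hypothetical guard: refuse paths that kb_name would leave unchanged,
# since writing the .kb dump there would overwrite the input file.
def checked_kb_name(gtf)
  kb = kb_name(gtf)
  raise ArgumentError, "cannot derive a .kb name from #{gtf}" if kb == gtf
  kb
end

kb_name("cuffcmp.combined.gtf")         # => "cuffcmp.combined.kb"
kb_name("genes")                        # => "genes" (no extension, unchanged)
kb_name("run.1/genes")                  # => "run.1/genes" (the dot is in a directory name)
checked_kb_name("cuffcmp.combined.gtf") # => "cuffcmp.combined.kb"
```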
.load_compare_kb(gtf) ⇒ Object
Loads and returns the hash of associations that .build_compare_kb dumped to kb_name(gtf); see .build_compare_kb for its layout (keys: gene_id, transcript_id, gene_name, oid, nearest_ref). Raises Errno::ENOENT if the .kb file does not exist, so check .exists_kb? first.
# File 'lib/bio/appl/ngs/cufflinks.rb', line 600
def load_compare_kb(gtf)
  #TODO rescue Exceptions
  kb_filename = kb_name(gtf)
  # Marshal data is binary, so read it in binary mode
  File.open(kb_filename, 'rb') do |kb_dump|
    Marshal.load(kb_dump)
  end
end
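Together, .exists_kb?, .load_compare_kb and .build_compare_kb appear designed to work as a simple on-disk cache. The sketch below shows that pattern with a hypothetical `load_or_build_kb(kb_path) { ... }` helper that is not part of the class; the block stands in for the GTF parsing. Since Ruby's documentation warns against passing untrusted data to Marshal.load, only .kb files the pipeline wrote itself should be loaded this way.

```ruby
require "tmpdir"

# Hypothetical helper: return the cached hash from kb_path if the file exists;
# otherwise compute it with the block and Marshal-dump it there.
def load_or_build_kb(kb_path)
  if File.exist?(kb_path)
    File.open(kb_path, "rb") { |io| Marshal.load(io) }
  else
    dict = yield
    File.open(kb_path, "wb") { |io| Marshal.dump(dict, io) }
    dict
  end
end

Dir.mktmpdir do |dir|
  kb_path = File.join(dir, "cuffcmp.combined.kb")
  first = load_or_build_kb(kb_path) { { TCONS_00000001: { gene_id: :XLOC_000001 } } }
  # The block is not called this time because the .kb file now exists.
  second = load_or_build_kb(kb_path) { raise "should not rebuild" }
  p first == second # => true
end
```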