Class: Bio::DB::Vcf
- Inherits: Object
- Defined in: lib/bio/util/bio-gngm.rb
Overview
Extends the methods of the Bio::DB::Vcf class in bio-samtools. A Vcf object represents a record in the VCF format described at www.1000genomes.org/node/101. The Bio::DB::Vcf object returns all the information in the VCF line, but the implementation here assumes at most one variant per position and ignores positions at which there may be multiple variants. The Vcf format is used only when the Bio::Util::Gngm object requests information about indels using the SAMtools mpileup method.
Instance Method Summary
- #alternatives ⇒ Object
  List of alternate alleles at this locus, obtained by splitting the vcf.alt attribute string on commas.
- #gq ⇒ Object
  Returns the genotype quality score from the sample data (as defined by the Vcf GQ attribute) for the first sample in the Vcf only.
- #is_indel?(options) ⇒ Boolean
  Returns true if the ref column differs in length from any of the entries in the alt column.
- #is_mnp?(options) ⇒ Boolean
  Returns true if the ref column has the same length as all alternatives and the position passes quality.
- #is_snp?(options) ⇒ Boolean
  Returns true if the ref column has a length of 1 and is_mnp? is true.
- #mq ⇒ Object
  Returns the mean Mapping Quality from the reads over this position as defined by the Vcf MQ attribute.
- #non_ref_allele_count ⇒ Object
  Returns the depth of reads containing the non-reference allele.
- #non_ref_allele_freq ⇒ Object
  Returns the non-reference allele frequency based on the depth of reads used for the genotype call.
- #pass_quality?(options) ⇒ Boolean
  Returns true if the position passes the quality criteria.
- #pl ⇒ Object
  Returns the phred scaled likelihood of the first non-reference allele (as defined by the Vcf PL attribute) for the first sample in the Vcf only.
- #to_s ⇒ Object
  Returns a short string representing chromosome, position, reference sequence, alt sequence(s) and the info string of the Vcf object.
- #used_depth ⇒ Object
  The depth of reads actually used in the genotype call by Vcftools.
- #variant? ⇒ Boolean
  Returns true if the alt column of the Vcf is not ".".
Instance Method Details
#alternatives ⇒ Object
List of alternate alleles at this locus, obtained by splitting the vcf.alt attribute string on commas.
Example
  vcf.alt = "ACT,TCA"
  vcf.alternatives #=> ["ACT", "TCA"]
  vcf.alt = "T"
  vcf.alternatives #=> ["T"]
# File 'lib/bio/util/bio-gngm.rb', line 123
def alternatives
  self.alt.split(",") rescue []
end
#gq ⇒ Object
Returns the genotype quality score from the sample data (as defined by the Vcf GQ attribute) for the first sample in the Vcf only.
# File 'lib/bio/util/bio-gngm.rb', line 146
def gq
  self.samples["1"]["GQ"].to_f rescue 0.0
end
#is_indel?(options) ⇒ Boolean
Returns true if the ref column differs in length from any of the entries in the alt column and the position passes quality.
# File 'lib/bio/util/bio-gngm.rb', line 186
def is_indel?(options)
  return true if self.variant? and self.alternatives.any? {|x| x.length != self.ref.length} and self.pass_quality?(options)
  false
end
#is_mnp?(options) ⇒ Boolean
Returns true if the ref column has the same length as all alternatives and the position passes quality.
# File 'lib/bio/util/bio-gngm.rb', line 174
def is_mnp?(options)
  return true if self.alternatives.all? {|x| x.length == self.ref.length} and self.pass_quality?(options)
  false
end
#is_snp?(options) ⇒ Boolean
Returns true if the ref column has a length of 1 and is_mnp? is true.
# File 'lib/bio/util/bio-gngm.rb', line 180
def is_snp?(options)
  return true if self.is_mnp?(options) and self.ref.length == 1
  false
end
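The length rules behind is_indel?, is_mnp? and is_snp? can be sketched without bio-samtools. The block below is an illustration only: SketchRecord and classify are hypothetical names, not part of Bio::DB::Vcf, and the quality filter applied by pass_quality? is deliberately left out. The sketch also checks variant? first, which the listed is_mnp? does not.

```ruby
# Hypothetical stand-in for a VCF record; NOT the real Bio::DB::Vcf class.
SketchRecord = Struct.new(:ref, :alt) do
  # Same idea as #alternatives: split the alt column on commas.
  def alternatives
    alt.split(",")
  end

  # Same idea as #variant?: "." in the alt column means no variant.
  def variant?
    alt != "."
  end
end

# Hypothetical helper: classify by allele length only (no quality filter).
def classify(record)
  return :none unless record.variant?
  lengths_match = record.alternatives.all? { |a| a.length == record.ref.length }
  return :indel unless lengths_match
  record.ref.length == 1 ? :snp : :mnp
end

classify(SketchRecord.new("G", "A"))      # single base, same length
classify(SketchRecord.new("GT", "AC"))    # two bases, same length
classify(SketchRecord.new("G", "GTT,A"))  # one alternative differs in length
classify(SketchRecord.new("T", "."))      # no alternative allele
```

Under these assumptions the four calls classify as :snp, :mnp, :indel and :none respectively; a real record would additionally have to pass pass_quality? before any of the three predicates returned true.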
#mq ⇒ Object
Returns the mean Mapping Quality from the reads over this position as defined by the Vcf MQ attribute.
# File 'lib/bio/util/bio-gngm.rb', line 141
def mq
  self.info["MQ"].to_f rescue 0.0
end
#non_ref_allele_count ⇒ Object
Returns the depth of reads containing the non-reference allele, i.e. the sum of the last two figures in the DP4 attribute. Returns 0.0 if the value cannot be calculated.
# File 'lib/bio/util/bio-gngm.rb', line 128
def non_ref_allele_count
  self.info["DP4"].split(",")[2..3].inject {|sum,n| sum.to_f + n.to_f } rescue 0.0
end
#non_ref_allele_freq ⇒ Object
Returns the non-reference allele frequency based on the depth of reads used for the genotype call, i.e. vcf.non_ref_allele_count / vcf.used_depth.
# File 'lib/bio/util/bio-gngm.rb', line 136
def non_ref_allele_freq
  self.non_ref_allele_count / self.used_depth
end
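A worked example may make the DP4 arithmetic concrete. The sketch below assumes the usual SAMtools layout of DP4 (ref-forward, ref-reverse, alt-forward, alt-reverse read counts); dp4_summary is a hypothetical helper, not a method of Bio::DB::Vcf, and the zero-depth guard is an addition that the listed non_ref_allele_freq does not have.

```ruby
# Hypothetical helper: derive depth, non-reference count and frequency
# from a DP4 string such as "10,8,3,5".
def dp4_summary(dp4)
  counts = dp4.split(",").map(&:to_f)

  used_depth    = counts.sum(0.0)                 # all four figures
  non_ref_count = (counts[2..3] || []).sum(0.0)   # last two figures only

  # Guard the division: in Ruby 0.0 / 0.0 evaluates to NaN, not 0.0.
  freq = used_depth.zero? ? 0.0 : non_ref_count / used_depth

  { used_depth: used_depth, non_ref_count: non_ref_count, non_ref_freq: freq }
end

summary = dp4_summary("10,8,3,5")
summary[:used_depth]     # 10 + 8 + 3 + 5
summary[:non_ref_count]  # 3 + 5
summary[:non_ref_freq]   # 8 / 26
```

For "10,8,3,5" the used depth is 26.0, the non-reference count is 8.0 and the frequency is 8/26, roughly 0.31. Without a guard, a record lacking DP4 would give 0.0 / 0.0, so callers of the listed method may want to check used_depth first.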
#pass_quality?(options) ⇒ Boolean
Returns true if the position passes the quality criteria.
Options and Defaults:
- :min_depth => 2
- :min_non_ref_count => 2
- :mapping_quality => 10
Example
  vcf.pass_quality?(:min_depth => 5, :min_non_ref_count => 2, :mapping_quality => 25, :min_snp_quality => 20)
# File 'lib/bio/util/bio-gngm.rb', line 169
def pass_quality?(options)
  (self.used_depth >= options[:min_depth] and self.mq >= options[:mapping_quality] and self.non_ref_allele_count >= options[:min_non_ref_count] and self.qual >= options[:min_snp_quality])
end
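The listed method reads every threshold straight from the options hash, so a missing key yields nil and the >= comparison raises an ArgumentError. One way a caller might fill in the documented defaults first is sketched below. QUALITY_DEFAULTS, quality_options and passes? are hypothetical names, and the :min_snp_quality value is a placeholder assumption because no default for it is documented here; whether Bio::Util::Gngm merges defaults before calling pass_quality? is not shown in this section.

```ruby
# Defaults as listed above. :min_snp_quality has no documented default,
# so 20 is a placeholder assumption for illustration only.
QUALITY_DEFAULTS = {
  min_depth: 2,
  min_non_ref_count: 2,
  mapping_quality: 10,
  min_snp_quality: 20
}.freeze

# Hypothetical helper: fill in any threshold the caller left out.
def quality_options(overrides = {})
  QUALITY_DEFAULTS.merge(overrides)
end

# Hypothetical check over plain numbers instead of a Vcf object,
# applying the same four comparisons as pass_quality?.
def passes?(used_depth:, mq:, non_ref_count:, qual:, options: {})
  opts = quality_options(options)
  used_depth >= opts[:min_depth] &&
    mq >= opts[:mapping_quality] &&
    non_ref_count >= opts[:min_non_ref_count] &&
    qual >= opts[:min_snp_quality]
end

passes?(used_depth: 5.0, mq: 30.0, non_ref_count: 3.0, qual: 40.0)
passes?(used_depth: 5.0, mq: 30.0, non_ref_count: 3.0, qual: 40.0,
        options: { min_depth: 10 })
```

With these numbers the first call passes every threshold and the second fails on :min_depth. Merging into a frozen defaults hash keeps the caller's overrides explicit while guaranteeing that every key is present.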
#pl ⇒ Object
Returns the phred scaled likelihood of the first non-reference allele (as defined by the Vcf PL attribute) for the first sample in the Vcf only.
# File 'lib/bio/util/bio-gngm.rb', line 151
def pl
  self.samples["1"]["PL"].split(",")[1].to_f rescue 0.0
end
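Both #gq and #pl look up the first sample under the key "1" and fall back to 0.0 through a blanket rescue. The sketch below shows the same lookup with explicit nil checks, assuming a samples hash shaped the way the listed code implies; the sample values and the helper names first_sample_field and pl_second_value are hypothetical.

```ruby
# Hypothetical sample data: keyed by "1" for the first sample,
# then by FORMAT field name, with values kept as strings.
EXAMPLE_SAMPLES = {
  "1" => { "GT" => "1/1", "GQ" => "48", "PL" => "60,5,0" }
}.freeze

# Hypothetical helper: fetch one FORMAT field of the first sample,
# or nil when the sample or the field is missing.
def first_sample_field(samples, key)
  samples.fetch("1", {})[key]
end

# Same extraction as #pl: the second comma-separated PL value as a Float,
# with 0.0 as the fallback when PL is absent.
def pl_second_value(samples)
  field = first_sample_field(samples, "PL")
  return 0.0 if field.nil?
  field.split(",")[1].to_f   # nil.to_f is 0.0 when only one value is present
end

pl_second_value(EXAMPLE_SAMPLES)   # second PL figure of the first sample
pl_second_value({})                # no sample data at all
```

For the example data the second PL figure is 5.0, and an empty hash gives the 0.0 fallback. Explicit checks return the same values as the rescue modifier in these cases while leaving unrelated errors visible.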
#to_s ⇒ Object
Returns a short string representing chromosome, position, reference sequence, alt sequence(s) and the info string of the Vcf object.
# File 'lib/bio/util/bio-gngm.rb', line 107
def to_s
  "#{self.chrom} #{self.pos} #{self.ref} #{self.alt} #{self.info}"
end
#used_depth ⇒ Object
The depth of reads actually used in the genotype call by Vcftools. The sum of the DP4 attribute. Returns 0.0 if no value is calculated.
# File 'lib/bio/util/bio-gngm.rb', line 112
def used_depth
  self.info["DP4"].split(",").inject {|sum,n| sum.to_f + n.to_f} rescue 0.0
end
#variant? ⇒ Boolean
Returns true if the alt column of the Vcf is not ".".
Examples
  vcf record = 20 14370 rs6054257 G A 29 PASS …
  vcf.variant? #=> true
  vcf record = 20 1230237 . T . 47 PASS …
  vcf.variant? #=> false
# File 'lib/bio/util/bio-gngm.rb', line 102
def variant?
  not self.alt == "." rescue false
end