Class: Cheripic::ContigPileups
- Inherits:
- Object
- Hierarchy: Object > Cheripic::ContigPileups
- Extended by:
- Forwardable
- Includes:
- Enumerable
- Defined in:
- lib/cheripic/contig_pileups.rb
Overview
A ContigPileups object is created for each contig in the assembly. It stores the pileup file information for that contig, and variants are selected by analysing the pileups. The selected variants are stored as hashes.
Instance Attribute Summary
-
#bg_bulk ⇒ Hash
A hash of variant positions from bg_bulk as keys and pileup info as values.
-
#bg_parent ⇒ Hash
A hash of variant positions from bg_parent as keys and pileup info as values.
-
#id ⇒ String
Id of the contig in assembly taken from fasta file.
-
#masked_regions ⇒ Object
Returns the value of attribute masked_regions.
-
#mut_bulk ⇒ Hash
A hash of variant positions from mut_bulk as keys and pileup info as values.
-
#mut_parent ⇒ Hash
A hash of variant positions from mut_parent as keys and pileup info as values.
-
#parent_hemi ⇒ Hash
A hash of hemi-variant positions as keys and bfr calculated from parent bulks as values.
Instance Method Summary
-
#bulks_compared ⇒ Array<Hash>
Compares the bulk pileups and selects variant positions, grouped into homozygous, heterozygous and hemi-variant positions.
-
#categorise_pos(var_type, pos, ratio) ⇒ Object
Stores pos as key and allele fraction as value in the @hm_pos or @ht_pos hash, depending on the variant type.
-
#compare_pileup(pos) ⇒ Object
Compares the mut_bulk and bg_bulk pileups at the selected position of the contig.
-
#compare_var_type(muttype, bgtype) ⇒ Symbol
Simple comparison of the variant types of the mut and bg bulks at a position. If both bulks have a homozygous variant at the selected position, the position is ignored.
-
#hemisnps_in_parent ⇒ Hash
Compares the parental pileups for the contig, identifies positions that indicate variants from homeologues (called hemi-SNPs), and calculates the bulk frequency ratio (bfr) for each.
-
#initialize(fasta) ⇒ ContigPileups
constructor
Creates a ContigPileups object using the fasta entry id.
-
#var_mode(fraction) ⇒ Symbol
Categorizes variant zygosity based on the allele fraction provided.
-
#var_mode_fraction(pileup_info) ⇒ Symbol, Float
Extracts the var_mode and allele fraction from the pileup information at a position in the contig.
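The zygosity steps summarised above (#var_mode_fraction followed by #var_mode) can be sketched in isolation. This is a minimal stand-alone sketch, not the library code: the method names `zygosity` and `mode_and_fraction`, the plain hash standing in for a Pileup's `var_base_frac`, and the thresholds 0.25 and 0.75 are assumptions for illustration. In the class itself the limits come from Options.htlow and Options.hthigh. Note also that this sketch returns nil for fractions below the lower limit, whereas the listed source returns an empty string in that case.

```ruby
# Hypothetical heterozygosity limits; the real values are read from
# Options.htlow and Options.hthigh.
HT_LOW = 0.25
HT_HIGH = 0.75

# Classify an allele fraction as heterozygous (:het) or homozygous (:hom).
# Returns nil when the fraction is below the lower limit.
def zygosity(fraction, low = HT_LOW, high = HT_HIGH)
  if fraction.between?(low, high)
    :het
  elsif fraction > high
    :hom
  end
end

# Drop the reference entry and classify the predominant variant base.
# base_fractions is a plain hash standing in for Pileup#var_base_frac.
def mode_and_fraction(base_fractions)
  variants = base_fractions.reject { |base, _| base == :ref }
  return [nil, nil] if variants.empty?

  fraction = variants.values.max
  [zygosity(fraction), fraction]
end

p mode_and_fraction({ ref: 0.1, A: 0.6, T: 0.3 }) # => [:het, 0.6]
```

Unlike the listed #var_mode_fraction, the sketch filters with `reject` rather than calling `delete(:ref)`, so the caller's hash is left unmodified.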
Constructor Details
#initialize(fasta) ⇒ ContigPileups
Creates a ContigPileups object using the fasta entry id.
# File 'lib/cheripic/contig_pileups.rb', line 39

def initialize (fasta)
  @id = fasta
  @mut_bulk = {}
  @bg_bulk = {}
  @mut_parent = {}
  @bg_parent = {}
  @parent_hemi = {}
  @masked_regions = Hash.new { |h,k| h[k] = {} }
  @hm_pos = {}
  @ht_pos = {}
  @hemi_pos = {}
end
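The constructor gives @masked_regions a default block, which makes the hash create inner hashes on demand. The following sketch shows that behaviour on its own; the integer region indices and the coordinates are assumptions for illustration, only the `:begin` and `:end` keys are taken from the source.

```ruby
# Same default block as @masked_regions in the constructor.
masked_regions = Hash.new { |hash, key| hash[key] = {} }

# Reading a missing key creates and stores an empty inner hash,
# so nested assignment works without initialising the key first.
masked_regions[0][:begin] = 100
masked_regions[0][:end] = 250

# A lookup alone also adds the key as a side effect.
masked_regions[1]
puts masked_regions.key?(1) # prints "true"
```

One consequence of this design is that `empty?` stops being true as soon as any index has been read, even if no coordinates were stored for it.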
Instance Attribute Details
#bg_bulk ⇒ Hash
Returns a hash of variant positions from bg_bulk as keys and pileup info as values.
# File 'lib/cheripic/contig_pileups.rb', line 26

class ContigPileups

  include Enumerable
  extend Forwardable
  def_delegators :@mut_bulk, :each, :each_key, :each_value, :length, :[], :store
  def_delegators :@bg_bulk, :each, :each_key, :each_value, :length, :[], :store
  def_delegators :@mut_parent, :each, :each_key, :each_value, :length, :[], :store
  def_delegators :@bg_parent, :each, :each_key, :each_value, :length, :[], :store
  attr_accessor :id, :parent_hemi
  attr_accessor :mut_bulk, :bg_bulk, :mut_parent, :bg_parent, :masked_regions

  # creates a ContigPileup object using fasta entry id
  # @param fasta [String] a contig id from fasta entry
  def initialize (fasta)
    @id = fasta
    @mut_bulk = {}
    @bg_bulk = {}
    @mut_parent = {}
    @bg_parent = {}
    @parent_hemi = {}
    @masked_regions = Hash.new { |h,k| h[k] = {} }
    @hm_pos = {}
    @ht_pos = {}
    @hemi_pos = {}
  end

  # bulk pileups are compared and variant positions are selected
  # @return [Array<Hash>] variant positions are stored in hashes
  # for homozygous, heterozygous and hemi-variant positions
  def bulks_compared
    @mut_bulk.each_key do | pos |
      ignore = 0
      unless @masked_regions.empty?
        @masked_regions.each_key do | index |
          if pos.between?(@masked_regions[index][:begin], @masked_regions[index][:end])
            ignore = 1
            logger.info "variant is in the masked region\t#{@mut_bulk[pos].to_s}"
          end
        end
      end
      next if ignore == 1
      if Options.polyploidy and @parent_hemi.key?(pos)
        bg_bases = ''
        if @bg_bulk.key?(pos)
          bg_bases = @bg_bulk[pos].var_base_frac
        end
        mut_bases = @mut_bulk[pos].var_base_frac
        bfr = Bfr.get_bfr(mut_bases, bg_bases)
        @hemi_pos[pos] = bfr
      else
        self.compare_pileup(pos)
      end
    end
    [@hm_pos, @ht_pos, @hemi_pos]
  end

  # mut_bulk and bg_bulk pileups are compared at selected position of the contig.
  # Empty hash results from position below selected coverage
  # or bases freq below noise and such positions are deleted.
  # @param pos [Integer] position in the contig
  # stores variant type, position and allele fraction to either @hm_pos or @ht_pos hashes
  def compare_pileup(pos)
    mut_type, fraction = var_mode_fraction(@mut_bulk[pos])
    return nil if mut_type.nil?
    if @bg_bulk.key?(pos)
      bg_type = var_mode_fraction(@bg_bulk[pos])[0]
      mut_type = compare_var_type(mut_type, bg_type)
    end
    unless mut_type.nil?
      categorise_pos(mut_type, pos, fraction)
    end
  end

  # Method to extract var_mode and allele fraction from pileup information at a position in contig
  #
  # @param pileup_info [Pileup] pileup object
  # @return [Symbol] variant mode from pileup position (:hom or :het) at the position
  # @return [Float] allele fraction at the position
  def var_mode_fraction(pileup_info)
    base_frac_hash = pileup_info.var_base_frac
    base_frac_hash.delete(:ref)
    return [nil, nil] if base_frac_hash.empty?
    # we could ignore complex loci or
    # take the variant type based on predominant base
    if base_frac_hash.length > 1
      fraction = base_frac_hash.values.max
    else
      fraction = base_frac_hash[base_frac_hash.keys[0]]
    end
    [var_mode(fraction), fraction]
  end

  # Categorizes variant zygosity based on the allele fraction provided.
  # Uses lower and upper limit set for heterozygosity in the options.
  # @note consider increasing the range of heterozygosity limits for RNA-seq data
  # @param fraction [Float] allele fraction
  # @return [Symbol] of either :het or :hom to represent heterozygous or homozygous respectively
  def var_mode(fraction)
    ht_low = Options.htlow
    ht_high = Options.hthigh
    mode = ''
    if fraction.between?(ht_low, ht_high)
      mode = :het
    elsif fraction > ht_high
      mode = :hom
    end
    mode
  end

  # Simple comparison of variant type of mut and bg bulks at a position
  # If both bulks have homozygous variant at selected position then it is ignored
  # @param muttype [Symbol] values are either :hom or :het
  # @param bgtype [Symbol] values are either :hom or :het
  # @return [Symbol] variant mode of the mut bulk (:hom or :het) at the position or nil
  def compare_var_type(muttype, bgtype)
    if muttype == :hom and bgtype == :hom
      nil
    else
      muttype
    end
  end

  # method stores pos as key and allele fraction as value
  # to @hm_pos or @ht_pos hash based on variant type
  # @param var_type [Symbol] values are either :hom or :het
  # @param pos [Integer] position in the contig
  # @param ratio [Float] allele fraction
  def categorise_pos(var_type, pos, ratio)
    if var_type == :hom
      @hm_pos[pos] = ratio
    elsif var_type == :het
      @ht_pos[pos] = ratio
    end
  end

  # Compares parental pileups for the contig and identify position
  # that indicate variants from homeologues called hemi-snps
  # and calculates bulk frequency ratio (bfr)
  # @return [Hash] parent_hemi hash with position as key and bfr as value
  def hemisnps_in_parent
    # mark all the hemi snp based on both parents
    @mut_parent.each_key do |pos|
      mut_parent_frac = @mut_parent[pos].var_base_frac
      if @bg_parent.key?(pos)
        bg_parent_frac = @bg_parent[pos].var_base_frac
        bfr = Bfr.get_bfr(mut_parent_frac, bg_parent_frac)
        @parent_hemi[pos] = bfr
        @bg_parent.delete(pos)
      else
        bfr = Bfr.get_bfr(mut_parent_frac)
        @parent_hemi[pos] = bfr
      end
    end

    # now include all hemi snp unique to background parent
    @bg_parent.each_key do |pos|
      unless @parent_hemi.key?(pos)
        bg_parent_frac = @bg_parent[pos].var_base_frac
        bfr = Bfr.get_bfr(bg_parent_frac)
        @parent_hemi[pos] = bfr
      end
    end
  end

end
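The class source calls def_delegators four times with the same method names. In Ruby a later definition of a method replaces an earlier one, so the delegated `each`, `length`, `[]` and `store` would be expected to resolve to the last target named, @bg_parent. The toy class below is hypothetical (`TwoStores` and its contents are not part of Cheripic) and only illustrates that override behaviour with two targets.

```ruby
require 'forwardable'

# Toy class mirroring the repeated def_delegators pattern.
class TwoStores
  extend Forwardable

  def_delegators :@first, :length, :[]
  def_delegators :@second, :length, :[] # redefines #length and #[]

  attr_reader :first, :second

  def initialize
    @first = { 1 => :a, 2 => :b }
    @second = { 9 => :z }
  end
end

stores = TwoStores.new
puts stores.length       # prints 1: delegated to @second
puts stores.first.length # prints 2: the accessor reaches @first directly
```

For this reason, going through the named accessors (mut_bulk, bg_bulk, mut_parent, bg_parent) is the unambiguous way to reach a particular hash.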
#bg_parent ⇒ Hash
Returns a hash of variant positions from bg_parent as keys and pileup info as values.
#id ⇒ String
Returns id of the contig in assembly taken from fasta file.
#masked_regions ⇒ Object
Returns the value of attribute masked_regions.
# File 'lib/cheripic/contig_pileups.rb', line 35

def masked_regions
  @masked_regions
end
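In #bulks_compared, a position is skipped when it falls between the `:begin` and `:end` of any entry in masked_regions. A minimal stand-alone sketch of that check follows; the helper name `masked?`, the integer indices and the coordinates are assumptions for illustration.

```ruby
# Hypothetical masked regions keyed by an arbitrary index.
masked_regions = {
  0 => { begin: 100, end: 250 },
  1 => { begin: 900, end: 1000 }
}

# True when pos lies inside any masked interval (both bounds inclusive,
# because Comparable#between? includes its end points).
def masked?(pos, regions)
  regions.each_value.any? { |region| pos.between?(region[:begin], region[:end]) }
end

puts masked?(100, masked_regions) # prints "true"
puts masked?(251, masked_regions) # prints "false"
```

The sketch uses `any?`, which stops at the first matching region, where the listed source sets an `ignore` flag and keeps iterating.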
#mut_bulk ⇒ Hash
Returns a hash of variant positions from mut_bulk as keys and pileup info as values.
#mut_parent ⇒ Hash
Returns a hash of variant positions from mut_parent as keys and pileup info as values.
#parent_hemi ⇒ Hash
Returns a hash with hemi-variant positions as keys and the bulk frequency ratio (bfr) calculated from the parent bulks as values.
Instance Method Details
#bulks_compared ⇒ Array<Hash>
Compares the bulk pileups and selects variant positions. Returns three hashes holding the homozygous, heterozygous and hemi-variant positions, in that order.
# File 'lib/cheripic/contig_pileups.rb', line 55

def bulks_compared
  @mut_bulk.each_key do | pos |
    ignore = 0
    unless @masked_regions.empty?
      @masked_regions.each_key do | index |
        if pos.between?(@masked_regions[index][:begin], @masked_regions[index][:end])
          ignore = 1
          logger.info "variant is in the masked region\t#{@mut_bulk[pos].to_s}"
        end
      end
    end
    next if ignore == 1
    if Options.polyploidy and @parent_hemi.key?(pos)
      bg_bases = ''
      if @bg_bulk.key?(pos)
        bg_bases = @bg_bulk[pos].var_base_frac
      end
      mut_bases = @mut_bulk[pos].var_base_frac
      bfr = Bfr.get_bfr(mut_bases, bg_bases)
      @hemi_pos[pos] = bfr
    else
      self.compare_pileup(pos)
    end
  end
  [@hm_pos, @ht_pos, @hemi_pos]
end
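The masked-region test at the top of the loop can be looked at on its own. The following is a minimal stand-alone sketch, not part of Cheripic: `masked?` is a hypothetical helper, and the region layout (an index pointing to a hash with `:begin` and `:end`) is assumed from how the method above reads `@masked_regions`.

```ruby
# Assumed layout: index => { begin: Integer, end: Integer },
# built with the same default-block hash used in the constructor.
masked_regions = Hash.new { |h, k| h[k] = {} }
masked_regions[0][:begin] = 100
masked_regions[0][:end]   = 200
masked_regions[1][:begin] = 500
masked_regions[1][:end]   = 650

# Hypothetical helper: true if pos falls inside any masked region.
# between? is inclusive at both ends, so region boundaries count as masked.
def masked?(pos, regions)
  regions.each_value.any? { |region| pos.between?(region[:begin], region[:end]) }
end

positions = [50, 100, 350, 650, 700]
kept = positions.reject { |pos| masked?(pos, masked_regions) }
puts kept.inspect
```

Positions 100 and 650 sit exactly on a region boundary and are dropped, because `between?` includes both limits.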
#categorise_pos(var_type, pos, ratio) ⇒ Object
Stores pos as key and allele fraction as value in the @hm_pos or @ht_pos hash, depending on the variant type.
# File 'lib/cheripic/contig_pileups.rb', line 154

def categorise_pos(var_type, pos, ratio)
  if var_type == :hom
    @hm_pos[pos] = ratio
  elsif var_type == :het
    @ht_pos[pos] = ratio
  end
end
#compare_pileup(pos) ⇒ Object
Compares the mut_bulk and bg_bulk pileups at the selected position of the contig. A position below the selected coverage, or with base frequencies below the noise level, yields an empty hash and is skipped. Otherwise the position and allele fraction are stored in either the @hm_pos or @ht_pos hash, according to variant type.
# File 'lib/cheripic/contig_pileups.rb', line 87

def compare_pileup(pos)
  mut_type, fraction = var_mode_fraction(@mut_bulk[pos])
  return nil if mut_type.nil?
  if @bg_bulk.key?(pos)
    bg_type = var_mode_fraction(@bg_bulk[pos])[0]
    mut_type = compare_var_type(mut_type, bg_type)
  end
  unless mut_type.nil?
    categorise_pos(mut_type, pos, fraction)
  end
end
#compare_var_type(muttype, bgtype) ⇒ Symbol
Simple comparison of the variant types of the mut and bg bulks at a position. If both bulks have a homozygous variant at the selected position, the position is ignored.
# File 'lib/cheripic/contig_pileups.rb', line 141

def compare_var_type(muttype, bgtype)
  if muttype == :hom and bgtype == :hom
    nil
  else
    muttype
  end
end
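To see which combinations survive the comparison, the method can be copied out of the class and run over every pairing of :hom and :het. This is only an illustration: the method body is taken from the listing above, while the `outcomes` hash and the loop around it are added here.

```ruby
# Stand-alone copy of the method shown above, so it can run outside the class.
def compare_var_type(muttype, bgtype)
  if muttype == :hom and bgtype == :hom
    nil
  else
    muttype
  end
end

# Tabulate the result for each (mut, bg) pairing.
outcomes = {}
[:hom, :het].product([:hom, :het]).each do |mut, bg|
  outcomes[[mut, bg]] = compare_var_type(mut, bg)
  puts "mut=#{mut} bg=#{bg} => #{outcomes[[mut, bg]].inspect}"
end
```

Only the hom/hom pairing is dropped. A position that is heterozygous in both bulks keeps its :het mode.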
#hemisnps_in_parent ⇒ Hash
Compares the parental pileups for the contig, identifies positions that indicate variants from homeologues (hemi-SNPs), and calculates the bulk frequency ratio (bfr) for each.
# File 'lib/cheripic/contig_pileups.rb', line 166

def hemisnps_in_parent
  # mark all the hemi snp based on both parents
  @mut_parent.each_key do |pos|
    mut_parent_frac = @mut_parent[pos].var_base_frac
    if @bg_parent.key?(pos)
      bg_parent_frac = @bg_parent[pos].var_base_frac
      bfr = Bfr.get_bfr(mut_parent_frac, bg_parent_frac)
      @parent_hemi[pos] = bfr
      @bg_parent.delete(pos)
    else
      bfr = Bfr.get_bfr(mut_parent_frac)
      @parent_hemi[pos] = bfr
    end
  end
  # now include all hemi snp unique to background parent
  @bg_parent.each_key do |pos|
    unless @parent_hemi.key?(pos)
      bg_parent_frac = @bg_parent[pos].var_base_frac
      bfr = Bfr.get_bfr(bg_parent_frac)
      @parent_hemi[pos] = bfr
    end
  end
end
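The two loops amount to visiting the union of positions seen in either parent, passing two fractions where both parents have the position and one fraction otherwise. A rough sketch of that bookkeeping follows. It makes several assumptions: `toy_bfr` is a hypothetical stand-in, since the real `Bfr.get_bfr` calculation is not shown on this page, and plain floats stand in for the base-fraction hashes returned by `var_base_frac`.

```ruby
# Hypothetical stand-in for Bfr.get_bfr: it only records how it was called,
# it does not compute a real bulk frequency ratio.
toy_bfr = lambda do |first_frac, second_frac = nil|
  second_frac.nil? ? [:single, first_frac] : [:paired, first_frac, second_frac]
end

# Placeholder data: position => fraction (floats instead of Pileup objects).
mut_parent = { 10 => 0.5, 20 => 0.3 }
bg_parent  = { 20 => 0.6, 30 => 0.4 }

parent_hemi = {}
# Union of positions, mut parent first, then positions unique to the bg parent.
(mut_parent.keys | bg_parent.keys).each do |pos|
  # compact drops the missing parent, giving a one- or two-argument call
  args = [mut_parent[pos], bg_parent[pos]].compact
  parent_hemi[pos] = toy_bfr.call(*args)
end

puts parent_hemi.inspect
```

Unlike the method above, this sketch leaves `bg_parent` untouched rather than deleting shared positions from it, which keeps the input hashes reusable.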
#var_mode(fraction) ⇒ Symbol
Note: Consider increasing the range of heterozygosity limits for RNA-seq data.
Categorizes variant zygosity based on the allele fraction provided. Uses the lower and upper limits set for heterozygosity in the options.
# File 'lib/cheripic/contig_pileups.rb', line 124

def var_mode(fraction)
  ht_low = Options.htlow
  ht_high = Options.hthigh
  mode = ''
  if fraction.between?(ht_low, ht_high)
    mode = :het
  elsif fraction > ht_high
    mode = :hom
  end
  mode
end
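The thresholds can be tried out in isolation. The following is a minimal sketch, not the Cheripic implementation: `classify_fraction`, `HT_LOW` and `HT_HIGH` are hypothetical names, and 0.25 and 0.75 are assumed values standing in for whatever `Options.htlow` and `Options.hthigh` hold in a given run.

```ruby
# Assumed limits, standing in for Options.htlow and Options.hthigh.
HT_LOW  = 0.25
HT_HIGH = 0.75

# Hypothetical stand-alone version of the classification above.
def classify_fraction(fraction, low = HT_LOW, high = HT_HIGH)
  if fraction.between?(low, high)
    :het          # inclusive at both limits
  elsif fraction > high
    :hom
  else
    ''            # mirrors the source: below the lower limit no mode is assigned
  end
end

[0.1, 0.25, 0.5, 0.75, 0.9].each do |fraction|
  puts "#{fraction} => #{classify_fraction(fraction).inspect}"
end
```

A fraction below the lower limit yields an empty string rather than nil, so callers that test only for nil will pass it on. In the class above it is then dropped by categorise_pos, which stores only :hom and :het.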
#var_mode_fraction(pileup_info) ⇒ Symbol, Float
Extracts the variant mode and allele fraction from the pileup information at a position in the contig. When more than one variant base is present, the largest fraction is used.
# File 'lib/cheripic/contig_pileups.rb', line 105

def var_mode_fraction(pileup_info)
  base_frac_hash = pileup_info.var_base_frac
  base_frac_hash.delete(:ref)
  return [nil, nil] if base_frac_hash.empty?
  # we could ignore complex loci or
  # take the variant type based on predominant base
  if base_frac_hash.length > 1
    fraction = base_frac_hash.values.max
  else
    fraction = base_frac_hash[base_frac_hash.keys[0]]
  end
  [var_mode(fraction), fraction]
end
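The choice of the predominant variant base can be illustrated without a Pileup object. This sketch assumes the hash returned by `var_base_frac` has symbol keys (including `:ref`, as the method above suggests) and float values. `predominant_variant_fraction` and the sample hashes are hypothetical.

```ruby
# Hypothetical helper: largest non-reference fraction, or nil if none remains.
def predominant_variant_fraction(base_frac)
  variants = base_frac.reject { |base, _frac| base == :ref }  # leaves the input intact
  return nil if variants.empty?
  variants.values.max
end

complex_locus = { ref: 0.40, A: 0.35, T: 0.25 }  # two variant bases
simple_locus  = { ref: 0.20, G: 0.80 }           # one variant base
ref_only      = { ref: 1.0 }                     # no variant at all

puts predominant_variant_fraction(complex_locus).inspect
puts predominant_variant_fraction(simple_locus).inspect
puts predominant_variant_fraction(ref_only).inspect
```

The sketch filters with `reject` so the input hash is not modified, whereas the method above calls `delete(:ref)` on the hash it receives. Whether that matters in Cheripic depends on whether `var_base_frac` returns a fresh hash on each call, which this page does not show.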