Class: Bio::PAML::Codeml::Report
- Inherits:
-
Bio::PAML::Common::Report
- Object
- Bio::PAML::Common::Report
- Bio::PAML::Codeml::Report
- Defined in:
- lib/bio/appl/paml/codeml/report.rb
Overview
Description
Run PAML codeml and get the results from the output file. The Codeml::Report object is returned by Bio::PAML::Codeml.query. For example
codeml = Bio::PAML::Codeml.new('codeml', :runmode => 0,
:RateAncestor => 1, :alpha => 0.5, :fix_alpha => 0)
result = codeml.query(alignment, tree)
where alignment and tree are Bioruby objects. This class assumes we have a buffer containing the output of codeml.
References
Phylogenetic Analysis by Maximum Likelihood (PAML) is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. Suggestion citation
Yang, Z. 1997
PAML: a program package for phylogenetic analysis by maximum likelihood
CABIOS 13:555-556
abacus.gene.ucl.ac.uk/software/paml.html
Examples
– The following is not shown in the documentation
>> require 'bio'
>> require 'bio/test/biotestfile'
>> buf = BioTestFile.read('paml/codeml/models/results0-3.txt')
++
Invoke Bioruby’s PAML codeml parser, after having read the contents of the codeml result file into buf (for example using File.read)
>> c = Bio::PAML::Codeml::Report.new(buf)
Do we have two models?
>> c.models.size
=> 2
>> c.models[0].name
=> "M0"
>> c.models[1].name
=> "M3"
Check the general information
>> c.num_sequences
=> 6
>> c.num_codons
=> 134
>> c.descr
=> "M0-3"
Test whether the second model M3 is significant over M0
>> c.significant
=> true
Now fetch the results of the first model M0, and check its values
>> m0 = c.models[0]
>> m0.tree_length
=> 1.90227
>> m0.lnL
=> -1125.800375
>> m0.omega
=> 0.58589
>> m0.dN_dS
=> 0.58589
>> m0.kappa
=> 2.14311
>> m0.alpha
=> nil
We also have a tree (as a string)
>> m0.tree
=> "((((PITG_23265T0: 0.000004, PITG_23253T0: 0.400074): 0.000004, PITG_23257T0: 0.952614): 0.000004, PITG_23264T0: 0.445507): 0.000004, PITG_23267T0: 0.011814, PITG_23293T0: 0.092242);"
Check the M3 and its specific values
>> m3 = c.models[1]
>> m3.lnL
=> -1070.964046
>> m3.classes.size
=> 3
>> m3.classes[0]
=> {:w=>0.00928, :p=>0.56413}
And the tree
>> m3.tree
=> "((((PITG_23265T0: 0.000004, PITG_23253T0: 0.762597): 0.000004, PITG_23257T0: 2.721710): 0.000004, PITG_23264T0: 0.924326): 0.014562, PITG_23267T0: 0.000004, PITG_23293T0: 0.237433);"
Next take the overall posterior analysis
>> c.nb_sites.size
=> 44
>> c.nb_sites[0].to_a
=> [17, "I", 0.988, 3.293]
or by field
>> codon = c.nb_sites[0]
>> codon.position
=> 17
>> codon.probability
=> 0.988
>> codon.dN_dS
=> 3.293
with aliases
>> codon.p
=> 0.988
>> codon.w
=> 3.293
Now we generate special string ‘graph’ for positive selection. The following returns a string the length of the input alignment and shows the locations of positive selection:
>> c.nb_sites.graph[0..32]
=> " ** * * *"
And with dN/dS (high values are still an asterisk *)
>> c.nb_sites.graph_omega[0..32]
=> " 3* 6 6 2"
We also provide the raw buffers to adhere to the principle of unexpected use. Test the raw buffers for content:
>> c.header.to_s =~ /seed/
=> 1
>> m0.to_s =~ /one-ratio/
=> 3
>> m3.to_s =~ /discrete/
=> 3
>> c.footer.to_s =~ /Bayes/
=> 16
Finally we do a test on an M7+M8 run. Again, after loading the results file into buf
–
>> buf78 = BioTestFile.read('paml/codeml/models/results7-8.txt')
++
Invoke Bioruby’s PAML codeml parser
>> c = Bio::PAML::Codeml::Report.new(buf78)
Do we have two models?
>> c.models.size
=> 2
>> c.models[0].name
=> "M7"
>> c.models[1].name
=> "M8"
Assert the results are significant
>> c.significant
=> true
Compared to M0/M3 there are some differences. The important ones are the parameters and the full Bayesian result available for M7/M8. This is the naive Bayesian result:
>> c.nb_sites.size
=> 10
And this is the full Bayesian result:
>> c.sites.size
=> 30
>> c.sites[0].to_a
=> [17, "I", 0.672, 2.847]
>> c.sites.graph[0..32]
=> " ** * * *"
Note the differences of omega with earlier M0-M3 naive Bayesian analysis:
>> c.sites.graph_omega[0..32]
=> " 24 3 3 2"
The locations are the same, but the omega differs.
Instance Attribute Summary collapse
-
#footer ⇒ Object
readonly
Returns the value of attribute footer.
-
#header ⇒ Object
readonly
Returns the value of attribute header.
-
#models ⇒ Object
readonly
Returns the value of attribute models.
Instance Method Summary collapse
-
#alpha ⇒ Object
compatibility call for older interface (single models only).
-
#descr ⇒ Object
Give a short description of the models, for example ‘M0-3’.
-
#initialize(buf) ⇒ Report
constructor
Parse codeml output file passed with
buf
, where buf contains the content of a codeml result file. -
#nb_sites ⇒ Object
Return a PositiveSites (naive empirical bayesian) object.
-
#num_codons ⇒ Object
Return the number of condons in the codeml alignment.
-
#num_sequences ⇒ Object
Return the number of sequences in the codeml alignment.
-
#significant ⇒ Object
If the number of models is two we can calculate whether the result is statistically significant, or not, at the 1% significance level.
-
#sites ⇒ Object
Return a PositiveSites Bayes Empirical Bayes (BEB) analysis.
-
#tree ⇒ Object
compatibility call for older interface (single models only).
-
#tree_length ⇒ Object
compatibility call for older interface (single models only).
-
#tree_log_likelihood ⇒ Object
compatibility call for older interface (single models only).
Constructor Details
#initialize(buf) ⇒ Report
Parse codeml output file passed with buf
, where buf contains the content of a codeml result file
222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 222 def initialize buf # split the main buffer into sections for each model, header and footer. sections = buf.split("\nModel ") model_num = sections.size-1 raise ReportError,"Incorrect codeml data models=#{model_num}" if model_num > 2 foot2 = sections[model_num].split("\nNaive ") if foot2.size == 2 # We have a dual model sections[model_num] = foot2[0] @footer = 'Naive '+foot2[1] @models = [] sections[1..-1].each do | model_buf | @models.push Model.new(model_buf) end else # A single model is run sections = buf.split("\nTREE #") model_num = sections.size-1 raise ReportError,"Can not parse single model file" if model_num != 1 @models = [] @models.push sections[1] @footer = sections[1][/Time used/,1] @single = ReportSingle.new(buf) end @header = sections[0] end |
Instance Attribute Details
#footer ⇒ Object (readonly)
Returns the value of attribute footer.
218 219 220 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 218 def @footer end |
#header ⇒ Object (readonly)
Returns the value of attribute header.
218 219 220 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 218 def header @header end |
#models ⇒ Object (readonly)
Returns the value of attribute models.
218 219 220 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 218 def models @models end |
Instance Method Details
#alpha ⇒ Object
compatibility call for older interface (single models only)
325 326 327 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 325 def alpha @single.alpha end |
#descr ⇒ Object
Give a short description of the models, for example ‘M0-3’
250 251 252 253 254 255 256 257 258 259 260 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 250 def descr num = @models.size case num when 0 'No model' when 1 @models[0].name else @models[0].name + '-' + @models[1].modelnum.to_s end end |
#nb_sites ⇒ Object
Return a PositiveSites (naive empirical bayesian) object
273 274 275 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 273 def nb_sites PositiveSites.new("Naive Empirical Bayes (NEB)",@footer,num_codons) end |
#num_codons ⇒ Object
Return the number of condons in the codeml alignment
263 264 265 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 263 def num_codons @header.scan(/seed used = \d+\n\s+\d+\s+\d+/).to_s.split[5].to_i/3 end |
#num_sequences ⇒ Object
Return the number of sequences in the codeml alignment
268 269 270 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 268 def num_sequences @header.scan(/seed used = \d+\n\s+\d+\s+\d+/).to_s.split[4].to_i end |
#significant ⇒ Object
If the number of models is two we can calculate whether the result is statistically significant, or not, at the 1% significance level. For example, for M7-8 the LRT statistic, or twice the log likelihood difference between the two compared models, may be compared against chi-square, with critical value 9.21 at the 1% significance level.
Here we support a few likely combinations, M0-3, M1-2 and M7-8, used most often in literature. For other combinations, or a different significance level, you’ll have to calculate chi-square yourself.
Returns true or false. If no result is calculated this method raises an error
294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 294 def significant raise ReportError,"Wrong number of models #{@models.size}" if @models.size != 2 lnL1 = @models[0].lnL model1 = @models[0].modelnum lnL2 = @models[1].lnL model2 = @models[1].modelnum case [model1, model2] when [0,3] 2*(lnL2-lnL1) > 13.2767 # chi2: p=0.01, df=4 when [1,2] 2*(lnL2-lnL1) > 9.2103 # chi2: p=0.01, df=2 when [7,8] 2*(lnL2-lnL1) > 9.2103 # chi2: p=0.01, df=2 else raise ReportError,"Significance calculation for #{descr} not supported" end end |
#sites ⇒ Object
Return a PositiveSites Bayes Empirical Bayes (BEB) analysis
278 279 280 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 278 def sites PositiveSites.new("Bayes Empirical Bayes (BEB)",@footer,num_codons) end |
#tree ⇒ Object
compatibility call for older interface (single models only)
330 331 332 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 330 def tree @single.tree end |
#tree_length ⇒ Object
compatibility call for older interface (single models only)
320 321 322 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 320 def tree_length @single.tree_length end |
#tree_log_likelihood ⇒ Object
compatibility call for older interface (single models only)
315 316 317 |
# File 'lib/bio/appl/paml/codeml/report.rb', line 315 def tree_log_likelihood @single.tree_log_likelihood end |