Class: Statsample::FormulaWrapper

Inherits:
Object
Defined in:
lib/statsample/formula/formula.rb

Overview

This class recognizes which terms are numeric and groups the tokens accordingly. Each group is fed to Formula, and once the groups have been parsed with Formula, they are combined back together.

Constructor Details

#initialize(formula, df) ⇒ FormulaWrapper

Note:

Specify 0 as a term in the formula if you do not want a constant term to be included in the parsed formula.

Initializes a formula wrapper object that parses a given formula into tokens which do not overlap one another.

Examples:

df = Daru::DataFrame.from_csv 'spec/data/df.csv'
df.to_category 'c', 'd', 'e'
formula = Statsample::GLM::FormulaWrapper.new 'y~a+d:c', df
formula.canonical_to_s
#=> "1+c(-)+d(-):c+a"

Parameters:

  • formula (String)

    formula to parse

  • df (Daru::DataFrame)

    dataframe required to know which vectors are numerical



# File 'lib/statsample/formula/formula.rb', line 21

def initialize(formula, df)
  @df = df
  # @y stores the LHS term, i.e. the name of the vector to be predicted
  # @tokens stores the RHS terms of the formula
  @y, *@tokens = split_to_tokens(formula)
  @tokens = @tokens.uniq.sort
  manage_constant_term
  @canonical_tokens = non_redundant_tokens
end
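
The constructor's steps (split the formula, deduplicate and sort the terms, handle the constant) can be sketched in plain Ruby. This is only an illustration: `split_formula` and `with_constant` are hypothetical stand-ins for the private `split_to_tokens` and `manage_constant_term`, whose bodies are not shown on this page. The sketch assumes tokens are plain strings separated by `+` and that a `0` term suppresses the implicit constant `1`.

```ruby
# Hypothetical stand-in for split_to_tokens: returns [lhs, rhs_term, ...].
def split_formula(formula)
  lhs, rhs = formula.delete(' ').split('~', 2)
  [lhs, *rhs.split('+')]
end

# Hypothetical stand-in for manage_constant_term (assumed behavior):
# a '0' term drops the constant, otherwise '1' is placed first.
def with_constant(tokens)
  return tokens - %w[0 1] if tokens.include?('0')
  ['1'] + (tokens - ['1'])
end

y, *tokens = split_formula('y ~ a + d:c + a')
tokens = tokens.uniq.sort          # duplicates removed, terms sorted

p y                                # => "y"
p with_constant(tokens)            # => ["1", "a", "d:c"]
p with_constant(%w[0 a])           # => ["a"]
```

The real class may represent tokens as objects rather than strings; the string form is used here only to keep the sketch self-contained.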

Instance Attribute Details

#canonical_tokens ⇒ Object (readonly)

Returns the value of attribute canonical_tokens.



# File 'lib/statsample/formula/formula.rb', line 6

def canonical_tokens
  @canonical_tokens
end

#tokens ⇒ Object (readonly)

Returns the value of attribute tokens.



# File 'lib/statsample/formula/formula.rb', line 6

def tokens
  @tokens
end

#y ⇒ Object (readonly)

Returns the value of attribute y.



# File 'lib/statsample/formula/formula.rb', line 6

def y
  @y
end

Instance Method Details

#canonical_to_s ⇒ String

Note:

‘y~a+b(-)’ means that ‘a’ exists in full rank expansion and ‘b(-)’ exists in reduced rank expansion.

Returns canonical tokens in a readable form.

Examples:

df = Daru::DataFrame.from_csv 'spec/data/df.csv'
df.to_category 'c', 'd', 'e'
formula = Statsample::GLM::FormulaWrapper.new 'y~a+d:c', df
formula.canonical_to_s
#=> "1+c(-)+d(-):c+a"

Returns:

  • (String)

    canonical tokens in a readable form.



# File 'lib/statsample/formula/formula.rb', line 41

def canonical_to_s
  canonical_tokens.join '+'
end
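
The note's distinction between full and reduced rank expansion can be illustrated with indicator (dummy) columns. This is a sketch under assumptions, not the library's code: `indicator_columns` is a hypothetical helper, and it assumes the usual dummy-coding reading in which a full rank expansion keeps one column per level while a reduced rank expansion, written `b(-)`, drops one reference level (here, arbitrarily, the first in sort order).

```ruby
# Hypothetical helper: indicator columns for a categorical vector.
# full: true  -> one column per level (full rank expansion, 'b')
# full: false -> first level dropped as reference (reduced rank, 'b(-)')
def indicator_columns(values, full:)
  levels = values.uniq.sort
  levels = levels.drop(1) unless full
  levels.to_h { |lvl| [lvl, values.map { |v| v == lvl ? 1 : 0 }] }
end

b = %w[x y z x]
p indicator_columns(b, full: true).keys   # => ["x", "y", "z"]
p indicator_columns(b, full: false).keys  # => ["y", "z"]
```

Dropping a level matters when a constant is present: the full set of indicator columns sums to the constant column, which would make the design matrix redundant.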

#non_redundant_tokens ⇒ Array

Returns tokens that produce a non-redundant design matrix.

Returns:

  • (Array)

    array of tokens that do not produce a redundant design matrix



# File 'lib/statsample/formula/formula.rb', line 47

def non_redundant_tokens
  groups = split_to_groups
  # TODO: An enhancement
  # Right now x:c appears as c:x
  groups.each { |k, v| groups[k] = strip_numeric v, k }
  groups.each { |k, v| groups[k] = Formula.new(v).canonical_tokens }
  groups.flat_map { |k, v| add_numeric v, k }
end
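
The group, strip, and re-attach steps above can be sketched in plain Ruby. The helper bodies below are assumptions written for illustration: the page does not show `split_to_groups`, `strip_numeric`, or `add_numeric`, so `numeric_part` and the two reimplemented helpers are hypothetical. The list of categorical names stands in for the dataframe lookup, and the Formula parsing step is skipped.

```ruby
# Numeric part of a token, e.g. 'a:c' -> ['a']; the constant has none.
# `categorical` is an assumed stand-in for asking the dataframe.
def numeric_part(token, categorical)
  return [] if token == '1'
  token.split(':').reject { |t| categorical.include?(t) }.sort
end

# Remove the numeric terms, leaving '1' when nothing categorical remains.
def strip_numeric(tokens, numeric)
  tokens.map do |t|
    rest = t.split(':') - numeric
    rest.empty? ? '1' : rest.join(':')
  end
end

# Re-attach the numeric terms to each token of the group.
def add_numeric(tokens, numeric)
  tokens.map do |t|
    terms = (t == '1' ? [] : t.split(':')) + numeric
    terms.empty? ? '1' : terms.join(':')
  end
end

categorical = %w[c d]
tokens = %w[1 a a:c c d:c]

groups = tokens.group_by { |t| numeric_part(t, categorical) }
# groups[[]]    holds "1", "c", "d:c"  (purely categorical terms)
# groups[["a"]] holds "a", "a:c"       (terms involving numeric 'a')

stripped = groups.to_h { |k, v| [k, strip_numeric(v, k)] }
# stripped[["a"]] is now ["1", "c"]

# The real method passes each group through Formula at this point.
p stripped.flat_map { |k, v| add_numeric(v, k) }
# => ["1", "c", "d:c", "a", "c:a"]
```

Because the numeric terms are appended after the categorical ones, `a:c` comes back as `c:a`, which is the ordering the TODO comment describes.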