Method: Statsample::Codification.create_hash

Defined in:
lib/statsample/codification.rb

.create_hash(dataset, vectors, sep = Statsample::SPLIT_TOKEN) ⇒ Object

Creates a hash, based on the given vectors, to be used as a recodification dictionary. The keys are the names of the vectors in the dataset, and the values are hashes that map each value to itself (keys = values), for recodification.
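To illustrate the shape of that dictionary, the sketch below assumes a hypothetical vector `:fruits` whose entries contain the tokens "apple", "banana" and "pear". The data, the edit, and the way the mapping is applied are invented for the example and use plain Ruby, not the Statsample recoding API.

```ruby
# Hypothetical dictionary shape for a vector :fruits (identity mapping: key == value)
dictionary = {
  fruits: { "apple" => "apple", "banana" => "banana", "pear" => "pear" }
}

# Recodification: edit the values (here merging two tokens into one category);
# the keys keep the original tokens.
dictionary[:fruits]["pear"]   = "other"
dictionary[:fruits]["banana"] = "other"

# Apply the edited mapping to some raw tokens (plain Ruby, illustration only)
recoded = %w[apple pear banana apple].map { |t| dictionary[:fruits][t] }
p recoded
```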

Raises:

  • (ArgumentError) if vectors is empty
  • (Exception) if a vector name is not present in dataset

# File 'lib/statsample/codification.rb', line 35

def create_hash(dataset, vectors, sep=Statsample::SPLIT_TOKEN)
  raise ArgumentError, "Array shouldn't be empty" if vectors.size == 0
  pro_hash = vectors.inject({}) do |h, v_name|
    # Numeric names are used as-is; any other name is converted to a symbol
    v_name = v_name.is_a?(Numeric) ? v_name : v_name.to_sym
    raise Exception, "Vector #{v_name} doesn't exist on Dataset" if
      !dataset.vectors.include?(v_name)
    v = dataset[v_name]
    # Split each value on sep, flatten the pieces and convert each one to a string
    split_data = v.splitted(sep)
                  .flatten
                  .collect { |c| c.to_s }
                  .find_all { |c| !c.nil? }

    # Identity mapping (token => token) over the sorted, unique tokens
    factors   = split_data.uniq
                          .compact
                          .sort
                          .inject({}) { |ac, val| ac[val] = val; ac }
    h[v_name] = factors
    h
  end

  pro_hash
end
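The same logic can be sketched without Statsample, using a plain Hash of arrays as a stand-in for the dataset. This is a hypothetical helper (`identity_dictionary`), not part of the library: it assumes "," as the separator (the real default is `Statsample::SPLIT_TOKEN`), skips nil entries, and raises ArgumentError in both error cases, whereas the listing raises a plain Exception for a missing vector.

```ruby
SEP = ","  # assumed separator for this sketch

# Hypothetical stand-in for create_hash: data is a Hash of name => Array
def identity_dictionary(data, vectors, sep = SEP)
  raise ArgumentError, "Array shouldn't be empty" if vectors.empty?
  vectors.each_with_object({}) do |name, dict|
    name = name.is_a?(Numeric) ? name : name.to_sym
    raise ArgumentError, "Vector #{name} doesn't exist" unless data.key?(name)
    tokens = data[name].compact                       # skip missing (nil) entries
                       .flat_map { |x| x.to_s.split(sep) } # split multi-valued cells
                       .uniq
                       .sort
    dict[name] = tokens.to_h { |t| [t, t] }            # identity mapping
  end
end

# Usage with invented data
data = { fruits: ["apple,pear", "banana", nil, "pear"] }
dict = identity_dictionary(data, ["fruits"])
p dict
```

Using `each_with_object` avoids having to return the accumulator from each block iteration, which the `inject` version in the listing does explicitly with `h`.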