Method: Statsample::Codification.create_hash
- Defined in:
- lib/statsample/codification.rb
.create_hash(dataset, vectors, sep = Statsample::SPLIT_TOKEN) ⇒ Object
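Creates a hash, based on the given vectors, that serves as a recodification dictionary. The keys are the names of the vectors in the dataset. Each value is a hash of that vector's distinct factors, obtained by splitting its values on sep, in which every key maps to itself (keys = values).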
# File 'lib/statsample/codification.rb', line 35

def create_hash(dataset, vectors, sep=Statsample::SPLIT_TOKEN)
  raise ArgumentError,"Array should't be empty" if vectors.size==0
  pro_hash = vectors.inject({}) do |h,v_name|
    v_name = v_name.is_a?(Numeric) ? v_name : v_name.to_sym
    raise Exception, "Vector #{v_name} doesn't exists on Dataset" if !dataset.vectors.include?(v_name)
    v = dataset[v_name]
    split_data = v.splitted(sep)
                  .flatten
                  .collect { |c| c.to_s }
                  .find_all{ |c| !c.nil? }

    factors = split_data.uniq
                        .compact
                        .sort
                        .inject({}) { |ac,val| ac[val] = val; ac }
    h[v_name] = factors
    h
  end

  pro_hash
end
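
The shape of the returned dictionary may be easier to see in a small standalone sketch. The example below does not call Statsample. It models the dataset as a plain Hash of vector name to Array of raw values, assumes a comma as the separator, and skips missing (nil) cells. The helper name build_factor_hash is hypothetical and exists only for illustration.

```ruby
# Plain-Ruby stand-in for the recodification dictionary described above.
# `build_factor_hash` is a hypothetical helper, not part of Statsample; the
# dataset is modeled as a Hash of vector name => Array of raw values, and
# the "," separator is an assumption.
def build_factor_hash(dataset, vector_names, sep = ",")
  raise ArgumentError, "vector_names must not be empty" if vector_names.empty?

  vector_names.each_with_object({}) do |name, dictionary|
    key = name.is_a?(Numeric) ? name : name.to_sym
    raise ArgumentError, "Vector #{key} does not exist in dataset" unless dataset.key?(key)

    # Split each multi-valued cell, skip missing cells, and collect the
    # distinct factors in sorted order.
    factors = dataset[key]
              .compact
              .flat_map { |cell| cell.to_s.split(sep) }
              .map(&:strip)
              .reject(&:empty?)
              .uniq
              .sort

    # Identity mapping: every factor initially recodes to itself.
    dictionary[key] = factors.to_h { |factor| [factor, factor] }
  end
end

dataset = { colors: ["red,blue", "blue", nil, "green,red"] }
dictionary = build_factor_hash(dataset, [:colors])
# dictionary == { colors: { "blue" => "blue", "green" => "green", "red" => "red" } }
```

Because each factor maps to itself, a caller can change individual values in the dictionary (for example, mapping "blue" to "cold") before using it to recode the data, leaving every untouched factor unchanged.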