Module: Daru::Category
- Defined in:
- lib/daru/category.rb
Constant Summary
- CODING_SCHEMES = [:dummy, :deviation, :helmert, :simple].freeze
Instance Attribute Summary
-
#array ⇒ Object
readonly
For debugging.
-
#base_category ⇒ Object
Returns the value of attribute base_category.
-
#cat_hash ⇒ Object
readonly
For debugging.
-
#coding_scheme ⇒ Object
Returns the value of attribute coding_scheme.
-
#index ⇒ Object
Returns the value of attribute index.
-
#map_int_cat ⇒ Object
readonly
For debugging.
-
#name ⇒ Object
Returns the value of attribute name.
Instance Method Summary
-
#==(other) ⇒ Object
Two categorical vectors are equal if their indexes and corresponding values are the same. Returns true if the two vectors are equal, false otherwise.
-
#[](*indexes) ⇒ Object
Returns vector for indexes/positions specified.
-
#[]=(*indexes, val) ⇒ Object
Modifies values at specified indexes/positions.
-
#add_category(*new_categories) ⇒ Object
Associates a category to the vector.
-
#at(*positions) ⇒ Object
Returns vector for positions specified.
-
#categories ⇒ Array
(also: #order)
Returns all the categories with the inherent order.
-
#categories=(cat_with_order) ⇒ Object
Sets order of the categories.
-
#contrast_code(opts = {}) ⇒ Daru::DataFrame
Contrast codes the vector according to the coding scheme set.
-
#count(category) ⇒ Object
Returns frequency of given category.
-
#count_values(*values) ⇒ Integer
Count the number of values specified.
-
#describe ⇒ Daru::Vector
Gives a summary of the data using the following parameters: size (size of the data), categories (total number of categories), max_freq (maximum number of times a category occurs), max_category (the category that occurs the maximum number of times), min_freq (minimum number of times a category occurs), min_category (the category that occurs the minimum number of times).
-
#dup ⇒ Daru::Vector
Duplicates a vector.
-
#each ⇒ Enumerator
Returns an enumerator that enumerates on categorical data.
-
#frequencies(type = :count) ⇒ Daru::Vector
Returns a vector storing count/frequency of each category.
-
#include_values?(*values) ⇒ true, false
Checks whether any of the specified values occurs in the vector.
-
#indexes(*values) ⇒ Array
Return indexes of values specified.
-
#initialize_category(data, opts = {}) ⇒ Object
Initializes a vector to store categorical data.
-
#max ⇒ object
Returns the maximum category according to the order specified.
-
#min ⇒ object
Returns the minimum category according to the order specified.
-
#ordered=(bool) ⇒ Object
Make categorical data ordered or unordered.
-
#ordered? ⇒ Boolean
Tells whether vector is ordered or not.
- #plotting_library=(lib) ⇒ Object
- #positions(*values) ⇒ Object
-
#reindex!(idx) ⇒ Daru::Vector
Sets new index for vector.
-
#reject_values(*values) ⇒ Daru::Vector
Return a vector with specified values removed.
-
#remove_unused_categories ⇒ Daru::Vector
Removes the unused categories.
-
#rename_categories(old_to_new) ⇒ Object
Rename categories.
-
#reorder!(order) ⇒ Object
Reorder the vector with given positions.
-
#replace_values(old_values, new_value) ⇒ Daru::Vector
Replaces specified values with a new value.
-
#set_at(positions, val) ⇒ Object
Modifies values at specified positions.
-
#size ⇒ Object
Size of categorical data.
- #sort ⇒ Object
-
#sort! ⇒ Daru::Vector
Sorts the vector in the order specified.
-
#to_a ⇒ Array
Returns all categorical data.
-
#to_category ⇒ Daru::Vector
Does nothing since it's already of type category.
-
#to_ints ⇒ Array
Returns integer coding for categorical data in the order starting from 0.
-
#to_non_category ⇒ Daru::Vector
Converts a category type vector to non category type vector.
-
#where(bool_array) ⇒ Daru::Vector
For querying the data.
Instance Attribute Details
#array ⇒ Object (readonly)
For debugging. To be removed.
    # File 'lib/daru/category.rb', line 7

    def array
      @array
    end
#base_category ⇒ Object
Returns the value of attribute base_category.
    # File 'lib/daru/category.rb', line 3

    def base_category
      @base_category
    end
#cat_hash ⇒ Object (readonly)
For debugging. To be removed.
    # File 'lib/daru/category.rb', line 7

    def cat_hash
      @cat_hash
    end
#coding_scheme ⇒ Object
Returns the value of attribute coding_scheme.
    # File 'lib/daru/category.rb', line 4

    def coding_scheme
      @coding_scheme
    end
#index ⇒ Object
Returns the value of attribute index.
    # File 'lib/daru/category.rb', line 4

    def index
      @index
    end
#map_int_cat ⇒ Object (readonly)
For debugging. To be removed.
    # File 'lib/daru/category.rb', line 7

    def map_int_cat
      @map_int_cat
    end
#name ⇒ Object
Returns the value of attribute name.
    # File 'lib/daru/category.rb', line 4

    def name
      @name
    end
Instance Method Details
#==(other) ⇒ Object
Two categorical vectors are equal if their indexes and corresponding values are the same. Returns true if the two vectors are equal, false otherwise.
    # File 'lib/daru/category.rb', line 497

    def == other
      size == other.size &&
        to_a == other.to_a &&
        index == other.index
    end
#[](*indexes) ⇒ Object
Accepts both indexes and positions. In case of a collision, the argument is treated as an index.
Returns the vector for the indexes/positions specified.
    # File 'lib/daru/category.rb', line 184

    def [] *indexes
      positions = @index.pos(*indexes)
      return category_from_position(positions) if positions.is_a? Integer

      Daru::Vector.new positions.map { |pos| category_from_position pos },
        index: @index.subset(*indexes),
        name: @name,
        type: :category,
        ordered: @ordered,
        categories: categories
    end
#[]=(*indexes, val) ⇒ Object
In order to add a new category you need to associate it via #add_category
Modifies values at specified indexes/positions.
    # File 'lib/daru/category.rb', line 238

    def []= *indexes, val
      positions = @index.pos(*indexes)

      if positions.is_a? Numeric
        modify_category_at positions, val
      else
        positions.each { |pos| modify_category_at pos, val }
      end
      self
    end
#add_category(*new_categories) ⇒ Object
Associates a category to the vector.
    # File 'lib/daru/category.rb', line 123

    def add_category(*new_categories)
      new_categories -= categories
      add_extra_categories new_categories
    end
#at(*positions) ⇒ Object
Returns vector for positions specified.
    # File 'lib/daru/category.rb', line 207

    def at *positions
      original_positions = positions
      positions = coerce_positions(*positions)
      validate_positions(*positions)

      return category_from_position(positions) if positions.is_a? Integer

      Daru::Vector.new positions.map { |pos| category_from_position(pos) },
        index: @index.at(*original_positions),
        name: @name,
        type: :category,
        ordered: @ordered,
        categories: categories
    end
#categories ⇒ Array Also known as: order
Returns all the categories with the inherent order
    # File 'lib/daru/category.rb', line 308

    def categories
      @cat_hash.keys
    end
#categories=(cat_with_order) ⇒ Object
If extra categories are specified, they get added too.
Sets order of the categories.
    # File 'lib/daru/category.rb', line 322

    def categories= cat_with_order
      validate_categories(cat_with_order)
      add_extra_categories(cat_with_order - categories)
      order_with cat_with_order
    end
#contrast_code(opts = {}) ⇒ Daru::DataFrame
To set the coding scheme use #coding_scheme=
Contrast codes the vector according to the coding scheme set.
    # File 'lib/daru/category.rb', line 479

    def contrast_code opts={}
      if opts[:user_defined]
        user_defined_coding(opts[:user_defined])
      else
        # TODO: Make various coding schemes code DRY
        send("#{coding_scheme}_coding".to_sym, opts[:full] || false)
      end
    end
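As a rough illustration of what the default `:dummy` scheme means, the sketch below builds one 0/1 indicator column per category in plain Ruby. It does not call daru: `dummy_code` and its hash return value are hypothetical, and treating `full` as "keep the base category column" is an assumption about the option, not something this page confirms.

```ruby
# Plain-Ruby sketch of dummy (indicator) coding; not daru's implementation.
# `dummy_code` is a hypothetical helper. Assumption: `full: true` keeps the
# base category column, `full: false` drops it.
def dummy_code(data, categories, base, full: false)
  coded = full ? categories : categories - [base]
  coded.map { |cat| [cat, data.map { |v| v == cat ? 1 : 0 }] }.to_h
end

data = %i[a b c a]
p dummy_code(data, %i[a b c], :a)
# => {:b=>[0, 1, 0, 0], :c=>[0, 0, 1, 0]} (exact hash formatting depends on the Ruby version)
```

The real method returns a Daru::DataFrame rather than a hash, so column naming and types will differ.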
#count(category) ⇒ Object
Returns frequency of given category
    # File 'lib/daru/category.rb', line 135

    def count category
      raise ArgumentError, "Invalid category #{category}" unless
        categories.include?(category)

      @cat_hash[category].size
    end
#count_values(*values) ⇒ Integer
Count the number of values specified
    # File 'lib/daru/category.rb', line 695

    def count_values(*values)
      values.map { |v| @cat_hash[v].size if @cat_hash.include? v }
            .compact
            .inject(0, :+)
    end
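The listing above relies on `@cat_hash` mapping each category to the positions where it occurs. A minimal plain-Ruby sketch of the same counting logic, assuming that layout (the `build_cat_hash` helper is hypothetical and stands in for daru's internal setup):

```ruby
# Hypothetical stand-in for @cat_hash: category => positions where it occurs.
def build_cat_hash(data)
  data.each_with_index.with_object({}) do |(cat, pos), hash|
    (hash[cat] ||= []) << pos
  end
end

# Same logic as the listing above, with cat_hash passed in explicitly.
def count_values(cat_hash, *values)
  values.map { |v| cat_hash[v].size if cat_hash.include?(v) }
        .compact
        .inject(0, :+)
end

cat_hash = build_cat_hash(%i[a b a c a])
puts count_values(cat_hash, :a, :c) # prints 4
puts count_values(cat_hash, :z)     # prints 0 (unknown values are skipped)
```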
#describe ⇒ Daru::Vector
Gives a summary of the data using the following parameters:
- size: size of the data
- categories: total number of categories
- max_freq: maximum number of times a category occurs
- max_category: the category that occurs the maximum number of times
- min_freq: minimum number of times a category occurs
- min_category: the category that occurs the minimum number of times
    # File 'lib/daru/category.rb', line 618

    def describe
      Daru::Vector.new(
        size: size,
        categories: categories.size,
        max_freq: @cat_hash.values.map(&:size).max,
        max_category: @cat_hash.keys.max_by { |cat| @cat_hash[cat].size },
        min_freq: @cat_hash.values.map(&:size).min,
        min_category: @cat_hash.keys.min_by { |cat| @cat_hash[cat].size }
      )
    end
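The same summary can be sketched in plain Ruby over a category-to-positions hash. This is an illustration only: `describe_categories` is a hypothetical name, and it returns a Hash where the real method returns a Daru::Vector.

```ruby
# Plain-Ruby sketch of the summary above; cat_hash maps category => positions.
def describe_categories(cat_hash)
  sizes = cat_hash.values.map(&:size)
  {
    size: sizes.sum,
    categories: cat_hash.size,
    max_freq: sizes.max,
    max_category: cat_hash.keys.max_by { |cat| cat_hash[cat].size },
    min_freq: sizes.min,
    min_category: cat_hash.keys.min_by { |cat| cat_hash[cat].size }
  }
end

summary = describe_categories(a: [0, 2, 4], b: [1], c: [3])
puts summary[:size]         # prints 5
puts summary[:max_category] # prints a
```

When several categories tie for the minimum or maximum frequency, which one is reported depends on `min_by`/`max_by`.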
#dup ⇒ Daru::Vector
Duplicates a vector.
    # File 'lib/daru/category.rb', line 107

    def dup
      Daru::Vector.new to_a.dup,
        name: @name,
        index: @index.dup,
        type: :category,
        categories: categories,
        ordered: ordered?
    end
#each ⇒ Enumerator
Returns an enumerator that enumerates on categorical data
    # File 'lib/daru/category.rb', line 80

    def each
      return enum_for(:each) unless block_given?
      @array.each { |pos| yield cat_from_int pos }
      self
    end
#frequencies(type = :count) ⇒ Daru::Vector
Returns a vector storing count/frequency of each category
    # File 'lib/daru/category.rb', line 153

    def frequencies type=:count
      counts = @cat_hash.values.map(&:size)
      values =
        case type
        when :count
          counts
        when :fraction
          counts.map { |c| c / size.to_f }
        when :percentage
          counts.map { |c| c / size.to_f * 100 }
        end
      Daru::Vector.new values, index: categories, name: name
    end
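To make the three `type` options concrete, here is a plain-Ruby sketch of the same arithmetic. The `frequencies` function below is a hypothetical stand-alone version that returns a Hash instead of a Daru::Vector; the `else` branch raising ArgumentError is an addition of this sketch, not part of the listing above.

```ruby
# Plain-Ruby sketch; cat_hash maps category => positions, size is the data length.
def frequencies(cat_hash, size, type = :count)
  counts = cat_hash.values.map(&:size)
  values =
    case type
    when :count      then counts
    when :fraction   then counts.map { |c| c / size.to_f }
    when :percentage then counts.map { |c| c / size.to_f * 100 }
    else raise ArgumentError, "Unknown type #{type}" # added in this sketch
    end
  cat_hash.keys.zip(values).to_h
end

cat_hash = { a: [0, 2], b: [1], c: [3] }
p frequencies(cat_hash, 4).values              # prints [2, 1, 1]
p frequencies(cat_hash, 4, :fraction).values   # prints [0.5, 0.25, 0.25]
p frequencies(cat_hash, 4, :percentage).values # prints [50.0, 25.0, 25.0]
```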
#include_values?(*values) ⇒ true, false
Checks whether any of the specified values occurs in the vector.
    # File 'lib/daru/category.rb', line 665

    def include_values?(*values)
      values.any? { |v| @cat_hash.include?(v) && !@cat_hash[v].empty? }
    end
#indexes(*values) ⇒ Array
Return indexes of values specified
    # File 'lib/daru/category.rb', line 708

    def indexes(*values)
      values &= categories
      index.to_a.values_at(*values.flat_map { |v| @cat_hash[v] }.sort)
    end
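A plain-Ruby sketch of the lookup above, with the index modelled as an Array of labels and `cat_hash` as a category-to-positions hash (both are stand-ins for daru's internals, passed in explicitly here):

```ruby
# Values that are not known categories are dropped first; the remaining
# positions are sorted so labels come back in vector order.
def indexes(index, cat_hash, *values)
  values &= cat_hash.keys
  index.values_at(*values.flat_map { |v| cat_hash[v] }.sort)
end

index    = %w[p q r s t]
cat_hash = { a: [0, 2, 4], b: [1], c: [3] }
p indexes(index, cat_hash, :c, :b, :z) # prints ["q", "s"]
```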
#initialize_category(data, opts = {}) ⇒ Object
Base category is set to the first category encountered in the vector.
Initializes a vector to store categorical data.
    # File 'lib/daru/category.rb', line 29

    def initialize_category data, opts={}
      @type = :category
      initialize_core_attributes data

      if opts[:categories]
        validate_categories(opts[:categories])
        add_extra_categories(opts[:categories] - categories)
        order_with opts[:categories]
      end

      # Specify if the categories are ordered or not.
      # By default its unordered
      @ordered = opts[:ordered] || false

      # The coding scheme to code with. Default is dummy coding.
      @coding_scheme = :dummy

      # Base category which won't be present in the coding
      @base_category = @cat_hash.keys.first

      # Stores the name of the vector
      @name = opts[:name]

      # Index of the vector
      @index = coerce_index opts[:index]

      self
    end
#max ⇒ object
This operation only works if the vector is ordered. To make the vector ordered, use `vector.ordered = true`.
Returns the maximum category according to the order specified.
    # File 'lib/daru/category.rb', line 402

    def max
      assert_ordered :max
      categories.last
    end
#min ⇒ object
This operation only works if the vector is ordered. To make the vector ordered, use `vector.ordered = true`.
Returns the minimum category according to the order specified.
    # File 'lib/daru/category.rb', line 388

    def min
      assert_ordered :min
      categories.first
    end
#ordered=(bool) ⇒ Object
Make categorical data ordered or unordered.
    # File 'lib/daru/category.rb', line 296

    def ordered= bool
      @ordered = bool
    end
#ordered? ⇒ Boolean
Tells whether vector is ordered or not.
    # File 'lib/daru/category.rb', line 285

    def ordered?
      @ordered
    end
#plotting_library=(lib) ⇒ Object
    # File 'lib/daru/category.rb', line 63

    def plotting_library= lib
      case lib
      when :gruff, :nyaplot
        @plotting_library = lib
        extend Module.const_get(
          "Daru::Plotting::Category::#{lib.to_s.capitalize}Library"
        ) if Daru.send("has_#{lib}?".to_sym)
      else
        raise ArgumentError, "Plotting library #{lib} not supported. "\
          'Supported libraries are :nyaplot and :gruff'
      end
    end
#positions(*values) ⇒ Object
    # File 'lib/daru/category.rb', line 736

    def positions(*values)
      values &= categories
      values.flat_map { |v| @cat_hash[v] }.sort
    end
#reindex!(idx) ⇒ Daru::Vector
Unlike #reorder!, which takes positions as input, this method takes an index as input to reorder the vector.
Sets a new index for the vector. Preserves the index->value correspondence.
    # File 'lib/daru/category.rb', line 550

    def reindex! idx
      idx = Daru::Index.new idx unless idx.is_a? Daru::Index
      raise ArgumentError, 'Invalid index specified' unless
        idx.to_a.sort == index.to_a.sort

      old_categories = categories
      data = idx.map { |i| self[i] }
      initialize_core_attributes data
      self.categories = old_categories
      self.index = idx
      self
    end
#reject_values(*values) ⇒ Daru::Vector
Return a vector with specified values removed
    # File 'lib/daru/category.rb', line 678

    def reject_values(*values)
      resultant_pos = size.times.to_a - values.flat_map { |v| @cat_hash[v] }
      dv = at(*resultant_pos)
      unless dv.is_a? Daru::Vector
        pos = resultant_pos.first
        dv = at(pos..pos)
      end
      dv.remove_unused_categories
    end
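The core of the method above is a set difference on positions: all positions minus those held by the rejected categories. A plain-Ruby sketch of that step follows; `reject_values` here is a hypothetical stand-alone function that works on an Array, and it omits the single-element handling and the unused-category cleanup done by the real method.

```ruby
# Plain-Ruby sketch of the position arithmetic only.
def reject_values(data, *values)
  cat_hash = data.each_with_index.with_object({}) do |(cat, pos), hash|
    (hash[cat] ||= []) << pos
  end
  keep = data.size.times.to_a - values.flat_map { |v| cat_hash.fetch(v, []) }
  data.values_at(*keep)
end

p reject_values(%i[a b a c], :a)     # prints [:b, :c]
p reject_values(%i[a b a c], :b, :c) # prints [:a, :a]
```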
#remove_unused_categories ⇒ Daru::Vector
If the base category is removed, the first occurring category in the data is taken as the base category. The order of the remaining categories is preserved.
Removes the unused categories.
    # File 'lib/daru/category.rb', line 369

    def remove_unused_categories
      old_categories = categories

      initialize_core_attributes to_a
      self.categories = old_categories & categories
      self.base_category = @cat_hash.keys.first unless
        categories.include? base_category

      self
    end
#rename_categories(old_to_new) ⇒ Object
The order of the categories is preserved after renaming, but new categories are added at the end of the order. The base category is also reassigned to the new value if it is renamed.
Rename categories.
    # File 'lib/daru/category.rb', line 344

    def rename_categories old_to_new
      old_categories = categories
      data = to_a.map do |cat|
        old_to_new.include?(cat) ? old_to_new[cat] : cat
      end

      initialize_core_attributes data
      self.categories = (old_categories - old_to_new.keys) | old_to_new.values
      self.base_category = old_to_new[base_category] if
        old_to_new.include? base_category
      self
    end
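The resulting category order comes from the expression `(old_categories - old_to_new.keys) | old_to_new.values` in the listing. The plain-Ruby sketch below isolates that expression (`renamed_order` is a hypothetical helper name) to show where renamed categories end up.

```ruby
# Untouched categories keep their relative order; renamed ones are appended,
# and Array#| drops duplicates when a category is renamed to an existing one.
def renamed_order(old_categories, old_to_new)
  (old_categories - old_to_new.keys) | old_to_new.values
end

p renamed_order(%i[a b c], { b: :x }) # prints [:a, :c, :x]
p renamed_order(%i[a b c], { b: :c }) # prints [:a, :c]
```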
#reorder!(order) ⇒ Object
Unlike #reindex!, which takes an index as input, this method takes positions as input to reorder the vector.
Reorders the vector with the given positions.
    # File 'lib/daru/category.rb', line 528

    def reorder! order
      raise ArgumentError, 'Invalid order specified' unless
        order.sort == size.times.to_a

      # TODO: Room for optimization
      old_data = to_a
      new_data = order.map { |i| old_data[i] }
      initialize_core_attributes new_data
      self
    end
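The validation in the listing requires `order` to be a permutation of `0...size`. A plain-Ruby sketch of the same check and mapping on an Array (`reorder` is a hypothetical non-mutating variant, not the daru method):

```ruby
# order must contain every position exactly once, otherwise it is rejected.
def reorder(data, order)
  raise ArgumentError, 'Invalid order specified' unless
    order.sort == data.size.times.to_a

  order.map { |i| data[i] }
end

p reorder(%i[a b c], [2, 0, 1]) # prints [:c, :a, :b]

begin
  reorder(%i[a b c], [0, 0, 1])
rescue ArgumentError => e
  puts e.message # prints Invalid order specified
end
```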
#replace_values(old_values, new_value) ⇒ Daru::Vector
Performs the replacement in place.
Replaces specified values with a new value.
    # File 'lib/daru/category.rb', line 730

    def replace_values old_values, new_value
      old_values = [old_values] unless old_values.is_a? Array
      rename_hash = old_values.map { |v| [v, new_value] }.to_h
      rename_categories rename_hash
    end
#set_at(positions, val) ⇒ Object
Modifies values at specified positions.
    # File 'lib/daru/category.rb', line 263

    def set_at positions, val
      validate_positions(*positions)
      positions.map { |pos| modify_category_at pos, val }
      self
    end
#size ⇒ Object
Size of categorical data.
    # File 'lib/daru/category.rb', line 275

    def size
      @array.size
    end
#sort ⇒ Object
    # File 'lib/daru/category.rb', line 445

    def sort
      dup.sort!
    end
#sort! ⇒ Daru::Vector
This operation only works if the vector is ordered. To make the vector ordered, use `vector.ordered = true`.
Sorts the vector in the order specified.
    # File 'lib/daru/category.rb', line 422

    def sort! # rubocop:disable Metrics/AbcSize
      # TODO: Simplify the code
      assert_ordered :sort

      # Build sorted index
      old_index = @index.to_a
      new_index = @cat_hash.values.map do |positions|
        old_index.values_at(*positions)
      end.flatten
      @index = @index.class.new new_index

      # Build sorted data
      @cat_hash = categories.inject([{}, 0]) do |acc, cat|
        hash, count = acc
        cat_count = @cat_hash[cat].size
        cat_count.times { |i| @array[count+i] = int_from_cat(cat) }
        hash[cat] = (count...(cat_count+count)).to_a
        [hash, count + cat_count]
      end.first

      self
    end
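The listing sorts by walking the categories in their defined order and emitting each category's positions as a contiguous run. A plain-Ruby sketch of that idea on Arrays follows; `sort_by_category` is a hypothetical non-mutating helper and skips the ordered check.

```ruby
# Groups positions per category in the given order, then flattens them.
# Within one category the original relative order is kept.
def sort_by_category(data, index, order)
  cat_hash = order.map { |cat| [cat, []] }.to_h
  data.each_with_index { |cat, pos| cat_hash.fetch(cat) << pos }
  positions = cat_hash.values.flatten
  [data.values_at(*positions), index.values_at(*positions)]
end

sorted_data, sorted_index = sort_by_category(%i[b a c a], %w[w x y z], %i[a b c])
p sorted_data  # prints [:a, :a, :b, :c]
p sorted_index # prints ["x", "z", "w", "y"]
```

The index labels travel with their values, which is the same index->value correspondence the real method maintains.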
#to_a ⇒ Array
Returns all categorical data
    # File 'lib/daru/category.rb', line 92

    def to_a
      each.to_a
    end
#to_category ⇒ Daru::Vector
Does nothing since it's already of type category.
    # File 'lib/daru/category.rb', line 631

    def to_category
      self
    end
#to_ints ⇒ Array
Returns integer coding for categorical data in the order starting from 0. For example if order is [:a, :b, :c], then :a, will be coded as 0, :b as 1 and :c as 2
    # File 'lib/daru/category.rb', line 512

    def to_ints
      @array
    end
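The integer coding described above can be reproduced in plain Ruby by numbering the categories in their order and mapping the data through that table. In this sketch `to_ints` is a hypothetical stand-alone function taking the data and the category order explicitly.

```ruby
# order.each_with_index.to_h builds { category => integer code }, starting at 0.
def to_ints(data, order)
  code = order.each_with_index.to_h
  data.map { |cat| code.fetch(cat) }
end

p to_ints(%i[b a c a], %i[a b c]) # prints [1, 0, 2, 0]
```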
#to_non_category ⇒ Daru::Vector
Converts a category type vector to non category type vector
    # File 'lib/daru/category.rb', line 637

    def to_non_category
      Daru::Vector.new to_a, name: name, index: index
    end
#where(bool_array) ⇒ Daru::Vector
For querying the data
    # File 'lib/daru/category.rb', line 595

    def where bool_array
      Daru::Core::Query.vector_where self, bool_array
    end