Class: RedAmber::DataFrame
- Inherits: Object
  - Object
  - RedAmber::DataFrame
- Includes:
- DataFrameCombinable, DataFrameDisplayable, DataFrameIndexable, DataFrameLoadSave, DataFrameReshaping, DataFrameSelectable, DataFrameVariableOperation, Helper
- Defined in:
- lib/red_amber/data_frame.rb
Overview
Represents a data frame. The instance variable @table holds an Arrow::Table object.
Instance Attribute Summary
-
#table ⇒ Arrow::Table
(also: #to_arrow)
readonly
Returns the Arrow::Table held by this DataFrame.
Class Method Summary
-
.create(table) ⇒ DataFrame
Quicker DataFrame constructor from an `Arrow::Table`.
-
.new_dataframe_with_schema(dataframe_for_schema, dataframe_for_value) ⇒ DataFrame
Return new DataFrame for specified schema and value.
Instance Method Summary
-
#==(other) ⇒ true, false
Compare DataFrames.
-
#build_subframes(subset_specifier = nil, &block) ⇒ Object
Generic builder of sub-dataframes from self.
-
#each_row ⇒ Object
Enumerate for each row.
-
#empty? ⇒ true, false
Checks whether the DataFrame is empty.
-
#group(*group_keys, &block) ⇒ Object
Create a Group object.
-
#initialize(*args) ⇒ DataFrame
constructor
Creates a new DataFrame.
-
#key?(key) ⇒ Boolean
(also: #has_key?)
Returns true if self has a specified key in the argument.
-
#key_index(key) ⇒ Integer
(also: #find_index, #index)
Returns index of specified key in the Array keys.
-
#keys ⇒ Array
(also: #column_names, #var_names)
Returns an Array of keys.
-
#method_missing(name, *args, &block) ⇒ Object
Catch variable (column) key as method name.
-
#n_keys ⇒ Integer
(also: #n_variables, #n_vars, #n_cols)
Returns the number of variables (columns).
-
#propagate(scalar = nil, &block) ⇒ Object
Returns a Vector in which every element has the value `scalar` and whose size equals that of self.
-
#respond_to_missing?(name, include_private) ⇒ Boolean
Catch variable (column) key as method name.
-
#schema ⇒ Hash
Returns column name and data type in a Hash.
-
#shape ⇒ Array
Returns the numbers of rows and columns.
-
#size ⇒ Integer
(also: #n_records, #n_obs, #n_rows)
Returns the number of records (rows).
-
#sub_by_enum(enumerator_method, *args) ⇒ SubFrames
(also: #subframes_by_enum)
Create SubFrames by grouping/windowing by position, using an enumerator method.
-
#sub_by_kernel(kernel, step: 1) ⇒ SubFrames
(also: #subframes_by_kernel)
Create SubFrames by windowing with a kernel (i.e. masked window) and step.
-
#sub_by_value(*keys) ⇒ SubFrames
(also: #subframes_by_value, #sub_group)
Create SubFrames by value grouping.
-
#sub_by_window(from: 0, size: nil, step: 1) ⇒ SubFrames
(also: #subframes_by_window)
Create SubFrames by windowing with `from`, `size` and `step`.
-
#to_a ⇒ Array
(also: #raw_records)
Returns a row-oriented array without header.
-
#to_h ⇒ Hash
Returns column-oriented data in a Hash.
-
#to_rover ⇒ Rover::DataFrame
Returns self as a `Rover::DataFrame`.
-
#type_classes ⇒ Array
Returns an Array of Classes of data type.
-
#types ⇒ Array
Returns abbreviated type names in an Array.
-
#variables ⇒ Hash
(also: #vars)
Returns a Hash of key and Vector pairs in the columns.
-
#vectors ⇒ Array
Returns Vectors in an Array.
Methods included from DataFrameVariableOperation
#assign, #assign_left, #drop, #pick, #rename
Methods included from DataFrameSelectable
#[], #filter, #first, #head, #last, #remove, #remove_nil, #sample, #shuffle, #slice, #slice_by, #tail, #take, #v
Methods included from DataFrameReshaping
#to_long, #to_wide, #transpose
Methods included from DataFrameLoadSave
Methods included from DataFrameIndexable
#indices, #sort, #sort_indices
Methods included from DataFrameDisplayable
#inspect, #shape_str, #summary, #tdr, #tdr_str, #tdra, #to_iruby, #to_s
Methods included from DataFrameCombinable
#anti_join, #concatenate, #difference, #full_join, #inner_join, #intersect, #join, #left_join, #merge, #right_join, #semi_join, #set_operable?, #union
Constructor Details
#initialize(hash) ⇒ DataFrame
#initialize(table) ⇒ DataFrame
#initialize(schema, row_oriented_array) ⇒ DataFrame
#initialize(arrowable) ⇒ DataFrame
#initialize(rover_like) ⇒ DataFrame
#initialize ⇒ DataFrame
#initialize(empty) ⇒ DataFrame
Creates a new DataFrame.
    # File 'lib/red_amber/data_frame.rb', line 134
    def initialize(*args)
      case args
      in nil | [nil] | [] | {} | [[]] | [{}]
        @table = Arrow::Table.new({}, [])
      in [Arrow::Table => table]
        @table = table
      in [arrowable] if arrowable.respond_to?(:to_arrow)
        table = arrowable.to_arrow
        unless table.is_a?(Arrow::Table)
          raise DataFrameTypeError,
                "to_arrow must return an Arrow::Table but #{table.class}: #{arrowable}"
        end
        @table = table
      in [rover_like] if rover_like.respond_to?(:to_h)
        begin
          # Accepts Rover::DataFrame
          @table = Arrow::Table.new(rover_like.to_h)
        rescue StandardError
          raise DataFrameTypeError, "to_h must return Arrowable object: #{rover_like}"
        end
      else
        begin
          @table = Arrow::Table.new(*args)
        rescue StandardError
          raise DataFrameTypeError, "invalid argument to create Arrow::Table: #{args}"
        end
      end

      name_unnamed_keys
      check_duplicate_keys(keys)
    end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args, &block) ⇒ Object
Catch variable (column) key as method name.
    # File 'lib/red_amber/data_frame.rb', line 775
    def method_missing(name, *args, &block)
      return variables[name] if args.empty? && key?(name)

      super
    end
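The dispatch above can be illustrated without Arrow. The following is a minimal sketch, not RedAmber code: `TinyFrame` is a hypothetical stand-in that keeps its columns in a plain Hash, and it only mirrors the `method_missing` / `respond_to_missing?` pairing shown on this page.

```ruby
# Hypothetical stand-in for a data frame; columns live in a plain Hash.
class TinyFrame
  attr_reader :variables

  def initialize(columns)
    @variables = columns.transform_keys(&:to_sym)
  end

  def key?(key)
    @variables.key?(key.to_sym)
  end

  # Return the column when the method name is a key and no arguments are given.
  def method_missing(name, *args, &block)
    return variables[name] if args.empty? && key?(name)

    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    return true if key?(name)

    super
  end
end

frame = TinyFrame.new({ x: [1, 2, 3], y: %w[a b c] })
x_column = frame.x                   # column looked up by method name
responds_to_y = frame.respond_to?(:y)
responds_to_z = frame.respond_to?(:z)
```

Names that are not keys fall through to `super`, so an unknown name such as `frame.z` still raises `NoMethodError`.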
Instance Attribute Details
#table ⇒ Arrow::Table (readonly) Also known as: to_arrow
Returns the Arrow::Table held by this DataFrame.

    # File 'lib/red_amber/data_frame.rb', line 171
    def table
      @table
    end
Class Method Details
.create(table) ⇒ DataFrame
Quicker DataFrame constructor from an `Arrow::Table`.
This method assigns `table` directly, without the key checks that #initialize performs, and is intended for use inside other methods.
`table` must have unique keys.

    # File 'lib/red_amber/data_frame.rb', line 31
    def create(table)
      instance = allocate
      instance.instance_variable_set(:@table, table)
      instance
    end
.new_dataframe_with_schema(dataframe_for_schema, dataframe_for_value) ⇒ DataFrame
Return new DataFrame for specified schema and value.
    # File 'lib/red_amber/data_frame.rb', line 47
    def new_dataframe_with_schema(dataframe_for_schema, dataframe_for_value)
      DataFrame.create(
        Arrow::Table.new(dataframe_for_schema.table.schema,
                         dataframe_for_value.table.columns)
      )
    end
Instance Method Details
#==(other) ⇒ true, false
Compare DataFrames.
    # File 'lib/red_amber/data_frame.rb', line 323
    def ==(other)
      other.is_a?(DataFrame) && @table == other.table
    end
#build_subframes(subset_specifier) ⇒ SubFrames
#build_subframes {|self| ... } ⇒ Object
Generic builder of sub-dataframes from self.
- Experimental feature: this method may be removed or changed in the future.

    # File 'lib/red_amber/data_frame.rb', line 693
    def build_subframes(subset_specifier = nil, &block)
      if block
        SubFrames.new(self, instance_eval(&block))
      else
        SubFrames.new(self, subset_specifier)
      end
    end
#each_row ⇒ Enumerator
#each_row {|key_row_pairs| ... } ⇒ Integer
Enumerate for each row.

    # File 'lib/red_amber/data_frame.rb', line 354
    def each_row
      return enum_for(:each_row) unless block_given?

      size.times do |i|
        key_row_pairs =
          vectors.each_with_object({}) do |v, h|
            h[v.key] = v.data[i]
          end
        yield key_row_pairs
      end
    end
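As an illustration of the row enumeration above, here is a minimal pure-Ruby sketch. `TinyRows` is a hypothetical class backed by a Hash of Arrays rather than Vectors; it only mirrors the shape of the method: an Enumerator when no block is given, otherwise one `{ key => value }` Hash per row.

```ruby
# Hypothetical stand-in: columns are plain Arrays keyed by Symbol.
class TinyRows
  def initialize(columns)
    @columns = columns
  end

  # Number of rows, taken from the first column (0 when there are no columns).
  def size
    @columns.values.first&.size || 0
  end

  # Yield one Hash per row; return an Enumerator when called without a block.
  def each_row
    return enum_for(:each_row) unless block_given?

    size.times do |i|
      yield @columns.transform_values { |values| values[i] }
    end
  end
end

frame = TinyRows.new({ x: [1, 2, 3], y: %w[a b c] })
rows = frame.each_row.to_a
block_result = frame.each_row { |row| row }
```

In this sketch the block form returns the result of `size.times`, which is the row count; that matches the `Integer` return type listed for the block overload.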
#empty? ⇒ true, false
Checks whether the DataFrame is empty.

    # File 'lib/red_amber/data_frame.rb', line 332
    def empty?
      variables.empty?
    end
#group(*group_keys) ⇒ Group
#group(*group_keys) {|group| ... } ⇒ DataFrame
Creates a Group object. When a block is given, creates a Group and summarizes it with the block.

    # File 'lib/red_amber/data_frame.rb', line 416
    def group(*group_keys, &block)
      g = Group.new(self, group_keys)
      g = g.summarize(&block) if block
      g
    end
#key?(key) ⇒ Boolean Also known as: has_key?
Returns true if self has a specified key in the argument.
    # File 'lib/red_amber/data_frame.rb', line 236
    def key?(key)
      keys.include?(key.to_sym)
    end
#key_index(key) ⇒ Integer Also known as: find_index, index
Returns index of specified key in the Array keys.
    # File 'lib/red_amber/data_frame.rb', line 248
    def key_index(key)
      keys.find_index(key.to_sym)
    end
#keys ⇒ Array Also known as: column_names, var_names
Returns an Array of keys.
    # File 'lib/red_amber/data_frame.rb', line 223
    def keys
      @keys ||= init_instance_vars(:keys)
    end
#n_keys ⇒ Integer Also known as: n_variables, n_vars, n_cols
Returns the number of variables (columns).
    # File 'lib/red_amber/data_frame.rb', line 191
    def n_keys
      @table.n_columns
    end
#propagate(scalar) ⇒ Vector
#propagate {|self| ... } ⇒ Vector
Returns a Vector in which every element has the value `scalar` and whose size equals that of self.

    # File 'lib/red_amber/data_frame.rb', line 765
    def propagate(scalar = nil, &block)
      if block
        raise VectorArgumentError, "can't specify both function and block" if scalar

        scalar = instance_eval(&block)
      end
      Vector.new([scalar] * size)
    end
#respond_to_missing?(name, include_private) ⇒ Boolean
Catch variable (column) key as method name.
    # File 'lib/red_amber/data_frame.rb', line 782
    def respond_to_missing?(name, include_private)
      return true if key?(name)

      super
    end
#schema ⇒ Hash
Returns column name and data type in a Hash.
    # File 'lib/red_amber/data_frame.rb', line 313
    def schema
      keys.zip(types).to_h
    end
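The `keys.zip(types).to_h` idiom pairs each column name with its abbreviated type. A small sketch with plain Arrays follows; the key and type names are illustrative assumptions, not output taken from a real DataFrame.

```ruby
# Illustrative values only; a real DataFrame derives these from its Arrow::Table.
keys = %i[x y]
types = %i[uint8 string]

# Pair each key with the type at the same position.
schema = keys.zip(types).to_h
```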
#shape ⇒ Array
Returns the numbers of rows and columns.
    # File 'lib/red_amber/data_frame.rb', line 204
    def shape
      [size, n_keys]
    end
#size ⇒ Integer Also known as: n_records, n_obs, n_rows
Returns the number of records (rows).
    # File 'lib/red_amber/data_frame.rb', line 179
    def size
      @table.n_rows
    end
#sub_by_enum(enumerator_method, *args) ⇒ SubFrames Also known as: subframes_by_enum
Create SubFrames by grouping/windowing by position, using an enumerator method.
This method processes the indices of self with the given enumerator method.
- Experimental feature: this method may be removed or changed in the future.

    # File 'lib/red_amber/data_frame.rb', line 575
    def sub_by_enum(enumerator_method, *args)
      SubFrames.new(self, indices.send(enumerator_method, *args).to_a)
    end
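To see what `indices.send(enumerator_method, *args).to_a` produces, the sketch below applies the same call to a plain Array of row indices. The Array is an assumption standing in for the object returned by `indices`; `each_slice` and `each_cons` are ordinary Enumerable methods chosen for illustration.

```ruby
# Stand-in for the row indices of a six-row frame.
indices = (0...6).to_a

# Non-overlapping groups of three rows.
by_slice = indices.send(:each_slice, 3).to_a

# Overlapping windows of four consecutive rows.
by_cons = indices.send(:each_cons, 4).to_a
```

Each inner Array lists the row positions that would make up one sub-dataframe.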
#sub_by_kernel(kernel, step: 1) ⇒ SubFrames Also known as: subframes_by_kernel
Create SubFrames by windowing with a kernel (i.e. masked window) and step.
- Experimental feature: this method may be removed or changed in the future.

    # File 'lib/red_amber/data_frame.rb', line 613
    def sub_by_kernel(kernel, step: 1)
      limit_size = size - kernel.size
      kernel_vector = Vector.new(kernel.concat([nil] * limit_size))
      SubFrames.new(self) do
        0.step(by: step, to: limit_size).map do |i|
          kernel_vector.shift(i)
        end
      end
    end
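The kernel idea can be sketched in pure Ruby by computing which row positions each shifted mask selects. This assumes that shifting the padded kernel by `i` moves its `true` entries `i` positions toward the tail; `kernel_windows` is a hypothetical helper, not part of RedAmber.

```ruby
# Hypothetical helper: row positions selected by a boolean kernel as it slides
# over n_rows rows, advancing by `step` each time.
def kernel_windows(n_rows, kernel, step: 1)
  limit = n_rows - kernel.size
  # Positions inside the kernel that are switched on.
  active = kernel.each_index.select { |k| kernel[k] }
  0.step(by: step, to: limit).map do |offset|
    active.map { |k| k + offset }
  end
end

windows = kernel_windows(5, [true, false, true])
strided = kernel_windows(5, [true, false, true], step: 2)
```

With the kernel `[true, false, true]`, each window keeps the first and third row of a three-row span and skips the middle one.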
#sub_by_value(*keys) ⇒ SubFrames Also known as: subframes_by_value, sub_group
Create SubFrames by value grouping.
- Experimental feature: this method may be removed or changed in the future.

    # File 'lib/red_amber/data_frame.rb', line 457
    def sub_by_value(*keys)
      SubFrames.new(self, group(keys.flatten).filters)
    end
#sub_by_window(from: 0, size: nil, step: 1) ⇒ SubFrames Also known as: subframes_by_window
Create SubFrames by windowing with `from`, `size` and `step`.
- Experimental feature: this method may be removed or changed in the future.

    # File 'lib/red_amber/data_frame.rb', line 500
    def sub_by_window(from: 0, size: nil, step: 1)
      SubFrames.new(self) do
        from.step(by: step, to: (size() - size)).map do |i| # rubocop:disable Style/MethodCallWithoutArgsParentheses
          [*i...(i + size)]
        end
      end
    end
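The index arithmetic of the window can be tried on its own. In this sketch `window_indices` is a hypothetical helper that takes the row count explicitly and requires `size`, since the listing above does not show how a `nil` size is handled.

```ruby
# Hypothetical helper: start positions run from `from` to n_rows - size in
# increments of `step`; each window covers `size` consecutive rows.
def window_indices(n_rows, size:, from: 0, step: 1)
  from.step(by: step, to: n_rows - size).map do |i|
    [*i...(i + size)]
  end
end

windows = window_indices(6, size: 3, step: 2)
shifted = window_indices(6, size: 3, from: 1)
```

A window is only emitted when it fits entirely inside the rows, so the last start position is `n_rows - size`.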
#to_a ⇒ Array Also known as: raw_records
Returns a row-oriented array without header.
If you need a column-oriented array, use `.to_h.to_a`.

    # File 'lib/red_amber/data_frame.rb', line 299
    def to_a
      @table.raw_records
    end
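The difference between the two orientations can be shown with a plain Hash of columns. This is only an illustration of the shapes involved; the Hash is an assumed stand-in for what `to_h` returns.

```ruby
# Assumed stand-in for the column-oriented Hash returned by to_h.
columns = { x: [1, 2, 3], y: %w[a b c] }

# Row-oriented, no header: one inner Array per record.
row_oriented = columns.values.transpose

# Column-oriented: one [key, values] pair per column, as with .to_h.to_a.
column_oriented = columns.to_a
```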
#to_h ⇒ Hash
Returns column-oriented data in a Hash.
    # File 'lib/red_amber/data_frame.rb', line 288
    def to_h
      variables.transform_values(&:to_a)
    end
#to_rover ⇒ Rover::DataFrame
Returns self as a `Rover::DataFrame`.

    # File 'lib/red_amber/data_frame.rb', line 371
    def to_rover
      require 'rover'
      Rover::DataFrame.new(to_h)
    end
#type_classes ⇒ Array
Returns an Array of Classes of data type.
    # File 'lib/red_amber/data_frame.rb', line 270
    def type_classes
      @type_classes ||= @table.columns.map { |column| column.data_type.class }
    end
#types ⇒ Array
Returns abbreviated type names in an Array.
    # File 'lib/red_amber/data_frame.rb', line 259
    def types
      @types ||= @table.columns.map do |column|
        column.data.value_type.nick.to_sym
      end
    end
#variables ⇒ Hash Also known as: vars
Returns a Hash of key and Vector pairs in the columns.
    # File 'lib/red_amber/data_frame.rb', line 213
    def variables
      @variables ||= init_instance_vars(:variables)
    end
#vectors ⇒ Array
Returns Vectors in an Array.
    # File 'lib/red_amber/data_frame.rb', line 279
    def vectors
      @vectors ||= init_instance_vars(:vectors)
    end