Class: DataFrame
- Inherits: Object
- Defined in: lib/data_frame/model.rb, lib/data_frame/data_frame.rb
Overview
A DataFrame holds data with named columns and, optionally, named rows. It supports calculations (usually on the columns) and can transpose the underlying matrix, storing the transposed matrix until the object is tainted (that is, marked as changed).
Instance Attribute Summary
- #items ⇒ Object (readonly) (also: #rows)
  The items stored in the frame.
- #labels ⇒ Object (readonly) (also: #variables)
  The labels of the data items.
Class Method Summary
- .from_csv(obj, opts = {}) ⇒ Object
  Builds a DataFrame from CSV contents, a filename, or a URL.
Instance Method Summary
- #add_item(item) ⇒ Object (also: #add)
- #append!(column_name, value = nil) ⇒ Object
  Adds a unique column to the table.
- #columns(reset = false) ⇒ Object (also: #to_hash, #to_dictionary)
  The columns as a Dictionary or Hash. The result is cached; call columns(true) to reset the cache.
- #drop!(*labels) ⇒ Object
- #filter(as = Array, &block) ⇒ Object
- #filter!(as = Array, &block) ⇒ Object
  Takes a block to evaluate on each row.
- #filter_by_category(hash) ⇒ Object
- #filter_by_category!(hash) ⇒ Object
- #import(rows) ⇒ Object
  Loads a batch of rows.
- #initialize(*labels) ⇒ DataFrame (constructor)
  A new instance of DataFrame.
- #inspect ⇒ Object
- #j_binary_ize!(*columns) ⇒ Object
  Creates a column for every category in a column (despite the odd name).
- #method_missing(sym, *args, &block) ⇒ Object
- #model(name = nil, &block) ⇒ Object
  Returns a model if defined; otherwise defines one from the block, if given, and stores it in the models container, which gives access like: df.models.new_model_name…
- #models ⇒ Object
- #render_column(sym) ⇒ Object
- #render_row(sym) ⇒ Object
- #replace!(column, values = nil, &block) ⇒ Object
- #row_labels ⇒ Object
- #row_labels=(ary) ⇒ Object
- #subset_from_columns(*cols) ⇒ Object
  Creates a new data frame, only with the specified columns.
Constructor Details
#initialize(*labels) ⇒ DataFrame
Returns a new instance of DataFrame.
    # File 'lib/data_frame/data_frame.rb', line 67
    def initialize(*labels)
      @labels = labels.map {|e| e.to_underscore_sym }
      @items = TransposableArray.new
    end
Dynamic Method Handling
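This class handles dynamic methods through the method_missing method. A call that matches a column label returns that column, a call that matches a row label returns that row, and any other method that @items responds to is forwarded to @items.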
#method_missing(sym, *args, &block) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 119
    def method_missing(sym, *args, &block)
      if self.labels.include?(sym)
        render_column(sym)
      elsif self.row_labels.include?(sym)
        render_row(sym)
      elsif @items.respond_to?(sym)
        @items.send(sym, *args, &block)
      else
        super
      end
    end
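The dispatch order above can be tried out without the gem. The following is a minimal sketch, assuming a hypothetical MiniFrame class that stores plain arrays; it is not the gem's DataFrame, and the respond_to_missing? hook is an addition that the original listing does not show.

```ruby
# MiniFrame is a hypothetical stand-in, not the gem's DataFrame class.
class MiniFrame
  attr_reader :labels, :items, :row_labels

  def initialize(labels, items, row_labels = [])
    @labels = labels
    @items = items
    @row_labels = row_labels
  end

  # Dispatch order: column label, then row label, then the items array.
  def method_missing(sym, *args, &block)
    if labels.include?(sym)
      items.transpose[labels.index(sym)]
    elsif row_labels.include?(sym)
      items[row_labels.index(sym)]
    elsif items.respond_to?(sym)
      items.send(sym, *args, &block)
    else
      super
    end
  end

  # Keeps respond_to? consistent with method_missing (not in the original).
  def respond_to_missing?(sym, include_private = false)
    labels.include?(sym) || row_labels.include?(sym) || items.respond_to?(sym) || super
  end
end

frame = MiniFrame.new([:age, :height], [[30, 170], [41, 182]], [:ann, :bob])
age_column = frame.age   # column lookup by label
bob_row = frame.bob      # row lookup by row label
row_count = frame.size   # forwarded to the items array
```

Because column labels are checked first, a column named after an Array method (for example :size) would shadow that method in this sketch.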
Instance Attribute Details
#items ⇒ Object (readonly) Also known as: rows
The items stored in the frame.
    # File 'lib/data_frame/data_frame.rb', line 65
    def items
      @items
    end
#labels ⇒ Object (readonly) Also known as: variables
The labels of the data items.
    # File 'lib/data_frame/data_frame.rb', line 61
    def labels
      @labels
    end
Class Method Details
.from_csv(obj, opts = {}) ⇒ Object
This is the neatest part of this neat gem. DataFrame.from_csv can be called in a lot of ways:
    DataFrame.from_csv(csv_contents)
    DataFrame.from_csv(filename)
    DataFrame.from_csv(url)
If you need to define converters for FasterCSV, do it before calling this method:
    FasterCSV::Converters[:special] = lambda{|f| f == 'foo' ? 'bar' : 'foo'}
    DataFrame.from_csv('example.com/my_special_url.csv', :converters => :special)
This returns 'bar' where 'foo' was found and 'foo' everywhere else. The method returns nil if no labels and table can be inferred from the input.
    # File 'lib/data_frame/data_frame.rb', line 19
    def from_csv(obj, opts={})
      labels, table = infer_csv_contents(obj, opts)
      return nil unless labels and table
      df = new(*labels)
      df.import(table)
      df
    end
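The labels/table split and the converter behaviour can be illustrated with Ruby's standard CSV library, the successor of FasterCSV. This is only a sketch under assumptions: the sample data is made up, plain to_sym stands in for the gem's to_underscore_sym, the converter lambda is passed directly rather than registered under a name, and the gem's private infer_csv_contents helper is not reproduced.

```ruby
require 'csv'

# Hypothetical input; the gem also accepts a filename or a URL.
csv_contents = "name,colour\nann,foo\nbob,red\n"

# The first row becomes the labels, the remaining rows become the table.
labels, *table = CSV.parse(csv_contents)
labels = labels.map(&:to_sym)

# A field converter like the one in the description, applied to every field.
special = lambda { |field| field == 'foo' ? 'bar' : 'foo' }
converted = CSV.parse("foo,red\n", converters: [special])
# converted is [["bar", "foo"]]: 'foo' became 'bar', everything else became 'foo'
```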
Instance Method Details
#add_item(item) ⇒ Object Also known as: add
    # File 'lib/data_frame/data_frame.rb', line 72
    def add_item(item)
      self.items << item
    end
#append!(column_name, value = nil) ⇒ Object
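Adds a unique column to the table. Raises ArgumentError if the column name is already in use. If value is an Array, its elements are assigned row by row; otherwise the same value is appended to every row.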
    # File 'lib/data_frame/data_frame.rb', line 265
    def append!(column_name, value=nil)
      raise ArgumentError, "Can't have duplicate column names" if self.labels.include?(column_name)
      self.labels << column_name.to_underscore_sym
      if value.is_a?(Array)
        self.items.each_with_index do |item, i|
          item << value[i]
        end
      else
        self.items.each do |item|
          item << value
        end
      end
      # Because we are tainting the sub arrays, the TaintableArray doesn't know it's been changed.
      self.items.taint
    end
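The two branches of append! (one value per row versus one value for all rows) can be sketched on plain arrays. The append_column helper and the sample data below are hypothetical; the sketch leaves out the gem's label normalisation and tainting.

```ruby
# Plain arrays stand in for the frame's labels and items.
labels = [:name, :age]
items = [['ann', 30], ['bob', 41]]

# Hypothetical helper mirroring the two branches of append!.
def append_column(labels, items, name, value = nil)
  raise ArgumentError, "Can't have duplicate column names" if labels.include?(name)
  labels << name
  items.each_with_index do |row, i|
    # An Array supplies one value per row; anything else is repeated.
    row << (value.is_a?(Array) ? value[i] : value)
  end
  items
end

append_column(labels, items, :city, %w[Oslo Rome])  # per-row values
append_column(labels, items, :active, true)         # same value for every row
```

If the Array is shorter than the number of rows, the missing positions are filled with nil, since value[i] returns nil past the end of the array.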
#columns(reset = false) ⇒ Object Also known as: to_hash, to_dictionary
The columns as a Dictionary or Hash, keyed by label. The result is cached; call columns(true) to reset the cache.
    # File 'lib/data_frame/data_frame.rb', line 97
    def columns(reset=false)
      @columns = nil if reset
      return @columns if @columns

      container = defined?(Dictionary) ? Dictionary.new : Hash.new
      i = 0
      @columns = @items.transpose.inject(container) do |cont, col|
        cont[@labels[i]] = col
        i += 1
        cont
      end
    end
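Why the reset flag matters is easiest to see with a small memoising object. The ColumnCache class below is a hypothetical sketch that always uses a Hash (the gem prefers Dictionary when it is defined) and builds the result with zip rather than inject.

```ruby
# ColumnCache is a hypothetical stand-in for the caching behaviour of #columns.
class ColumnCache
  def initialize(labels, items)
    @labels = labels
    @items = items
  end

  # Builds the label => column mapping once and reuses it until reset.
  def columns(reset = false)
    @columns = nil if reset
    @columns ||= @labels.zip(@items.transpose).to_h
  end
end

items = [['ann', 30], ['bob', 41]]
cache = ColumnCache.new([:name, :age], items)

first = cache.columns        # computed and cached
items << ['cy', 25]          # the rows change behind the cache's back
stale = cache.columns        # still the cached two-row result
fresh = cache.columns(true)  # recomputed, now includes the third row
```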
#drop!(*labels) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 131
    def drop!(*labels)
      labels.each do |label|
        drop_one!(label)
      end
      self
    end
#filter(as = Array, &block) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 193
    def filter(as=Array, &block)
      new_data_frame = self.clone
      new_data_frame.filter!(as, &block)
    end
#filter!(as = Array, &block) ⇒ Object
Takes a block and evaluates it on each row, keeping only the rows for which the block returns a truthy value. Each row can be passed to the block as an OpenStruct or a Hash, which makes filters easier to write. Note: don't try this with a Hash or OpenStruct unless you have facets available.
    # File 'lib/data_frame/data_frame.rb', line 182
    def filter!(as=Array, &block)
      as = infer_class(as)
      items = []
      self.items.each do |row|
        value = block.call(cast_row(row, as))
        items << row if value
      end
      @items = items.dup
      self
    end
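The idea of casting each row before handing it to the block can be sketched without the gem. The filter_rows helper below is hypothetical and only covers the Hash case; the gem's private infer_class and cast_row helpers are not reproduced.

```ruby
labels = [:name, :age]
items = [['ann', 30], ['bob', 41], ['cy', 25]]

# Hypothetical helper: casts each row to a Hash keyed by label and keeps
# the rows for which the block returns a truthy value.
def filter_rows(labels, items)
  items.select { |row| yield labels.zip(row).to_h }
end

over_28 = filter_rows(labels, items) { |row| row[:age] > 28 }
```

Unlike filter!, this sketch returns a new array and leaves items untouched, which is closer to the non-bang filter.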
#filter_by_category(hash) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 281
    def filter_by_category(hash)
      new_data_frame = self.dup
      hash.each do |key, value|
        key = key.to_underscore_sym
        next unless self.labels.include?(key)
        value = [value] unless value.is_a?(Array) or value.is_a?(Range)
        new_data_frame.filter!(:hash) {|row| value.include?(row[key])}
      end
      new_data_frame
    end
#filter_by_category!(hash) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 292
    def filter_by_category!(hash)
      hash.each do |key, value|
        key = key.to_underscore_sym
        next unless self.labels.include?(key)
        value = [value] unless value.is_a?(Array) or value.is_a?(Range)
        self.filter!(:hash) {|row| value.include?(row[key])}
      end
    end
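Both category filters accept a single value, an Array, or a Range per label, and skip labels the frame does not know. A minimal sketch of that logic on plain arrays follows; the select_by_category helper and the sample data are hypothetical.

```ruby
labels = [:name, :age, :city]
items = [['ann', 30, 'Oslo'], ['bob', 41, 'Rome'], ['cy', 25, 'Oslo']]

# Hypothetical helper: applies one filter per label => value pair.
def select_by_category(labels, items, criteria)
  criteria.reduce(items) do |rows, (key, value)|
    index = labels.index(key)
    next rows unless index  # unknown labels are skipped
    # Single values are wrapped so that include? works for all three forms.
    allowed = value.is_a?(Array) || value.is_a?(Range) ? value : [value]
    rows.select { |row| allowed.include?(row[index]) }
  end
end

oslo_adults = select_by_category(labels, items, { city: 'Oslo', age: (28..50) })
unchanged = select_by_category(labels, items, { zip: 1234 })
```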
#import(rows) ⇒ Object
Loads a batch of rows. Expects an array of arrays; the input is not validated, so anything else gives unpredictable results.
    # File 'lib/data_frame/data_frame.rb', line 50
    def import(rows)
      rows.each do |row|
        self.add_item(row)
      end
    end
#inspect ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 56
    def inspect
      "DataFrame rows: #{self.rows.size} labels: #{self.labels.inspect}"
    end
#j_binary_ize!(*columns) ⇒ Object
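A weird name. For every category found in a column, this creates a new column named after the original column and the category, and marks each row true or false according to whether its value matches that category. A Hash of options can be mixed in with the column names.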
    # File 'lib/data_frame/data_frame.rb', line 242
    def j_binary_ize!(*columns)
      # Allows to mix a hash with the columns.
      options = columns.find_all {|e| e.is_a?(Hash)}.inject({}) {|h, e| h.merge!(e)}
      columns.delete_if {|e| e.is_a?(Hash)}

      # Generates new columns
      columns.each do |col|
        values = render_column(col.to_underscore_sym)
        values.categories.each do |category|
          full_name = (col.to_s + "_" + category.to_s).to_sym
          if options[:allow_overlap]
            category_map = values.inject([]) do |list, e|
              list << values.all_categories(e)
            end
            self.append!(full_name, category_map.map{|e| e.include?(category)})
          else
            self.append!(full_name, values.category_map.map{|e| e == category})
          end
        end
      end
    end
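The non-overlapping branch amounts to adding one boolean column per category. The sketch below shows that on plain arrays; it assumes that each distinct value is its own category and that categories appear in first-seen order, neither of which is confirmed by the listing (the gem relies on categories and category_map methods that are defined elsewhere).

```ruby
labels = [:name, :colour]
items = [['ann', 'red'], ['bob', 'blue'], ['cy', 'red']]

# Extract the column to expand.
column = items.map { |row| row[labels.index(:colour)] }

# Assumption: every distinct value is a category, in first-seen order.
column.uniq.each do |category|
  labels << :"colour_#{category}"
  items.each_with_index { |row, i| row << (column[i] == category) }
end
```

After this runs, labels holds :colour_red and :colour_blue in addition to the original labels, and each row carries one true/false flag per category.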
#model(name = nil, &block) ⇒ Object
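Returns the model if one is already defined under the given name. Otherwise, defines a model with the block, if one is given, and stores it in the models container, which gives us access like: df.models.new_model_name… Returns false if the model is not defined and no block is given.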
    # File 'lib/data_frame/model.rb', line 8
    def model(name=nil, &block)
      return self.models[name] if self.models.table.keys.include?(name)
      return false unless block
      @pc = ParameterCapture.new(&block)
      model = self.filter(Hash) do |row|
        @pc.filter(row)
      end
      self.models.table[name] = model
    end
#models ⇒ Object
    # File 'lib/data_frame/model.rb', line 18
    def models
      @models ||= OpenStruct.new
    end
#render_column(sym) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 86
    def render_column(sym)
      i = @labels.index(sym)
      return nil unless i
      @items.transpose[i]
    end
#render_row(sym) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 113
    def render_row(sym)
      i = self.row_labels.index(sym)
      return nil unless i
      @items[i]
    end
#replace!(column, values = nil, &block) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 149
    def replace!(column, values=nil, &block)
      column = validate_column(column)
      if not values
        values = self.send(column)
        values.map! {|e| block.call(e)}
      end
      replace_column(column, values)
      self
    end
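As the listing shows, replace! either takes the new values directly or derives them by running the block over the existing column. A rough sketch of both paths on plain arrays follows; the replace_values helper is hypothetical and does not reproduce the gem's private validate_column and replace_column helpers.

```ruby
labels = [:name, :age]
items = [['ann', 30], ['bob', 41]]

# Hypothetical helper: uses explicit values when given, otherwise maps the
# existing column through the block.
def replace_values(labels, items, column, values = nil)
  index = labels.index(column)
  raise ArgumentError, "Unknown column #{column}" unless index
  values ||= items.map { |row| yield row[index] }
  items.each_with_index { |row, i| row[index] = values[i] }
  items
end

replace_values(labels, items, :age) { |age| age + 1 }  # block form
replace_values(labels, items, :name, %w[Ann Bob])      # explicit values
```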
#row_labels ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 77
    def row_labels
      @row_labels ||= []
    end
#row_labels=(ary) ⇒ Object
    # File 'lib/data_frame/data_frame.rb', line 81
    def row_labels=(ary)
      raise ArgumentError, "Row labels must be an array" unless ary.is_a?(Array)
      @row_labels = ary
    end
#subset_from_columns(*cols) ⇒ Object
Creates a new data frame, only with the specified columns.
    # File 'lib/data_frame/data_frame.rb', line 227
    def subset_from_columns(*cols)
      new_labels = self.labels.inject([]) do |list, label|
        list << label if cols.include?(label)
        list
      end
      new_data_frame = DataFrame.new(*self.labels)
      new_data_frame.import(self.items)
      self.labels.each do |label|
        new_data_frame.drop!(label) unless new_labels.include?(label)
      end
      new_data_frame
    end
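The listing keeps the requested columns in the frame's original label order, not in the order they were requested. A minimal sketch of the same result on plain arrays is shown below; the subset helper is hypothetical and selects columns by index instead of copying the frame and dropping the rest.

```ruby
labels = [:name, :age, :city]
items = [['ann', 30, 'Oslo'], ['bob', 41, 'Rome']]

# Hypothetical helper: returns new labels and rows limited to the given columns.
def subset(labels, items, *cols)
  keep = labels.each_index.select { |i| cols.include?(labels[i]) }
  [labels.values_at(*keep), items.map { |row| row.values_at(*keep) }]
end

sub_labels, sub_items = subset(labels, items, :city, :name)
# sub_labels is [:name, :city]: the original column order is preserved
```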