Class: DataFrame

Inherits:
Object
Defined in:
lib/data_frame/model.rb,
lib/data_frame/data_frame.rb

Overview

A DataFrame holds tabular data with named columns and, optionally, named rows. It supports calculations (usually on the columns), and it can transpose the underlying matrix, caching the transposed result until the object is tainted.
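
The caching idea described above can be sketched in plain Ruby. The class below is a hypothetical stand-in, not the gem's TransposableArray, and it invalidates its cache explicitly when a row is added instead of relying on tainting.

```ruby
# Hypothetical stand-in: caches the transposed matrix until rows change.
class CachedTranspose
  def initialize(rows = [])
    @rows = rows
    @transposed = nil
  end

  # Adding a row invalidates the cached transposition.
  def <<(row)
    @rows << row
    @transposed = nil
    self
  end

  # Computes the transposition once and reuses it until invalidated.
  def transpose
    @transposed ||= @rows.transpose
  end
end

grid   = CachedTranspose.new([[1, 2], [3, 4]])
first  = grid.transpose # computed
second = grid.transpose # served from the cache
grid << [5, 6]
third  = grid.transpose # recomputed after the change
```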

Constructor Details

#initialize(*labels) ⇒ DataFrame

Returns a new instance of DataFrame.



# File 'lib/data_frame/data_frame.rb', line 67

def initialize(*labels)
  @labels = labels.map {|e| e.to_underscore_sym }
  @items = TransposableArray.new
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(sym, *args, &block) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 119

def method_missing(sym, *args, &block)
  if self.labels.include?(sym)
    render_column(sym)
  elsif self.row_labels.include?(sym)
    render_row(sym)
  elsif @items.respond_to?(sym)
    @items.send(sym, *args, &block)
  else
    super
  end
end
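
The lookup order above (column labels first, then row labels, then delegation to the underlying items) can be tried out with a small stand-in class. MiniFrame and its constructor are hypothetical names for illustration only. The respond_to_missing? override is an addition that the source shown here does not include.

```ruby
# Hypothetical stand-in for illustration; not the gem's DataFrame.
class MiniFrame
  attr_reader :labels, :row_labels, :items

  def initialize(labels, row_labels, items)
    @labels = labels
    @row_labels = row_labels
    @items = items
  end

  # Same lookup order as the documented method: column, row, then delegate.
  def method_missing(sym, *args, &block)
    if labels.include?(sym)
      items.transpose[labels.index(sym)]
    elsif row_labels.include?(sym)
      items[row_labels.index(sym)]
    elsif items.respond_to?(sym)
      items.send(sym, *args, &block)
    else
      super
    end
  end

  # Assumption: keeps respond_to? consistent with method_missing.
  def respond_to_missing?(sym, include_private = false)
    labels.include?(sym) || row_labels.include?(sym) ||
      items.respond_to?(sym, include_private) || super
  end
end

frame = MiniFrame.new([:x, :y], [:first_row, :second_row], [[1, 2], [3, 4]])
column = frame.x          # column lookup
row    = frame.second_row # row lookup
count  = frame.size       # delegated to the underlying Array
```

One consequence of this order is that a label shadows any Array method of the same name, so a column called :size or :first would hide the delegated method.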

Instance Attribute Details

#items ⇒ Object (readonly) Also known as: rows

The items stored in the frame



# File 'lib/data_frame/data_frame.rb', line 65

def items
  @items
end

#labels ⇒ Object (readonly) Also known as: variables

The labels of the data items



# File 'lib/data_frame/data_frame.rb', line 61

def labels
  @labels
end

Class Method Details

.from_csv(obj, opts = {}) ⇒ Object

This is the neatest part of this neat gem. DataFrame.from_csv can be called in a lot of ways:

  DataFrame.from_csv(csv_contents)
  DataFrame.from_csv(filename)
  DataFrame.from_csv(url)

If you need to define converters for FasterCSV, do it before calling this method:

  FasterCSV::Converters[:special] = lambda{|f| f == 'foo' ? 'bar' : 'foo'}
  DataFrame.from_csv('example.com/my_special_url.csv', :converters => :special)

This returns 'bar' where 'foo' was found and 'foo' everywhere else.



# File 'lib/data_frame/data_frame.rb', line 19

def from_csv(obj, opts={})
  labels, table = infer_csv_contents(obj, opts)
  return nil unless labels and table
  df = new(*labels)
  df.import(table)
  df
end
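
FasterCSV became Ruby's standard csv library in Ruby 1.9, so the converter registration described above can be tried without the gem. The sketch below assumes a Ruby with the csv library available and uses made-up sample data. It only shows how a named converter shapes the labels and rows, and it does not reproduce infer_csv_contents.

```ruby
require "csv"

# Register a named converter before parsing, mirroring the FasterCSV example.
CSV::Converters[:special] = lambda { |field| field == "foo" ? "bar" : "foo" }

csv_text = "name,value\nfoo,baz\nqux,foo\n" # made-up sample data

table  = CSV.parse(csv_text, headers: true, converters: :special)
labels = table.headers       # the header row is not passed through field converters
rows   = table.map(&:fields) # an array of arrays, the shape #import expects
```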

Instance Method Details

#add_item(item) ⇒ Object Also known as: add



# File 'lib/data_frame/data_frame.rb', line 72

def add_item(item)
  self.items << item
end

#append!(column_name, value = nil) ⇒ Object

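Adds a uniquely named column to the table. If value is an Array, its elements are assigned to the rows by index; otherwise every row receives the same value. Raises ArgumentError if the column name already exists.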

Raises:

  • (ArgumentError)


# File 'lib/data_frame/data_frame.rb', line 265

def append!(column_name, value=nil)
  raise ArgumentError, "Can't have duplicate column names" if self.labels.include?(column_name)
  self.labels << column_name.to_underscore_sym
  if value.is_a?(Array)
    self.items.each_with_index do |item, i|
      item << value[i]
    end
  else
    self.items.each do |item|
      item << value
    end
  end
  # Because we are tainting the sub arrays, the TaintableArray doesn't know it's been changed.
  self.items.taint
end
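
The two branches of append! (one value per row versus a single value for every row) can be illustrated on plain arrays. The helper append_column is hypothetical and skips the label normalization and tainting that the gem performs.

```ruby
# Hypothetical helper mirroring the two branches of append!.
def append_column(labels, items, column_name, value = nil)
  raise ArgumentError, "Can't have duplicate column names" if labels.include?(column_name)
  labels << column_name
  items.each_with_index do |item, i|
    # An Array supplies one value per row; anything else is repeated.
    item << (value.is_a?(Array) ? value[i] : value)
  end
  items
end

labels = [:name, :age]
items  = [["ann", 31], ["bob", 27]]

append_column(labels, items, :city, ["Oslo", "Rome"]) # one value per row
append_column(labels, items, :active, true)           # same value for every row
```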

#columns(reset = false) ⇒ Object Also known as: to_hash, to_dictionary

Returns the columns as a Dictionary (if that class is defined) or a Hash, keyed by label. The result is cached; call columns(true) to reset the cache.



# File 'lib/data_frame/data_frame.rb', line 97

def columns(reset=false)
  @columns = nil if reset
  return @columns if @columns
  
  container = defined?(Dictionary) ? Dictionary.new : Hash.new
  i = 0
  
  @columns = @items.transpose.inject(container) do |cont, col|
    cont[@labels[i]] = col
    i += 1
    cont
  end
end
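
A minimal sketch of the memoized label-to-column mapping follows, using a hypothetical ColumnCache class built on a plain Hash. In this stand-in the cache is not invalidated when rows are added, which is why the reset argument matters.

```ruby
# Hypothetical stand-in showing the memoized label => column mapping.
class ColumnCache
  def initialize(labels, items)
    @labels = labels
    @items = items
  end

  # Builds the mapping once; pass true to rebuild it.
  def columns(reset = false)
    @columns = nil if reset
    @columns ||= @labels.zip(@items.transpose).to_h
  end

  def add_item(row)
    @items << row
  end
end

cache = ColumnCache.new([:x, :y], [[1, 2], [3, 4]])
first = cache.columns       # built once and cached
cache.add_item([5, 6])
stale = cache.columns       # still the cached hash, without the new row
fresh = cache.columns(true) # rebuilt from the current items
```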

#drop!(*labels) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 131

def drop!(*labels)
  labels.each do |label|
    drop_one!(label)
  end
  self
end

#filter(as = Array, &block) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 193

def filter(as=Array, &block)
  new_data_frame = self.clone
  new_data_frame.filter!(as, &block)
end

#filter!(as = Array, &block) ⇒ Object

Takes a block to evaluate on each row and keeps only the rows for which the block returns a truthy value. The row can be converted into an OpenStruct or a Hash for easier filter methods. Note: don't try this with a Hash or OpenStruct unless you have Facets available.



# File 'lib/data_frame/data_frame.rb', line 182

def filter!(as=Array, &block)
  as = infer_class(as)
  items = []
  self.items.each do |row|
    value = block.call(cast_row(row, as))
    items << row if value
  end
  @items = items.dup
  self
end
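
The row casting described above can be sketched without the gem or Facets. The helper filter_rows is hypothetical: it builds a Hash keyed by label for each row, passes it to the block, and keeps the original rows for which the block returns a truthy value.

```ruby
# Hypothetical helper: cast each row to a Hash keyed by label, then keep
# the rows for which the block returns a truthy value.
def filter_rows(labels, items)
  items.select { |row| yield labels.zip(row).to_h }
end

labels = [:name, :age]
items  = [["ann", 31], ["bob", 27], ["cyd", 45]]

over_thirty = filter_rows(labels, items) { |row| row[:age] > 30 }
```

Unlike filter!, this helper returns a new array and leaves items untouched, which is closer to what the non-bang filter is meant to do.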

#filter_by_category(hash) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 281

def filter_by_category(hash)
  new_data_frame = self.dup
  hash.each do |key, value|
    key = key.to_underscore_sym
    next unless self.labels.include?(key)
    value = [value] unless value.is_a?(Array) or value.is_a?(Range)
    new_data_frame.filter!(:hash) {|row| value.include?(row[key])}
  end
  new_data_frame
end

#filter_by_category!(hash) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 292

def filter_by_category!(hash)
  hash.each do |key, value|
    key = key.to_underscore_sym
    next unless self.labels.include?(key)
    value = [value] unless value.is_a?(Array) or value.is_a?(Range)
    self.filter!(:hash) {|row| value.include?(row[key])}
  end
end
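
Both category filters accept a hash whose keys name columns and whose values are a single category, an Array of categories, or a Range; keys that are not column labels are skipped. A sketch of that matching logic on plain arrays, with a hypothetical helper and made-up data:

```ruby
# Hypothetical helper: each key names a column, each value is a single
# category, an Array of categories, or a Range. Unknown keys are skipped.
def filter_by_category(labels, items, criteria)
  criteria.inject(items) do |rows, (key, value)|
    index = labels.index(key)
    next rows unless index
    allowed = value.is_a?(Array) || value.is_a?(Range) ? value : [value]
    rows.select { |row| allowed.include?(row[index]) }
  end
end

labels = [:name, :age, :city]
items  = [["ann", 31, "Oslo"], ["bob", 27, "Rome"], ["cyd", 45, "Oslo"]]

oslo        = filter_by_category(labels, items, city: "Oslo")
oslo_adults = filter_by_category(labels, items, city: "Oslo", age: 40..60)
ignored     = filter_by_category(labels, items, country: "Norway")
```

Criteria are combined with AND, since each key narrows the rows left by the previous one.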

#import(rows) ⇒ Object

Loads a batch of rows. Expects an array of arrays; with anything else, the contents of the frame are unpredictable.



# File 'lib/data_frame/data_frame.rb', line 50

def import(rows)
  rows.each do |row|
    self.add_item(row)
  end
end

#inspect ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 56

def inspect
  "DataFrame rows: #{self.rows.size} labels: #{self.labels.inspect}"
end

#j_binary_ize!(*columns) ⇒ Object

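A weird name. For each given column, this creates a new boolean column for every category found in it (named column_category) and marks each row true or false according to whether its value belongs to that category.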



# File 'lib/data_frame/data_frame.rb', line 242

def j_binary_ize!(*columns)
  # Allows to mix a hash with the columns.
  options = columns.find_all {|e| e.is_a?(Hash)}.inject({}) {|h, e| h.merge!(e)}
  columns.delete_if {|e| e.is_a?(Hash)}
  
  # Generates new columns
  columns.each do |col|
    values = render_column(col.to_underscore_sym)
    values.categories.each do |category|
      full_name = (col.to_s + "_" + category.to_s).to_sym
      if options[:allow_overlap]
        category_map = values.inject([]) do |list, e|
          list << values.all_categories(e)
        end
        self.append!(full_name, category_map.map{|e| e.include?(category)})
      else
        self.append!(full_name, values.category_map.map{|e| e == category})
      end
    end
  end
end
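
The non-overlapping branch amounts to what is often called one-hot or dummy encoding. The sketch below assumes that every distinct value is its own category, which may differ from how the gem's categories method groups values. The helper binarize is hypothetical, and it returns new arrays instead of mutating in place.

```ruby
# Hypothetical helper: one new boolean column per distinct value of `column`.
def binarize(labels, items, column)
  index = labels.index(column)
  categories = items.map { |row| row[index] }.uniq
  new_labels = categories.map { |category| :"#{column}_#{category}" }
  new_items = items.map do |row|
    row + categories.map { |category| row[index] == category }
  end
  [labels + new_labels, new_items]
end

labels = [:name, :color]
items  = [["ann", "red"], ["bob", "blue"], ["cyd", "red"]]

new_labels, new_items = binarize(labels, items, :color)
```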

#model(name = nil, &block) ⇒ Object

Returns the named model if it is already defined. Otherwise, if a block is given, defines a model from the block and stores it in the models container, which gives us access like: df.models.new_model_name… Returns false if the model is not defined and no block is given.



# File 'lib/data_frame/model.rb', line 8

def model(name=nil, &block)
  return self.models[name] if self.models.table.keys.include?(name)
  return false unless block
  @pc = ParameterCapture.new(&block)
  model = self.filter(Hash) do |row|
    @pc.filter(row)
  end
  self.models.table[name] = model
end

#models ⇒ Object



# File 'lib/data_frame/model.rb', line 18

def models
  @models ||= OpenStruct.new
end

#render_column(sym) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 86

def render_column(sym)
  i = @labels.index(sym)
  return nil unless i
  @items.transpose[i]
end

#render_row(sym) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 113

def render_row(sym)
  i = self.row_labels.index(sym)
  return nil unless i
  @items[i]
end

#replace!(column, values = nil, &block) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 149

def replace!(column, values=nil, &block)
  column = validate_column(column)
  if not values
    values = self.send(column)
    values.map! {|e| block.call(e)}
  end
  replace_column(column, values)
  self
end
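
replace! either takes explicit values or maps the existing column through the block. A sketch of both paths on plain arrays follows. The helper replace_column here is hypothetical, and the ArgumentError for an unknown column is an assumption, since validate_column is not shown in this page.

```ruby
# Hypothetical helper: replace a column with explicit values, or with the
# result of calling the block on each existing value.
def replace_column(labels, items, column, values = nil)
  index = labels.index(column)
  # Assumption: an unknown column is rejected.
  raise ArgumentError, "Unknown column #{column.inspect}" unless index
  values ||= items.map { |row| yield row[index] }
  items.each_with_index { |row, i| row[index] = values[i] }
end

labels = [:name, :age]
items  = [["ann", 31], ["bob", 27]]

replace_column(labels, items, :age) { |age| age + 1 } # block form
replace_column(labels, items, :name, ["Ann", "Bob"])  # explicit values
```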

#row_labels ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 77

def row_labels
  @row_labels ||= []
end

#row_labels=(ary) ⇒ Object

Raises:

  • (ArgumentError)


# File 'lib/data_frame/data_frame.rb', line 81

def row_labels=(ary)
  raise ArgumentError, "Row labels must be an array" unless ary.is_a?(Array)
  @row_labels = ary
end

#subset_from_columns(*cols) ⇒ Object

Creates a new data frame, only with the specified columns.



# File 'lib/data_frame/data_frame.rb', line 227

def subset_from_columns(*cols)
  new_labels = self.labels.inject([]) do |list, label|
    list << label if cols.include?(label)
    list
  end
  new_data_frame = DataFrame.new(*self.labels)
  new_data_frame.import(self.items)
  self.labels.each do |label|
    new_data_frame.drop!(label) unless new_labels.include?(label)
  end
  new_data_frame
end
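
The same result can be sketched on plain arrays by selecting column indexes directly, instead of copying every column and then dropping the unwanted ones. The helper subset is hypothetical. As in the method above, the kept columns stay in their original order, whatever order they were requested in.

```ruby
# Hypothetical helper: keep the requested columns in their original order.
def subset(labels, items, *cols)
  keep = labels.each_index.select { |i| cols.include?(labels[i]) }
  [labels.values_at(*keep), items.map { |row| row.values_at(*keep) }]
end

labels = [:name, :age, :city]
items  = [["ann", 31, "Oslo"], ["bob", 27, "Rome"]]

sub_labels, sub_items = subset(labels, items, :city, :name)
```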