Class: Bucket

Inherits:
Object
Defined in:
lib/etl/bucket.rb

Overview

Sometimes I have data coming from several sources, and I want to combine them and release a single consolidated record. Bucket is meant to work like that. A weird example:

  >> my_hash = {:surprise => 'me'}
  => {:surprise=>"me"}
  >> b = Bucket.new(my_hash) {|h| h.inject({}) {|hsh, e| hsh[e.first] = e.last % 3; hsh}}
  => #<Bucket:0x232d230 @raw_data={:surprise=>"me"}, @filter_block=#<Proc:0x0232d26c@(irb):2>>
  >> b.add :this => 1
  => {:this=>1}
  >> b.add OpenStruct.new(:this => 6)
  => {:this=>6}
  >> b.raw_data
  => {:this=>6}
  >> b.filtered_data
  => {:this=>0}
  >> b.dump
  => {:this=>0}
  >> b.raw_data
  => {}

A more practical use that I have for this is screen scraping: when I'm getting the source for some concept, I may ask the same site for information at different times, or ask complementary sites for overlaying data. A much more practical use is the TimeBucket, a bucket that creates a time series from observations that may arrive on very different schedules.
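The combine-then-release behavior described in this overview can be sketched with a small stand-in class. This is not the library's implementation: `MiniBucket` and its use of `Hash#merge!` are assumptions made for illustration, because the private helpers that Bucket relies on (`assert_object`, `reset_bucket`, `filter`) are not shown on this page.

```ruby
require 'ostruct'

# Hypothetical stand-in for Bucket. The merge strategy is an assumption.
class MiniBucket
  attr_reader :raw_data

  def initialize(obj = nil, &block)
    @filter_block = block
    @raw_data = {}
    add(obj) if obj
  end

  # Accepts a Hash or an OpenStruct; later values overwrite earlier ones.
  def add(obj)
    @raw_data.merge!(obj.to_h)
  end

  # Filtered view of the data; the collected data stays in the bucket.
  def filtered_data
    @filter_block ? @filter_block.call(@raw_data) : @raw_data
  end

  # Returns the filtered view and empties the bucket.
  def dump
    data = filtered_data
    @raw_data = {}
    data
  end
end

# The filter reduces every value modulo 3, as in the irb example.
b = MiniBucket.new { |h| h.each_with_object({}) { |(k, v), out| out[k] = v % 3 } }
b.add(:this => 1)
b.add(OpenStruct.new(:this => 6))

raw      = b.raw_data       # the second add overwrote :this
filtered = b.filtered_data  # 6 % 3, bucket still full
dumped   = b.dump           # same filtered view, bucket now empty
```

The point of the sketch is the difference between the last two calls: `filtered_data` can be called repeatedly, while `dump` hands over the record and starts a fresh one.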

Constructor Details

#initialize(obj = nil, &block) ⇒ Bucket

Returns a new instance of Bucket.



# File 'lib/etl/bucket.rb', line 42

def initialize(obj=nil, &block)
  @filter_block = block
  reset_bucket
  assert_object(obj) if obj
end

Instance Attribute Details

#filter_block ⇒ Object

The block used to filter the bucket. Useful for converting the data to a different data type. Examples:

Return a hash:
  b.filter_block = lambda{|o| o.table}

Return an array:
  b.filter_block = lambda{|o| o.table.values}

# File 'lib/etl/bucket.rb', line 37

def filter_block
  @filter_block
end
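The two conversions above can be tried outside a Bucket. The sketch below assumes the block receives an OpenStruct, as the examples imply, and uses `to_h` (the public OpenStruct conversion available since Ruby 2.0) in place of `table`. The sample attribute names are hypothetical.

```ruby
require 'ostruct'

# Sample data standing in for a bucket's contents (hypothetical values).
data = OpenStruct.new(:price => 10, :volume => 250)

# Return a hash
to_hash_filter = lambda { |o| o.to_h }

# Return an array of the values only
to_array_filter = lambda { |o| o.to_h.values }

hash_result  = to_hash_filter.call(data)
array_result = to_array_filter.call(data)
```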

#raw_data ⇒ Object (readonly)

Also known as: to_hash

The data in the bucket, as an OpenStruct.

# File 'lib/etl/bucket.rb', line 40

def raw_data
  @raw_data
end

#white_list ⇒ Object

Also known as: labels

Reveals the white list. If it is set, it is an array that not only filters the data in the bucket but also orders it.

# File 'lib/etl/bucket.rb', line 93

def white_list
  @white_list
end

Instance Method Details

#add(obj) ⇒ Object



# File 'lib/etl/bucket.rb', line 48

def add(obj)
  assert_object(obj)
end

#dump ⇒ Object

Filters the current raw data, resets the bucket, and returns the filtered result.

# File 'lib/etl/bucket.rb', line 52

def dump
  data = self.raw_data
  reset_bucket
  filter(data)
end

#filtered_data ⇒ Object

Returns the raw data passed through the filter. Unlike dump, it does not reset the bucket.

# File 'lib/etl/bucket.rb', line 58

def filtered_data
  filter(self.raw_data)
end

#ordered_data ⇒ Object

Uses Dictionary (from facets/dictionary) to deliver an ordered hash, in the order of the white list. Returns the raw data unchanged when no white list is set.

# File 'lib/etl/bucket.rb', line 64

def ordered_data
  return self.raw_data unless self.white_list
  dictionary = Dictionary.new
  self.white_list.each do |k|
    dictionary[k] = self.raw_data[k]
  end
  dictionary
end
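The same white-list ordering can be sketched without facets, on the assumption that a plain Hash is acceptable (Ruby hashes have preserved insertion order since 1.9). The helper name `order_by_white_list` and the sample keys are hypothetical.

```ruby
# Hypothetical helper that mirrors ordered_data using a plain Hash.
def order_by_white_list(raw_data, white_list)
  return raw_data unless white_list

  # Keys are inserted in white-list order; anything not listed is dropped.
  white_list.each_with_object({}) do |key, ordered|
    ordered[key] = raw_data[key]
  end
end

raw     = { :close => 101.5, :open => 100.0, :noise => "ignored" }
ordered = order_by_white_list(raw, [:open, :close])
values  = ordered.values
```

As in the method above, a white-listed key that is missing from the raw data still appears in the result, with a nil value.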

#to_a ⇒ Object

Also known as: to_array

# File 'lib/etl/bucket.rb', line 73

def to_a
  self.ordered_data.values
end

#to_obj(klass, use_hash = false) ⇒ Object

Also known as: to_struct

Initializes klass with the raw data: the values as positional arguments by default, or the raw data as a single argument when use_hash is true. Good for structs and struct-like classes.

# File 'lib/etl/bucket.rb', line 82

def to_obj(klass, use_hash=false)
  use_hash ? klass.new(self.raw_data) : klass.new(*self.raw_data.values)
end
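The two calling conventions can be illustrated with a plain Hash standing in for the raw data. `Point` and `Reading` are hypothetical classes, not part of the library.

```ruby
# Hypothetical target classes for illustration.
Point = Struct.new(:x, :y)

class Reading
  attr_reader :attrs

  def initialize(attrs)
    @attrs = attrs
  end
end

raw_data = { :x => 3, :y => 4 }

# use_hash = false: the values are splatted as positional arguments.
point = Point.new(*raw_data.values)

# use_hash = true: the whole hash is passed as a single argument.
reading = Reading.new(raw_data)
```

The positional form depends on the order of the raw data's values. The method reads them from raw_data rather than ordered_data, so the white list does not influence the argument order.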

#to_open_struct ⇒ Object

# File 'lib/etl/bucket.rb', line 87

def to_open_struct
  OpenStruct.new(self.raw_data)
end
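For readers unfamiliar with OpenStruct, the conversion amounts to the following, shown here with a plain Hash and hypothetical keys in place of a bucket's raw data.

```ruby
require 'ostruct'

raw_data = { :symbol => "ABC", :price => 12.5 }
record   = OpenStruct.new(raw_data)

symbol  = record.symbol  # hash keys become reader methods
price   = record.price
missing = record.volume  # attributes that were never set return nil
```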