Module: ScraperWiki

Extended by:
ScraperWiki
Included in:
ScraperWiki
Defined in:
lib/scraperwiki.rb,
lib/scraperwiki/version.rb

Constant Summary

VERSION = '3.0.2'


Instance Method Details

#close_sqlite ⇒ Object



# File 'lib/scraperwiki.rb', line 104

def close_sqlite
  sqlite_magic_connection.close
  @sqlite_magic_connection = nil
end

#config=(config_hash) ⇒ Object



# File 'lib/scraperwiki.rb', line 71

def config=(config_hash)
  @config ||= config_hash
end
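As the listing shows, config= assigns with ||=, so only the first hash takes effect and later assignments are silently ignored. The config is read for two keys: :default_table_name (by default_table_name) and :db (by sqlite_magic_connection). The sketch below mirrors those three methods in a hypothetical ConfigDemo module, which is not part of the gem, so the behaviour can be tried without SQLite; the file and table names are placeholders.

```ruby
# Hypothetical module mirroring ScraperWiki's config handling; not part of the gem.
module ConfigDemo
  extend self

  # Same semantics as ScraperWiki#config=: only the first assignment sticks.
  def config=(config_hash)
    @config ||= config_hash
  end

  # Same fallback as ScraperWiki#default_table_name.
  def default_table_name
    (@config && @config[:default_table_name]) || 'swdata'
  end

  # Same expression ScraperWiki#sqlite_magic_connection uses to pick the file.
  def db_path
    @config ? @config[:db] : 'scraperwiki.sqlite'
  end
end

before = [ConfigDemo.default_table_name, ConfigDemo.db_path]
ConfigDemo.config = { db: 'data.sqlite', default_table_name: 'companies' }
ConfigDemo.config = { db: 'other.sqlite' } # ignored: a config is already set
after = [ConfigDemo.default_table_name, ConfigDemo.db_path]
p before # ["swdata", "scraperwiki.sqlite"]
p after  # ["companies", "data.sqlite"]
```

One consequence of the db expression: a config hash without :db yields a nil file name, because the 'scraperwiki.sqlite' fallback applies only when no config has been set at all.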

#convert_data(value_data) ⇒ Object



# File 'lib/scraperwiki.rb', line 52

def convert_data(value_data)
  return value_data if value_data.nil? or (value_data.respond_to?(:empty?) and value_data.empty?)
  [value_data].flatten(1).collect do |datum_hash|
    datum_hash.inject({}) do |hsh, (k,v)|
      hsh[k] =
        case v
        when Date, DateTime
          v.iso8601
        when Time
          # maintains existing ScraperWiki behaviour
          v.iso8601.sub(/([+-]00:00|Z)$/, '')
        else
          v
        end
      hsh
    end
  end
end
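convert_data normalises its input to an array of hashes and turns Date, DateTime and Time values into ISO 8601 strings; for Time values a zero UTC offset (Z, +00:00 or -00:00) is stripped, while other offsets are kept. The sketch below wraps a commented copy of the method in a hypothetical ConvertDemo module so it runs with the standard library alone; the sample row is made up.

```ruby
require 'date'
require 'time'

# Hypothetical module holding a copy of ScraperWiki#convert_data (v3.0.2),
# so the conversion can be tried without the gem or a database.
module ConvertDemo
  extend self

  def convert_data(value_data)
    # nil and empty inputs are returned unchanged
    return value_data if value_data.nil? or (value_data.respond_to?(:empty?) and value_data.empty?)
    # A single hash becomes a one-element array; an array of hashes is kept as is
    [value_data].flatten(1).collect do |datum_hash|
      datum_hash.inject({}) do |hsh, (k, v)|
        hsh[k] =
          case v
          when Date, DateTime
            v.iso8601 # DateTime is a Date subclass and keeps its offset
          when Time
            v.iso8601.sub(/([+-]00:00|Z)$/, '') # a zero UTC offset is dropped
          else
            v
          end
        hsh
      end
    end
  end
end

rows = ConvertDemo.convert_data({
  'id'  => 1,
  'day' => Date.new(2024, 1, 2),
  'utc' => Time.utc(2024, 1, 2, 3, 4, 5),
  'cet' => Time.new(2024, 1, 2, 3, 4, 5, '+01:00')
})
row = rows.first
puts row['day'] # 2024-01-02
puts row['utc'] # 2024-01-02T03:04:05
puts row['cet'] # 2024-01-02T03:04:05+01:00
```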

#default_table_name ⇒ Object



# File 'lib/scraperwiki.rb', line 196

def default_table_name
  (@config && @config[:default_table_name]) || 'swdata'
end

#get_var(name, default = nil, _verbose = 2) ⇒ Object

Retrieves a variable previously saved with save_var.

Parameters

  • name = The variable name to fetch

  • default = The value to return if the variable is not found, or if no variables have been saved yet (the swvariables table does not exist)

  • verbose = A verbosity level (not currently implemented; kept only so that existing code does not break)

Example

ScraperWiki.get_var('current', 0)



# File 'lib/scraperwiki.rb', line 120

def get_var(name, default=nil, _verbose=2)
  result = sqlite_magic_connection.execute("select value_blob, type from swvariables where name=?", [name])
  return default if result.empty?
  result_val = result.first['value_blob']
  case result.first['type']
  when 'Fixnum'
    result_val.to_i
  when 'Float'
    result_val.to_f
  when 'NilClass'
    nil
  when 'Array','Hash'
    JSON.parse(result_val)
  else
    result_val
  end
rescue SqliteMagic::NoSuchTable
  return default
end
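get_var reads the value_blob and type columns of the swvariables table and converts the stored string back according to the type tag written by save_var. The sketch below isolates that decoding step as a hypothetical decode_var function over a row hash, so it runs without SQLite. It also accepts the tag 'Integer', which the listing above does not: on Ruby 2.4 and later, 1.class.to_s is "Integer" rather than "Fixnum", so in the listed code an integer saved by save_var falls through to the else branch and is returned as stored, without to_i. Treat that extra tag as an assumption of this sketch, not gem behaviour.

```ruby
require 'json'

# Hypothetical stand-alone version of the decoding step in ScraperWiki#get_var.
# `row` plays the role of result.first: nil when no variable matched.
# Accepting 'Integer' as well as 'Fixnum' is an assumption added here.
def decode_var(row, default = nil)
  return default if row.nil?
  value = row['value_blob']
  case row['type']
  when 'Fixnum', 'Integer' then value.to_i
  when 'Float'             then value.to_f
  when 'NilClass'          then nil
  when 'Array', 'Hash'     then JSON.parse(value)
  else value
  end
end

p decode_var({ 'value_blob' => '100', 'type' => 'Integer' }) # 100
p decode_var({ 'value_blob' => '[1,2]', 'type' => 'Array' }) # [1, 2]
p decode_var(nil, 0)                                          # 0
```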

#save(*args) ⇒ Object

Legacy alias for #save_sqlite, kept so that older scrapers continue to work.



# File 'lib/scraperwiki.rb', line 96

def save(*args)
  save_sqlite(*args)
end

#save_sqlite(unique_keys, data, table_name = nil, _verbose = 0) ⇒ Object

Saves the provided data into the scraper's local SQLite database. Rows are upserted: a row is inserted if no existing row has the same unique-key values, and updated otherwise.

Parameters

  • unique_keys = An array of column names whose values, taken together, uniquely identify a row

  • data = A hash whose keys are column names and whose values are the row's values. To save many rows at once, pass an array of hashes.

  • table_name = The name of the table to save into (defaults to default_table_name, which is 'swdata' unless overridden in the config)

  • verbose = A verbosity level (not currently implemented; kept only so that existing code does not break)

Example

ScraperWiki::save(['id'], 'id'=>1)



# File 'lib/scraperwiki.rb', line 89

def save_sqlite(unique_keys, data, table_name=nil,_verbose=0)
  table_name ||= default_table_name
  converted_data = convert_data(data)
  sqlite_magic_connection.save_data(unique_keys, converted_data, table_name)
end
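The upsert behaviour described for save_sqlite can be modelled with an in-memory hash keyed on the unique-key values. This is a hypothetical illustration of the semantics only: upsert and the Hash standing in for a table are not part of ScraperWiki or SqliteMagic, and this sketch replaces the whole stored row on a key match (as an INSERT OR REPLACE would); it does not show how SqliteMagic actually builds its SQL.

```ruby
# Hypothetical in-memory model of save_sqlite's upsert semantics.
# table: Hash mapping an array of unique-key values => row hash.
def upsert(table, unique_keys, data)
  rows = data.is_a?(Array) ? data : [data] # accept one hash or an array of hashes
  rows.each do |row|
    key = unique_keys.map { |k| row[k] } # the values that identify this row
    table[key] = row                     # insert if new, replace if present
  end
  table
end

swdata = {}
upsert(swdata, ['id'], { 'id' => 1, 'name' => 'Acme' })
upsert(swdata, ['id'], [{ 'id' => 1, 'name' => 'Acme Ltd' }, { 'id' => 2, 'name' => 'Beta' }])
puts swdata.size         # 2
puts swdata[[1]]['name'] # Acme Ltd
```

With the gem itself, the second call would correspond to ScraperWiki.save_sqlite(['id'], rows), writing to the default table.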

#save_var(name, value, _verbose = 2) ⇒ Object

Saves a single variable (one per call) to carry state across runs of the scraper.

Parameters

  • name = The variable name

  • value = The value of the variable

  • verbose = A verbosity level (not currently implemented; kept only so that existing code does not break)

Example

ScraperWiki.save_var('current', 100)



# File 'lib/scraperwiki.rb', line 152

def save_var(name, value, _verbose=2)
  val_type = value.class.to_s
  unless ['Fixnum','String','Float','NilClass', 'Array','Hash'].include?(val_type)
    puts "*** object of type #{val_type} converted to string\n"
  end
  val = val_type[/Array|Hash/] ? value.to_json : value.to_s
  data = { :name => name.to_s, :value_blob => val, :type => val_type }
  sqlite_magic_connection.save_data([:name], data, 'swvariables')
end
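save_var stores every value as a string in the value_blob column together with its class name in the type column; arrays and hashes are serialised as JSON, and anything outside the listed types is converted with to_s after a warning. The sketch below extracts that row-building step into a hypothetical encode_var function that returns the row instead of writing it, so the encoding can be inspected without a database.

```ruby
require 'json'

KNOWN_VAR_TYPES = %w[Fixnum String Float NilClass Array Hash].freeze

# Hypothetical extraction of the row-building logic in ScraperWiki#save_var;
# it returns the row instead of saving it to the swvariables table.
def encode_var(name, value)
  val_type = value.class.to_s
  unless KNOWN_VAR_TYPES.include?(val_type)
    puts "*** object of type #{val_type} converted to string"
  end
  # Arrays and hashes are stored as JSON, everything else via to_s
  val = val_type[/Array|Hash/] ? value.to_json : value.to_s
  { name: name.to_s, value_blob: val, type: val_type }
end

p encode_var(:current, [1, 2])[:value_blob] # "[1,2]"
p encode_var('current', 100)[:type]         # "Integer" on Ruby >= 2.4, after the warning
```

Because Ruby 2.4 and later report integers as Integer rather than Fixnum, saving an integer prints the conversion warning and tags the row 'Integer'.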

#scrape(url, params = nil, agent = nil) ⇒ Object

Fetches content from a web server: a GET request by default, or a POST request when params is given.

Parameters

  • url = The URL to fetch

  • params = The parameters to send with a POST request; when nil (the default), a GET request is made instead

  • agent = A user-agent string to send in place of HTTPClient's default

NB: This method has not been refactored or tested, and would probably benefit from both.

Example

ScraperWiki::scrape('https://scraperwiki.com')



# File 'lib/scraperwiki.rb', line 25

def scrape(url, params = nil, agent = nil)
  if agent
    client = HTTPClient.new(:agent_name => agent)
  else
    client = HTTPClient.new
  end
  client.ssl_config.verify_mode = OpenSSL::SSL::VERIFY_NONE
  if HTTPClient.respond_to?("client.transparent_gzip_decompression=")
    client.transparent_gzip_decompression = true
  end

  if params.nil?
    html = client.get_content(url)
  else
    html = client.post_content(url, params)
  end

  unless HTTPClient.respond_to?("client.transparent_gzip_decompression=")
    begin
      gz = Zlib::GzipReader.new(StringIO.new(html))
      return gz.read
    rescue
      return html
    end
  end
end
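The last step of scrape tries to gunzip the response body and falls back to the raw body when it is not gzip data. Two details of the listing are worth knowing. HTTPClient.respond_to?("client.transparent_gzip_decompression=") asks the class about a method literally named client.transparent_gzip_decompression=, which no method is called, so the check returns false and the manual fallback is always the path taken. And if that check ever returned true, the unless expression would evaluate to nil, so scrape would return nil instead of the body. The sketch below isolates the fallback as a hypothetical maybe_gunzip helper using only Zlib and StringIO from the standard library; unlike the bare rescue in the listing, it rescues only Zlib::GzipFile::Error.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper mirroring the gzip fallback at the end of ScraperWiki#scrape.
def maybe_gunzip(body)
  Zlib::GzipReader.new(StringIO.new(body)).read
rescue Zlib::GzipFile::Error
  body # not gzip data: return it unchanged
end

puts maybe_gunzip(Zlib.gzip('<html>ok</html>')) # <html>ok</html>
puts maybe_gunzip('<html>plain</html>')         # <html>plain</html>
```

Checking the client instance, for example client.respond_to?(:transparent_gzip_decompression=), is the conventional way to test whether an object supports a setter.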

#select(sqlquery, data = nil, _verbose = 1) ⇒ Object

Runs a SELECT query written in a simplified form, without the leading SELECT keyword.

Parameters

  • sqlquery = The rest of a valid SELECT statement, omitting the SELECT keyword (it is prepended automatically)

  • data = Bind values for the ? placeholders in the query. See SQLite3::Database#execute for details

  • verbose = A verbosity level (not currently implemented; kept only so that existing code does not break)

  • Optionally, a block can also be passed; the result rows are then yielded to the block one at a time instead of being loaded and returned as a whole result set

Returns

When no block is given, an array of hashes containing the returned rows

Example

ScraperWiki.select('* from swdata')



# File 'lib/scraperwiki.rb', line 178

def select(sqlquery, data=nil, _verbose=1)
  if block_given?
    sqlite_magic_connection.database.
                            query("SELECT "+sqlquery, data).
                            each_hash do |row_hash|
       yield row_hash
    end
  else
    sqlite_magic_connection.execute("SELECT "+sqlquery, data)
  end
end
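select prepends "SELECT " to the query string and either returns all rows at once or, when a block is given, yields them one at a time. The sketch below reproduces that calling convention against a hypothetical FakeDB class that stands in for the SQLite connection and returns canned rows without executing SQL. Unlike the real block form, which streams rows from SQLite, the fake already holds every row in memory, so it shows only the interface, not the memory saving.

```ruby
# Hypothetical stand-in for the database: records the SQL it receives
# and returns canned rows instead of running the query.
class FakeDB
  attr_reader :last_sql

  def initialize(rows)
    @rows = rows
  end

  def execute(sql, _data = nil)
    @last_sql = sql
    @rows
  end
end

# Mirrors the shape of ScraperWiki#select: prefix SELECT, then return or yield rows.
def simple_select(db, sqlquery, data = nil)
  rows = db.execute('SELECT ' + sqlquery, data)
  return rows unless block_given?
  rows.each { |row| yield row }
end

db = FakeDB.new([{ 'id' => 1 }, { 'id' => 2 }])
all = simple_select(db, '* from swdata where id > ?', [0])
ids = []
simple_select(db, 'id from swdata') { |row| ids << row['id'] }
puts db.last_sql # SELECT id from swdata
```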

#sqlite_magic_connection ⇒ Object

Establishes a SqliteMagic::Connection and memoizes it, so later calls reuse the same connection.



# File 'lib/scraperwiki.rb', line 191

def sqlite_magic_connection
  db = @config ? @config[:db] : 'scraperwiki.sqlite'
  @sqlite_magic_connection ||= SqliteMagic::Connection.new(db)
end
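sqlite_magic_connection memoizes the connection in @sqlite_magic_connection, and close_sqlite closes it and clears that variable, so the next call opens a fresh connection. The sketch below shows the same pattern with a hypothetical FakeConnection class in place of SqliteMagic::Connection and a hypothetical ConnectionDemo module in place of ScraperWiki.

```ruby
# Hypothetical stand-in for SqliteMagic::Connection.
class FakeConnection
  attr_reader :path

  def initialize(path)
    @path = path
    @open = true
  end

  def close
    @open = false
  end

  def open?
    @open
  end
end

module ConnectionDemo
  extend self

  # Same memoization as ScraperWiki#sqlite_magic_connection.
  def connection
    db = @config ? @config[:db] : 'scraperwiki.sqlite'
    @connection ||= FakeConnection.new(db)
  end

  # Same reset as ScraperWiki#close_sqlite.
  def close
    connection.close
    @connection = nil
  end
end

first = ConnectionDemo.connection
same  = ConnectionDemo.connection # memoized: the same object
ConnectionDemo.close
fresh = ConnectionDemo.connection # a new object after close
```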

#sqliteexecute(query, data = nil, verbose = 2) ⇒ Object



# File 'lib/scraperwiki.rb', line 100

def sqliteexecute(query,data=nil, verbose=2)
  sqlite_magic_connection.execute(query,data)
end