Module: ScraperWiki
- Extended by:
- ScraperWiki
- Included in:
- ScraperWiki
- Defined in:
- lib/scraperwiki.rb,
lib/scraperwiki/version.rb
Constant Summary
- VERSION = '3.0.2'
Instance Method Summary
- #close_sqlite ⇒ Object
- #config=(config_hash) ⇒ Object
- #convert_data(value_data) ⇒ Object
- #default_table_name ⇒ Object
- #get_var(name, default = nil, _verbose = 2) ⇒ Object
  Allows the user to retrieve a previously saved variable.
- #save(*args) ⇒ Object
  Legacy alias for #save_sqlite, kept so that older scrapers continue to work.
- #save_sqlite(unique_keys, data, table_name = nil, _verbose = 0) ⇒ Object
  Saves the provided data into a local database for this scraper.
- #save_var(name, value, _verbose = 2) ⇒ Object
  Allows the user to save a single variable (at a time) to carry state across runs of the scraper.
- #scrape(url, params = nil, agent = nil) ⇒ Object
  The scrape method fetches the content from a webserver.
- #select(sqlquery, data = nil, _verbose = 1) ⇒ Object
  Allows for a simplified select statement.
- #sqlite_magic_connection ⇒ Object
  Establish a SqliteMagic::Connection (and remember it).
- #sqliteexecute(query, data = nil, verbose = 2) ⇒ Object
Instance Method Details
#close_sqlite ⇒ Object
    # File 'lib/scraperwiki.rb', line 104
    def close_sqlite
      sqlite_magic_connection.close
      @sqlite_magic_connection = nil
    end
#config=(config_hash) ⇒ Object
    # File 'lib/scraperwiki.rb', line 71
    def config=(config_hash)
      @config ||= config_hash
    end
#convert_data(value_data) ⇒ Object
    # File 'lib/scraperwiki.rb', line 52
    def convert_data(value_data)
      return value_data if value_data.nil? or (value_data.respond_to?(:empty?) and value_data.empty?)
      [value_data].flatten(1).collect do |datum_hash|
        datum_hash.inject({}) do |hsh, (k,v)|
          hsh[k] = case v
                   when Date, DateTime
                     v.iso8601
                   when Time
                     # maintains existing ScraperWiki behaviour
                     v.iso8601.sub(/([+-]00:00|Z)$/, '')
                   else
                     v
                   end
          hsh
        end
      end
    end
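The date handling in the listing above can be tried in isolation. The following is a minimal sketch, not part of the library: `iso_value` is a hypothetical helper that copies only the `case` expression, and the sample row is made up for illustration.

```ruby
require 'date'
require 'time' # provides Time#iso8601

# Hypothetical standalone copy of the per-value conversion (illustration only).
def iso_value(v)
  case v
  when Date, DateTime
    v.iso8601
  when Time
    # Drop a trailing UTC marker ("Z", "+00:00" or "-00:00"); other offsets are kept.
    v.iso8601.sub(/([+-]00:00|Z)$/, '')
  else
    v
  end
end

# Made-up sample row mixing plain values with date and time values.
row = {
  'id'     => 1,
  'date'   => Date.new(2020, 1, 2),
  'utc'    => Time.utc(2020, 1, 2, 3, 4, 5),
  'offset' => Time.new(2020, 1, 2, 3, 4, 5, '+02:00')
}

converted = row.each_with_object({}) { |(k, v), h| h[k] = iso_value(v) }
converted.each { |k, v| puts "#{k}: #{v.inspect}" }
```

Values that are not dates or times pass through unchanged, and only a UTC marker is stripped from `Time` values; a non-UTC offset stays in the string.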
#default_table_name ⇒ Object
    # File 'lib/scraperwiki.rb', line 196
    def default_table_name
      (@config && @config[:default_table_name]) || 'swdata'
    end
#get_var(name, default = nil, _verbose = 2) ⇒ Object
Allows the user to retrieve a previously saved variable.
Parameters
- name = The variable name to fetch
- default = The value to return if the variable name is not found
- _verbose = A verbosity level (not currently implemented; present only to avoid breaking existing code)
Example
    ScraperWiki.get_var('current', 0)
    # File 'lib/scraperwiki.rb', line 120
    def get_var(name, default=nil, _verbose=2)
      result = sqlite_magic_connection.execute("select value_blob, type from swvariables where name=?", [name])
      return default if result.empty?
      result_val = result.first['value_blob']
      case result.first['type']
      when 'Fixnum'
        result_val.to_i
      when 'Float'
        result_val.to_f
      when 'NilClass'
        nil
      when 'Array','Hash'
        JSON.parse(result_val)
      else
        result_val
      end
    rescue SqliteMagic::NoSuchTable
      return default
    end
#save(*args) ⇒ Object
Legacy alias for #save_sqlite, kept so that older scrapers continue to work.
    # File 'lib/scraperwiki.rb', line 96
    def save(*args)
      save_sqlite(*args)
    end
#save_sqlite(unique_keys, data, table_name = nil, _verbose = 0) ⇒ Object
Saves the provided data into a local database for this scraper. Data is upserted into the table: a row is inserted if no row with the same unique key values exists, and updated if one does.
Parameters
- unique_keys = A list of column names that, taken together, should uniquely identify a row
- data = A hash of the data, where each key is a column name and each value is the row's value for that column. When sending lots of data, this can be an array of hashes.
- table_name = The name of the table to save into (defaults to 'swdata' unless configured otherwise)
- _verbose = A verbosity level (not currently implemented; present only to avoid breaking existing code)
Example
    ScraperWiki::save(['id'], {'id'=>1})
    # File 'lib/scraperwiki.rb', line 89
    def save_sqlite(unique_keys, data, table_name=nil,_verbose=0)
      table_name ||= default_table_name
      converted_data = convert_data(data)
      sqlite_magic_connection.save_data(unique_keys, converted_data, table_name)
    end
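To illustrate what "upsert" means for the unique keys, here is a small sketch that uses a plain Ruby Hash in place of SQLite. The `upsert` helper is hypothetical and is not the library's `save_data`; in particular, this sketch replaces the whole row on a key match, and whether the real database layer replaces or merges columns is not stated on this page.

```ruby
# Hypothetical in-memory "table": a Hash from unique-key values to row hashes.
def upsert(table, unique_keys, data)
  # Accept either a single hash or an array of hashes, as save_sqlite does.
  [data].flatten(1).each do |row|
    key = unique_keys.map { |k| row[k] } # values of the unique columns
    table[key] = row                     # insert, or replace the existing row
  end
  table
end

table = {}
upsert(table, ['id'], { 'id' => 1, 'name' => 'first' })
upsert(table, ['id'], [
  { 'id' => 1, 'name' => 'updated' }, # same key: replaces the first row
  { 'id' => 2, 'name' => 'second' }   # new key: inserted
])

table.each_value { |row| puts row.inspect }
```

Saving a row whose unique key values already exist leaves the row count unchanged; only the stored values change.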
#save_var(name, value, _verbose = 2) ⇒ Object
Allows the user to save a single variable (at a time) to carry state across runs of the scraper.
Parameters
- name = The variable name
- value = The value of the variable
- _verbose = A verbosity level (not currently implemented; present only to avoid breaking existing code)
Example
    ScraperWiki.save_var('current', 100)
    # File 'lib/scraperwiki.rb', line 152
    def save_var(name, value, _verbose=2)
      val_type = value.class.to_s
      unless ['Fixnum','String','Float','NilClass', 'Array','Hash'].include?(val_type)
        puts "*** object of type #{val_type} converted to string\n"
      end

      val = val_type[/Array|Hash/] ? value.to_json : value.to_s
      data = { :name => name.to_s, :value_blob => val, :type => val_type }
      sqlite_magic_connection.save_data([:name], data, 'swvariables')
    end
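The way #save_var and #get_var cooperate (the value is stored as text alongside its class name, then converted back on read) can be sketched without a database. `VarStore` below is a hypothetical stand-in that keeps rows in a Hash instead of the swvariables table; it mirrors the type handling in the two listings but is not the library's code.

```ruby
require 'json'

# Hypothetical in-memory stand-in for the swvariables table.
class VarStore
  def initialize
    @rows = {}
  end

  # Store the value as text plus the name of its class, as save_var does.
  def save_var(name, value)
    type = value.class.to_s
    blob = type[/Array|Hash/] ? value.to_json : value.to_s
    @rows[name.to_s] = { 'value_blob' => blob, 'type' => type }
  end

  # Convert the stored text back according to the recorded type, as get_var does.
  def get_var(name, default = nil)
    row = @rows[name.to_s]
    return default if row.nil?
    case row['type']
    when 'Fixnum' then row['value_blob'].to_i
    when 'Float' then row['value_blob'].to_f
    when 'NilClass' then nil
    when 'Array', 'Hash' then JSON.parse(row['value_blob'])
    else row['value_blob']
    end
  end
end

store = VarStore.new
store.save_var('ratio', 1.5)
store.save_var('pages', [1, 'two'])
store.save_var('current', 100)

puts store.get_var('ratio').inspect
puts store.get_var('pages').inspect
puts store.get_var('current').inspect
puts store.get_var('missing', 0).inspect
```

One consequence worth knowing: on Ruby 2.4 and later, `100.class.to_s` is `"Integer"` rather than `"Fixnum"`, so in this sketch an integer falls through to the `else` branch and is read back as the string `"100"`. Assuming the listings above are what actually runs, calling `to_i` on the result of `get_var` is a cautious way to handle integer state. Hash keys also come back as strings after the JSON round trip.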
#scrape(url, params = nil, agent = nil) ⇒ Object
The scrape method fetches the content from a webserver.
Parameters
- url = The URL to fetch
- params = The parameters to send with a POST request; if nil, a GET request is made
- agent = A manually supplied User-Agent string
NB: This method has not been refactored or tested, and could probably do with both.
Example
    ScraperWiki::scrape('scraperwiki.com')
    # File 'lib/scraperwiki.rb', line 25
    def scrape(url, params = nil, agent = nil)
      if agent
        client = HTTPClient.new(:agent_name => agent)
      else
        client = HTTPClient.new
      end
      client.ssl_config.verify_mode = OpenSSL::SSL::VERIFY_NONE
      if HTTPClient.respond_to?("client.transparent_gzip_decompression=")
        client.transparent_gzip_decompression = true
      end

      if params.nil?
        html = client.get_content(url)
      else
        html = client.post_content(url, params)
      end

      unless HTTPClient.respond_to?("client.transparent_gzip_decompression=")
        begin
          gz = Zlib::GzipReader.new(StringIO.new(html))
          return gz.read
        rescue
          return html
        end
      end
    end
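The last part of the listing above tries to gunzip the response body and falls back to the raw body if that fails. A minimal sketch of that fallback, using only the Ruby standard library, might look like this; `maybe_gunzip` is a hypothetical helper name, and it rescues `Zlib::Error` rather than using the bare `rescue` of the original.

```ruby
require 'zlib'
require 'stringio'

# Return the decompressed body if it is gzip data, otherwise the body unchanged.
def maybe_gunzip(body)
  Zlib::GzipReader.new(StringIO.new(body)).read
rescue Zlib::Error
  body # not gzip (or corrupt): hand back what we were given
end

# Build a gzip-compressed sample in memory to exercise both paths.
io = StringIO.new
gz = Zlib::GzipWriter.new(io)
gz.write('<html>hello</html>')
gz.close
compressed = io.string

puts maybe_gunzip(compressed)
puts maybe_gunzip('plain text body')
```

No network access is involved here; the sketch only covers the decompression step, not the HTTP request itself.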
#select(sqlquery, data = nil, _verbose = 1) ⇒ Object
Allows for a simplified select statement.
Parameters
- sqlquery = A valid SELECT statement with the leading SELECT keyword omitted (the method prepends it)
- data = Bind variables provided for ? replacements in the query. See SQLite3::Database#execute for details
- _verbose = A verbosity level (not currently implemented; present only to avoid breaking existing code)
- block (optional) = If a block is given, the result rows are passed to it one by one rather than the whole result set being loaded and returned
Returns
An array of hashes containing the returned data (when called without a block)
Example
    ScraperWiki.select('* from swdata')
    # File 'lib/scraperwiki.rb', line 178
    def select(sqlquery, data=nil, _verbose=1)
      if block_given?
        sqlite_magic_connection.database.
          query("SELECT "+sqlquery, data).
          each_hash do |row_hash|
            yield row_hash
          end
      else
        sqlite_magic_connection.execute("SELECT "+sqlquery, data)
      end
    end
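The two calling styles of #select (with and without a block) can be illustrated with a stub in place of the database. Everything in this sketch is hypothetical: `FakeDb` ignores the SQL and serves fixed rows, and `select_sketch` only mirrors the control flow of the listing above.

```ruby
# Hypothetical stand-in for the database layer: it records the SQL it receives
# and serves rows from a fixed array instead of querying SQLite.
class FakeDb
  attr_reader :last_sql

  def initialize(rows)
    @rows = rows
  end

  # Eager path: return every row at once.
  def execute(sql, _binds = nil)
    @last_sql = sql
    @rows.dup
  end

  # Streaming path: hand rows to the block one at a time.
  def each_row(sql, _binds = nil)
    @last_sql = sql
    @rows.each { |row| yield row }
  end
end

def select_sketch(db, sqlquery, data = nil)
  sql = 'SELECT ' + sqlquery # the caller leaves out the SELECT keyword
  if block_given?
    db.each_row(sql, data) { |row| yield row }
  else
    db.execute(sql, data)
  end
end

db = FakeDb.new([{ 'id' => 1 }, { 'id' => 2 }])

all_rows = select_sketch(db, '* from swdata')

seen = []
select_sketch(db, 'id from swdata where id > ?', [0]) { |row| seen << row['id'] }

puts all_rows.inspect
puts seen.inspect
```

The block form is the one to reach for when a result set is too large to hold in memory comfortably, since each row can be processed and discarded in turn.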
#sqlite_magic_connection ⇒ Object
Establish a SqliteMagic::Connection (and remember it).
    # File 'lib/scraperwiki.rb', line 191
    def sqlite_magic_connection
      db = @config ? @config[:db] : 'scraperwiki.sqlite'
      @sqlite_magic_connection ||= SqliteMagic::Connection.new(db)
    end
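How the configuration hash feeds into the database path and the default table name can be sketched in isolation. `ConfigSketch` is a hypothetical module that copies the relevant expressions from #config=, #default_table_name and #sqlite_magic_connection; `db_path` is an invented name for the expression that picks the database file.

```ruby
# Hypothetical stand-in mirroring the config-related expressions in the listings.
module ConfigSketch
  extend self

  def config=(config_hash)
    @config ||= config_hash # only assigns while @config is still unset
  end

  def default_table_name
    (@config && @config[:default_table_name]) || 'swdata'
  end

  def db_path
    @config ? @config[:db] : 'scraperwiki.sqlite'
  end
end

# Before any configuration, the built-in defaults apply.
defaults = [ConfigSketch.db_path, ConfigSketch.default_table_name]

ConfigSketch.config = { :db => 'first.sqlite', :default_table_name => 'data' }
configured = [ConfigSketch.db_path, ConfigSketch.default_table_name]

# A second assignment has no effect because of `||=`.
ConfigSketch.config = { :db => 'second.sqlite' }
after_second = ConfigSketch.db_path

puts defaults.inspect
puts configured.inspect
puts after_second
```

Two details follow from the expressions as listed: the configuration can effectively be set only once per process, and a configuration hash without a `:db` key yields `nil` for the database path rather than the `'scraperwiki.sqlite'` default.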
#sqliteexecute(query, data = nil, verbose = 2) ⇒ Object
    # File 'lib/scraperwiki.rb', line 100
    def sqliteexecute(query,data=nil, verbose=2)
      sqlite_magic_connection.execute(query,data)
    end