Class: ScraperWiki::API

Inherits: Object
Includes: HTTParty
Defined in: lib/scraperwiki-api.rb,
            lib/scraperwiki-api/version.rb,
            lib/scraperwiki-api/matchers.rb
Overview
A Ruby wrapper for the ScraperWiki API.
Defined Under Namespace
Modules: Matchers
Constant Summary

- RUN_INTERVALS =
    {
      :never   => -1,
      :monthly => 2678400,
      :weekly  => 604800,
      :daily   => 86400,
      :hourly  => 3600,
    }
- VERSION =
    "0.0.7"
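RUN_INTERVALS maps schedule names to the number of seconds between runs, which makes it handy for interpreting the +run_interval+ field returned by #scraper_getinfo. A minimal sketch (the hash is copied from the constant above; +interval_name+ is a hypothetical helper added for illustration):

```ruby
# Copied from the RUN_INTERVALS constant above: schedule names to
# seconds between runs.
RUN_INTERVALS = {
  :never   => -1,
  :monthly => 2678400,
  :weekly  => 604800,
  :daily   => 86400,
  :hourly  => 3600,
}

# Hypothetical reverse lookup: turn a run_interval value from
# #scraper_getinfo back into a schedule name.
def interval_name(seconds)
  RUN_INTERVALS.key(seconds)
end

puts interval_name(604800)  # => weekly
puts RUN_INTERVALS[:daily]  # => 86400
```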
Class Method Summary

- .edit_scraper_url(shortname) ⇒ String
  Returns the URL to edit the scraper.
- .scraper_url(shortname) ⇒ String
  Returns the URL to the scraper's overview.
Instance Method Summary

- #datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...
  Queries and extracts data via a general-purpose SQL interface.
- #initialize(apikey = nil) ⇒ API (constructor)
  Initializes a ScraperWiki API object.
- #scraper_getinfo(shortname, opts = {}) ⇒ Array
  Extracts data about a scraper's code, owner, history, etc.
- #scraper_getruninfo(shortname, opts = {}) ⇒ Array
  See what the scraper did during each run.
- #scraper_getuserinfo(username) ⇒ Array
  Find out information about a user.
- #scraper_search(opts = {}) ⇒ Array
  Search the titles and descriptions of all the scrapers.
- #scraper_usersearch(opts = {}) ⇒ Array
  Search for a user by name.
Constructor Details
#initialize(apikey = nil) ⇒ API
Initializes a ScraperWiki API object.
# File 'lib/scraperwiki-api.rb', line 37

def initialize(apikey = nil)
  @apikey = apikey
end
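In use, the wrapper is instantiated with your API key, or with no argument for anonymous access. A minimal sketch of the constructor's behavior (the class name and the +attr_reader+ are added here for illustration and are not part of the real class):

```ruby
# Sketch of the constructor above. The attr_reader is illustrative
# only; the real ScraperWiki::API class does not expose the key.
class ApiSketch
  attr_reader :apikey

  def initialize(apikey = nil)
    @apikey = apikey
  end
end

api = ApiSketch.new('my-api-key')
puts api.apikey                # => my-api-key

anonymous = ApiSketch.new
puts anonymous.apikey.inspect  # => nil
```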
Class Method Details
.edit_scraper_url(shortname) ⇒ String
Returns the URL to edit the scraper.
# File 'lib/scraperwiki-api.rb', line 31

def edit_scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/edit/"
end
.scraper_url(shortname) ⇒ String
Returns the URL to the scraper's overview.
# File 'lib/scraperwiki-api.rb', line 23

def scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/"
end
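Both class methods are plain string builders, so they can be exercised without any network access. The method bodies below are copied from the source above; only the wrapping module is added for a self-contained example:

```ruby
module UrlHelpers
  module_function

  # Copied from .scraper_url above.
  def scraper_url(shortname)
    "https://scraperwiki.com/scrapers/#{shortname}/"
  end

  # Copied from .edit_scraper_url above.
  def edit_scraper_url(shortname)
    "https://scraperwiki.com/scrapers/#{shortname}/edit/"
  end
end

puts UrlHelpers.scraper_url('example-scraper')
# => https://scraperwiki.com/scrapers/example-scraper/
puts UrlHelpers.edit_scraper_url('example-scraper')
# => https://scraperwiki.com/scrapers/example-scraper/edit/
```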
Instance Method Details
#datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...

Note: the query-string parameter is +name+, not +shortname+ as in the ScraperWiki docs.

Queries and extracts data via a general-purpose SQL interface.

To make an RSS feed, use SQL's +AS+ keyword (e.g. "SELECT name AS description") to produce columns named +title+, +link+, +description+, +guid+ (optional; +link+ is used if it is absent) and +pubDate+ or +date+.
+jsondict+ example output:

[
  {
    "fieldA": "valueA",
    "fieldB": "valueB",
    "fieldC": "valueC"
  },
  ...
]
+jsonlist+ example output:

{
  "keys": ["fieldA", "fieldB", "fieldC"],
  "data": [
    ["valueA", "valueB", "valueC"],
    ...
  ]
}
+csv+ example output:
fieldA,fieldB,fieldC
valueA,valueB,valueC
...
# File 'lib/scraperwiki-api.rb', line 86

def datastore_sqlite(shortname, query, opts = {})
  if Array === opts[:attach]
    opts[:attach] = opts[:attach].join ';'
  end
  request_with_apikey '/datastore/sqlite', {:name => shortname, :query => query}.merge(opts)
end
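As the source above shows, an Array passed as :attach is flattened to a semicolon-separated string before the request is sent. A standalone sketch of just that serialization step (the helper name is hypothetical; no HTTP request is made):

```ruby
# Mirrors the option handling in #datastore_sqlite: an Array of scraper
# shortnames under :attach becomes a single ';'-joined string, and the
# name/query parameters are merged in.
def serialize_datastore_opts(shortname, query, opts = {})
  opts = opts.dup
  if Array === opts[:attach]
    opts[:attach] = opts[:attach].join ';'
  end
  {:name => shortname, :query => query}.merge(opts)
end

params = serialize_datastore_opts('example-scraper',
  'SELECT * FROM swdata LIMIT 10',
  :attach => ['other-scraper', 'third-scraper'])
puts params[:attach]  # => other-scraper;third-scraper
puts params[:name]    # => example-scraper
```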
#scraper_getinfo(shortname, opts = {}) ⇒ Array

Notes:
- Returns an array, although the array seems to always have only one item.
- The +tags+ field seems to always be an empty array.
- Fields like +last_run+ seem to follow British Summer Time.
- The query-string parameter is +name+, not +shortname+ as in the ScraperWiki docs.

Extracts data about a scraper's code, owner, history, etc.
- +runid+ is a Unix timestamp with microseconds and a UUID.
- The value of +records+ is the same as that of +total_rows+ under +datasummary+.
- +run_interval+ is the number of seconds between runs. It is one of:
- -1 (never)
- 2678400 (monthly)
- 604800 (weekly)
- 86400 (daily)
- 3600 (hourly)
- +privacy_status+ is one of:
- "public" (everyone can see and edit the scraper and its data)
- "visible" (everyone can see the scraper, but only contributors can edit it)
- "private" (only contributors can see and edit the scraper and its data)
- An individual +runevents+ hash will have an +exception_message+ key if there was an error during that run.
Example output:

[
  {
    "code": "require 'nokogiri'\n...",
    "datasummary": {
      "tables": {
        "swdata": {
          "keys": [
            "fieldA",
            ...
          ],
          "count": 42,
          "sql": "CREATE TABLE `swdata` (...)"
        },
        "swvariables": {
          "keys": [
            "value_blob",
            "type",
            "name"
          ],
          "count": 2,
          "sql": "CREATE TABLE `swvariables` (`value_blob` blob, `type` text, `name` text)"
        },
        ...
      },
      "total_rows": 44,
      "filesize": 1000000
    },
    "description": "Scrapes websites for data.",
    "language": "ruby",
    "title": "Example scraper",
    "tags": [],
    "short_name": "example-scraper",
    "userroles": {
      "owner": [
        "johndoe"
      ],
      "editor": [
        "janedoe",
        ...
      ]
    },
    "last_run": "1970-01-01T00:00:00",
    "created": "1970-01-01T00:00:00",
    "runevents": [
      {
        "still_running": false,
        "pages_scraped": 5,
        "run_started": "1970-01-01T00:00:00",
        "last_update": "1970-01-01T00:00:00",
        "runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
        "records_produced": 42
      },
      ...
    ],
    "records": 44,
    "wiki_type": "scraper",
    "privacy_status": "visible",
    "run_interval": 604800,
    "attachable_here": [],
    "attachables": [],
    "history": [
      ...,
      {
        "date": "1970-01-01T00:00:00",
        "version": 0,
        "user": "johndoe",
        "session": "Thu, 1 Jan 1970 00:00:08 GMT"
      }
    ]
  }
]
# File 'lib/scraperwiki-api.rb', line 198

def scraper_getinfo(shortname, opts = {})
  if Array === opts[:quietfields]
    opts[:quietfields] = opts[:quietfields].join '|'
  end
  request_with_apikey '/scraper/getinfo', {:name => shortname}.merge(opts)
end
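The source above accepts a :quietfields option (presumably to suppress the listed fields in the response); when given an Array, it is joined with '|' before being sent. A standalone sketch of that serialization (the helper name is hypothetical; no HTTP request is made):

```ruby
# Mirrors the option handling in #scraper_getinfo: an Array under
# :quietfields becomes a single '|'-joined string, merged with the
# :name parameter the endpoint expects.
def serialize_getinfo_opts(shortname, opts = {})
  opts = opts.dup
  if Array === opts[:quietfields]
    opts[:quietfields] = opts[:quietfields].join '|'
  end
  {:name => shortname}.merge(opts)
end

params = serialize_getinfo_opts('example-scraper',
  :quietfields => ['code', 'runevents', 'history'])
puts params[:quietfields]  # => code|runevents|history
puts params[:name]         # => example-scraper
```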
#scraper_getruninfo(shortname, opts = {}) ⇒ Array

Notes:
- Returns an array, although the array seems to always have only one item.
- The query-string parameter is +name+, not +shortname+ as in the ScraperWiki docs.

See what the scraper did during each run.
Example output:
[
  {
    "run_ended": "1970-01-01T00:00:00",
    "first_url_scraped": "http://www.iana.org/domains/example/",
    "pages_scraped": 5,
    "run_started": "1970-01-01T00:00:00",
    "runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
    "domainsscraped": [
      {
        "domain": "http://example.com",
        "bytes": 1000000,
        "pages": 5
      },
      ...
    ],
    "output": "...",
    "records_produced": 42
  }
]
# File 'lib/scraperwiki-api.rb', line 237

def scraper_getruninfo(shortname, opts = {})
  request_with_apikey '/scraper/getruninfo', {:name => shortname}.merge(opts)
end
#scraper_getuserinfo(username) ⇒ Array

Notes:
- Returns an array, although the array seems to always have only one item.
- The date-joined field here is +datejoined+; on #scraper_usersearch it is +date_joined+ (with underscore).

Find out information about a user.
Example output:
[
  {
    "username": "johndoe",
    "profilename": "John Doe",
    "coderoles": {
      "owner": [
        "johndoe.emailer",
        "example-scraper",
        ...
      ],
      "email": [
        "johndoe.emailer"
      ],
      "editor": [
        "yet-another-scraper",
        ...
      ]
    },
    "datejoined": "1970-01-01T00:00:00"
  }
]
# File 'lib/scraperwiki-api.rb', line 273

def scraper_getuserinfo(username)
  request_with_apikey '/scraper/getuserinfo', :username => username
end
#scraper_search(opts = {}) ⇒ Array
Search the titles and descriptions of all the scrapers.
Example output:
[
  {
    "description": "Scrapes websites for data.",
    "language": "ruby",
    "created": "1970-01-01T00:00:00",
    "title": "Example scraper",
    "short_name": "example-scraper",
    "privacy_status": "public"
  },
  ...
]
# File 'lib/scraperwiki-api.rb', line 299

def scraper_search(opts = {})
  request_with_apikey '/scraper/search', opts
end
#scraper_usersearch(opts = {}) ⇒ Array

Note: the date-joined field here is +date_joined+; on #scraper_getuserinfo it is +datejoined+ (without underscore).

Search for a user by name.
Example output:
[
  {
    "username": "johndoe",
    "profilename": "John Doe",
    "date_joined": "1970-01-01T00:00:00"
  },
  ...
]
# File 'lib/scraperwiki-api.rb', line 327

def scraper_usersearch(opts = {})
  if Array === opts[:nolist]
    opts[:nolist] = opts[:nolist].join ' '
  end
  request '/scraper/usersearch', opts
end
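As the source above shows, an Array passed as :nolist is joined with spaces before the request is sent, and this method uses +request+ rather than +request_with_apikey+, so no API key is attached. A standalone sketch of the option handling (the helper name is hypothetical; no HTTP request is made):

```ruby
# Mirrors the option handling in #scraper_usersearch: an Array of
# usernames under :nolist becomes a single space-joined string.
def serialize_usersearch_opts(opts = {})
  opts = opts.dup
  if Array === opts[:nolist]
    opts[:nolist] = opts[:nolist].join ' '
  end
  opts
end

params = serialize_usersearch_opts(:nolist => ['johndoe', 'janedoe'])
puts params[:nolist]  # => johndoe janedoe
```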