Class: ScraperWiki::API

- Inherits: Object
- Includes: HTTParty
- Defined in: lib/scraperwiki-api.rb, lib/scraperwiki-api/version.rb, lib/scraperwiki-api/matchers.rb
Overview
A Ruby wrapper for the ScraperWiki API.
Defined Under Namespace
Modules: Matchers
Constant Summary
- RUN_INTERVALS = { never: -1, monthly: 2678400, weekly: 604800, daily: 86400, hourly: 3600 }
- VERSION = "0.0.6"
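The RUN_INTERVALS hash is useful in both directions: look up the seconds for a symbolic interval, or map a `run_interval` value from an API response back to its name with `Hash#key`. A minimal sketch (the constant is copied here so the snippet runs standalone):

```ruby
# Copied from the gem so this snippet is self-contained.
RUN_INTERVALS = { never: -1, monthly: 2678400, weekly: 604800, daily: 86400, hourly: 3600 }

puts RUN_INTERVALS[:daily]      # => 86400
# Map an interval in seconds back to its symbolic name.
puts RUN_INTERVALS.key(604800)  # => weekly
```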
Class Method Summary
- .edit_scraper_url(shortname) ⇒ String
  Returns the URL to edit the scraper.
- .scraper_url(shortname) ⇒ String
  Returns the URL to the scraper’s overview.
Instance Method Summary
- #datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...
  Queries and extracts data via a general purpose SQL interface.
- #initialize(apikey = nil) ⇒ API (constructor)
  Initializes a ScraperWiki API object.
- #scraper_getinfo(shortname, opts = {}) ⇒ Array
  Extracts data about a scraper’s code, owner, history, etc.
- #scraper_getruninfo(shortname, opts = {}) ⇒ Array
  See what the scraper did during each run.
- #scraper_getuserinfo(username) ⇒ Array
  Find out information about a user.
- #scraper_search(opts = {}) ⇒ Array
  Search the titles and descriptions of all the scrapers.
- #scraper_usersearch(opts = {}) ⇒ Array
  Search for a user by name.
Constructor Details
#initialize(apikey = nil) ⇒ API
Initializes a ScraperWiki API object.
```ruby
# File 'lib/scraperwiki-api.rb', line 37
def initialize(apikey = nil)
  @apikey = apikey
end
```
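The constructor just stores the key; it is later merged into each request's query string (the real gem delegates HTTP to HTTParty). A hedged sketch of that flow using only the standard library, with hypothetical values:

```ruby
require 'uri'

# Hypothetical API key and request parameters for illustration.
apikey = 'your-apikey'
params = { name: 'example-scraper' }
params[:apikey] = apikey if apikey  # a nil key is simply omitted

puts URI.encode_www_form(params)  # => name=example-scraper&apikey=your-apikey
```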
Class Method Details
.edit_scraper_url(shortname) ⇒ String
Returns the URL to edit the scraper.
```ruby
# File 'lib/scraperwiki-api.rb', line 31
def edit_scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/edit/"
end
```
.scraper_url(shortname) ⇒ String
Returns the URL to the scraper’s overview.
```ruby
# File 'lib/scraperwiki-api.rb', line 23
def scraper_url(shortname)
  "https://scraperwiki.com/scrapers/#{shortname}/"
end
```
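Both class methods are plain string interpolations, so they need no API key or network access. A self-contained sketch that reproduces their logic (the shortname is hypothetical):

```ruby
# Reproduces the URL templates from .scraper_url and .edit_scraper_url
# without requiring the gem itself.
shortname   = 'example-scraper'
scraper_url = "https://scraperwiki.com/scrapers/#{shortname}/"
edit_url    = "https://scraperwiki.com/scrapers/#{shortname}/edit/"

puts scraper_url  # => https://scraperwiki.com/scrapers/example-scraper/
puts edit_url     # => https://scraperwiki.com/scrapers/example-scraper/edit/
```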
Instance Method Details
#datastore_sqlite(shortname, query, opts = {}) ⇒ Array, ...
Note: the query string parameter is `name`, not `shortname` as in the ScraperWiki docs.

Queries and extracts data via a general purpose SQL interface.

To make an RSS feed you need to use SQL’s AS keyword (e.g. “SELECT name AS description”) to make columns called `title`, `link`, `description`, `guid` (optional, uses `link` if not available) and `pubDate` or `date`.
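Assuming a `swdata` table with `name`, `url`, `summary`, and `updated` columns (hypothetical names; adjust to your scraper's schema), such an RSS-shaped query might look like:

```ruby
# Hypothetical column names aliased to the RSS fields the API expects.
query = "SELECT name AS title, url AS link, summary AS description, " \
        "updated AS pubDate FROM swdata"
puts query
```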
`jsondict` example output:

```json
[
  {
    "fieldA": "valueA",
    "fieldB": "valueB",
    "fieldC": "valueC"
  },
  ...
]
```

`jsonlist` example output:

```json
{
  "keys": ["fieldA", "fieldB", "fieldC"],
  "data": [
    ["valueA", "valueB", "valueC"],
    ...
  ]
}
```

`csv` example output:

```csv
fieldA,fieldB,fieldC
valueA,valueB,valueC
...
```
```ruby
# File 'lib/scraperwiki-api.rb', line 86
def datastore_sqlite(shortname, query, opts = {})
  if Array === opts[:attach]
    opts[:attach] = opts[:attach].join ';'
  end
  request_with_apikey '/datastore/sqlite', {name: shortname, query: query}.merge(opts)
end
```
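Note how an Array passed as `:attach` is flattened into a semicolon-delimited string before the request. A self-contained sketch of that normalization (the shortnames are hypothetical):

```ruby
# Mirrors the :attach normalization inside #datastore_sqlite.
opts = { attach: ['scraper-a', 'scraper-b'] }  # hypothetical scraper shortnames
if Array === opts[:attach]
  opts[:attach] = opts[:attach].join ';'
end

puts opts[:attach]  # => scraper-a;scraper-b
```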
#scraper_getinfo(shortname, opts = {}) ⇒ Array

Notes:

- Returns an array, although the array seems to always have only one item.
- The `tags` field seems to always be an empty array.
- Fields like `last_run` seem to follow British Summer Time.
- The query string parameter is `name`, not `shortname` as in the ScraperWiki docs.

Extracts data about a scraper’s code, owner, history, etc.

- `runid` is a Unix timestamp with microseconds and a UUID.
- The value of `records` is the same as that of `total_rows` under `datasummary`.
- `run_interval` is the number of seconds between runs. It is one of:
  - -1 (never)
  - 2678400 (monthly)
  - 604800 (weekly)
  - 86400 (daily)
  - 3600 (hourly)
- `privacy_status` is one of:
  - “public” (everyone can see and edit the scraper and its data)
  - “visible” (everyone can see the scraper, but only contributors can edit it)
  - “private” (only contributors can see and edit the scraper and its data)
- An individual `runevents` hash will have an `exception_message` key if there was an error during that run.
Example output:

```json
[
  {
    "code": "require 'nokogiri'\n...",
    "datasummary": {
      "tables": {
        "swdata": {
          "keys": [
            "fieldA",
            ...
          ],
          "count": 42,
          "sql": "CREATE TABLE `swdata` (...)"
        },
        "swvariables": {
          "keys": [
            "value_blob",
            "type",
            "name"
          ],
          "count": 2,
          "sql": "CREATE TABLE `swvariables` (`value_blob` blob, `type` text, `name` text)"
        },
        ...
      },
      "total_rows": 44,
      "filesize": 1000000
    },
    "description": "Scrapes websites for data.",
    "language": "ruby",
    "title": "Example scraper",
    "tags": [],
    "short_name": "example-scraper",
    "userroles": {
      "owner": [
        "johndoe"
      ],
      "editor": [
        "janedoe",
        ...
      ]
    },
    "last_run": "1970-01-01T00:00:00",
    "created": "1970-01-01T00:00:00",
    "runevents": [
      {
        "still_running": false,
        "pages_scraped": 5,
        "run_started": "1970-01-01T00:00:00",
        "last_update": "1970-01-01T00:00:00",
        "runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
        "records_produced": 42
      },
      ...
    ],
    "records": 44,
    "wiki_type": "scraper",
    "privacy_status": "visible",
    "run_interval": 604800,
    "attachable_here": [],
    "attachables": [],
    "history": [
      ...,
      {
        "date": "1970-01-01T00:00:00",
        "version": 0,
        "user": "johndoe",
        "session": "Thu, 1 Jan 1970 00:00:08 GMT"
      }
    ]
  }
]
```
```ruby
# File 'lib/scraperwiki-api.rb', line 198
def scraper_getinfo(shortname, opts = {})
  if Array === opts[:quietfields]
    opts[:quietfields] = opts[:quietfields].join '|'
  end
  request_with_apikey '/scraper/getinfo', {name: shortname}.merge(opts)
end
```
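An Array passed as `:quietfields` is joined with a pipe before the request, the same pattern as `:attach` above but with a different delimiter. A self-contained sketch (the field names are drawn from the example output and are illustrative):

```ruby
# Mirrors the :quietfields normalization inside #scraper_getinfo.
opts = { quietfields: ['code', 'runevents'] }  # fields to suppress in the response
if Array === opts[:quietfields]
  opts[:quietfields] = opts[:quietfields].join '|'
end

puts opts[:quietfields]  # => code|runevents
```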
#scraper_getruninfo(shortname, opts = {}) ⇒ Array
Notes:

- Returns an array, although the array seems to always have only one item.
- The query string parameter is `name`, not `shortname` as in the ScraperWiki docs.

See what the scraper did during each run.
Example output:

```json
[
  {
    "run_ended": "1970-01-01T00:00:00",
    "first_url_scraped": "http://www.iana.org/domains/example/",
    "pages_scraped": 5,
    "run_started": "1970-01-01T00:00:00",
    "runid": "1325394000.000000_xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx",
    "domainsscraped": [
      {
        "domain": "http://example.com",
        "bytes": 1000000,
        "pages": 5
      }
      ...
    ],
    "output": "...",
    "records_produced": 42
  }
]
```
```ruby
# File 'lib/scraperwiki-api.rb', line 237
def scraper_getruninfo(shortname, opts = {})
  request_with_apikey '/scraper/getruninfo', {name: shortname}.merge(opts)
end
```
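Since `run_started` and `run_ended` are ISO 8601 timestamps, a run's duration can be derived with the standard library. A sketch using sample values in the shape the API returns:

```ruby
require 'time'

# Sample timestamps in the format shown in the example output above.
run_started = Time.parse('1970-01-01T00:00:00')
run_ended   = Time.parse('1970-01-01T00:00:42')

puts(run_ended - run_started)  # => 42.0 (seconds)
```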
#scraper_getuserinfo(username) ⇒ Array
Notes:

- Returns an array, although the array seems to always have only one item.
- The date joined field is `datejoined` here, but `date_joined` (with underscore) on #scraper_usersearch.

Find out information about a user.
Example output:

```json
[
  {
    "username": "johndoe",
    "profilename": "John Doe",
    "coderoles": {
      "owner": [
        "johndoe.emailer",
        "example-scraper",
        ...
      ],
      "email": [
        "johndoe.emailer"
      ],
      "editor": [
        "yet-another-scraper",
        ...
      ]
    },
    "datejoined": "1970-01-01T00:00:00"
  }
]
```
```ruby
# File 'lib/scraperwiki-api.rb', line 273
def scraper_getuserinfo(username)
  request_with_apikey '/scraper/getuserinfo', username: username
end
```
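The `coderoles` hash groups scraper shortnames by role, so pulling out, say, the scrapers a user owns is a plain hash lookup. A sketch against a response shaped like the example output above (trimmed):

```ruby
# A response in the shape documented above, hard-coded for illustration.
info = [{
  'username'  => 'johndoe',
  'coderoles' => {
    'owner'  => ['johndoe.emailer', 'example-scraper'],
    'editor' => ['yet-another-scraper']
  }
}]

owned = info.first['coderoles']['owner']
puts owned.inspect  # => ["johndoe.emailer", "example-scraper"]
```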
#scraper_search(opts = {}) ⇒ Array
Search the titles and descriptions of all the scrapers.
Example output:

```json
[
  {
    "description": "Scrapes websites for data.",
    "language": "ruby",
    "created": "1970-01-01T00:00:00",
    "title": "Example scraper",
    "short_name": "example-scraper",
    "privacy_status": "public"
  },
  ...
]
```
```ruby
# File 'lib/scraperwiki-api.rb', line 299
def scraper_search(opts = {})
  request_with_apikey '/scraper/search', opts
end
```
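Each search result is a plain hash, so the array can be filtered client-side with the usual Enumerable methods. A sketch over results shaped like the example output above (the second entry is hypothetical):

```ruby
# Hard-coded results in the documented shape, for illustration.
results = [
  { 'title' => 'Example scraper', 'short_name' => 'example-scraper', 'language' => 'ruby' },
  { 'title' => 'Other scraper',   'short_name' => 'other-scraper',   'language' => 'python' }
]

# Keep only the Ruby scrapers and collect their shortnames.
ruby_scrapers = results.select { |s| s['language'] == 'ruby' }.map { |s| s['short_name'] }
puts ruby_scrapers.inspect  # => ["example-scraper"]
```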
#scraper_usersearch(opts = {}) ⇒ Array
Note: the date joined field is `date_joined` here, but `datejoined` (without underscore) on #scraper_getuserinfo.
Search for a user by name.
Example output:

```json
[
  {
    "username": "johndoe",
    "profilename": "John Doe",
    "date_joined": "1970-01-01T00:00:00"
  },
  ...
]
```
```ruby
# File 'lib/scraperwiki-api.rb', line 327
def scraper_usersearch(opts = {})
  if Array === opts[:nolist]
    opts[:nolist] = opts[:nolist].join ' '
  end
  request '/scraper/usersearch', opts
end
```
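As with `:attach` and `:quietfields` above, an Array passed as `:nolist` is flattened before the request, this time with a space delimiter. A self-contained sketch (the usernames are hypothetical):

```ruby
# Mirrors the :nolist normalization inside #scraper_usersearch:
# usernames to exclude become one space-delimited string.
opts = { nolist: ['johndoe', 'janedoe'] }  # hypothetical usernames
if Array === opts[:nolist]
  opts[:nolist] = opts[:nolist].join ' '
end

puts opts[:nolist]  # => johndoe janedoe
```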