Class: Infoboxer::MediaWiki
- Inherits: Object
- Defined in:
- lib/infoboxer/media_wiki.rb,
lib/infoboxer/media_wiki/page.rb,
lib/infoboxer/media_wiki/traits.rb
Overview
MediaWiki client class.
Usage:

```ruby
client = Infoboxer::MediaWiki
  .new('http://en.wikipedia.org/w/api.php', user_agent: 'My Own Project')
page = client.get('Argentina')
```
Consider using shortcuts like #wiki, #wikipedia, #wp and so on instead of direct instantiation of this class (although you can if you want to!)
Defined Under Namespace
Classes: Page, Traits
Constant Summary
- UA =
Default Infoboxer User-Agent header. You can set yours as an option to Infoboxer#wiki and its shortcuts, or to #initialize.

```ruby
"Infoboxer/#{Infoboxer::VERSION} "\
'(https://github.com/molybdenum-99/infoboxer; [email protected])'
```
Class Attribute Summary
-
.user_agent ⇒ String
User agent getter/setter.
Instance Attribute Summary
-
#api ⇒ MediaWiktory::Wikipedia::Client (readonly)
Instance Method Summary
-
#category(title, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receives a list of parsed MediaWiki pages from the specified category.
-
#get(*titles, interwiki: nil, &processor) ⇒ Page, Tree::Nodes<Page>
Receives a list of parsed MediaWiki pages for the provided list of titles.
-
#get_h(*titles, &processor) ⇒ Hash<String, Page>
Same as #get, but returns a hash of {requested title => page}.
-
#initialize(api_base_url, ua: nil, user_agent: ua) ⇒ MediaWiki
constructor
Creates a new MediaWiki client.
- #inspect ⇒ String
-
#prefixsearch(prefix, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receives a list of parsed MediaWiki pages whose titles start with the given prefix.
-
#raw(*titles, &processor) ⇒ Hash{String => Hash}
Receives "raw" data from Wikipedia (without parsing or wrapping in classes).
-
#search(query, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receives a list of parsed MediaWiki pages for the provided search query.
Constructor Details
#initialize(api_base_url, ua: nil, user_agent: ua) ⇒ MediaWiki
Creates a new MediaWiki client. Infoboxer#wiki provides a shortcut for it, as well as shortcuts for some well-known wikis, like Infoboxer#wikipedia.
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 57

def initialize(api_base_url, ua: nil, user_agent: ua)
  @api_base_url = Addressable::URI.parse(api_base_url)
  @api = MediaWiktory::Wikipedia::Api.new(api_base_url, user_agent: user_agent(user_agent))
  @traits = Traits.get(@api_base_url.host, siteinfo)
end
```
Class Attribute Details
.user_agent ⇒ String
User agent getter/setter.
Default value is UA.
You can also use the per-instance option; see #initialize.
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 40

def user_agent
  @user_agent
end
```
Instance Attribute Details
#api ⇒ MediaWiktory::Wikipedia::Client (readonly)
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 47

def api
  @api
end
```
Instance Method Details
#category(title, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receives a list of parsed MediaWiki pages from the specified category.
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 178

def category(title, limit: 'max', &processor)
  title = normalize_category_title(title)

  list(@api.query.generator(:categorymembers).title(title), limit, &processor)
end
```
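To illustrate what the normalization step accomplishes, here is a hypothetical, simplified stand-in for normalize_category_title (the real implementation may differ, e.g. handling localized namespace names via the wiki's Traits): it ensures the title carries the "Category:" prefix the API expects.

```ruby
# Hypothetical, simplified sketch -- NOT the actual Infoboxer implementation.
# Prepend the "Category:" namespace prefix unless it is already present.
def normalize_category_title(title, prefix: 'Category')
  title.start_with?("#{prefix}:") ? title : "#{prefix}:#{title}"
end

puts normalize_category_title('Argentina')           # "Category:Argentina"
puts normalize_category_title('Category:Argentina')  # "Category:Argentina"
```

This lets callers pass either a bare category name or a fully qualified one.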
#get(*titles, interwiki: nil, &processor) ⇒ Page, Tree::Nodes<Page>
Receives a list of parsed MediaWiki pages for the provided list of titles. All pages are fetched with a single query to the MediaWiki API.
NB: if you are requesting more than 50 titles at once (MediaWiki's limit for a single request), Infoboxer will perform as many queries as necessary to fetch them all (i.e. (titles.count / 50.0).ceil requests).
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 130

def get(*titles, interwiki: nil, &processor)
  return interwikis(interwiki).get(*titles, &processor) if interwiki

  pages = get_h(*titles, &processor).values.compact
  titles.count == 1 ? pages.first : Tree::Nodes[*pages]
end
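The batching arithmetic from the note above can be checked in plain Ruby; request_count is a hypothetical helper introduced here only for illustration, not part of Infoboxer's API.

```ruby
# Hypothetical illustration: how many API requests a get() call needs
# for a given number of titles, at 50 titles per request.
def request_count(title_count, per_request: 50)
  (title_count / per_request.to_f).ceil
end

puts request_count(1)    # 1 request
puts request_count(50)   # still 1 request
puts request_count(51)   # 2 requests
puts request_count(120)  # 3 requests
```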
#get_h(*titles, &processor) ⇒ Hash<String, Page>
Same as #get, but returns a hash of {requested title => page}.
Useful quirks:
- when a requested page does not exist, its key will still be present in the resulting hash (with a nil value);
- when a requested page redirects to another, the key will still be the requested title; for example, get_h('Einstein') will return a hash with the key 'Einstein' and a page titled 'Albert Einstein'.
This allows you to stay in full control of which pages of a large list you've received.
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 157

def get_h(*titles, &processor)
  raw_pages = raw(*titles, &processor)
              .tap { |ps| ps.detect { |_, p| p['invalid'] }.tap { |_, i| i && fail(i['invalidreason']) } }
              .reject { |_, p| p.key?('missing') }
  titles.map { |title| [title, make_page(raw_pages, title)] }.to_h
end
```
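The quirks above can be sketched with plain Ruby hashes (no Infoboxer calls; the string values below are stand-ins for parsed Page objects):

```ruby
# Hypothetical shape of a get_h result: keys are the requested titles,
# nil marks a page that does not exist.
result = {
  'Argentina' => 'page: Argentina',        # direct hit
  'Einstein'  => 'page: Albert Einstein',  # redirect: key keeps the requested title
  'No Such'   => nil                       # missing page: key present, value nil
}

# Keys are exactly the titles you asked for...
puts result.keys.inspect
# ...and dropping the nils keeps only the pages that exist,
# which is what #get does via .values.compact.
puts result.values.compact.size  # 2
```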
#inspect ⇒ String
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 222

def inspect
  "#<#{self.class}(#{@api_base_url.host})>"
end
```
#prefixsearch(prefix, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receives a list of parsed MediaWiki pages whose titles start with the given prefix. See MediaWiki API docs for details.
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 217

def prefixsearch(prefix, limit: 'max', &processor)
  list(@api.query.generator(:prefixsearch).search(prefix), limit, &processor)
end
```
#raw(*titles, &processor) ⇒ Hash{String => Hash}
Receives "raw" data from Wikipedia (without parsing or wrapping in classes).
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 75

def raw(*titles, &processor)
  # could emerge on "automatically" created page lists, should work
  return {} if titles.empty?

  titles.each_slice(50).map do |part|
    request = prepare_request(@api.query.titles(*part), &processor)
    response = request.response

    # If additional props are required, there may be additional pages, even despite each_slice(50)
    response = response.continue while response.continue?

    sources = response['pages'].values.map { |page| [page['title'], page] }.to_h
    redirects =
      if response['redirects']
        response['redirects'].map { |r| [r['from'], sources[r['to']]] }.to_h
      else
        {}
      end

    # This way for 'Einstein' query we'll have {'Albert Einstein' => page, 'Einstein' => same page}
    sources.merge(redirects)
  end.inject(:merge)
end
```
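The redirect-merging step can be illustrated with plain Ruby hashes (no API calls; the page data below is a made-up stand-in for a real API response):

```ruby
# Hypothetical illustration of how raw() folds redirect entries back into
# the page hash, so both the canonical title and the requested title point
# at the same page data.
sources = { 'Albert Einstein' => { 'title' => 'Albert Einstein', 'pageid' => 123 } }
redirects_info = [{ 'from' => 'Einstein', 'to' => 'Albert Einstein' }]

redirects = redirects_info.map { |r| [r['from'], sources[r['to']]] }.to_h
merged = sources.merge(redirects)

puts merged.keys.inspect
# Both keys reference the very same hash object:
puts merged['Einstein'].equal?(merged['Albert Einstein'])  # true
```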
#search(query, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receives a list of parsed MediaWiki pages for the provided search query. See MediaWiki API docs for details.
```ruby
# File 'lib/infoboxer/media_wiki.rb', line 200

def search(query, limit: 'max', &processor)
  list(@api.query.generator(:search).search(query), limit, &processor)
end
```