Class: Hermaeus::Client
- Inherits:
- Object
- Defined in:
- lib/hermaeus/client.rb
Overview
Public: Wraps a reddit client for access to reddit’s API, and provides methods for downloading posts from reddit.
Instance Method Summary
- #get_fullnames(data, **opts) ⇒ Object
  Public: Transforms a list of raw reddit links (“/r/SUB/comments/ID/NAME”) into their reddit fullnames (“t3_ID”).
- #get_global_listing(**opts) ⇒ Object
  Public: Scrapes the Compilation full index.
- #get_posts(fullnames, &block) ⇒ Object
  Public: Collects posts from reddit.
- #get_weekly_listing(ids, **opts) ⇒ Object
  Public: Scrapes a Weekly Community Thread patch index.
- #initialize ⇒ Client (constructor)
  Public: Connects the Hermaeus::Client to reddit.
Constructor Details
#initialize ⇒ Client
Public: Connects the Hermaeus::Client to reddit.
# File 'lib/hermaeus/client.rb', line 18
def initialize
  Config.validate!
  cfg = Config.info[:client]
  @client = Redd.it(cfg.delete(:type).to_sym, *cfg.values, user_agent: USER_AGENT)
  @client.
  @html_filter = HTMLEntities.new
end
Instance Method Details
#get_fullnames(data, **opts) ⇒ Object
Public: Transforms a list of raw reddit links (“/r/SUB/comments/ID/NAME”) into their reddit fullnames (“t3_ID”).
data - A String Array such as that returned by get_global_listing.
Optional parameters:
regex: A Regular Expression used to match the reddit ID out of a link.
Returns a String Array containing the reddit fullnames harvested from the input list. Input elements that do not match are stripped.
# File 'lib/hermaeus/client.rb', line 65
def get_fullnames data, **opts
  # TODO: Move this regex to the configuration file.
  regex = opts[:regex] || %r(/r/.+/(comments/)?(?<id>[0-9a-z]+)/.+)
  data.map do |item|
    m = item.match regex
    "t3_#{m[:id]}" if m
  end
  .reject { |item| item.nil? }
end
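The link-to-fullname transformation can be tried in isolation. The following is a minimal sketch, not part of Hermaeus: `fullnames_from` is a hypothetical standalone helper that reuses the default pattern from the listing, and the sample link is made up to follow the “/r/SUB/comments/ID/NAME” shape.

```ruby
# Hypothetical helper (not part of Hermaeus): maps raw reddit links to
# "t3_ID" fullnames using the default pattern from get_fullnames.
def fullnames_from(links, regex: %r(/r/.+/(comments/)?(?<id>[0-9a-z]+)/.+))
  links.map { |link|
    m = link.match(regex)
    # Non-matching links produce nil here and are dropped by compact.
    "t3_#{m[:id]}" if m
  }.compact
end

# Made-up sample input: one well-formed link and one string that does not match.
links = [
  "/r/SUB/comments/56j7pq/some_post_title",
  "not a reddit link",
]

puts fullnames_from(links).inspect
```

A caller with differently shaped links would pass its own `regex:`; the only requirement this sketch places on it is a named capture called `id`.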
#get_global_listing(**opts) ⇒ Object
Public: Scrapes the Compilation full index.
Wraps Client#scrape_index; see it for documentation.
# File 'lib/hermaeus/client.rb', line 29
def get_global_listing **opts
  scrape_index Config.info[:index][:path], opts
end
#get_posts(fullnames, &block) ⇒ Object
Public: Collects posts from reddit.
fullnames - A String Array of reddit fullnames (“tNUM_ID”, following reddit documentation) to query.
Yields a sequence of Hashes, each describing a reddit post.
Returns an Array of the response bodies from the reddit call(s).
Examples
get_posts get_fullnames get_global_listing do |post|
puts post[:selftext] # Prints the Markdown source of each post
end
# => Returns an Array of Hashes, each of which includes an Array of posts.
# File 'lib/hermaeus/client.rb', line 90
def get_posts fullnames, &block
  ret = []
  # reddit has finite limits on acceptable query sizes. Split the list into
  # manageable portions
  fullnames.each_slice(100).each do |chunk|
    # Assemble the list of reddit objects being queried
    query = "/by_id/#{chunk.join(",")}.json"
    response = scrape_posts query, &block
    ret << response.body
  end
  ret
end
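The batching step can be looked at separately from the network call. This is a rough sketch under stated assumptions: `by_id_queries` is a hypothetical helper that only builds the query paths, the batch size of 100 is taken from the listing, and the generated fullnames are placeholders rather than real posts.

```ruby
# Hypothetical helper (not part of Hermaeus): builds the "/by_id/...json"
# query paths that get_posts would request, one per batch of fullnames.
def by_id_queries(fullnames, batch_size: 100)
  fullnames.each_slice(batch_size).map do |chunk|
    "/by_id/#{chunk.join(",")}.json"
  end
end

# 250 placeholder fullnames split into batches of 100, 100, and 50.
fullnames = (1..250).map { |n| "t3_#{n.to_s(36)}" }
queries = by_id_queries(fullnames)

puts queries.length
puts by_id_queries(["t3_a", "t3_b"]).first
```

Keeping query construction separate from the request makes the batch boundaries easy to check without contacting reddit.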
#get_weekly_listing(ids, **opts) ⇒ Object
Public: Scrapes a Weekly Community Thread patch index.
ids - A String Array of reddit post IDs for Weekly Community Threads.
Examples:
get_weekly_listing ["56j7pq"] # Targets one Community Thread
get_weekly_listing ["56j7pq", "55erkr"] # Targets two Community Threads
get_weekly_listing ["55erkr"], css: "td:last-child a" # Custom CSS selector
Wraps Client#scrape_index; see it for documentation.
# File 'lib/hermaeus/client.rb', line 44
def get_weekly_listing ids, **opts
  # Prefix bare IDs with "t3_"; IDs that already carry the prefix are kept
  # unchanged rather than being mapped to nil.
  ids.map! do |id|
    id.match(/^t3_/) ? id : "t3_#{id}"
  end
  # TODO: Ensure that this is safe (only query <= 100 IDs at a time), and
  # call the scraper multiple times and reassemble output if necessary.
  query = "/by_id/#{ids.join(",")}"
  scrape_index query, opts
end
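A block whose only expression is guarded by an `unless` modifier evaluates to nil whenever the guard is true, so prefix normalisation is safest written with an explicit two-way choice. The sketch below illustrates that, and one possible way to honour the 100-ID limit mentioned in the TODO. Both helper names are hypothetical; only the “t3_” prefix and the `/by_id/` path come from the listing.

```ruby
# Hypothetical helper (not part of Hermaeus): ensures every ID carries the
# "t3_" prefix, leaving already-prefixed IDs untouched.
def normalize_fullnames(ids)
  ids.map { |id| id.start_with?("t3_") ? id : "t3_#{id}" }
end

# Hypothetical helper: builds one "/by_id/..." query path per batch, so no
# single query names more than batch_size IDs.
def weekly_queries(ids, batch_size: 100)
  normalize_fullnames(ids).each_slice(batch_size).map do |chunk|
    "/by_id/#{chunk.join(",")}"
  end
end

puts weekly_queries(["56j7pq", "t3_55erkr"]).inspect
```

Unlike `map!`, this version returns new arrays and leaves the caller's `ids` unmodified, which is a design choice of the sketch rather than something the original method does.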