Class: GoodNews::Scraper

Inherits:
Object
Defined in:
lib/good_news/scraper.rb

Constant Summary

HOMEPAGEURL =

A constant that stores the URL of the homepage (the Good News Network news category page) from which topics are scraped.

"https://www.goodnewsnetwork.org/category/news/"

Class Method Summary

Class Method Details

.get_articles ⇒ Object

This method fetches and stores each topic’s articles. It loops through the Topic objects in Topic’s @@all class variable array (via GoodNews::Topic.all) and fetches each topic’s page with .get_page. For every article link matching the selector h3.entry-title a, it instantiates a new Article object, saves the article’s web address and title to it, and pushes it into the Topic object’s articles attribute (an array).
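The loop described above can be exercised offline by stubbing .get_page. This is a minimal, hypothetical sketch: Topic and Article are replaced by Struct stand-ins, .get_page returns test doubles (FakeDoc, FakeLink, FakeAttr) that answer only the calls get_articles makes, and the URLs and titles are invented. Only the body of get_articles is taken from the source.

```ruby
module GoodNews
  # Hypothetical stand-ins: only the attributes get_articles touches are modeled.
  Article = Struct.new(:web_addr, :title)
  Topic = Struct.new(:name, :web_addr, :articles) do
    # Stand-in for the real Topic.all (backed by @@all in the actual class)
    def self.all
      @all ||= []
    end
  end

  class Scraper
    # Test doubles mimicking the small slice of Nokogiri used below:
    # #css returns link nodes that answer #attribute("href").value and #text.
    FakeAttr = Struct.new(:value)
    FakeLink = Struct.new(:href, :text) do
      def attribute(name)
        FakeAttr.new(name == "href" ? href : nil)
      end
    end
    FakeDoc = Struct.new(:links) do
      def css(_selector)
        links
      end
    end

    # Hypothetical offline pages keyed by topic URL, replacing the network fetch.
    PAGES = {
      "https://example.com/health" => FakeDoc.new([
        FakeLink.new("https://example.com/a1", "First story"),
        FakeLink.new("https://example.com/a2", "Second story")
      ])
    }

    def self.get_page(url)
      PAGES.fetch(url)
    end

    # Same body as the documented method.
    def self.get_articles
      GoodNews::Topic.all.each do |topic|
        doc = self.get_page(topic.web_addr)
        doc.css("h3.entry-title a").each do |info|
          new_article = GoodNews::Article.new
          new_article.web_addr = info.attribute("href").value
          new_article.title = info.text
          topic.articles.push(new_article)
        end
      end
    end
  end
end

topic = GoodNews::Topic.new("Health", "https://example.com/health", [])
GoodNews::Topic.all << topic
GoodNews::Scraper.get_articles
topic.articles.map(&:title) # => ["First story", "Second story"]
```

Stubbing .get_page this way keeps the traversal logic testable independently of the live site’s markup and of network access.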



# File 'lib/good_news/scraper.rb', line 28

def self.get_articles
  GoodNews::Topic.all.each do |topic|
    doc = self.get_page(topic.web_addr)
    doc.css("h3.entry-title a").each do |info|
      new_article = GoodNews::Article.new
      new_article.web_addr = info.attribute("href").value
      new_article.title = info.text
      topic.articles.push(new_article)
    end
  end
end

.get_page(url) ⇒ Object

Uses open-uri to fetch the page at url and Nokogiri to parse the HTML. Returns the parsed page as a Nokogiri::HTML::Document, which can then be searched with CSS selectors via #css.



# File 'lib/good_news/scraper.rb', line 7

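def self.get_page(url)
  # Assumes 'open-uri' and 'nokogiri' have been required.
  # Kernel#open no longer fetches URLs as of Ruby 3.0; open-uri's URI.open does.
  # The block form closes the response once Nokogiri has parsed it.
  URI.open(url) { |io| Nokogiri::HTML(io) }
end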

.get_topics ⇒ Object

This method scrapes the topics from the homepage and stores them. It calls the class method .get_page on HOMEPAGEURL and saves the result to doc. For each link matching the selector ul.td-category a, it instantiates a Topic object, stores the topic name and web address in it, and saves it to the Topic class variable @@all using the #save method.
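The description implies a small Topic model: name and web_addr accessors, an articles array, a class-level @@all registry exposed through .all, and a #save that appends to it. The following is a hypothetical sketch of such a class under those assumptions; the actual GoodNews::Topic may differ, and the example URL is invented.

```ruby
module GoodNews
  # Hypothetical reconstruction of the Topic model described above.
  class Topic
    attr_accessor :name, :web_addr
    attr_reader :articles

    @@all = [] # class-level registry of saved topics

    def initialize
      @articles = [] # filled later by Scraper.get_articles
    end

    def self.all
      @@all
    end

    # Registers this topic in @@all and returns self.
    def save
      @@all << self
      self
    end
  end
end

health = GoodNews::Topic.new
health.name = "Health"
health.web_addr = "https://example.com/health"
health.save
GoodNews::Topic.all.map(&:name) # => ["Health"]
```

Keeping the registry on the class means .get_articles can later reach every saved topic through Topic.all without the scraper holding its own list.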



# File 'lib/good_news/scraper.rb', line 14

def self.get_topics
  doc = self.get_page(HOMEPAGEURL)
  doc.css("ul.td-category a").each do |topic|
    new_topic = GoodNews::Topic.new
    new_topic.name = topic.text
    new_topic.web_addr = topic.attribute("href").value
    new_topic.save
  end
end