Class: CourseScraper::Catalonia
- Inherits:
- Object
- Inheritance chain: Object, CourseScraper::Catalonia
- Includes:
- Capybara::DSL
- Defined in:
- lib/course_scraper/catalonia.rb
Overview
Public: A scraper for all the vocational training courses in Catalonia.
Examples
courses = Catalonia.scrape
# => [#<CourseScraper::Category ...>, #<CourseScraper::Category ...>]
Class Method Summary
- .scrape ⇒ Object
Public: Instantiates a new scraper and runs it to collect all the vocational training courses in Catalonia.
Instance Method Summary
- #each_category(&block) ⇒ Object
Internal: Calls a block for every category.
- #each_course(category_url, &block) ⇒ Object
Internal: Calls a block for every course in a category URL.
- #scrape ⇒ Object
Public: Scrapes the vocational training courses in Catalonia.
- #setup_capybara ⇒ Object
Internal: Sets the configuration for Capybara to work with the Gencat website.
- #visit_category_list ⇒ Object
Internal: Visits the main page where the course categories are listed.
Class Method Details
.scrape ⇒ Object
Public: Instantiates a new scraper and runs it to collect all the vocational training courses in Catalonia.
Returns the Array collection of CourseScraper::Category instances with nested Courses, as produced by #scrape.
# File 'lib/course_scraper/catalonia.rb', line 20
def self.scrape
  new.scrape
end
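The class-level entry point is a thin wrapper around the instance method. A minimal sketch of that delegation pattern follows, using a hypothetical `DemoScraper` class (not part of the library) whose `#scrape` returns canned data instead of visiting the site:

```ruby
# Hypothetical stand-in class; the real scraper drives Capybara instead.
class DemoScraper
  # Class-level entry point: builds an instance and delegates to #scrape.
  def self.scrape
    new.scrape
  end

  # Instance-level work; returns canned data for illustration only.
  def scrape
    [:category_a, :category_b]
  end
end

result = DemoScraper.scrape
```

Callers never need to hold on to the scraper instance, which keeps the public surface down to a single call.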
Instance Method Details
#each_category(&block) ⇒ Object
Internal: Calls a block for every category.
Yields the String name of the category and its String URL.
Returns nothing.
# File 'lib/course_scraper/catalonia.rb', line 67
def each_category(&block)
  visit_category_list

  links = []

  within ".FW_bColEsquerraCos" do
    links += all('a')
  end

  within ".FW_bColDretaCos" do
    links += all('a')
  end

  links.each do |link|
    block.call link.text, link[:href]
  end
end
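The method gathers links from two page columns and then yields each link's text and `href`. A rough sketch of that collect-then-yield shape, with no browser involved, might look like the following. `FakeLink`, `each_demo_category`, and the sample data are hypothetical stand-ins for Capybara nodes and the real page:

```ruby
# Hypothetical stand-in for a Capybara node: responds to #text and #[].
class FakeLink
  attr_reader :text

  def initialize(text, href)
    @text = text
    @attributes = { href: href }
  end

  # Attribute lookup, mimicking link[:href] on a real node.
  def [](name)
    @attributes[name]
  end
end

# Gathers the links of both columns, then yields name and URL for each.
def each_demo_category(left_column, right_column)
  links = []
  links += left_column
  links += right_column
  links.each { |link| yield link.text, link[:href] }
end

left  = [FakeLink.new('Agriculture', '/agriculture')]
right = [FakeLink.new('Chemistry', '/chemistry')]

seen = []
each_demo_category(left, right) { |name, url| seen << [name, url] }
```

The left column is yielded before the right one, so the caller sees categories in page order.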
#each_course(category_url, &block) ⇒ Object
Internal: Calls a block for every course in a category URL.
category_url - The String URL of the category page to visit.
Yields the String name of the course and its Symbol type (:high or :medium).
Returns nothing.
# File 'lib/course_scraper/catalonia.rb', line 92
def each_course(category_url, &block)
  visit category_url

  courses = []

  all('.CEDU_vertical').each do |ul|
    type = ul.find('li:nth-of-type(2)').text =~ /superior/ ? :high : :medium
    courses << [ul.find('li.titol').text, type]
  end

  courses.each do |course|
    block.call(*course)
  end
end
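The course type is derived from the text of the second list item: a match on `superior` gives `:high`, anything else gives `:medium`. The rule can be tried in isolation as below. `course_type` is a hypothetical helper name, and the sample strings are assumptions about what the page might contain:

```ruby
# Same rule the method applies to the second list item's text:
# a match on "superior" means :high, anything else falls back to :medium.
def course_type(level_text)
  level_text =~ /superior/ ? :high : :medium
end

high        = course_type('Grau superior') # assumed sample text
medium      = course_type('Grau mitjà')    # assumed sample text
capitalized = course_type('Grau Superior') # capital S, so no match
```

The pattern is case-sensitive, so a capitalized "Superior" would be classified as `:medium`; adding the `i` flag to the regular expression would be one way to relax that if the page text ever changes.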
#scrape ⇒ Object
Public: Scrapes the vocational training courses in Catalonia.
Returns the Array collection of CourseScraper::Category instances with nested Courses.
# File 'lib/course_scraper/catalonia.rb', line 38
def scrape
  categories = []

  each_category do |name, href|
    categories << { name: name, url: href }
  end

  categories.map do |category|
    cat = Category.new(category[:name], [])

    each_course category[:url] do |name, type|
      cat.courses << Course.new(name, type)
    end

    cat
  end
end
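The method works in two passes: it first records every category's name and URL as plain values, and only then visits each category to fill in its courses. The sketch below reproduces that flow against canned data. `DemoCategory`, `DemoCourse`, and `build_categories` are hypothetical stand-ins, with field names inferred from the constructor calls above:

```ruby
# Hypothetical stand-ins for CourseScraper::Category and CourseScraper::Course.
DemoCategory = Struct.new(:name, :courses)
DemoCourse   = Struct.new(:name, :type)

# category_links - Hash of category name => URL (stands in for each_category).
# courses_by_url - Hash of URL => [[name, type], ...] (stands in for each_course).
def build_categories(category_links, courses_by_url)
  # First pass: record every category before navigating anywhere else.
  categories = category_links.map { |name, url| { name: name, url: url } }

  # Second pass: one category object per entry, filled with its courses.
  categories.map do |category|
    cat = DemoCategory.new(category[:name], [])
    courses_by_url.fetch(category[:url], []).each do |name, type|
      cat.courses << DemoCourse.new(name, type)
    end
    cat
  end
end

links = { 'Chemistry' => '/chemistry', 'Agriculture' => '/agriculture' }
courses = {
  '/chemistry'   => [['Laboratory analysis', :high]],
  '/agriculture' => [['Gardening', :medium], ['Forestry', :high]]
}
tree = build_categories(links, courses)
```

Storing names and URLs as plain strings before the second pass is presumably deliberate: visiting a category page navigates away from the list, so references to the original link nodes may no longer be usable.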
#setup_capybara ⇒ Object
Internal: Sets the configuration for Capybara to work with the Gencat website.
Returns nothing.
# File 'lib/course_scraper/catalonia.rb', line 28
def setup_capybara
  Capybara.run_server = false
  Capybara.current_driver = :webkit
  Capybara.app_host = 'http://www20.gencat.cat'
end
#visit_category_list ⇒ Object
Internal: Visits the main page where the course categories are listed.
Returns nothing.
# File 'lib/course_scraper/catalonia.rb', line 58
def visit_category_list
  visit "/portal/site/queestudiar/menuitem.d7cfc336363a7af8e85c7273b0c0e1a0/?vgnextoid=0a8137a9f4f2b210VgnVCM2000009b0c1e0aRCRD&vgnextchannel=0a8137a9f4f2b210VgnVCM2000009b0c1e0aRCRD&vgnextfmt=default"
end
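The path passed to `visit` is relative, so it is resolved against the `app_host` configured in `#setup_capybara`. The sketch below approximates that resolution with Ruby's standard `URI` library (an approximation, not Capybara's own code path) and shows how the query string breaks down:

```ruby
require 'uri'

app_host = 'http://www20.gencat.cat'
path = '/portal/site/queestudiar/menuitem.d7cfc336363a7af8e85c7273b0c0e1a0/' \
       '?vgnextoid=0a8137a9f4f2b210VgnVCM2000009b0c1e0aRCRD' \
       '&vgnextchannel=0a8137a9f4f2b210VgnVCM2000009b0c1e0aRCRD' \
       '&vgnextfmt=default'

# Resolve the relative path against the host, roughly as the browser would.
full_url = URI.join(app_host, path)

# Split the query string into a Hash for inspection.
params = URI.decode_www_form(full_url.query).to_h
```

Here `params` holds three keys, and `vgnextoid` and `vgnextchannel` carry the same identifier.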