Korfzone::Scraper


Using this scraper you can extract all information about korfball games from the korfbal.be website.

The game information extracted using this tool is not linked. For example, to list all games for one club you will have to map the team names to clubs yourself (see the sketch below). The Korfzone API offers a linked version of the data scraped from the KBKB website, and it also hosts data from previous seasons.
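
As an illustration, a caller could keep its own mapping from team names to clubs and filter the scraped games with it. Everything in this sketch is hypothetical: the team and club names, the mapping, and the sample game hashes (their layout mirrors the Usage example further down).

# Hypothetical scraped games; the hash layout matches the Usage example below.
games = [
  { starts_at: '2013-09-07 14:00', teams: [ 'Some Club 1', 'Another Club 1' ], location: 'Somewhere' },
  { starts_at: '2013-09-07 15:30', teams: [ 'Another Club 2', 'Some Club 2' ], location: 'Elsewhere' }
]

# Hypothetical mapping from scraped team names to a club.
club_for_team = {
  'Some Club 1' => 'Some Club',
  'Some Club 2' => 'Some Club'
}

# Keep only the games in which at least one team belongs to that club.
games_for_club = games.select do |game|
  game[ :teams ].any? { |team| club_for_team[ team ] == 'Some Club' }
end
puts "#{ games_for_club.size } games for Some Club"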

This project is extracted from the Korfzone code base. The korfbal.be website uses little to no semantic HTML, so the scraper is very brittle: if the developers add a column to the games table or change the class of the rows, this code will break.

Korfzone and the Korfzone Scraper are not affiliated with the KBKB in any way.

Installation

Add this line to your application's Gemfile:

gem 'korfzone-scraper'

And then execute:

$ bundle

Or install it yourself as:

$ gem install korfzone-scraper

Usage

The scraper has two main functions: finding the URIs of all pages that contain game information, and extracting that information from those pages. The korfbal.be website does not offer a single, easy-to-use point of entry from which to collect all URIs; the links to the relevant pages are scattered across a number of pages. Below are examples of how to find the relevant URIs and how to extract the games from a single URI.

Scraping the URIs of all pages for a certain category

entry_points = Korfzone::Scraper::Page.for_category :senioren
entry_points.each do |page|
  puts page.block_uris.map { |uri| uri.to_s }.join( "\n" )
end

Scraping the games from an individual page

page = Korfzone::Scraper::Page.new 'http://www.korfbal.be/beta/Wedstrijden/senioren/veld/V01'
page.games do |game|
  puts game[ :starts_at ]
  puts game[ :teams ].join( ' - ' )
  puts game[ :location ]
  puts "=" * 80
end
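
Putting it all together

The two steps can be combined to collect every game in a category. This is a minimal sketch that assumes the API shown above and simply accumulates the game hashes in an array.

games = []
entry_points = Korfzone::Scraper::Page.for_category :senioren
entry_points.each do |entry_point|
  entry_point.block_uris.each do |uri|
    # Each block URI points to a page of games; scrape them all.
    Korfzone::Scraper::Page.new( uri.to_s ).games do |game|
      games << game
    end
  end
end
puts "Scraped #{ games.size } games"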

ToDo list

  1. Handle errors. The korfbal.be site occasionally returns status code 500 and the scraper should handle this gracefully. Until it does, callers can guard against it themselves (see the sketch after this list).
  2. Check support for ETags on the korfbal.be website. If they are supported, it would make sense to support them in the scraper as well.
  3. Write a binary, probably based on Thor, to facilitate scraping.
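
Until built-in error handling lands, a caller can retry a page a few times. The sketch below assumes that a failed request surfaces as an exception while building or reading the page; that assumption may not cover every failure mode.

attempts = 0
begin
  attempts += 1
  page = Korfzone::Scraper::Page.new 'http://www.korfbal.be/beta/Wedstrijden/senioren/veld/V01'
  page.games { |game| puts game[ :teams ].join( ' - ' ) }
rescue StandardError => error
  # korfbal.be occasionally answers with a 500; retry a couple of times before giving up.
  retry if attempts < 3
  warn "Giving up after #{ attempts } attempts: #{ error.message }"
end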

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Consider also contributing to the overall Korfzone project.