Omnivore: a library for decrufting HTML documents
Omnivore is a library for extracting the "real" content from HTML documents. Currently, the approach is limited to analysing text density to distinguish relevant sections from navigation, advertising, and other non-relevant elements. As such, the results are far from perfect, but they should improve as more sophisticated features are added.
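To give a rough idea of what "text density" means here, the following is a minimal sketch and not Omnivore's actual implementation. It assumes density is the ratio of visible text characters to total characters (markup included) in a fragment. The `text_density` helper, the sample fragments, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper, NOT part of Omnivore's API.
# Returns the share of characters in an HTML fragment that are visible text
# rather than markup. Tags are stripped with a naive regex, which is good
# enough for a sketch but not for real-world HTML.
def text_density(html)
  return 0.0 if html.empty?

  text = html.gsub(/<[^>]*>/, '').gsub(/\s+/, ' ').strip
  text.length.to_f / html.length
end

# Assumed cut-off for this illustration only.
THRESHOLD = 0.5

# Made-up fragments: a link-heavy navigation list and a prose paragraph.
nav     = '<ul><li><a href="/">Home</a></li><li><a href="/news">News</a></li></ul>'
article = '<p>The headphones feel solid and the sound is detailed, with a wide soundstage.</p>'

[nav, article].each do |block|
  density = text_density(block)
  label   = density >= THRESHOLD ? 'content' : 'boilerplate'
  puts format('%.2f  %s', density, label)
end
```

Link-heavy fragments such as menus are mostly markup, so they score low, while paragraphs of prose score high. A real extractor would need a proper HTML parser and a more careful choice of block boundaries and threshold.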
INSTALL
sudo gem install omnivore
EXAMPLE
require 'omnivore'
document = Omnivore::Document.from_url('http://www.slashgear.com/sennheiser-hd-700-hands-on-10208572')
puts document.to_text