Mida
Description
A Microdata parser and extractor library for Ruby. This is based on the latest published version of the Microdata Specification, dated 5th April 2011.
Installation
Mida keeps RubyGems up-to-date with its latest version, so installing is as easy as:
gem install mida
Requirements:
- Nokogiri
Usage
The following examples assume that you have required mida and open-uri.
Extracting Microdata from a page
All the Microdata is extracted from a page when a new Mida::Document instance is created.
To extract all the Microdata from a webpage:
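url = 'http://example.com'
# Assign the block's result so that doc stays in scope for the examples below.
# On Ruby 3.0 and later, open-uri needs URI.open(url) instead of open(url).
doc = open(url) {|f| Mida::Document.new(f, url)}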
The top-level Items will be held in an array accessible via doc.items.
To simply list all the top-level Items that have been found:
puts doc.items
Searching
If you want to search for an Item that has a specific itemtype/vocabulary, this can be done with the search method.
To return all the Items that use one of Google’s Review vocabularies:
doc.search(%r{http://data-vocabulary\.org.*?review.*?}i)
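To see what this pattern will and will not match, it can be tried against plain strings without involving Mida at all. The following is a minimal sketch; the itemtype URLs in the list are hypothetical sample values chosen for illustration, not data taken from a real page.

```ruby
# The same pattern as in the search example above.
pattern = %r{http://data-vocabulary\.org.*?review.*?}i

# Hypothetical itemtype strings, used only to illustrate the pattern.
itemtypes = [
  'http://data-vocabulary.org/Review',
  'http://data-vocabulary.org/Review-aggregate',
  'http://data-vocabulary.org/Person'
]

# Keep only the itemtypes that the pattern matches (case-insensitively).
matching = itemtypes.select { |itemtype| itemtype =~ pattern }
puts matching
```

Because of the i flag, both capitalised Review itemtypes are kept, while the Person itemtype is filtered out.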
Inspecting an Item
Each Item is a Mida::Item instance and has four main methods of interest: type, vocabulary, properties and id.
To find out the itemtype of the Item:
puts doc.items.first.type
To find out the itemid of the Item:
puts doc.items.first.id
Properties are returned as a hash containing name/value pairs. The values will be an array of either String or Mida::Item instances.
To see the properties of the Item:
puts doc.items.first.properties
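Since a property value can itself be an Item, walking the properties usually means recursing. The sketch below shows one way to do that. It assumes only the shape described above (a hash of names to arrays of strings or Items) and uses a hypothetical FakeItem struct as a stand-in for Mida::Item so that it runs without Mida installed; the describe helper and the sample data are likewise illustrative.

```ruby
# Stand-in for Mida::Item so the sketch runs without Mida installed.
# It only mirrors the type and properties readers described above.
FakeItem = Struct.new(:type, :properties)

# Recursively build a list of lines describing an item, indenting nested Items.
def describe(item, depth = 0)
  indent = '  ' * depth
  lines = ["#{indent}#{item.type}"]
  item.properties.each do |name, values|
    values.each do |value|
      if value.respond_to?(:properties)
        # The value is a nested Item: describe it one level deeper.
        lines << "#{indent}  #{name}:"
        lines.concat(describe(value, depth + 2))
      else
        # The value is a plain String.
        lines << "#{indent}  #{name}: #{value}"
      end
    end
  end
  lines
end

# Hypothetical sample data.
rating = FakeItem.new('http://data-vocabulary.org/Rating', { 'value' => ['4'] })
review = FakeItem.new('http://data-vocabulary.org/Review',
                      { 'itemreviewed' => ['Romeo Pizza'], 'rating' => [rating] })

puts describe(review)
```

With a real document, the same helper could presumably be called as describe(doc.items.first), since it relies only on the type and properties methods.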
Working with Vocabularies
Mida allows you to define vocabularies, so that input data can be constrained to match expected patterns. By default a generic vocabulary (Mida::GenericVocabulary) is registered which will match against any itemtype with any number of properties.
If you want to specify a vocabulary, create a class derived from Mida::Vocabulary and use itemtype, has_one, has_many and extract to describe the vocabulary.
As an example the following describes a subset of Google’s Review vocabulary:
class Rating < Mida::Vocabulary
  itemtype %r{http://data-vocabulary.org/rating}i
  has_one 'best'
  has_one 'worst'
  has_one 'value'
end

class Review < Mida::Vocabulary
  itemtype %r{http://data-vocabulary.org/review}i
  has_one 'itemreviewed'
  has_one 'rating' do
    extract Rating, Mida::DataType::Text
  end
end
When you create a subclass of Mida::Vocabulary it automatically registers the Vocabulary.
Now if Mida is parsing some input and manages to match against the Review itemtype, it will only allow the specified properties and will reject any that don’t have the correct number. It will also set Item#vocabulary accordingly, e.g.
doc.items.first.vocabulary # => Review
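The effect of such constraints can be pictured with plain hashes. The following sketch is not Mida's implementation. It only illustrates the idea described above, under the assumption that a has_one-style property must have exactly one value and a has_many-style property at least one; the SPEC hash, the constrain helper, the 'tag' property and the sample data are all hypothetical.

```ruby
# Hypothetical spec: property name => :one or :many. Not Mida's internal format.
SPEC = { 'itemreviewed' => :one, 'rating' => :one, 'tag' => :many }

# Keep only properties that are named in the spec and whose number of
# values fits the declared cardinality.
def constrain(properties, spec)
  properties.select do |name, values|
    case spec[name]
    when :one  then values.length == 1
    when :many then values.length >= 1
    else false # property is not part of the vocabulary
    end
  end
end

# Hypothetical raw properties as they might appear before constraining.
raw = {
  'itemreviewed' => ['Romeo Pizza'],
  'rating'       => ['4', '5'],          # two values for a :one property
  'tag'          => ['pizza', 'italian'],
  'unknown'      => ['x']                # not named in the spec
}

kept = constrain(raw, SPEC)
puts kept.keys
```

Here 'rating' is dropped because it has too many values and 'unknown' is dropped because the spec does not name it, leaving only 'itemreviewed' and 'tag'.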
Bugs/Feature Requests
If you find a bug or want to make a feature request, please report it at the Mida project’s issue tracker on GitHub.
License
Copyright © 2011 Lawrence Woodman. This software is licensed under the MIT License. Please see the file, LICENSE.rdoc, for details.