Wptemplates
Gem for collecting template information from MediaWiki markup.
It helps you extract useful machine-readable data from Wikipedia articles, since a lot of useful information is encoded as templates.
Currently only templates and links are parsed, all other markup is ignored.
Installation
Add this line to your application's Gemfile:
gem 'wptemplates', git: 'git://github.com/bxt/wptemplates.git'
And then execute:
$ bundle
The gem is currently not in the rubygems.org repository.
Usage
To parse a piece of markup simply call:
ast = Wptemplates.parse("{{foo | bar | x = 3 }} baz [[bam (2003)|]]y")
You will get an instance of Wptemplates::Soup, which is an array of Wptemplates::Template, Wptemplates::Link and Wptemplates::Text objects. You can explore the AST with these methods:
ast.templates.is_a?(Array) && ast.templates.length # => 1
ast.text # => " baz bamy"
ast[0].name # => :foo
ast[0].params[0].text # => " bar "
ast[0].params[:x].text # => "3"
ast.all_templates_of(:foo).map{|t| t.params[:x].text} # => ["3"]
You can access the links via:
ast.links.length # => 1
ast.links[0].text # => "bamy"
ast.all_links.map{|l| l.link} # => ["Bam (2003)"]
Developing
Here's some useful info if you want to improve/customize this gem.
Getting Started
Check out the project, run bundle and then rake to see if the tests pass. Run rake -T to see the rake tasks.
Markup
MediaWiki markup is not trivial to parse, and there may always be compatibility issues. There's a useful help page about templates and a markup spec. For links, there is a page about links and one about the pipe trick. There is also a page with the BNF for links.
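To make the pipe trick concrete, here is a minimal standalone sketch of how a link like [[bam (2003)|]]y from the usage example could resolve to the link "Bam (2003)" and the text "bamy". The helper name resolve_piped_link is hypothetical and not part of the wptemplates API, and the rules are simplified compared to what MediaWiki really does (for example, a target with both a parenthetical and a comma is not handled).

```ruby
# Hypothetical helper for illustration only; not how wptemplates is implemented.
# Simplified pipe trick: drop a leading namespace, then a trailing
# parenthetical, or (if there is none) everything from the first comma on.
def resolve_piped_link(target, trail = "")
  label = target.sub(/\A:?[^:]*:/, "")        # "Help:Foo" -> "Foo"
  if label.match?(/\s*\([^()]*\)\z/)
    label = label.sub(/\s*\([^()]*\)\z/, "")  # "bam (2003)" -> "bam"
  else
    label = label.sub(/,.*\z/, "")            # "Boston, Massachusetts" -> "Boston"
  end
  link = target.sub(/\A./) { |c| c.upcase }   # page titles start with a capital letter
  { link: link, text: label + trail }         # the link trail is glued onto the text
end

result = resolve_piped_link("bam (2003)", "y")
result[:link] # => "Bam (2003)"
result[:text] # => "bamy"
```

The trail argument stands for the letters directly following the closing brackets, which MediaWiki renders as part of the link text.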
Known Issues
- If a template contains images, the pipes inside the image markup start a new parameter
- Namespaced links are not recognized
- Templates in links are not recognized
- Link contents are not HTML-decoded
- nowiki, pre and math blocks might cause problems
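The first issue above comes from splitting a template body on every pipe. As a rough sketch of one possible way around it (not how the gem currently works), a splitter can track nesting depth and only treat a pipe as a parameter separator when it is outside any [[...]] or {{...}} pair. The function name split_top_level_pipes is an assumption made up for this example.

```ruby
# Hypothetical sketch; not part of the wptemplates API.
# Splits a template body on pipes that are not nested inside
# [[...]] (links, images) or {{...}} (inner templates).
def split_top_level_pipes(body)
  parts = ["".dup]
  depth = 0
  i = 0
  while i < body.length
    two = body[i, 2]
    if two == "[[" || two == "{{"
      depth += 1                 # entering a nested link or template
      parts.last << two
      i += 2
    elsif (two == "]]" || two == "}}") && depth > 0
      depth -= 1                 # leaving a nested link or template
      parts.last << two
      i += 2
    elsif body[i] == "|" && depth.zero?
      parts << "".dup            # top-level pipe: start a new parameter
      i += 1
    else
      parts.last << body[i]
      i += 1
    end
  end
  parts
end

split_top_level_pipes("foo|img=[[File:A.png|thumb]]|x=3")
# => ["foo", "img=[[File:A.png|thumb]]", "x=3"]
```

This deliberately ignores nowiki, pre and math blocks, so it does not address the last issue in the list.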
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request