Wptemplates


Gem for collecting template information from MediaWiki markup.

It helps you extract useful machine-readable data from Wikipedia articles, since a lot of useful information is encoded in templates.

Currently only templates and links are parsed; all other markup is ignored.

Installation

Add this line to your application's Gemfile:

gem 'wptemplates', git: 'https://github.com/bxt/wptemplates.git'

And then execute:

$ bundle

The gem is currently not in the rubygems.org repository.

Usage

To parse a piece of markup simply call:

ast = Wptemplates.parse("{{foo | bar | x = 3 }} baz [[bam (2003)|]]y")

You will get an instance of Wptemplates::Soup, an array of Wptemplates::Template, Wptemplates::Link and Wptemplates::Text nodes. You can explore the AST with these methods:

ast.templates.is_a?(Array) && ast.templates.length # => 1
ast.text # => " baz bamy"
ast[0].name # => :foo
ast[0].params[0].text # => " bar "
ast[0].params[:x].text # => "3"
ast.all_templates_of(:foo).map{|t| t.params[:x].text} # => ["3"]
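The parameter values above follow MediaWiki's rules: unnamed parameters keep their surrounding whitespace (" bar "), while named parameters have both key and value trimmed ("3"). The standalone sketch below illustrates those rules for a flat template. The helper split_template is hypothetical and not part of the Wptemplates API; it assumes the template contains no nested templates, links or images.

```ruby
# Hypothetical helper, NOT part of the wptemplates API. It only handles
# flat templates: no nested {{...}}, [[...]] or images inside.
def split_template(markup)
  inner = markup.strip.delete_prefix("{{").delete_suffix("}}")
  name, *parts = inner.split("|", -1)
  positional = []
  named = {}
  parts.each do |part|
    key, sep, value = part.partition("=")
    if sep.empty?
      # Unnamed parameters keep their surrounding whitespace.
      positional << part
    else
      # Named parameters have key and value trimmed.
      named[key.strip.to_sym] = value.strip
    end
  end
  [name.strip.to_sym, positional, named]
end

name, positional, named = split_template("{{foo | bar | x = 3 }}")
p name       # => :foo
p positional # => [" bar "]
p named[:x]  # => "3"
```

As in MediaWiki, any part containing "=" is treated as a named parameter in this sketch.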

You can access the links via:

ast.links.length # => 1
ast.links[0].text # => "bamy"
ast.all_links.map{|l| l.link} # => ["Bam (2003)"]
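The link example combines three MediaWiki rules: the pipe trick ([[bam (2003)|]] is displayed as "bam"), the link trail (the trailing "y" becomes part of the link text) and capitalization of the target's first letter. The simplified, standalone sketch below reproduces that result. The LINK pattern and parse_link are hypothetical illustrations, not the gem's code; they ignore namespaces and the other pipe-trick rules, and assume an English-style [a-z] link trail.

```ruby
# Hypothetical, simplified link pattern: [[target]] or [[target|label]],
# followed by an optional lowercase link trail. NOT the gem's implementation.
LINK = /\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]([a-z]*)/

def parse_link(markup)
  m = LINK.match(markup)
  return nil unless m
  target, label, trail = m[1].strip, m[2], m[3]
  if label.nil?
    # No pipe at all: the target itself is shown.
    label = target
  elsif label.empty?
    # Pipe trick: [[target|]] drops a trailing "(...)" disambiguator.
    label = target.sub(/\s*\([^()]*\)\z/, "")
  end
  # The target's first letter is upcased (default wiki setting).
  link = target[0].upcase + target[1..]
  { link: link, text: label + trail }
end

result = parse_link("[[bam (2003)|]]y")
puts result[:link] # prints Bam (2003)
puts result[:text] # prints bamy
```

With an explicit label, parse_link("[[foo|bar]]s") would give the link "Foo" and the text "bars".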

Developing

Here's some useful info if you want to improve/customize this gem.

Getting Started

Check out the project, run bundle and then run rake to see if the tests pass. Run rake -T to list the available rake tasks.

Markup

MediaWiki markup is not trivial to parse, so compatibility issues are always possible. For templates, there is a useful help page about templates and a markup spec. For links, there is a page about links, one about the pipe trick, and a page with the BNF grammar for links.

Known Issues

  • Images inside templates break parameter parsing: the pipes in the image markup start new parameters
  • Namespaced links are not recognized
  • Templates inside links are not recognized
  • Link contents are not HTML-decoded
  • nowiki, pre and math blocks might cause problems
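The first issue arises when template parameters are split on every pipe. One possible workaround, sketched below under the assumption that tracking nesting depth is enough, is to split only on pipes that sit outside [[...]] and {{...}}. split_params is a hypothetical helper, not code from the gem, and it does not check that brackets are balanced by type.

```ruby
# Hypothetical sketch, NOT the gem's code: split a template body on "|"
# only at nesting depth 0. "[[" and "{{" are treated alike, so mismatched
# bracket types are not detected.
def split_params(inner)
  parts = [+""]
  depth = 0
  i = 0
  while i < inner.length
    two = inner[i, 2]
    if two == "[[" || two == "{{"
      depth += 1
      parts[-1] << two
      i += 2
    elsif (two == "]]" || two == "}}") && depth > 0
      depth -= 1
      parts[-1] << two
      i += 2
    elsif inner[i] == "|" && depth.zero?
      parts << +"" # top-level pipe: start a new parameter
      i += 1
    else
      parts[-1] << inner[i]
      i += 1
    end
  end
  parts
end

inner = "foo|[[File:x.png|thumb|A caption]]|bar"
p inner.split("|").length # => 5 (the image's pipes split it apart)
p split_params(inner)     # => ["foo", "[[File:x.png|thumb|A caption]]", "bar"]
```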

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request