Module: Infoboxer::Tree

Included in: Parser, Parser::HTML, Parser::Image, Parser::Inline, Parser::Paragraphs, Parser::Table, Parser::Template, Infoboxer::Templates::Base

Defined in: lib/infoboxer/tree.rb,
lib/infoboxer/tree/ref.rb,
lib/infoboxer/tree/html.rb,
lib/infoboxer/tree/list.rb,
lib/infoboxer/tree/math.rb,
lib/infoboxer/tree/node.rb,
lib/infoboxer/tree/text.rb,
lib/infoboxer/tree/image.rb,
lib/infoboxer/tree/nodes.rb,
lib/infoboxer/tree/table.rb,
lib/infoboxer/tree/inline.rb,
lib/infoboxer/tree/gallery.rb,
lib/infoboxer/tree/compound.rb,
lib/infoboxer/tree/document.rb,
lib/infoboxer/tree/linkable.rb,
lib/infoboxer/tree/template.rb,
lib/infoboxer/tree/wikilink.rb,
lib/infoboxer/tree/paragraphs.rb

Overview

Infoboxer provides you with a tree structure of a Wikipedia page, which you can introspect and navigate with ease. This tree structure tries to be simple, logical, and close to the Wikipedia source.

You can always inspect the entire page tree yourself:

page = Infoboxer.wp.get('Argentina')
puts page.to_tree

Inspecting and understanding single node

Each tree node is a descendant of Node, so you should look at this class to understand what you can do.

Alongside the basic methods defined in the Node class, some useful utility methods are defined in subclasses.

Here's the full list of subclasses representing real nodes, with their respective roles:

inline markup: Text, Bold, Italic, BoldItalic, Wikilink, ExternalLink, Image;
embedded HTML: HTMLTag, HTMLOpeningTag, HTMLClosingTag;
paragraph-level nodes: Heading, Paragraph, Pre, HR;
lists: OrderedList, UnorderedList, DefinitionList, ListItem, DTerm, DDefinition;
tables: Table, TableCaption, TableRow, TableHeading, TableCell;
special elements: Template, Ref.

Tree navigation

The Node class has a standard set of methods for traversing the tree upwards, downwards and sideways: children, parent, siblings, index. Read through the class documentation for their detailed descriptions.
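
To make the four traversal directions concrete, here is a minimal sketch using a toy stand-in class. MiniNode is hypothetical and is not Infoboxer's Node; in particular, the assumption that siblings excludes the node itself should be checked against the Node class documentation.

```ruby
# Toy stand-in for a tree node. NOT Infoboxer's Node class:
# names and behavior here are illustrative assumptions only.
class MiniNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, *children)
    @name = name
    @children = children
    @parent = nil
    # Link every child back to this node so upward traversal works.
    children.each { |child| child.parent = self }
  end

  # Position of this node among its parent's children (nil for the root).
  def index
    parent && parent.children.index(self)
  end

  # All other children of the same parent (empty for the root).
  def siblings
    parent ? parent.children - [self] : []
  end
end

text = MiniNode.new('Text')
bold = MiniNode.new('Bold')
link = MiniNode.new('Wikilink')
para = MiniNode.new('Paragraph', text, bold, link)

puts para.children.map(&:name).inspect  # downwards
puts bold.parent.name                   # upwards
puts bold.index                         # position among siblings
puts bold.siblings.map(&:name).inspect  # sideways
```

With a real page, the same four calls would be made on nodes obtained from the page tree rather than on hand-built objects.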

The Navigation module contains more advanced navigational functionality, such as XPath-like selectors, friendly shortcuts, breaking the document up into logical "sections", and so on.

Most navigational and other Node methods return the Nodes type, which is an Array descendant with additional functionality.
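
The idea of "an Array descendant with additional functionality" can be sketched in a few lines. MiniNodes and MiniLink below are hypothetical toy classes, not Infoboxer's Nodes or Link; the sketch only shows how a collection type can keep its own class when filtered and add collection-wide helpers on top of plain Array behavior.

```ruby
# Hypothetical element type, used only for this illustration.
MiniLink = Struct.new(:target)

# Toy Array descendant. NOT Infoboxer's Nodes class.
class MiniNodes < Array
  # Plain Array#select returns an Array, so re-wrap the result
  # to keep the extra helpers available on filtered collections.
  def select(&block)
    self.class.new(super(&block))
  end

  # Collection-wide helper: ask every element for its target at once.
  def targets
    map(&:target)
  end
end

links = MiniNodes.new([
  MiniLink.new('Argentina'),
  MiniLink.new('Chile'),
  MiniLink.new('Andes')
])

short = links.select { |l| l.target.length <= 5 }
puts short.class            # still the collection type, not a bare Array
puts short.targets.inspect  # helper works on the filtered result
```

All ordinary Array methods (each, first, size and so on) remain available on such a collection, which is what makes this design convenient for chained navigation.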

Complex data extraction

Most of the uniform, machine-extractable data in Wikipedia is stored in templates and tables. There's an entire Templates module, whose documentation explains what you can do with Wikipedia templates, how to understand them and how to use their information. You can also look at the Table class, which for now is not that powerful, yet allows you to extract some columns and rows.

Also, consider that WIKIpedia is made of WIKIlinks, and Linkable#follow (as well as Nodes#follow for multiple links at once) is your good friend.

Defined Under Namespace

Modules: HTMLTagCommons, Linkable

Classes: BaseCell, BaseParagraph, Bold, BoldItalic, Compound, DDefinition, DTerm, DefinitionList, Document, ExternalLink, Gallery, HR, HTMLClosingTag, HTMLOpeningTag, HTMLTag, Heading, Image, ImageCaption, Italic, Link, List, ListItem, Math, Node, Nodes, OrderedList, Paragraph, Pre, Ref, Table, TableCaption, TableCell, TableHeading, TableRow, Template, Text, UnorderedList, Var, Wikilink
