Module: Infoboxer::Tree
- Included in:
- Parser, Parser::HTML, Parser::Image, Parser::Inline, Parser::Paragraphs, Parser::Table, Parser::Template, Infoboxer::Templates::Base
- Defined in:
- lib/infoboxer/tree.rb,
lib/infoboxer/tree/ref.rb,
lib/infoboxer/tree/html.rb,
lib/infoboxer/tree/list.rb,
lib/infoboxer/tree/math.rb,
lib/infoboxer/tree/node.rb,
lib/infoboxer/tree/text.rb,
lib/infoboxer/tree/image.rb,
lib/infoboxer/tree/nodes.rb,
lib/infoboxer/tree/table.rb,
lib/infoboxer/tree/inline.rb,
lib/infoboxer/tree/gallery.rb,
lib/infoboxer/tree/compound.rb,
lib/infoboxer/tree/document.rb,
lib/infoboxer/tree/linkable.rb,
lib/infoboxer/tree/template.rb,
lib/infoboxer/tree/wikilink.rb,
lib/infoboxer/tree/paragraphs.rb
Overview
Infoboxer provides you with a tree structure of the Wikipedia page, which you can introspect and navigate with ease. This tree structure tries to be simple, logical, and close to the Wikipedia source.
You can always inspect the entire page tree yourself:
page = Infoboxer.wp.get('Argentina')
puts page.to_tree
Inspecting and understanding a single node
Each tree node is a descendant of Node, so you should look at this class to understand what you can do.
Alongside the basic methods defined in the Node class, some useful utility methods are defined in subclasses.
Here's the full list of subclasses representing real nodes, with their respective roles:
- inline markup: Text, Bold, Italic, BoldItalic, Wikilink, ExternalLink, Image;
- embedded HTML: HTMLTag, HTMLOpeningTag, HTMLClosingTag;
- paragraph-level nodes: Heading, Paragraph, Pre, HR;
- lists: OrderedList, UnorderedList, DefinitionList, ListItem, DTerm, DDefinition;
- tables: Table, TableCaption, TableRow, TableHeading, TableCell;
- special elements: Template, Ref.
Tree navigation
The Node class has a standard set of methods for traversing the tree upwards, downwards and sideways: children, parent, siblings, index. Read through the class documentation for their detailed descriptions.
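To make the relationship between these four methods concrete, here is a minimal, self-contained sketch. ToyNode is a hypothetical stand-in written for this illustration, not Infoboxer's Node class; in particular, the choice that siblings excludes the node itself is an assumption of the sketch, so check the Node documentation for the real behaviour.

```ruby
# Hypothetical stand-in for a tree node; NOT Infoboxer's Node class.
class ToyNode
  attr_reader :name, :children
  attr_accessor :parent

  def initialize(name, children = [])
    @name = name
    @children = children
    # Wire up the upward link for every child.
    @children.each { |child| child.parent = self }
  end

  # Position of this node among its parent's children (nil for the root).
  def index
    parent ? parent.children.index(self) : nil
  end

  # All other children of the same parent (empty for the root).
  # Assumption of this sketch: the node itself is excluded.
  def siblings
    parent ? parent.children - [self] : []
  end
end

bold = ToyNode.new('Bold')
link = ToyNode.new('Wikilink')
text = ToyNode.new('Text')
para = ToyNode.new('Paragraph', [bold, link, text])

puts link.parent.name                      # upwards
puts para.children.map(&:name).join(', ')  # downwards
puts link.siblings.map(&:name).join(', ')  # sideways
puts link.index                            # position among siblings
```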
The Navigation module contains more advanced navigational functionality, such as XPath-like selectors, friendly shortcuts, and the breakup of a document into logical "sections".
Most navigational and other Node methods return the Nodes type, which is an Array descendant with additional functionality.
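The idea of an Array descendant with extra behaviour can be sketched as follows. ToyNodes and its text method are hypothetical names invented for this illustration and say nothing about how Infoboxer's Nodes class is actually implemented. The sketch shows the general technique: every ordinary Array method keeps working, and convenience methods can be layered on top.

```ruby
# Hypothetical illustration; NOT Infoboxer's Nodes class.
class ToyNodes < Array
  # Extra convenience on top of Array: joined text of all items.
  def text
    map(&:to_s).join
  end

  # Array#select returns a plain Array, so wrap the result
  # to keep the extra methods available after filtering.
  def select(&block)
    ToyNodes.new(super)
  end
end

nodes = ToyNodes.new(%w[Buenos Aires])
puts nodes.size                                   # plain Array method
puts nodes.text                                   # added method
puts nodes.select { |w| w.start_with?('B') }.class
```

Wrapping the result of filtering methods is a design choice of this sketch: without it, chained calls would fall back to a plain Array and lose the added methods.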
Complex data extraction
Most of the uniform, machine-extractable data in Wikipedia is stored in templates and tables. There is an entire Templates module whose documentation explains what you can do with Wikipedia templates, how to understand them, and how to use their information. You can also look at the Table class, which for now is not that powerful, yet allows you to extract some columns and rows.
Also, consider that WIKIpedia is made of WIKIlinks, and Linkable#follow (as well as Nodes#follow for multiple links at once) is your good friend.
Defined Under Namespace
Modules: HTMLTagCommons, Linkable
Classes: BaseCell, BaseParagraph, Bold, BoldItalic, Compound, DDefinition, DTerm, DefinitionList, Document, ExternalLink, Gallery, HR, HTMLClosingTag, HTMLOpeningTag, HTMLTag, Heading, Image, ImageCaption, Italic, Link, List, ListItem, Math, Node, Nodes, OrderedList, Paragraph, Pre, Ref, Table, TableCaption, TableCell, TableHeading, TableRow, Template, Text, UnorderedList, Var, Wikilink