Class: WikiParser::Page
- Inherits: Object (Object > WikiParser::Page)
- Defined in: lib/wikiParserPage.rb
Overview
A Wikipedia article page object.
Constant Summary
- Namespaces =
The Wikipedia namespace prefixes that identify special pages (see #special_page and #page_type).
%w(WP Aide Help Talk User Template Wikipedia File Book Portal Portail TimedText Module MediaWiki Special Spécial Media Category Catégorie [^:]+)
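- Disambiguation =
Localized terms associated with disambiguation pages in several Wikipedia language editions (see #disambiguation_page).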
["disambiguation","homonymie", "значения", "disambigua", "peker", "ujednoznacznienie", "olika betydelser", "Begriffsklärung", "desambiguación"]
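This page lists the two constants but not the code that applies them. The following is a minimal sketch under two assumptions: that the `Namespaces` entries are joined into one alternation anchored at the start of a title and followed by a colon, and that a page counts as a disambiguation page when its wikitext contains a template named after one of the `Disambiguation` terms (for example `{{Homonymie}}` on the French Wikipedia). `NAMESPACES`, `DISAMBIGUATION`, `SPECIAL_TITLE`, `DISAMBIGUATION_TEMPLATE`, `namespace_of` and `disambiguation?` are hypothetical names, not part of the gem.

```ruby
# Copies of the two documented constants, under hypothetical names.
NAMESPACES = %w(WP Aide Help Talk User Template Wikipedia File Book Portal
                Portail TimedText Module MediaWiki Special Spécial Media
                Category Catégorie [^:]+)
DISAMBIGUATION = ["disambiguation", "homonymie", "значения", "disambigua", "peker",
                  "ujednoznacznienie", "olika betydelser", "Begriffsklärung",
                  "desambiguación"]

# Hypothetical: namespace entries are joined unescaped, because the last
# entry, [^:]+, is itself a regex fragment.
SPECIAL_TITLE = /\A(?<namespace>#{NAMESPACES.join('|')}):/

# Hypothetical: disambiguation terms are literal words, so they are escaped
# and matched as a template name: "{{" term [spaces] then "|" or "}".
DISAMBIGUATION_TEMPLATE =
  /\{\{\s*(?:#{DISAMBIGUATION.map { |t| Regexp.escape(t) }.join('|')})\s*[|}]/i

# Returns the namespace prefix of +title+, or nil for an ordinary article.
def namespace_of(title)
  match = SPECIAL_TITLE.match(title)
  match && match[:namespace]
end

# True when +wikitext+ contains one of the assumed disambiguation templates.
def disambiguation?(wikitext)
  DISAMBIGUATION_TEMPLATE.match?(wikitext)
end

namespace_of("Category:Physics")   # => "Category"
namespace_of("User talk:Example")  # => "User talk"
namespace_of("Physics")            # => nil
disambiguation?("{{Homonymie}}\n* [[Mercure (planète)]]")  # => true
```

If the gem does join the list this way, the trailing `[^:]+` entry makes it a catch-all: any title containing a colon, such as `Star Wars: Episode IV`, would be classified as special.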
Instance Attribute Summary
- #article ⇒ Object (readonly)
  The content of the Wikipedia article (raw wikitext).
- #disambiguation_page ⇒ Object (readonly)
  Whether this page is a disambiguation page.
- #id ⇒ Object (readonly)
  The Wikipedia id of the article.
- #internal_links ⇒ Object (readonly)
  The internal links extracted from the article, as returned by #article_to_internal_links.
- #language ⇒ Object (readonly)
  The language of the article, as given by opts[:language] when the page was created.
- #page_type ⇒ Object (readonly)
  The Wikipedia namespace of this page.
- #redirect ⇒ Object (readonly)
  Whether this page is a redirect.
- #redirect_title ⇒ Object (readonly)
  The title of the page this article redirects to.
- #special_page ⇒ Object (readonly)
  Whether this page is special, i.e. whether it belongs to one of the Namespaces.
- #title ⇒ Object (readonly)
  The title of the Wikipedia article.
Instance Method Summary
- #article_to_internal_links(article) ⇒ Array<Hash>
  Extracts the internal links from a Wikipedia article into an array of hashes with :uri and :title keys.
- #finish_processing ⇒ WikiParser::Page
  Resumes processing of the stored XML node from the stopping point given to the parser earlier, then re-extracts the internal links.
- #initialize(opts = {}) ⇒ Page (constructor)
  Creates a new article page from an XML node.
Constructor Details
#initialize(opts = {}) ⇒ Page
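Creates a new article page from an XML node. opts[:language] sets #language; if opts[:node] is nil, the page keeps its empty defaults and no links are extracted.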
    # File 'lib/wikiParserPage.rb', line 34
    def initialize (opts={})
      @language = opts[:language]
      @title = @article = @redirect_title = ""
      @redirect = @special_page = @disambiguation_page = false
      @internal_links, @page_type = [], nil
      return unless !opts[:node].nil?
      process_node opts
      trigs = article_to_internal_links(@article)
      @internal_links = trigs
    end
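The two paths through the constructor (no node, or a node to process) can be seen in a self-contained toy. `ToyPage` is hypothetical: its `process_node` reads a Hash with `:title` and `:text` keys instead of an XML node, and its link extraction is simplified; only `initialize` mirrors the listing above.

```ruby
# Toy model of WikiParser::Page#initialize. ToyPage is hypothetical, and its
# process_node is a stand-in that reads a Hash instead of an XML node.
class ToyPage
  attr_reader :language, :title, :article, :internal_links

  def initialize(opts = {})
    @language = opts[:language]
    @title = @article = @redirect_title = ""
    @redirect = @special_page = @disambiguation_page = false
    @internal_links, @page_type = [], nil
    # Same as the listing's `return unless !opts[:node].nil?`:
    # without a node, keep the empty defaults above.
    return if opts[:node].nil?
    process_node(opts)
    @internal_links = article_to_internal_links(@article)
  end

  private

  # Stand-in for the gem's private process_node (not shown on this page).
  def process_node(opts)
    @title = opts[:node][:title]
    @article = opts[:node][:text]
  end

  # Simplified stand-in: uses the link target for both :uri and :title
  # and ignores any |label.
  def article_to_internal_links(article)
    article.scan(/\[\[([^\]|:]+)/).flatten.map do |target|
      { uri: target.strip, title: { @language => target.strip } }
    end
  end
end

empty = ToyPage.new(language: "en")
empty.title           # => ""
empty.internal_links  # => []

page = ToyPage.new(language: "en",
                   node: { title: "Paris", text: "Capital of [[France]]." })
page.internal_links.first[:uri]  # => "France"
```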
Instance Attribute Details
#article ⇒ Object (readonly)
The content of the Wikipedia article (raw wikitext).
    # File 'lib/wikiParserPage.rb', line 16
    def article
      @article
    end
#disambiguation_page ⇒ Object (readonly)
Whether this page is a disambiguation page.
    # File 'lib/wikiParserPage.rb', line 25
    def disambiguation_page
      @disambiguation_page
    end
#id ⇒ Object (readonly)
The Wikipedia id of the article.
    # File 'lib/wikiParserPage.rb', line 13
    def id
      @id
    end
#internal_links ⇒ Object (readonly)
The internal links extracted from the article, as returned by #article_to_internal_links.
    # File 'lib/wikiParserPage.rb', line 14
    def internal_links
      @internal_links
    end
#language ⇒ Object (readonly)
The language of the article, as given by opts[:language] when the page was created.
    # File 'lib/wikiParserPage.rb', line 26
    def language
      @language
    end
#page_type ⇒ Object (readonly)
The Wikipedia namespace of this page.
    # File 'lib/wikiParserPage.rb', line 22
    def page_type
      @page_type
    end
#redirect ⇒ Object (readonly)
Whether this page is a redirect.
    # File 'lib/wikiParserPage.rb', line 18
    def redirect
      @redirect
    end
#redirect_title ⇒ Object (readonly)
The title of the page this article redirects to.
    # File 'lib/wikiParserPage.rb', line 20
    def redirect_title
      @redirect_title
    end
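This page does not show how #redirect and #redirect_title are filled in. As an illustration only, this sketch reads the target from wikitext of the form `#REDIRECT [[Target]]`, the standard MediaWiki redirect syntax; `REDIRECT_LINK` and `redirect_target` are hypothetical, and the localized redirect keywords used by other language editions are not handled.

```ruby
# Hypothetical: "#REDIRECT [[Target]]" at the start of the text, any case.
# The capture stops at "]", "|" or "#", so section anchors are dropped.
REDIRECT_LINK = /\A\s*#REDIRECT\s*\[\[([^\]|#]+)/i

# Returns the redirect target of +wikitext+, or nil if it is not a redirect.
def redirect_target(wikitext)
  match = REDIRECT_LINK.match(wikitext)
  match && match[1].strip
end

redirect_target("#REDIRECT [[Paris]]")              # => "Paris"
redirect_target("#redirect [[Paris#History]]")      # => "Paris"
redirect_target("Paris is the capital of France.")  # => nil
```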
#special_page ⇒ Object (readonly)
Whether this page is special, i.e. whether it belongs to one of the Namespaces.
    # File 'lib/wikiParserPage.rb', line 24
    def special_page
      @special_page
    end
#title ⇒ Object (readonly)
Title of the Wikipedia article.
    # File 'lib/wikiParserPage.rb', line 11
    def title
      @title
    end
Instance Method Details
#article_to_internal_links(article) ⇒ Array<Hash>
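Extracts the internal links from a Wikipedia article into an array of hashes. Each hash has a :uri key (the link target, with any #section anchor removed and underscores replaced by spaces) and a :title key (a hash mapping the page language to the link's display text, or to the target when the link has no label). Links whose target contains a colon, such as [[File:...]] or [[Category:...]], are skipped.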
    # File 'lib/wikiParserPage.rb', line 88
    def article_to_internal_links article
      links = []
      matches = article.scan(/\[\[(?<name>[^\]\|:]+)(?<trigger>\|[^\]]+)?\]\]/)
      if matches
        matches.each do |match|
          name_match = match[0].strip.chomp.match(/^(?<name>[^#]+)(?<hashtag>#.+)?/)
          link_match = match[1] ? match[1].strip.chomp.match(/^\|[\t\n\s\/]*(?<name>[^#]+)(?<hashtag>#.+)?/) : name_match
          if name_match
            name_match = name_match[:name].gsub('_', ' ')
            link_match = link_match ? link_match[:name] : name_match
            links << {:uri => name_match, :title => {@language => link_match}}
          end
        end
      end
      links
    end
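To experiment with the extraction outside the gem, the method can be copied into a standalone function. In this sketch the page language is passed as a second argument instead of being read from `@language`, the redundant `chomp` after `strip` is dropped, the always-true `if matches` is removed, and `[\t\n\s\/]` is written as the equivalent `[\s\/]`; otherwise the logic follows the listing above.

```ruby
# Standalone copy of WikiParser::Page#article_to_internal_links.
# +language+ replaces the @language instance variable.
def article_to_internal_links(article, language)
  links = []
  # [[target]] or [[target|label]]; targets containing ':' never match.
  matches = article.scan(/\[\[(?<name>[^\]\|:]+)(?<trigger>\|[^\]]+)?\]\]/)
  matches.each do |target, label|
    # Split "Target#Section" into the target and an optional anchor.
    name_match = target.strip.match(/^(?<name>[^#]+)(?<hashtag>#.+)?/)
    # For "|label", skip leading whitespace and slashes and drop any anchor.
    link_match = label ? label.strip.match(/^\|[\s\/]*(?<name>[^#]+)(?<hashtag>#.+)?/) : name_match
    next unless name_match
    uri = name_match[:name].gsub('_', ' ')
    title = link_match ? link_match[:name] : uri
    links << { uri: uri, title: { language => title } }
  end
  links
end

text = "See [[Eiffel_Tower#History|the tower]], [[New_York]] and [[File:Map.png]]."
links = article_to_internal_links(text, "en")
links.map { |l| l[:uri] }          # => ["Eiffel Tower", "New York"]
links.map { |l| l[:title]["en"] }  # => ["the tower", "New_York"]
```

Note that underscores are replaced only in :uri; for an unlabeled link such as [[New_York]], the :title value keeps them.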
#finish_processing ⇒ WikiParser::Page
Resumes processing of the stored XML node from the stopping point given to the parser earlier (0 if none was recorded), releases the node, and re-extracts the internal links into #internal_links. Returns the page itself.
    # File 'lib/wikiParserPage.rb', line 76
    def finish_processing
      @stop_index||= 0
      process_node :node => @node, :from => @stop_index
      @node = nil
      trigs = article_to_internal_links(@article)
      @internal_links = trigs
      self
    end
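Neither the private `process_node` nor the option that sets `@stop_index` appears on this page. The toy model below is hypothetical throughout (`ToyLazyPage`, the `stop_at:` keyword, and line-based "processing" are invented for illustration); only the shape of `finish_processing` (resume from `@stop_index`, drop the stored node, re-extract links, return `self`) follows the documented method.

```ruby
# Toy model of two-phase processing; every name here is hypothetical.
class ToyLazyPage
  attr_reader :article, :internal_links

  def initialize(node, stop_at: nil)
    @node = node                  # kept so processing can resume later
    @article = +""
    process_node(node: @node, from: 0, to: stop_at)
    @internal_links = extract_links
  end

  # Mirrors the documented method: resume, release the node, re-extract links.
  def finish_processing
    @stop_index ||= 0
    process_node(node: @node, from: @stop_index)
    @node = nil
    @internal_links = extract_links
    self
  end

  private

  # Appends lines [from, to) of the node to the article and records where it
  # stopped; +to+ defaults to, and is capped at, the number of lines.
  def process_node(node:, from:, to: nil)
    lines = node.lines
    to = [to || lines.length, lines.length].min
    @article << lines[from...to].join
    @stop_index = to
  end

  # Link targets only, as a simplified stand-in for article_to_internal_links.
  def extract_links
    @article.scan(/\[\[([^\]|:]+)/).flatten
  end
end

demo = ToyLazyPage.new("[[Paris]] is in\n[[France]].\n", stop_at: 1)
demo.internal_links                    # => ["Paris"]
demo.finish_processing.internal_links  # => ["Paris", "France"]
```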