Class: HtmlPageTitle
- Inherits: Object
- Defined in: lib/html_page_title.rb
Overview
A simple class for finding the title of a web page. It fetches the given HTTP URL, follows any redirects and finally parses the response with Hpricot.
You can either use the shorthand form or instantiate the class with new:
* HtmlPageTitle('http://github.com')
* HtmlPageTitle.new('http://github.com')
The two calls are equivalent except for one subtle difference: the shorthand form swallows SocketError (raised, for example, for invalid URLs) and returns nil, whereas instantiation via new raises that error.
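The difference between the two forms could be implemented as a capitalised Kernel-style method (the same convention as Ruby's built-in `Integer()` or `Array()`) that wraps `new` and rescues `SocketError`. This is a minimal sketch under assumptions, not this class's actual source: the stub `HtmlPageTitle` below fetches nothing and simply raises `SocketError` for any URL containing `invalid`, imitating a failed DNS lookup during the eager fetch in the constructor.

```ruby
require 'socket' # defines SocketError

# Hypothetical stand-in for the real class: no HTTP request is made.
# URLs containing "invalid" raise SocketError to imitate a DNS failure.
class HtmlPageTitle
  attr_reader :original_url

  def initialize(original_url)
    @original_url = original_url
    title # fetch eagerly so errors surface at construction time
  end

  def title
    raise SocketError, 'getaddrinfo: Name or service not known' if original_url.include?('invalid')
    'Example title'
  end
end

# Shorthand form: identical to HtmlPageTitle.new, except that a
# SocketError is swallowed and nil is returned instead.
def HtmlPageTitle(url)
  HtmlPageTitle.new(url)
rescue SocketError
  nil
end

p HtmlPageTitle('http://invalid.example') # prints nil
p HtmlPageTitle('http://example.com').title # prints "Example title"
```

In this sketch only `SocketError` is rescued, so any other failure (a timeout, for instance) would still propagate from the shorthand form.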
You can retrieve the title, the heading (the content of the first h1 tag in the body) or the label, which is the first available of the heading, the title and the target URL after redirects. If the title or the heading cannot be found (e.g. for a non-HTML document), the corresponding method returns nil, so label is the only one of the three that always returns a string.
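The fallback order behind label is a chain of `||` over values that may be nil. A minimal sketch, assuming a hypothetical `PageInfo` struct that stands in for an already-fetched page so the three cases can be checked without network access:

```ruby
# Hypothetical stand-in for a fetched page: heading and title are nil
# when the document has no h1 in the body or no title in the head.
PageInfo = Struct.new(:heading, :title, :url) do
  # Same fallback order as HtmlPageTitle#label: heading, title, URL.
  def label
    heading || title || url
  end
end

html_page  = PageInfo.new('Welcome', 'Example Domain', 'https://example.com/')
title_only = PageInfo.new(nil, 'Example Domain', 'https://example.com/')
image      = PageInfo.new(nil, nil, 'https://example.com/logo.png')

puts html_page.label  # prints Welcome
puts title_only.label # prints Example Domain
puts image.label      # prints https://example.com/logo.png
```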
Instance Attribute Summary
- #original_url ⇒ Object (readonly)
  Returns the value of attribute original_url.
Instance Method Summary
- #body ⇒ Object
  Returns the body of the document at the target URL (after following any redirects).
- #document ⇒ Object
  Returns the Hpricot document parsed from the body (memoized).
- #heading ⇒ Object
  Retrieves the first h1 tag in the body and returns its content.
- #initialize(original_url) ⇒ HtmlPageTitle (constructor)
  Returns a new instance of HtmlPageTitle.
- #label ⇒ Object
  Returns the heading, the title or the URL, whichever is available first in that order.
- #redirect ⇒ Object
  Returns the redirect follower instance used for resolving this instance's URL.
- #title ⇒ Object
  Retrieves the title tag in the head and returns its content.
- #url ⇒ Object
  Returns the target URL after all redirects.
Constructor Details
#initialize(original_url) ⇒ HtmlPageTitle
Returns a new instance of HtmlPageTitle.
    # File 'lib/html_page_title.rb', line 35
    def initialize(original_url)
      @original_url = original_url
      title # fetch eagerly so that errors such as SocketError are raised here
    end
Instance Attribute Details
#original_url ⇒ Object (readonly)
Returns the value of attribute original_url.
    # File 'lib/html_page_title.rb', line 34
    def original_url
      @original_url
    end
Instance Method Details
#body ⇒ Object
Returns the body of the document at the target URL (after following any redirects).
    # File 'lib/html_page_title.rb', line 77
    def body
      redirect.body
    end
#document ⇒ Object
Returns the Hpricot document parsed from the body (memoized).
    # File 'lib/html_page_title.rb', line 40
    def document
      @document ||= Hpricot(redirect.body)
    end
#heading ⇒ Object
Retrieves the first h1 tag in the body and returns its content.
    # File 'lib/html_page_title.rb', line 52
    def heading
      return @heading if @heading
      if heading_tag = document.at('body h1')
        @heading = HTMLEntities.new.decode(heading_tag.inner_html.strip.chomp)
      end
    end
#label ⇒ Object
Returns the heading, the title or the URL, whichever is available first in that order.
    # File 'lib/html_page_title.rb', line 61
    def label
      heading || title || url
    end
#redirect ⇒ Object
Returns the redirect follower instance used for resolving this instance's URL.
    # File 'lib/html_page_title.rb', line 67
    def redirect
      # ||= reuses one follower instead of building a new one on every call
      @redirect ||= RedirectFollower.new(original_url)
    end
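Whether #redirect assigns with `=` or memoizes with `||=` decides how many follower instances (and so, potentially, how many HTTP round trips) a single page object creates, since #body, #document and #url all go through #redirect. The sketch below makes that visible with a hypothetical `CountingFollower` that only counts its instantiations; the classes `Unmemoized` and `Memoized` and the helper `followers_created` are illustrative names, not part of this library.

```ruby
# Hypothetical follower that only counts how often it is created;
# a real redirect follower would perform HTTP requests at this point.
class CountingFollower
  class << self
    attr_accessor :instances
  end
  self.instances = 0

  attr_reader :url

  def initialize(url)
    self.class.instances += 1
    @url = url
  end
end

class Unmemoized
  def initialize(original_url)
    @original_url = original_url
  end

  # Plain assignment: a new follower on every call.
  def redirect
    @redirect = CountingFollower.new(@original_url)
  end
end

class Memoized
  def initialize(original_url)
    @original_url = original_url
  end

  # ||= builds the follower once, then returns the cached instance.
  def redirect
    @redirect ||= CountingFollower.new(@original_url)
  end
end

# Calls #redirect three times, as #url, #body and #document might.
def followers_created(klass)
  CountingFollower.instances = 0
  page = klass.new('http://example.com')
  3.times { page.redirect.url }
  CountingFollower.instances
end

puts followers_created(Unmemoized) # prints 3
puts followers_created(Memoized)   # prints 1
```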
#title ⇒ Object
Retrieves the title tag in the head and returns its content.
    # File 'lib/html_page_title.rb', line 44
    def title
      return @title if @title
      if title_tag = document.at('head title')
        @title = HTMLEntities.new.decode(title_tag.inner_html.strip.chomp)
      end
    end
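#title and #heading share one pattern: locate a single element by selector, take its inner HTML, strip surrounding whitespace and decode HTML entities. As a rough, dependency-free illustration, the sketch below swaps Hpricot for regular expressions and HTMLEntities for the standard library's `CGI.unescapeHTML`, which decodes only a handful of named entities plus numeric references. The helper names `extract_title` and `extract_heading` are hypothetical, and regular expressions are no substitute for a real HTML parser.

```ruby
require 'cgi'

# Inner HTML of the first <title> after <head>, roughly like
# Hpricot's document.at('head title'); nil if there is none.
def extract_title(html)
  inner = html[%r{<head\b.*?<title\b[^>]*>(.*?)</title>}mi, 1]
  inner && CGI.unescapeHTML(inner.strip)
end

# Inner HTML of the first <h1> after <body>, roughly like
# document.at('body h1'); nil if there is none.
def extract_heading(html)
  inner = html[%r{<body\b.*?<h1\b[^>]*>(.*?)</h1>}mi, 1]
  inner && CGI.unescapeHTML(inner.strip)
end

page = <<~HTML
  <html>
    <head><title>
      Tom &amp; Jerry
    </title></head>
    <body><h1>Episode <em>1</em></h1></body>
  </html>
HTML

puts extract_title(page)                  # prints Tom & Jerry
puts extract_heading(page)                # prints Episode <em>1</em>
p extract_heading('<p>no headings</p>')   # prints nil
```

As with inner_html, nested tags such as `<em>` are kept in the returned heading.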
#url ⇒ Object
Returns the target URL after all redirects.
    # File 'lib/html_page_title.rb', line 72
    def url
      redirect.url
    end
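#url delegates to RedirectFollower, whose source is not shown here. As a hedged illustration of what following all redirects involves, the sketch below walks a chain of 3xx responses, resolves relative Location values against the current URL and gives up after a fixed number of hops. The `follow_redirects` helper, the `FakeResponse` struct, the `fetch` callable, the `TooManyRedirects` error and the limit of 5 are all assumptions; a real implementation would issue requests (for example with `Net::HTTP.get_response`) instead of reading from a hash.

```ruby
require 'uri'

# Minimal stand-in for an HTTP response: status code, Location, body.
FakeResponse = Struct.new(:code, :location, :body)

class TooManyRedirects < StandardError; end

# Follows 3xx responses until a non-redirect arrives and returns the
# final URL with its response. `fetch` maps a URL string to a response.
def follow_redirects(url, fetch, limit: 5)
  (limit + 1).times do
    response = fetch.call(url)
    return [url, response] unless (300..399).cover?(response.code) && response.location

    # Location may be relative, so resolve it against the current URL.
    url = URI.join(url, response.location).to_s
  end
  raise TooManyRedirects, "more than #{limit} redirects"
end

responses = {
  'http://example.com'        => FakeResponse.new(301, 'https://example.com/', nil),
  'https://example.com/'      => FakeResponse.new(302, '/start', nil),
  'https://example.com/start' => FakeResponse.new(200, nil, '<html></html>')
}

final_url, final_response = follow_redirects('http://example.com', responses.method(:fetch))
puts final_url           # prints https://example.com/start
puts final_response.code # prints 200
```

Capping the number of hops keeps a redirect loop (a URL redirecting to itself, say) from hanging the lookup.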