Module: XboxLive::Scraper
Defined in: lib/xbox_live/scraper.rb
Overview
Scraper is a collection of methods to log in to the Xbox Live web site and retrieve web pages.
The main entry point is XboxLive::Scraper.get_page(url).
Class Method Summary
- .agent ⇒ Object
  Create and memoize the Mechanize agent.
- .get_page(url) ⇒ Object
  Load a page from Xbox Live and return a Mechanize/Nokogiri page. TODO: cache pages for some time to prevent duplicative HTTP activity.
- .log(message) ⇒ Object
  Write out a log entry.
- .login(page) ⇒ Object
  Log in to Xbox Live using the supplied login page.
- .login_page?(page) ⇒ Boolean
  Check to see if the provided page is the Xbox Live login page.
- .post_page(url, params) ⇒ Object
  POST a page to Xbox Live and return the result.
- .safe_get(page) ⇒ Object
  Get a page, but catch any errors so processing can continue.
Class Method Details
.agent ⇒ Object
Create and memoize the Mechanize agent.
```ruby
# File 'lib/xbox_live/scraper.rb', line 141
def self.agent
  log " Initializing mechanize agent @ #{Time.now.to_s}" if !defined? @@agent
  @@agent ||= Mechanize.new { |a| a.user_agent_alias = 'Mac Safari' }
end
```
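The `||=` idiom is what makes the agent memoized: the block passed to `Mechanize.new` runs only on the first call, and every later call returns the same object (and therefore the same cookie jar and login session). A minimal sketch of the same pattern in plain Ruby, assuming a hypothetical `FakeAgent` class in place of Mechanize so it runs without the gem. It stores the agent in a module instance variable rather than a class variable, a common alternative because `@@` variables are shared with any class that includes the module.

```ruby
# Hypothetical stand-in for Mechanize so the sketch runs without the gem.
class FakeAgent
  attr_accessor :user_agent_alias

  def initialize
    yield self if block_given?
  end
end

module AgentCache
  @created = 0

  class << self
    attr_reader :created

    # Build the agent on first use; later calls return the memoized instance.
    def agent
      @agent ||= begin
        @created += 1
        FakeAgent.new { |a| a.user_agent_alias = 'Mac Safari' }
      end
    end
  end
end

first  = AgentCache.agent
second = AgentCache.agent
puts first.equal?(second)  # prints "true"
puts AgentCache.created    # prints "1"
```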
.get_page(url) ⇒ Object
Load a page from Xbox Live and return a Mechanize/Nokogiri page. TODO: cache pages for some time to prevent duplicative HTTP activity.
```ruby
# File 'lib/xbox_live/scraper.rb', line 16
def self.get_page(url)
  log "Loading page #{url}."

  # Check to see if there is a recent version of the page in cache
  if @cache[url]
    log " Found page in cache."
    return @cache[url][:page] if Time.now - @cache[url][:updated_at] < XboxLive.[:refresh_age]
    log " but the cached page is stale."
  end

  # Load the specified page via Mechanize
  log " Getting page from Xbox Live."
  page = safe_get(url)

  # Most pages require authentication. If the Mechanize agent has
  # not logged in yet, or if the session has expired, it will be
  # redirected to the Xbox Live login page.
  if login_page?(page)
    # Log the agent in via the returned login page.
    log " Page load failed - not signed in."
    page = login(page)

    # The login SHOULD have returned the original page requested,
    # but the URL will be the POST URL, so there is no way to be
    # certain. Therefore, it is safest to just load the page again
    # now that the Mechanize agent has logged in.
    log " Retrying page #{url}"
    page = safe_get(url)
  end

  # Page#title can be nil (no <title> element), so compare its string form.
  if page.nil? || page.title.to_s.match(/Error/)
    log " ERROR: failed to load page. Trying again."
    page = safe_get(url)
    if page.nil? || page.title.to_s.match(/Error/)
      log " ERROR: failed on second try. Aborting."
      return nil
    else
      log " SUCCESS: page loaded on retry."
    end
  end

  if page.uri.to_s != url
    log " ERROR: loaded page URL does not match expected URL. Loaded: #{page.uri.to_s}"
    return nil
  end

  log " Loaded page '#{page.title.to_s.strip}'. Storing in cache."
  @cache[url] = { page: page, updated_at: Time.now }
  page
end
```
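get_page combines three behaviors: a time-limited cache keyed by URL, a re-fetch after logging in, and a single retry when the page is missing or titled as an error. The sketch below reproduces that control flow in plain Ruby so it can run without Mechanize or network access. `CachingFetcher`, `FetchedPage`, and the injected `fetch`, `login`, and `clock` callables are hypothetical names, and the 300-second `refresh_age` default is an arbitrary assumption, not the gem's setting. Injecting the clock is a design choice that makes cache expiry testable without sleeping.

```ruby
# Hypothetical minimal page object: only the fields get_page inspects.
FetchedPage = Struct.new(:uri, :title)

class CachingFetcher
  # fetch: callable taking a URL, returning a page or nil (like safe_get).
  # login: callable invoked with the login page (like login).
  def initialize(fetch:, login:, refresh_age: 300, clock: -> { Time.now })
    @fetch = fetch
    @login = login
    @refresh_age = refresh_age
    @clock = clock
    @cache = {}
  end

  def get_page(url)
    entry = @cache[url]
    return entry[:page] if entry && @clock.call - entry[:updated_at] < @refresh_age

    page = @fetch.call(url)
    if login_page?(page)
      @login.call(page)
      page = @fetch.call(url) # reload: the post-login URL cannot be trusted
    end

    page = @fetch.call(url) if failed?(page) # one retry on error
    return nil if failed?(page)
    return nil if page.uri.to_s != url      # landed somewhere unexpected

    @cache[url] = { page: page, updated_at: @clock.call }
    page
  end

  private

  def login_page?(page)
    !page.nil? && page.title == 'Welcome to Windows Live'
  end

  def failed?(page)
    page.nil? || page.title.to_s.match?(/Error/)
  end
end

# Usage: the first fetch returns the login page, the second the real page.
fetch_count = 0
fetch = lambda do |url|
  fetch_count += 1
  title = fetch_count == 1 ? 'Welcome to Windows Live' : 'Profile'
  FetchedPage.new(url, title)
end

fetcher = CachingFetcher.new(fetch: fetch, login: ->(_page) { nil })
puts fetcher.get_page('https://example.com/profile').title # prints "Profile"
fetcher.get_page('https://example.com/profile')            # served from cache
puts fetch_count                                            # prints "2"
```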
.log(message) ⇒ Object
Write out a log entry.
```ruby
# File 'lib/xbox_live/scraper.rb', line 147
def self.log(message)
  puts message if XboxLive.[:debug]
end
```
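log prints with `puts` only when the `:debug` option is set. If the same on/off behavior is wanted with a redirectable destination, one option is Ruby's standard `Logger`, with the debug flag mapped to the log level. A minimal sketch; the `DebugLog` module and its `output=` and `debug=` setters are hypothetical names, not part of the gem.

```ruby
require 'logger'
require 'stringio'

# Hypothetical module: a debug-gated logger built on Ruby's standard Logger.
module DebugLog
  class << self
    # Where log lines go; must be set before the first call to #logger.
    attr_writer :output

    def logger
      @logger ||= Logger.new(@output || $stdout).tap do |l|
        # Emit just the message, like the original puts-based log.
        l.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }
        l.level = Logger::WARN # debug lines suppressed by default
      end
    end

    # Counterpart of the :debug option: true enables debug lines.
    def debug=(enabled)
      logger.level = enabled ? Logger::DEBUG : Logger::WARN
    end

    def log(message)
      logger.debug(message)
    end
  end
end

buffer = StringIO.new
DebugLog.output = buffer
DebugLog.log('dropped while debug is off')
DebugLog.debug = true
DebugLog.log('Loading page /profile.')
print buffer.string # prints "Loading page /profile."
```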
.login(page) ⇒ Object
Log in to Xbox Live using the supplied login page.
```ruby
# File 'lib/xbox_live/scraper.rb', line 88
def self.login(page)
  return nil if !login_page?(page)

  # Find the URL where the login form should be POSTed to.
  url_match = page.body.match(/srf_uPost='([^']+)/)

  # PPFT appears to be some kind of session identifier which is
  # required for the login process.
  ppft_html = page.body.match(/srf_sFT='([^']+)/)
  ppft_match = ppft_html && ppft_html[1].match(/value="([^"]+)/)

  # String#match returns nil when nothing matches, so check before indexing.
  if url_match.nil? || ppft_match.nil?
    log " ERROR: Trying to log in but 'Sign In' page doesn't contain needed info."
    return nil
  end
  url  = url_match[1]
  ppft = ppft_match[1]

  # The rest of the parameters are either user-provided (i.e.
  # username and password) or are constants.
  params = {
    'login'        => XboxLive.[:username],
    'passwd'       => XboxLive.[:password],
    'type'         => '11',
    'LoginOptions' => '3',
    'NewUser'      => '1',
    'PPSX'         => 'Passpor',
    'PPFT'         => ppft,
    'idshbo'       => '1'
  }

  # POST the login form and hope for the best.
  log " Submitting login form via POST"
  page = agent.post(url, params)

  # The login will fail and return a page saying that JavaScript must be
  # enabled. However, there is a hidden form in the page that can be
  # submitted to enable non-JavaScript support.
  form = page.form('fmHF')
  if form.nil?
    log " ERROR: The non-JS login page doesn't contain form fmHF."
    return nil
  end

  # Submitting the form on the JavaScript error page completes the
  # login process, and SHOULD return the originally requested page.
  log " Submitting final non-JS login form"
  agent.submit(form)
end
```
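login depends on two values embedded in the sign-in page's inline JavaScript: `srf_uPost` (the URL the form is POSTed to) and `srf_sFT` (an HTML snippet whose `value` attribute holds the PPFT token). The sketch below applies the same regular expressions to a string and returns nil instead of raising when either value is missing. `extract_login_fields`, the sample page body, and the credentials are hypothetical; real sign-in markup is much larger and may differ.

```ruby
# Pull the POST URL and PPFT token out of the sign-in page body.
# Returns nil instead of raising when either value is missing.
def extract_login_fields(body)
  post_match = body.match(/srf_uPost='([^']+)/)
  sft_match  = body.match(/srf_sFT='([^']+)/)
  return nil if post_match.nil? || sft_match.nil?

  value_match = sft_match[1].match(/value="([^"]+)/)
  return nil if value_match.nil?

  { url: post_match[1], ppft: value_match[1] }
end

# Hypothetical sample body containing only the two variables of interest.
sample = %q{var srf_uPost='https://login.example.com/ppsecure/post.srf';} +
         %q{var srf_sFT='<input type="hidden" name="PPFT" value="abc123"/>';}

fields = extract_login_fields(sample)
params = {
  'login' => 'user@example.com', # hypothetical credentials
  'passwd' => 'secret',
  'type' => '11', 'LoginOptions' => '3', 'NewUser' => '1',
  'PPSX' => 'Passpor', 'PPFT' => fields[:ppft], 'idshbo' => '1'
}

puts fields[:url]                                  # prints "https://login.example.com/ppsecure/post.srf"
puts params['PPFT']                                # prints "abc123"
puts extract_login_fields('<html></html>').inspect # prints "nil"
```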
.login_page?(page) ⇒ Boolean
Check to see if the provided page is the Xbox Live login page.
```ruby
# File 'lib/xbox_live/scraper.rb', line 136
def self.login_page?(page)
  page and page.title == "Welcome to Windows Live"
end
```
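The expression `page and page.title == ...` evaluates to nil, not false, when `page` is nil. That is falsy, so callers behave correctly, but a predicate that always returns true or false is easier to test. A sketch of a stricter variant, where `StubPage` and `LOGIN_TITLE` are hypothetical names, and the whitespace stripping and `respond_to?` guard are added assumptions (a padded title, or a response object without `#title`).

```ruby
# Hypothetical stand-in for a Mechanize page; only #title matters here.
StubPage = Struct.new(:title)

LOGIN_TITLE = 'Welcome to Windows Live'

# Stricter variant of login_page?: always true or false, never nil, and it
# tolerates objects without #title and titles with surrounding whitespace.
def login_page?(page)
  page.respond_to?(:title) && page.title.to_s.strip == LOGIN_TITLE
end

puts login_page?(StubPage.new("  Welcome to Windows Live\n")) # prints "true"
puts login_page?(StubPage.new('Xbox.com | Profile'))          # prints "false"
puts login_page?(nil)                                         # prints "false"
```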
.post_page(url, params) ⇒ Object
POST a page to Xbox Live and return the result.
```ruby
# File 'lib/xbox_live/scraper.rb', line 68
def self.post_page(url, params)
  log "POSTing page #{url} with params #{params}."
  page = agent.post(url, params)
  page
end
```
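post_page hands a Hash of parameters to Mechanize, which sends them as a URL-encoded form body (`application/x-www-form-urlencoded`). To see roughly what that body looks like, Ruby's standard `URI.encode_www_form` produces the same kind of encoding; the parameter values below are made up. The `post_form` helper is a hypothetical stdlib-only counterpart that is defined but not called here. Unlike Mechanize it keeps no cookies between requests, so it could not carry the logged-in session the scraper depends on, which is one reason to route everything through the shared agent.

```ruby
require 'uri'
require 'net/http'

# Hypothetical form parameters, similar in shape to the login params.
params = { 'login' => 'user@example.com', 'type' => '11', 'PPSX' => 'Passpor' }

# The URL-encoded body a form POST would carry.
body = URI.encode_www_form(params)
puts body # prints "login=user%40example.com&type=11&PPSX=Passpor"

# Hypothetical stdlib-only POST: no cookie jar, so no session persistence.
def post_form(url, params)
  Net::HTTP.post_form(URI(url), params)
end
```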
.safe_get(page) ⇒ Object
Get a page, but catch any errors so processing can continue.
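```ruby
# File 'lib/xbox_live/scraper.rb', line 78
# Note: despite its name, `page` is the URL to fetch.
def self.safe_get(page)
  begin
    return agent.get(page)
  # rescue Errno::ETIMEDOUT, Timeout::Error, Mechanize::ResponseCodeError
  rescue # a bare rescue catches any StandardError
    return nil
  end
end
```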
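The bare `rescue` in safe_get catches every StandardError, which includes programming mistakes such as a NoMethodError, so a bug would quietly turn into a nil page. The commented-out line hints at a narrower alternative. A minimal sketch of that approach, with the hypothetical names `safe_call` and `RECOVERABLE_ERRORS`, using only standard-library exceptions because Mechanize is not loaded here (with the gem available, `Mechanize::ResponseCodeError` could be added to the list, as the comment in safe_get suggests).

```ruby
require 'timeout'

# Hypothetical list of errors treated as "page unavailable".
RECOVERABLE_ERRORS = [Errno::ETIMEDOUT, Timeout::Error].freeze

# Run the block and return its value, or nil if a recoverable error is raised.
# Anything else (e.g. a NoMethodError from a bug) still propagates.
def safe_call
  yield
rescue *RECOVERABLE_ERRORS => e
  warn " request failed: #{e.class}"
  nil
end

p safe_call { 'page body' }          # prints "page body" (with quotes)
p safe_call { raise Timeout::Error } # prints nil
begin
  safe_call { nil.title }
rescue NoMethodError
  puts 'bugs still surface'
end
```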