Class: Youtube::SearchResultScraper

Inherits: Object

Defined in: lib/youtube/searchresultscraper.rb

Overview

Introduction

Youtube::SearchResultScraper scrapes video information from the search result pages of www.youtube.com.

You can get the results as an array or as XML.

The XML format is the same as that of the YouTube Developer API (www.youtube.com/dev_api_ref?m=youtube.videos.list_by_tag).

Example

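require "rubygems"
require "youtube/searchresultscraper"

keyword = "ruby"  # search keyword, UTF-8 encoded
page    = 1       # result page number

scraper = Youtube::SearchResultScraper.new(keyword, page)
scraper.open      # fetch the search result page
scraper.scrape    # parse the video entries
puts scraper.get_xml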

More Information

www.ark-web.jp/sandbox/wiki/184.html (Japanese only)

Author: Yuki SHIDA
Author: Konuma Akio
Version: 0.0.3
License: MIT license

Constant Summary

Relevance = 'relevance'
DateAdded = 'video_date_uploaded'
ViewCount = 'video_view_count'
Rating = 'video_avg_rating'
@@youtube_search_base_url = "http://www.youtube.com/results?search_query="

Instance Attribute Summary

#keyword ⇒ Object

Returns the value of attribute keyword.
#page ⇒ Object

Returns the value of attribute page.
#sort ⇒ Object

Returns the value of attribute sort.
#video_count ⇒ Object readonly

Returns the value of attribute video_count.
#video_from ⇒ Object readonly

Returns the value of attribute video_from.
#video_to ⇒ Object readonly

Returns the value of attribute video_to.

Instance Method Summary

#each ⇒ Object

Iterator for scraped videos.
#get_xml ⇒ Object

Returns the video information in XML format.
#initialize(keyword, page = nil, sort = nil) ⇒ SearchResultScraper constructor

Creates a Youtube::SearchResultScraper object for the given keyword and page number.
#open ⇒ Object

Fetches the search result page from YouTube for the specified keyword.
#scrape ⇒ Object

Scrapes video information from the search result HTML.

Constructor Details

#initialize(keyword, page = nil, sort = nil) ⇒ `SearchResultScraper`

Creates a Youtube::SearchResultScraper object for the given keyword and page number.

You cannot specify the number of videos per page; each result page always contains 20 videos.

keyword - the keyword to search for on YouTube; it must be UTF-8 encoded.
page - the result page number (optional)
sort - the sort rule (optional), for example one of the constants Relevance, DateAdded, ViewCount or Rating

# File 'lib/youtube/searchresultscraper.rb', line 86

def initialize keyword, page=nil, sort=nil
  @keyword = keyword
  @page    = page if not page == nil
  @sort    = sort if not sort == nil
end
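Because every result page holds 20 videos, the positions a given `page` should cover can be worked out in advance. The sketch below is illustrative only: `VIDEOS_PER_PAGE` and `expected_range` are hypothetical names that are not part of the library, pages are assumed to be numbered from 1, and a `nil` page is assumed to mean the first page. The scraper itself reads `video_from` and `video_to` from the HTML rather than computing them, and the last page may hold fewer videos.

```ruby
# Hypothetical helper: not part of Youtube::SearchResultScraper.
VIDEOS_PER_PAGE = 20 # fixed page size stated in the documentation

# Returns the 1-based range of result positions expected on +page+.
# Assumes pages are numbered from 1 and treats nil as page 1.
def expected_range(page)
  page = (page || 1).to_i
  raise ArgumentError, "page must be >= 1" if page < 1
  first = (page - 1) * VIDEOS_PER_PAGE + 1
  first..(page * VIDEOS_PER_PAGE)
end

p expected_range(1) # => 1..20
p expected_range(3) # => 41..60
```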

Instance Attribute Details

#keyword ⇒ `Object`

Returns the value of attribute keyword.

# File 'lib/youtube/searchresultscraper.rb', line 62

def keyword
  @keyword
end

#page ⇒ `Object`

Returns the value of attribute page.

# File 'lib/youtube/searchresultscraper.rb', line 63

def page
  @page
end

#sort ⇒ `Object`

Returns the value of attribute sort.

# File 'lib/youtube/searchresultscraper.rb', line 64

def sort
  @sort
end

#video_count ⇒ `Object` (readonly)

Returns the value of attribute video_count.

# File 'lib/youtube/searchresultscraper.rb', line 65

def video_count
  @video_count
end

#video_from ⇒ `Object` (readonly)

Returns the value of attribute video_from.

# File 'lib/youtube/searchresultscraper.rb', line 66

def video_from
  @video_from
end

#video_to ⇒ `Object` (readonly)

Returns the value of attribute video_to.

# File 'lib/youtube/searchresultscraper.rb', line 67

def video_to
  @video_to
end
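The summary marks `keyword`, `page` and `sort` as read/write and `video_count`, `video_from` and `video_to` as read-only, which in YARD output usually corresponds to `attr_accessor` and `attr_reader`. The sketch below assumes that declaration style; the class name `ScraperAttributes` is hypothetical and the real source may declare the attributes differently.

```ruby
# Hypothetical stand-in class illustrating the documented attribute access.
class ScraperAttributes
  # Read/write: #open reads these, so changes made before calling it take effect.
  attr_accessor :keyword, :page, :sort
  # Read-only: in the real class these are filled in by #scrape.
  attr_reader :video_count, :video_from, :video_to

  def initialize(keyword, page = nil, sort = nil)
    @keyword = keyword
    @page    = page
    @sort    = sort
  end
end

s = ScraperAttributes.new("ruby")
s.page = 2                      # a writer exists for page
p s.respond_to?(:video_count=)  # => false, no writer for read-only attributes
```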

Instance Method Details

#each ⇒ `Object`

Iterator for scraped videos.

# File 'lib/youtube/searchresultscraper.rb', line 136

def each
  @videos.each do |video|
    yield video
  end
end
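The page lists no included modules, so `each` here is a plain iterator: called without a block, the bare `yield` raises `LocalJumpError` as soon as there is a video to yield. A common Ruby pattern is to mix in `Enumerable` and return an `Enumerator` when no block is given. The sketch below shows that pattern on a hypothetical `VideoList` class, with strings standing in for video objects; it is a possible extension, not the library's behavior.

```ruby
# Hypothetical sketch: not the library's code.
class VideoList
  include Enumerable # provides map, select, first, count, ... on top of each

  def initialize(videos)
    @videos = videos
  end

  # Yields each video; without a block, returns an Enumerator instead of
  # raising LocalJumpError as a bare `yield` would.
  def each(&block)
    return enum_for(:each) unless block_given?
    @videos.each(&block)
    self
  end
end

list = VideoList.new(%w[abc123 def456 ghi789])
p list.map(&:upcase) # => ["ABC123", "DEF456", "GHI789"]
p list.each.next     # => "abc123"
```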

#get_xml ⇒ `Object`

Returns the video information in XML format.

# File 'lib/youtube/searchresultscraper.rb', line 143

def get_xml
  xml = "<ut_response status=\"ok\">" +
          "<video_count>" + @video_count.to_s +  "</video_count>" +
          "<video_list>\n"
  each do |video|
    xml += video.to_xml
  end
  xml += "</video_list></ut_response>"
end
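The listing wraps each video's own XML in a fixed `ut_response` envelope. The sketch below reproduces that envelope, collecting the parts in an array and joining once instead of repeatedly concatenating strings. `StubVideo` is a hypothetical stand-in for `Youtube::Video`, whose `to_xml` output is not shown on this page, so its markup is invented for illustration only.

```ruby
# Hypothetical stand-in for Youtube::Video; only #to_xml matters here.
StubVideo = Struct.new(:id) do
  def to_xml
    "<video><id>#{id}</id></video>\n"
  end
end

# Builds the same envelope as get_xml from a count and a list of videos.
def build_xml(video_count, videos)
  parts = [%(<ut_response status="ok">),
           "<video_count>#{video_count}</video_count>",
           "<video_list>\n"]
  videos.each { |video| parts << video.to_xml }
  parts << "</video_list></ut_response>"
  parts.join
end

puts build_xml(2, [StubVideo.new("abc123"), StubVideo.new("def456")])
```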

#open ⇒ `Object`

Fetches the search result page from YouTube for the specified keyword.

# File 'lib/youtube/searchresultscraper.rb', line 93

def open
  @url = @@youtube_search_base_url + CGI.escape(@keyword)
  @url += "&page=#{@page}" if not @page == nil
  @url += "&search_sort=#{@sort}" if not @sort == nil
  @html = Kernel.open(@url).read
  replace_document_write_javascript
  @search_result = Hpricot.parse(@html)
end
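The first three lines of `open` build the request URL: the keyword is passed through `CGI.escape`, and `page` and `sort` are appended only when they are not `nil`. The sketch below mirrors that logic without fetching anything; `search_url` and `BASE_URL` are hypothetical names, with the base URL copied from `@@youtube_search_base_url`. Note that the listing fetches the page with `Kernel.open`, which relies on open-uri; since Ruby 3.0, open-uri no longer lets `Kernel#open` open URLs, and `URI.open` is used instead.

```ruby
require "cgi"

# Copied from the class variable @@youtube_search_base_url.
BASE_URL = "http://www.youtube.com/results?search_query="

# Mirrors the query-string construction in #open (no network access).
def search_url(keyword, page = nil, sort = nil)
  url = BASE_URL + CGI.escape(keyword) # spaces become +, UTF-8 bytes become %XX
  url += "&page=#{page}" unless page.nil?
  url += "&search_sort=#{sort}" unless sort.nil?
  url
end

puts search_url("ruby on rails", 2, "video_view_count")
# prints http://www.youtube.com/results?search_query=ruby+on+rails&page=2&search_sort=video_view_count
```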

#scrape ⇒ `Object`

Scrapes video information from the search result HTML.

# File 'lib/youtube/searchresultscraper.rb', line 103

def scrape
  @videos = []

  @search_result.search("//div[@class='vEntry']").each do |video_html|
    video = Youtube::Video.new
    video.id             = scrape_id(video_html)
    video.author         = scrape_author(video_html)
    video.title          = scrape_title(video_html)
    video.length_seconds = scrape_length_seconds(video_html)
    video.rating_avg     = scrape_rating_avg(video_html)
    video.rating_count   = scrape_rating_count(video_html)
    video.description    = scrape_description(video_html)
    video.view_count     = scrape_view_count(video_html)
    video.thumbnail_url  = scrape_thumbnail_url(video_html)
    video.tags           = scrape_tags(video_html)
    video.upload_time    = scrape_upload_time(video_html)
    video.url            = scrape_url(video_html)

    check_video video

    @videos << video
  end

  @video_count = scrape_video_count
  @video_from  = scrape_video_from
  @video_to    = scrape_video_to

  raise "scraping error" if (is_no_result != @videos.empty?)

  @videos
end
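The last check in `scrape` raises `"scraping error"` whenever the page's own "no results" indicator (the private `is_no_result` helper, not shown here) disagrees with whether any `vEntry` blocks were parsed; a mismatch most likely means the page markup no longer matches the selectors. The sketch below isolates that rule in a hypothetical `check_consistency!` helper that is not part of the library.

```ruby
# Hypothetical helper expressing the final check in #scrape.
# no_result: whether the page reported "no results" (is_no_result in the listing)
# videos:    the entries actually parsed from the page
def check_consistency!(no_result, videos)
  # The two facts must agree: "no results" exactly when nothing was parsed.
  raise "scraping error" if no_result != videos.empty?
  videos
end

check_consistency!(false, ["abc123"]) # results reported and parsed: OK
check_consistency!(true, [])          # no results reported, none parsed: OK
begin
  check_consistency!(false, [])       # results expected but none parsed
rescue RuntimeError => e
  puts e.message                      # prints "scraping error"
end
```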
