Class: Youtube::SearchResultScraper

Inherits:
Object
Defined in:
lib/youtube/searchresultscraper.rb

Overview

Introduction

Youtube::SearchResultScraper scrapes video information from the search result page on www.youtube.com.

You can get the results either as an array of Youtube::Video objects (the return value of #scrape) or as XML (#get_xml).

The XML format is the same as that of the YouTube Developer API (www.youtube.com/dev_api_ref?m=youtube.videos.list_by_tag).

Example

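require "rubygems"
require "youtube/searchresultscraper"

keyword = "ruby"   # search keyword, encoded in UTF-8
page    = 1        # page number of the search results
scraper = Youtube::SearchResultScraper.new(keyword, page)
scraper.open       # fetch and parse the search result page
scraper.scrape     # extract the video information
puts scraper.get_xml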

More Information

www.ark-web.jp/sandbox/wiki/184.html (Japanese only)

Author

Yuki SHIDA <[email protected]>

Author

Konuma Akio <[email protected]>

Version

0.0.3

License

MIT license

Constant Summary

Relevance = 'relevance'
DateAdded = 'video_date_uploaded'
ViewCount = 'video_view_count'
Rating = 'video_avg_rating'
@@youtube_search_base_url = "http://www.youtube.com/results?search_query="


Constructor Details

#initialize(keyword, page = nil, sort = nil) ⇒ SearchResultScraper

Creates a Youtube::SearchResultScraper object for the given keyword and page number.

The number of videos per page cannot be specified; each result page always contains 20 videos.

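  • keyword - the keyword to search for on YouTube, encoded in UTF-8.

  • page - the page number of the search results to fetch (optional); it is sent to YouTube as the page parameter.

  • sort - the sort order (optional), e.g. one of the constants Relevance, DateAdded, ViewCount, or Rating; it is sent to YouTube as the search_sort parameter.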



# File 'lib/youtube/searchresultscraper.rb', line 86

def initialize keyword, page=nil, sort=nil
  @keyword = keyword
  @page    = page if not page == nil
  @sort    = sort if not sort == nil
end
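Because each result page always holds 20 videos, the result positions a given page covers can be computed directly. The sketch below assumes page numbers start at 1 and that the last page may be partial; `PER_PAGE` and `expected_range` are hypothetical names, not part of this library. Assuming #video_from and #video_to hold 1-based result positions, comparing them with this range after #scrape may help spot unexpected page layouts.

```ruby
# Hypothetical helper (not part of Youtube::SearchResultScraper).
# Assumes 20 videos per page and 1-based page numbers.
PER_PAGE = 20

# Returns the 1-based positions [first, last] expected on `page`, given
# `total` matching videos, or nil if the page lies past the last result.
def expected_range(page, total)
  first = (page - 1) * PER_PAGE + 1
  return nil if first > total
  last = [page * PER_PAGE, total].min
  [first, last]
end

p expected_range(1, 45)  # => [1, 20]
p expected_range(3, 45)  # => [41, 45]
p expected_range(4, 45)  # => nil
```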

Instance Attribute Details

#keyword ⇒ Object

Returns the value of attribute keyword.



# File 'lib/youtube/searchresultscraper.rb', line 62

def keyword
  @keyword
end

#page ⇒ Object

Returns the value of attribute page.



# File 'lib/youtube/searchresultscraper.rb', line 63

def page
  @page
end

#sort ⇒ Object

Returns the value of attribute sort.



# File 'lib/youtube/searchresultscraper.rb', line 64

def sort
  @sort
end

#video_count ⇒ Object (readonly)

Returns the value of attribute video_count, which is set by #scrape (nil until #scrape has been called).



# File 'lib/youtube/searchresultscraper.rb', line 65

def video_count
  @video_count
end

#video_from ⇒ Object (readonly)

Returns the value of attribute video_from, which is set by #scrape (nil until #scrape has been called).



# File 'lib/youtube/searchresultscraper.rb', line 66

def video_from
  @video_from
end

#video_to ⇒ Object (readonly)

Returns the value of attribute video_to, which is set by #scrape (nil until #scrape has been called).



# File 'lib/youtube/searchresultscraper.rb', line 67

def video_to
  @video_to
end

Instance Method Details

#each ⇒ Object

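Yields each scraped video (a Youtube::Video) to the given block. Call #scrape first, since the video list is only populated there.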



# File 'lib/youtube/searchresultscraper.rb', line 136

def each
  @videos.each do |video|
    yield video
  end
end
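A usage sketch for #each. To stay runnable without network access or the gem, it uses stand-ins: `FakeScraper` and the `Video` Struct are hypothetical replacements for Youtube::SearchResultScraper and Youtube::Video that reproduce only the #each method shown above. The stand-in also mixes in Enumerable to get select and map; whether the real class does so is not shown on this page, so with the real class you may need to collect results inside an each block instead.

```ruby
# Hypothetical stand-ins for Youtube::Video and Youtube::SearchResultScraper,
# reproducing only the #each iterator documented above.
Video = Struct.new(:id, :title, :view_count)

class FakeScraper
  include Enumerable  # provides select, map, etc. on top of #each

  def initialize(videos)
    @videos = videos  # the real class fills @videos in #scrape
  end

  def each
    @videos.each do |video|
      yield video
    end
  end
end

scraper = FakeScraper.new([
  Video.new("abc", "First clip", 120),
  Video.new("def", "Second clip", 3000)
])

scraper.each { |video| puts "#{video.id}: #{video.title}" }
popular = scraper.select { |video| video.view_count > 1000 }.map(&:title)
p popular  # => ["Second clip"]
```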

#get_xml ⇒ Object

Returns the scraped video information as an XML string in the YouTube Developer API format.



# File 'lib/youtube/searchresultscraper.rb', line 143

def get_xml
  xml = "<ut_response status=\"ok\">" +
          "<video_count>" + @video_count.to_s +  "</video_count>" +
          "<video_list>\n"
  each do |video|
    xml += video.to_xml
  end
  xml += "</video_list></ut_response>"
end
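Because #get_xml returns a plain string, a consumer can read it back with any XML parser. The following is a minimal sketch using REXML (shipped with standard Ruby installations), assuming the envelope built by #get_xml. The sample string is hand-written, and its video list is left empty because the per-video markup comes from Youtube::Video#to_xml, which is not documented here.

```ruby
require "rexml/document"

# Hand-written sample in the envelope format built by #get_xml.
# The video list is empty: per-video markup comes from Youtube::Video#to_xml.
xml = "<ut_response status=\"ok\">" \
      "<video_count>42</video_count>" \
      "<video_list>\n</video_list></ut_response>"

doc    = REXML::Document.new(xml)
status = doc.root.attributes["status"]                  # "ok"
count  = doc.root.elements["video_count"].text.to_i     # 42
videos = doc.root.elements["video_list"].elements.to_a  # [] for this sample

puts "status=#{status} video_count=#{count} videos=#{videos.size}"
```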

#open ⇒ Object

Fetches the YouTube search result page for the keyword, adding the page number and sort order when they were given, and parses the HTML with Hpricot.



# File 'lib/youtube/searchresultscraper.rb', line 93

def open
  @url = @@youtube_search_base_url + CGI.escape(@keyword)
  @url += "&page=#{@page}" if not @page == nil
  @url += "&search_sort=#{@sort}" if not @sort == nil
  @html = Kernel.open(@url).read
  replace_document_write_javascript
  @search_result = Hpricot.parse(@html)
end
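The request URL that #open builds can be reproduced without network access. This sketch copies the string-building steps from #open into a standalone method; `search_url` and `SEARCH_BASE_URL` are hypothetical names, and the base URL is the value of @@youtube_search_base_url listed above.

```ruby
require "cgi"

# Base URL copied from @@youtube_search_base_url.
SEARCH_BASE_URL = "http://www.youtube.com/results?search_query="

# Hypothetical standalone version of the URL construction in #open:
# the keyword is CGI-escaped; page and sort are appended only when given.
def search_url(keyword, page = nil, sort = nil)
  url = SEARCH_BASE_URL + CGI.escape(keyword)
  url += "&page=#{page}" unless page.nil?
  url += "&search_sort=#{sort}" unless sort.nil?
  url
end

puts search_url("ruby on rails")
# => http://www.youtube.com/results?search_query=ruby+on+rails
puts search_url("ruby", 2, "video_view_count")  # value of the ViewCount constant
# => http://www.youtube.com/results?search_query=ruby&page=2&search_sort=video_view_count
```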

#scrape ⇒ Object

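Scrapes video information from the search result HTML fetched by #open and returns it as an array of Youtube::Video objects. It also sets #video_count, #video_from, and #video_to, and raises a RuntimeError ("scraping error") when the page's no-result check disagrees with whether any videos were found.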



# File 'lib/youtube/searchresultscraper.rb', line 103

def scrape
  @videos = []

  @search_result.search("//div[@class='vEntry']").each do |video_html|
    video = Youtube::Video.new
    video.id             = scrape_id(video_html)
    video.author         = scrape_author(video_html)
    video.title          = scrape_title(video_html)
    video.length_seconds = scrape_length_seconds(video_html)
    video.rating_avg     = scrape_rating_avg(video_html)
    video.rating_count   = scrape_rating_count(video_html)
    video.description    = scrape_description(video_html)
    video.view_count     = scrape_view_count(video_html)
    video.thumbnail_url  = scrape_thumbnail_url(video_html)
    video.tags           = scrape_tags(video_html)
    video.upload_time    = scrape_upload_time(video_html)
    video.url            = scrape_url(video_html)

    check_video video

    @videos << video
  end

  @video_count = scrape_video_count
  @video_from  = scrape_video_from
  @video_to    = scrape_video_to

  raise "scraping error" if (is_no_result != @videos.empty?)

  @videos
end
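#scrape selects each result block with the XPath expression //div[@class='vEntry'] and then raises if the no-result check disagrees with whether any videos were found. The sketch below illustrates both steps with REXML on a small, hand-written XHTML sample. It is only an illustration: #open uses Hpricot, a lenient HTML parser, whereas REXML requires well-formed input. The `no_result` flag is a hypothetical stand-in for the private is_no_result check, whose implementation is not shown on this page.

```ruby
require "rexml/document"

# Toy, well-formed sample standing in for a search result page.
html = <<~HTML
  <html><body>
    <div class="vEntry"><a href="/watch?v=abc">First clip</a></div>
    <div class="vEntry"><a href="/watch?v=def">Second clip</a></div>
    <div class="footer">not a result</div>
  </body></html>
HTML

doc     = REXML::Document.new(html)
entries = REXML::XPath.match(doc, "//div[@class='vEntry']")  # same XPath as #scrape
titles  = entries.map { |div| div.elements["a"].text }

# Hypothetical stand-in for is_no_result: true when the page reports no matches.
no_result = false

# Same consistency rule as #scrape: "no result" must coincide with an empty list.
raise "scraping error" if no_result != titles.empty?

p titles  # => ["First clip", "Second clip"]
```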