Image Downloader
It is often necessary to collect the images from a web page. This gem solves exactly that task.
Installation
sudo gem install image_downloader
Requirements
- ruby 1.8 or 1.9
- gem nokogiri
Description
Image Downloader is a rather simple library which does the following:
- gets the web page (with Net::HTTP)
- parses the HTML page (using regexp or nokogiri)
- downloads the images (in one or multiple threads)
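The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the gem's actual implementation: the helper names `fetch_page`, `find_image_urls` and `save_image`, the `IMAGE_PATTERN` extension list, and the sample markup are all assumptions made for this example.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helpers illustrating the three steps; not the gem's internals.

# Anything without quotes, spaces or parentheses that ends in an image extension.
IMAGE_PATTERN = /[^'"\s()]+\.(?:jpe?g|png|gif|ico)/i

# Step 1: get the web page with Net::HTTP (assumes a full URL with a scheme).
def fetch_page(page_url)
  Net::HTTP.get(URI.parse(page_url))
end

# Step 2: parse the HTML with a regexp and resolve relative URLs
# against the address of the page.
def find_image_urls(html, page_url)
  html.scan(IMAGE_PATTERN).map { |path| URI.join(page_url, path).to_s }.uniq
end

# Step 3: download one image into the target directory.
def save_image(image_url, target_path)
  file_name = File.join(target_path, File.basename(URI.parse(image_url).path))
  File.open(file_name, 'wb') { |file| file.write(fetch_page(image_url)) }
end

html = '<img src="/pics/cat.jpg"><a href="http://cdn.test.com/dog.PNG">dog</a>'
puts find_image_urls(html, 'http://www.test.com/index.html')
# http://www.test.com/pics/cat.jpg
# http://cdn.test.com/dog.PNG
```

Only the parsing step is exercised here, so the sketch runs without network access; `fetch_page` and `save_image` would be called once per URL in a real run.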
Example usage
After installation, you can use the following code as an example:
require 'rubygems'
require 'image_downloader'
page_url = 'www.test.com'
target_path = 'img_dir/'
downloader = ImageDownloader::Process.new(page_url,target_path)
#####
# download every image found anywhere on the page (by regexp: anything that looks like an image URL)
downloader.parse(:any_looks_like_image => true)
##### or
# download images from all elements where images are usually placed (<img...>, <a...>, ...)
downloader.parse()
##### or
# download images from exact places on the page
downloader.parse(:collect => {:link_icon => true})
##### or
# download images by regexp
downloader.parse(:regexp => /[^'"]+\.jpg/i)
downloader.download()
The “parse” method accepts the following options:
# find all URLs that contain an image extension
:any_looks_like_image => true
# find images in the specified locations
:collect => {
:all => true, # all image places
:(img_src|a_href|style_url|link_icon) => true # specified location
}
# find by regexp
:regexp => /['"]([^'"]+\.jpg)[^'"]*['"]/i # for ruby 1.8 (in 1.9, capture groups () are not allowed for the scan method)
:regexp => /[^'"]+\.jpg/i # the same, but shorter
:regexp => /[^'"]+\.css/ # other files can also be downloaded
# ignore URLs with images according to given parameters
:ignore_without => {:(extension|image_extension) => true}
# set a custom User-Agent (very important for avoiding 403, 404... responses from the server)
:user_agent => "ruby" # Mozilla/5.0 by default
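The two forms of the :regexp option differ in whether they use a capture group, which changes what Ruby's String#scan returns: with a group it yields one array of captured groups per match, without one it yields the matched strings directly. A short illustration on a made-up HTML fragment follows; the sample markup is an assumption, and how the gem consumes the result is not shown.

```ruby
html = %q(<img src="/a/one.jpg?v=2"> <img src='two.JPG'>)

# With a capture group, scan returns one array of groups per match.
with_group = html.scan(/['"]([^'"]+\.jpg)[^'"]*['"]/i)
# Without a group, scan returns the matched substrings directly.
without_group = html.scan(/[^'"]+\.jpg/i)

p with_group     # [["/a/one.jpg"], ["two.JPG"]]
p without_group  # ["/a/one.jpg", "two.JPG"]
```

Calling `flatten` on the first result gives the same flat list as the second, which is why the shorter pattern is usually the more convenient one.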
Detailed location description
- img_src - tag: img, attribute: src="url"
- a_href - tag: a, attribute: href="url"
- style_url - tag: any, attribute: style="(background|background-image): url('url')"
- link_icon - tag: link, attribute: rel="shortcut icon" href="url"
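As a rough illustration of the four locations listed above, each one can be approximated by a regexp with a single capture group. The `LOCATION_PATTERNS` table and the `collect_urls` helper are hypothetical, written only for this example, and far less robust than a real HTML parser such as nokogiri (for instance, the link_icon pattern assumes rel comes before href). They only show which attribute each location name refers to.

```ruby
# Hypothetical patterns, one per location name; each captures the URL.
LOCATION_PATTERNS = {
  :img_src   => /<img\b[^>]*\bsrc=["']([^"']+)["']/i,
  :a_href    => /<a\b[^>]*\bhref=["']([^"']+)["']/i,
  :style_url => /\bstyle=["'][^>]*?background(?:-image)?\s*:[^>;]*?url\(\s*['"]?([^'")\s]+)/i,
  :link_icon => /<link\b[^>]*\brel=["']shortcut icon["'][^>]*\bhref=["']([^"']+)["']/i
}

# Collects the URLs found in the requested locations, in the order given.
def collect_urls(html, locations)
  locations.map { |name| html.scan(LOCATION_PATTERNS.fetch(name)).flatten }.flatten.uniq
end

html = <<-HTML
<link rel="shortcut icon" href="/favicon.ico">
<div style="background-image: url('bg.png')">
  <a href="big.jpg"><img src="small.jpg"></a>
</div>
HTML

p collect_urls(html, [:link_icon])         # ["/favicon.ico"]
p collect_urls(html, [:style_url])         # ["bg.png"]
p collect_urls(html, [:img_src, :a_href])  # ["small.jpg", "big.jpg"]
```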
The “download” method accepts the following options:
:parallel => true # multi-threaded downloading (the default when no options are given)
:consequentially => true # sequential downloading in a single thread
:user_agent => "ruby" # Mozilla/5.0 by default
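The difference between the two modes can be sketched as follows. The `each_download` helper is hypothetical and only mirrors the option names above; the block passed to it stands in for the real network request, so nothing is actually downloaded here.

```ruby
# Hypothetical helper: runs the block once per URL, either with one thread
# per URL (the default) or one after another when :consequentially is set.
def each_download(urls, options = {}, &block)
  if options[:consequentially]
    urls.map { |url| block.call(url) }
  else
    threads = urls.map { |url| Thread.new(url) { |u| block.call(u) } }
    # Thread#value waits for the thread to finish and returns the block's result.
    threads.map { |thread| thread.value }
  end
end

# Stand-in work: the block would normally fetch and save the image.
sizes = each_download(%w[a.jpg bb.jpg ccc.jpg]) { |url| url.length }
p sizes  # [5, 6, 7]

p each_download(%w[a.jpg bb.jpg], :consequentially => true) { |url| url.upcase }
# ["A.JPG", "BB.JPG"]
```

Results come back in the order of the input list in both modes, because the threads are joined in the order they were created.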
Executables
You can also use the bundled executables from the shell.
To download anything that looks like an image:
download_any_images url dir/
To download only the favicon:
download_icon url dir/
To download everything located in the usual places for images:
download_images url dir/
To download by regexp:
download_by_regexp url dir/ "[^'\"]+\\.js"
Debugging
“-d”, “--debug”
To monitor the download process, pass the -d flag. In some cases a URI::InvalidURIError may be raised.
download_images url dir/ -d
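URI::InvalidURIError comes from Ruby's standard URI parser and is raised when a string is not a valid URI, for example when it contains an unescaped space. The sketch below shows one way to skip such URLs instead of aborting; `parse_or_skip` is a hypothetical helper, and whether the gem handles the error this way is not stated here.

```ruby
require 'uri'

# Hypothetical helper: returns a parsed URI, or nil (with a warning on
# standard error) when the string is not a valid URI.
def parse_or_skip(url)
  URI.parse(url)
rescue URI::InvalidURIError => error
  warn "skipping #{url.inspect}: #{error.message}"
  nil
end

good = parse_or_skip('http://www.test.com/img/logo.png')
bad  = parse_or_skip('http://www.test.com/img/my logo.png')

puts good.path    # /img/logo.png
puts bad.inspect  # nil
```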
Copyright
Copyright © 2011 Malykh Oleg. See LICENSE.txt for further details.
License
The MIT License
Authors
Author's personal blog: Malykh Oleg (blog in Russian)