Image Downloader

Quite often you need to collect the pictures from a page on the Internet. This gem solves exactly that task.

Installation

sudo gem install image_downloader

Requirements

ruby 1.8 or 1.9
gem nokogiri

Description

Image Downloader is a fairly simple library that does the following:

fetches a web page (with Net::HTTP)
parses the HTML (with a regexp or nokogiri)
downloads the images (in one thread or in several)
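
The three steps above can be sketched with the standard library alone. This is only an illustration, not the gem's internal code: the method names (fetch, extract_image_urls, save_images) and the IMAGE_URL pattern are assumptions made for this example, and only the parsing step is actually run here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical pattern: anything that looks like a path ending in an image extension.
IMAGE_URL = /[^'"\s()]+\.(?:jpe?g|png|gif)/i

# Step 1: fetch a body over HTTP (performs a real request when called).
def fetch(url)
  Net::HTTP.get(URI(url))
end

# Step 2: collect image-like URLs from the HTML and make them absolute.
def extract_image_urls(html, page_url)
  html.scan(IMAGE_URL).uniq.map { |path| URI.join(page_url, path).to_s }
end

# Step 3: save every image into target_dir, one after another.
def save_images(urls, target_dir)
  urls.each do |url|
    file_name = File.basename(URI(url).path)
    File.binwrite(File.join(target_dir, file_name), fetch(url))
  end
end

html = '<img src="/img/logo.png"><a href="photos/cat.jpg">cat</a>'
puts extract_image_urls(html, 'http://www.test.com/')
# prints:
# http://www.test.com/img/logo.png
# http://www.test.com/photos/cat.jpg
```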

Example usage

After installation, you can use the following code as an example:

require 'rubygems'
require 'image_downloader'

page_url = 'www.test.com'
target_path = 'img_dir/'
downloader = ImageDownloader::Process.new(page_url,target_path)

#####
# download all images found anywhere on the page (by regexp: everything that looks like an image URL)
downloader.parse(:any_looks_like_image => true)

##### or
# download images from all elements where images are usually placed (<img...>, <a...>, ...)
downloader.parse()

##### or
# download images from exact places in the page
downloader.parse(:collect => {:link_icon => true})

##### or
# download images by regexp
downloader.parse(:regexp => /[^'"]+\.jpg/i)

downloader.download()

The “parse” method accepts the following options

# find all URLs that contain an image extension
:any_looks_like_image => true

# find images in specified location
:collect => {
  :all => true, # all image places
  :(img_src|a_href|style_url|link_icon) => true # specified location
}

# find by regexp
:regexp => /['"]([^'"]+\.jpg)[^'"]*['"]/i # for ruby 1.8 (in 1.9, () is not allowed because of the scan method)
:regexp => /[^'"]+\.jpg/i # the same, but shorter
:regexp => /[^'"]+\.css/  # other files can also be downloaded

# ignore URLs with images according to given parameters
:ignore_without => {:(extension|image_extension) => true}

# set your preferred User-Agent (very important for avoiding 403, 404... responses from the server)
:user_agent => "ruby" # Mozilla/5.0 by default
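
The note about () in the :regexp option is easier to follow once you see how String#scan treats capture groups: without groups it returns the matched strings, with groups it returns one array of captures per match. Whether this is the exact reason for the 1.8/1.9 remark is an assumption; the snippet only shows the standard scan behaviour on a made-up HTML string.

```ruby
SAMPLE_HTML = %q(<img src="a.jpg?v=1"> <img src='b.JPG'>)

# Without capture groups, scan returns the matched strings themselves.
PLAIN = SAMPLE_HTML.scan(/[^'"]+\.jpg/i)

# With a capture group, scan returns an array of captures for every match.
GROUPED = SAMPLE_HTML.scan(/['"]([^'"]+\.jpg)[^'"]*['"]/i)

p PLAIN            # => ["a.jpg", "b.JPG"]
p GROUPED          # => [["a.jpg"], ["b.JPG"]]
p GROUPED.flatten  # => ["a.jpg", "b.JPG"]
```

If you do need a group, flattening the result gives the same list of URLs as the shorter pattern.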

Detailed location description

img_src - tag: img, attribute: src="url"
a_href - tag: a, attribute: href="url"
style_url - tag: any, attribute: style="(background|background-image): url('url')"
link_icon - tag: link, attribute: rel="shortcut icon" href="url"
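
To make the four locations concrete, here is a rough regexp-only sketch that pulls a URL out of each of them. LOCATION_PATTERNS and collect_urls are hypothetical names invented for this example, and the patterns are simplified (for instance, link_icon only matches when rel comes before href); the gem itself may well do this differently, for example with nokogiri.

```ruby
# Hypothetical patterns, one per location name from the list above.
LOCATION_PATTERNS = {
  :img_src   => /<img\b[^>]*?\bsrc=["']([^"']+)["']/i,
  :a_href    => /<a\b[^>]*?\bhref=["']([^"']+)["']/i,
  :style_url => /style=["'][^"']*?background(?:-image)?:\s*url\(['"]?([^'")]+)['"]?\)/i,
  :link_icon => /<link\b[^>]*?\brel=["']shortcut icon["'][^>]*?\bhref=["']([^"']+)["']/i
}

# Returns the URLs found in the requested locations, in the order requested.
def collect_urls(html, locations)
  locations.flat_map { |name| html.scan(LOCATION_PATTERNS.fetch(name)).flatten }
end

SAMPLE_PAGE = <<-HTML
<link rel="shortcut icon" href="/favicon.ico">
<div style="background: url('bg.png')"></div>
<a href="big.jpg"><img src="small.jpg"></a>
HTML

p collect_urls(SAMPLE_PAGE, [:link_icon])
# => ["/favicon.ico"]
p collect_urls(SAMPLE_PAGE, LOCATION_PATTERNS.keys)
# => ["small.jpg", "big.jpg", "bg.png", "/favicon.ico"]
```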

The “download” method accepts the following options

:parallel => true # download in multiple threads (the default when no options are given)
:consequentially => true # download sequentially, in a single thread
:user_agent => "ruby" # Mozilla/5.0 by default
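
The difference between the two modes can be pictured roughly like this. fetch_all is a hypothetical helper written for this example, not part of the gem, and a lambda that sleeps stands in for the real HTTP request so the snippet runs without a network connection.

```ruby
# Hypothetical helper: calls the given block for every URL, either in one
# thread per URL (default) or one after another (:consequentially => true).
def fetch_all(urls, options = {}, &fetcher)
  if options[:consequentially]
    urls.map { |url| fetcher.call(url) }
  else
    # Thread#value waits for the thread and returns the block's result,
    # so the results come back in the same order as the URLs.
    urls.map { |url| Thread.new { fetcher.call(url) } }.map(&:value)
  end
end

# Stand-in for a real download.
fake_fetch = lambda { |url| sleep 0.1; "data from #{url}" }
urls = %w[a.jpg b.jpg c.jpg]

p fetch_all(urls, &fake_fetch)                            # threads, sleeps overlap
p fetch_all(urls, :consequentially => true, &fake_fetch)  # one by one
```

Both calls return the same list; the threaded one simply spends less time waiting, which is presumably why parallel downloading is the default.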

Executables

You can also simply use the bundled shell commands:

To download anything that looks like an image

download_any_images url dir/

To download only the favicon

download_icon url dir/

To download everything located in the usual places for pictures

download_images url dir/

To download by regexp

download_by_regexp url dir/ "[^'\"]+\\.js"

Debugging

“-d”, “--debug”

To monitor the download process, pass the -d flag. In some cases a URI::InvalidURIError may be raised.

download_images url dir/ -d
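
This README does not say which URLs trigger URI::InvalidURIError. One common cause in Ruby is a URL containing a space or another unescaped character. The sketch below uses only the standard uri library to show the error and one way to skip such URLs; parseable_urls is a hypothetical helper, not something the gem provides.

```ruby
require 'uri'

# Hypothetical helper: keeps only the URLs that URI.parse accepts.
def parseable_urls(urls)
  urls.select do |url|
    begin
      URI.parse(url)
      true
    rescue URI::InvalidURIError
      false
    end
  end
end

p parseable_urls(['http://www.test.com/a.jpg', 'http://www.test.com/my photo.jpg'])
# => ["http://www.test.com/a.jpg"]
```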