TitleGrabber

Grabs page and article titles from lists of URLs stored in text files passed in as arguments.

Installation

Add this line to your application's Gemfile:

gem 'title_grabber'

And then execute:

$ bundle

Or install it yourself as:

$ gem install title_grabber

Usage

Pass it one or more comma-separated paths to text files containing URLs (one URL per line):

title-grabber -f /abs/path/2/file1.txt,rel/path/2/file2.txt
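
If your URLs live in a Ruby program rather than in a file, a short script can produce a suitable input file. This is only a sketch: write_url_list and the urls.txt file name are hypothetical and not part of title_grabber; the only thing taken from this README is the one-URL-per-line layout.

```ruby
require "tmpdir"

# Hypothetical helper (not part of title_grabber): writes URLs to a text
# file, one per line, which is the layout the -f/--files inputs use.
def write_url_list(path, urls)
  # Trim whitespace, drop blank entries and duplicates.
  cleaned = urls.map(&:strip).reject(&:empty?).uniq
  File.write(path, cleaned.map { |url| "#{url}\n" }.join)
  cleaned.size
end

urls = [
  "https://example.com/",
  "https://example.org/article ",
  "",
  "https://example.com/"
]

# The temp directory is used here only to keep the example side-effect free.
path = File.join(Dir.tmpdir, "urls.txt")
count = write_url_list(path, urls)
puts "Wrote #{count} URLs to #{path}"
```

The resulting file can then be passed to title-grabber with -f.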

Results are written to out.csv in the current working directory, or to the file given with the -o/--output argument, e.g.

title-grabber -o ~/output.csv -f /abs/path/2/file1.txt,rel/path/2/file2.txt
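
Once a run has finished, the output file can be inspected with Ruby's standard CSV library. The sketch below is an assumption-laden illustration: read_rows is a hypothetical helper, and because this README does not document the column layout, rows are treated as plain arrays with no header names assumed. Replace the path with whatever you passed to -o.

```ruby
require "csv"

# Hypothetical helper (not part of title_grabber): loads every row of the
# output file as an array of strings. No column names are assumed.
def read_rows(path)
  CSV.read(path)
end

# out.csv is the documented default output file name.
path = "out.csv"
if File.exist?(path)
  read_rows(path).each { |row| puts row.inspect }
else
  warn "#{path} not found; run title-grabber first"
end
```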

List all available CLI switches and environment variables:

title-grabber -h
Usage: title-grabber [options]
    -V, --version                    Print program version and exit
    -f, --files /f1.txt,f2.txt       1 or more comma-separated paths to text files containing 1 URL per line
    -o, --output FILE                Output file. Defaults to out.csv
        --connect-timeout TIMEOUT    HTTP Connect timeout. Defaults to the value of the CONNECT_TIMEOUT env var or 15
        --read-timeout TIMEOUT       HTTP Read timeout. Defaults to the value of the READ_TIMEOUT env var or 15
        --write-timeout TIMEOUT      HTTP Write timeout. Defaults to the value of the WRITE_TIMEOUT env var or 15
        --max-redirects REDIRECTS    Max. # of HTTP redirects to follow. Defaults to the value of the MAX_REDIRECTS env var or 5
    -r, --max-retries RETRIES        Max. # of times to retry failed HTTP reqs. Defaults to the value of the MAX_RETRIES env var or 5
    -t, --max-threads THREADS        Max. # of threads to use. Defaults to the value of the MAX_THREADS env var or the # of logical processors in the system
    -d, --debug                      Log to STDOUT instead of to a file in the CWD
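
The help text above describes each numeric option as falling back to an environment variable and then to a built-in value. A minimal sketch of that lookup order follows; int_from_env is a hypothetical helper and this is not the gem's actual implementation, only an illustration of the documented defaults.

```ruby
require "etc"

# Hypothetical helper (not part of title_grabber): returns the integer value
# of an environment variable, or the fallback when the variable is unset.
# Raises ArgumentError if the variable is set to a non-integer string.
def int_from_env(name, fallback, env = ENV)
  Integer(env.fetch(name, fallback.to_s), 10)
end

# Fallback values are the ones listed in the help output.
defaults = {
  connect_timeout: int_from_env("CONNECT_TIMEOUT", 15),
  read_timeout:    int_from_env("READ_TIMEOUT", 15),
  write_timeout:   int_from_env("WRITE_TIMEOUT", 15),
  max_redirects:   int_from_env("MAX_REDIRECTS", 5),
  max_retries:     int_from_env("MAX_RETRIES", 5),
  # Etc.nprocessors reports the number of logical processors available.
  max_threads:     int_from_env("MAX_THREADS", Etc.nprocessors)
}

defaults.each { |name, value| puts "#{name}: #{value}" }
```

In practice this means a value such as READ_TIMEOUT can be exported once in the shell, while an explicit --read-timeout switch still applies to a single run.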

Development

After checking out the repo, run bin/setup to install dependencies. Then run rake test (or plain rake, whose default task runs the test suite) to make sure all tests pass. You can also run bin/console for an interactive prompt that lets you experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, then run bundle exec rake release, which creates a git tag for the version, pushes git commits and tags, and pushes the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/cristian-rasch/title_grabber.

License

The gem is available as open source under the terms of the MIT License.