TitleGrabber
Grabs page and article titles from lists of URLs stored in the files passed as arguments.
Installation
Add this line to your application's Gemfile:
gem 'title_grabber'
And then execute:
$ bundle
Or install it yourself as:
$ gem install title_grabber
Usage
Pass it a comma-separated list of files, each containing one URL per line:
title-grabber -f /abs/path/2/file1.txt,rel/path/2/file2.txt
Results are written to out.csv in the current working directory, or to the file given with the -o/--output option, e.g.:
title-grabber -o ~/output.csv -f /abs/path/2/file1.txt,rel/path/2/file2.txt
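If the URL lists are generated by a script, the input file and the command line can be assembled in Ruby. The sketch below is only an illustration: write_url_list and title_grabber_argv are hypothetical helpers, not part of the gem, and the stripping of blank and duplicate entries is a choice made here rather than documented behavior of title-grabber. Only the -f and -o switches shown above are used.

```ruby
require "tmpdir"

# Hypothetical helper (not part of the gem): writes one URL per line,
# dropping surrounding whitespace, blank entries and duplicates.
# Returns the number of URLs written.
def write_url_list(path, urls)
  cleaned = urls.map(&:strip).reject(&:empty?).uniq
  File.write(path, cleaned.join("\n") + "\n")
  cleaned.size
end

# Hypothetical helper (not part of the gem): builds the argument vector
# using the documented -f (comma-separated files) and -o (output) switches.
def title_grabber_argv(files, output: nil)
  argv = ["title-grabber", "-f", files.join(",")]
  argv += ["-o", output] if output
  argv
end

dir  = Dir.mktmpdir
list = File.join(dir, "urls.txt")
write_url_list(list, ["https://example.com/", " https://example.org/ ", ""])

argv = title_grabber_argv([list], output: File.join(dir, "titles.csv"))
puts argv.join(" ")
# To actually run it (requires the gem to be installed): system(*argv)
```

Passing the arguments to system as separate strings avoids shell quoting problems when a path contains spaces.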
To list all available CLI switches and environment variables, run:
title-grabber -h
Usage: title-grabber [options]
    -V, --version                  Print program version and exit
    -f, --files /f1.txt,f2.txt     1 or more comma-separated paths to text files containing 1 URL per line
    -o, --output FILE              Output file. Defaults to out.csv
        --connect-timeout TIMEOUT  HTTP Connect timeout. Defaults to the value of the CONNECT_TIMEOUT env var or 15
        --read-timeout TIMEOUT     HTTP Read timeout. Defaults to the value of the READ_TIMEOUT env var or 15
        --write-timeout TIMEOUT    HTTP Write timeout. Defaults to the value of the WRITE_TIMEOUT env var or 15
        --max-redirects REDIRECTS  Max. # of HTTP redirects to follow. Defaults to the value of the MAX_REDIRECTS env var or 5
    -r, --max-retries RETRIES      Max. # of times to retry failed HTTP reqs. Defaults to the value of the MAX_RETRIES env var or 5
    -t, --max-threads THREADS      Max. # of threads to use. Defaults to the value of the MAX_THREADS env var or the # of logical processors in the system
    -d, --debug                    Log to STDOUT instead of to a file in the CWD
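Several of the options above fall back to an environment variable and then to a built-in default. The following is a minimal sketch of that precedence, not the gem's actual implementation: resolve_setting and DEFAULTS are hypothetical names, and Etc.nprocessors is assumed to be a reasonable stand-in for "the # of logical processors in the system".

```ruby
require "etc"

# Illustrative defaults taken from the help text above.
DEFAULTS = {
  "CONNECT_TIMEOUT" => 15,
  "READ_TIMEOUT"    => 15,
  "WRITE_TIMEOUT"   => 15,
  "MAX_REDIRECTS"   => 5,
  "MAX_RETRIES"     => 5,
  "MAX_THREADS"     => Etc.nprocessors # assumption: logical processor count
}.freeze

# Hypothetical helper: an explicit CLI value wins, then the environment
# variable, then the built-in default. Raises ArgumentError if the chosen
# value is not a valid integer.
def resolve_setting(name, cli_value = nil, env = ENV)
  return Integer(cli_value) unless cli_value.nil?

  Integer(env.fetch(name) { DEFAULTS.fetch(name) })
end

puts resolve_setting("CONNECT_TIMEOUT", nil, {})
puts resolve_setting("CONNECT_TIMEOUT", nil, { "CONNECT_TIMEOUT" => "30" })
puts resolve_setting("CONNECT_TIMEOUT", "5", { "CONNECT_TIMEOUT" => "30" })
```

In practice this means a setting such as MAX_THREADS can be exported once in the shell environment and still be overridden for a single run with -t.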
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Run rake (the default task runs the test suite) to make sure all tests pass.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/cristian-rasch/title_grabber.
License
The gem is available as open source under the terms of the MIT License.