pdfmd
PDF metadata managing script.
I use the script pdfmd.rb/pdfmetadata (with a slightly different name) to manage my PDF documents and keep the naming in line.
Even when they are hidden deep in the directory structure of my disks, I can quickly find the documents I need with a quick find /document/path -type f -iname '*<keyword>*', which matches the keyword anywhere in the filename.
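The same lookup can be sketched in Ruby. This is only an illustration of the filename matching, not part of pdfmd: the helper name find_documents and the keyword 'invoice' are made up for the example, and the match is a plain case-insensitive substring test on the filename.

```ruby
require 'find'

# Sketch: return all regular files below `root` whose filename contains
# `keyword`, ignoring case (similar to `find root -type f -iname '*keyword*'`
# for keywords without wildcard characters).
def find_documents(root, keyword)
  return [] unless File.directory?(root)

  needle = keyword.downcase
  matches = []
  Find.find(root) do |path|
    next unless File.file?(path)

    matches << path if File.basename(path).downcase.include?(needle)
  end
  matches.sort
end

# Hypothetical usage: prints one matching path per line (nothing if none).
puts find_documents('/document/path', 'invoice')
```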
Requirements
Although the requirements are listed in the script itself as well (header documentation!), here they are again:
Ruby Gems
Install the requirements as usual:
$ gem install thor
$ gem install highline
$ gem install fileutils
$ gem install i18n
$ gem install pathname
$ gem install logger
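To check quickly whether the libraries listed above can be loaded, something like the following sketch may help. It is not part of pdfmd; the helper name missing_libraries is made up for the example, and it assumes each gem is loaded under the same name it is installed with.

```ruby
# Libraries from the list above (assumed to be required under the gem name).
REQUIRED_LIBRARIES = %w[thor highline fileutils i18n pathname logger].freeze

# Return the names that cannot be loaded in the current Ruby environment.
def missing_libraries(names)
  names.reject do |name|
    begin
      require name
      true
    rescue LoadError
      false
    end
  end
end

missing = missing_libraries(REQUIRED_LIBRARIES)
if missing.empty?
  puts 'All required libraries can be loaded.'
else
  puts "Missing: #{missing.join(', ')} (try: gem install <name>)"
end
```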
Platforms
Fedora 21/CentOS 7
- Install the dependencies (required to install the rmagick gem)
$ sudo yum install -y rubygems rubygems-devel gcc ImageMagick ruby-devel ImageMagick-devel
- Install gem
$ gem install pdfmd
Ubuntu 14.04 LTS
- Install the dependencies
$ sudo apt-get install -y rubygems-integration imagemagick libmagickwand-dev ruby-dev
- Install gem
$ sudo gem install pdfmd
Applications
- exiftool is usually already available in your OS repositories.
$ sudo yum install perl-Image-ExifTool
- Hiera can optionally be used to configure some default settings (instead of a configuration file).
$ gem install hiera
Usage
The usage is quite simple:
$ ./pdfmd.rb [show|edit|rename|sort] [options] <filename>
The interface has been set up using Thor.
So to get more information, just run the relevant help command:
$ pdfmd # General information
$ pdfmd help <action> # Command specific help
My usual workflow is like this:
$ cd /my/pdf/directory # Step 1
$ pdfmd show test.pdf # Step 2
$ pdfmd edit -t all -r test.pdf # Step 3
$ pdfmd sort . # Step 4
- Step 1: Change into the directory with the mess of PDF documents. This is where all the files from the previous scanning end up.
- Step 2: A quick look at the currently set metadata does not hurt. If I find the metadata already in order, I skip this document.
- Step 3: For each document I update the PDF metadata to the settings I prefer. The command pdfmd explain <topic> explains what the values are used for. Some parameters like -r are actually omitted on my systems, because they have been set by Hiera.
- Step 4: In the end I sort all documents according to their metadata into the correct subdirectories. The parameter -d is set from Hiera and makes sure the files end up where they are supposed to be.
There's an underlying logic in the renaming and sorting of the files according to the metadata. Make sure you read at least the help-information before you use it or it might be confusing.
It's also useful to define some default settings in Hiera to avoid unnecessary typing.
HINT: Before you start using the script, make sure you have a backup of your files or you know what you're doing. If you lose information/files I will not be able to help you.
Password protected files
pdfmd recognises if a PDF file is password protected and will ask for the password.
A password string can be defined in Hiera that will be used by default.
Hiera
In order for Hiera to provide (default) configuration data, set up a configuration hash, e.g. inside the YAML backend:
pdfmd::config:
  default:
    password    : xxxxxxxxxx
  sort:
    destination : /data/tmp
    copy        : true
    interactive : false
  rename:
    #allkeywords : true # Does not make sense in combination with _keywords_
    keywords    : 2
    outputdir   : /data/output/sorted
    copy        : true
  edit:
    rename      : true
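To illustrate how the hash above is nested (one sub-hash per command, with the settings as keys), here is a minimal Ruby sketch that parses the same YAML directly and looks up a setting with a fallback. It does not show how pdfmd itself queries Hiera; the helper name setting and the inlined copy of the data are assumptions made for the example.

```ruby
require 'yaml'

# Inlined copy of the example Hiera data (the commented-out key is left out).
HIERA_DATA = <<~YAML
  pdfmd::config:
    default:
      password    : xxxxxxxxxx
    sort:
      destination : /data/tmp
      copy        : true
      interactive : false
    rename:
      keywords    : 2
      outputdir   : /data/output/sorted
      copy        : true
    edit:
      rename      : true
YAML

# Return the value of `key` in `section`, or `fallback` if it is not set.
def setting(config, section, key, fallback = nil)
  config.fetch(section, {}).fetch(key, fallback)
end

CONFIG = YAML.safe_load(HIERA_DATA).fetch('pdfmd::config')

puts setting(CONFIG, 'sort', 'destination')
puts setting(CONFIG, 'rename', 'allkeywords', false)
```

A setting that is missing from the hash simply falls back to the value passed as the last argument.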
Information about which Hiera configuration settings are available can be found either in pdfmd help <command> or in pdfmd explain hiera.
Test your Hiera configuration with
$ hiera pdfmd::config
Contact
If you have improvements or suggestions, let me know. If you can help me write tests for this, please let me know as well.