SP Services Bill Scanner <img src=“https://secure.travis-ci.org/tardate/sps_bill_scanner.png” />
Extracts bill details from SP Services PDF bills so that you can, um, do geeky data analysis n’stuff, and because I loathe data entry! One day we’ll have smart meters and SP Services will let us download our raw meter data. But until then…
If you are an SP Services subscriber, download your bills from services.spservices.sg
If you are not an SP Services subscriber, this gem ain’t going to be much use for you!
Some example analysis using R is included in the scripts
folder. The inspiration for hacking away with R comes from reading Sau Sheong’s new book Exploring Everyday Things with Ruby and R. Check it out!
Requirements and Known Limitations
-
Requires Ruby 1.9 (1.8 compatibility not tested or assured)
-
R is used for data analysis examples - this is optional
-
Currently it does not handle multi-property bills.
-
May not handle extensive multi-page bills for a single property correctly.
If you do come across bills that this gem can’t read correctly, your help to get it fixed is greatly needed: either submit a fix yourself, or report the problem at github.com/tardate/sps_bill_scanner/issues
Help Required: Test this with your own bills
Unfortunately, there isn’t a definitive set of bill examples that make it possible to ensure that this gem will work for every bill issued by SP Services. And since bills are in PDF format, extracting the data in a structured manner is a task fraught with pitfalls.
I have tested this with my own bills (going back over a year) but that definitely doesn’t mean it will work for others.
So I need help from others who are willing to test with their own bills. I’m not asking for your bills as that raises privacy concerns, and I definitely don’t want real bills committed to the git repository.
Instead, I have setup the tests in a way that should make it easy for you to test with your own bills.
Here’s the basic outline:
-
First, make sure tests are running green for you ‘as-is’:
-
fork or clone the repo
-
bundle
will install development dependencies -
rake
will run the tests - they should all be OK
-
-
Get you own PDF bills from from services.spservices.sg
-
put them in
spec/fixtures/personal_pdf_samples
-
NB: these are ignored by git so you won’t accidentally commit them
-
-
At this point you can run the tests with
rake
and it will do a very basic check of the PDFs you have added -
To run complete checks to ensure all the data is being extracted correctly:
-
see the doc in
spec/fixtures/personal_pdf_samples/expectations.yml.sample
-
copy this to
spec/fixtures/personal_pdf_samples/expectations.yml
-
enter in the details that describe each bill you have added
-
now when you run
rake
it will also verify the data extracted from your bills usingexpectations.yml
-
Feel free to get in touch or discuss in the github issues area if you are trying to help but run into problems with this!
If you are more interested in the data analytics, I’m keen to add more interesting R scripts to the collection. Your contributions are most welcome.
Installation
gem install sps_bill
Command Line Usage
Once the gem is installed, use sps_bill
at the command line to interact with the library manually.
To get help on command options:
$ sps_bill -h
For example: to extract all data in CSV format from a set of PDF bills:
$ sps_bill --data=all ./path_to/my_bills*.pdf
Programmatic Usage
You can use the gem from your own scripts or applications. There are just two classes you really need to understand:
-
SpsBill::BillCollection is an Array-like class that represents a collection of bills.
-
the
load
method is used to initialise it given a path or array of filenames -
a range of collection methods are provided to extract sets of data (e.g.
electricity_usages
)
-
-
SpsBill::Bill represents an individual bill
-
initialised given a file name
-
provides a range of accessors to get at individual data elements (e.g.
electricity_usage
)
-
To load a collection of bills:
> require 'sps_bill'
> bills = SpsBill::BillCollection.load('./my_bills/*.pdf')
> bills.total_amounts
=> [["2011-10-01", 168.86], ["2011-11-01", 196.46], ["2011-12-01", 176.54]]
> bills.electricity_usages
=> [["2011-10-01", 14.0, 0.2728, 3.82], ["2011-10-01", 444.0, 0.2698, 119.79],
["2011-11-01", 2.0, 0.2728, 0.54], ["2011-11-01", 537.0, 0.2698, 144.88],
["2011-12-01", 482.0, 0.2698, 130.04]]
> bills.gas_usages
=> [["2011-10-01", 12.0, 0.1961, 2.35], ["2011-11-01", 12.0, 0.2117, 2.54], ["2011-12-01", 12.0, 0.2117, 2.54]]
> bills.water_usages
=> [["2011-10-01", 8.4, 1.17, 9.83], ["2011-11-01", 11.4, 1.17, 13.34], ["2011-12-01", 9.6, 1.17, 11.23]]
To load and examine a specific bill:
> require 'sps_bill'
> pdf_bill_file = "./my_latest_bill.pdf"
> bill = SpsBill::Bill.new(pdf_bill_file)
> bill.account_number
8123123123
> bill.total_amount
251.44
> bill.invoice_date
2011-05-31
> bill.invoice_month
2011-05-01
> bill.electricity_usage
[{:kwh=>4.0, :rate=>0.241, :amount=>0.97},{:kwh=>616.0, :rate=>0.2558, :amount=>157.57}]
> bill.gas_usage
[{:kwh=>18.0, :rate=>0.1799, :amount=>3.24}]
> bill.water_usage
[{:cubic_m=>36.1, :rate=>1.17, :amount=>42.24},{:cubic_m=>-3.0, :rate=>1.4, :amount=>-4.2}]
> bill.to_s
Account number: 8123123123
Invoice date : 2011-10-31
Service month : 2011-10-01
Total bill : $168.86
Electricity Usage
-----------------
[{:kwh=>14.0, :rate=>0.2728, :amount=>3.82}, {:kwh=>444.0, :rate=>0.2698, :amount=>119.79}]
Gas Usage
---------
[{:kwh=>12.0, :rate=>0.1961, :amount=>2.35}]
Water Usage
-----------
[{:cubic_m=>8.4, :rate=>9.83, :amount=>0.0}, {:cubic_m=>1.17, :rate=>0.0, :amount=>0.0}]
Data Analysis with R
Some examples of bill data and analysis using R are included in the scripts
folder.
sample scripts
- scripts/scan_all_bills.sh
-
an example script to scan a set of bills and produce a csv file for analysis
- scripts/full_analysis.R
-
an example script that prepares a one-page PDF summary analysis
sample data and analysis
- data/all_services.csv.sample
-
sample CSV data for a years worth of elec, gas, and water
- data/all_services.sample.pdf
-
PDF analysis produced by
full_analysis.R
using theall_services.csv.sample
data set. - data/elec_and_water_only.csv.sample
-
sample CSV data for a years worth of elec and water
- data/elec_and_water_only.sample.pdf
-
PDF analysis produced by
full_analysis.R
using theelec_and_water_only.csv.sample
data set.
example run
./scan_all_bills.sh ../path_to_my_bills/*.pdf > my_bill_data.csv
./full_analysis.R my_bill_data.csv
This will have produced an analysis of all your bills in full_analysis.pdf
.
Contributing to sps_bill_scanner
-
Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet
-
Check out the issue tracker to make sure someone already hasn’t requested it and/or contributed it
-
Fork the project
-
Start a feature/bugfix branch
-
Commit and push until you are happy with your contribution
-
Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.
-
Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
Copyright
Copyright © 2012 Paul Gallagher. See LICENSE.txt for further details.