For command line usage see the bottom of this readme.

This archive contains a couple of ruby scripts. The main script
will decode most mono-alphabetic simple substitution ciphers
by using a shotgun-hill climbing algorithm. It works by
selecting a random key and scoring the result, then making small
adjustments to the key that improve the score (which is the
hill climbing part). If it can't find any obvious adjustments
that will improve the score, it will make a small
random change to the key and then start hill climbing again.
The algorithm will do this 10 times after the score plateaus, at
which point if it can't come up with a better score, the
program gives up (assuming that it has either reached
a dead end or that it has the answer). Then the program
will generate a new, completely random key and start the process
over from the beginning, (this is the shotgun part of the
algorithm). It's a little like trying to find the highest peak
of a long polynomial expression where you can't plot the line
ahead of time. The tactic here would be, pick a random point
on the line, start climbing the hill until you can't climb any
higher, then pick another random point on the line and start
climbing again. If you have a curve with a small number of
relatively uniformly distributed peaks, than this
is a moderately efficient, albeit uncertain way of
ascertaining the correct answer.

This program has no way of knowing whether or not it has hit
upon the correct answer, so it will keep on looking for better
scores until the user hits CTRL-C to end the program. It's
up to the user to decide whether the answer the program has
come up with makes any sense or not. The scoring algorithm
is really the heart of this program. Basically the program
has a dictionary of tetra graph frequencies, that is, how
often do these four letters appear next to one another in the
English language. Using this number as a guide, it produces
the score by taking the log of each of the tetra graph
frequencies in the cipher text and adds them all together to
produce a score. adding up the logs of the frequencies to
produce the score rather than adding up the scores themselves
appears to be important in making the program able to score
plain text higher than gibberish. I didn't realize this the
first time I wrote this program and didn't get very good
results. I should note that I took somebody else's code as
a guide in producing these ruby scripts. I've modified the
strategy slightly from the original author's, and obviously
I translated it from C to Ruby. I'm very grateful to the
original author for sharing his C code with the world. I
would have had a hard time getting this program working
without using his program as a guide, In particular taking
the logs of the scores would not have occurred to me.

This script runs pretty slowly, owing mostly to the fact that
I've written it as a ruby script. I've seen other implementations
of this idea written in C which run orders of magnatude faster
than this implementation does. I plan to retranslate this code
into several languages to see how well they can run it.

At any rate, included in this archive is the following
1. the substitution solver script which will over time,
(hopefully) recover the plain text.
2. a sample tetra graph dictionary which was generated using
"Harry Potter and the Goblet of Fire" as it's source txt,
(yes I own a legal copy of the book)
3. a script which will generate a tetra graph dictionary from
an ascii text file.
4. this readme.
5. an inspector script which will show the contents of the
english.dic file in a readable format.

This program is distributed under the GNU General Public
License. If you would like to use this source code in
your own programs, or rewrite it or whatever, just be
sure to read the GNU GPL so that you know your rights
and responsibilities.

examples of usage

to decrypt a cipher text in an ascii text file
ruby substitution_solver.rb cipher.txt

to generate an english.dic file using another source use
ruby dictionary_builder.rb novel.txt

finally I'm including a script which will print out the english.dic
file in a readable form. It's interesting to note that the
frequencies increase exponentially as you go down the list.

example
ruby dictionary_inspector.rb > output.txt
or
ruby dictionary_inspector.rb | less
or just plain old
ruby dictionary_inspector.rb