Decoder Ring Cracker

April 24, 2010, 3:27 p.m.


Anyone who has seen "A Christmas Story" or those newspaper cryptograms is familiar with the idea of a decoder ring. There are many implementations of Substitution Ciphers throughout history. Wikipedia can fill you in, I'm not going to post a history lesson here.

Recently I cranked out a solution to a decoder ring challenge and thought that I would share. The download link is at the bottom of this post.

    Rough overview:
  1. Create a set of words from the cipher text. This eliminates processing repeated words.
  2. Create a list from the set sorted in reverse order of word length
  3. Start at the root word and find words of the same length in the language file
  4. Use the words in this list to fill in the key. Discard if they key cannot fit the word due to conflicting letters e.g. 'ABCD' could not decode to 'FOOT' because B and C would be O.
  5. If the key fits the word, move to the next cipher text word in the list.
  6. Use the ring to decode this next cipher text word as much as possible. Regex match the language dictionary to find partial matching words which are candidates to fill in the gaps.
  7. Use the partial matching word list and start at 4 again until tree depth is the same as word list length or the ring is solved completely.


Example Decoder Ring:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
L H E V S B Y W U G O T Z A D X R I K J M F C Q N P

Using the ring above you can decode any message into its original format. For example "BCAAK HKQAO." would translate using the ring above to "HELLO WORLD."

My program would not do well cracking a message that short. There are far too many 5 letter words even if you eliminate those which cannot fit due to repeat letters and such, which I do. There are still dozens of solutions like "GUCCI PINCH" and "DITTO ROUTE."

By changing the cipher text to "BCAAK HKQAO CPNJJCQNLC." You immediately get 3 results: "(JELLO|HELLO|CELLO) WORLD EXAGGERATE." There are still 3 equally likely (to a machine) candidates for the solution. Add another word and word like "BCAO" or "HELD" and it will return a single solution.

The example above assumes you are running my program with the "-m=0" argument which will prevent any missed words from being skipped. That is usually only a good idea for very short messages. The default value of "-m=1" is a lot more practical for everyday use. It will allow the program to continue solving for cipher text which contains unknown proper nouns, slang, and misspellings. Values of m greater than 1 will find a lot of nonsense words and may lead to a large amount of garbage output, especially on shorter cipher texts. If there are a lot of solutions with garbage words in them with the default of m=1, or the cipher text is very short; then try m=0.


Usage: python cracker.py languagefile [-d] [-m=N] [-g] file1 [file2] [filen]
-d Debug. Default is no debug information.
-m=N Allow Skipping of N consecutive non-dictionary words in scan.
This is usefull for proper nouns. Default is 1.
-g Greedy scan. Once a potential solution is found,
stop exploring for alternatives. Default is comprehensive scan.

Download This download includes all of the source code as well as a "words" file that contains a decent English dictionary. The testfiles folder contains a test cases you can run the program against to uncover the contents.


tags( #code #python )

Code

Various Coding and Scripting Endeavors

Life

Whats going on generally.

Neat Stuff

Things I have found useful or interesting on the internet, for what it is worth.

Follow caller9com on Twitter