dimanche 13 juin 2010

Decoding vigenère

I was reading Simon Singh book "The Code Book" when I stumbled accross Vigenère, I had time so I decided to code something to break Vigenère.
Here I'll present the method used to break Vigenère and release the code. The code is incomplete! I didn't code the decoding routine completeoy since I realize that decoding Vigenère is useless nowadays and I did the most interesting part : the kasisky algorithm.
The main interest of Vigenère code breaking is about elaborating the needed algorithms.

Vigenère code breaking go as follow :
- Find the key size using kasisky examination or index of coincidence
- Frequency analysis on each subtext : nbSubtext = szKey

I chose to use the kasisky examination for searching the key size.
I elaborated a recursive algorithm to build my dictionnary of ALL possible words of any size in the crypted text using a binary tree.

The algorithm steps are as follow :
begin
    size = 1

    begin loop_word_size
        begin loop_word_add
             dict_word_add (crypted, szWord)
             crypted = crypted + 1
        goto loop_word_add as long as we aren't at end of crypted
    size = size + 1
goto loop_word_size as long as size != szCrypted

The pseudo code is iterative.
The recursive algorithm is in dict_word_add().

With a 512 bytes word, you have more than 130k possible words all uniques, no duplicates.
We track offsets, and counts of each words

After finding the key size, we do a frequency analysis of each subtext.
The only part missing to have a complete Vigenère code breaking is a reliable (and simple) algorithm to shift the frequencies to match. When that's done, you have the correspondance between the characters and can thus decode the ciphered text without the key.

In the sources I present, I sorted the frequencies in order to show them on screen. It isn't a step of the decoding.

There is a mitigation against Vigenère code breaking, use a key as long as the text, or the Vernam Code which is more commonly known as the one time pad code. The problem with one time pad code is the management and distribution of keys.


The code isn't clean at all but I release it for interested people as if I don't, it will be forgotten in my hard drive.

Hope you enjoy it.

m_101

- Source code : http://rapidshare.com/files/398364832/cryptanalysis.tar.gz.html
MD5: CF5BA403FB9D9106FDF51D5AEB33C526

2 commentaires :

  1. Hi, you can upload the file again?

    Thanks!

    RépondreSupprimer
  2. Hello,

    Sorry, I don't think I still have it. If I find it, I'll re-upload it.
    From what I recall, the code was not very advanced though.
    May need to recode that :p.

    Cheers,

    RépondreSupprimer