MD5 in Java

MD5 is a hashing algorithm invented by Ronald Rivest in 1991. A hashing function is used to make a short (usually constant-length) checksum from an arbitrary amount of binary input data. A good one has to meet several criteria:

  1. It must be deterministic; that is, the same input always yields the same output.
  2. Its results must be uniformly distributed over the possible outputs (the "hashing address space") regardless of any bias in the input.
  3. It must be unlikely to get the same result for two different inputs chosen at random.
  4. More importantly, it must be hard or impossible to deliberately figure out which input generates a certain output.
  5. It must be just as hard to deliberately find two different inputs that generate the same output.
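The first and third criteria are easy to observe with the MD5 that ships in the JDK. This is only a small sketch using `java.security.MessageDigest`, not my own implementation; the class name `HashProperties` and the helper `md5Hex` are made up for illustration, and I am assuming the text is hashed as UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashProperties {
    // Hashes the UTF-8 bytes of a string with the JDK's MD5 and returns lowercase hex.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                // Mask with 0xFF so negative bytes do not sign-extend.
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
        System.out.println(md5Hex("abc")); // same input, same output
        System.out.println(md5Hex("abd")); // one letter changed, unrelated-looking output
        System.out.println(md5Hex("").length() * 4 + " bits"); // 128 bits
    }
}
```

The output length stays at 32 hex digits no matter how long the input is, which is the "constant-length" part.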

Aside from a flaw in point 5 (practical ways of constructing two colliding inputs were published in 2004), MD5 fulfills all these. I'd repost the algorithm here, but it's insanely complex and not really intuitive. I've seen hashing analogized to "putting all the bits in a pot and stirring well", and that's roughly what this is. It takes a table of fixed, arbitrary-looking constants, then adds, rotates and otherwise manipulates the input bytes with bitwise operations, and returns 128 bits as output. You can see the pseudocode on Wikipedia.
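To give a feeling for the "stirring", here is a rough sketch of two building blocks as RFC 1321 describes them: the table of 64 constants, which are taken from the sine function, and a single step of the first round. It is not a complete MD5 and not my actual code; the names `Md5Pieces`, `sineTable`, `f` and `step` are invented for this example.

```java
class Md5Pieces {
    // The 64 additive constants: floor(2^32 * |sin(i + 1)|), with i + 1 in radians.
    static int[] sineTable() {
        int[] k = new int[64];
        for (int i = 0; i < 64; i++) {
            // Go through long first: casting the double straight to int would
            // clamp every value above Integer.MAX_VALUE instead of wrapping it.
            k[i] = (int) (long) (Math.abs(Math.sin(i + 1)) * 4294967296.0);
        }
        return k;
    }

    // Round 1 mixing function F: picks bits from c where b is 1, from d where b is 0.
    static int f(int b, int c, int d) {
        return (b & c) | (~b & d);
    }

    // One round 1 step: a = b + ((a + F(b,c,d) + x + t) <<< s).
    // int addition wraps modulo 2^32, which is what the algorithm wants.
    static int step(int a, int b, int c, int d, int x, int s, int t) {
        return b + Integer.rotateLeft(a + f(b, c, d) + x + t, s);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(sineTable()[0])); // d76aa478
    }
}
```

The full algorithm repeats steps like this 64 times per 512-bit block, switching the mixing function every 16 steps.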

The algorithm looked fairly interesting and I had some time, so I decided to implement it in Java.

My implementation uses three classes (Main, Input/Output and MD5 algorithm). I am making a good effort to keep the algorithm optimized for speed, memory and elegance, which is somewhat tricky since the algorithm works on words of binary data that are best stored as unsigned ints. Java has only signed ints, so there are certain pitfalls when working with them, mostly around right shifts and converting bytes to ints.
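The signed-int pitfalls are mostly about sign extension. The sketch below shows the ones I would expect anyone to hit; the class `UnsignedPitfalls` and its helpers are hypothetical names, not taken from my classes.

```java
class UnsignedPitfalls {
    // Builds a 32-bit word from four bytes in little-endian order, as MD5 expects.
    // Without "& 0xFF", a byte such as (byte) 0x80 would widen to 0xFFFFFF80
    // and overwrite the higher bytes with ones.
    static int wordLE(byte[] in, int off) {
        return (in[off] & 0xFF)
             | (in[off + 1] & 0xFF) << 8
             | (in[off + 2] & 0xFF) << 16
             | (in[off + 3] & 0xFF) << 24;
    }

    // Views an int as the unsigned value it represents, widened to long.
    static long toUnsigned(int w) {
        return w & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int w = 0x80000000; // top bit set, so Java sees a negative number

        System.out.println(Integer.toHexString(w >> 4));  // f8000000 (arithmetic shift copies the sign bit)
        System.out.println(Integer.toHexString(w >>> 4)); // 8000000  (logical shift fills with zeros)
        System.out.println(Integer.rotateLeft(w, 1));     // 1        (the top bit wraps around)
        System.out.println(toUnsigned(w));                // 2147483648
    }
}
```

Addition needs no special care: signed and unsigned 32-bit addition produce the same bits, so plain `+` on `int` already works modulo 2^32.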

However, after three days of work, I finally got from something completely wrong, past something that's wrong most of the time, to something that finally is correct all of the time (or at least on all values I tested). Unicode and binary data are probably still broken, because I treat all characters as byte-sized ASCII, but I don't know for sure.
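Why byte-sized ASCII would break on Unicode can be shown without any hashing at all. The following is a guess at what "treating characters as bytes" amounts to, not a copy of my input code; `EncodingPitfall`, `truncateChars` and `utf8` are names made up for the comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingPitfall {
    // Assumed version of the byte-sized approach: keep only the low 8 bits of each char.
    static byte[] truncateChars(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Encodes the string explicitly, so non-ASCII characters become several bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Pure ASCII: both approaches feed identical bytes to the hash.
        System.out.println(Arrays.equals(truncateChars("abc"), utf8("abc"))); // true

        // The euro sign U+20AC: one truncated byte versus three UTF-8 bytes.
        String euro = "\u20AC";
        System.out.println(truncateChars(euro).length); // 1
        System.out.println(utf8(euro).length);          // 3
    }
}
```

Different bytes in means a different hash out, so the result would disagree with other MD5 tools for such input. Reading a file as raw bytes rather than as characters avoids the question entirely.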

With the algorithm done, I made a wrapper program for calling it from the command line, and I'm uploading it here:

The program call is this:

java -jar md5.jar INPUTFILE

The program will output the MD5 signature of the file's content.
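For checking the result against a known-good value, something like the following could be used. It is a sketch built on the JDK's own `MessageDigest`, not the contents of md5.jar; the class name `FileMd5` and method `md5OfFile` are assumptions for the example. It reads the file as raw bytes in chunks, so large and binary files are fine.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileMd5 {
    // Streams the file through the JDK's MD5 and returns the digest as lowercase hex.
    static String md5OfFile(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // feed only the bytes actually read
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java FileMd5 INPUTFILE");
            return;
        }
        System.out.println(md5OfFile(Paths.get(args[0])));
    }
}
```

If both programs print the same 32 hex digits for the same file, the implementation agrees with the reference on that input.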

© 2006-2012: All content, unless otherwise noted, is the property of Arancaytar. It may be copied and modified with attribution for non-commercial purposes. By publishing comments on this site, you grant Arancaytar a non-exclusive, perpetual license to reproduce and publish these comments along with any identifying information provided. (You may request your comments to be deleted or edited voluntarily.)