Well, If you study in my college you’ll know that our LAN is as important an commodity as water or air in our lives. Consequently is DC++.
Well, the status of a person is, to a great extent, determined by what his share size is on the DC hubs (some DC gods have over 1.5 Tb’s.). Another criterion is how high the person’s rank is on the Anagrams Hall of Fame (HoF).
Well, one night ( 5 days before my 1st In-Sem exams ) I had this strong desire to play Anagrams on the NightHalters Hub. So I tried. And after one hour was depressed enough to outsource the work onto my brain-extension. My computer, that is. So I wrote this cute little anagram solver.
First I got a huge english word dictionary – if you use a debian system, the command
sudo apt-get install wamerican-huge
downloads and places a huge (6.4 MB, 6,38,669 words) text file at /usr/share/dict called american-english-huge.
It contains almost every single word commonly/uncommonly used, including vulgar slang (yeah, i searched). It also contains my name!
Now, the original dumb brute-force approach – comparing each permutation with all the crazy words in a dictionary – would be O(n!), but really, even a third-grade third grader would be smart enough to figure out a better method. So I did this – just calculated the hash of every string in my dictionary using a certain hash array of length 26 ( the number of english alphabets ). I also did not use any modular arithmetic – to reduce hash collisions, of course, and also because memory was not exactly a scarcity ( so i could afford having huge hashes) in a case such a simple english dictionary.
Why use a hash ?
Because every permutation of a string will have the same index-independent hash.
For example, consider the two words, tastier and artiste. They are both permutations of each other. They are also permutations of the jumbled input aeittsr.
Suppose the hash I select is :
a = 1
e = 3
i = 5
t = 9
s = 17
r = 23
(In reality the values should be much bigger odd numbers for best results – shake well before consumption.)
Then, the index-independent hash of aeittsr (the input string) is 1+3+5+9+9+17+23 = 67.
Call this h(aeittsr) = 67.
And since in this particular method of calculating hashes uses a commutative operation ( Addition on integers ), we have :
h(artistes) = 67
h(tastier) = 67
Therefore, we saw that given an input string, we just have to calculate the hash and compare it with the (precomputed) hashes of other dictionary strings.
Now note that the choice of the hash key is important. I used really large hash values for every character – this minimizes the number of hash collisions
, greatly improving the search time – at the expense of more memory used. But as I said, hard disk space is not really scarce for an english dictionary.
Also, for generating the hash-table, I organised the words into files that contained words of the same size. In these files, I printed the words and their hashes beside them, one word-hash pair on every line.
Now the realtime heavy stuff was complete.
So now the unjumbler works as follows : the hash and length of the input is calculated. Now the program dives directly into the file which contains words of that size. After this, it does a linear search for the hash and only if a match is found, checks if the string beside the hash is a permutation of the input ( The function for that is quite easy ), and if it is, it gets printed to the console.
And the result of this was that after 3 hours of playing Anagrams (with the aid of my brain extension), I was the #1 guy in the HoF!!!