Sunday, June 29, 2008

Word Unscrambler

One a train journey I sat doing the puzzles given in the Mid Day newspaper. So my brain cells got to work and I got the idea of coding a word unscrambler.

In the aforesaid problem there are two major tasks to be accomplished:
  • Creating the permutations from a given set of alphabets
  • Looking up each permutation in an English dictionary for its existence.
The code can be downloaded here. Though it is very unoptimized and not professional it does work.

Framework

After pondering over a long time I tried initially to use Python but due to the lack of the English Dictionary modules in Python decided on using Java. Java has an API called JADT courtesy IBM. But as I could not install the API I used a txt file with 47000 English words in an easily usable format. I know java is famous for its slowness and I would be very happy if anyone volunteers to adapt it to any faster format. I initially started out with Python but the Probstat module did not work on my computer due to lack of some required C compiler. So getting irritated I stuck to Java which has been easier.

Explanation

The first part is producing all possible permutations of the given letters. This is the tougher part though it might seem simpler than the second part. The algorithm is a recursive one.

perm1(String prefix, String s) {
int N = s.length();
if (N == 0) Display the string //one per
for (int i = 0; i <>

The second part is checking weather each permutation exists in the English dictionary. It is a very simple procedure and a time consuming one. Each permutation has to be checked with all the words in the dictionary file. If the word exists in the dictionary then it is an unscrambled word.

Complexity Issue

The complexity of the algorithm is kn! where k is a constant based on the number of entries present in the dictionary and n is the length of the scrambled word. Now the algorithm produces n! sequences for each sequence. It also compares each combination with all the words in the dictionary which in this case is 47158. Now this program works instantaneously for n<=6. As n! grows rapidly the program consumes lot of time for larger values of n.

Improvements

There are a lot of improvements which can be attempted for this program.
  • The permutation algorithm produces produces the same word twice if it has same letters. For example if you give 'ertreat' as input you will get 'retreat' twice as e repeats twice. Different algorithms can be attempted to eliminate this problem.
  • The dicitonary API can be implemented which will reduce the time for checking the existence of a word.
  • An applet can be created which will make it net-savvy
  • Finally if a data structure consisting of all the English words on the following lines will reduce the time in searching for the word (image will be uploaded shortly)

Random Thoughts
When we try to unscramble a word we try a random combination and check in our vocabulary. Now in the case of the comp it can try all combination s and its vocabulary is the entire English dictionary. So where the human brain is more powerful is intuition. We can check for some letters and eliminate a few combination. For example when a word has two u's we are almost sure they don't exist next to each other. But this program does not work like that. There are aspects where you can do this, implementing small aspects of AI and NLP. But this is a long shot.

If anyone wants to get in touch with me about this leave me a comment.