Thursday, February 12, 2009

Auto Complete on the Google Search Appliance

At my job we have an .aspx page that displays search results from a Google Search Appliance. Nothing too fancy. The page passes the query, gets the XML results from the GSA, we pair that with an XSLT and sha-zam, search results. It was brought to my attention that someone higher up the chain wanted to know if we could have that "cool google suggest drop down thing" on our search page. Google has it. Amazon has it. Dang near every text box on Facebook has it. Why don't we? Several people told me that there must be a way to enable it, because it's a GSA and Google runs it on their site. Now, I'm as guilty as the next guy when it comes to assuming how hard/easy something is going to be. I'm not pointing fingers. As with most things, there's a little more to it.

First off, the Ajax toolkit from Microsoft has an auto complete extender. It's great for running against SQL tables, or XML files or just about any data source. You can get back the ID of the record that the user selected and really make things snazzy. The question is, how do I use a GSA as a data source?

The GSA does return results in XML, but it doesn't do wildcard queries. You can't do a search for "hea*" and get back head, hearing, heart and things like that. You get Dr. Hea and some misspellings and that's it. I did some looking on the Google Code site and found a "Search as you Type" project that seemed to do just what I wanted. It said in the documentation that you could turn any text box into a Google suggest box. It was written in PHP but I figured I could download it and convert the page to .Net. Most of it was in JavaScript files anyway so no biggie. Except that the page doesn't connect to a GSA. It has a pipe-delimited text file that it uses as the demo source for the drop down. I looked through the readme.txt looking for some way to magically reference the GSA. Nothing. What I had found was a Google Base code project that did the same thing as the Ajax toolkit from Microsoft. I was right back where I started.

I fumed for a little bit. I drummed my fingers on my desk. I went back to Google's site and looked over their page source and JavaScript files. Then, as I was taking a swig of Red Bull, I realized what Google was doing. They weren't running against their index either.

Google is fast, but not so fast that they can make an Ajax call from your browser back to their index and back to your browser with search results each time you press a key (Update 7/12/2011: Actually, they are that fast now. Most of my speculations about Google in this post are obsolete). They have been in the search business for quite a while, so there is no doubt that they know the top 1,000 search terms for words that start with each letter of the alphabet. The top 1,000 As, the top 1,000 Bs and so forth. Google had compiled a list of the top 26,000 keywords (something like that, I'm guessing) and that's what they are running against when they do the Google suggest Ajax call. That's why when you do an obscure search it doesn't show anything, but when you hit "search" you get results.
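To make that guess concrete, here is a rough Python sketch of how a "top N terms per letter" list could be put together. It is only an illustration of the idea above, not anything Google has published. The function name `top_terms_by_letter` and the sample counts are made up.

```python
from collections import defaultdict

def top_terms_by_letter(term_counts, per_letter=1000):
    """Group search terms by first letter and keep the most popular ones."""
    buckets = defaultdict(list)
    for term, count in term_counts.items():
        term = term.strip().lower()
        if term:
            buckets[term[0]].append((count, term))
    # Sort each bucket by descending count, then alphabetically, and trim it.
    return {
        letter: [t for _, t in sorted(items, key=lambda p: (-p[0], p[1]))[:per_letter]]
        for letter, items in buckets.items()
    }

# Made-up counts, just to show the shape of the data.
sample = {"heart": 50, "hearing": 30, "head": 40, "allergy": 10, "asthma": 25}
table = top_terms_by_letter(sample, per_letter=2)
```

With 26 letters and 1,000 terms each, a table like this tops out at 26,000 entries, which is small enough to keep in memory and answer from without touching the full index.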

Now, back to my problem. It's not a problem anymore. I can run a report against our GSA, get back the top 10,000 or so search queries, filter out the trash, and have everything I need in a nice and tidy XML document. I can put it in a SQL table or leave it in XML. I'll be able to sort it either way. Now I can use the Ajax toolkit auto complete extender and in a short amount of time I have the same functionality as Google Suggest for our GSA. Only I'll be using .Net.
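Here is a rough sketch of that plan, in Python rather than .Net just to keep it short. It assumes the report can be boiled down to (query, count) rows, which is my simplification and not the GSA's actual report format. The names `build_index` and `suggest`, the `min_count` cutoff, and the "trash" rules are all hypothetical.

```python
import bisect

def build_index(report_rows, min_count=2):
    """Turn (query, count) rows into sorted parallel lists of terms and counts."""
    totals = {}
    for query, count in report_rows:
        term = " ".join(query.lower().split())  # normalize case and whitespace
        # "Filter out the trash": skip empty, one-character, or punctuation-only queries.
        if len(term) < 2 or not any(ch.isalnum() for ch in term):
            continue
        totals[term] = totals.get(term, 0) + count
    kept = sorted((t, c) for t, c in totals.items() if c >= min_count)
    return [t for t, _ in kept], [c for _, c in kept]

def suggest(index, prefix, limit=10):
    """Return up to `limit` terms starting with `prefix`, most popular first."""
    terms, counts = index
    prefix = " ".join(prefix.lower().split())
    if not prefix:
        return []
    # In a sorted list, every term sharing a prefix sits in one contiguous run.
    start = bisect.bisect_left(terms, prefix)
    matches = []
    for i in range(start, len(terms)):
        if not terms[i].startswith(prefix):
            break
        matches.append((counts[i], terms[i]))
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [term for _, term in matches[:limit]]

# Made-up report rows for illustration.
rows = [("heart", 50), ("Heart ", 5), ("hearing", 30), ("head", 40),
        ("health", 1), ("???", 12), ("h", 9), ("asthma", 25)]
index = build_index(rows)
```

Calling `suggest(index, "hea")` on the sample rows gives back the matching terms ordered by how often they were searched. The lookup function is the piece an auto complete web method would wrap, whether the list lives in XML or a SQL table.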

I'll post the code and link to the search page when I'm done. Shouldn't be long (will I ever learn to stop saying that?).
