Monday, February 26, 2007

In a recent article on his always readable blog, Andrew Denny discusses the virtues of Linux for running web sites and says: "If you really want to see a state of the art Linux web site, try Nick Atty's Canalplan with its wonderful drop-down placenames in the Gazetteer (how cool is that? find a canal location as you type!)."

I'm glad Andrew likes this - it's a neat feature I'm quite proud of (although it's still not as good as it should be).

What's interesting is that he's picked up on something that is a wonderful example of the strengths of Open Source Software. Here's the story behind how that feature came to be written...

I subscribe to the mailing list for the Google Maps API, and a while ago someone mentioned a neat tool in Google Labs called Suggest. I really liked the look of it, and decided I'd steal (sorry, adapt) it for CanalplanAC.

But before sitting down to write something as complicated as this I thought I'd do a search and see if someone had already written it. And, lo and behold, I quickly located actb, which did almost everything I wanted. So that was all the messy interaction with the DOM dealt with. Here was a bit of JavaScript that I could stick into my page and I'd get drop-down selectors.

However, the project was clearly dead - no updates since early 2005 - and didn't do everything I wanted. In particular, it required you to give it the list of things it was going to search in before it started. But I've got 7000 places in my database - I hardly want each user to have to download that each time.

If this had been a commercial package I'd have been stuck. But it wasn't - it was open source (it's under a very liberal Creative Commons license). So I took the source code and hacked around with it - I grabbed a bit of example code for how to use XMLHttpRequest from the web and wrote a bit of glue code (so that it only goes off and gets suggestions when it has three letters, and only goes back to the server when they change). And that was it - one JavaScript file to stick into each page where I wanted to add the new auto-suggest. So that's the "client" end - what about the server?
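As a rough sketch of that glue logic (not the actual CanalplanAC or actb code), the idea might look something like the following. The names `createSuggester` and `fetchSuggestions` are hypothetical, the server call is shown as a plain synchronous function to keep the example short where the real page would use XMLHttpRequest, and narrowing the cached list in the browser is an assumption about how "only goes back to the server when they change" works.

```javascript
// Hypothetical sketch: createSuggester and fetchSuggestions are
// illustrative names, not taken from actb or CanalplanAC.
const MIN_CHARS = 3; // only ask the server once three letters are typed

function createSuggester(fetchSuggestions) {
  let cachedPrefix = null; // three-letter prefix the cached list belongs to
  let cachedList = [];     // suggestions the server returned for that prefix

  // Returns the suggestions to show for the current contents of the input box.
  return function suggest(text) {
    const typed = text.toLowerCase();
    if (typed.length < MIN_CHARS) {
      return []; // too short: show nothing and leave the server alone
    }
    const prefix = typed.slice(0, MIN_CHARS);
    if (prefix !== cachedPrefix) {
      // The first three letters changed: one round trip to the server.
      cachedList = fetchSuggestions(prefix);
      cachedPrefix = prefix;
    }
    // Assumption: further typing only narrows the cached list locally.
    return cachedList.filter((name) => name.toLowerCase().includes(typed));
  };
}
```

With a design like this, typing "wig", "wiga", "wigan" costs one request rather than three, which matters when a single PC is serving everything.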

Google no doubt have no problems powering Suggest, but then they do have a lot of computing power to throw at the problem. I've got one PC that runs the whole site. So I knocked up a bit of C to produce suggestions as quickly and efficiently as possible (a list is produced at build time that contains lower case and correctly formatted versions of each name, so that it can quickly match candidate suggestions but return an attractive version). And there we are. It took about a day's work to implement.
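The real server side is written in C, but the build-time list idea can be illustrated in a few lines of JavaScript. This is only a sketch under assumptions: `buildIndex` and `lookup` are hypothetical names, and the list is held in memory here rather than in a generated file.

```javascript
// Hypothetical sketch of the "lower case key plus display name" list.

// Build step: pair each nicely formatted name with a lower-case search key,
// so no case conversion of the stored names is needed at lookup time.
function buildIndex(names) {
  return names.map((display) => ({ key: display.toLowerCase(), display }));
}

// Lookup: compare against the lower-case keys, hand back the display forms.
function lookup(index, typed) {
  const needle = typed.toLowerCase();
  return index
    .filter((entry) => entry.key.includes(needle))
    .map((entry) => entry.display);
}
```

For example, `lookup(buildIndex(["Wigan Pier", "Wolverhampton Boat Club"]), "WIGAN")` matches on the key "wigan pier" but returns the properly capitalised "Wigan Pier".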

It's not Linux that makes CanalplanAC great - it's the philosophy behind it. If you'd like to know more about this, and about how us web geeks work, have a look at Eric Raymond's The Art of Unix Programming. It's only his computing philosophy I'm espousing here, btw.

And, of course, it's not finished. But then software is alive, and the only time it stops growing is when it's dead. In particular I'm unhappy with the order in which suggestions appear. A recent tweak helped - it now does two searches, the first for matches at the start of the string and the second for matches anywhere (so if you type "wig" it finds "Wigan Pier" ahead of "Parbold to Wigan Railway Bridge"). But it could still be improved - if you give it "wolverh" it comes back with "Wolverhampton Boat Club", then "Wolverhampton Bottom Lock", then the rest of the locks in sorted order (10-19, 2, 20, 21, 3, 4, 5, 6, 7, 8, 9, Top), then a random selection of railway bridges and then - finally - Wolverhampton itself.
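The two-search tweak can be sketched like this. It is an illustration in JavaScript rather than the C the site actually runs, and `twoPassSearch` is a hypothetical name.

```javascript
// Hypothetical sketch of the two-search ordering described above.
function twoPassSearch(names, typed) {
  const needle = typed.toLowerCase();
  // First search: names that start with what was typed.
  const atStart = names.filter((name) =>
    name.toLowerCase().startsWith(needle)
  );
  // Second search: names that contain it somewhere later on.
  const elsewhere = names.filter((name) => {
    const lower = name.toLowerCase();
    return lower.includes(needle) && !lower.startsWith(needle);
  });
  return atStart.concat(elsewhere);
}
```

Given the list `["Parbold to Wigan Railway Bridge", "Wigan Pier"]` and the input "wig", this puts "Wigan Pier" first. Within each group the names stay in whatever order the list already had, which is exactly why the "wolverh" results still come out oddly.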

Ah well - it doesn't look like I'll have to find another hobby for a while.

2 Comments:

At 8:30 AM, Anonymous said...

Can't you use the weighting of each place to work out what to display? Obviously Wolverhampton should have a heavier weighting than the boat club, which should have a heavier weighting than any of the locks.

 
At 8:29 PM, Nick said...

Not a bad idea. But firstly I ought to sort them in a more sensible way (so short forms come before longer ones, and numbers sort numerically).

I also need to add aliases - nothing useful appears if you type "gas street basin" in, although if you leave that and run it, it will turn into Worcester Bar.

What's nice is that I can do some fairly heavy duty processing for this - as long as I end up with a text file in order it won't get in the way of rapid generation of suggestions.
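The "more sensible" ordering mentioned in this reply (short forms before longer ones, numbers in numeric order) could be done once at build time with a comparison function along these lines. This is a sketch only: `chunks` and `naturalCompare` are hypothetical names, and the lock names in the usage note are made up for illustration rather than taken from the CanalplanAC database.

```javascript
// Hypothetical sketch of a "natural" sort for place names.

// Split a name into text and number chunks: "Lock 10" -> ["lock ", 10].
function chunks(name) {
  return name
    .toLowerCase()
    .split(/(\d+)/)
    .filter((part) => part !== "")
    .map((part) => (/^\d+$/.test(part) ? Number(part) : part));
}

function naturalCompare(a, b) {
  const ca = chunks(a);
  const cb = chunks(b);
  const shared = Math.min(ca.length, cb.length);
  for (let i = 0; i < shared; i++) {
    const x = ca[i];
    const y = cb[i];
    if (x === y) continue;
    // Two numbers compare by value, so 2 sorts before 10.
    if (typeof x === "number" && typeof y === "number") return x - y;
    // Otherwise fall back to plain string order; a string that is a
    // prefix of another ("wolverhampton") sorts before the longer one.
    return String(x) < String(y) ? -1 : 1;
  }
  // All shared chunks are equal: the name with fewer chunks comes first.
  return ca.length - cb.length;
}
```

Sorting `["Wolverhampton Lock 10", "Wolverhampton Lock 2", "Wolverhampton Boat Club", "Wolverhampton"]` with `naturalCompare` gives Wolverhampton first, then the boat club, then Lock 2 ahead of Lock 10. Because the sort happens when the list is built, none of this costs anything when suggestions are being generated.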

 
