Monday, February 26, 2007

In a recent article on his always-readable blog, Andrew Denny discusses the virtues of Linux for running web sites, and says: "If you really want to see a state of the art Linux web site, try Nick Atty's Canalplan with its wonderful drop-down placenames in the Gazetteer (how cool is that? find a canal location as you type!)".

I'm glad Andrew likes this - it's a neat feature I'm quite proud of (although it's still not as good as it should be).

What's interesting is that he's picked up on something that is a wonderful example of the strengths of Open Source Software. Here's the story behind how that feature came to be written...

I subscribe to the mailing list for the Google Maps API, and a while ago someone mentioned a neat tool in Google Labs called Suggest. I really liked the look of it, and decided I'd steal - er, adapt - it for CanalplanAC.

But before sitting down to write something as complicated as this, I thought I'd do a search and see if someone had already written it. And, lo and behold, I quickly located actb, which did almost all that I wanted. So that was all the messy interaction with the DOM dealt with. Here was a bit of JavaScript that I could stick into my page and I'd get drop-down selectors.

However, the project was clearly dead - no updates since early 2005 - and didn't do everything I wanted. In particular, it required you to give it the list of things it was going to search in before it started. But I've got 7000 places in my database - I hardly want each user to have to download that each time.

If this had been a commercial package I'd have been stuck. But it wasn't - it was open source (it's under a very liberal Creative Commons license). So I took the source code and hacked around with it - I grabbed a bit of example code from the web showing how to use XMLHttpRequest, and wrote a bit of glue code (so that it only goes off and gets suggestions once it has three letters, and only goes back to the server when they change). And that was it - one JavaScript file to stick into each page where I wanted to add the new auto-suggest. So that's the "client" end - what about the server?
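(The gating just described is simple enough to sketch before we get to the server. The real glue is JavaScript, but here's the same idea written in C - function and parameter names are my own invention, not the actual code:)

```c
#include <stdbool.h>
#include <string.h>

/* Decide whether a keystroke should trigger a round-trip to the server.
 * We only ask for suggestions once there are at least three letters,
 * and only go back to the server when what we'd send has changed since
 * the last query. */
bool should_query(const char *current, const char *last_queried)
{
    if (strlen(current) < 3)
        return false;                           /* too short to be useful */
    if (last_queried == NULL)
        return true;                            /* nothing queried yet */
    return strcmp(current, last_queried) != 0;  /* only when it changes */
}
```

The point of the second test is that merely moving the cursor or re-triggering the event doesn't cost a server hit; only a genuinely new query does.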

Google no doubt have no problems powering Suggest, but then they do have a lot of computing power to throw at the problem. I've got one PC that runs the whole site. So I knocked up a bit of C to produce suggestions as quickly and efficiently as possible (a list is produced at build time that contains lower-case and correctly formatted versions of each name, so that it can quickly match candidate suggestions but return an attractive version). And there we are. It took about a day's work to implement.
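To make the build-time trick concrete, here's a cut-down sketch of the idea (the struct and function names are mine, not the real code): each place is stored twice, once lower-cased for cheap comparison and once nicely formatted for display.

```c
#include <string.h>

/* Each entry in the build-time list holds a lower-case key for
 * matching and the correctly formatted name to send back. */
struct place {
    const char *key;      /* lower-case, for comparison */
    const char *display;  /* nicely formatted, for output */
};

/* Return the display form of the first place whose key starts with
 * the (already lower-cased) prefix, or NULL if nothing matches. */
const char *first_prefix_match(const struct place *places, size_t n,
                               const char *prefix)
{
    size_t len = strlen(prefix);
    for (size_t i = 0; i < n; i++)
        if (strncmp(places[i].key, prefix, len) == 0)
            return places[i].display;
    return NULL;
}
```

Because the lower-casing is done once at build time rather than on every keystroke, the per-request work is just a linear scan of string prefixes - cheap enough even for one PC serving 7000 places.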

It's not Linux that makes CanalplanAC great - it's the philosophy behind it. If you'd like to know more about this, and about how we web geeks work, have a look at Eric Raymond's The Art of Unix Programming. It's only his computing philosophy I'm espousing here, btw.

And, of course, it's not finished. But then software is alive, and the only time it stops growing is when it's dead. In particular, I'm unhappy with the order in which suggestions appear. A recent tweak helped - it now does two searches, the first for matches at the start of the string and the second for matches anywhere (so if you type "wig" it finds "Wigan Pier" ahead of "Parbold to Wigan Railway Bridge"). But it could still be improved - if you give it "wolverh" it comes back with "Wolverhampton Boat Club", then "Wolverhampton Bottom Lock", then the rest of the locks in sorted order (10-19, 2, 20, 21, 3, 4, 5, 6, 7, 8, 9, Top), then a random selection of railway bridges and then - finally - Wolverhampton itself.
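The two-pass tweak can be sketched like this (again a simplification with invented names, assuming names and query are already lower-cased): pass one collects prefix matches, pass two appends names that merely contain the query somewhere else.

```c
#include <string.h>

/* Fill `out` with up to `max` suggestions: first names that start
 * with the query, then names that contain it anywhere else. */
size_t suggest(const char **names, size_t n, const char *query,
               const char **out, size_t max)
{
    size_t count = 0;
    size_t qlen = strlen(query);

    /* Pass 1: matches at the start of the string. */
    for (size_t i = 0; i < n && count < max; i++)
        if (strncmp(names[i], query, qlen) == 0)
            out[count++] = names[i];

    /* Pass 2: matches anywhere else (skip the pass-1 hits). */
    for (size_t i = 0; i < n && count < max; i++) {
        const char *hit = strstr(names[i], query);
        if (hit != NULL && hit != names[i])
            out[count++] = names[i];
    }
    return count;
}
```

This guarantees "Wigan Pier" beats "Parbold to Wigan Railway Bridge", but it does nothing about the ordering *within* each pass - which is exactly the "Bottom Lock 10 before Bottom Lock 2" lexicographic-sort problem described above.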

Ah well - it doesn't look like I'll have to find another hobby for a while.

Thursday, February 01, 2007

More archaeobugology

They do keep coming out of the woodwork. I've just found, and fixed, another five- or six-year-old bug lurking in a key bit of software ('=' signs appearing in the value part of POST or GET data were being dropped). I'm starting to realise just why so much of our software is as fragile as it is, and getting increasingly frightened by our reliance on computers for critical things. Mind you, I've been reading comp.risks for years, so I shouldn't be that surprised.
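For the curious, that class of bug comes from splitting "key=value" on *every* '=' rather than only the first. A minimal sketch of the correct approach (my own illustration, not the actual fixed code):

```c
#include <string.h>

/* Split a "key=value" pair at the FIRST '=' only, so that any '='
 * signs inside the value survive.  Returns 0 on success, -1 if the
 * input has no '=' or the buffers are too small. */
int split_pair(const char *pair, char *key, size_t klen,
               char *value, size_t vlen)
{
    const char *eq = strchr(pair, '=');   /* first '=' only */
    if (eq == NULL)
        return -1;                        /* not a key=value pair */
    size_t kn = (size_t)(eq - pair);
    if (kn >= klen || strlen(eq + 1) >= vlen)
        return -1;                        /* would not fit */
    memcpy(key, pair, kn);
    key[kn] = '\0';
    strcpy(value, eq + 1);                /* keeps embedded '=' signs */
    return 0;
}
```

The buggy version effectively treated everything after the last '=' as the value, so "q=a=b" lost its "a=".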

On the details of the work, the new gazetteer is coming along apace. I plan to go live in the second half of February. It won't be finished, and will still have some bugs in it, but (putting aside the fact that - as shown above - everything still has bugs in it) it will be less of an abomination, and cause less harm, than Google's recent changes to their interface to Usenet. And, again unlike Google, I won't be breaking anything that was long established before they came along.