Saturday, October 07, 2006

Places of Interest - or interesting places?

There are two fundamental approaches to mapping something like the waterways system: a topological one where you concentrate on the network structure - what is linked to where and how; and a geographical one where you concentrate on where things physically are.

The first of these means that your software is, at heart, an exercise in applied graph theory. The second means your program is a - specialised - geographic information system (a GIS).

CanalplanAC falls firmly into the first of these camps. The (apparently dead) Waterscape route planner gave every impression of being the second.

The problem is, neither of them can quite cut the mustard as you get into the details.

Waterscape's planner had the fundamental limitation that only one thing could be in one place at the same time. Which makes sense at first blush. But unfortunately it meant that every aqueduct was in fact a link between navigable waterways and it would cheerfully route you from Leigh to Manchester via the MSC - moving from one to the other at Barton.

Canalplan's approach - where places are nodes in a graph, and just happen to have some basic geographical information (in particular, their coordinates) bolted onto them - is, I am convinced, the right approach. But it's not quite enough.
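To make the distinction concrete, here is a minimal Python sketch of the graph-first idea. It is not CanalplanAC's actual code: the `Place` class, the `link` and `reachable` helpers, the place names and the coordinates are all invented for illustration. The point is that two nodes can share the same coordinates (a canal on an aqueduct and the waterway underneath it) without being connected.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    name: str
    easting: int    # made-up grid coordinates, for illustration only
    northing: int
    links: set = field(default_factory=set)  # names of directly linked places


def link(places, a, b):
    """Record a navigable (two-way) connection between two places."""
    places[a].links.add(b)
    places[b].links.add(a)


def reachable(places, start):
    """Return the names of every place that can be reached from start."""
    seen, todo = {start}, [start]
    while todo:
        for nxt in places[todo.pop()].links:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


places = {
    p.name: p
    for p in [
        Place("Leigh", 100, 200),
        Place("Aqueduct (canal above)", 300, 100),
        Place("Manchester", 400, 120),
        Place("Aqueduct (ship canal below)", 300, 100),  # same spot, separate node
        Place("Ship canal east", 420, 90),
    ]
}

# The canal crosses the ship canal but is never linked to it.
link(places, "Leigh", "Aqueduct (canal above)")
link(places, "Aqueduct (canal above)", "Manchester")
link(places, "Aqueduct (ship canal below)", "Ship canal east")
```

With the data laid out this way, `reachable(places, "Leigh")` includes Manchester but nothing on the ship canal, because being in the same place is not the same as being joined.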

It starts to break down as you start to give more information about things around the waterways. In particular, as you get more and more places in the data, the idea that a pub - particularly one that isn't bang on the towpath - "belongs" to a particular place stops being sensible. You may have noticed that I've started to add more POI (Place Of Interest) data. In doing this, I've had to fiddle things by adding the POI items to all places within a certain range. This is inefficient - it's slow at build time, difficult to maintain if things change, very unamenable to user adjustment (something I want to continue to expand the use of) and just generally not the Right Thing to do.

So I've started to migrate to a two layer approach. Waterways places will remain as a graph with coordinate information, but Places of Interest will live in a simple geographical database and be added as appropriate.

To help with this I've written a very nice (though I say it myself) tree-based data structure and built a new sort of loop into the programming language that can search through several thousand places and find only those within a certain range of a place in almost no time at all. In the next release or so I'll have added this, and migrated all the existing pub, museum, shop etc. data, so you won't see much difference, but it will all be working better.

It will then be that - as with photos - the data structure on the server is the true data, allowing people to modify and add places of interest with the changes taking place as soon as approved.
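For the curious, a range search of this general kind can be sketched in a few lines of Python. This is only an illustration, assuming a plain two-dimensional k-d tree; the post doesn't say which tree structure is actually used, and the function names `build` and `within` are invented here.

```python
import math


def build(points, depth=0):
    """Build a 2-d tree from (x, y, label) tuples.

    Each node is (point, left_subtree, right_subtree); an empty tree is None.
    The splitting axis alternates between x (even depth) and y (odd depth).
    """
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        build(points[:mid], depth + 1),
        build(points[mid + 1:], depth + 1),
    )


def within(node, centre, radius, depth=0, found=None):
    """Collect every point no further than radius from centre."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % 2
    if math.dist(point[:2], centre) <= radius:
        found.append(point)
    diff = centre[axis] - point[axis]
    # Always search the side of the split that contains the centre...
    near, far = (left, right) if diff < 0 else (right, left)
    within(near, centre, radius, depth + 1, found)
    # ...and cross the splitting line only if the circle reaches over it.
    if abs(diff) <= radius:
        within(far, centre, radius, depth + 1, found)
    return found
```

Building the tree once and then calling `within(tree, (x, y), radius)` for each place skips whole branches that cannot contain a match, which is what makes it so much cheaper than checking every POI against every place.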

But, of course, it's never as simple as this. What, for example, is a boatyard? Is it a place on the canals, or is it a POI? If the former, then, unless each is added as its own place, we still have places with POI-type information associated with them. If the latter, then how will I ever implement "find nearest boatyard with gas for sale" for example?

I have a way to make the second of these work, but for the moment I'm suspending judgement on which way I will actually jump.
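Purely as an illustration of how such a query could work if boatyards were POIs (and not necessarily the way alluded to above), the Python sketch below walks outwards along the waterway graph in order of distance and stops at the first place that has a matching POI close by. Everything in it is hypothetical: the graph, the coordinates, the POI records, the `nearest_poi` function and the `reach` parameter.

```python
import heapq
import math

# Hypothetical data: waterway links with distances in miles,
# place coordinates in metres, and a handful of POIs.
graph = {
    "A": {"B": 2.0},
    "B": {"A": 2.0, "C": 3.0, "D": 1.0},
    "C": {"B": 3.0},
    "D": {"B": 1.0},
}
coords = {"A": (0, 0), "B": (2000, 0), "C": (5000, 0), "D": (2000, 1000)}
pois = [
    {"name": "Yard 1", "kind": "boatyard", "gas": False, "at": (2050, 20)},
    {"name": "Yard 2", "kind": "boatyard", "gas": True, "at": (5000, 100)},
    {"name": "Pub", "kind": "pub", "gas": False, "at": (2000, 950)},
]


def nearest_poi(start, wanted, reach=200):
    """Return (distance, place, poi name) for the closest match, or None.

    Places are visited in order of waterway distance from start (Dijkstra).
    A POI counts as belonging to a place if it lies within `reach` metres
    of it as the crow flies. The linear scan over pois is for brevity only;
    a spatial index would replace it in practice.
    """
    queue = [(0.0, start)]
    done = set()
    while queue:
        dist, place = heapq.heappop(queue)
        if place in done:
            continue
        done.add(place)
        for poi in pois:
            if wanted(poi) and math.dist(poi["at"], coords[place]) <= reach:
                return dist, place, poi["name"]
        for nxt, length in graph[place].items():
            if nxt not in done:
                heapq.heappush(queue, (dist + length, nxt))
    return None
```

A call such as `nearest_poi("A", lambda p: p["kind"] == "boatyard" and p["gas"])` then measures "nearest" along the water rather than in a straight line, which is what matters to someone in a boat.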

All of this is part - as I mentioned in the article about distances - of a gradual redesign of some of the underlying data structures. This is always a risky thing to do with something as large and evolved as CanalplanAC, but sometimes it just has to be done.

Canalplan is getting a bit long in the tooth and software rot is starting to set in in a few places. After all, I see from my change log that the gazetteer, and the move to OS coordinates from purely arbitrary ones (what was I ever thinking of?), came in in November 2000. In Internet terms, that's ancient. In that time the number of places in the database has gone up by an order of magnitude - and what worked fine then is really starting to creak. And there are new things out there. Satellite navigation systems having taken off means that geographical data is suddenly of use to lots of people, not just to a few weirdo programmers - and so it becomes available and affordable. I have to be able to move with this.


At 9:04 AM, Blogger Richard said...

With the late unlamented Waterscape routeplanner, it wasn't so much a problem of data model, just dodgy data.

The people who originally compiled the GIS database actually did draw Barton Swing Aqueduct as a waterway cross-roads. In other words, there was a node called 'Barton Swing Aqueduct' with four routes heading out of it - Leigh Branch north, Leigh Branch south, MSC east, MSC west.

When (in one of many attempts to make a silk purse out of a sow's ear) I redid the database to make this into a non-joining crossing, it stopped recommending that you should water-ski off the side of the aqueduct, which was probably a good thing.

Before leaving Waterscape, I did give Paul (WS webmaster) a set of instructions on how to write a new route-planner and I believe he's done some work on it... but there are other priorities at Waterscape at the moment. Maybe we'll do something on the WW website. ;)

At 11:01 PM, Anonymous paul said...

Yes I can confirm that we will be re-doing the route planner on the site... it is still missed by many a user :)

To add to what Richard said above, I think in addition to the data, there were some things slightly overdone in how a route was processed: other points on the waterways were made into nodes, which increased the calculation time and therefore degraded the user experience.

As regards data, I think this is always going to be a moving target in terms of accuracy, not least in the naming of certain POIs or even waterways depending upon who you are talking to. We'll take the approach along the lines of just getting a base working model out there and build on it from there... much as I understand CPlan has done. :)

It would certainly be nice for the boating enthusiast to suddenly have a glut of planners after having only one initially.

I'll drop you an email so if you'd like to have a chat then we can. I'm sure there's plenty we ould leanr from each other.


At 11:02 PM, Anonymous paul said...

*blush* maybe I could learn to type and spell at the same time ;)

At 10:38 AM, Blogger Nick said...

I've got mixed feelings about this. Well, not so mixed really - a decent waterscape route planner terrifies me.

I'm one chap, with a busy life, doing this for fun. I get a bit of beer money out of the google adverts but that's it.

Waterscape are big, I don't know how big, but we've all heard the stories about the millions used to launch it and then written off. If you put a decent amount of effort into it you should be able to produce a much better effort (for a start, you can have programmers, authors, layout experts etc; I have to be jack of all trades, and I'm a lot better at some of these than others).

And you have publicity. The latest waterfront magazine contains over 20 uses of the waterscape url. You can trample me underfoot like an elephant tramples a worm any time you want.

I can't stop you, obviously. And I know there are ways that route planning could be done better (I'm very happy with the way the program works, less happy with the way it looks and with the consistency of the data). But helping waterscape to build a decent route planner does feel a bit like a turkey putting up the christmas decorations.

OTOH maybe in this world of modern web technologies, multiple planners isn't the way to go - it's a horribly wasteful approach after all - and mutual services that build on each other in different ways are (look at the wonderful things Google are making possible by taking this approach). Perhaps waterscape could provide a waterways database API and CanalplanAC a planning API (you send it a list of places and options, it sends back an XML route, you display it how you want).

We hear so much about choice being good these days, that I think we sometimes forget that one exceptional product is better than a choice between several mediocre ones.

At 10:20 PM, Anonymous paul said...

hey nick,
I tried emailing your freeserve address but got a bounce back as undeliverable. Is your "temporary-address" account still good?

Just in terms of your first part of your reply.

We may have been "big" in the past, but that was then. Today we are currently 4 people (2 editorial, 2 programmers / coders / monkeys) plus some time of the marketing service manager's when we need it. All other descriptions of the size and spend as of about 1 yr ago are incorrect. We had a temp SEO advisor who used to be part of the team but had deferred redundancy for 6 months up until August to finish some task that needed to be done, so I hope that clears things up for you.

I appreciate the concerns expressed, totally natural I think really. However, I'm here to try and build a relationship of some sort if you want it. I'd like to explain it fully via email if that's ok with you as it's a bit O/T for this blog post. You can find my email via the urw list if you search for my name (paul morgan) and view the profile or just reply back to this letting me know if the temp address is still good. Or just not reply at all and I'll get the message. :)


