Saturday, October 07, 2006

Places of Interest - or interesting places?

There are two fundamental approaches to mapping something like the waterways system: a topological one where you concentrate on the network structure - what is linked to where and how; and a geographical one where you concentrate on where things physically are.

The first of these means that your software is, at heart, an exercise in applied graph theory. The second means your program is a - specialised - geographic information system (a GIS).

CanalplanAC falls firmly into the first of these camps. The (apparently dead) Waterscape route planner gave every impression of being the second.

The problem is, neither of them can quite cut the mustard as you get into the details.

Waterscape's planner had the fundamental limitation that only one thing could be in one place at the same time. Which makes sense at first blush. But unfortunately it meant that every aqueduct was in fact a link between navigable waterways and it would cheerfully route you from Leigh to Manchester via the MSC - moving from one to the other at Barton.

Canalplan's approach - where places are nodes in a graph, and just happen to have some basic geographical information (in particular, their coordinates) bolted onto them is - I am convinced - the right approach. But it's not quite enough.

It starts to break down as you start to give more information about things around the waterways. In particular, as you get more and more places in the data, the idea that a pub - particularly one that isn't bang on the towpath - "belongs" to a particular place stops being sensible. You may have noticed that I've started to add more POI (Place Of Interest) data. In doing this, I've had to fiddle things by adding the POI items to all places within a certain range. This is inefficient - it's slow at build time, difficult to maintain if things change, very unamenable to user adjustment (something I want to continue to expand the use of) and just general not the Right Thing to do.

So I've started to migrate to a two layer approach. Waterways places will remain as a graph with coordinate information, but Places of Interest will live in a simple geographical database and be added as appropriate.

To help with this I've written a very nice (though I say it myself) tree based data structure and built a new sort of loop into the programming language that can search through several thousand places and find only those within a certain range of a place in almost no time at all. In the next release or so I'll have added this, and migrated all the existing pub, museum, shop etc data, so you won't see much difference, but it will all be working better.

It will then be that - as with photos - the data structure on the server is the true data, allowing people to modify and add places of interest with the changes taking place as soon as approved.

But, of course, it's never as simple as this. What, for example, is a boatyard? Is it a place on the canals, or is it a POI? If the former, then, unless each is added as its own place, we still have places with POI-type information associated with them. If the latter, then how will I ever implement "find nearest boatyard with gas for sale" for example?

I have a way to make the second of these work, but for the moment I'm suspending judgement on which way I will actually jump.

All of this is part - as I mentioned in the article about distances - of a gradual redesign of some of the underlying data structures. This is always a risky thing to do with something as large and evolved as CanalplanAC, but sometimes it just has to be done.

Canalplan is getting a bit long in the teeth and software rot is starting to set in in a few places. After all, I see from my change log that the gazetter, and the move to OS coordinates from purely arbitrary ones (what was I ever thinking of) came in in November 2000. In Internet terms, that's ancient. In that time the number of places in the database has gone up by an order of magnitude - and what worked fine then is really starting to creak. And there are new things out there. Satellite navigation systems having taken off means that geographical data is suddenly of use to lots of people, not just to a few wierdo programmers - and so it becomes available and affordable. I have to be able to move with this.

Sunday, October 01, 2006

Distances - not always as straightforward as you'd think

So I started putting distance table information in, and hit a snag along the Ashton when it started objecting to discrepencies between the distance table and the calculated distances.

Using a convenient Google Maps pedometer I crossed checked the distances.

For the fist 5 or 6 places it was really good:


PlaceDistance tablePedometer
Ducie Street, Manchester0.00.0
Lock No 1 Ancoats Junction.40.391
Lock No 3 Ancoats.50.495
Lock No 4 Bewicks Locks1.31.294
Lock No 8 Clayton Bottom Lock2.02.036
Clayton Junction2.42.485
Clayton Lock No 132.82.727

The last couple of these are showing slightly bigger errors, but it's still all well within what is needed for canal journeys. In fact, for Clayton Junction the data table merges it with Clayton Top Lock, which is a bit earlier, so the extra on the pedometer is explicable.
But now it starts to go wrong:
Clayton Top Lock No 163.53.033
We're out by nearly half a mile. Looking at any map it's clear that Clayton lock 13 is nowhere near that far from Clayton Junction. By the time we get to the end of this stretch, we're back on track
Fairfield Junction3.83.735


So the moral seems to be not to trust the distance tables explicitly. It also shows that the pedometer approach may well work for getting pretty exact distances for all places. Now I'm not doing that for all 7000 places, but it could get build into the lengthsman feature at some stage.

In passing, the overall discrepency between the pedometer and the distance table is 128 yards. At 3mph this is less than 1½ minutes! I can live with that.