Check out the traffic flow state in Belgium where measured (15min window). I went ahead on the concept and built a real one, while building a mobile HTML5 application. You can see the page full size here. Click the house , bottom right to zoom to the max extent of the traffic layer.

I’m trying to create a traffic layer using spatial SQL, nominatim, spatialite / postGIS. The concept is quite simple, lets focus on the main roads in Belgium. The motorways, primary roads, trunks and their links. In terms of OSM, this concerns the following classes : (‘motorway’,’motorway_link’,’trunk’,’trunk_link’,’primary’,’primary_link’)

To know how well traffic is flowing on a certain road, you need atleast the following:
– sample data. I have about 3000 devices sending data over GPRS. This is real live information I can use to see how fast these vehicles are going
– a road network map. (spatialite or nominatim create databases that can be used for this). Overpass is also able to export this data.
– match the GPS samples to a certain road. This is the spatial part of this exercise.
– the maximum speed allowed on that road, again OSM to the rescue

The source data is easy to obtain. Since I’m only interested in the major roads I started out using an Overpass query to filter the OSM dataset. The following query gave me what I wanted. This query takes some time to finish. Splitting up the BBOX and doing it in a few turns (merging that data is trivial) goes a lot faster. See the overpass query to obtain major roads.

<osm-script output="xml" timeout="250">
  <!-- gather results -->
    <query type="way">
      <has-kv k="highway" regv="motorway|trunk|motorway_link|trunk_link|primary|primary_link"/>
      <bbox-query {{bbox}}/>
  <!-- print results -->
  <print mode="meta"/>
  <recurse type="down"/>
  <print mode="meta" order="quadtile"/>

Using this data you can create a spatialite sqlite database. I’m a big sqlite fan, in fact. sqlite is incredible. But with GIS extensions it becomes beyond incredible. Very performant too as I will show later. Use spatialite to create a road network map.

$ spatialite_osm_net -o primary_roads_benelux.osm -d roads_net.sqlite -jo -T roads

Open this file up with spatialite-gui or spatialite and create a spatial index on the ‘roads’ table. The gui takes care of it, it’s not a classical index. If everything is ok, you’ll have these tables:

spatialite> .tables
SpatialIndex               roads                    
geom_cols_ref_sys          roads_net                
geometry_columns           roads_net_data          
geometry_columns_auth      roads_nodes              
idx_roads_geometry         spatial_ref_sys          
idx_roads_geometry_node    spatialite_history      
idx_roads_geometry_parent  views_geometry_columns  
idx_roads_geometry_rowid   virts_geometry_columns

So knowing we are getting a decent data feed (recent GPS coordinates), we can actually start figuring out which road segments (or osm_id’s) we are driving on. (not entirely correct, we’ll get to the caveats later). It’s really simple: we need some kind of reverse geocoding of the coordinate. But we are not interested in address data, I want to know what way the coordinate maps to. How can we find out? The idea is still simple. I want to know how far away the point is from the ‘known roadmap’. the datafeed can present me plenty of points we’re not interested in, minor roads, off-road.

We want to know the closest point to a LINESTRING that represents the way, more specifically we want to know the closest point of a way interpolated from our source point. Lets demonstrate.

Here you see how this goes for a certain coordinate: GeomFromText(‘POINT(4.4447233333333 51.07015)’,4326) This coordinate is represented by the partially hidden airplane which I couldn’t bring to the front in QGIS for some stupid reason. The vehicle is driving on the highway. The dots represent the interpolated closest points for each ‘highway’ within the parameters of the query. The grey lines are the 2 closest matches. It’s a perfect result. Other roads to the right aren’t close enough.

Lets show this without the grey lines. As you can see, the sample data is pretty accurate vs the OSM data. We’ll see by how much it’s off soon.

Calculating this in spatialite is easy but to make it really fast I’m doing it like this, since sqlite doesn’t support variable, I’m using a temp table to mimic this. There are linear referencing functions in PostGIS to help you out project the location of a point along a line. For example, if you have a road, and a point of interest (POI) that is along the side of the road, you can:

  • Use ST_Line_Locate_Point to get the fraction along the road closest to the POI; then
  • Use ST_Line_Interpolate_Point to interpolate the POI on the road.

to get the closest point projected on the road. That point is the shortest distance to the way we wan’t to findout.

PRAGMA temp_store = 2;
.mode line

INSERT INTO _Variables (Name) VALUES ('origin');
INSERT INTO _Variables (Name) VALUES ('origin_t');

UPDATE _Variables SET Geometry = GeomFromText('POINT(4.4447233333333 51.07015)',4326) WHERE Name = 'origin';
UPDATE _Variables SET Geometry = Transform(GeomFromText('POINT(4.4447233333333 51.07015)',4326),3035) WHERE Name = 'origin_t';

SELECT osm_id,name,NumPoints(Geometry), GLength(Geometry), Dimension(Geometry), GeometryType(Geometry),
 Distance((SELECT Geometry FROM _Variables WHERE Name = 'origin_t' LIMIT 1) , Line_Interpolate_Point(Transform(Geometry,3035), ST_Line_Locate_Point(Transform(Geometry,3035), (SELECT Geometry FROM _Variables WHERE Name = 'origin_t' LIMIT 1) ))) AS distance
FROM roads
WHERE PtDistWithin((SELECT Geometry FROM _Variables WHERE Name = 'origin_t' LIMIT 1), Transform( Line_Interpolate_Point(Geometry, ST_Line_Locate_Point(Geometry, (SELECT Geometry FROM _Variables WHERE Name = 'origin' LIMIT 1))) ,3035), 15);

DROP TABLE _Variables;
PRAGMA temp_store = 0;
glenn@slicky:~/OSM_PROJECT/osm2sqlite$ spatialite roads.sqlite < tt.sql
SpatiaLite version ..: 3.1.0-RC2
                osm_id = 146171802
                  name = *** Unknown ****
   NumPoints(Geometry) = 10
     GLength(Geometry) = 0.024435777321786
   Dimension(Geometry) = 1
GeometryType(Geometry) = LINESTRING
              distance = 6.04228534045956

And there we go: 6.0422 meters from this road, I think it’s safe to say we are traveling on that road. You could use nominatim reverse geocoding to confirm this too. But that could return a point, or a building with an address (it’s made for that), we need to know the ways only. And not all of them, a subset. In fact, in nominatim postgresql database we can find out the same, we just need to create a new table to make is a tad faster. My benelux placex table is 5 million records big, this creates a subset of it.

CREATE TABLE hw_placex AS (SELECT * FROM placex WHERE TYPE IN ('motorway','motorway_link','trunk','trunk_link','primary','primary_link') AND class='highway' AND GeometryType(geometry)='LINESTRING');

I initially forgot to do the following steps, create a spatial index on the geometry table. This should speed up things a lot, at this point I prefer the power of postgresql. It looks like sqlite3 isn’t lagging fat behind in performance though. If you happen to have a nominatim server around (as I do), it’s trivial to start using it instead of the spatialite implementation. Less setup work to do. Now about that index

   ON hw_placex USING gist (geometry);

Now we have a table with only the main roads, indexed… for postgresql to return about the same answer as spatialite does, we use a different query but the same approach still stands.

SELECT place_id, parent_place_id, rank_search, osm_id,class, TYPE, name, GeometryType(geometry) , ST_distance(Transform(GeomFromText('POINT(4.4447233333333 51.07015)',4326),3035), Line_Interpolate_Point(Transform(geometry,3035), ST_Line_Locate_Point(Transform(geometry,3035), Transform(GeomFromText('POINT(4.4447233333333 51.07015)',4326),3035)))) AS distance,
ST_distance(GeomFromText('POINT(4.4447233333333 51.07015)', 4326), Line_Interpolate_Point(geometry, ST_Line_Locate_Point(geometry, GeomFromText('POINT(4.4447233333333 51.07015)', 4326)))) AS distance_degrees
FROM hw_placex
WHERE ST_DWithin(Transform(GeomFromText('POINT(4.4447233333333 51.07015)',4326),3035), Line_Interpolate_Point(Transform(geometry,3035), ST_Line_Locate_Point(Transform(geometry,3035), Transform(GeomFromText('POINT(4.4447233333333 51.07015)',4326),3035))) ,15)
ORDER BY distance ASC
postgres@osm:~$ psql -d nominatim -f query.sql
 place_id | parent_place_id | rank_search |  osm_id   |  class  |   type   |                      name                       | geometrytype |     distance     |   distance_degrees  
  2791892 |         3520428 |          26 | 146171802 | highway | motorway | "ref"=>"E19", "int_ref"=>"E19", "nat_ref"=>"A1" | LINESTRING   | 6.04228534045956 | 8.52278872976028e-05
(1 row)

A few things to know, the second distance is in degrees, but us humans think in meter, hence all the conversion work is being done by ‘Transform’. By the looks of it, I think postgresql pretty much agrees with spatialite on the results, both the OSM id and the distance from it. I have my limits on 15 meters, it’s pretty high probably but when it comes down to GPS, you never know where you truly are without compensating for the intentional inaccuracies. Nevertheless, in this case, i’m pretty sure it’s correct to say the traveler is on the E19 going north.

now we know the osm_id, we can get to it’s tags, we’ll need to know the max_speed somehow. We could query the roads_raw.sqlite created by spatialite_raw from our overpass export, or we can stick to nominatim and figure out the tags there. lets try spatialite:

spatialite> glenn@slicky:~/OSM_PROJECT/osm2sqlite$ spatialite roads_raw.sqlite
Enter ".help" for instructions
spatialite> select * from osm_way_tags where way_id=146171802;

Allright, motorway. Since we are not in Germany (yet) , there is a limit to the speed and it’s 120Km/h . Perfect. Now all we need to do is go back to our GPS position datafeed, verify what the speed is of that vehicle and match the allowed speed, and mark accordingly.
It’s not rocket science, but then again, a single sample does’t tell you that much about a road(segment). We could overlay a map with the outcome of our analysis. Or we can do some more logic in trying to build up something more solid using several datasets. Once you have enough samples, you know what goes on. But at night, we the few samples we get there we need to do some guesswork.

conclusion, to visually display a traffic layer, we will need a large enough datafeed. We can use our samples to display the affected ways in a certain color depending on their availability and/or on speed conditions. Of course one would add enough statistical formula’s to compensate for error margins. The data I’m using is pretty much well prefiltered. Only moving GPS coordinates will be shown. And the coordinate quality is also monitored, the raw feed has ways to verify how good a sample was, before even being considered to be used to build up a traffic layer it has to pass checks.