- Real-time GTFS
- GPS positions
- Official colors of bus lines
Some data sets seem harder to use:
- number of passengers entering a station
- air quality in an underground station
Let’s see how the open data could be used.
Data set: GTFS
No big surprise here. The vast majority of mobile apps that help people travel within city limits rely heavily on GTFS information. However, these GTFS data sets do not fully follow the standard. I understand that manpower is limited and that open data is a gift to the community. Still, there are several things that could change:
- The exception calendar, which is far too big. Most bus lines strive to keep to their timetables, and the drivers are great at it, which I appreciate a lot. But the format of the data and the information itself are two different things.
- The timetables for bus lines that stay in service past midnight. The GTFS standard specifies 24-hour notation and marks times after midnight on the same service day by adding 24 to the hour. For example, 00:30 would become 24:30. The data is still workable because of its redundancy: the stop_sequence column of the stop_times file helps a lot here.
These are not big changes, and for the time being they are not a problem for me. In fact, I am working on a filter that could be applied to the GTFS data set to make it compatible with other tools.
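To illustrate how stop_sequence makes the past-midnight times recoverable, here is a minimal sketch in Python. It is not the actual filter, only one possible approach. The function names (`parse_hms`, `format_hms`, `normalize_trip_times`) are made up for this example, and it assumes the rows of a single trip are already available as dictionaries keyed by the standard stop_times column names.

```python
DAY = 24 * 3600  # seconds in one service day


def parse_hms(value):
    """Convert an HH:MM:SS string into seconds since the start of the day."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Convert seconds back to HH:MM:SS, allowing hours of 24 and above."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def normalize_trip_times(rows):
    """Rewrite the times of ONE trip so they never go backwards.

    rows: dicts with 'stop_sequence', 'arrival_time', 'departure_time'.
    Whenever the clock appears to jump back (23:50 followed by 00:05),
    24 hours are added to that time and to every later one.
    """
    ordered = sorted(rows, key=lambda row: int(row["stop_sequence"]))
    offset = 0
    previous = None
    result = []
    for row in ordered:
        fixed = dict(row)  # leave the input rows untouched
        for column in ("arrival_time", "departure_time"):
            raw = row.get(column, "").strip()
            if not raw:
                continue  # GTFS allows empty times at intermediate stops
            seconds = parse_hms(raw) + offset
            if previous is not None and seconds < previous:
                offset += DAY
                seconds += DAY
            fixed[column] = format_hms(seconds)
            previous = seconds
        result.append(fixed)
    return result


trip = [
    {"stop_sequence": "2", "arrival_time": "00:05:00", "departure_time": "00:05:00"},
    {"stop_sequence": "1", "arrival_time": "23:50:00", "departure_time": "23:50:00"},
    {"stop_sequence": "3", "arrival_time": "00:30:00", "departure_time": "00:30:00"},
]
fixed_trip = normalize_trip_times(trip)
print([row["arrival_time"] for row in fixed_trip])
```

This sketch only detects a rollover that happens in the middle of a trip. A trip whose first stop is already past midnight looks perfectly ordered, so it would need extra information (for example the calendar) to be identified.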
Data set: real-time GTFS API
The API uses SOAP, a very mature technology that has proven very useful in the past. However, when it comes to data traffic, the envelope (which is redundant information) is much larger than the useful part of the data. I suppose that some filtering is necessary. In the vast majority of cases, the volume of relevant data is small enough to fit in a single tweet.
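The overhead can be made concrete with a small Python sketch. The response below is entirely hypothetical: the element names (`GetNextDeparturesResponse`, `Departure`, `Line`, and so on) and the `example.org` namespace are invented for illustration and are not taken from the real API. Only the SOAP 1.1 envelope namespace is real. The sketch strips the markup and compares the size of the remaining text with the size of the whole message.

```python
import xml.etree.ElementTree as ET

# Hypothetical SOAP response; the element names are made up for this example.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetNextDeparturesResponse xmlns="http://example.org/transit">
      <Departure>
        <Line>12</Line>
        <Destination>Central Station</Destination>
        <WaitingMinutes>4</WaitingMinutes>
      </Departure>
    </GetNextDeparturesResponse>
  </soap:Body>
</soap:Envelope>"""


def extract_payload(xml_text):
    """Return the non-empty text values found anywhere in the message."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [text.strip() for text in root.itertext() if text.strip()]


def payload_ratio(xml_text):
    """Share of the message (in characters) that is actual text content."""
    payload = "".join(extract_payload(xml_text))
    return len(payload) / len(xml_text)


values = extract_payload(SAMPLE_RESPONSE)
print(values)
print(f"useful share of the message: {payload_ratio(SAMPLE_RESPONSE):.1%}")
```

A real client would of course keep the field names next to the values. The point here is only the proportion between markup and content.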
Data set: GPS positions
There are several data sets:
- train and bus stops
- public toilet cabins
The train and bus stops are a subset of the GTFS data set. It is nice to have them separately, as most people don’t need to download a big zip file. The toilet cabins, on the other hand, are a very useful data set of their own. I like the additional information, such as the open/closed status of the cabins.
Data set: official colors
This data set contains the official logos for every bus and train line. I would have preferred to know the exact RGB color of the lines. Fortunately, the image files contain this information. With the exception of some special lines, a local image processing filter is enough to extract the exact color value. I will talk about it in a later post.
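Until then, here is a rough idea of what such a filter could look like, as a minimal Python sketch and not necessarily the method the later post will describe. It assumes that a logo is mostly one flat color with white or black lettering, so the most frequent remaining pixel value is taken as the line color. The function names `dominant_color` and `to_hex` are invented for this example, and the pixel list is synthetic.

```python
from collections import Counter

WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def dominant_color(pixels, ignore=(WHITE, BLACK)):
    """Return the most frequent (R, G, B) value, skipping the ignored colors.

    pixels: any iterable of (R, G, B) triples.
    Returns None when every pixel is in the ignore list.
    """
    counts = Counter(
        tuple(pixel) for pixel in pixels if tuple(pixel) not in ignore
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def to_hex(color):
    """Format an (R, G, B) triple as an HTML-style hex string."""
    return "#{:02X}{:02X}{:02X}".format(*color)


# Synthetic stand-in for a logo: red background, white text, thin black border.
logo_pixels = [WHITE] * 10 + [(227, 6, 19)] * 30 + [BLACK] * 5
line_color = dominant_color(logo_pixels)
print(line_color, to_hex(line_color))
```

In practice the pixels would come from the logo files through an image library of your choice. Anti-aliased edges produce many near-duplicates of the main color, which is fine as long as the exact flat color remains the most frequent value. Lines whose logo is itself black or white would need a different ignore list.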