The Value of Historical Transit Data When it Comes to Machine Learning

As part of my work with Streamdata.io, I’m working through the different ways that transit authorities can use APIs to generate more revenue from their data. Making data streaming and truly real time is the obvious goal of this research, but Streamdata.io is also invested in helping transit authorities take more control over their data resources, and use APIs to generate revenue at a time when they need all the revenue they can possibly get their hands on.

One overlap in the projects I’m working on with Streamdata.io is where transit data intersects with machine learning and artificial intelligence. I’m not sure what transit authorities are doing with their historical data, but I know that it isn’t available via their APIs and developer portals. I’m guessing they see historical data about schedules, vehicles, ridership, and other data points as a burden, and once they’ve generated the reports they need, they don’t do anything else with it. This historical data is a goldmine when it comes to training machine learning models, which could in turn be used to better understand ridership, make predictions, and inform maintenance, scheduling, and other aspects of transit operations, let alone commerce, real estate, and other demographic questions.

There is a dizzying amount of investment going into machine learning and artificial intelligence right now, and some of it could be routed to transit authorities to help boost revenue. If all historical data on transit operations were digitized and made available via APIs, then metered using modern API management approaches, it could be an entirely new revenue opportunity for transit authorities. Transit systems are the heartbeat of the cities they operate within, and historical data is the record of everything that occurs. That record can be used to develop machine learning models for the transit industry, as well as for real estate, commerce, and other sectors that transit systems feed into on a daily basis, and have for years.

I do not know what data transit authorities possess. I don’t know how much historical data they keep around, or what government regulators require, but I do know that whatever there is has value. I’ve studied how tech companies use API management for almost 8 years now, and it is how value is created and revenue is generated, something that transit authorities and their leadership need to realize applies to them in a digital age. They are sitting on a wealth of historical data that would be of value to the tech companies who are already mining their existing schedules and real time vehicle data. Historical transit data and machine learning represent just one of many opportunities on the table for transit authorities looking for new revenue in the future.