Emanuele Fumeo
University of Genoa
Speaking on 24 November during the Big Data in Railway Operations sessions


Train Delay Prediction System for Large-Scale Railway Networks based on Big Data Analysis


Big Data Analysis for Railways is becoming increasingly relevant across all of its possible applications. In this context, the goal of this paper is to apply Big Data Analysis to a specific case study: increasing the automation level of current train delay prediction systems in order to support Traffic Management System (TMS) planning and dispatching activities.

Current train route planning procedures generate timetables based only on theoretical values (e.g. the time needed to cover the distance between stations, dwell times, and the like), without considering recurrent delay situations that could be discovered by processing historical data about train movements.

For example, data analysis may reveal that a given train arrives slightly late at a specific station (on average) every day, which makes it possible to update that train's schedule to account for the delay. As this simple example shows, Big Data analysis can be used to discover such trends and support railway operators in planning train trips. On the one hand, this could improve the quality of service as perceived by users; on the other hand, it could yield a timetable that is less affected by recurrent causes of delay.
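The following is a minimal sketch of the analysis described above. It assumes historical records of the form (train, station, scheduled arrival, actual arrival), with times in minutes. The record layout, the train and station names, and the 2-minute threshold are all hypothetical and do not reflect RFI's actual data schema. The code averages each train's arrival delay at each station and flags the pairs whose mean delay exceeds the threshold:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical records: (train_id, station, scheduled_min, actual_min).
# Times are minutes after midnight; real infrastructure-manager data is richer.
records = [
    ("T1", "Station A", 480, 483),
    ("T1", "Station A", 480, 482),
    ("T1", "Station A", 480, 484),
    ("T1", "Station B", 500, 500),
    ("T1", "Station B", 500, 501),
    ("T1", "Station B", 500, 499),
]


def recurrent_delays(records, threshold_min=2.0):
    """Return {(train, station): mean delay} for pairs whose mean arrival
    delay, in minutes, exceeds threshold_min."""
    delays = defaultdict(list)
    for train, station, scheduled, actual in records:
        delays[(train, station)].append(actual - scheduled)
    result = {}
    for key, values in delays.items():
        avg = mean(values)
        if avg > threshold_min:
            result[key] = avg
    return result


print(recurrent_delays(records))  # {('T1', 'Station A'): 3}
```

A planner could then review the flagged pairs, such as `T1` at `Station A` here, when revising the timetable.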

Increasing automation in train delay prediction therefore aims to improve on the current methodologies, which are basic albeit robust [5]. This improvement has been achieved by adopting the following three-step approach:

  1. Problem formalization: the problem has been framed as time series forecasting [6] [7] [8] [9], with the objective of predicting the delay of each train at all subsequent stations of interest, as accurately as possible and together with an estimate of the forecasting accuracy itself.
  2. Data collection: for this case study, the data has been provided by Rete Ferroviaria Italiana (RFI) S.p.A., the Italian Infrastructure Manager (IM), which owns a historical database containing all the information about train movements across the entire Italian railway network. In future applications, this predictive technology could be implemented directly by RFI, or by other IMs, on their own information systems.
  3. Data processing: the data has been analysed using state-of-the-art Big Data technologies, i.e. Apache Spark [12] on Apache Hadoop [13] [14], and a well-known Machine Learning algorithm, Extreme Learning Machines (ELMs) [10] [11], adapted to exploit typical Big Data parallel architectures.
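The following is a minimal, self-contained sketch of how steps 1 and 3 can fit together. It is not the authors' implementation. The sketch makes these assumptions:

- Each train's delays at successive stations form a time series.
- The last `window` delays become the feature vector for predicting the next delay.
- A single-hidden-layer ELM for regression is trained, with random sigmoid hidden units and a ridge-regularised least-squares output layer.

It also assumes one common way to parallelise ELM training. The matrices `H^T H` and `H^T y` are sums over training samples, so each data partition can compute its own share (a "map") and the shares can be added up (a "reduce") before one small linear solve. In a Spark job this would correspond roughly to `RDD.mapPartitions` followed by `RDD.reduce`. Here, plain Python lists stand in for partitions. The window length, hidden-layer size, regularisation constant and delay values are illustrative choices.

```python
import math
import random


def make_windows(delays, window):
    """Turn one train's delay series (minutes, one value per station) into
    (last `window` delays, next delay) training pairs."""
    return [(delays[i:i + window], delays[i + window])
            for i in range(len(delays) - window)]


def solve(a, b):
    """Solve a x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


class ELM:
    """Single-hidden-layer Extreme Learning Machine for regression (sketch)."""

    def __init__(self, n_inputs, n_hidden, reg=0.1, seed=0):
        rng = random.Random(seed)
        # Input-to-hidden weights and biases are drawn at random and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.reg = reg                # ridge term added to the diagonal of H^T H
        self.beta = [0.0] * n_hidden  # output weights, learned in fit()

    def hidden(self, x):
        # Sigmoid activation of the random projection of x.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + bi)))
                for row, bi in zip(self.w, self.b)]

    def partial_stats(self, samples):
        """'Map' step: H^T H and H^T y computed over one data partition."""
        n = len(self.b)
        hth = [[0.0] * n for _ in range(n)]
        hty = [0.0] * n
        for x, y in samples:
            h = self.hidden(x)
            for i in range(n):
                hty[i] += h[i] * y
                for j in range(n):
                    hth[i][j] += h[i] * h[j]
        return hth, hty

    def fit(self, partitions):
        """'Reduce' step: sum the partition statistics, then solve
        (H^T H + reg * I) beta = H^T y for the output weights."""
        n = len(self.b)
        a = [[0.0] * n for _ in range(n)]
        rhs = [0.0] * n
        for hth, hty in map(self.partial_stats, partitions):
            for i in range(n):
                rhs[i] += hty[i]
                for j in range(n):
                    a[i][j] += hth[i][j]
        for i in range(n):
            a[i][i] += self.reg
        self.beta = solve(a, rhs)
        return self

    def predict(self, x):
        return sum(bh * h for bh, h in zip(self.beta, self.hidden(x)))


# Hypothetical delays (minutes) of three trains at six successive stations.
history = [[0, 1, 3, 4, 6, 7], [1, 2, 3, 5, 6, 8], [0, 2, 3, 4, 5, 7]]
partitions = [make_windows(d, window=3) for d in history]  # one partition per train
model = ELM(n_inputs=3, n_hidden=5).fit(partitions)
print(round(model.predict([3, 4, 6]), 2))  # predicted delay at the next station
```

With real data, the inputs would normally be normalised, and the hidden-layer size and regularisation would be tuned on held-out trips. The design point is that only the small `n_hidden x n_hidden` statistics, not the raw movement records, need to be combined across partitions.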

The described approach and the performance of the prediction system have been validated on the real data provided by RFI, by comparing its results with those of the current train delay prediction system on the same data. For this purpose, a set of novel KPIs, agreed upon with the Infrastructure Manager, has been designed and used. Results show that the new train delay prediction system predicts train delays accurately and outperforms current prediction systems.
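The KPIs themselves are not described in this abstract. Purely as a stand-in, the comparison with an existing predictor can be sketched using mean absolute error. The sketch assumes that both systems predict the delay at the same future stations. The metric choice and all values below are hypothetical:

```python
def mean_absolute_error(predicted, observed):
    """Average absolute gap, in minutes, between predicted and observed delays."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical observed delays (minutes) at four future stations.
observed = [4, 6, 2, 9]
baseline = [3, 3, 3, 3]    # stand-in for an existing prediction system
candidate = [4, 5, 2, 8]   # stand-in for a data-driven predictor

print(mean_absolute_error(baseline, observed))   # 2.75
print(mean_absolute_error(candidate, observed))  # 0.5
```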


We look forward to welcoming you in Naples from 22 to 24 November for the Intelligent Rail Summit.
