Development of Large-scale Trip Analysis Toolkits for Vehicle-based GPS Trajectories using Apache Spark and Open Data: A Case Study of Taxis in Bangkok, Thailand
Urban planning and mobility analysis have traditionally relied on observation or questionnaires, which can be time-consuming and costly. Advances in technology now allow tracking devices to be installed in individual vehicles, where they record a range of measurements, most notably Global Positioning System (GPS) positions.
The collected location data are accurate and regularly updated, and they can offer valuable insights into people's movements and behavior. Because the volume of trajectory data is substantial and keeps growing, analyzing it requires specialized platforms and skills.
In this study, we developed large-scale analysis toolkits that extract insights from vehicle-based GPS trajectories, including trip statistics, origin–destination analysis, and hotspot identification. The toolkits are built on Apache Spark, an analytics engine that handles large volumes of data by distributing tasks across a Hadoop cluster.
Algorithms for the analytics model were created to reconstruct trips based on their type of mobility, and trip locations were mapped using open data such as administrative boundaries and points of interest. We then verified our approach using real-world taxi data from Bangkok, Thailand.
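To make the idea of trip reconstruction concrete, the following is a minimal single-machine sketch in plain Python, not the toolkit's actual implementation. It assumes each GPS record carries a boolean occupancy flag (a hypothetical field named occupied here) and splits one vehicle's time-ordered points into consecutive busy and vacant trips, recording origin, destination, duration, and great-circle distance. The GpsPoint fields, the function names, and the use of the haversine formula are all illustrative assumptions. Stay detection and the mapping of trip ends to administrative boundaries or points of interest are not covered.

```python
from dataclasses import dataclass
from itertools import groupby
from math import asin, cos, radians, sin, sqrt


@dataclass
class GpsPoint:
    """One GPS record; the field names are illustrative assumptions."""
    vehicle_id: str
    timestamp: int   # seconds since epoch (assumed unit)
    lat: float
    lon: float
    occupied: bool   # hypothetical meter flag: True = passenger on board


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon pairs."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0088 * asin(sqrt(a))


def reconstruct_trips(points):
    """Split one vehicle's points into consecutive busy/vacant trips.

    A new trip starts whenever the occupancy flag changes between
    two time-ordered points.
    """
    ordered = sorted(points, key=lambda p: p.timestamp)
    trips = []
    for occupied, run in groupby(ordered, key=lambda p: p.occupied):
        run = list(run)
        # Sum the distances between successive points within the run.
        distance = sum(
            haversine_km(a.lat, a.lon, b.lat, b.lon)
            for a, b in zip(run, run[1:])
        )
        trips.append({
            "type": "busy" if occupied else "vacant",
            "origin": (run[0].lat, run[0].lon),
            "destination": (run[-1].lat, run[-1].lon),
            "duration_s": run[-1].timestamp - run[0].timestamp,
            "distance_km": distance,
            "n_points": len(run),
        })
    return trips
```

In a Spark setting, the same per-vehicle logic would typically be applied after partitioning the records by vehicle identifier and ordering them by timestamp, so that each vehicle's trajectory can be segmented independently and in parallel.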
The results revealed that taxis made more vacant trips than busy trips, and that the travel time and distance spent searching for passengers were longer than those spent carrying passengers from pick-up to drop-off. Taxi activity was concentrated in the city center and nearby areas, particularly in the vicinity of transport-connecting hubs. Taxi stay hotspots were mainly located near tourist attractions and parking hubs.
Furthermore, we found that the processing performance of the proposed approach improved as the number of executor cores increased. Using the developed trip analysis toolkits, this study presents information on taxi travel patterns, service availability, hotspots, and processing performance.