Quantifying Variable Contributions to Bus Operation Delays Considering Causal Relationships
KTH, School of Architecture and the Built Environment (ABE), Civil and Architectural Engineering, Transport planning. ORCID iD: 0000-0001-9990-4269
KTH, School of Architecture and the Built Environment (ABE), Civil and Architectural Engineering, Transport planning. ORCID iD: 0000-0002-2141-0389
KTH, School of Architecture and the Built Environment (ABE), Civil and Architectural Engineering, Transport planning. ORCID iD: 0009-0006-2697-9810
Department of Architecture and Civil Engineering, Chalmers University of Technology.
2025 (English). In: Transportation Research, Part E: Logistics and Transportation Review, ISSN 1366-5545, Vol. 194, p. 103881. Article in journal, Editorial material (Refereed). Published
Abstract [en]

Buses in public transit networks often face operational delays due to dynamic conditions such as traffic congestion, and these delays can propagate along transit routes and degrade overall system performance. Understanding the causes of bus arrival delays is crucial for effective public transport management and control. Moreover, understanding the contribution of each factor to bus delays aids in developing targeted mitigation strategies and supports decision-making and planning. Traditional research relies primarily on correlation-based analysis, which cannot reveal the underlying causal mechanisms. Additionally, no studies have considered the complex causal relationships between factors when quantifying their contributions to outcomes in public transport. This study analyzes the factors causing bus arrival delays from a causal perspective, focusing on quantifying the causal contribution of each factor while accounting for the causal relationships among factors. Quantifying a factor's causal contribution is challenging because of computational complexity and statistical bias from the limited sample size. Using a causal discovery method, this study generates a causal graph for bus arrival delays and employs the causality-based Shapley value to quantify the contribution of each variable. The study further uses the Double Machine Learning (DML) approach to estimate the causal contributions, which provides a consistent and computationally feasible method. A case study was conducted using Google Transit Feed Specification (GTFS) data, focusing on high-frequency bus routes in Stockholm, Sweden. To validate the model, cross-validation was performed by comparing variable importance rankings with those from traditional models, including Linear Regression (LR) and Structural Equation Modeling (SEM).
The comparison shows that results from the causality-based Shapley value differ significantly from those obtained by traditional methods in terms of importance rankings and influence magnitudes. The findings underscore the significant impact of origin delays on bus punctuality, a factor often underestimated in previous studies. Additionally, the study demonstrates that a causal discovery model can not only infer causal relationships but also reveal direct and indirect effects, which can provide more intuitive explanations. Finally, although the causal results are mathematically and intuitively sound, it is important to further investigate the real causal impact in practice using lab experiments or A/B tests in real-world settings.
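As a rough illustration of the attribution idea named in the abstract, the sketch below computes the classical Shapley value exactly for a small coalition game. It is not the causality-based variant or the DML estimator used in the study: the value function `toy_value`, the factor names other than origin delay, and all numbers are hypothetical and chosen only so that the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact classical Shapley values for a value function on frozensets."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of any coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical delay (seconds) explained by a set of known factors."""
    v = 0.0
    if "origin_delay" in coalition:
        v += 60.0
    if "dwell_time" in coalition:
        v += 20.0
    # Interaction that only appears when both factors are known.
    if {"origin_delay", "travel_time"} <= coalition:
        v += 30.0
    return v


factors = ["origin_delay", "dwell_time", "travel_time"]
phi = shapley_values(factors, toy_value)
```

In this toy game the 30-second interaction is split equally between the two factors involved, and the contributions sum to the value of the full factor set. Exact enumeration grows exponentially with the number of factors, which is one reason estimation-based approaches are attractive for larger variable sets.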

Place, publisher, year, edition, pages
Elsevier BV, 2025. Vol. 194, p. 103881
Keywords [en]
Explainable AI; Causal graph discovery; Shapley value; Urban transit; GTFS data
National Category
Transport Systems and Logistics
Identifiers
URN: urn:nbn:se:kth:diva-357042; DOI: 10.1016/j.tre.2024.103881; ISI: 001373407200001; Scopus ID: 2-s2.0-85211053455; OAI: oai:DiVA.org:kth-357042; DiVA, id: diva2:1917737
Note

QC 20241206

Available from: 2024-12-03. Created: 2024-12-03. Last updated: 2025-01-28. Bibliographically approved
In thesis
1. Data-Driven Graphical Modelling and Applications in Public Transportation
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Efficient public transportation is crucial for reducing traffic congestion, cutting carbon emissions, and ensuring fair access to jobs and services. With modern technology, we now have access to large amounts of public transport data, including passenger movements, vehicle trajectories, and other sensor-generated information. The knowledge hidden in this data has significant potential to enhance transportation planning, operations, and control. However, effectively representing and organizing such data, and extracting useful information from it to address public transportation issues, remains challenging.

Graphical models have gained significant attention for their strengths in data representation, knowledge interconnection, and complex structure visualization. Notably, knowledge graphs and causal graphs are two distinct types of graphical models that are widely applied in various domains (e.g., social network analysis, drug discovery, and recommendation systems). Knowledge graphs are good at organizing and connecting massive amounts of data and knowledge, revealing complex relationships, and enabling knowledge mining and inference (answering 'what' and 'how' questions). Causal graphs are powerful for identifying and analyzing causal relationships, allowing for a deeper understanding of the underlying mechanisms that drive observed data patterns (answering 'why' questions).

Specifically, the thesis proposes two data-driven graphical models (i.e., the knowledge graph and the causal graph) and explores their application scenarios in public transportation. It constructs a mobility knowledge graph to represent and organize mobility data, mine travel patterns between stations, and validate its value in trip destination inference and user-station attention estimation. Then, to gain a deeper understanding of transportation operations, the thesis develops causal discovery models for static data to infer causal relationships and generate causal graphs for analysing the variables that cause bus delays. Based on the causal graph, it quantifies the contribution of each variable while accounting for the causal relationships, to support the development of targeted strategies for mitigating delays. Additionally, the thesis develops a time series causal discovery model to understand bus delay propagation patterns and effects within the public transportation system from a system perspective.

Papers I and II focus on data organization and knowledge inference, construct a mobility knowledge graph (MKG), and explore its applications in public transportation. Paper I introduces the concept of the MKG and proposes a framework for constructing it from smart card data by capturing spatiotemporal travel patterns between stations using both rule-based and neural network-based decomposition methods. It validates the MKG framework and demonstrates its value in inferring trip destinations using only tap-in records. Paper II explores another transportation application, proposing a method to estimate the 'real' user-station attention from partially observed station visit count data. It uses the MKG to capture latent spatiotemporal travel dependencies between stations, which improves the estimation by addressing missing values and cold-start problems. The framework is validated with both synthetic and real-world data, demonstrating the value of the MKG in user-station attention estimation.

Papers IV-VI focus on causal graphs and their applications in public transportation. Before the causal analysis of bus delays, Paper III conducts an empirical study examining the heterogeneous effects of various factors on bus arrival delays. Paper IV focuses on operational variables and develops causal discovery methods for static data to analyse the variables causing bus delays, evaluating the methods from statistical data-fitting and causal-interpretation perspectives. It identifies the optimal causal discovery method for analysing the causes of bus delays. Further, based on the causal graph generated in Paper IV, Paper V develops a causality-based Shapley value approach to quantify the contribution of each variable to bus delays and so support efficient transportation decision-making. The results are cross-validated against conventional models (e.g., regression models) to reveal the differences between correlation-based and causality-based analysis approaches. Moreover, Paper VI develops a time series causal discovery model to infer causal relationships between bus stops and generate a spatiotemporal delay propagation causal graph from time series bus stop delay data. It then incorporates complex network theory to analyse bus delay propagation patterns and effects within the public transportation system.

Abstract [sv, English translation]

Efficient public transport is crucial for reducing congestion, cutting carbon dioxide emissions, and ensuring fair access to jobs and services. With modern technology, we now have access to large amounts of public transport data, including passenger movements, vehicle movements, and sensor-generated information. The knowledge hidden in these data has great potential to improve transport planning, operations, and control. Effectively representing and organizing such data, and extracting useful information from it to address public transport problems, remains a challenge.

Graphical models have received considerable attention for their strengths in data representation, knowledge interconnection, and visualization of complex structures. Knowledge graphs and causal graphs are two distinct types of graphical models that are widely applied in various domains (e.g., social network analysis, drug development, and recommendation systems). Knowledge graphs are good at organizing and connecting enormous amounts of data and knowledge, revealing complex relationships, and enabling knowledge mining and inference (answering 'what' and 'how' questions). Causal graphs are powerful for identifying and analysing causal relationships, enabling a deeper understanding of the underlying mechanisms that drive observed data patterns (answering 'why' questions).

Specifically, the thesis aims to propose two data-driven graphical models (i.e., the knowledge graph and the causal graph) and explores their application scenarios in public transport. It constructs a mobility knowledge graph to represent and organize mobility data, mine travel patterns between stations, and validate its value in trip destination inference and user-station attention estimation. Then, to gain a deeper understanding of transport operations, the thesis develops causal discovery models for static data to infer causal relationships and generate causal graphs for analysing the variables that cause bus delays. Based on the causal graph, it quantifies the contribution of each variable while accounting for the causal relationships, to support the development of targeted strategies for mitigating delays. In addition, the thesis develops a time series causal discovery model to understand bus delay propagation patterns and effects within the public transport system from a system perspective.

Papers I and II focus on data organization and knowledge inference, construct a mobility knowledge graph (MKG), and explore its applications in public transport. Paper I introduces the concept of the MKG and proposes a framework for constructing it from smart card data by capturing spatiotemporal travel patterns between stations using both rule-based and neural network-based decomposition methods. It validates the MKG framework and demonstrates its value in inferring trip destinations using only tap-in records. Paper II explores another transport application, proposing a method to estimate the 'real' user-station attention from partially observed station visit data. It uses the MKG to capture latent spatiotemporal travel dependencies between stations, improving the estimation by addressing missing values and cold-start problems. The framework is validated with both synthetic and real-world data, which shows the value of the MKG in user-station attention estimation.

Papers IV-VI focus on research into causal graphs and their applications in public transport. Before the causal analysis of bus delays, Paper III conducts an empirical study examining the heterogeneous effects of various factors on bus arrival delays. Paper IV focuses on operational variables and develops causal discovery methods for static data to analyse the variables that cause bus delays, evaluating their performance from statistical data-fitting and causal-interpretation perspectives. It identifies the optimal causal discovery method for analysing the causes of bus delays. Further, based on the causal graph generated in Paper IV, Paper V develops a causality-based Shapley value method to quantify the contribution of each variable to bus delays in order to support efficient transport decision-making. The results are cross-validated against conventional models (e.g., regression models) to reveal the difference between correlation-based and causality-based analysis methods. In addition, Paper VI develops a time series causal discovery model to infer causal relationships between bus stops and generate a spatiotemporal delay propagation causal graph from time series bus stop delay data. It then incorporates complex network theory to analyse bus delay propagation patterns and effects within the public transport system.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. 58
Series
TRITA-ABE-DLT ; 2437
Keywords
Graphical model, data-driven, knowledge graph, causal graph, public transportation
National Category
Transport Systems and Logistics
Research subject
Transport Science, Transport Systems
Identifiers
urn:nbn:se:kth:diva-357044 (URN); 978-91-8106-150-5 (ISBN)
Public defence
2025-01-17, F3, Lindstedtsvägen 26, KTH Campus, public video conference link https://kth-se.zoom.us/j/67216916457, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20241203

Available from: 2024-12-03. Created: 2024-12-03. Last updated: 2025-03-24. Bibliographically approved

Open Access in DiVA

fulltext (4636 kB), 60 downloads
File information
File name: FULLTEXT01.pdf; File size: 4636 kB; Checksum SHA-512:
6cc49762b9e014d75a17cf71d107bb7f267d07150555a39b2dd36ed60e99589fc80b6ebe5aef94362847424922f08eb54c39132d4656f4f2ca52c8c4314ec63c
Type: fulltext; Mimetype: application/pdf

Authority records

Zhang, Qi; Ma, Zhenliang; Wu, Yuanyuan
