Data-Driven Graphical Modelling and Applications in Public Transportation
KTH, School of Architecture and the Built Environment (ABE), Civil and Architectural Engineering, Transport planning. ORCID iD: 0000-0001-9990-4269
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Efficient public transportation is crucial for reducing traffic congestion, cutting carbon emissions, and ensuring fair access to jobs and services. Modern technology provides large amounts of public transport data, including passenger movements, vehicle trajectories, and other sensor-generated information. The knowledge hidden in this data has significant potential to enhance transportation planning, operations, and control. However, effectively representing and organizing such data, and extracting useful information from it to address public transportation issues, remains challenging.

Graphical models have gained significant attention for their strengths in data representation, knowledge interconnection, and complex structure visualization. Knowledge graphs and causal graphs are two distinct types of graphical models that are widely applied in various domains (e.g., social network analysis, drug discovery, and recommendation systems). Knowledge graphs are good at organizing and connecting massive amounts of data and knowledge, revealing complex relationships, and enabling knowledge mining and inference (answering "what" and "how" questions). Causal graphs are powerful for identifying and analyzing causal relationships, allowing for a deeper understanding of the underlying mechanisms that drive observed data patterns (answering "why" questions).

Specifically, the thesis proposes two data-driven graphical models (i.e., the knowledge graph and the causal graph) and explores their application scenarios in public transportation. It constructs a mobility knowledge graph to represent and organize mobility data, mine travel patterns between stations, and validate its value in trip destination inference and user-station attention estimation. Then, to gain a deeper understanding of transportation operations, the thesis develops causal discovery models for static data to infer causal relationships and generate causal graphs for analysing the variables that cause bus delays. Based on the causal graph, it quantifies the contribution of each variable while considering the causal relationships, to support the development of targeted strategies to mitigate delays. The thesis also develops a time series causal discovery model to understand bus delay propagation patterns and effects within the public transportation system from a system perspective.

Papers I and II focus on data organization and knowledge inference, construct a mobility knowledge graph (MKG), and explore its applications in public transportation. Paper I introduces the concept of the MKG and proposes a framework for constructing it from smart card data by capturing spatiotemporal travel patterns between stations using both rule-based and neural network-based decomposition methods. It validates the MKG framework and demonstrates its value in inferring trip destinations using only tap-in records. Paper II explores another transportation application, proposing a method to estimate the "real" user-station attention from partially observed station visit count data. It uses the MKG to capture latent spatiotemporal travel dependencies between stations, enhancing the estimation process by addressing missing values and cold-start problems. The framework is validated with both synthetic and real-world data, demonstrating the value of the MKG in user-station attention estimation.

Papers IV-VI focus on causal graphs and their applications in public transportation. Before the causal analysis of bus delays, Paper III conducts an empirical study examining the heterogeneous effects of various factors on bus arrival delays. Paper IV focuses on the operational variables and develops causal discovery methods for static data to analyse the variables causing bus delays, evaluating the methods' performance from statistical data-fitting and causal-interpretation perspectives. It identifies the optimal causal discovery method for analysing the causes of bus delays. Further, based on the causal graph generated in Paper IV, Paper V develops a causality-based Shapley value approach to quantify the contribution of each variable to bus delays and so support efficient transportation decision-making. The results are cross-validated against conventional models (e.g., regression models) to reveal the difference between correlation-based and causality-based analysis approaches. Moreover, Paper VI develops a time series causal discovery model to infer causal relationships between bus stops and generate a spatiotemporal delay propagation causal graph from time series bus stop delay data. It then incorporates complex network theory to analyse the bus delay propagation patterns and effects within the public transportation system.

Abstract [sv]

Efficient public transport is crucial for reducing congestion, lowering carbon dioxide emissions, and ensuring fair access to jobs and services. With modern technology, we now have access to large amounts of public transport data, including passenger movements, vehicle movements, and sensor-generated information. The knowledge hidden in these data has great potential to improve transport planning, operations, and control. Effectively representing and organizing such data, and extracting useful information from it to address public transport problems, remains a challenge.

Graphical models have attracted considerable attention for their strengths in data representation, knowledge interconnection, and visualization of complex structures. Knowledge graphs and causal graphs are two distinct types of graphical models that are widely applied in various domains (e.g., social network analysis, drug development, and recommendation systems). Knowledge graphs are good at organizing and connecting vast amounts of data and knowledge, revealing complex relationships, and enabling knowledge mining and inference (answering "what" and "how" questions). Causal graphs are powerful for identifying and analysing causal relationships, enabling a deeper understanding of the underlying mechanisms that drive observed data patterns (answering "why" questions).

Specifically, the thesis aims to propose two data-driven graphical models (i.e., the knowledge graph and the causal graph) and explores their application scenarios in public transport. It constructs a mobility knowledge graph to represent and organize mobility data, mine travel patterns between stations, and validate its value in trip destination inference and user-station attention estimation. Then, to gain a deeper understanding of transport operations, the thesis develops causal discovery models for static data to infer causal relationships and generate causal graphs for analysing the variables that cause bus delays. Based on the causal graph, it quantifies the contribution of each variable while taking the causal relationships into account, to support the development of targeted strategies for mitigating delays. In addition, the thesis develops a time series causal discovery model to understand bus delay propagation patterns and effects within the public transport system from a system perspective.

Papers I and II focus on data organization and knowledge inference, construct a mobility knowledge graph (MKG), and explore its applications in public transport. Paper I introduces the MKG concept and proposes a framework for constructing it from smart card data by capturing spatiotemporal travel patterns between stations with both rule-based and neural network-based decomposition methods. It validates the MKG framework and demonstrates its value in inferring trip destinations using only tap-in records. Paper II explores another transport application and proposes a method for estimating the "real" user-station attention from partially observed station visit data. It uses the MKG to capture latent spatiotemporal travel dependencies between stations, improving the estimation process by addressing missing values and cold-start problems. The framework is validated with both synthetic and real-world data, showing the value of the MKG in user-station attention estimation.

Papers IV-VI focus on research into causal graphs and their applications in public transport. Before the causal analysis of bus delays is carried out, Paper III conducts an empirical study examining the heterogeneous effects of various factors on bus arrival delays. Paper IV focuses on the operational variables and develops causal discovery methods for static data to analyse the variables that cause bus delays, evaluating the methods' performance from statistical data-fitting and causal-interpretation perspectives. It identifies the optimal causal discovery method for analysing the causes of bus delays. Further, based on the causal graph generated in Paper IV, Paper V develops a causality-based Shapley value method to quantify the contribution of each variable to bus delays in order to support efficient transport decision-making. The results are cross-validated against conventional models (e.g., regression models) to reveal the difference between correlation-based and causality-based analysis methods. In addition, Paper VI develops a time series causal discovery model to infer causal relationships between bus stops and generate a spatiotemporal delay propagation causal graph from time series bus stop delay data. It then incorporates complex network theory to analyse bus delay propagation patterns and effects within the public transport system.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025, p. 58
Series
TRITA-ABE-DLT ; 2437
Keywords [en]
Graphical model, data-driven, knowledge graph, causal graph, public transportation.
Keywords [sv]
Graphical model, data-driven, knowledge graph, causal graph, public transport
National Category
Transport Systems and Logistics
Research subject
Transport Science, Transport Systems
Identifiers
URN: urn:nbn:se:kth:diva-357044; ISBN: 978-91-8106-150-5 (print); OAI: oai:DiVA.org:kth-357044; DiVA, id: diva2:1917755
Public defence
2025-01-17, F3, Lindstedtsvägen 26, KTH Campus, public video conference link https://kth-se.zoom.us/j/67216916457, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20241203

Available from: 2024-12-03. Created: 2024-12-03. Last updated: 2024-12-03. Bibliographically approved
List of papers
1. Mobility knowledge graph: review and its application in public transport
2023 (English). In: Transportation, ISSN 0049-4488, E-ISSN 1572-9435. Article in journal (Refereed). Published
Abstract [en]

Understanding human mobility in urban areas is crucial for transportation planning, operations, and online control. The availability of large-scale and diverse mobility data (e.g., smart card data and GPS data) provides valuable insights into human mobility patterns. However, organizing and analyzing such data poses significant challenges. The knowledge graph (KG), a graph-based knowledge representation method, has been successfully applied in various domains but has seen limited application in urban mobility. This paper aims to address this gap by reviewing existing KG studies, introducing the concept of a mobility knowledge graph (MKG), and proposing a general learning framework to construct an MKG from smart card data. The MKG represents hidden travel activities between public transport stations, with stations as nodes and their relations as edges. Two decomposition approaches, a rule-based model and a neural network-based model, are developed to extract MKG relations from smart card data, capturing latent spatiotemporal travel dependencies. A case study is conducted using smart card data from a heavily used urban railway system to validate the effectiveness of the MKG in predicting individual trip destinations. The results demonstrate the significance of establishing an MKG database, as it helps with the typical problem of predicting individual trip destinations in public transport systems that have only tap-in records. Additionally, the MKG framework offers potential for efficient data management and for applications such as individual mobility prediction and personalized travel recommendations.
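
The node-and-edge structure described in this abstract can be illustrated with a minimal sketch. It is not the paper's learning framework: the station names, the relation labels, and the weight-based ranking are hypothetical, and the rule-based and neural network-based decomposition steps are replaced here by hand-entered edge weights.

```python
from collections import defaultdict


class MobilityKG:
    """Toy knowledge graph: stations are nodes, typed relations are edges."""

    def __init__(self):
        # (head station, relation) -> {tail station: accumulated weight}
        self._edges = defaultdict(dict)

    def add(self, head, relation, tail, weight=1.0):
        """Add or reinforce a directed, typed edge between two stations."""
        tails = self._edges[(head, relation)]
        tails[tail] = tails.get(tail, 0.0) + weight

    def rank_destinations(self, origin, relation):
        """Return candidate tail stations for an origin, strongest first."""
        tails = self._edges.get((origin, relation), {})
        return sorted(tails, key=lambda t: (-tails[t], t))


# Hypothetical stations and relation labels, for illustration only.
kg = MobilityKG()
kg.add("Station A", "morning_commute", "Station B", weight=3.0)
kg.add("Station A", "morning_commute", "Station C", weight=1.0)
kg.add("Station A", "morning_commute", "Station B", weight=2.0)
kg.add("Station B", "evening_return", "Station A", weight=4.0)

ranked = kg.rank_destinations("Station A", "morning_commute")
print(ranked)
```

Given only a tap-in at "Station A" and an assumed relation, the ranked list plays the role of a destination guess; in the paper the relations are learned from data rather than entered by hand.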

National Category
Transport Systems and Logistics
Identifiers
urn:nbn:se:kth:diva-340644 (URN); 10.1007/s11116-023-10451-8 (DOI); 2-s2.0-85178955739 (Scopus ID)
Funder
KTH Royal Institute of Technology
Note

QC 20231211

Available from: 2023-12-09. Created: 2023-12-09. Last updated: 2024-12-03. Bibliographically approved
2. User-station attention inference using smart card data: a knowledge graph assisted matrix decomposition model
2023 (English). In: Applied intelligence (Boston), ISSN 0924-669X, E-ISSN 1573-7497, Vol. 53, no 19, p. 21944-21960. Article in journal (Refereed). Published
Abstract [en]

Understanding human mobility in urban areas is important for transportation, from planning to operations and online control. This paper proposes the concept of user-station attention, which describes the user’s (or user group’s) interest in or dependency on specific stations. The concept contributes to a better understanding of human mobility (e.g., travel purposes) and facilitates downstream applications, such as individual mobility prediction and location recommendation. However, the intrinsically unsupervised nature of the problem and unreliable observation data make it challenging to estimate the real user-station attention. We introduce the user-station attention inference problem using station visit count data in public transport and develop a matrix decomposition method that uses knowledge graphs to capture user similarity and station-station relationships simultaneously. Specifically, it captures user similarity information from the user-station visit count matrix. It extracts the stations’ latent representations and the hidden relations (activities) between stations to construct the mobility knowledge graph (MKG) from smart card data. We develop a neural network (NN)-based nonlinear decomposition approach to extract the MKG relations, capturing the latent spatiotemporal travel dependencies. The case study uses both synthetic and real-world data to validate the proposed approach by comparing it with benchmark models. The results illustrate the significant value of the knowledge graph in user-station attention inference. The model with the MKG improves the estimation accuracy by 35% in MAE and 16% in RMSE. The model is also not sensitive to sparse data, provided only positive observations are used.
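
To make the matrix decomposition idea and the MAE and RMSE metrics concrete, the sketch below factorizes a small user-station visit count matrix using only the observed cells. It is a plain regularized factorization trained by stochastic gradient descent, not the paper's knowledge-graph-assisted model; the matrix values, the rank, and the hyperparameters are all assumptions chosen for illustration.

```python
import random


def predict(users, stations, i, j):
    """Predicted visit count for user i at station j (dot product of factors)."""
    return sum(u * s for u, s in zip(users[i], stations[j]))


def factorize(counts, rank=2, epochs=500, lr=0.01, reg=0.01, seed=0):
    """Fit user and station factors on the observed (non-None) cells only."""
    rng = random.Random(seed)
    n_users, n_stations = len(counts), len(counts[0])
    users = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    stations = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_stations)]
    observed = [(i, j, counts[i][j])
                for i in range(n_users) for j in range(n_stations)
                if counts[i][j] is not None]
    for _ in range(epochs):
        for i, j, value in observed:
            err = value - predict(users, stations, i, j)
            for k in range(rank):
                u, s = users[i][k], stations[j][k]
                users[i][k] += lr * (err * s - reg * u)
                stations[j][k] += lr * (err * u - reg * s)
    return users, stations


def errors(users, stations, counts):
    """Return (MAE, RMSE) over the observed cells."""
    diffs = [counts[i][j] - predict(users, stations, i, j)
             for i in range(len(counts)) for j in range(len(counts[0]))
             if counts[i][j] is not None]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = (sum(d * d for d in diffs) / len(diffs)) ** 0.5
    return mae, rmse


# Hypothetical visit counts; None marks an unobserved user-station pair.
counts = [
    [5, 4, 0, None],
    [4, None, 0, 1],
    [0, 1, 5, 4],
    [None, 0, 4, 5],
]

untrained = factorize(counts, epochs=0)   # random factors, no fitting
trained = factorize(counts)
mae_before, rmse_before = errors(*untrained, counts)
mae_after, rmse_after = errors(*trained, counts)
missing_estimate = predict(*trained, 0, 3)  # fill in an unobserved cell
print(rmse_before, rmse_after, missing_estimate)
```

The learned factors give an estimate for every cell, including the unobserved ones, which is the sense in which a decomposition model can address missing values.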

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
User-station attention; Knowledge graph; Smart card data; Public transport
National Category
Transport Systems and Logistics
Research subject
Transport Science
Identifiers
urn:nbn:se:kth:diva-339441 (URN); 10.1007/s10489-023-04678-2 (DOI); 001010000200001 (); 2-s2.0-85162008941 (Scopus ID)
Funder
KTH Royal Institute of Technology
Note

QC 20231113

Available from: 2023-11-11. Created: 2023-11-11. Last updated: 2024-12-03. Bibliographically approved
3. Real-time bus arrival delays analysis using seemingly unrelated regression model
2024 (English). In: Transportation, ISSN 0049-4488, E-ISSN 1572-9435. Article in journal, Editorial material (Refereed). Published
Abstract [en]

To effectively manage and control public transport operations, understanding the various factors that impact bus arrival delays is crucial. However, limited research has focused on a comprehensive analysis of bus delay factors, often relying on single-step delay prediction models that are unable to account for the heterogeneous impacts of spatiotemporal factors along the bus route. To analyze the heterogeneous impact of bus arrival delay factors, the paper proposes a set of regression equations conditional on the bus location. A seemingly unrelated regression equation (SURE) model is developed to estimate the regression coefficients, accounting for potential correlations between regression residuals caused by shared unobserved factors among equations. The model is validated using bus operations data from Stockholm, Sweden. The results highlight the importance of developing stop-specific bus arrival delay models to understand the heterogeneous impact of explanatory variables. The significant factors impacting bus arrival delays are primarily associated with bus operations, such as delays at consecutive upstream stops, dwell time, scheduled travel time, recurrent congestion, and current traffic conditions. Factors like the calendar and weather have significant but marginal impacts on arrival delays. The study suggests that different bus operating management strategies, such as schedule adjustments, route optimization, and real-time monitoring and control, should be tailored to the characteristics of stop sections since the impacts of these factors vary depending on the stop location.
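
For readers unfamiliar with seemingly unrelated regression, its standard textbook form can be written as follows. The notation is an assumption made here and may differ from the paper's: $m = 1, \dots, M$ indexes the stop-specific equations, $N$ is the number of observations per equation, and $\Sigma = [\sigma_{mn}]$ holds the cross-equation residual covariances.

```latex
% One regression equation per stop m, with its own coefficients
y_m = X_m \beta_m + \varepsilon_m, \qquad m = 1, \dots, M

% Residuals may be correlated across equations for the same observation
\mathbb{E}\left[\varepsilon_m \varepsilon_n^{\top}\right] = \sigma_{mn} I_N

% Stacked system and the generalized least squares estimator
y = X\beta + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = \Sigma \otimes I_N

\hat{\beta} = \left[ X^{\top} \left( \Sigma^{-1} \otimes I_N \right) X \right]^{-1}
              X^{\top} \left( \Sigma^{-1} \otimes I_N \right) y
```

Here $X$ is block diagonal with blocks $X_1, \dots, X_M$. In practice $\Sigma$ is unknown and is usually estimated from equation-by-equation least squares residuals, which gives the feasible version of the estimator.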

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Transport Systems and Logistics
Identifiers
urn:nbn:se:kth:diva-350233 (URN); 10.1007/s11116-024-10507-3 (DOI); 001255194300001 (); 2-s2.0-85197190064 (Scopus ID)
Note

QC 20240710

Available from: 2024-07-09. Created: 2024-07-09. Last updated: 2024-12-05. Bibliographically approved
4. Causal Graph Discovery for Urban Bus Operation Delays: A case in Stockholm
2024 (English). Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Bus delays pose significant challenges to urban public transportation systems, impacting operational efficiency and incurring substantial costs. Understanding the causative factors behind bus delays is crucial for developing effective strategies to mitigate them. However, existing research predominantly relies on correlational analysis, which falls short of revealing the underlying causal relationships among factors. This study introduces a novel data-driven causality-based modeling approach, combining data-driven causal discovery and structural equation models (SEM), to investigate the causal relationships among various operational factors influencing bus delays. It can automatically generate a causal delay graph that elucidates the interconnectedness and influence of operational factors on bus delays, providing a deeper understanding of the causal mechanism of bus delays. We explored and evaluated the performance of different causal discovery algorithms in generating causal graphs, in terms of both data fitting and causality discovery. The SEM is used to quantify the direct causal effects among factors in the causal graph. A case study is conducted to validate the causal discovery model performance using General Transit Feed Specification (GTFS) data from high-frequency bus routes in Stockholm, Sweden. The validation results highlight the potential of data-driven causal discovery models in discovering causal relationships and automating the knowledge discovery process, particularly when combined with domain knowledge. The empirical findings show the complexity of factors contributing to bus delays and emphasize the importance of integrating causality into bus delay factor analysis. For example, a high correlation between origin delay and current arrival delay (coefficient = 0.63) does not necessarily indicate causation, and a strong causal link from dwell time to arrival delay does not necessarily correspond to a high correlation (coefficient = 0.12). This demonstration of data-driven causal discovery would facilitate automated and informed decision-making to optimize bus services towards better efficiency and reliability.
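
The gap between correlation and causation noted in this abstract can be reproduced with a small simulation. The sketch below is not the paper's causal discovery algorithm and uses no bus data: it assumes a hypothetical linear model in which a common cause `z` drives both `x` and `y`, with no direct effect of `x` on `y`, and then compares the raw correlation with the coefficient of `x` after adjusting for `z`.

```python
import random


def simulate(n=5000, seed=0):
    """Hypothetical linear model: z causes x and y; x has no effect on y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]
    return z, x, y


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def corr(a, b):
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5


def adjusted_coefficients(y, x, z):
    """Least squares slopes of y on x and z, via the 2x2 normal equations."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    bx = (sxy * szz - sxz * szy) / det
    bz = (sxx * szy - sxz * sxy) / det
    return bx, bz


z, x, y = simulate()
raw_correlation = corr(x, y)             # high, driven by the shared cause z
bx, bz = adjusted_coefficients(y, x, z)  # bx is near zero once z is included
print(raw_correlation, bx, bz)
```

Under these assumptions the two variables are strongly correlated even though neither causes the other, which is why a causal graph, rather than a correlation matrix, is needed to decide which factor to act on.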

Keywords
bus arrival delays, causal discovery algorithm, causal graph, causality-based factor analysis, GTFS data
National Category
Transport Systems and Logistics
Identifiers
urn:nbn:se:kth:diva-340759 (URN)
Conference
The 103rd Transportation Research Board (TRB) Annual Meeting, January 7–11, 2024, Washington, DC, USA
Note

QC 20240108

Available from: 2023-12-13. Created: 2023-12-13. Last updated: 2024-12-03. Bibliographically approved
5. Quantifying Variable Contributions to Bus Operation Delays Considering Causal Relationships
2025 (English). In: Transportation Research, Part E: Logistics and Transportation Review, ISSN 1366-5545, Vol. 194, p. 103881. Article in journal, Editorial material (Refereed). Published
Abstract [en]

Buses in public transit networks often face operational delays due to dynamic conditions such as traffic congestion, which can propagate through transit routes, affecting overall system performance. Understanding the causes of bus arrival delays is crucial for effective public transport management and control. Moreover, understanding the contribution of each factor to bus delays not only aids in developing targeted strategies to mitigate delays but is also crucial for effective decision-making and planning. Traditional research primarily focuses on correlation-based analysis, lacking the ability to reveal the underlying causal mechanisms. Additionally, no studies have considered the complex causal relationships between factors when quantifying their contributions to outcomes in public transport. This study aims to analyze the factors causing bus arrival delays from a causal perspective, focusing on quantifying the causal contribution of each factor while considering their causal relationships. Quantifying a factor's causal contribution poses challenges due to computational complexity and statistical bias from limited sample sizes. Using a causal discovery method, this study generates a causal graph for bus arrival delays and employs the causality-based Shapley value to quantify the contribution of each variable. The study further uses the Double Machine Learning (DML) approach to estimate the causal contributions, which provides a consistent and computationally feasible method. A case study was conducted using General Transit Feed Specification (GTFS) data, focusing on high-frequency bus routes in Stockholm, Sweden. To validate the model, cross-validation was performed by comparing variable importance rankings with traditional models, including Linear Regression (LR) and Structural Equation Modeling (SEM). The comparison shows that results from the causality-based Shapley value differ significantly from those obtained by traditional methods in terms of importance rankings and influence magnitudes. The findings underscore the significant impact of origin delays on bus punctuality, a factor often underestimated in previous studies. Additionally, the study demonstrates that employing a causal discovery model can not only infer causal relationships but also reveal direct and indirect effects, which can provide more intuitive explanations. Finally, although the causal results are mathematically and intuitively sound, it is important to further investigate the real causal impact in practice using lab experiments or A/B tests in real-world settings.
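
As background for the Shapley value mentioned in this abstract, the sketch below computes the classical Shapley value exactly for a three-factor toy problem. It is not the causality-based variant or the Double Machine Learning estimator used in the paper. The factor names and the value function, which assigns an assumed amount of explained delay to each subset of factors, are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value: weighted average marginal contribution per player."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding p
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def explained_delay(coalition):
    """Hypothetical value function: delay (seconds) explained by a factor set."""
    total = 0.0
    if "origin_delay" in coalition:
        total += 60.0
    if "dwell_time" in coalition:
        total += 20.0
    if "weather" in coalition:
        total += 5.0
    if "origin_delay" in coalition and "dwell_time" in coalition:
        total += 10.0  # assumed interaction between the two factors
    return total


factors = ["origin_delay", "dwell_time", "weather"]
phi = shapley_values(factors, explained_delay)
print(phi)
```

The interaction term is split evenly between the two interacting factors, and the contributions add up to the value of the full factor set. Exact enumeration grows exponentially with the number of factors, which is one reason approximate estimators are used in practice.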

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
Explainable AI; Causal graph discovery; Shapley value; Urban transit; GTFS data
National Category
Transport Systems and Logistics
Identifiers
urn:nbn:se:kth:diva-357042 (URN); 10.1016/j.tre.2024.103881 (DOI)
Note

QC 20241206

Available from: 2024-12-03. Created: 2024-12-03. Last updated: 2024-12-06. Bibliographically approved
6. Understanding Bus Network Delay Propagation: Integration of Causal Inference and Complex Network Theory
2025 (English). In: Journal of Transport Geography, ISSN 0966-6923, E-ISSN 1873-1236. Article in journal, Editorial material (Refereed). In press
Abstract [en]

Bus transport, characterized by a complex network of routes and stops, frequently experiences delays that can affect the entire system’s reliability, passenger satisfaction, and operational efficiency. Existing research on bus delay propagation predominantly focuses on the route level. Such studies lack a broader network-level perspective, which is essential for fully understanding the complex interactions and delay propagation. Additionally, previous studies typically rely on correlation-based analysis, which may not adequately uncover the underlying causal mechanisms of bus delay propagation. To understand bus delay propagation in the Public Transport System (PTS), this study employs a causality-based model instead of traditional correlation-based analysis to identify causal relationships between bus stops. We introduce a time-series causal discovery model that integrates temporal and spatial features of stop delays to generate a delay propagation causal graph (DPCG). Then, complex network theory and metrics are used to perform topological analysis on the DPCG and identify key bus stops. A case study is conducted using real-time GTFS data from Stockholm, Sweden. The results indicate that stops with more connections significantly influence delay propagation, and the network displays a distinct community structure with mixed connectivity. Moreover, bus stops exhibit different delay propagation patterns during different time periods. During the morning peak, delays primarily propagate to stops in the inner city due to the commuting surge. In the evening peak, however, delays are more widely distributed across central and suburban areas, reflecting the diversity of after-work travel patterns. The study also reveals that delay propagation extends beyond a single route and affects multiple routes.
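
The topological analysis step can be sketched with two basic measures on a directed graph: degree (how many causal links touch a stop) and downstream reach (how many stops a delay starting at a given stop could propagate to). The stop labels and edges below are hypothetical and do not come from the paper's DPCG, and the paper's actual metrics may differ.

```python
from collections import defaultdict, deque

# Hypothetical delay propagation edges: (source stop, affected stop)
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edge_list):
    """Map each stop to the list of stops it directly propagates delay to."""
    adjacency = defaultdict(list)
    for src, dst in edge_list:
        adjacency[src].append(dst)
        adjacency[dst]  # make sure stops with no outgoing edge are present
    return adjacency


def total_degree(edge_list):
    """In-degree plus out-degree for every stop."""
    degree = defaultdict(int)
    for src, dst in edge_list:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)


def downstream_reach(adjacency, source):
    """Stops reachable from source along directed edges (breadth-first search)."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen


adjacency = build_adjacency(edges)
degree = total_degree(edges)
key_stop = max(sorted(degree), key=lambda stop: degree[stop])
reach_from_a = downstream_reach(adjacency, "A")
print(key_stop, degree, sorted(reach_from_a))
```

In this toy graph the most connected stop is the one where two upstream paths merge, which mirrors the abstract's observation that stops with more connections have more influence on propagation.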

Keywords
Bus delay, Network delay propagation, Causal inference, Complex network
National Category
Transport Systems and Logistics
Research subject
Transport Science, Transport Systems
Identifiers
urn:nbn:se:kth:diva-357043 (URN)
Note

QC 20241205

Available from: 2024-12-03. Created: 2024-12-03. Last updated: 2025-01-03. Bibliographically approved

Open Access in DiVA

summary (3033 kB), 64 downloads
File information
File name: SUMMARY01.pdf; File size: 3033 kB; Checksum: SHA-512
1886c3e3e5fd71ecde67811bc5a21cfb688cd3eb9309b1f260517b6126d4df08c3367cae02edd345439ac577b0b65faad51a3388baed6362bf672bec36f4449e
Type: summary; Mimetype: application/pdf

Authority records

Zhang, Qi
