A state-based inverse reinforcement learning approach to model activity-travel choices behavior with reward function recovery
KTH, School of Architecture and the Built Environment (ABE), Civil and Architectural Engineering; School of Transportation, Southeast University, Nanjing 211189, Jiangsu, China. ORCID iD: 0000-0002-4356-1145
School of Transportation, Southeast University, Nanjing 211189, Jiangsu, China; Jiangsu Key Laboratory of Urban ITS, Southeast University, Nanjing 211189, Jiangsu, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Nanjing 211189, Jiangsu, China.
KTH, School of Architecture and the Built Environment (ABE), Civil and Architectural Engineering, Transport planning. ORCID iD: 0000-0002-2141-0389
School of Transportation, Southeast University, Nanjing 211189, Jiangsu, China.
2024 (English). In: Transportation Research Part C: Emerging Technologies, ISSN 0968-090X, E-ISSN 1879-2359, Vol. 158, article id 104454. Article in journal (Refereed). Published.
Abstract [en]

Behaviorally oriented activity-travel choices (ATC) modeling is a principal part of travel demand analysis. Traditional econometric and rule-based methods require explicit model structures and complex domain knowledge. While several recent studies have used machine learning models, especially adversarial inverse reinforcement learning (IRL) models, to learn potential ATC patterns with fewer expert-designed settings, they lack a clear representation of rational ATC behavior. In this study, we propose a data-driven IRL framework based on the maximum causal entropy approach that minimizes f-divergences between the expert and agent state marginal distributions, which provides a more sample-efficient measure. In addition, we specify a separate state-only reward function and derive an analytical gradient of the f-divergence objective with respect to the reward parameters to ensure good convergence. The method can recover a stationary reward function, which ensures the agent gets close to the expert behavior when trained from scratch. We validate the proposed model using cellular signaling data from Chongqing, China, comparing it with baseline models (behavior cloning, policy-based, and reward-based models) on policy performance, reward recovery, and reward transfer tasks. The experimental results indicate that the proposed model outperforms existing methods and is relatively insensitive to the number of expert demonstrations. Qualitative analyses are provided of the fundamental ATC preferences over different features, given the reward function recovered from the observed mobility trajectories, and of the learning behaviors under different choices of f-divergence.
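To make the state-marginal-matching idea in the abstract concrete, below is a minimal, purely illustrative Python sketch. It is not the paper's implementation: it collapses the sequential activity-travel problem to a one-step maximum-entropy model in which the agent's state marginal has the closed form rho_theta(s) = softmax(theta)_s, fixes the f-divergence to the forward KL, and uses a hypothetical set of activity states with made-up expert visit counts. Under these assumptions, the analytical gradient of KL(rho_E || rho_theta) with respect to the state-only reward parameters theta reduces to rho_theta - rho_E, so plain gradient descent drives the agent's marginal toward the expert's.

```python
# Toy sketch of state-marginal matching with a state-only reward.
# Assumptions (NOT from the paper): a discrete set of activity states,
# a one-step maximum-entropy agent whose state marginal is
#   rho_theta(s) = softmax(theta)_s,
# and the forward KL as the f-divergence. Under these assumptions,
#   d KL(rho_E || rho_theta) / d theta = rho_theta - rho_E,
# so gradient descent recovers a reward whose ordering matches the
# expert's state visitation frequencies (up to an additive constant).
import numpy as np

states = ["home", "work", "shop", "leisure", "travel"]

# "Expert demonstrations": hypothetical visit counts per state.
expert_counts = np.array([500.0, 320.0, 90.0, 60.0, 30.0])
rho_expert = expert_counts / expert_counts.sum()

theta = np.zeros(len(states))  # state-only reward parameters
lr = 0.5

for _ in range(2000):
    rho_agent = np.exp(theta) / np.exp(theta).sum()  # softmax marginal
    grad = rho_agent - rho_expert                    # analytical KL gradient
    theta -= lr * grad

rho_agent = np.exp(theta) / np.exp(theta).sum()
for s, re, ra, r in zip(states, rho_expert, rho_agent, theta - theta.max()):
    print(f"{s:8s} expert={re:.3f} agent={ra:.3f} reward={r:+.3f}")
```

In the paper's setting, the agent's state marginal instead comes from rolling out a policy trained against the current reward, and the gradient generalizes to other choices of f; the sketch only illustrates why matching state marginals pins down a state-only reward up to an additive constant.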

Place, publisher, year, edition, pages
Elsevier BV, 2024. Vol. 158, article id 104454
Keywords [en]
Activity-travel choices modeling, Cellular signaling data, Inverse reinforcement learning, Reward function, State marginal matching
National Category
Transport Systems and Logistics
Identifiers
URN: urn:nbn:se:kth:diva-341608
DOI: 10.1016/j.trc.2023.104454
ISI: 001134887700001
Scopus ID: 2-s2.0-85179130607
OAI: oai:DiVA.org:kth-341608
DiVA, id: diva2:1822613
Note

QC 20231227

Available from: 2023-12-27. Created: 2023-12-27. Last updated: 2025-03-13. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Song, Yuchen
Ma, Zhenliang
