Local List-Wise Explanations of LambdaMART
Akhavan Rahnama, Amir Hossein. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-6846-5707
Butepage, Judith. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-5344-8042
Boström, Henrik. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0001-8382-0300
2024 (English). In: Explainable Artificial Intelligence - Second World Conference, xAI 2024, Proceedings, Springer Nature, 2024, p. 369-392. Conference paper, Published paper (Refereed)
Abstract [en]

LambdaMART, a potent black-box Learning-to-Rank (LTR) model, has been shown to outperform neural network models across tabular ranking benchmark datasets. However, its lack of transparency challenges its application in many real-world domains. Local list-wise explanation techniques provide scores that explain the importance of the features in a list of documents associated with a query to the prediction of black-box LTR models. This study investigates which list-wise explanation techniques provide the most faithful explanations for LambdaMART models. Several local explanation techniques are evaluated for this purpose: Greedy Score, RankLIME, EXS, LIRME, LIME, and SHAP. In addition, a non-LTR explanation technique, Permutation Importance (PMI), is applied to obtain list-wise explanations of LambdaMART. The techniques are compared on eight evaluation metrics: Consistency, Completeness, Validity, Fidelity, ExplainNCDG@10, (In)fidelity, Ground Truth, and Feature Frequency similarity. The evaluation is performed on three benchmark datasets, Yahoo, Microsoft Bing Search (MSLR-WEB10K), and LETOR 4 (MQ2008), along with a synthetic dataset. The experimental results show that no single explanation technique is faithful across all datasets and evaluation metrics. Moreover, the explanation techniques tend to be faithful for different subsets of the evaluation metrics; for example, RankLIME outperforms the other explanation techniques with respect to Fidelity and ExplainNCDG, while PMI provides the most faithful explanations with respect to Validity and Completeness. Finally, we show that the explanation sample size and the normalization of feature importance scores in explanations can largely affect the faithfulness of explanation techniques across all datasets.
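
The role of PMI described in the abstract lends itself to a compact illustration. The sketch below is not the paper's implementation: it assumes a trained LambdaMART-style ranker exposing a scikit-learn-like predict (for instance LightGBM's LGBMRanker) and measures, for a single query, how much NDCG@10 drops when each feature is permuted across that query's documents. The function name, the choice of NDCG@10, and the number of repeats are illustrative assumptions only.

```python
# Minimal sketch (not the paper's code): list-wise permutation importance (PMI)
# for a LambdaMART-style ranker, computed on the document list of one query.
import numpy as np
from sklearn.metrics import ndcg_score

def listwise_permutation_importance(model, X_query, y_query, n_repeats=10, k=10, seed=0):
    """Return one score per feature: the mean drop in NDCG@k when that
    feature is shuffled across the query's documents."""
    rng = np.random.default_rng(seed)
    base = ndcg_score([y_query], [model.predict(X_query)], k=k)
    importances = np.zeros(X_query.shape[1])
    for j in range(X_query.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_query.copy()
            rng.shuffle(X_perm[:, j])  # break the association between feature j and relevance
            drops.append(base - ndcg_score([y_query], [model.predict(X_perm)], k=k))
        importances[j] = np.mean(drops)  # larger drop => more important feature
    return importances
```

With a ranker such as lightgbm.LGBMRanker trained with a LambdaMART objective (objective='lambdarank'), calling this function on one query's feature matrix and relevance labels yields a list-wise importance vector comparable, in spirit, to the list-wise explanations evaluated in the paper.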

Place, publisher, year, edition, pages
Springer Nature, 2024. p. 369-392
Keywords [en]
Explainability for Learning to Rank, Explainable Artificial Intelligence, Explainable Machine Learning, Local explanations, Local list-wise explanations
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-351924
DOI: 10.1007/978-3-031-63797-1_19
ISI: 001282234900019
Scopus ID: 2-s2.0-85200663788
OAI: oai:DiVA.org:kth-351924
DiVA, id: diva2:1890140
Conference
2nd World Conference on Explainable Artificial Intelligence, xAI 2024, Valletta, Malta, Jul 17 2024 - Jul 19 2024
Note

Part of ISBN 9783031637964

QC 20240823

Available from: 2024-08-19 Created: 2024-08-19 Last updated: 2025-02-20. Bibliographically approved
In thesis
1. Evaluating the Faithfulness of Local Feature Attribution Explanations: Can We Trust Explainable AI?
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Black-box models have demonstrated incredible performance and accuracy across various modeling problems and benchmarks over the past decade, from detecting objects in images to generating intelligent responses to user queries. Despite their impressive performance, these models suffer from a lack of interpretability, making it difficult to understand their decision-making processes and diagnose errors, which limits their applicability, especially in high-stakes domains such as healthcare and law. Explainable Artificial Intelligence (xAI) is a set of techniques, tools, and algorithms that bring transparency to black-box machine learning models. This transparency is said to bring trust to the users and, as a result, help deploy these models in high-stakes decision-making domains. One of the most popular categories of xAI algorithms is local explanation techniques, which provide information about the prediction of a black-box model for a single data instance. One of the most consequential open research problems for local explanation techniques is their evaluation, mainly because we cannot directly extract ground truth explanations from complex black-box models to evaluate these techniques.

In this thesis, we focus on a systematic evaluation of local explanation techniques. In the first part, we investigate whether local explanations, such as LIME, fail systematically or only in a few cases. We then discuss the implicit and explicit assumptions behind different evaluation measures for local explanations; through this analysis, we aim to present a logic for choosing the most suitable evaluation measure in various cases. After that, we propose a new evaluation framework, called Model-Intrinsic Additive Scores (MIAS), for extracting ground truth explanations from regression, classification, and learning-to-rank models. Next, we investigate the faithfulness of explanations of tree ensemble models using perturbation-based evaluation measures, which do not rely on ground truth explanations. The last part of this thesis focuses on a detailed investigation into the faithfulness of local explanations of LambdaMART, a tree-based ensemble learning-to-rank model. We are particularly interested in studying whether techniques built specifically for explaining learning-to-rank models are more faithful than their regression-based counterparts for explaining LambdaMART. For this, we include evaluation measures that rely on ground truth along with those that do not.

This thesis presents several influential conclusions. First, we find that failures in local explanation techniques, such as LIME, occur frequently and systematically, and we explore the mechanisms behind these failures. Furthermore, we demonstrate that evaluating local explanations using ground truth extracted from interpretable models mitigates the risk of blame, where explanations might be wrongfully criticized for lacking faithfulness. We also show that local explanations provide faithful insights for linear regression but not for classification models, such as Logistic Regression and Naive Bayes, or ranking models, such as Neural Ranking Generalized Additive Models (GAMs). Additionally, our results indicate that KernelSHAP and LPI deliver faithful explanations for tree-based ensemble models, such as Gradient Boosting and Random Forests, when evaluated with measures independent of ground truth. Lastly, we establish that regression-based explanations for learning-to-rank models consistently outperform ranking-based explanation techniques in explaining LambdaMART. These conclusions are based on a mix of ground-truth-dependent evaluation measures and perturbation-based measures that do not rely on ground truth.
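
To make the ground-truth idea above concrete, the sketch below is an illustration of the general principle rather than the thesis' MIAS implementation: for an additive model such as linear regression, the contribution of each feature to one prediction is simply the coefficient times the feature value, and a candidate local explanation can be scored by how well its attributions agree with these contributions. The helper names and the use of Spearman rank correlation as the agreement measure are assumptions for illustration.

```python
# Minimal sketch (illustrative, not the thesis' MIAS framework): additive
# per-feature contributions of a linear model as ground-truth attributions,
# plus a rank-agreement score between an explanation and that ground truth.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LinearRegression

def additive_ground_truth(model: LinearRegression, x: np.ndarray) -> np.ndarray:
    """Per-feature contributions w_j * x_j to the prediction for one instance."""
    return model.coef_ * x

def rank_agreement(attribution: np.ndarray, ground_truth: np.ndarray) -> float:
    """Spearman rank correlation between an explanation's scores and the ground truth."""
    rho, _ = spearmanr(attribution, ground_truth)
    return rho
```

An explanation technique such as LIME or KernelSHAP would supply the attribution vector for the same instance; a correlation close to 1 indicates that the explanation recovers the model's own additive contributions.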

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. 80
Series
TRITA-EECS-AVL ; 2025:23
Keywords
xai, artificial intelligence, machine learning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-360228 (URN)
978-91-8106-200-7 (ISBN)
Public defence
2025-03-14, Sal C, Ka-Sal C (Sven-Olof Öhrvik), Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20250220

Available from: 2025-02-20 Created: 2025-02-20 Last updated: 2025-03-05. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text | Scopus

Authority records

Akhavan Rahnama, Amir Hossein; Butepage, Judith; Boström, Henrik
