Evaluating the Faithfulness of Local Feature Attribution Explanations: Can We Trust Explainable AI?
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-6846-5707
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Black-box models have demonstrated remarkable performance and accuracy across various modeling problems and benchmarks over the past decade, from detecting objects in images to generating intelligent responses to user queries. Despite their impressive performance, these models suffer from a lack of interpretability, making it difficult to understand their decision-making processes and diagnose errors, which limits their applicability, especially in high-stakes domains such as healthcare and law. Explainable Artificial Intelligence (xAI) is a set of techniques, tools, and algorithms that bring transparency to black-box machine learning models. This transparency is said to bring trust to users and, as a result, help deploy these models in high-stakes decision-making domains. One of the most popular categories of xAI algorithms is local explanation techniques, which provide information about the prediction of a black-box model for a single data instance. One of the most consequential open research problems for local explanation techniques is their evaluation, mainly because we cannot directly extract ground truth explanations from complex black-box models against which to evaluate these techniques. In this thesis, we focus on a systematic evaluation of local explanation techniques. In the first part, we investigate whether local explanations, such as LIME, fail systematically or only in a few cases. We then discuss the implicit and explicit assumptions behind different evaluation measures for local explanations; through this analysis, we aim to present a rationale for choosing the most appropriate evaluation measure in various settings. After that, we propose a new evaluation framework called Model-Intrinsic Additive Scores (MIAS) for extracting ground truth explanations from different black-box models for regression, classification, and learning-to-rank tasks. Next, we investigate the faithfulness of explanations of tree ensemble models using perturbation-based evaluation measures, which do not rely on ground truth explanations. The last part of this thesis presents a detailed investigation into the faithfulness of local explanations of LambdaMART, a tree-based ensemble learning-to-rank model. We are particularly interested in studying whether techniques built specifically for explaining learning-to-rank models are more faithful than their regression-based counterparts when explaining LambdaMART. For this, we include evaluation measures that rely on ground truth as well as measures that do not. This thesis presents several important conclusions. First, we find that failures in local explanation techniques, such as LIME, occur frequently and systematically rather than only in isolated cases, and we explore the mechanisms behind these failures. Furthermore, we demonstrate that evaluating local explanations using ground truth extracted from interpretable models mitigates the risk of the blame problem, where explanations might be wrongfully criticized for lacking faithfulness. We also show that local explanations provide faithful insights for Linear Regression but not for classification models, such as Logistic Regression and Naive Bayes, or ranking models, such as Neural Ranking Generalized Additive Models (GAMs). Additionally, our results indicate that KernelSHAP and LPI deliver faithful explanations for tree-based ensemble models, such as Gradient Boosting and Random Forests, when evaluated with measures independent of ground truth.
Lastly, we establish that regression-based explanation techniques for learning-to-rank models consistently outperform ranking-based explanation techniques in explaining LambdaMART. These conclusions are based on a mix of ground-truth-dependent evaluation measures and perturbation-based measures that do not rely on ground truth.
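
The ground-truth comparison central to the thesis can be illustrated with a small, hypothetical sketch. The snippet below is not the MIAS implementation from the thesis: it extracts additive ground-truth contributions from a Linear Regression model, assuming the common convention w_j * (x_j - mean_j), and compares them with a stand-in explanation vector via rank correlation.

```python
# Hedged sketch: comparing a local explanation against ground-truth additive
# contributions of a linear regression model. The contribution definition
# (coef_j * (x_j - mean_j)) is one common convention and an assumption here,
# not necessarily the exact MIAS formulation from the thesis.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=500)

model = LinearRegression().fit(X, y)
x = X[0]  # instance to be explained

# Ground-truth additive contribution of each feature for this instance,
# measured relative to the training-data mean.
ground_truth = model.coef_ * (x - X.mean(axis=0))

# Stand-in for the scores returned by a local explanation technique
# (e.g. LIME or KernelSHAP); here just a noisy copy for illustration.
explanation = ground_truth + rng.normal(scale=0.05, size=x.shape)

rho, _ = spearmanr(ground_truth, explanation)
print(f"Rank agreement with ground truth: {rho:.3f}")
```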

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. 80
Series
TRITA-EECS-AVL ; 2025:23
Keywords [en]
xAI, artificial intelligence, machine learning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-360228
ISBN: 978-91-8106-200-7 (print)
OAI: oai:DiVA.org:kth-360228
DiVA, id: diva2:1939078
Public defence
2025-03-14, Sal C, Ka-Sal C (Sven-Olof Öhrvik), Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20250220

Available from: 2025-02-20 Created: 2025-02-20 Last updated: 2025-03-05 Bibliographically approved
List of papers
1. A study of data and label shift in the LIME framework
(English) Manuscript (preprint) (Other academic)
Abstract [en]

LIME is a popular approach for explaining a black-box prediction through an interpretable model that is trained on instances in the vicinity of the predicted instance. To generate these instances, LIME randomly selects a subset of the non-zero features of the predicted instance. The perturbed instances are then fed into the black-box model to obtain labels, which are used for training the interpretable model. In this study, we present a systematic evaluation of the interpretable models that are output by LIME on the two use cases considered in the original paper introducing the approach: text classification and object detection. The investigation shows that the perturbation and labeling phases result in both data and label shift. In addition, we study the correlation between the shift and the fidelity of the interpretable model and show that in certain cases the shift negatively correlates with the fidelity. Based on these findings, it is argued that there is a need for a new sampling approach that mitigates the shift in the LIME framework.
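
As a rough illustration of the perturbation-and-labeling pipeline described above, the sketch below generates LIME-style binary masks around an instance, labels the perturbed points with a stand-in black box, and computes a simple label-shift statistic. The masking scheme, the RandomForestClassifier stand-in, and the total variation distance are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of LIME-style perturbation and a crude label-shift check.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x = X[0]
n_samples = 500

# LIME-style sampling: randomly switch features of x "off" (here: set to 0).
masks = rng.integers(0, 2, size=(n_samples, x.size))
perturbed = np.where(masks == 1, x, 0.0)

# Label the perturbed instances with the black box.
labels = black_box.predict(perturbed)

# Compare label distributions of the training data and the perturbed sample.
p_orig = np.bincount(y, minlength=2) / y.size
p_pert = np.bincount(labels, minlength=2) / labels.size
label_shift = 0.5 * np.abs(p_orig - p_pert).sum()  # total variation distance
print(f"Label shift (TV distance): {label_shift:.3f}")
```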

National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-360213 (URN), 10.48550/arXiv.1910.14421 (DOI)
Note

QC 20250220

Available from: 2025-02-20 Created: 2025-02-20 Last updated: 2025-02-20 Bibliographically approved
2. Can local explanation techniques explain linear additive models?
2024 (English). In: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 38, no 1, p. 237-280. Article in journal (Refereed) Published
Abstract [en]

Local model-agnostic additive explanation techniques decompose the predicted output of a black-box model into additive feature importance scores. Questions have been raised about the accuracy of the produced local additive explanations. We investigate this by studying whether some of the most popular explanation techniques can accurately explain the decisions of linear additive models. We show that even though the explanations generated by these techniques are themselves linear and additive, they can fail to provide accurate explanations when explaining linear additive models. In the experiments, we measure the accuracy of additive explanations, as produced by, e.g., LIME and SHAP, along with the non-additive explanations of Local Permutation Importance (LPI), when explaining Linear and Logistic Regression and Gaussian Naive Bayes models over 40 tabular datasets. We also investigate the degree to which different factors, such as the number of numerical, categorical, or correlated features, the predictive performance of the black-box model, the explanation sample size, the similarity metric, and the pre-processing technique used on the dataset, can directly affect the accuracy of local explanations.
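
For illustration, the sketch below implements a generic local permutation-style importance score in the spirit of LPI: each feature of the explained instance is replaced by values drawn from the data, and the average change in predicted probability is recorded. The exact LPI definition used in the paper may differ; the model and data here are toy stand-ins.

```python
# Hedged sketch of a local permutation-style importance score. This is a
# generic reading of the idea, not necessarily the paper's LPI definition.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 4))
y = (X[:, 0] - 2 * X[:, 2] > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)

def local_permutation_importance(model, X_ref, x, n_draws=200, rng=rng):
    """Average absolute change in predicted probability when feature j
    of instance x is replaced by values sampled from X_ref."""
    base = model.predict_proba(x[None, :])[0, 1]
    scores = np.zeros(x.size)
    for j in range(x.size):
        x_rep = np.tile(x, (n_draws, 1))
        x_rep[:, j] = rng.choice(X_ref[:, j], size=n_draws)
        scores[j] = np.abs(model.predict_proba(x_rep)[:, 1] - base).mean()
    return scores

print(local_permutation_importance(model, X, X[0]))
```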

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Electrical Engineering, Electronic Engineering, Information Engineering; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-360215 (URN), 10.1007/s10618-023-00971-3 (DOI), 001067646000001 (), 2-s2.0-85171464862 (Scopus ID)
Note

QC 20250220

Available from: 2025-02-20 Created: 2025-02-20 Last updated: 2025-02-20 Bibliographically approved
3. The Blame Problem in Evaluating Local Explanations and How to Tackle It
2024 (English). In: Artificial Intelligence. ECAI 2023 International Workshops - XAI^3, TACTIFUL, XI-ML, SEDAMI, RAAIT, AI4S, HYDRA, AI4AI, 2023, Proceedings, Springer Nature, 2024, p. 66-86. Conference paper, Published paper (Refereed)
Abstract [en]

The number of proposed local model-agnostic explanation techniques has grown rapidly in recent years. One main reason is that the bar for developing new explainability techniques is low due to the lack of rigorous evaluation measures. Without such measures, it is hard to have concrete evidence of whether new explanation techniques significantly outperform their predecessors. Our study proposes a new taxonomy for evaluating local explanations: robustness, evaluation using ground truth from synthetic datasets and interpretable models, model randomization, and human-grounded evaluation. Using this taxonomy, we highlight that all categories of evaluation methods, except those based on ground truth from interpretable models, suffer from a problem we call the "blame problem." We argue that this latter category of evaluation measures is a more reasonable method for evaluating local model-agnostic explanations. However, we show that even this category has further limitations. The evaluation of local explanations remains an open research problem.

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Evaluation of Local Explanations, Explainability in Machine Learning, Explainable AI, Interpretability, Local Explanations, Local model-agnostic Explanations
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-343500 (URN), 10.1007/978-3-031-50396-2_4 (DOI), 001259329400004 (), 2-s2.0-85184112932 (Scopus ID)
Conference
International Workshops of the 26th European Conference on Artificial Intelligence, ECAI 2023, Kraków, Poland, Sep 30 2023 - Oct 4 2023
Note

QC 20240219

Part of ISBN 9783031503955

Available from: 2024-02-15 Created: 2024-02-15 Last updated: 2025-02-20 Bibliographically approved
4. Local Point-Wise Explanations of LambdaMART
2024 (English). In: 14th Scandinavian Conference on Artificial Intelligence SCAI 2024, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

LambdaMART has been shown to outperform neural network models on tabular Learning-to-Rank (LTR) tasks. Like neural network models, LambdaMART is considered a black-box model due to the complexity of the logic behind its predictions. Explanation techniques can help us understand these models. Our study investigates the faithfulness of point-wise explanation techniques when explaining LambdaMART models. Our analysis includes LTR-specific explanation techniques, such as LIRME and EXS, as well as explanation techniques that are not adapted to LTR use cases, such as LIME, KernelSHAP, and LPI. The explanation techniques are evaluated using several measures: Consistency, Fidelity, (In)fidelity, Validity, Completeness, and Feature Frequency (FF) Similarity. Three LTR benchmark datasets are used in the investigation: LETOR 4 (MQ2008), Microsoft Bing Search (MSLR-WEB10K), and the Yahoo! LTR challenge dataset. Our empirical results demonstrate the challenges of accurately explaining LambdaMART: no single explanation technique is consistently faithful across all our evaluation measures and datasets. Furthermore, our results show that LTR-based explanation techniques are not consistently better than their non-LTR-based counterparts across the evaluation measures. Specifically, the LTR-based explanation techniques are consistently the most faithful with respect to (In)fidelity, whereas the non-LTR-specific approaches frequently provide the most faithful explanations with respect to Validity, Completeness, and FF Similarity.
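
As an example of one of the measures listed above, the sketch below computes a common reading of Fidelity: the R² between a local linear surrogate and the black-box scores on a perturbation neighbourhood of the explained instance. The Gaussian neighbourhood and the GradientBoostingRegressor stand-in (in place of LambdaMART) are assumptions for illustration only; the paper's exact definitions may differ.

```python
# Hedged sketch of a Fidelity-style measure for a local linear surrogate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = X[:, 0] ** 2 + X[:, 1] - X[:, 4]
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

x = X[0]
# Local neighbourhood: small Gaussian perturbations around x.
neighbourhood = x + rng.normal(scale=0.3, size=(300, x.size))
bb_scores = black_box.predict(neighbourhood)

# Local linear surrogate, as used by LIME-style explainers.
surrogate = LinearRegression().fit(neighbourhood, bb_scores)
fidelity = r2_score(bb_scores, surrogate.predict(neighbourhood))
print(f"Fidelity (R^2 of surrogate vs. black box): {fidelity:.3f}")
```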

National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-360218 (URN)
Conference
14th Scandinavian Conference on Artificial Intelligence SCAI 2024, Jönköping University, 10-11 Jun 2024
Note

QC 20250220

Available from: 2025-02-20 Created: 2025-02-20 Last updated: 2025-02-20 Bibliographically approved
5. Local Interpretable Model-Agnostic Explanations for Neural Ranking Models
2024 (English). In: 14th Scandinavian Conference on Artificial Intelligence SCAI 2024, 2024. Conference paper, Published paper (Refereed) [Artistic work]
Abstract [en]

Neural ranking models have shown state-of-the-art performance in Learning-to-Rank (LTR) tasks. However, they are considered black-box models. Understanding the logic behind the predictions of such black-box models is paramount for their adoption in real-world and high-stakes decision-making domains. Local explanation techniques can help us understand the importance of the features in the dataset relative to the predicted output of these black-box models. This study investigates new adaptations of Local Interpretable Model-agnostic Explanations (LIME) for explaining neural ranking models. To evaluate the proposed explanations, we explain Neural GAM models; since these models are intrinsically interpretable neural ranking models, we can directly extract their ground truth importance scores. Using measures such as Rank Biased Overlap (RBO) and Overlap AUC, we show that our explanations of Neural GAM models are more faithful than those of explanation techniques developed for LTR applications, such as LIRME and EXS, and of non-LTR explanation techniques for regression models, such as LIME and KernelSHAP. Our analysis is performed on the Yahoo! Learning-To-Rank Challenge dataset.
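
To make the ranking-overlap evaluation concrete, the sketch below computes a truncated Rank Biased Overlap between two feature rankings, e.g. a ground-truth ranking and an explanation's ranking. It uses the standard truncated RBO formula with persistence p; the paper may use an extrapolated variant, and the example rankings are purely illustrative.

```python
# Hedged sketch of (truncated) Rank Biased Overlap between two rankings.
def rbo(ranking_a, ranking_b, p=0.9):
    """Truncated RBO between two ranked lists of feature indices."""
    k = min(len(ranking_a), len(ranking_b))
    score, seen_a, seen_b = 0.0, set(), set()
    for d in range(1, k + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap at depth d
        score += p ** (d - 1) * agreement
    return (1 - p) * score

# Example: ground-truth ranking vs. an explanation's ranking of 5 features.
print(rbo([0, 3, 1, 4, 2], [0, 1, 3, 2, 4]))
```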

National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-360223 (URN)
Conference
14th Scandinavian Conference on Artificial Intelligence SCAI 2024, Jönköping University, 10-11 Jun 2024
Note

QC 20250220

Available from: 2025-02-20 Created: 2025-02-20 Last updated: 2025-02-20 Bibliographically approved
6. Local List-Wise Explanations of LambdaMART
2024 (English). In: Explainable Artificial Intelligence - Second World Conference, xAI 2024, Proceedings, Springer Nature, 2024, p. 369-392. Conference paper, Published paper (Refereed)
Abstract [en]

LambdaMART, a potent black-box Learning-to-Rank (LTR) model, has been shown to outperform neural network models across tabular ranking benchmark datasets. However, its lack of transparency challenges its application in many real-world domains. Local list-wise explanation techniques provide scores that explain the importance of the features in a list of documents associated with a query to the prediction of black-box LTR models. This study investigates which list-wise explanation techniques provide the most faithful explanations for LambdaMART models. Several local explanation techniques are evaluated: Greedy Score, RankLIME, EXS, LIRME, LIME, and SHAP. Moreover, a non-LTR explanation technique, Permutation Importance (PMI), is applied to obtain list-wise explanations of LambdaMART. The techniques are compared based on eight evaluation metrics: Consistency, Completeness, Validity, Fidelity, ExplainNCDG@10, (In)fidelity, Ground Truth, and Feature Frequency similarity. The evaluation is performed on three benchmark datasets, Yahoo, Microsoft Bing Search (MSLR-WEB10K), and LETOR 4 (MQ2008), along with a synthetic dataset. The experimental results show that no single explanation technique is faithful across all datasets and evaluation metrics. Moreover, the explanation techniques tend to be faithful for different subsets of the evaluation metrics; for example, RankLIME outperforms other explanation techniques with respect to Fidelity and ExplainNCDG, while PMI provides the most faithful explanations with respect to Validity and Completeness. We also show that the explanation sample size and the normalization of feature importance scores can largely affect the faithfulness of explanation techniques across all datasets.
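
The sketch below illustrates one plausible reading of a PMI-style list-wise importance score: shuffle one feature across the documents of a query, re-rank with the model, and record the drop in NDCG@10 relative to the model's original ranking. The toy_ranker function and the use of the model's own top-k as pseudo-relevance are assumptions made for illustration, not the paper's definitions of PMI or ExplainNCDG.

```python
# Hedged sketch of a list-wise, permutation-style feature importance for a
# ranking model. `ranker` is any scoring function f(features) -> scores.
import numpy as np

def ndcg_at_k(relevance_in_ranked_order, k=10):
    """NDCG@k for a full list of relevance labels given in ranked order."""
    rel = np.asarray(relevance_in_ranked_order, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = float((rel[:k] * discounts[: rel[:k].size]).sum())
    ideal = np.sort(rel)[::-1][:k]
    idcg = float((ideal * discounts[: ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

def listwise_permutation_importance(ranker, X_query, rng, n_repeats=20, k=10):
    base_order = np.argsort(-ranker(X_query))
    base_rel = np.zeros(X_query.shape[0])
    base_rel[base_order[:k]] = 1.0  # treat the model's own top-k as "relevant"
    importances = np.zeros(X_query.shape[1])
    for j in range(X_query.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_query.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle feature j
            order = np.argsort(-ranker(X_perm))
            drops.append(1.0 - ndcg_at_k(base_rel[order], k))
        importances[j] = np.mean(drops)
    return importances

def toy_ranker(X):
    # Stand-in scoring function; in practice this would be LambdaMART's scores.
    return X[:, 0] + 0.5 * X[:, 3]

rng = np.random.default_rng(0)
X_query = rng.normal(size=(50, 8))  # 50 documents for one query, 8 features
print(listwise_permutation_importance(toy_ranker, X_query, rng))
```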

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Explainability for Learning to Rank, Explainable Artificial Intelligence, Explainable Machine Learning, Local explanations, Local list-wise explanations
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-351924 (URN)10.1007/978-3-031-63797-1_19 (DOI)001282234900019 ()2-s2.0-85200663788 (Scopus ID)
Conference
2nd World Conference on Explainable Artificial Intelligence, xAI 2024, Valletta, Malta, Jul 17 2024 - Jul 19 2024
Note

Part of ISBN 9783031637964

QC 20240823

Available from: 2024-08-19 Created: 2024-08-19 Last updated: 2025-02-20 Bibliographically approved
7. Faithfulness of Local Explanations for Tree-Based Ensemble Models
2024 (English). In: 27th International Conference, DS 2024, Pisa, Italy, October 14–16, 2024, Proceedings, Part II / [ed] Dino Pedreschi, Anna Monreale, Riccardo Guidotti, Roberto Pellungrini, Francesca Naretto, Springer Nature, 2024. Conference paper, Published paper (Refereed) [Artistic work]
Abstract [en]

Local explanation techniques provide insights into the predicted outputs of machine learning models for individual data instances. These techniques can be model-agnostic, treating the machine learning model as a black box, or model-based, leveraging access to the model's internal properties or logic. Evaluating these techniques is crucial for ensuring the transparency of complex machine learning models in real-world applications. However, most evaluation studies have focused on the faithfulness of these techniques in explaining neural networks. Our study empirically evaluates the faithfulness of local explanations in explaining tree-based ensemble models. We include the local model-agnostic explanations of LIME, KernelSHAP, and LPI, along with the local model-based explanations of TreeSHAP, Saabas, and Local MDI, for gradient-boosted tree and random forest models trained on 20 tabular datasets. We evaluate local explanations using two perturbation-based measures: Importance by Preservation and Importance by Deletion. We show that the model-agnostic explanations of KernelSHAP and LPI consistently outperform the model-based explanations of TreeSHAP, Saabas, and Local MDI when explaining gradient-boosted tree and random forest models. Moreover, LIME explanations of gradient-boosted tree and random forest models consistently demonstrate low faithfulness across all datasets.
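
The two measures named above can be sketched as keep/remove style perturbation tests. The snippet below keeps (preservation) or replaces (deletion) the top-k features indicated by an explanation, imputing the others with training means, and compares the resulting predictions with the original one; a faithful explanation should yield a small preservation gap and a large deletion gap. The mean-imputation baseline, the toy explanation vector, and the RandomForestRegressor stand-in are illustrative assumptions, not the paper's exact protocol.

```python
# Hedged sketch of keep/remove perturbation tests in the spirit of
# Importance by Preservation and Importance by Deletion.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = 2 * X[:, 0] - X[:, 3] + 0.5 * X[:, 5]
model = RandomForestRegressor(random_state=0).fit(X, y)

x = X[0]
explanation = np.array([2.0, 0.1, 0.0, 1.0, 0.0, 0.5, 0.0, 0.1])  # toy scores
top_k = np.argsort(-explanation)[:3]
baseline = X.mean(axis=0)
original = model.predict(x[None, :])[0]

# Preservation: keep only the top-k features, impute the rest with the mean.
x_keep = baseline.copy()
x_keep[top_k] = x[top_k]
preservation_gap = abs(model.predict(x_keep[None, :])[0] - original)

# Deletion: remove the top-k features, keep the rest of the instance.
x_del = x.copy()
x_del[top_k] = baseline[top_k]
deletion_gap = abs(model.predict(x_del[None, :])[0] - original)

print(f"preservation gap: {preservation_gap:.3f}, deletion gap: {deletion_gap:.3f}")
```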

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-360222 (URN)10.1007/978-3-031-78980-9_2 (DOI)2-s2.0-85219197501 (Scopus ID)
Conference
27th International Conference, DS 2024, Pisa, Italy, October 14–16, 2024
Note

QC 20250220

Available from: 2025-02-20 Created: 2025-02-20 Last updated: 2025-03-12 Bibliographically approved

Open Access in DiVA

Fulltext (2988 kB)
File information
File name: FULLTEXT01.pdf
File size: 2988 kB
Checksum (SHA-512): 167af025db432cbf6d71bbf7357be582a403af8fab41e1ed0bb1f7915ac7637d771493991d82b2022f8dfd6083ae80606386b4a43fd4195bc498fb29ef10d628
Type: fulltext
Mimetype: application/pdf

Authority records

Akhavan Rahnama, Amir Hossein
