Explaining Interpretable Machine Learning: Theory, Methods and Applications
87 Pages Posted: 21 Jan 2021
Date Written: December 11, 2020
This working paper provides a structured and accessible introduction to interpretable machine learning. We begin with an overview of the research literature and then analyze selected methods for explaining machine learning model outcomes, discussing the theory of machine learning interpretability alongside the concepts of explanation, interpretation, and trust from philosophy and the social sciences. We choose counterfactual explanations and Local Interpretable Model-agnostic Explanations (LIME) as prominent examples of machine learning interpretability methods and discuss their Python implementations in detail. We apply the chosen methods in two case studies: the first uses the Boston Housing dataset to classify census tracts in the Boston metropolitan area; the second applies natural language processing to classify YouTube comments. The first case study shows that the existing Python implementation of counterfactual explanations offers no control over the sparsity and feasibility of the explanations and does not properly handle datasets with categorical variables. The second case study shows that the understandability of LIME explanations depends, among other factors, on the structure of the text instance being explained; practitioners therefore have to rely on domain knowledge to identify and share only the informative explanations. These limitations must be addressed to ensure the applicability of counterfactual explanations and LIME in real-world settings.
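To make the LIME approach referenced above concrete, the following is a minimal sketch of its core idea for text classification: perturb an instance by dropping words, query the black-box model on the perturbations, weight each perturbation by its similarity to the original, and fit a weighted linear surrogate whose coefficients serve as word-level explanations. Everything here is illustrative, not the paper's actual setup: `black_box` is a toy stand-in for a trained comment classifier, and the kernel width and sample count are arbitrary choices.

```python
import numpy as np

# Toy black-box classifier: scores a comment as "spam-like" by the share
# of trigger words it contains (a stand-in for a real trained model).
def black_box(texts):
    triggers = {"subscribe", "free", "click"}
    return np.array([sum(w in triggers for w in t.split()) / 3.0 for t in texts])

def lime_text(text, predict, n_samples=1000, seed=0):
    """LIME-style local surrogate for a single text instance."""
    rng = np.random.default_rng(seed)
    words = text.split()
    d = len(words)
    # Binary masks: each row keeps a random subset of the words.
    Z = rng.integers(0, 2, size=(n_samples, d))
    Z[0] = 1  # include the unperturbed instance
    perturbed = [" ".join(w for w, keep in zip(words, z) if keep) for z in Z]
    y = predict(perturbed)
    # Exponential kernel: perturbations keeping more words get more weight.
    pi = np.exp(-((1.0 - Z.mean(axis=1)) ** 2) / 0.25)
    sw = np.sqrt(pi)[:, None]
    # Weighted least squares on the masks (plus an intercept column).
    X = np.c_[Z, np.ones(n_samples)]
    coef, *_ = np.linalg.lstsq(sw * X, sw.ravel() * y, rcond=None)
    # Coefficients for the word columns are the local explanation.
    return dict(zip(words, coef[:d]))

weights = lime_text("click here for free prizes subscribe now", black_box)
```

Because the toy model is exactly additive in word presence, the surrogate recovers the trigger words ("click", "free", "subscribe") with the largest weights; with a real classifier the fit is only a local approximation, which is one source of the understandability issues the second case study reports.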
Keywords: interpretable machine learning, explanation, interpretation, trust, trust in human-machine interactions, counterfactual explanations, Local Interpretable Model-agnostic Explanations (LIME), Python, TensorFlow 2.0
JEL Classification: C45, C51, C52, G22