Explaining Interpretable Machine Learning: Theory, Methods and Applications

87 Pages · Posted: 21 Jan 2021

Michaela Benk

ETH Zürich - Department of Management, Technology, and Economics (D-MTEC); ETH Zürich - Mobiliar Lab for Analytics

Andrea Ferrario

ETH Zürich - Department of Management, Technology, and Economics (D-MTEC); ETH Zürich - Mobiliar Lab for Analytics

Date Written: December 11, 2020

Abstract

This working paper aims to provide a structured and accessible introduction to interpretable machine learning. We start with an overview of the research literature and continue by analyzing selected methods for explaining machine learning model outcomes. The theory of machine learning interpretability is discussed together with the concepts of explanation, interpretation, and trust from philosophy and the social sciences. We choose counterfactual explanations and Local Interpretable Model-agnostic Explanations (LIME) as prominent examples of machine learning interpretability methods and discuss their Python implementations in detail. We apply the chosen methods in two distinct case studies: the first uses the Boston Housing dataset to classify census tracts in the Boston metropolitan area; the second focuses on the natural language processing and classification of YouTube comments. The results of the first case study show that the existing Python implementation of counterfactual explanations does not allow controlling the sparsity and feasibility of the explanations and does not properly handle datasets with categorical variables. The results of the second case study show that the understandability of LIME explanations depends, among other factors, on the structure of the text instance to be explained; practitioners therefore have to rely on domain knowledge to identify and share only the informative explanations. These limitations have to be addressed to ensure the applicability of counterfactual explanations and LIME in real-world applications.
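
To make the two methods concrete, the sketches below illustrate how they are typically invoked in Python. They are generic illustrations under assumed toy data and models, not the implementations or datasets evaluated in the paper.

A counterfactual explanation answers the question of which (preferably small and feasible) change to an instance's features would flip the model's prediction. The following brute-force sketch searches random perturbations of a tabular instance for the closest class-flipping candidate; the synthetic data and random-forest model are assumptions for illustration only, and the L1 distance is used as a simple proxy for sparsity.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic tabular data standing in for a housing-style classification task
    X, y = make_classification(n_samples=500, n_features=5, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X, y)

    x = X[0]                                   # instance to explain
    original_class = clf.predict([x])[0]

    # Brute-force counterfactual search: among random perturbations of x, keep those
    # that flip the predicted class and return the one closest in L1 distance
    # (a smaller L1 distance loosely encourages sparser feature changes)
    rng = np.random.default_rng(0)
    candidates = x + rng.normal(scale=0.5, size=(5000, x.size))
    flipped = candidates[clf.predict(candidates) != original_class]
    if len(flipped) > 0:
        counterfactual = flipped[np.argmin(np.abs(flipped - x).sum(axis=1))]
        print("original class:", original_class)
        print("feature changes:", counterfactual - x)

LIME, in turn, explains a single prediction by fitting a local surrogate model on perturbed versions of the instance. The sketch below uses the lime package's LimeTextExplainer together with a scikit-learn text pipeline; the toy comments, labels, and class names are hypothetical and only stand in for the YouTube-comment classification task described above.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from lime.lime_text import LimeTextExplainer

    # Toy training data: hypothetical comments labelled as spam (1) or not spam (0)
    comments = [
        "great video thanks for sharing",
        "check out my channel for free prizes",
        "really helpful explanation of the topic",
        "click this link to win money now",
    ]
    labels = [0, 1, 0, 1]

    # Text classifier: TF-IDF features followed by logistic regression
    pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
    pipeline.fit(comments, labels)

    # LIME perturbs the input text and fits a local linear model to the classifier's output
    explainer = LimeTextExplainer(class_names=["not_spam", "spam"])
    explanation = explainer.explain_instance(
        "click here to win a free prize",      # text instance to explain
        pipeline.predict_proba,                # function returning class probabilities
        num_features=5,                        # number of words in the explanation
    )
    print(explanation.as_list())               # (word, weight) pairs of the local explanation

The (word, weight) pairs returned by as_list() form the local explanation whose understandability, as noted above, depends on the structure of the comment being explained.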

Keywords: interpretable machine learning, explanation, interpretation, trust, trust in human-machine interactions, counterfactual explanations, Local Interpretable Model-agnostic Explanations (LIME), Python, TensorFlow 2.0

JEL Classification: C45, C51, C52, G22

Suggested Citation

Benk, Michaela and Ferrario, Andrea, Explaining Interpretable Machine Learning: Theory, Methods and Applications (December 11, 2020). Available at SSRN: https://ssrn.com/abstract=3748268 or http://dx.doi.org/10.2139/ssrn.3748268

Michaela Benk (Contact Author)

ETH Zürich - Department of Management, Technology, and Economics (D-MTEC) ( email )

ETH-Zentrum
Zurich, CH-8092
Switzerland

ETH Zürich - Mobiliar Lab for Analytics ( email )

Zürich, 8092
Switzerland

Andrea Ferrario

Dep. Management, Technology, and Economics ETH Zurich ( email )

Zurich
Switzerland

Mobiliar Lab for Analytics at ETH ( email )

Zürich, 8092
Switzerland
