Black Box Explanation by Learning Image Exemplars in the Latent Feature Space

Latest revision as of 15:19, 21 January 2021

Abstract

We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars that respect local decision rules. Finally, it visualizes them in a manner that shows the user how the exemplars can be modified to either stay within their class or become counterfactuals by "morphing" into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification and the areas that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
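The pipeline described in the abstract (sample a neighborhood in latent space, label it with the black box, fit a local surrogate, then split the decoded samples into exemplars and counter-exemplars) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: `black_box` and `decode` are hypothetical stand-ins for a real classifier and a trained adversarial autoencoder decoder, the latent space is two-dimensional, and a one-level decision stump replaces the decision tree so that the example needs only the standard library.

```python
import random


def clip(v):
    # Keep pixel values inside [0, 1].
    return min(1.0, max(0.0, v))


def black_box(image):
    # Hypothetical black box: class 1 if the mean pixel intensity exceeds 0.5.
    return int(sum(image) / len(image) > 0.5)


def decode(z):
    # Hypothetical decoder (stand-in for the autoencoder's decoder):
    # maps a 2-D latent point to a 4-pixel "image".
    return [clip(z[0]), clip(z[0] + 0.1 * z[1]), clip(z[0] - 0.1 * z[1]), clip(z[0])]


def sample_neighborhood(z, n, scale, rng):
    # Gaussian perturbations of the latent point z.
    return [[v + rng.gauss(0.0, scale) for v in z] for _ in range(n)]


def fit_stump(points, labels):
    # Exhaustive search for the (feature, threshold, left_label) rule with the
    # fewest errors. A stump is used here in place of a full decision tree.
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            for left in (0, 1):
                errors = sum(
                    (left if p[f] <= t else 1 - left) != y
                    for p, y in zip(points, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left)
    return best[1], best[2], best[3]


def stump_predict(stump, z):
    f, t, left = stump
    return left if z[f] <= t else 1 - left


def explain(z, n=200, scale=0.3, seed=0):
    rng = random.Random(seed)
    neighbors = sample_neighborhood(z, n, scale, rng)
    # The black box labels the decoded latent samples.
    labels = [black_box(decode(p)) for p in neighbors]
    stump = fit_stump(neighbors, labels)
    target = stump_predict(stump, z)
    # Exemplars satisfy the local rule for z; counter-exemplars violate it.
    exemplars = [decode(p) for p in neighbors if stump_predict(stump, p) == target]
    counter_exemplars = [decode(p) for p in neighbors if stump_predict(stump, p) != target]
    return stump, exemplars, counter_exemplars


if __name__ == "__main__":
    stump, exemplars, counter_exemplars = explain([0.3, 0.0])
    print("local rule (feature, threshold, left label):", stump)
    print(len(exemplars), "exemplars,", len(counter_exemplars), "counter-exemplars")
```

In this toy setting the learned rule is a single threshold on the first latent coordinate. With a real decoder, the exemplars and counter-exemplars would be images that a user could inspect side by side.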

Original document

The different versions of the original document can be found in:

http://arxiv.org/abs/2002.03746

http://arxiv.org/pdf/2002.03746

http://link.springer.com/content/pdf/10.1007/978-3-030-46150-8_12

http://dx.doi.org/10.1007/978-3-030-46150-8_12 under the license http://www.springer.com/tdm

https://dblp.uni-trier.de/db/journals/corr/corr2002.html#abs-2002-03746

https://arxiv.org/pdf/2002.03746

https://link.springer.com/chapter/10.1007/978-3-030-46150-8_12

https://academic.microsoft.com/#/detail/3029534136
