Paper ID | SS-EDNN.5
Paper Title | UNDERSTANDING AND MANIPULATING NEURAL NET FEATURES USING SPARSE OBLIQUE CLASSIFICATION TREES
Authors | Suryabhan Singh Hada, Miguel Á. Carreira-Perpiñán, Arman Zharmagambetov, University of California, Merced, United States
Session | SS-EDNN: Special Session: Explainable Deep Neural Networks for Image/Video Processing
Location | Area B
Session Time | Wednesday, 22 September, 14:30 - 16:00
Presentation Time | Wednesday, 22 September, 14:30 - 16:00
Presentation | Poster
Topic | Special Sessions: Explainable Deep Neural Networks for Image/Video Processing
IEEE Xplore Open Preview | Available in IEEE Xplore
Abstract | The widespread deployment of deep nets in practical applications has led to a growing desire to understand how and why such black-box methods make predictions. Much work has focused on understanding what part of the input pattern (an image, say) is responsible for a particular class being predicted, and how the input may be manipulated to predict a different class. We focus instead on understanding which internal features computed by the neural net are responsible for a particular class. We achieve this by mimicking part of the net with a decision tree having sparse weight vectors at its nodes. We are able to learn trees that are both highly accurate and interpretable, so they can provide insights into the deep net black box. Further, we show that we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features. We demonstrate this robustly on MNIST and ImageNet with LeNet5 and VGG networks.
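The abstract's core idea can be illustrated with a minimal sketch: learn a single sparse oblique (hyperplane) split over stand-in "internal features", then edit only the features that split selects to flip its decision. This is an assumption-laden toy, not the paper's actual tree-learning algorithm (the authors learn full sparse oblique trees over real network activations); here one L1-regularised logistic node trained by proximal gradient descent (ISTA) stands in for a tree node, and synthetic Gaussian features stand in for penultimate-layer activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for neural-net internal features: two classes
# separated along a direction that involves only 3 of 20 dimensions.
n, d = 200, 20
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 1.0]
X = rng.normal(size=(n, d))
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One sparse oblique node: L1-regularised logistic regression via ISTA
# (gradient step followed by soft-thresholding). Only a sketch of a
# sparse hyperplane split, not the authors' tree optimisation.
w, b, lr, lam = np.zeros(d), 0.0, 0.1, 0.02
for _ in range(500):
    p = sigmoid(X @ w + b)
    g = X.T @ (p - y) / n
    w = w - lr * g
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    b -= lr * np.mean(p - y)

support = np.flatnonzero(np.abs(w) > 1e-8)   # features the node relies on
acc = float(np.mean((sigmoid(X @ w + b) > 0.5) == y))
print("selected features:", support, "train accuracy:", acc)

# Feature-level manipulation: reflect one example across the hyperplane
# w.x + b = 0. Because w is sparse, only the selected features change,
# yet the node's decision flips.
x = X[0].copy()
before = bool(x @ w + b > 0)
margin = x @ w + b
x_adv = x - 2.0 * margin / (w @ w) * w
flipped = bool(x_adv @ w + b > 0)
print("decision before:", before, "after manipulation:", flipped)
```

Because the weight vector is sparse, the "attack" touches only a handful of feature dimensions, which mirrors the paper's observation that class predictions can be switched by manipulating a small set of internal features rather than the input pixels.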