
Explainable AI and Visual Analytics (III)

Huamin Qu
Hong Kong University of Science and Technology

“Strategy 2: Developing Effective Methods for AI-Human Collaboration

Better visualization and user interfaces are additional areas that need much greater development to help humans understand large-volume modern datasets and information coming from a variety of sources.”


VisLab’s work on VIS for AI
• Non-DL: iForest (IEEE VAST’18), EmbeddingVis (IEEE VAST’18)
• DL, open the black box (XAI): RNNVis (IEEE VAST’17, Model Understanding), DeepTracker (ACM TIST’18, Model Debugging), CNN Comparator (VADL’17, Model Diagnosis)
• DL, treated as a black box: RuleMatrix (IEEE VAST’18, Model Trust)
• AutoML: ATMSeer (ACM CHI’19, Algorithm Transparency)
• InteractiveML: ProtoSteer (ACM KDD’19, IEEE VIS’19, Model Refinement)

HKUST VISLAB
Outline

• Motivations

• Vis for XAI


§ Model diagnosis: iForest
§ Model understanding: RNNVis
§ Model training: DeepTracker
§ Model trust: RuleMatrix

• Vis for AutoML


§ ATMSeer: Algorithm Transparency

• Conclusions and Future Work


Explainable AI

The concept of XAI. DARPA, Explainable AI Project 2017

What role is visualization playing in XAI?

DARPA, Explainable AI Project 2017

iForest: Interpreting Random Forests via Visual Analytics
IEEE Visual Analytics Science and Technology (VAST) 2018

Xun Zhao, Yanhong Wu, Dik Lee, Weiwei Cui


Background – Decision Tree
Motivation – Random Forest
“Random Forests are A+ predictors on performance but rate an F on interpretability.”
— L. Breiman, “Statistical Modeling: The Two Cultures”
iForest

Interpret random forest models and predictions


Decision Path View

a. For a data item, the Decision Path Projection provides an overview of decision path similarities.
b. The Feature Summary shows the summarized feature ranges for multiple selected decision paths.
c. The Decision Path Flow encodes the detailed structures and feature ranges in a layer-wise manner.
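The decision paths that iForest summarizes can be extracted directly from a trained random forest. A minimal scikit-learn sketch (not the iForest implementation; the dataset and all names here are illustrative):

```python
# Sketch: extracting the decision paths a random forest uses for one sample.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

sample = X[:1]
for i, tree in enumerate(forest.estimators_):
    node_indicator = tree.decision_path(sample)   # sparse (1, n_nodes) indicator
    path = node_indicator.indices                 # node ids along the root-to-leaf path
    feats = tree.tree_.feature[path]              # feature tested at each node
    thresh = tree.tree_.threshold[path]           # split threshold at each node
    # The leaf has no test, so skip the last node when printing conditions.
    conds = [f"X{f} <= {t:.2f}" if sample[0, f] <= t else f"X{f} > {t:.2f}"
             for f, t in zip(feats[:-1], thresh[:-1])]
    print(f"tree {i}: " + " AND ".join(conds))
```

Comparing these per-tree condition lists is exactly what becomes unmanageable in text form and motivates the visual summary.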
Titanic Usage Scenario – Decision Path View
[Legend: Positive / Negative]
input → ? → output

What has the RNN learned from data?
What has the RNN learned from data?
A. map the value of a single hidden unit on data (Karpathy A. et al., 2015)

A unit sensitive to position in a line.

A lot more units have no clear meanings.

Our Solution: RNNVis


https://www.youtube.com/watch?v=0QFDNLdQ6_w
RNNVis

Our Solution
Explaining individual hidden units
Bi-graph and co-clustering
Sequence evaluation

RNNVis
Solution: explaining an individual hidden unit using its most salient words

[Box plot of unit #36’s response: bands for the 25%–75% and 9%–91% ranges; top 4 positive/negative salient words of unit 36 in an RNN (GRU) trained on Yelp review data]
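The response measure behind these slides can be sketched with synthetic data (illustrative only: RNNVis derives the before/after hidden states from a real model reading a real corpus):

```python
# Sketch: a unit's "response" to a word occurrence is the change it
# contributes to the hidden state when that word is read.
import numpy as np

def unit_responses(hidden_before, hidden_after):
    """Per-unit response to one word occurrence: h_t - h_{t-1}."""
    return hidden_after - hidden_before

rng = np.random.default_rng(0)
# 100 occurrences of the same word; 600-unit state before/after reading it
h_prev = rng.normal(size=(100, 600))
h_next = h_prev + 0.1 * rng.normal(size=(100, 600))

resp = unit_responses(h_prev, h_next)          # (100, 600)
mean_resp = resp.mean(axis=0)                  # per-unit mean response
salient = np.argsort(-np.abs(mean_resp))[:4]   # the 4 most responsive units
print(salient)
```

Sorting units by mean response is what produces the reordered distribution plot; the box-plot bands summarize the spread of `resp` per unit.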
What Has the RNN Learned from Data?
Solution: explaining an individual hidden unit using its most salient words

[Figure: distribution of the model’s response given the word “he”; units reordered according to the mean (an LSTM with 600 units); bands for the mean, 25%–75%, and 9%–91% ranges; highly responsive hidden units highlighted]
What Has the RNN Learned from Data?
Solution: explaining an individual hidden unit using its most salient words

Problem: investigating one unit/word at a time is too much user burden!
Solution: an overview for easier exploration.
What Has RNN Learned from Data?

Solution
Explaining individual hidden units
Bi-graph and co-clustering
Sequence evaluation

What Has the RNN Learned from Data?
Evaluation: what has an RNN learned from the data?

[Figure: bipartite graph between hidden units and words (“good”, “nice”, “by”, “bad”, “worst”); edge color encodes the sign of the average weight, edge width its scale. RNNVis: Ming et al. 2017]
[Figure: hidden units and words (“he”, “she”, “by”, “can”, “may”) grouped into hidden-unit clusters (memory chips) and word clusters (word clouds)]
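The joint grouping of units and words above is a co-clustering of the unit-by-word response matrix. A sketch with scikit-learn's spectral co-clustering on random data (just to show the shapes; this is a stand-in, not the paper's algorithm):

```python
# Sketch: co-cluster hidden units (rows) and words (columns) from a
# unit-by-word response-strength matrix.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
# rows = 60 hidden units, columns = 40 words; entries = response strength
response = np.abs(rng.normal(size=(60, 40))) + 1e-6  # nonnegative for the method

model = SpectralCoclustering(n_clusters=5, random_state=0).fit(response)
unit_clusters = model.row_labels_     # one cluster id per hidden unit
word_clusters = model.column_labels_  # one cluster id per word
print(unit_clusters.shape, word_clusters.shape)
```

Each unit cluster becomes a "memory chip" and each word cluster a word cloud in the visualization.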
DeepTracker: Visualizing the Training Process of
Convolutional Neural Networks
ACM Transactions on Intelligent Systems and Technology (TIST), 2019

Dongyu Liu, Weiwei Cui, Kai Jin, Yuxiao Guo, Huamin Qu


Background
Basic Knowledge
• CNNs: trained parameters & validation labels (true label: dog)
• Quantities that need to be monitored:
§ loss function, train/validation error rates, weight update ratio, and weight/gradient/activation distributions
• Rules of thumb:
§ The loss and error rate should decrease over time; a consistent increase or violent fluctuation of the loss may indicate a problem.
§ A big gap between the training and validation error rates suggests the model is over-fitting.
§ The absence of any gap may indicate the model has limited learning capability.
§ The update ratio is expected to be around 1e-3 (a lower ratio suggests the learning rate is too low; a higher one, too high).
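The update-ratio rule of thumb can be checked with a few lines of NumPy (framework-agnostic sketch; the arrays and names are illustrative):

```python
# Sketch: the weight update ratio, one of the quantities monitored above.
import numpy as np

def update_ratio(weights, updates):
    """Ratio of update magnitude to weight magnitude (~1e-3 is healthy)."""
    return np.linalg.norm(updates) / np.linalg.norm(weights)

rng = np.random.default_rng(0)
w = rng.normal(size=1000)      # one layer's weights
grad = rng.normal(size=1000)   # its gradient at the current step
lr = 1e-3

ratio = update_ratio(w, lr * grad)
print(f"update ratio: {ratio:.1e}")
```

A dashboard like DeepTracker tracks this ratio per layer per iteration, so a drifting learning rate shows up as the ratio leaving the ~1e-3 band.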
Our methods
Validation view

• The validation view shows how the performance of validation classes evolves over time
• Each class is depicted as a color stripe
• Classes with similar evolving patterns are placed close together
Our methods
Validation view
• The validation view shows how the performance of validation classes evolves over time.
• A rule-based anomaly detection algorithm detects anomalous iterations.
• Image-level performance exploration with pixel charts (see d and e).
• Patterns:
§ Anomalous iterations for certain classes indicate the model tries to jump out of a local optimum.
§ After the anomalous iteration in e, the ‘mushroom’ class performs much better.
§ Non-red mushrooms are misclassified throughout the whole training.
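A toy stand-in for the rule-based detector (not the paper's algorithm): flag iterations where a class's error rate jumps by more than a threshold.

```python
# Sketch: flag iterations where a class's validation error rate changes
# abruptly between consecutive checkpoints.
def anomaly_iterations(error_rates, jump=0.2):
    """error_rates: per-iteration error rates for one class."""
    return [i for i in range(1, len(error_rates))
            if abs(error_rates[i] - error_rates[i - 1]) > jump]

# Toy trace: a sudden improvement at iteration 2, a spike at 4, recovery at 5.
errs = [0.9, 0.85, 0.4, 0.38, 0.75, 0.35, 0.33]
print(anomaly_iterations(errs))  # -> [2, 4, 5]
```

In the validation view, such flagged iterations are what the analyst drills into with the pixel charts.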
Our methods
Layer view

• Layer view shows how the model parameters evolve over


time.
• In the form of hierarchical small multiples, consistent with the very
deep CNN structures
• Support multiple types of charts, e.g., line charts, horizon graphs,
and box plots
• CNN structures are also visualized and linked to the charts.
Our methods
Correlation view
• Correlation views explore the
relationships between the
changes of neuron weights
and model performance
• A grid-style visualization, where
rows and columns represent
layers and image classes,
respectively.
• Blue rectangles in cell_ij are the
detected filters of layer_i that
are highly related to the
performance change of class_j.
• A novel set partition algorithm
to reduce the visual clutter
Our methods
Correlation view

• Three views, i.e., validation


view (top), layer view (front),
and correlation view (right),
can be stitched as a cube.
• A novel way to explore the
complex relationships among
various types of heterogeneous
time-series data (e.g., image
classification results, neuron
weights, and training iterations).
Other Work

DGMTracker (Liu et al. VAST 2017) Seq2Seq-Vis (Strobelt et al. VAST 2018)
Other Work

GanViz (Wang et al. IEEE PVIS 2017) DQNViz (Wang et al. IEEE VIS 2018)
RuleMatrix: Visualizing and Understanding
Classifiers with Rules
IEEE Visual Analytics Science and Technology (VAST), 2018
Yao Ming, Huamin Qu, Enrico Bertini
RuleMatrix

Decision Rule List


Antecedent Consequent
X IF X1 < 4 AND 9 < X2 AND X3 = 3 THEN Probability = 0.8

X ELSE IF X1 < 4 AND X3 = 1 THEN Probability = 0.2

√ ELSE IF X2 > 9 AND X4 = 0 THEN Probability = 0.1

...
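A decision rule list evaluates top-down: the first rule whose antecedent matches determines the output. A minimal sketch (not the RuleMatrix code; the rules mirror the example above, features indexed X1..X4):

```python
# Sketch: first-match evaluation of a decision rule list.
def eval_rule_list(rules, default, x):
    """rules: list of (predicate, probability); the first match wins."""
    for predicate, prob in rules:
        if predicate(x):
            return prob
    return default

# Hypothetical rules mirroring the example (x[0]=X1, x[1]=X2, x[2]=X3, x[3]=X4).
rules = [
    (lambda x: x[0] < 4 and x[1] > 9 and x[2] == 3, 0.8),
    (lambda x: x[0] < 4 and x[2] == 1,              0.2),
    (lambda x: x[1] > 9 and x[3] == 0,              0.1),
]

print(eval_rule_list(rules, 0.5, [3, 10, 3, 0]))  # first rule fires -> 0.8
print(eval_rule_list(rules, 0.5, [5, 10, 2, 0]))  # third rule fires -> 0.1
```

The X/√ marks on the slide show exactly this: which rules fail and which one fires for a given input.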
RuleMatrix

Model-agnostic pipeline: induct a Bayesian Rule List from the model, then visualize it as a RuleMatrix.

The problems of a rule list in text form:
- Features are not aligned, which makes visual comparison/search difficult
- Important information about each rule (support, fidelity, etc.) cannot be viewed
- Poor scalability to long lists / large feature sets

Solution: the RuleMatrix visualization!

A simple rule list in text form:
IF X1 < 4 AND 9 < X2 THEN Prob = 0.8
ELSE IF X3 = 1 THEN Prob = 0.2
ELSE IF X4 = 0 THEN Prob = 0.1
...

An induced Bayesian Rule List:
IF (X1 in (178.67, inf)) THEN prob: [0.0152, 0.9848]
ELSE IF (X5 in (39.376, inf)) and (X6 in (1.0258, 2.0217)) THEN prob: [0.0784, 0.9216]
ELSE IF (X1 in (-inf, 86.89)) THEN prob: [0.9932, 0.0068]
ELSE IF (X5 in (-inf, 23.632)) and (X7 in (-inf, 25.426)) THEN prob: [0.9850, 0.0150]
ELSE IF (X4 in (203.8, inf)) and (X7 in (-inf, 25.426)) THEN prob: [0.8426, 0.1574]
ELSE IF (X1 in (137.52, 155.06)) and (X5 in (39.376, inf)) THEN prob: [0.0149, 0.9851]
ELSE IF (X1 in (155.06, 178.67)) THEN prob: [0.0675, 0.9325]
ELSE IF (X5 in (39.376, inf)) and (X7 in (36.007, inf)) THEN prob: [0.1786, 0.8214]
ELSE IF (X1 in (86.89, 107.17)) THEN prob: [0.9750, 0.0250]
ELSE IF (X5 in (32.169, 39.376)) and (X7 in (36.007, inf)) THEN prob: [0.1835, 0.8165]
ELSE IF (X6 in (1.0258, 2.0217)) THEN prob: [0.3404, 0.6596]
ELSE IF (X4 in (-inf, 203.8)) and (X5 in (39.376, inf)) THEN prob: [0.4500, 0.5500]
ELSE IF (X5 in (-inf, 23.632)) THEN prob: [0.9515, 0.0485]
ELSE IF (X1 in (137.52, 155.06)) and (X6 in (0.3688, 1.0258)) THEN prob: [0.2250, 0.7750]
ELSE IF (X7 in (-inf, 25.426)) THEN prob: [0.9842, 0.0158]
ELSE IF (X1 in (132.04, 137.52)) and (X6 in (0.3688, 1.0258)) THEN prob: [0.3600, 0.6400]
ELSE IF (X0 in (-inf, 4.668)) and (X5 in (23.632, 28.954)) THEN prob: [0.9884, 0.0116]
ELSE IF (X1 in (137.52, 155.06)) THEN prob: [0.3077, 0.6923]
ELSE IF (X6 in (0.3688, 1.0258)) THEN prob: [0.6903, 0.3097]
ELSE DEFAULT prob: [0.9203, 0.0797]


RuleMatrix

[Panels: Data Flow | RuleMatrix | Support Info]


RuleMatrix
IF Glucose > 179 THEN Output_Prob = [0.02, 0.98]
ELSE IF BMI > 39 AND DPF > 1.0 THEN Output_Prob = [0.08, 0.92]
ELSE IF Glucose < 87 THEN Output_Prob = [0.99, 0.01]
...

• Each rule -> a row
• Each feature -> a column
• Each condition -> a small glyph + shadowed range
• Multiple conditions (AND) -> multiple glyphs
• Output probabilities -> color-coded numbers
RuleMatrix
IF Glucose > 179 THEN Output_Prob = [0.02, 0.98]
ELSE IF BMI > 39 AND DPF > 1.0 THEN Output_Prob = [0.08, 0.92]
ELSE IF Glucose < 87 THEN Output_Prob = [0.99, 0.01]
...
RuleMatrix
Data Flow

• Width of the flow: amount of data


• Color: label of the data
• Fork: data captured/not captured by a rule
RuleMatrix
Support Info
• Fidelity: the accuracy of the rule in approximating the given model
• Evidence (support): the data that supports (is captured by) the rule
• The length of the bar shows the amount of data
• The striped part marks wrong predictions

[Legend: Negative / Positive; striped = predicted by the model as Negative/Positive but wrong]
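The two quality measures above are straightforward to compute. A sketch with hypothetical names (not the RuleMatrix code):

```python
# Sketch: support (how much data a rule captures) and fidelity (how often
# the rule's output agrees with the model being approximated).
import numpy as np

def rule_support_fidelity(captured_mask, rule_pred, model_pred):
    """captured_mask: True where this rule fires first for an item.
    rule_pred: the rule's output label; model_pred: model's label per item."""
    support = int(captured_mask.sum())
    if support == 0:
        return 0, 0.0
    fidelity = float(np.mean(model_pred[captured_mask] == rule_pred))
    return support, fidelity

captured = np.array([True, True, True, False, True])
model_out = np.array([1, 1, 0, 1, 1])
print(rule_support_fidelity(captured, 1, model_out))  # (4, 0.75)
```

Support sets each bar's length in the view; the disagreeing fraction (1 − fidelity) is the striped part.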


Case - Understand the Model

• Dataset: Pima Female Diabetes

• Features: Glucose, Age, BMI, DPF, Pregnancy...

• Labels: Negative (Healthy), Positive (Diabetes)

• Model: 2-layer Neural Network (20, 20)

• Accuracy: 79% (Train), 73% (Test)

[Legend: Negative / Positive; striped = predicted by the model as Negative/Positive but wrong]
Case - Understand the Model
Young age AND Low BMI -> Negative

Case - Understand the Model
Young age AND Low BMI -> Negative
Filters

-> Almost all Negative!

Case 2 - Understand the Errors of the Model
Older Age AND High BMI AND Medium Glucose -> ? Filters

Acc: 79% -> 57%!


The model finds it hard to predict this subset!
Case 2 - Understand the Errors of the Model
Oversample the erroneous subset of data (sampling rate: 2.0)

Results (10 runs, test set):

           Before Sampling   After Sampling
Mean Acc:  72.4%             76.3%
Min Acc:   70.2%             74.0%
Max Acc:   74.0%             79.2%
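The fix tried above duplicates rows of the hard subset in the training data before retraining. A sketch (illustrative names, not the paper's code):

```python
# Sketch: oversample an erroneous subset by appending extra copies of its
# rows (rate 2.0 means the subset's weight roughly doubles).
import numpy as np

def oversample(X, y, hard_mask, rate=2.0, seed=0):
    """Append (rate - 1) * |subset| resampled copies of the flagged rows."""
    extra = int((rate - 1.0) * hard_mask.sum())
    idx = np.random.default_rng(seed).choice(np.flatnonzero(hard_mask), size=extra)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
hard = np.array([False, False, True, True, False, False, False, True, False, False])

X2, y2 = oversample(X, y, hard, rate=2.0)
print(X2.shape)  # (13, 2): 10 originals + 3 duplicates
```

Retraining on `(X2, y2)` is what produced the "after sampling" accuracies in the table.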


Demo of RuleMatrix
Web Demo (http://bit.ly/rulematrix-demo)

Jupyter Notebook (http://bit.ly/rm-note)


ATMSeer:
Increasing Transparency and Controllability in
Automated Machine Learning

Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu,
Micah J. Smith, Kalyan Veeramachaneni, Huamin Qu

Motivation

Which algorithm? Support Vector Machine? Neural Network? Random Forest? K Nearest Neighbor? Linear Regression?

Which hyperparameters? Kernel Function = ? Learning Rate = ? Activation = ? Hidden Layer = ? Max Depth = ? Min Samples Leaf = ? Min Samples Split = ? Leaf Size = ?
Motivation

Make it automated
Motivation
Has it run long enough? Has it sufficiently explored the search space? Has it missed some suitable models?

[Diagram: Prior Knowledge → Automated Machine Learning → Results, Patterns, Insights]
Motivation
Has it run long enough? Has it sufficiently explored the search space? Has it missed some suitable models?

[Diagram: Prior Knowledge → (Controllability) → Automated Machine Learning → (Transparency) → Results, Patterns, Insights]
ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning

[Diagram: Prior Knowledge → Automated Machine Learning → Results, Patterns, Insights]

Transparency: analyze the searched models
Controllability: modify the search space
Designing ATMSeer

Transparency: What needs to be seen?


Controllability: What needs to be controlled?
Designing ATMSeer
A workflow of using AutoML:

Start → D1. Modify search space?
§ Yes: modify the range of 1. Algorithms, 2. Hyperpartitions, 3. Hyperparameters
→ Run AutoML Process → D2. Adjust budget?
§ Yes: return to the AutoML process with more budget
→ D3. Reason about model choice?
§ Yes: compare the top k models on 1. performance stability and 2. model complexity, then choose one model
→ Use the model
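The search that the workflow drives can be caricatured in a few lines: sample an algorithm, sample its hyperparameters, evaluate, keep the best within a budget. (Illustrative only; ATM/ATMSeer use smarter samplers and a much larger search space.)

```python
# Toy sketch of an AutoML search loop over algorithms and hyperparameters.
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
search_space = {
    KNeighborsClassifier: {"n_neighbors": range(1, 15)},
    DecisionTreeClassifier: {"max_depth": range(1, 10)},
}

random.seed(0)
best = (0.0, None)
for _ in range(20):  # computational budget: 20 trials
    algo = random.choice(list(search_space))
    params = {k: random.choice(list(v)) for k, v in search_space[algo].items()}
    score = cross_val_score(algo(**params), X, y, cv=3).mean()
    best = max(best, (score, f"{algo.__name__} {params}"))
print(best)
```

ATMSeer exposes exactly the knobs this loop hides: which algorithms are sampled, which hyperpartitions, how much budget remains, and how the trial scores are distributed.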
Designing ATMSeer
D1. Modify search space?
• Have domain-specific preferences
• Have prior knowledge
• …

Control at three levels: algorithm level, hyperpartition level, hyperparameter level
Designing ATMSeer
D2. Adjust computational budget?
• Unsatisfying results
• Potential to improve
• Low coverage
• …

[Chart: computational budget vs. model performance]
Designing ATMSeer
D3. Reason about / analyze the model choice?
• Unfamiliar with the model
• Models with similar performances

[Chart: performance, robustness, score → trust]
Usage Scenarios
Other Findings
• Different suitable hyperparameters for different datasets
• Same dataset, different suitable algorithms
Limitations and Future Work
Limitations:
• Scalability
• Generalization
• Validation

Future Work:
• Model bias
• Explainability by analogy
• …
Thank You!
Contact:
Huamin Qu
huamin@cse.ust.hk

More Info:
http://vis.cse.ust.hk/groups/xai-vis/
