
Explainable AI and Visual Analytics (III)

Huamin Qu
Hong Kong University of Science and Technology

“Strategy 2: Developing Effective Methods for AI-Human Collaboration

Better visualization and user interfaces are additional areas that need much greater development to help humans understand large-volume modern datasets and information coming from a variety of sources.”


VisLab’s work on VIS for AI
• Non-DL: iForest (IEEE VAST’18), EmbeddingVis (IEEE VAST’18)
• DL, open the black box (XAI): RNNVis (IEEE VAST’17, Model Understanding), DeepTracker (ACM TIST’18, Model Debugging), CNN Comparator (VADL’17, Model Diagnosis)
• DL, treated as a black box: RuleMatrix (IEEE VAST’18, Model Trust)
• AutoML: ATMSeer (ACM CHI’19, Algorithm Transparency)
• InteractiveML: ProtoSteer (ACM KDD’19, IEEE VIS’19, Model Refinement)

HKUST VISLAB
Outline

• Motivations

• Vis for XAI


§ Model diagnosis: iForest
§ Model understanding: RNNVis
§ Model training: DeepTracker
§ Model trust: RuleMatrix

• Vis for AutoML


§ ATMSeer: Algorithm Transparency

• Conclusions and Future Work


Explainable AI

The concept of XAI. DARPA, Explainable AI Project 2017

What role is visualization playing in XAI?

DARPA, Explainable AI Project 2017

iForest: Interpreting Random Forests via Visual Analytics
IEEE Visual Analytics Science and Technology (VAST) 2018

Xun Zhao, Yanhong Wu, Dik Lee, Weiwei Cui


Background – Decision Tree
Motivation – Random Forest
“Random Forests are A+ predictors on performance but rate an F on interpretability.”
— L. Breiman, “Statistical Modeling: The Two Cultures”
iForest

Interpret random forest models and predictions


Decision Path View

a. For a data item, the Decision Path Projection provides an overview of decision path similarities.
b. The Feature Summary shows the summarized feature ranges for multiple selected decision paths.
c. The Decision Path Flow encodes the detailed structures and feature ranges in a layer-wise manner.
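The decision paths that iForest summarizes can be extracted directly from a trained random forest. A minimal scikit-learn sketch (not the iForest implementation; the dataset and all names here are illustrative):

```python
# Sketch: extracting the decision paths a random forest uses for one sample.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

sample = X[:1]
for i, tree in enumerate(forest.estimators_):
    node_indicator = tree.decision_path(sample)   # sparse (1, n_nodes) indicator
    path = node_indicator.indices                 # node ids along the root-to-leaf path
    feats = tree.tree_.feature[path]              # feature tested at each node
    thresh = tree.tree_.threshold[path]           # split threshold at each node
    # The leaf has no test, so skip the last node when printing conditions.
    conds = [f"X{f} <= {t:.2f}" if sample[0, f] <= t else f"X{f} > {t:.2f}"
             for f, t in zip(feats[:-1], thresh[:-1])]
    print(f"tree {i}: " + " AND ".join(conds))
```

Comparing these per-tree condition lists is exactly what becomes unmanageable in text form and motivates the visual summary.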
Titanic Usage Scenario – Decision Path View
[Legend: Positive / Negative]
input → ? → output

What has the RNN learned from data?
What has the RNN learned from data?
A. map the value of a single hidden unit on data (Karpathy A. et al., 2015)

A unit sensitive to position in a line.

A lot more units have no clear meanings.

Our Solution: RNNVis


https://www.youtube.com/watch?v=0QFDNLdQ6_w
RNNVis

Our Solution
Explaining individual hidden units
Bi-graph and co-clustering
Sequence evaluation

RNNVis
Solution: explaining an individual hidden unit using its most salient words

[Box plot of unit #36’s response: bands for the 25%–75% and 9%–91% ranges; top 4 positive/negative salient words of unit 36 in an RNN (GRU) trained on Yelp review data]
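The response measure behind these slides can be sketched with synthetic data (illustrative only: RNNVis derives the before/after hidden states from a real model reading a real corpus):

```python
# Sketch: a unit's "response" to a word occurrence is the change it
# contributes to the hidden state when that word is read.
import numpy as np

def unit_responses(hidden_before, hidden_after):
    """Per-unit response to one word occurrence: h_t - h_{t-1}."""
    return hidden_after - hidden_before

rng = np.random.default_rng(0)
# 100 occurrences of the same word; 600-unit state before/after reading it
h_prev = rng.normal(size=(100, 600))
h_next = h_prev + 0.1 * rng.normal(size=(100, 600))

resp = unit_responses(h_prev, h_next)          # (100, 600)
mean_resp = resp.mean(axis=0)                  # per-unit mean response
salient = np.argsort(-np.abs(mean_resp))[:4]   # the 4 most responsive units
print(salient)
```

Sorting units by mean response is what produces the reordered distribution plot; the box-plot bands summarize the spread of `resp` per unit.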
What Has the RNN Learned from Data?
Solution: explaining an individual hidden unit using its most salient words

[Figure: distribution of the model’s response given the word “he”; units reordered according to the mean (an LSTM with 600 units); bands for the mean, 25%–75%, and 9%–91% ranges; highly responsive hidden units highlighted]
What Has the RNN Learned from Data?
Solution: explaining an individual hidden unit using its most salient words

Problem: investigating one unit/word at a time is too much user burden!
Solution: an overview for easier exploration.
What Has RNN Learned from Data?

Solution
Explaining individual hidden units
Bi-graph and co-clustering
Sequence evaluation

What Has the RNN Learned from Data?
Evaluation: what has an RNN learned from the data?

[Figure: bipartite graph between hidden units and words (“good”, “nice”, “by”, “bad”, “worst”); edge color encodes the sign of the average weight, edge width its scale. RNNVis: Ming et al. 2017]
[Figure: hidden units and words (“he”, “she”, “by”, “can”, “may”) grouped into hidden-unit clusters (memory chips) and word clusters (word clouds)]
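The joint grouping of units and words above is a co-clustering of the unit-by-word response matrix. A sketch with scikit-learn's spectral co-clustering on random data (just to show the shapes; this is a stand-in, not the paper's algorithm):

```python
# Sketch: co-cluster hidden units (rows) and words (columns) from a
# unit-by-word response-strength matrix.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
# rows = 60 hidden units, columns = 40 words; entries = response strength
response = np.abs(rng.normal(size=(60, 40))) + 1e-6  # nonnegative for the method

model = SpectralCoclustering(n_clusters=5, random_state=0).fit(response)
unit_clusters = model.row_labels_     # one cluster id per hidden unit
word_clusters = model.column_labels_  # one cluster id per word
print(unit_clusters.shape, word_clusters.shape)
```

Each unit cluster becomes a "memory chip" and each word cluster a word cloud in the visualization.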
DeepTracker: Visualizing the Training Process of
Convolutional Neural Networks
ACM Transactions on Intelligent Systems and Technology (TIST), 2019

Dongyu Liu, Weiwei Cui, Kai Jin, Yuxiao Guo, Huamin Qu


Background
Basic Knowledge
• CNNs: trained parameters & validation labels (true label: dog)
• Quantities that need to be monitored:
§ loss function, train/validation error rates, weight update ratio, and weight/gradient/activation distributions
• Rules of thumb:
§ The loss and error rate should decrease over time; a consistent increase or violent fluctuation of the loss may indicate a problem.
§ A big gap between the training and validation error rates suggests the model is over-fitting.
§ The absence of any gap may indicate the model has limited learning capability.
§ The update ratio is expected to be around 1e-3 (a lower ratio suggests the learning rate is too low; a higher one, too high).
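The update-ratio rule of thumb can be checked with a few lines of NumPy (framework-agnostic sketch; the arrays and names are illustrative):

```python
# Sketch: the weight update ratio, one of the quantities monitored above.
import numpy as np

def update_ratio(weights, updates):
    """Ratio of update magnitude to weight magnitude (~1e-3 is healthy)."""
    return np.linalg.norm(updates) / np.linalg.norm(weights)

rng = np.random.default_rng(0)
w = rng.normal(size=1000)      # one layer's weights
grad = rng.normal(size=1000)   # its gradient at the current step
lr = 1e-3

ratio = update_ratio(w, lr * grad)
print(f"update ratio: {ratio:.1e}")
```

A dashboard like DeepTracker tracks this ratio per layer per iteration, so a drifting learning rate shows up as the ratio leaving the ~1e-3 band.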
Our methods
Validation view

• The validation view shows how the performance of validation classes evolves over time
• Each class is depicted as a color stripe
• Classes with similar evolving patterns are placed close together
Our methods
Validation view
• The validation view shows how the performance of validation classes evolves over time.
• A rule-based anomaly detection algorithm detects anomalous iterations.
• Image-level performance exploration with pixel charts (see d and e).
• Patterns:
§ Anomalous iterations for certain classes indicate the model tries to jump out of a local optimum.
§ After the anomalous iteration in e, the ‘mushroom’ class performs much better.
§ Non-red mushrooms are misclassified throughout the whole training.
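A toy stand-in for the rule-based detector (not the paper's algorithm): flag iterations where a class's error rate jumps by more than a threshold.

```python
# Sketch: flag iterations where a class's validation error rate changes
# abruptly between consecutive checkpoints.
def anomaly_iterations(error_rates, jump=0.2):
    """error_rates: per-iteration error rates for one class."""
    return [i for i in range(1, len(error_rates))
            if abs(error_rates[i] - error_rates[i - 1]) > jump]

# Toy trace: a sudden improvement at iteration 2, a spike at 4, recovery at 5.
errs = [0.9, 0.85, 0.4, 0.38, 0.75, 0.35, 0.33]
print(anomaly_iterations(errs))  # -> [2, 4, 5]
```

In the validation view, such flagged iterations are what the analyst drills into with the pixel charts.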
Our methods
Layer view

• Layer view shows how the model parameters evolve over


time.
• In the form of hierarchical small multiples, consistent with the very
deep CNN structures
• Support multiple types of charts, e.g., line charts, horizon graphs,
and box plots
• CNN structures are also visualized and linked to the charts.
Our methods
Correlation view
• Correlation views explore the
relationships between the
changes of neuron weights
and model performance
• A grid-style visualization, where
rows and columns represent
layers and image classes,
respectively.
• Blue rectangles in cell_ij are the
detected filters of layer_i that
are highly related to the
performance change of class_j.
• A novel set partition algorithm
to reduce the visual clutter
Our methods
Correlation view

• Three views, i.e., validation


view (top), layer view (front),
and correlation view (right),
can be stitched as a cube.
• A novel way to explore the
complex relationships among
various types of heterogeneous
time-series data (e.g., image
classification results, neuron
weights, and training iterations).
Other Work

DGMTracker (Liu et al. VAST 2017) Seq2Seq-Vis (Strobelt et al. VAST 2018)
Other Work

GanViz (Wang et al. IEEE PVIS 2017) DQNViz (Wang et al. IEEE VIS 2018)
RuleMatrix: Visualizing and Understanding
Classifiers with Rules
IEEE Visual Analytics Science and Technology (VAST), 2018
Yao Ming, Huamin Qu, Enrico Bertini
RuleMatrix

Decision Rule List


Antecedent Consequent
X IF X1 < 4 AND 9 < X2 AND X3 = 3 THEN Probability = 0.8

X ELSE IF X1 < 4 AND X3 = 1 THEN Probability = 0.2

√ ELSE IF X2 > 9 AND X4 = 0 THEN Probability = 0.1

...
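A decision rule list evaluates top-down: the first rule whose antecedent matches determines the output. A minimal sketch (not the RuleMatrix code; the rules mirror the example above, features indexed X1..X4):

```python
# Sketch: first-match evaluation of a decision rule list.
def eval_rule_list(rules, default, x):
    """rules: list of (predicate, probability); the first match wins."""
    for predicate, prob in rules:
        if predicate(x):
            return prob
    return default

# Hypothetical rules mirroring the example (x[0]=X1, x[1]=X2, x[2]=X3, x[3]=X4).
rules = [
    (lambda x: x[0] < 4 and x[1] > 9 and x[2] == 3, 0.8),
    (lambda x: x[0] < 4 and x[2] == 1,              0.2),
    (lambda x: x[1] > 9 and x[3] == 0,              0.1),
]

print(eval_rule_list(rules, 0.5, [3, 10, 3, 0]))  # first rule fires -> 0.8
print(eval_rule_list(rules, 0.5, [5, 10, 2, 0]))  # third rule fires -> 0.1
```

The X/√ marks on the slide show exactly this: which rules fail and which one fires for a given input.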
RuleMatrix

Model-agnostic pipeline: induct a Bayesian Rule List from the model, then visualize it as a RuleMatrix.

The problems of a rule list in text form:
- Features are not aligned, which makes visual comparison/search difficult
- Important information about each rule (support, fidelity, etc.) cannot be viewed
- Poor scalability to long lists / large feature sets

Solution: the RuleMatrix visualization!

A simple rule list in text form:
IF X1 < 4 AND 9 < X2 THEN Prob = 0.8
ELSE IF X3 = 1 THEN Prob = 0.2
ELSE IF X4 = 0 THEN Prob = 0.1
...

An induced Bayesian Rule List:
IF (X1 in (178.67, inf)) THEN prob: [0.0152, 0.9848]
ELSE IF (X5 in (39.376, inf)) and (X6 in (1.0258, 2.0217)) THEN prob: [0.0784, 0.9216]
ELSE IF (X1 in (-inf, 86.89)) THEN prob: [0.9932, 0.0068]
ELSE IF (X5 in (-inf, 23.632)) and (X7 in (-inf, 25.426)) THEN prob: [0.9850, 0.0150]
ELSE IF (X4 in (203.8, inf)) and (X7 in (-inf, 25.426)) THEN prob: [0.8426, 0.1574]
ELSE IF (X1 in (137.52, 155.06)) and (X5 in (39.376, inf)) THEN prob: [0.0149, 0.9851]
ELSE IF (X1 in (155.06, 178.67)) THEN prob: [0.0675, 0.9325]
ELSE IF (X5 in (39.376, inf)) and (X7 in (36.007, inf)) THEN prob: [0.1786, 0.8214]
ELSE IF (X1 in (86.89, 107.17)) THEN prob: [0.9750, 0.0250]
ELSE IF (X5 in (32.169, 39.376)) and (X7 in (36.007, inf)) THEN prob: [0.1835, 0.8165]
ELSE IF (X6 in (1.0258, 2.0217)) THEN prob: [0.3404, 0.6596]
ELSE IF (X4 in (-inf, 203.8)) and (X5 in (39.376, inf)) THEN prob: [0.4500, 0.5500]
ELSE IF (X5 in (-inf, 23.632)) THEN prob: [0.9515, 0.0485]
ELSE IF (X1 in (137.52, 155.06)) and (X6 in (0.3688, 1.0258)) THEN prob: [0.2250, 0.7750]
ELSE IF (X7 in (-inf, 25.426)) THEN prob: [0.9842, 0.0158]
ELSE IF (X1 in (132.04, 137.52)) and (X6 in (0.3688, 1.0258)) THEN prob: [0.3600, 0.6400]
ELSE IF (X0 in (-inf, 4.668)) and (X5 in (23.632, 28.954)) THEN prob: [0.9884, 0.0116]
ELSE IF (X1 in (137.52, 155.06)) THEN prob: [0.3077, 0.6923]
ELSE IF (X6 in (0.3688, 1.0258)) THEN prob: [0.6903, 0.3097]
ELSE DEFAULT prob: [0.9203, 0.0797]


RuleMatrix

[Panels: Data Flow | RuleMatrix | Support Info]


RuleMatrix
IF Glucose > 179 THEN Output_Prob = [0.02, 0.98]
ELSE IF BMI > 39 AND DPF > 1.0 THEN Output_Prob = [0.08, 0.92]
ELSE IF Glucose < 87 THEN Output_Prob = [0.99, 0.01]
...

• Each rule -> a row
• Each feature -> a column
• Each condition -> a small glyph + shadowed range
• Multiple conditions (AND) -> multiple glyphs
• Output probabilities -> color-coded numbers
RuleMatrix
IF Glucose > 179 THEN Output_Prob = [0.02, 0.98]
ELSE IF BMI > 39 AND DPF > 1.0 THEN Output_Prob = [0.08, 0.92]
ELSE IF Glucose < 87 THEN Output_Prob = [0.99, 0.01]
...
RuleMatrix
Data Flow

• Width of the flow: amount of data


• Color: label of the data
• Fork: data captured/not captured by a rule
RuleMatrix
Support Info
• Fidelity: the accuracy of the rule in approximating the given model
• Evidence (support): the data that supports (is captured by) the rule
• The length of the bar shows the amount of data
• The striped part marks wrong predictions

[Legend: Negative / Positive; striped = predicted by the model as Negative/Positive but wrong]
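The two quality measures above are straightforward to compute. A sketch with hypothetical names (not the RuleMatrix code):

```python
# Sketch: support (how much data a rule captures) and fidelity (how often
# the rule's output agrees with the model being approximated).
import numpy as np

def rule_support_fidelity(captured_mask, rule_pred, model_pred):
    """captured_mask: True where this rule fires first for an item.
    rule_pred: the rule's output label; model_pred: model's label per item."""
    support = int(captured_mask.sum())
    if support == 0:
        return 0, 0.0
    fidelity = float(np.mean(model_pred[captured_mask] == rule_pred))
    return support, fidelity

captured = np.array([True, True, True, False, True])
model_out = np.array([1, 1, 0, 1, 1])
print(rule_support_fidelity(captured, 1, model_out))  # (4, 0.75)
```

Support sets each bar's length in the view; the disagreeing fraction (1 − fidelity) is the striped part.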


Case - Understand the Model

• Dataset: Pima Female Diabetes

• Features: Glucose, Age, BMI, DPF, Pregnancy...

• Labels: Negative (Healthy), Positive (Diabetes)

• Model: 2-layer Neural Network (20, 20)

• Accuracy: 79% (Train), 73% (Test)

[Legend: Negative / Positive; striped = predicted by the model as Negative/Positive but wrong]
Case - Understand the Model
Young age AND Low BMI -> Negative

Case - Understand the Model
Young age AND Low BMI -> Negative
Filters

-> Almost all Negative!

Case 2 - Understand the Errors of the Model
Older Age AND High BMI AND Medium Glucose -> ? Filters

Acc: 79% -> 57%!


The model finds it hard to predict this subset!
Case 2 - Understand the Errors of the Model
Oversample the erroneous subset of data (sampling rate: 2.0)

Results (10 runs, test set):

           Before Sampling   After Sampling
Mean Acc:  72.4%             76.3%
Min Acc:   70.2%             74.0%
Max Acc:   74.0%             79.2%
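The fix tried above duplicates rows of the hard subset in the training data before retraining. A sketch (illustrative names, not the paper's code):

```python
# Sketch: oversample an erroneous subset by appending extra copies of its
# rows (rate 2.0 means the subset's weight roughly doubles).
import numpy as np

def oversample(X, y, hard_mask, rate=2.0, seed=0):
    """Append (rate - 1) * |subset| resampled copies of the flagged rows."""
    extra = int((rate - 1.0) * hard_mask.sum())
    idx = np.random.default_rng(seed).choice(np.flatnonzero(hard_mask), size=extra)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
hard = np.array([False, False, True, True, False, False, False, True, False, False])

X2, y2 = oversample(X, y, hard, rate=2.0)
print(X2.shape)  # (13, 2): 10 originals + 3 duplicates
```

Retraining on `(X2, y2)` is what produced the "after sampling" accuracies in the table.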


Demo of RuleMatrix
Web Demo (http://bit.ly/rulematrix-demo)

Jupyter Notebook (http://bit.ly/rm-note)


ATMSeer:
Increasing Transparency and Controllability in
Automated Machine Learning

Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu,
Micah J. Smith, Kalyan Veeramachaneni, Huamin Qu

Motivation

Which algorithm? Support Vector Machine? Neural Network? Random Forest? K Nearest Neighbor? Linear Regression?

Which hyperparameters? Kernel Function = ? Learning Rate = ? Activation = ? Hidden Layer = ? Max Depth = ? Min Samples Leaf = ? Min Samples Split = ? Leaf Size = ?
Motivation

Make it automated
Motivation
Has it run long enough? Has it sufficiently explored the search space? Has it missed some suitable models?

[Diagram: Prior Knowledge → Automated Machine Learning → Results, Patterns, Insights]
Motivation
Has it run long enough? Has it sufficiently explored the search space? Has it missed some suitable models?

[Diagram: Prior Knowledge → (Controllability) → Automated Machine Learning → (Transparency) → Results, Patterns, Insights]
ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning

[Diagram: Prior Knowledge → Automated Machine Learning → Results, Patterns, Insights]

Transparency: analyze the searched models
Controllability: modify the search space
Designing ATMSeer

Transparency: What needs to be seen?


Controllability: What needs to be controlled?
Designing ATMSeer
A workflow of using AutoML:

Start → D1. Modify search space?
§ Yes: modify the range of 1. Algorithms, 2. Hyperpartitions, 3. Hyperparameters
→ Run AutoML Process → D2. Adjust budget?
§ Yes: return to the AutoML process with more budget
→ D3. Reason about model choice?
§ Yes: compare the top k models on 1. performance stability and 2. model complexity, then choose one model
→ Use the model
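The search that the workflow drives can be caricatured in a few lines: sample an algorithm, sample its hyperparameters, evaluate, keep the best within a budget. (Illustrative only; ATM/ATMSeer use smarter samplers and a much larger search space.)

```python
# Toy sketch of an AutoML search loop over algorithms and hyperparameters.
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
search_space = {
    KNeighborsClassifier: {"n_neighbors": range(1, 15)},
    DecisionTreeClassifier: {"max_depth": range(1, 10)},
}

random.seed(0)
best = (0.0, None)
for _ in range(20):  # computational budget: 20 trials
    algo = random.choice(list(search_space))
    params = {k: random.choice(list(v)) for k, v in search_space[algo].items()}
    score = cross_val_score(algo(**params), X, y, cv=3).mean()
    best = max(best, (score, f"{algo.__name__} {params}"))
print(best)
```

ATMSeer exposes exactly the knobs this loop hides: which algorithms are sampled, which hyperpartitions, how much budget remains, and how the trial scores are distributed.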
Designing ATMSeer
D1. Modify search space?
• Have domain-specific preferences
• Have prior knowledge
• …

Control at three levels: algorithm level, hyperpartition level, hyperparameter level
Designing ATMSeer
D2. Adjust computational budget?
• Unsatisfying results
• Potential to improve
• Low coverage
• …

[Chart: computational budget vs. model performance]
Designing ATMSeer
D3. Reason about / analyze the model choice?
• Unfamiliar with the model
• Models with similar performances

[Chart: performance, robustness, score → trust]
Usage Scenarios
Other Findings
• Different suitable hyperparameters for different datasets
• Same dataset, different suitable algorithms
Limitations and Future Work
Limitations:
• Scalability
• Generalization
• Validation

Future Work:
• Model bias
• Explainability by analogy
• …
Thank You!
Contact:
Huamin Qu
huamin@cse.ust.hk

More Info:
http://vis.cse.ust.hk/groups/xai-vis/
