Posted by John on 2020-06-09
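A NAACL 2019 paper that uses a series of experiments to argue for a new view: attention is not explanation, and we cannot treat attention as an XAI technique. This is a rather unusual position. Since attention took off around 2016, a great deal of research has been built on top of it, and now this paper directly challenges attention's interpretability.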

…So does this mean we can no longer use attention for XAI (Explainable AI)? QQ

Abstract

In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs.
In this work we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful “explanations” for predictions.

Although attention has been shown to improve performance, many people also tout it as providing transparency that can help explain the black box of deep learning.

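2. Different attention distributions can yield the same prediction (if attention reflects what matters to the model, why would different importance distributions give the same result?)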

The code is on GitHub: https://github.com/successar/AttentionExplanation

Introduction and Motivation

Li et al. (2016) summarized this commonly held view in NLP: “Attention provides an important way to explain the workings of neural models”. Indeed, claims that attention provides interpretability are common in the literature

Since 2016, attention has been widely regarded as a mechanism that can help explain how a model works. This consensus rests on an assumption:

Assuming attention provides a faithful explanation for model predictions, we might expect the following properties to hold.

1. Attention weights should correlate with feature importance
2. Alternative (or counterfactual) attention weight configurations ought to yield corresponding changes in prediction (and if they do not then are equally plausible as explanations).

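2. Different attention weight distributions should lead to different predictions, because the model is attending to different places. Conversely, if the prediction does not change, the alternative weights are just as plausible an explanation as the original ones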

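1. They found that the correlation between these attention weights and gradient-based measures of feature importance is very weak
2. They also constructed another distribution $\tilde{\alpha}$ (under which the important tokens become "myself" and "was") that still yields the same prediction. Swapping in yet another distribution also changes the model's output very little (a difference of only 0.006)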

We thus caution against using attention weights to highlight input tokens “responsible for” model outputs and constructing just-so stories on this basis.

Research questions and contributions

We investigate whether this holds across tasks by exploring the following empirical questions.

1. To what extent do induced attention weights correlate with measures of feature importance – specifically, those resulting from gradients and leave-one-out (LOO) methods?
2. Would alternative attention weights (and hence distinct heatmaps/“explanations”) necessarily yield different predictions?

2. Do different attention weights lead to different predictions?

1. binary text classification
3. Natural Language Inference (NLI)

Experiments

Correlation Between Attention and Feature Importance Measures

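• LOO (leave-one-out) removes a token $t$ and measures the total variation distance (TVD) between the predictions with and without it; this change is then compared against the attention weight on $t$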
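To make the LOO measure concrete, here is a minimal sketch. The toy model (`toy_predict` and its `ATT_SCORE` / `SENTIMENT` tables) is entirely made up for illustration and is not the paper's model; only the TVD and leave-one-out logic reflect the idea described above.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def tvd(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical toy model: each token has an attention score and a sentiment value.
ATT_SCORE = {"good": 2.0, "bad": 2.0, "movie": 0.5, "the": 0.0}
SENTIMENT = {"good": 1.5, "bad": -1.5, "movie": 0.0, "the": 0.0}

def toy_predict(tokens):
    """Return (attention weights, [P(neg), P(pos)]) for a list of tokens."""
    if not tokens:
        return [], [0.5, 0.5]
    alpha = softmax([ATT_SCORE.get(t, 0.0) for t in tokens])
    logit = sum(a * SENTIMENT.get(t, 0.0) for a, t in zip(alpha, tokens))
    p_pos = 1.0 / (1.0 + math.exp(-logit))
    return alpha, [1.0 - p_pos, p_pos]

def loo_importance(tokens):
    """For each position, TVD between the full prediction and the one with that token removed."""
    _, full = toy_predict(tokens)
    return [
        tvd(full, toy_predict(tokens[:i] + tokens[i + 1:])[1])
        for i in range(len(tokens))
    ]

tokens = ["the", "movie", "good"]
alpha, _ = toy_predict(tokens)
loo = loo_importance(tokens)
for tok, a, d in zip(tokens, alpha, loo):
    print(f"{tok:>6}  attention={a:.3f}  LOO-TVD={d:.3f}")
```

In this toy case the two columns happen to agree on which token matters most; the paper's question is how often that agreement holds for real models.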

Note that we disconnect the computation graph at the attention module so that the gradient does not flow through this layer: This means the gradients tell us how the prediction changes as a function of inputs, keeping the attention distribution fixed.
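A rough way to see what "disconnecting the computation graph at the attention module" means: compute the attention weights once, then treat them as constants while differentiating the output with respect to the inputs. The sketch below does this with central finite differences on a made-up scalar model (the parameters `W` and `V` and the one-dimensional "embeddings" are assumptions for illustration, not the paper's setup). In an autodiff framework the same effect is usually obtained with a stop-gradient or detach operation on the attention weights.

```python
import math

W, V = 1.2, 0.8  # hypothetical output weight and attention-scoring weight

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention(x):
    """Attention distribution computed from the inputs themselves."""
    return softmax([V * xi for xi in x])

def output(x, alpha):
    """Sigmoid of the attention-weighted sum of W * x_t."""
    logit = sum(a * W * xi for a, xi in zip(alpha, x))
    return 1.0 / (1.0 + math.exp(-logit))

def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at the list x."""
    grads = []
    for t in range(len(x)):
        up = x[:t] + [x[t] + eps] + x[t + 1:]
        down = x[:t] + [x[t] - eps] + x[t + 1:]
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

x = [0.5, -1.0, 2.0]          # made-up scalar "embeddings"
alpha_fixed = attention(x)    # computed once, then treated as a constant

# Gradient with attention held fixed (what the quoted passage describes).
grad_fixed = numeric_grad(lambda z: output(z, alpha_fixed), x)
# Gradient with attention recomputed, so the inputs also influence the weights.
grad_full = numeric_grad(lambda z: output(z, attention(z)), x)

print("fixed attention:", [round(g, 4) for g in grad_fixed])
print("full gradient:  ", [round(g, 4) for g in grad_full])
```

The two gradients differ because the second one also includes how each input shifts the attention distribution; the paper's gradient-based importance corresponds to the first.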

• BiRNN + attention
• An encoder that passes each token through a linear projection layer (followed by ReLU) + attention
3. Correlation between the LOO measure and the attention weights under the BiLSTM
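The correlations in this section are rank correlations (the discussion at the end of this post names Kendall $\tau$). As a minimal sketch of how such a score is computed, here is the simplest variant, tau-a, which ignores ties; the numbers are made up, and libraries such as SciPy provide a tie-aware version (`scipy.stats.kendalltau`).

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs. Tied pairs count as zero."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    score = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tied
    n = len(a)
    return score / (n * (n - 1) / 2)

# Made-up values for a four-token input.
attention_weights = [0.05, 0.10, 0.60, 0.25]
feature_importance = [0.02, 0.30, 0.20, 0.01]

print(kendall_tau(attention_weights, feature_importance))
```

A value near 1 means attention and the importance measure rank the tokens the same way; a value near 0 means the rankings are essentially unrelated.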

Counterfactual Attention Weights

We experiment with two means of constructing such distributions.

1. First, we simply scramble the original attention weights $\hat{\alpha}$, re-assigning each value to an arbitrary, randomly sampled index (input feature).
2. Second, we generate an adversarial attention distribution: this is a set of attention weights that is maximally distinct from $\hat{\alpha}$ but that nonetheless yields an equivalent prediction (i.e., prediction within some of $\hat{y}$)
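Both constructions can be sketched on a toy model. Everything below is an assumption for illustration: the scalar "hidden states", the tolerance `eps`, the use of Jensen-Shannon divergence to quantify "distinct", and especially the crude random search, which stands in for whatever optimization the paper actually uses.

```python
import math
import random

def predict(alpha, hidden):
    """Toy binary classifier: sigmoid of the attention-weighted sum of hidden scalars."""
    p = 1.0 / (1.0 + math.exp(-sum(a * h for a, h in zip(alpha, hidden))))
    return [1.0 - p, p]

def tvd(p, q):
    """Total variation distance between two output distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two attention distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def permuted_attention(alpha, rng):
    """Construction 1: reassign the same weights to randomly chosen positions."""
    shuffled = list(alpha)
    rng.shuffle(shuffled)
    return shuffled

def adversarial_attention(alpha, hidden, eps, rng, trials=2000):
    """Construction 2 (crude stand-in): the most JSD-distant sampled
    distribution whose output stays within eps (TVD) of the original."""
    base = predict(alpha, hidden)
    best, best_jsd = list(alpha), 0.0
    for _ in range(trials):
        raw = [rng.random() ** 4 for _ in alpha]  # skewed draws give peaked candidates
        z = sum(raw)
        if z == 0:
            continue
        cand = [r / z for r in raw]
        if tvd(base, predict(cand, hidden)) <= eps and jsd(alpha, cand) > best_jsd:
            best, best_jsd = cand, jsd(alpha, cand)
    return best

rng = random.Random(0)
hidden = [0.9, 1.1, 1.0, 0.95]   # made-up, deliberately similar values
alpha = [0.7, 0.1, 0.1, 0.1]
eps = 0.01

perm = permuted_attention(alpha, rng)
adv = adversarial_attention(alpha, hidden, eps, rng)
base = predict(alpha, hidden)
print("permuted:    output TVD =", round(tvd(base, predict(perm, hidden)), 4))
print("adversarial: JSD =", round(jsd(alpha, adv), 4),
      " output TVD =", round(tvd(base, predict(adv, hidden)), 4))
```

Because the hidden values here are nearly identical, almost any attention distribution gives the same output, which is the kind of situation in which a heatmap says little about why the model predicted what it did.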

Discussion and Conclusions

2. Different attention distributions can yield the same prediction

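3. The evaluation relies on the Kendall $\tau$ measure, which becomes noisy when there are irrelevant features and therefore gives an imprecise assessment
4. In practice the paper experiments with only a small set of attention variants (only a BiLSTM is used in the paper)
5. Although different attention distributions can produce the same prediction, the authors do not rule out that multiple explanations may coexist; that is, the model may reach the same conclusion through different combinations of attention.
6. Finally, the experiments so far cover only classification tasks; the authors leave other tasks as future work (ㆆᴗㆆ)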