[論文速速讀]Attention-based LSTM for Aspect-level Sentiment Classification

Posted by John on 2020-05-25

〖For more Chinese-language paper walkthroughs, see the [論文速速讀] series introduction, which lists all articles published so far!〗

Paper link: https://www.aclweb.org/anthology/D16-1058.pdf

Abstract

Aspect-level sentiment classification is a fine-grained task in sentiment analysis.

This paper strengthens an LSTM with an attention mechanism for aspect-level sentiment classification, i.e. sentiment analysis at the granularity of individual aspects of a sentence.

In this paper, we reveal that the sentiment polarity of a sentence is not only determined by the content but is also highly related to the concerned aspect. For instance, “The appetizers are ok, but the service is slow.”, for aspect taste, the polarity is positive while for service, the polarity is negative.

The authors point out that a sentence's sentiment depends not only on its content but also on the aspect you look at it from. This is easy to grasp: take "The appetizers are tasty, but the service is slow." From the food aspect the sentence is positive, but from the service aspect it is negative.

So they want the model, when given different aspects as input, to attend to different parts of the sentence through attention.

Attention is amazing.

Introduction

The main contributions of our work can be summarized as follows:

  • We propose attention-based Long Short-Term memory for aspect-level sentiment classification. The models are able to attend different parts of a sentence when different aspects are concerned. Results show that the attention mechanism is effective.
  • Since aspect plays a key role in this task, we propose two ways to take into account aspect information during attention: one way is to concatenate the aspect vector into the sentence hidden representations for computing attention weights, and another way is to additionally append the aspect vector into the input word vectors.
  • Experimental results indicate that our approach can improve the performance compared with several baselines, and further examples
    demonstrate the attention mechanism works well for aspect-level sentiment classification.

Jumping ahead.

Attention-based LSTM with Aspect Embedding

Long Short-term Memory (LSTM)

Whoa... the paper re-introduces LSTM from scratch here... impressive...

LSTM with Aspect Embedding (AE-LSTM)

Aspect information is vital when classifying the polarity of one sentence given aspect. We may get opposite polarities if different aspects are considered.
To make the best use of aspect information, we propose to learn an embedding vector for each aspect.

Because the aspect largely determines a sentence's polarity, the authors propose learning an embedding vector for each aspect.

  • Note that these aspect embeddings are trained jointly with the downstream task.

$v_{\alpha_i}\in \mathbb{R}^{d_\alpha}$ is the embedding of aspect $i$, and $d_{\alpha}$ is its dimension.

$A\in \mathbb{R}^{d_{\alpha}\times|A|}$ is the matrix of all aspect embeddings.
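As a minimal sketch of what "one learnable embedding per aspect, trained with the task" looks like in practice (assuming PyTorch; all names and sizes below are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: |A| aspects, each embedded in R^{d_alpha}.
num_aspects, aspect_dim = 5, 100

# PyTorch stores the table as (|A|, d_alpha), i.e. the transpose of the paper's A.
aspect_embedding = nn.Embedding(num_aspects, aspect_dim)

# Look up v_{a_i} for aspect i; because nn.Embedding is a learnable layer,
# it gets updated by backprop together with the rest of the classifier.
aspect_idx = torch.tensor([2])        # e.g. the "service" aspect
v_a = aspect_embedding(aspect_idx)    # shape: (1, d_alpha)
```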

And then? That's it for this part; let's keep reading.

Attention-based LSTM (AT-LSTM)

They design an LSTM architecture that can attend to different parts of a sentence depending on the given aspect.

  • $H \in \mathbb{R}^{d \times N}$ is the matrix of the LSTM hidden vectors (one column per word, with $N$ the sentence length)
  • $v_\alpha$ is the embedding of the given aspect

The formulas that follow are a bit of a pain to type, so see the paper for the details; the architecture figure already gives the idea, though: concatenate a specific aspect embedding onto the LSTM hidden states, then run attention over them.

Side note: the formulas in the paper are needlessly convoluted; they pull an $e_N$ out of nowhere, and what on earth is that?

The architecture figure is honestly easier to follow, or just note the paper's identity $v_\alpha \otimes e_N= [v;v;…;v]$: it simply means every $h$ in the figure gets concatenated with $v$.
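For reference, here is my reconstruction of the attention step described in the paper (worth double-checking the exact symbols against the original):

$$
M = \tanh\left(\begin{bmatrix} W_h H \\ W_v v_\alpha \otimes e_N \end{bmatrix}\right),\qquad
\alpha = \mathrm{softmax}(w^{T} M),\qquad
r = H\alpha^{T}
$$

where $e_N \in \mathbb{R}^{N}$ is a vector of ones, so $v_\alpha \otimes e_N$ just repeats $v_\alpha$ once per word.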

After attention we get $r$, which then goes through one more transformation to produce the final representation $h^{*}$. Worth noting: this transformation is not just a plain FC layer; it looks like this:

$h^{*} = \tanh(W_p r + W_x h_N)$

The extra $W_x h_N$ term is there because they found it works better that way (*´・д・)?

Finally, $h^{*}$ goes through an FC layer and a softmax for classification.
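To make the whole flow concrete, here is a rough PyTorch sketch of the AT-LSTM attention step and final projection for a single sentence. All names and dimensions are mine, so treat it as an illustration of the idea rather than the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes: d = LSTM hidden size, d_a = aspect dim, N = sentence length.
d, d_a, N = 300, 100, 20
H = torch.randn(N, d)      # LSTM hidden states h_1..h_N (random stand-ins here)
v_a = torch.randn(d_a)     # aspect embedding
h_N = H[-1]                # last hidden state

W_h = nn.Linear(d, d, bias=False)
W_v = nn.Linear(d_a, d_a, bias=False)
w = nn.Linear(d + d_a, 1, bias=False)
W_p = nn.Linear(d, d, bias=False)
W_x = nn.Linear(d, d, bias=False)

# v_a ⊗ e_N: repeat the aspect vector once per time step, then concat with each h_i.
M = torch.tanh(torch.cat([W_h(H), W_v(v_a).expand(N, -1)], dim=-1))  # (N, d + d_a)
alpha = F.softmax(w(M).squeeze(-1), dim=0)                           # (N,) attention weights
r = alpha @ H                                                        # weighted sentence vector
h_star = torch.tanh(W_p(r) + W_x(h_N))                               # final representation h*
```

From here, $h^{*}$ would feed the FC + softmax classifier mentioned above.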

Attention-based LSTM with Aspect Embedding (ATAE-LSTM)

We just concatenated the aspect embedding at the hidden layer, so why not at the input layer too?

No reason why not, and that is how ATAE-LSTM came about! Where there's a will, anything can be attended to!
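A minimal sketch of the ATAE-LSTM input trick, again assuming PyTorch with made-up names and sizes (my reading of the idea, not the authors' code):

```python
import torch
import torch.nn as nn

# Illustrative sizes: N words, 300-d word vectors, d_a-dim aspect embedding, d hidden units.
N, word_dim, d_a, d = 20, 300, 100, 300
word_emb = torch.randn(N, word_dim)   # word vectors of one sentence
v_a = torch.randn(d_a)                # aspect embedding

# Append the aspect embedding to every word vector before the LSTM.
x = torch.cat([word_emb, v_a.expand(N, -1)], dim=-1)   # (N, word_dim + d_a)
lstm = nn.LSTM(word_dim + d_a, d)
H, _ = lstm(x.unsqueeze(1))    # (N, 1, d); attention then proceeds as in AT-LSTM
```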

Experiments

The training data must contain each aspect and its corresponding polarity, so that attention can be computed conditioned on a given aspect.
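As a toy illustration (the format here is my own, not the paper's actual data files), each training instance needs at least:

```python
# One hypothetical training instance: a sentence, the aspect being asked about,
# and the sentiment polarity for that aspect.
example = {
    "sentence": "The appetizers are ok, but the service is slow.",
    "aspect": "service",
    "polarity": "negative",
}
```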

Then you can inspect a sentence's attention scores under a given aspect.

The example here (an attention visualization in the paper) shows that the model does reasonably well in ordinary cases as well as on negation and long sentences.

