MeetonFriday - Code | Learning | Diary

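[Paper Speed-Read] Series Introduction

Preface

The Paper Speed-Read series began this April, when I started writing reading notes on papers in Chinese. I still remember that the first post was [Paper Speed-Read] ReZero is All You Need: Fast Convergence at Large Depth. Later I realized that although I had already published several posts, I had never formally introduced where the series came from xD

So this post explains what the series is and why I want to write these posts.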

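What is the Paper Speed-Read series?

The AI field sees major breakthroughs and applications every year, and if you fail to keep up you may well miss out on many opportunities. For example, anyone familiar with NLP has surely heard of well-known techniques such as word2vec from 2013, attention, which became popular from 2014, and BERT from 2018.

When BERT first came out I think I had just joined my master's lab. At the time I only knew that it thoroughly beat a whole pile of existing NLP research, but I never imagined that two years later BERT would have had such a large impact, with many BERT-based variants achieving excellent results across many fields.

So with this series I want to get up to speed on new AI techniques and research more quickly, and to force myself to read papers xD

Continue reading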

Posted by John on 2020-07-09

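[Course Notes] Course Notes Series Overview

This post collects the course notes I took in class. From it you can follow links to all the course-notes posts published so far.

Continue reading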

Posted by John on 2021-01-24

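[Paper Speed-Read] Attention is not Explanation

〖For more paper walkthroughs in Chinese, see [Paper Speed-Read] Series Introduction, which lists every post published so far!〗

A NAACL 2019 paper that uses a series of experiments to put forward a new view: the attention mechanism is not an explanation, and we cannot treat attention as an XAI technique. This is a rather unusual position. Since attention took off in 2016, a lot of research has been built on top of it, and now this paper comes along and challenges attention's interpretability.

...So can we no longer use attention for XAI (Explainable AI)? How sad.

Don't despair too soon, though. In the same year another paper appeared, "Attention is not not Explanation" at EMNLP 2019. The title alone makes it clear that it sets out to oppose this one, so reading this paper first and then seeing how it gets rebutted should be rather fun.

As an aside, I'm on the side of Attention is not not Explanation, and I hope that paper thoroughly refutes this one xD

Continue reading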

Posted by John on 2020-06-09

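Notes on Renting in Hsinchu

No-suicide declaration

This post shares what I went through looking for a place to rent in Hsinchu. All I can say is that, compared with Kaohsiung, I find the rental environment in Hsinchu really unfriendly...

What follows is my real experience, and much of it can be corroborated with a quick search online. I am not trying to smear anyone or cut off anyone's income. I just feel that information in the Hsinchu rental market is far too asymmetric, so I am only sharing my own house-hunting experience in the hope that it helps others who will be looking for a place in Hsinchu.

Want a viewing? Contact us a week beforehand, thanks

Because of work, it was fairly certain that I would stay in Hsinchu after graduating (near the Science Park). Since preparing for my thesis defense and moving would both take time, I planned to rent both my current place and the new one during June. So my initial plan was:

Start looking casually at the end of April: join some Hsinchu rental groups on Facebook and browse 591 for listings I like. If I find one, contact the landlord directly to book a viewing and ask whether the lease could start in June.

But when I began contacting some landlords at the end of April (they were not actually landlords at all, which I'll get to later), they all first asked when I planned to move in, and then told me to come for a viewing in mid-May or even at the end of May.

Okay, fine. After hearing that, I more or less understood how things work in Hsinchu: you only get to view a place shortly before the lease would start (roughly one to two weeks ahead).

Continue reading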

Posted by John on 2020-06-05

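[Paper Speed-Read] End-to-end object detection with Transformers

〖For more paper walkthroughs in Chinese, see [Paper Speed-Read] Series Introduction, which lists every post published so far!〗

DETR, short for DEtection TRansformer, is the first time FB has applied the transformer from NLP to object detection in CV, and it does not even need NMS.
FB is amazing. Nothing can stop attention now...

Paper: https://arxiv.org/pdf/2005.12872.pdf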

Abstract

We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task.

The paper treats object detection as a direct set prediction problem, and it streamlines away several extra steps common in object detection (non-maximum suppression, anchor generation).

The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.

This passage says the key points of the architecture are:

A set-based global loss that forces unique predictions via bipartite matching, that is, a global loss computed through bipartite matching
A transformer encoder-decoder architecture
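
To make the bipartite-matching idea concrete, here is a minimal sketch in pure Python. It is not DETR's implementation: the function name `match_predictions`, the toy cost (absolute difference between 1-D "box centers"), and the brute-force search over permutations are all assumptions made for illustration. Practical implementations use the Hungarian algorithm and a richer cost that combines class and box terms.

```python
from itertools import permutations

def match_predictions(preds, targets):
    """Return (total_cost, assignment), where assignment[i] is the index of
    the prediction matched to targets[i]. Each prediction is used at most
    once. Assumes len(preds) >= len(targets)."""
    best_cost, best_assignment = float("inf"), None
    # Try every way of assigning distinct predictions to the targets.
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(abs(preds[p] - t) for p, t in zip(perm, targets))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return best_cost, best_assignment

# Toy 1-D "box centers": three predictions, two ground-truth objects.
preds = [0.9, 0.1, 0.5]
targets = [0.0, 1.0]
cost, assignment = match_predictions(preds, targets)
print(assignment)  # (1, 0): target 0.0 <- prediction 0.1, target 1.0 <- prediction 0.9
```

Because each prediction can be matched to only one target, two predictions that land on the same object cannot both be counted as correct, which is what pushes the model toward unique predictions.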

Continue reading

Posted by John on 2020-06-04

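[PyTorch] Tips for using zero_grad() and backward()

Preface

Today let's talk about how gradient updates are written in PyTorch. Anyone familiar with PyTorch will know that the standard opening pattern for training a PyTorch model looks roughly like this: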

for idx, (batch_x, batch_y) in enumerate(data_loader):
    output = model(batch_x)
    loss = criterion(output, batch_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

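This code looks simple, but it actually does the following:

Passes the data into the model for forward propagation
Computes the loss
Clears the gradients from the previous iteration
Runs back propagation from the loss to compute the gradients
Performs a gradient descent step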
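
The "clear the previous gradients" step exists because PyTorch adds each newly computed gradient into the stored `.grad` rather than overwriting it. The sketch below mimics that behaviour in plain Python for a one-parameter model `y = w * x`. `ToyParam` and its `backward`, `zero_grad`, and `step` methods are hypothetical stand-ins written for this illustration, not PyTorch APIs.

```python
class ToyParam:
    """A single scalar weight w for the model y = w * x (toy stand-in, not PyTorch)."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

    def backward(self, x, y):
        # d/dw of the squared error (w*x - y)^2 is 2 * (w*x - y) * x.
        # Like PyTorch, accumulate into grad instead of overwriting it.
        self.grad += 2 * (self.value * x - y) * x

    def zero_grad(self):
        self.grad = 0.0

    def step(self, lr):
        # Plain gradient descent update.
        self.value -= lr * self.grad

w = ToyParam(1.0)
w.backward(x=1.0, y=3.0)   # gradient for sample 1: 2 * (1 - 3) * 1 = -4
w.backward(x=2.0, y=6.0)   # gradient for sample 2: 2 * (2 - 6) * 2 = -16
accumulated = w.grad       # -20.0: the two gradients were summed

w.zero_grad()
w.backward(x=2.0, y=6.0)
fresh = w.grad             # -16.0: only the latest gradient remains
print(accumulated, fresh)
```

Without the `zero_grad()` call, every update would use the sum of all gradients seen so far. The same accumulation can also be used on purpose, for example to sum gradients over several small batches before a single `step()`.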

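Hand-coding forward/backward passes in pure numpy is exhausting, so deep learning frameworks such as PyTorch and TensorFlow do it all for you, for which I am very grateful.

This post mainly discusses a few odds and ends around the pattern above. Before reading on, you might think about these questions:

Why does PyTorch make you clear the gradients manually (optimizer.zero_grad())? Couldn't that step be automated?
Likewise, why must the gradients be computed manually (loss.backward())? Couldn't the corresponding gradients be computed automatically after each forward pass?

Continue reading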

Posted by John on 2020-05-28

[Paper Speed-Read] Attention-based LSTM for Aspect-level Sentiment Classification

〖For more paper walkthroughs in Chinese, see [Paper Speed-Read] Series Introduction, which lists every post published so far!〗

Paper: https://www.aclweb.org/anthology/D16-1058.pdf

Abstract

Aspect-level sentiment classification is a fine-grained task in sentiment analysis.

This paper uses an attention mechanism to strengthen an LSTM for aspect-level sentiment classification (which I tentatively render in Chinese as 觀點層級的情感分析).

In this paper, we reveal that the sentiment polarity of a sentence is not only determined by the content but is also highly related to the concerned aspect. For instance, “The appetizers are ok, but the service is slow.”, for aspect taste, the polarity is positive while for service, the polarity is negative.

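The authors find that the sentiment of a sentence depends not only on its content but also on the aspect you look at it from. This is easy to grasp with the example "The appetizers are ok, but the service is slow.": from the food aspect the sentence is positive, but from the service aspect it is negative.

So they want attention to focus on different parts of the sentence when different aspects are given as input.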
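
As a rough illustration of that idea, the sketch below scores each word against an aspect vector and turns the scores into attention weights with a softmax. The 2-D word and aspect vectors are made-up numbers, and the plain dot-product scoring is a simplification chosen for brevity, not the scoring function from the paper.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aspect_attention(word_vecs, aspect_vec):
    """Attention weights over the words, given one aspect vector."""
    scores = [sum(w * a for w, a in zip(vec, aspect_vec)) for vec in word_vecs]
    return softmax(scores)

# Hypothetical embeddings: dimension 0 ~ "food", dimension 1 ~ "service".
words = ["appetizers", "ok", "service", "slow"]
word_vecs = [[2.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 1.0]]

food_weights = aspect_attention(word_vecs, [1.0, 0.0])
service_weights = aspect_attention(word_vecs, [0.0, 1.0])

print(words[food_weights.index(max(food_weights))])        # appetizers
print(words[service_weights.index(max(service_weights))])  # service
```

The same sentence yields different weight distributions depending on which aspect vector is supplied, so a downstream classifier sees a different summary of the sentence for each aspect.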

Attention is amazing.

Continue reading

Posted by John on 2020-05-25

[Paper Speed-Read] Hierarchical Attention Networks for Document Classification

〖For more paper walkthroughs in Chinese, see [Paper Speed-Read] Series Introduction, which lists every post published so far!〗

Paper: https://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf

Abstract

We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics:
(i) it has a hierarchical structure that mirrors the hierarchical structure of documents;
(ii) it has two levels of attention mechanisms applied at the word and sentence-level, enabling it to attend differentially to more and less important content when constructing the document representation.

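The paper proposes a model based on a hierarchical attention architecture for document classification. The abstract highlights two characteristics:

It uses a hierarchical structure that mirrors the structure of a document
It applies attention mechanisms at both the word level and the sentence level, so that more important content receives more weight when the representation is built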
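
The two-level structure can be sketched as follows: attention over the words of each sentence produces one vector per sentence, then attention over those sentence vectors produces the document vector. This is a simplified sketch in plain Python. The context vectors `word_ctx` and `sent_ctx`, the toy embeddings, and the dot-product scoring are assumptions for illustration, and the sequence encoders used in the paper are omitted.

```python
import math

def attend(vectors, context):
    """Softmax-weighted sum of `vectors`, scored by dot product with `context`."""
    scores = [sum(v * c for v, c in zip(vec, context)) for vec in vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights

def encode_document(doc, word_ctx, sent_ctx):
    # Word level: each sentence (a list of word vectors) becomes one vector.
    sentence_vecs = [attend(sentence, word_ctx)[0] for sentence in doc]
    # Sentence level: the sentence vectors become one document vector.
    return attend(sentence_vecs, sent_ctx)

# A toy document: two sentences, 2-D word vectors (made-up numbers).
doc = [
    [[1.0, 0.0], [0.0, 1.0]],              # sentence 0
    [[3.0, 0.0], [0.0, 0.0], [1.0, 1.0]],  # sentence 1
]
doc_vec, sentence_weights = encode_document(doc, word_ctx=[1.0, 0.0], sent_ctx=[1.0, 0.0])
print(len(doc_vec), sentence_weights)
```

The returned `sentence_weights` show how much each sentence contributed to the document vector. Here sentence 1 receives the larger weight because its pooled vector aligns more strongly with `sent_ctx`.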

Continue reading

Posted by John on 2020-05-19
