MeetonFriday - Code | Learning | Diary

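[Paper Speed-Read] Series Introduction

Preface

The Paper Speed-Read series began this April, when I started writing Chinese-language reading notes on papers. As I recall, the first one was [Paper Speed-Read] ReZero is All You Need: Fast Convergence at Large Depth. Several posts later, I realized I had never formally explained where this series came from xD

So this post explains what the series is and why I wanted to write it.

What is the Paper Speed-Read series?

AI sees major breakthroughs and applications every year, and falling behind can easily mean missing opportunities. If you know NLP, for example, you have certainly heard of well-known techniques such as word2vec (2013), attention (which took off around 2014), and BERT (2018).

I think I had just joined my master's lab when BERT came out. All I knew then was that it crushed a pile of existing NLP results. I never imagined that two years later it would have this much influence, with BERT variants achieving excellent results across many fields.

So with this series I want to get up to speed on new AI techniques and research faster, and force myself to read papers xD

Continue reading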

Posted by John on 2020-07-09

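[Course Notes] Course Notes Series Overview

This post collects the notes I have taken in class. From here you can reach every course-notes post published so far.

Continue reading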

Posted by John on 2021-01-24

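[Paper Speed-Read] NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE

〖For more paper walkthroughs in Chinese, see the [Paper Speed-Read] Series Introduction, which lists every post published so far!〗

Preface

Everyone knows attention is hot in NLP. This is said to be the first paper to bring the attention concept into NLP, so come pay your respects to the one that feeds us all.

paper: NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE

Authors' slides: Neural Machine Translation by Jointly Learning to Align and Translate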

ABSTRACT

In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

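Earlier neural translation models were built on the encoder-decoder architecture, where a fixed-length context vector becomes the performance bottleneck. This paper therefore proposes a dynamic context vector: the model itself searches for the parts of the input that are relevant to the word it is about to predict, which improves performance considerably.

It is also regarded as the earliest paper to propose the attention mechanism.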
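To make the "dynamic context vector" idea concrete, here is a minimal sketch of one decoding step of additive attention, assuming the paper's scoring form e_j = v · tanh(W s + U h_j). The function names, the toy annotations, and the identity-like weight matrices are illustrative assumptions, not learned parameters or the authors' code.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def additive_attention(prev_state, annotations, W, U, v):
    """Return (weights, context) for one decoder step.

    score_j = v . tanh(W @ prev_state + U @ annotation_j)
    context = sum_j weight_j * annotation_j
    """
    projected_state = matvec(W, prev_state)
    scores = []
    for h in annotations:
        projected_h = matvec(U, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [sum(w * h[d] for w, h in zip(weights, annotations)) for d in range(dim)]
    return weights, context


# Toy values (assumptions for illustration only).
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # encoder states h_1..h_3
prev_state = [0.5, -0.5]                            # previous decoder state s
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

weights, context = additive_attention(prev_state, annotations, W, U, v)
print(weights, context)
```

The context vector is recomputed at every decoding step from a fresh set of weights, which is what frees the decoder from a single fixed-length summary of the source sentence.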

INTRODUCTION

Continue reading

Posted by John on 2020-05-06

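Fixing "NVIDIA-SMI has failed because it could not communicate with the NVIDIA driver"

Preface

My master's thesis has entered the phase of frantically running experiments to fill in results, and several models take quite a long time to train, which gets a little frustrating at times.

Problem description

Yesterday (yes, yesterday) I was partway through training a model on the server's GPU when I interrupted the kernel, and the interrupt seemed to fail. From that moment Jupyter stopped responding, and then I could not SSH into the server either…

Fortunately the server still answered ping, so the whole machine probably had not blown up. But the server is on campus, so I could not continue my experiments. The next day I asked someone to reboot it, and after that I could log in again.

Logging in worked, but then something spooky happened: neither nvidia-smi nor gpustat could get any information about the GPU.

It had been fine the night before, and one reboot left it like this, which was truly baffling… The server cannot afford to break at a time like this…

Continue reading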

Posted by John on 2020-05-02

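Different averaging methods in sklearn.metrics.classification_report()

Preface

When evaluating a model, you often want to look at metrics other than accuracy.

sklearn has a handy function for quickly getting several quantitative evaluation metrics, as in the code below:

    from sklearn.metrics import classification_report
    # test_y_true / test_y_pred: ground-truth and predicted labels
    print(classification_report(test_y_true, test_y_pred, digits=3))

This produces a report (shown as an image in the original post).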

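So how should this report be read? macro and weighted are clearly different averaging methods, so the last two rows presumably average each metric in different ways. But then why does accuracy sit in the third-to-last row, with no precision or recall?

The conclusion first: the accuracy row is actually the micro avg.
Under micro avg, precision, recall, and f1-score are identical (for single-label multiclass classification).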
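The claim can be checked by hand. The sketch below uses only the standard library and made-up labels (an assumption for illustration): it pools true positives, false positives, and false negatives over all classes, which is what micro averaging means, and compares the result with plain accuracy. With scikit-learn, the same numbers should come from precision_score, recall_score, and f1_score called with average="micro".

```python
def micro_scores(y_true, y_pred):
    """Micro-averaged precision, recall, and F1: pool TP/FP/FN over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up single-label, 3-class data (illustrative only).
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 2]

precision, recall, f1 = micro_scores(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(acc, precision, recall, f1)  # 0.625 0.625 0.625 0.625
```

Every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled FP and FN totals are always equal, and all three micro scores collapse to correct / total.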

See the details below.

Continue reading

Posted by John on 2020-04-27

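[Paper Speed-Read] Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification

〖For more paper walkthroughs in Chinese, see the [Paper Speed-Read] Series Introduction, which lists every post published so far!〗

Paper URL: https://arxiv.org/pdf/2003.13048.pdf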

Abstract

However, all of them perform this operation randomly, without capturing the most important region(s) within an object. In this paper, we propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix [3]. In each training iteration, we choose the most descriptive regions based on the intermediate attention maps from a feature extractor, which enables searching for the most discriminative parts in an image.

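An evolved version of CutMix. Earlier data augmentation methods of this kind all operate randomly. Attentive CutMix takes the attention map from an intermediate layer and uses it to pick the most descriptive regions for CutMix.

Introduction

Building on CutMix, Attentive CutMix aims to find the most representative regions and use those for the replacement.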
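As a rough sketch of that selection step: given a coarse attention grid for the source image, pick the k highest-scoring cells and paste only those cells onto the target image at the same positions. The grid size, k, the hand-written attention values, and the function names are all assumptions for illustration. In the paper the attention map comes from a pretrained feature extractor, which is not modelled here.

```python
def top_k_cells(attention, k):
    """Return (row, col) positions of the k highest values in a 2D grid."""
    cells = [
        (value, r, c)
        for r, row in enumerate(attention)
        for c, value in enumerate(row)
    ]
    cells.sort(key=lambda item: item[0], reverse=True)
    return [(r, c) for _, r, c in cells[:k]]


def attentive_cutmix(source, target, attention, k):
    """Paste the k most-attended grid cells of `source` onto a copy of `target`.

    Assumes square images whose side is divisible by the grid size.
    Returns the mixed image and the fraction of pixels taken from `source`.
    """
    grid = len(attention)
    cell = len(source) // grid
    mixed = [row[:] for row in target]
    for r, c in top_k_cells(attention, k):
        for y in range(r * cell, (r + 1) * cell):
            for x in range(c * cell, (c + 1) * cell):
                mixed[y][x] = source[y][x]
    source_ratio = k / (grid * grid)
    return mixed, source_ratio


# Toy 4x4 "images" and a 2x2 attention grid (illustrative values only).
source = [[1] * 4 for _ in range(4)]
target = [[0] * 4 for _ in range(4)]
attention = [[0.1, 0.9],
             [0.3, 0.2]]

mixed, source_ratio = attentive_cutmix(source, target, attention, k=1)
for row in mixed:
    print(row)
print(source_ratio)
```

The returned ratio can then weight the two labels, in the same spirit as CutMix's area-proportional label mixing.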

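Warning: the figure below is intense. Non-combatants, please evacuate quickly.

This has completely stopped caring about the animals' feelings… Just imagine the size of the psychological shadow on that dog…

Continue reading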

Posted by John on 2020-04-26

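[Paper Speed-Read] CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

〖For more paper walkthroughs in Chinese, see the [Paper Speed-Read] Series Introduction, which lists every post published so far!〗

Paper URL: https://arxiv.org/pdf/1905.04899.pdf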

Abstract

…current methods for regional dropout removes informative pixels on training images by overlaying a patch of either black pixels or random noise.
Such removal is not desirable because it leads to information loss and inefficiency during training.

Earlier regional dropout techniques overlay a black patch or random noise on the image to make the model more robust.
This causes information loss and makes training less efficient.

We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches.

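Cut out a patch and fill the hole with the matching patch from another image; the labels are mixed in proportion to the patch area.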
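A minimal sketch of that operation on toy data, assuming images are 2D lists and labels are one-hot lists. The function name and the fixed box are illustrative assumptions. The paper samples the box randomly, which is omitted here to keep the example deterministic.

```python
def cutmix(image_a, label_a, image_b, label_b, box):
    """Paste the `box` region of image_b into image_a and mix the labels by area.

    box = (top, left, bottom, right), with bottom/right exclusive.
    """
    top, left, bottom, right = box
    height, width = len(image_a), len(image_a[0])

    mixed = [row[:] for row in image_a]
    for y in range(top, bottom):
        mixed[y][left:right] = image_b[y][left:right]

    # lam is the fraction of the result that still comes from image_a.
    patch_ratio = (bottom - top) * (right - left) / (height * width)
    lam = 1.0 - patch_ratio
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, mixed_label


# Toy 4x4 images with one-hot labels (illustrative values only).
image_a = [[0] * 4 for _ in range(4)]
image_b = [[1] * 4 for _ in range(4)]
label_a = [1.0, 0.0]
label_b = [0.0, 1.0]

mixed_image, mixed_label = cutmix(image_a, label_a, image_b, label_b, box=(0, 0, 2, 2))
for row in mixed_image:
    print(row)
print(mixed_label)  # [0.75, 0.25]
```

Unlike a black or noise patch, every pixel in the result still carries real image content, and the soft label tells the model how much of each class is present.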

Continue reading

Posted by John on 2020-04-25

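[Paper Speed-Read] Attention Is All You Need

〖For more paper walkthroughs in Chinese, see the [Paper Speed-Read] Series Introduction, which lists every post published so far!〗

Overview

paper: Attention Is All You Need

Any discussion of key NLP techniques in recent years has to mention attention. After the attention mechanism was proposed, nearly every NLP paper was re-run against the benchmarks with attention added.

This paper was not the first to propose the attention mechanism, but the whole Sesame Street family of models that followed (BERT and the like) builds on it.

For how attention developed along the way, see my earlier post [DL] Attention Mechanism Study Notes. This post mainly summarizes the key points of the paper.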

Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

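Earlier sequence transduction models were complex RNN/CNN models built around an encoder and a decoder
- The best models connected the encoder and decoder through an attention mechanism (still built on RNN/CNN)
The paper proposes the Transformer, a network architecture that uses only attention mechanisms, with no CNN/RNN
- Its architecture still follows the encoder-decoder idea; it just does not use RNN/CNN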
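The attention operation the Transformer is built from is defined in the paper as softmax(QK^T / sqrt(d_k))V, although the excerpt above does not spell it out. Below is a minimal standard-library sketch of that formula on toy vectors. The function names and values are illustrative assumptions, and multi-head projections, masking, and batching are left out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V with plain lists."""
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


# Toy self-attention over two positions (illustrative values only).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

outputs = scaled_dot_product_attention(queries, keys, values)
print(outputs)
```

Each output row depends on all positions at once through the weights, with no recurrence or convolution involved, which is the point the bullets above make.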

Continue reading

Posted by John on 2020-04-14
