# [論文速速讀]CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features

Posted by John on 2020-04-25

〖For more paper walkthroughs, see the [論文速速讀] series introduction for a list of all published posts!〗

## Abstract

…current methods for regional dropout removes informative pixels on training images by overlaying a patch of either black pixels or random noise.
Such removal is not desirable because it leads to information loss and inefficiency during training.

• Previous regional dropout techniques overlay a black patch or random noise on the training image to make the model more robust
• But this approach causes information loss and reduces training efficiency

We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches.

• Cut out a patch and fill it in with a patch from another image

## Introduction

In particular, to prevent a CNN from focusing too much on a small set of intermediate activations or on a small region on input images, random feature removal regularizations have been proposed. Examples include dropout [33] for randomly dropping hidden activations and regional dropout [2, 49, 32, 7] for erasing random regions on the input. Researchers have shown that the feature removal strategies improve generalization and localization by letting a model attend not only to the most discriminative parts of objects, but rather to the entire object region [32, 7].

While regional dropout strategies have shown improvements of classification and localization performances to a certain degree, deleted regions are usually zeroed-out [2, 32] or filled with random noise [49], greatly reducing the proportion of informative pixels on training images

• Given that, why not fill the dropped-out region with content from another image? The model still has informative pixels to learn from, while keeping the robustness benefit of regional dropout

### Comparison with other methods

• Cutout simply pastes a black patch, so informative pixels are wasted (information loss)
• Mixup blends two whole images by linear interpolation, but the blended image looks unnatural
• CutMix fills only the patch region with content from another image, fixing both drawbacks above
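Written out (using the paper's notation, where $M$ is the binary region mask defined in the CutMix section and $\lambda$ the mixing ratio), the difference between the two is:

$$\text{Mixup:}\quad \tilde{x} = \lambda x_A + (1-\lambda)\,x_B$$
$$\text{CutMix:}\quad \tilde{x} = M \odot x_A + (1-M) \odot x_B$$

Mixup interpolates every pixel, while CutMix replaces a spatial region wholesale.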

Related topics: regional dropout, data augmentation

## CutMix

### Algorithm

Let $x\in \mathbb{R}^{W\times H\times C}$ be an image and $y$ its label. CutMix combines two samples $(x_A, y_A)$ and $(x_B, y_B)$ into a new sample $(\tilde{x}, \tilde{y})$ via the following formula:

$$\tilde{x} = M \odot x_A + (1-M) \odot x_B$$
$$\tilde{y} = \lambda y_A + (1-\lambda)\,y_B$$

• $M\in \{0,1\}^{W\times H}$ is a binary mask: 0 inside the bounding box, 1 elsewhere
• $\odot$ is element-wise multiplication
• the combination ratio $\lambda$ is sampled from a beta distribution $\mathrm{Beta}(\alpha, \alpha)$
• sampling $\lambda$ this way follows the original Mixup paper

• The bounding-box region is cropped from B and pasted over the same region of A
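The mixing formula can be sketched in a few lines of NumPy (a minimal illustration with made-up shapes and a fixed box, not the official implementation):

```python
import numpy as np

W, H, C = 8, 8, 3
x_a = np.ones((W, H, C))             # image A (all ones, for illustration)
x_b = np.zeros((W, H, C))            # image B (all zeros, for illustration)

# Binary mask M: 0 inside the bounding box, 1 elsewhere
M = np.ones((W, H))
M[2:6, 2:6] = 0                      # a fixed 4x4 box, for illustration

# CutMix: x~ = M ⊙ x_A + (1 - M) ⊙ x_B
x_tilde = M[..., None] * x_a + (1.0 - M[..., None]) * x_b

# lambda is the proportion of the image that still comes from A
lam = M.mean()                       # here: 1 - 16/64 = 0.75
```

Inside the box, `x_tilde` takes B's pixels; everywhere else it keeps A's, and the label weight $\lambda$ is exactly the kept-area proportion.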

source code: clovaai/CutMix-PyTorch

• Determining the bounding box: the box center $(r_x, r_y)$ is sampled uniformly over the image, and the box size is set to $r_w = W\sqrt{1-\lambda}$, $r_h = H\sqrt{1-\lambda}$, so the cropped area covers a $1-\lambda$ fraction of the image

• In the implementation, a minibatch is shuffled and each image is combined with its counterpart B in the shuffled batch, with a single $\lambda$ and bounding box shared across the whole batch

• Note that "mixing the labels" in practice means computing the cross-entropy loss against each label separately and taking the $\lambda$-weighted average: $\lambda\,\mathrm{CE}(p, y_A) + (1-\lambda)\,\mathrm{CE}(p, y_B)$
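The batch-level procedure above can be sketched in NumPy (a sketch modeled on the clovaai/CutMix-PyTorch repo; `rand_bbox` and `cutmix_batch` are illustrative names, and the real code operates on PyTorch tensors):

```python
import numpy as np

def rand_bbox(W, H, lam, rng):
    """Sample a box covering roughly a (1 - lam) fraction of the image area."""
    cut_rat = np.sqrt(1.0 - lam)            # r_w = W*sqrt(1-lam), r_h = H*sqrt(1-lam)
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)
    cx, cy = int(rng.integers(W)), int(rng.integers(H))  # uniform box center
    x1 = np.clip(cx - cut_w // 2, 0, W)
    x2 = np.clip(cx + cut_w // 2, 0, W)
    y1 = np.clip(cy - cut_h // 2, 0, H)
    y2 = np.clip(cy + cut_h // 2, 0, H)
    return x1, y1, x2, y2

def cutmix_batch(x, alpha, rng):
    """Mix a batch of shape (N, W, H, C) with a shuffled copy of itself.

    One lambda and one box are shared by the whole batch."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))          # pairs image i with image perm[i] as B
    x1, y1, x2, y2 = rand_bbox(x.shape[1], x.shape[2], lam, rng)
    mixed = x.copy()
    mixed[:, x1:x2, y1:y2] = x[perm][:, x1:x2, y1:y2]
    # re-derive lambda from the actual pasted area (the box may be clipped)
    lam = 1.0 - (x2 - x1) * (y2 - y1) / (x.shape[1] * x.shape[2])
    return mixed, perm, lam

# The training loss is then the weighted cross-entropy:
#   loss = lam * CE(pred, y) + (1 - lam) * CE(pred, y[perm])
```

Re-deriving `lam` from the clipped box area keeps the label weights consistent with the pixels actually pasted, which is what the official code does as well.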

## Discussion

• Mixup blends the two images everywhere, so the model cannot tell which regions it should attend to
• Cutout works reasonably when the image contains two classes: occluding one class forces the model to distinguish the remaining one, but the occluded class can no longer be attended to effectively
• CutMix solves both problems: each region is an unaltered crop from a single image, and both classes stay visible and supervised
