〖For more Chinese paper walkthroughs, see the [論文速速讀] series introduction for a list of all published articles!〗
論文網址: https://arxiv.org/pdf/1905.04899.pdf
Abstract
…current methods for regional dropout remove informative pixels on training images by overlaying a patch of either black pixels or random noise.
Such removal is not desirable because it leads to information loss and inefficiency during training.
- Previous regional dropout techniques overlay black patches or random noise on the image to make the model more robust
- This causes information loss and reduces training efficiency
We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches.
- Cut out a patch and fill it with a patch from another training image
Introduction
In particular, to prevent a CNN from focusing too much on a small set of intermediate activations or on a small region on input images, random feature removal regularizations have been proposed. Examples include dropout [33] for randomly dropping hidden activations and regional dropout [2, 49, 32, 7] for erasing random regions on the input. Researchers have shown that the feature removal strategies improve generalization and localization by letting a model attend not only to the most discriminative parts of objects, but rather to the entire object region [32, 7].
To keep a CNN from over-focusing on a few small regions, random feature removal regularizations have been proposed, such as dropout (randomly dropping hidden activations) and regional dropout (erasing random regions of the input).
Research has shown that these feature removal strategies make the model attend to the entire object rather than only its most discriminative part, which improves both generalization and localization.
While regional dropout strategies have shown improvements of classification and localization performances to a certain degree, deleted regions are usually zeroed-out [2, 32] or filled with random noise [49], greatly reducing the proportion of informative pixels on training images
The authors argue, however, that regional dropout strategies fill the dropped region entirely with zeros or noise, which causes excessive information loss.
- In that case, why not fill the dropped region with content from another image? The model stays robust while still having informative pixels to learn from.
Comparison with other methods:
- Cutout simply applies a black patch, which causes unnecessary information loss
- Mixup blends two images by linear interpolation, but the resulting mixed image looks very unnatural
- CutMix fills only the patch region with another image, fixing both drawbacks above
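As a rough illustration (not taken from the paper's code), the three strategies can be contrasted on toy NumPy arrays; the image size and patch coordinates here are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
imgA = rng.random((32, 32, 3))  # two toy "training images"
imgB = rng.random((32, 32, 3))
lam = 0.7                       # combination ratio

# Cutout: zero out a patch -> those pixels carry no information
cutout = imgA.copy()
cutout[8:20, 8:20, :] = 0.0

# Mixup: blend the whole images -> every pixel is an unnatural mixture
mixup = lam * imgA + (1 - lam) * imgB

# CutMix: paste the same patch region from imgB into imgA
cutmix = imgA.copy()
cutmix[8:20, 8:20, :] = imgB[8:20, 8:20, :]
```

Unlike Cutout, every pixel of the CutMix result still comes from a real image, and unlike Mixup, each pixel belongs clearly to one source image.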
Related Works
Previous CNN optimization techniques include:
- regional dropout
- data augmentation

The rest of the section describes how good CutMix is; because it operates purely at the data level, it does not change the original model architecture.
CutMix
Algorithm
Let $x\in \mathbb{R}^{W\times H\times C}$ be an image and $y$ its label. CutMix combines two training samples $(x_A, y_A)$ and $(x_B, y_B)$ into a new sample $(\tilde{x}, \tilde{y})$ with the following formulas:

$\tilde{x} = M \odot x_A + (\mathbf{1} - M) \odot x_B$

$\tilde{y} = \lambda y_A + (1 - \lambda) y_B$

- $M\in \{0, 1\}^{W\times H}$ is a binary mask whose value is 0 inside the bounding box and 1 elsewhere
- $\odot$ is element-wise multiplication
- the combination ratio $\lambda$ is sampled from the beta distribution $\mathrm{Beta}(\alpha, \alpha)$
- sampling $\lambda$ this way follows the original Mixup paper
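The formulas above can be sketched in NumPy; the image size, Beta parameter $\alpha$, box coordinates, and one-hot labels below are assumed values for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, alpha = 32, 32, 1.0  # assumed image size and Beta parameter

# combination ratio, sampled as in Mixup
lam = rng.beta(alpha, alpha)

# binary mask M: 0 inside the bounding box, 1 elsewhere
# (the box is hard-coded here; the paper derives its size from lam)
M = np.ones((H, W))
M[8:20, 8:20] = 0.0

xA = rng.random((H, W, 3))
xB = rng.random((H, W, 3))

# x_tilde = M ⊙ x_A + (1 - M) ⊙ x_B
x_tilde = M[..., None] * xA + (1 - M[..., None]) * xB

# y_tilde = lam * y_A + (1 - lam) * y_B, taking lam as the
# surviving-area ratio of x_A (one-hot labels for illustration)
lam = M.mean()
yA, yB = np.array([1.0, 0.0]), np.array([0.0, 1.0])
y_tilde = lam * yA + (1 - lam) * yB
```

Pixels where $M = 0$ come entirely from $x_B$, pixels where $M = 1$ come from $x_A$, and the label is mixed with the same area ratio.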
Next, define the bounding box $B = (r_x, r_y, r_w, r_h)$:
- the region inside B is cropped from $x_B$ and pasted onto $x_A$

The paper's experiments use rectangular masks with the same aspect ratio as the original image; the box coordinates are sampled from a uniform distribution:

$r_x \sim \mathrm{Unif}(0, W),\quad r_w = W\sqrt{1-\lambda}$

$r_y \sim \mathrm{Unif}(0, H),\quad r_h = H\sqrt{1-\lambda}$

so that the cropped area ratio is $\frac{r_w r_h}{WH} = 1-\lambda$.
The implementation is simple: take a minibatch, shuffle it to obtain a second minibatch, and then generate the new labels with the formula above:
source code: clovaai/CutMix-PyTorch
Determining the bounding box:
```python
def rand_bbox(size, lam):
    W = size[2]
    H = size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)

    # uniformly sample the box center
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    # clip the box to the image boundaries
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2
```

Take a minibatch; every image in that batch is synthesized with the same bounding box B.
- Note that "mixing the labels" of the synthesized image means, in implementation, computing the cross-entropy loss against each of the two labels and taking their weighted average.
```python
for i, (input, target) in enumerate(train_loader):
```
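A minimal, self-contained sketch of that training step, following the logic of the repo's train.py; the function name `cutmix_step` and the hyperparameters `beta` and `cutmix_prob` are placeholders for this illustration:

```python
import numpy as np
import torch

def rand_bbox(size, lam):
    """Sample a bounding box whose area ratio is 1 - lam."""
    W, H = size[2], size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)
    cx, cy = np.random.randint(W), np.random.randint(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2

def cutmix_step(model, criterion, input, target, beta=1.0, cutmix_prob=0.5):
    """One forward pass with CutMix applied to the whole batch."""
    if beta > 0 and np.random.rand() < cutmix_prob:
        lam = np.random.beta(beta, beta)
        # shuffling the batch gives the "second" minibatch
        rand_index = torch.randperm(input.size(0))
        target_a, target_b = target, target[rand_index]
        bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
        # paste the same box from the shuffled images into the batch
        input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
        # adjust lam to the exact pixel ratio after clipping
        lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (input.size(-1) * input.size(-2)))
        output = model(input)
        # "mixed label" = weighted average of the two cross-entropy losses
        loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1 - lam)
    else:
        output = model(input)
        loss = criterion(output, target)
    return loss
```

With probability `cutmix_prob` the batch is trained on synthesized images; otherwise it falls back to a plain forward pass.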
Discussion
- Mixup blends the two full images, so the model cannot tell which regions it should attend to
- Cutout works fairly well when an image contains only two classes: masking one class out forces the model to discriminate using the remaining one, but the model then cannot attend effectively to the masked-out class
- CutMix is great: it solves both of the problems above