Posted by John on 2020-06-15

## Preface

"In deep learning, how does the (loss) function you (designed) get updated, and is it differentiable?"

• ReLU is convex
• The L2 loss is convex

## Convex or Non-convex

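• If the function is convex, every local minimum is also a global minimum, so we can find the global minimum quickly.
• Most problems in classical machine learning are convex.
• If the function is non-convex, the model can easily end up in a local minimum rather than the global optimum. This is also why we have to use a training set and a validation set to judge whether a model is good: you do not know where on the loss surface you currently are.
• Most deep learning methods are non-convex (note the wording: most, not all).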
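To make the local-versus-global point concrete, here is a minimal sketch of plain gradient descent on a non-convex 1-D function. The function `f`, the learning rate, and the two starting points are illustrative assumptions, not anything from a real model.

```python
def f(x):
    # A hypothetical non-convex 1-D "loss" with two separate minima.
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=1000):
    # Plain gradient descent with a fixed learning rate.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_left = gradient_descent(-2.0)   # starts on the left side
x_right = gradient_descent(2.0)   # starts on the right side
print(x_left, f(x_left))
print(x_right, f(x_right))
```

The two runs settle in different minima with different loss values, so the result depends on where the search started. On a convex function both runs would reach the same minimum value.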

There are various ways to test for convexity.

One is to just plot a cross-section of the function and look at it. If it has a non-convex shape, you don’t need to write a proof; you have disproven convexity by counter-example.

If you want to do this with algebra, one way is just to take the second derivatives of a function. If the second derivative of a function in 1-D space is ever negative, the function isn’t convex.
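As a rough sketch of this 1-D test, the snippet below estimates the second derivative with central finite differences and scans a grid for a point where it is negative. The helper names, the grid, and the tolerance are assumptions chosen for illustration. Finding a negative value disproves convexity, but finding none on a finite grid does not prove it.

```python
import math


def second_derivative(f, x, h=1e-3):
    # Central finite-difference estimate of f''(x).
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def find_negative_curvature(f, xs, tol=1e-6):
    # Return the first x in xs where f'' is clearly negative, else None.
    for x in xs:
        if second_derivative(f, x) < -tol:
            return x
    return None


xs = [i / 10 for i in range(-30, 31)]  # grid on [-3, 3]
square_result = find_negative_curvature(lambda x: x * x, xs)
sin_result = find_negative_curvature(math.sin, xs)
print(square_result)  # no counter-example expected for x^2
print(sin_result)     # sin has negative curvature on (0, pi)
```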

For neural nets, you have millions of parameters, so you need a test that works in high-dimensional space. In high-dimensional space, it turns out we can take the second derivative along one specific direction in space. For a unit vector $d$ giving the direction and a Hessian matrix $H$ of second derivatives, this is given by $d^{T}Hd$.

For most neural nets and most loss functions, it’s very easy to find a point in parameter space and a direction where $d^{T}Hd$ is negative.
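Here is a small sketch of that check on a toy problem. The "network" is a hypothetical two-weight linear model $\hat{y}=w_1 w_2 x$ with squared error on a single sample ($x=1$, $y=1$), and the Hessian is approximated with finite differences; the model, the point, and the direction are all assumptions picked to keep the example tiny.

```python
import math


def loss(w):
    # Squared error of a hypothetical two-weight linear "network"
    # y_hat = w1 * w2 * x on a single sample (x = 1, y = 1).
    w1, w2 = w
    return (w1 * w2 - 1.0) ** 2


def hessian(f, w, h=1e-3):
    # Central finite-difference approximation of the Hessian of f at w.
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                v = list(w)
                v[i] += si * h
                v[j] += sj * h
                return f(v)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


def directional_curvature(H, d):
    # Computes d^T H d for a direction d.
    n = len(d)
    return sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))


w = [0.0, 0.0]                             # point in parameter space
d = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # unit direction
H = hessian(loss, w)
curvature = directional_curvature(H, d)
print(curvature)
```

At this point the curvature along $d$ comes out negative (analytically $d^{T}Hd=-2$ here), so even this two-parameter loss is not convex.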

## Is a convex function differentiable?

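2. For ReLU, the derivative at 0 is usually set to 0, but any value in [0, 1] works as well, since each of them is a valid subgradient at that point (I have also seen people use 1/2).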

$g$ is a subgradient of $f$ (convex or not) at $x$ if $f(y) \geq f(x)+g^{T}(y-x)$ for all $y$.

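1. The first-order approximation of the original function built from a subgradient never exceeds the true value.
2. A subgradient need not be unique.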
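Both properties can be checked numerically for ReLU at $x=0$. The sketch below tests whether the first-order estimate $f(x)+g(y-x)$ stays at or below $f(y)$ on a grid of test points; the helper `is_subgradient`, the grid, and the candidate values of $g$ are assumptions made for illustration.

```python
def relu(x):
    return max(0.0, x)


def is_subgradient(f, x, g, ys, tol=1e-12):
    # True if f(y) >= f(x) + g * (y - x) holds at every test point y.
    return all(f(y) >= f(x) + g * (y - x) - tol for y in ys)


ys = [i / 10 for i in range(-50, 51)]  # test points on [-5, 5]
candidates = (-0.5, 0.0, 0.5, 1.0, 1.5)
results = {g: is_subgradient(relu, 0.0, g, ys) for g in candidates}
print(results)
```

The candidates 0, 0.5, and 1 all pass, which illustrates non-uniqueness, while -0.5 and 1.5 fail because the linear estimate rises above ReLU on one side.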

• Update rule: $x^{(k)}=x^{(k-1)}-\alpha_{k} \nabla f\left(x^{(k-1)}\right)$, where $\alpha_{k}$ is the step size at iteration $k$.
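A minimal sketch of this update on a convex but non-differentiable function, with a subgradient standing in for $\nabla f$ at the kink. The objective $f(x)=|x-2|$, the starting point, and the diminishing step size $\alpha_k = 1/k$ are assumptions chosen for illustration.

```python
def f(x):
    # Convex, but not differentiable at x = 2.
    return abs(x - 2.0)


def subgradient(x):
    # One valid subgradient of f. At the kink any value in [-1, 1]
    # would do; this sketch picks 0.
    if x > 2.0:
        return 1.0
    if x < 2.0:
        return -1.0
    return 0.0


x = 0.0
best_x = x
for k in range(1, 1001):
    alpha_k = 1.0 / k                      # diminishing step size (assumed)
    x = x - alpha_k * subgradient(x)       # the update rule above
    if f(x) < f(best_x):
        best_x = x                         # track the best point seen so far

print(x, best_x)
```

Unlike gradient descent on a smooth function, an individual step here can increase $f$, which is why the sketch keeps track of the best point seen so far.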

## Summary

2. How convexity affects the search for the optimal solution
3. A convex function is not necessarily differentiable