
On the Derivations of GANs

Maximum Likelihood Estimation

  1. Sample \(\{x^1,x^2,\ldots,x^m\}\) from \(P_{data}(x)\)

  2. Compute the likelihood of each \(x^i\) under \(P_G(x;\theta)\):

    \[L=\prod_{i=1}^mP_G(x^i;\theta) \]

  3. Find the \(\theta^*\) that maximizes this likelihood (a toy sketch follows below)
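As a toy illustration of these three steps (the Gaussian model family, the sample size, and the use of scipy are my own assumptions, not part of the original derivation):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.5, size=1000)  # step 1: samples from P_data

def neg_log_likelihood(theta):
    # step 2: log-likelihood of the samples under P_G(x; theta), theta = (mu, log sigma)
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(samples, loc=mu, scale=np.exp(log_sigma)))

# step 3: theta* maximizes the likelihood, i.e. minimizes the negative log-likelihood
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)  # should be close to (2.0, 1.5)
```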

From MLE to KL Divergence

\[\begin{aligned} \theta^*&=arg\,\max_{\theta}\prod_{i=1}^m{P_G(x^i;\theta)}\\ &=arg\,\max_{\theta}\log\prod_{i=1}^m{P_G(x^i;\theta)}\\ &=arg\,\max_{\theta}\sum_{i=1}^m\log{P_G(x^i;\theta)}\\ &\approx arg\,\max_{\theta}E_{x\sim P_{data}}\log{P_G(x;\theta)}\quad\text{(for large }m\text{)}\\ &=arg\,\max_{\theta}\int_{x}P_{data}(x)\log{P_G(x;\theta)}dx\\ &=arg\,\max_{\theta}\int_{x}P_{data}(x)\log{P_G(x;\theta)}dx-\int_{x}P_{data}(x)\log{P_{data}(x)}dx\\ \\ &\text{(the second term does not depend on }\theta\text{, so it does not affect the optimal }\theta\text{)}\\ \\ &=arg\,\max_{\theta}\int_{x}P_{data}(x)\log\frac {P_G(x;\theta)}{P_{data}(x)}dx\\ &=arg\,\min_{\theta}KLD(P_{data}||P_G)\\ \end{aligned}\]

  • However, in high-dimensional space \(P_{data}\) typically concentrates on a low-dimensional manifold; the prior family used by maximum likelihood estimation (e.g., a Gaussian) cannot fit this low-dimensional manifold no matter what value \(\theta\) takes

  • GAN instead uses a generator to map vectors drawn from a prior distribution \(Z\) into the target distribution

  • Using a mapping network to first map \(Z\) to an intermediate distribution \(W\) and then to the target distribution improves results (see the StyleGAN paper); a minimal sketch of this pipeline follows
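A loose illustration of the last two bullets in PyTorch (this is an assumed toy architecture, not the one from the StyleGAN paper; all layer sizes are made up for the example):

```python
import torch
import torch.nn as nn

# Illustrative dimensions; not taken from any particular paper.
z_dim, w_dim, x_dim = 64, 64, 784

# Mapping network: prior distribution Z -> intermediate distribution W.
mapping = nn.Sequential(
    nn.Linear(z_dim, w_dim), nn.ReLU(),
    nn.Linear(w_dim, w_dim),
)

# Generator: intermediate code w -> sample intended to match P_data.
generator = nn.Sequential(
    nn.Linear(w_dim, 256), nn.ReLU(),
    nn.Linear(256, x_dim), nn.Tanh(),
)

z = torch.randn(16, z_dim)   # vectors sampled from the prior Z
w = mapping(z)               # mapped to the intermediate distribution W
x_fake = generator(w)        # pushed forward toward the target distribution
```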

Why GAN Minimizes JS Divergence

The original GAN loss:

\[LOSS=\min_{G}\max_{D}V(D,G) \]

\[\begin{aligned} V&=E_{x\sim P_{data}}[\log D(x)]+E_{x\sim P_{G}}[\log (1-D(x))]\\ &=\int_{x}P_{data}(x)\log{D(x)}dx+\int_{x}P_{G}(x)\log{(1-D(x))}dx\\ &=\int_{x}\left[P_{data}(x)\log{D(x)}+P_{G}(x)\log{(1-D(x))}\right]dx\\ \end{aligned}\]

Each \(x\) can be treated independently, so for a single \(x\) we can find the pointwise optimal \(D^*\). Let \(a=P_{data}(x)\), \(b=P_{G}(x)\), \(D=D(x)\).

Find \(D^*\) by maximizing:

\[f(D)=a\log(D)+b\log(1-D) \]

\[\frac{\mathrm{d}f(D)}{\mathrm{d}D}=\frac{a}{D}-\frac{b}{1-D} \]

Setting \(\frac{\mathrm{d}f(D)}{\mathrm{d}D}=0\) (a maximum, since \(f(D)\) is concave in \(D\)) gives

\[D^*=\frac{a}{a+b}\ \Rightarrow\ D^*=\frac{P_{data}(x)}{P_{data}(x)+P_{G}(x)} \]

Substituting \(D^*\) back into \(V\):

\[\begin{aligned} V&=\int_xP_{data}(x)\log{\frac{P_{data}(x)}{P_{data}(x)+P_{G}(x)}}dx+\int_xP_{G}(x)\log{\frac{P_{G}(x)}{P_{data}(x)+P_{G}(x)}}dx\\ &=-2\log2+\int_xP_{data}(x)\log{\cfrac{P_{data}(x)}{\cfrac{P_{data}(x)+P_{G}(x)}{2}}}dx+\int_xP_{G}(x)\log{\cfrac{P_{G}(x)}{\cfrac{P_{data}(x)+P_{G}(x)}{2}}}dx\\ &=-2\log2+KLD(P_{data}||\frac{P_{data}+P_{G}}{2})+KLD(P_{G}||\frac{P_{data}+P_{G}}{2})\\ &=-2\log2+2JSD(P_{data}||P_{G}) \end{aligned}\]
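The last two steps use \(JSD(P||Q)=\frac{1}{2}KLD(P||M)+\frac{1}{2}KLD(Q||M)\) with \(M=\frac{P+Q}{2}\). A quick numerical sanity check of the identity \(V(D^*)=-2\log2+2JSD\), using two arbitrary made-up discrete distributions as stand-ins:

```python
import numpy as np

p_data = np.array([0.2, 0.5, 0.3])   # arbitrary stand-in for P_data
p_g    = np.array([0.4, 0.4, 0.2])   # arbitrary stand-in for P_G

def kld(p, q):
    # KL divergence between discrete distributions (natural log).
    return np.sum(p * np.log(p / q))

m = (p_data + p_g) / 2
jsd = 0.5 * kld(p_data, m) + 0.5 * kld(p_g, m)

d_star = p_data / (p_data + p_g)     # optimal discriminator D*
v = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))

print(v, -2 * np.log(2) + 2 * jsd)   # the two values agree
```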

f-divergence

Definition of the f-divergence:

\[D_f(P||Q)=\int_xq(x)f(\frac{p(x)}{q(x)})dx \]

When \(f\) is convex and \(f(1)=0\), \(D_f\) is non-negative\(^*\), and \(D_f=0\) when \(P=Q\).

\[^*:\ D_f(P||Q)\ge f\left(\int_xq(x)\frac{p(x)}{q(x)}dx\right)=f(1)=0\quad\text{(Jensen's inequality)} \]
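For example (a standard fact, not from the original post), choosing \(f(u)=u\log u\), which is convex with \(f(1)=0\), recovers the KL divergence:

\[D_f(P||Q)=\int_xq(x)\frac{p(x)}{q(x)}\log\frac{p(x)}{q(x)}dx=\int_xp(x)\log\frac{p(x)}{q(x)}dx=KLD(P||Q) \]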

Fenchel Conjugate

Every convex function \(f\) has a corresponding conjugate convex function \(f^*\):

\[f^*(t)=\max_{x\in dom(f)}\{xt-f(x)\} \]

Intuitively, \(f^*(t)\) is the least upper bound of \(xt-f(x)\) over \(x\).

To compute \(f^*(t)\): treat \(t\) as a constant, differentiate \(xt-f(x)\) with respect to \(x\), solve for \(x\) in terms of \(t\) at the point where the derivative is zero, and substitute back to eliminate \(x\).
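As a worked example (my own illustration), take \(f(x)=x\log x\). Setting the derivative of \(xt-x\log x\) with respect to \(x\) to zero gives \(t-\log x-1=0\), i.e. \(x=e^{t-1}\); substituting back:

\[f^*(t)=te^{t-1}-e^{t-1}\log e^{t-1}=te^{t-1}-(t-1)e^{t-1}=e^{t-1} \]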

By the same duality, \(f\) can be recovered from \(f^*\):

\[f(x)=\max_{t\in dom(f^*)}\{xt-f^*(t)\} \]

Therefore

\[D_f(P||Q)=\int_xq(x)\left(\max_{t\in dom(f^*)}\left\{\frac{p(x)}{q(x)}t-f^*(t)\right\}\right)dx \]

Now let \(t=D(x)\), where \(D\) is a function mapping each input \(x\) to a value of \(t\); since \(D(x)\) need not attain the maximum for every \(x\), this gives a lower bound on \(D_f\):

\[\begin{aligned} D_f(P||Q)&\ge\int_xq(x)\left(\frac{p(x)}{q(x)}D(x)-f^*(D(x))\right)dx\\ &=\int_xp(x)D(x)dx-\int_xq(x)f^*(D(x))dx \end{aligned} \]

Find a \(D(x)\) that maximizes this lower bound, and use that maximized lower bound to approximate \(D_f\):

\[\begin{aligned} D_f&\approx \max_D\int_xp(x)D(x)dx-\int_xq(x)f^*(D(x))dx\\ &=\max_D\{E_{x\sim P}[D(x)]-E_{x\sim Q}[f^*(D(x))]\}\\ &=\max_D\{E_{x\sim P_{data}}[D(x)]-E_{x\sim P_{G}}[f^*(D(x))]\} \end{aligned}\]

At this point, substituting different functions for \(f^*\) lets the GAN optimize different divergences; a minimal sketch of this objective follows.
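A PyTorch sketch of the resulting objective, assuming \(f(x)=x\log x\) so that \(f^*(t)=e^{t-1}\) (the discriminator architecture and the sample sources are placeholder assumptions):

```python
import torch
import torch.nn as nn

def f_star(t):
    # Conjugate of f(x) = x log x, i.e. the KL-divergence case.
    return torch.exp(t - 1)

# Placeholder discriminator D(x); any network producing a scalar per sample works.
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

x_real = torch.randn(128, 2) + 2.0   # stand-in for samples from P_data
x_fake = torch.randn(128, 2)         # stand-in for samples from P_G

# max_D  E_{x~P_data}[D(x)] - E_{x~P_G}[f*(D(x))]
objective = D(x_real).mean() - f_star(D(x_fake)).mean()
(-objective).backward()              # ascend on the objective w.r.t. D's parameters
```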
