[Deep Learning] Gradient Computation (Matrix and Vector Derivatives)

xushunsdu 2021-08-14 10:45

0. Shapes of derivatives among scalars, vectors, and matrices

Derivatives of scalars, vectors, and matrices (shapes). Each row is a numerator, each column a denominator:

  scalar x (1,)  vector $\textbf x$ (n,1)  matrix $\textbf X$ (n,k)

scalar y (1,)  $\frac{\partial y}{\partial x}$ (1,)  $\frac{\partial y}{\partial\textbf x}$ (1,n)  $\frac{\partial y}{\partial\textbf X}$ (k,n)

vector $\textbf y$ (m,1)  $\frac{\partial\textbf y}{\partial x}$ (m,1)  $\frac{\partial\textbf y}{\partial\textbf x}$ (m,n)  $\frac{\partial\textbf y}{\partial\textbf X}$ (m,k,n)

matrix $\textbf Y$ (m,l)  $\frac{\partial\textbf Y}{\partial x}$ (m,l)  $\frac{\partial\textbf Y}{\partial\textbf x}$ (m,l,n)  $\frac{\partial\textbf Y}{\partial\textbf X}$ (m,l,k,n)

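PS: Column vectors and the numerator layout are assumed throughout (the numerator keeps its shape, the denominator is transposed).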
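The shape table above follows a single rule under the numerator layout: keep the numerator's axes, then append the denominator's axes in reversed (transposed) order. The sketch below encodes that rule; the function name `derivative_shape` and the axis-tuple encoding are hypothetical, introduced only for illustration.

```python
def derivative_shape(num_axes, den_axes):
    """Shape of d(numerator)/d(denominator) under the numerator layout.

    Axes are given as tuples: () for a scalar, ("m",) for a column vector,
    ("m", "l") for a matrix. This encoding is an illustrative assumption.
    """
    num_axes, den_axes = tuple(num_axes), tuple(den_axes)
    # Numerator axes are kept; denominator axes are transposed (reversed).
    core = num_axes + den_axes[::-1]
    if len(core) == 0:
        return (1,)  # scalar by scalar
    if len(core) == 1:
        # Pad to a column (m, 1) or a row (1, n), as the table writes them.
        return (core + (1,)) if num_axes else ((1,) + core)
    return core


print(derivative_shape((), ("n",)))              # scalar by vector
print(derivative_shape(("m",), ("n", "k")))      # vector by matrix
print(derivative_shape(("m", "l"), ("n", "k")))  # matrix by matrix
```

Calling it on each of the nine numerator and denominator combinations should reproduce the nine shapes listed in the table.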

1. Derivative of a scalar with respect to a vector $\frac{\partial y}{\partial\textbf x}$

  $\textbf x\left ( n,1 \right )= \begin{bmatrix}
x_{1}\\
x_{2}\\
\vdots\\
x_{n}
\end{bmatrix}$, where y is a scalar,

  $\frac{\partial y}{\partial\textbf x}\left ( 1,n \right )= \begin{bmatrix}
\frac{\partial y}{\partial x_{1}} & \frac{\partial y}{\partial x_{2}} & \cdots & \frac{\partial y}{\partial x_{n}}
\end{bmatrix}$

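  PS: Differentiating a scalar with respect to a column vector yields a row vector: the scalar is differentiated with respect to each element of the vector.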
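
  As a quick worked example (the function is an illustrative choice, not from the original post), take $y=\textbf x^{T}\textbf x=\sum_{i=1}^{n}x_{i}^{2}$. Each partial derivative is $\frac{\partial y}{\partial x_{i}}=2x_{i}$, so under the numerator layout

  $\frac{\partial y}{\partial\textbf x}\left ( 1,n \right )= \begin{bmatrix}
2x_{1} & 2x_{2} & \cdots & 2x_{n}
\end{bmatrix}=2\textbf x^{T}$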

2. Derivative of a vector with respect to a scalar $\frac{\partial\textbf y}{\partial x}$

  x is a scalar, $\textbf y\left ( m,1 \right )= \begin{bmatrix}
y_{1}\\
y_{2}\\
\vdots\\
y_{m}
\end{bmatrix}$,

  $\frac{\partial\textbf y}{\partial x}\left ( m,1 \right )= \begin{bmatrix}
\frac{\partial y_{1}}{\partial x}\\
\frac{\partial y_{2}}{\partial x}\\
\vdots\\
\frac{\partial y_{m}}{\partial x}
\end{bmatrix}$

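  PS: Differentiating a vector with respect to a scalar keeps the vector's shape: each element of the vector is differentiated with respect to the scalar.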
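
  For instance (an illustrative function, not from the original post), with m = 3 and $\textbf y=\begin{bmatrix}
x\\
x^{2}\\
x^{3}
\end{bmatrix}$, differentiating element by element gives

  $\frac{\partial\textbf y}{\partial x}\left ( 3,1 \right )= \begin{bmatrix}
1\\
2x\\
3x^{2}
\end{bmatrix}$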

3. Derivative of a vector with respect to a vector $\frac{\partial\textbf y}{\partial\textbf x}$

  $\textbf x\left ( n,1 \right )= \begin{bmatrix}
x_{1}\\
x_{2}\\
\vdots\\
x_{n}
\end{bmatrix}$, $\textbf y\left ( m,1 \right )= \begin{bmatrix}
y_{1}\\
y_{2}\\
\vdots\\
y_{m}
\end{bmatrix}$,

  $\frac{\partial\textbf y}{\partial\textbf x}\left ( m,n \right )=
\begin{bmatrix}
\frac{\partial y_{1}}{\partial\textbf x}\\
\frac{\partial y_{2}}{\partial\textbf x}\\
\vdots\\
\frac{\partial y_{m}}{\partial\textbf x}
\end{bmatrix} =
\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\
\frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \cdots & \frac{\partial y_{2}}{\partial x_{n}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{m}}{\partial x_{1}} & \frac{\partial y_{m}}{\partial x_{2}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{bmatrix}$

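  PS: Differentiating a vector with respect to a vector yields a matrix (the Jacobian); think of it as a column of scalars, each differentiated with respect to the vector.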
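
  The (m,n) shape can be checked numerically. The sketch below is not from the original post: it assumes the linear map y = A x as a test function and uses central finite differences, and the helper names `matvec` and `jacobian` are hypothetical. For this function the Jacobian should equal A itself.

```python
def matvec(A, x):
    """Return A @ x for a nested-list matrix A (m, n) and a list x of length n."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def jacobian(f, x, eps=1e-6):
    """Central-difference estimate of dy/dx in numerator layout, shape (m, n)."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(m):
            # Row i holds the derivatives of y_i; column j varies x_j.
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return J


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]     # (m, n) = (2, 3)
x = [0.5, -1.0, 2.0]      # n = 3
J = jacobian(lambda v: matvec(A, v), x)
print(len(J), len(J[0]))  # number of rows m, number of columns n
```

  Row i of `J` is the row vector of section 1 for the scalar $y_{i}$, which is exactly the stacking described above.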

4. Derivative of a scalar with respect to a matrix $\frac{\partial y}{\partial\textbf X}$

  $\textbf X\left ( n,k \right )=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k}\\
x_{21} & x_{22} & \cdots & x_{2k}\\
\vdots & \vdots & \ddots & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}$, where y is a scalar,

  $\frac{\partial y}{\partial\textbf X}\left ( k,n \right )=
\begin{bmatrix}
\frac{\partial y}{\partial\textbf x_{:,1}}\\
\frac{\partial y}{\partial\textbf x_{:,2}}\\
\vdots\\
\frac{\partial y}{\partial\textbf x_{:,k}}
\end{bmatrix}=
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{n1}}\\
\frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{n2}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y}{\partial x_{1k}} & \frac{\partial y}{\partial x_{2k}} & \cdots & \frac{\partial y}{\partial x_{nk}}
\end{bmatrix}$

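  PS: Differentiating a scalar with respect to a matrix yields a matrix of the transposed shape; think of it as differentiating the scalar with respect to each of the k column vectors, where each resulting row vector becomes one row of the result.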
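
  A worked example may help here; the function and the vectors $\textbf a$, $\textbf b$ are illustrative choices, not from the original post. Let $\textbf a$ be (n,1), $\textbf b$ be (k,1), and $y=\textbf a^{T}\textbf X\textbf b=\sum_{i=1}^{n}\sum_{j=1}^{k}a_{i}x_{ij}b_{j}$, so that $\frac{\partial y}{\partial x_{ij}}=a_{i}b_{j}$. Under the numerator layout the entry in row j, column i is $\frac{\partial y}{\partial x_{ij}}$, hence

  $\frac{\partial y}{\partial\textbf X}\left ( k,n \right )=
\begin{bmatrix}
a_{1}b_{1} & a_{2}b_{1} & \cdots & a_{n}b_{1}\\
a_{1}b_{2} & a_{2}b_{2} & \cdots & a_{n}b_{2}\\
\vdots & \vdots & \ddots & \vdots\\
a_{1}b_{k} & a_{2}b_{k} & \cdots & a_{n}b_{k}
\end{bmatrix}=\textbf b\textbf a^{T}$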

5. Derivative of a matrix with respect to a scalar $\frac{\partial\textbf Y}{\partial x}$

  x is a scalar, $\textbf Y\left ( m,l \right )=
\begin{bmatrix}
y_{11} & y_{12} & \cdots & y_{1l}\\
y_{21} & y_{22} & \cdots & y_{2l}\\
\vdots & \vdots & \ddots & \vdots\\
y_{m1} & y_{m2} & \cdots & y_{ml}
\end{bmatrix}$,

  $\frac{\partial\textbf Y}{\partial x}\left ( m,l \right )=
\begin{bmatrix}
\frac{\partial\textbf y_{:,1}}{\partial x} & \frac{\partial\textbf y_{:,2}}{\partial x} & \cdots & \frac{\partial\textbf y_{:,l}}{\partial x}
\end{bmatrix}=
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1l}}{\partial x}\\
\frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2l}}{\partial x}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{ml}}{\partial x}
\end{bmatrix}$

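  PS: Differentiating a matrix with respect to a scalar keeps the matrix's shape; think of it as differentiating each of the l column vectors with respect to the scalar.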
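
  For instance (an illustrative function, not from the original post), take the 2×2 rotation matrix $\textbf Y=\begin{bmatrix}
\cos x & -\sin x\\
\sin x & \cos x
\end{bmatrix}$. Differentiating each entry with respect to x gives

  $\frac{\partial\textbf Y}{\partial x}\left ( 2,2 \right )=
\begin{bmatrix}
-\sin x & -\cos x\\
\cos x & -\sin x
\end{bmatrix}$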

6. Derivative of a vector with respect to a matrix $\frac{\partial\textbf y}{\partial\textbf X}$

  $\textbf X\left ( n,k \right )=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k}\\
x_{21} & x_{22} & \cdots & x_{2k}\\
\vdots & \vdots & \ddots & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}$, $\textbf y\left ( m,1 \right )= \begin{bmatrix}
y_{1}\\
y_{2}\\
\vdots\\
y_{m}
\end{bmatrix}$,

  $\frac{\partial\textbf y}{\partial\textbf X}\left ( m,k,n \right )=
\begin{bmatrix}\frac{\partial y_{1}}{\partial\textbf X}\end{bmatrix},\begin{bmatrix}\frac{\partial y_{2}}{\partial\textbf X}\end{bmatrix},\cdots,\begin{bmatrix}\frac{\partial y_{m}}{\partial\textbf X}\end{bmatrix}=\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{11}} & \frac{\partial y_{1}}{\partial x_{21}} & \cdots & \frac{\partial y_{1}}{\partial x_{n1}}\\
\frac{\partial y_{1}}{\partial x_{12}} & \frac{\partial y_{1}}{\partial x_{22}} & \cdots & \frac{\partial y_{1}}{\partial x_{n2}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{1}}{\partial x_{1k}} & \frac{\partial y_{1}}{\partial x_{2k}} & \cdots & \frac{\partial y_{1}}{\partial x_{nk}}
\end{bmatrix},\begin{bmatrix}
\frac{\partial y_{2}}{\partial x_{11}} & \frac{\partial y_{2}}{\partial x_{21}} & \cdots & \frac{\partial y_{2}}{\partial x_{n1}}\\
\frac{\partial y_{2}}{\partial x_{12}} & \frac{\partial y_{2}}{\partial x_{22}} & \cdots & \frac{\partial y_{2}}{\partial x_{n2}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{2}}{\partial x_{1k}} & \frac{\partial y_{2}}{\partial x_{2k}} & \cdots & \frac{\partial y_{2}}{\partial x_{nk}}
\end{bmatrix},\cdots,\begin{bmatrix}
\frac{\partial y_{m}}{\partial x_{11}} & \frac{\partial y_{m}}{\partial x_{21}} & \cdots & \frac{\partial y_{m}}{\partial x_{n1}}\\
\frac{\partial y_{m}}{\partial x_{12}} & \frac{\partial y_{m}}{\partial x_{22}} & \cdots & \frac{\partial y_{m}}{\partial x_{n2}}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{m}}{\partial x_{1k}} & \frac{\partial y_{m}}{\partial x_{2k}} & \cdots & \frac{\partial y_{m}}{\partial x_{nk}}
\end{bmatrix}$

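  PS: Differentiating a vector with respect to a matrix yields a 3-dimensional array; think of it as differentiating each element of y (a scalar) with respect to the matrix, so the result is a stack of m matrices of shape k×n.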
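
  To make the (m,k,n) shape concrete, here is a minimal numerical sketch. It is not from the original post: it assumes the test function $\textbf y=\textbf X^{T}\textbf a$ (so m = k), uses central finite differences, and the helper names `xt_times_a` and `vector_by_matrix` are hypothetical. For this function, $\frac{\partial y_{i}}{\partial x_{pq}}$ equals $a_{p}$ when i = q and 0 otherwise.

```python
def xt_times_a(X, a):
    """y = X^T a, where X is (n, k) as nested lists and a has length n."""
    n, k = len(X), len(X[0])
    return [sum(X[p][i] * a[p] for p in range(n)) for i in range(k)]


def vector_by_matrix(func, X, eps=1e-6):
    """Central-difference estimate of dy/dX with D[i][q][p] = d y_i / d x_pq.

    The result has shape (m, k, n): one transposed (k, n) slice per y_i.
    """
    n, k = len(X), len(X[0])
    m = len(func(X))
    D = [[[0.0] * n for _ in range(k)] for _ in range(m)]
    for p in range(n):
        for q in range(k):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[p][q] += eps
            X_minus[p][q] -= eps
            y_plus, y_minus = func(X_plus), func(X_minus)
            for i in range(m):
                D[i][q][p] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return D


a = [1.0, 2.0, 3.0]                       # n = 3
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # (n, k) = (3, 2), so m = k = 2
D = vector_by_matrix(lambda M: xt_times_a(M, a), X)
print(len(D), len(D[0]), len(D[0][0]))    # m, k, n
```

  Each slice `D[i]` is the (k,n) matrix of section 4 for the scalar $y_{i}$.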

7. Derivative of a matrix with respect to a vector $\frac{\partial\textbf Y}{\partial\textbf x}$

  $\textbf x\left ( n,1 \right )=\begin{bmatrix}
x_{1}\\
x_{2}\\
\vdots\\
x_{n}
\end{bmatrix}$, $\textbf Y\left ( m,l \right )=
\begin{bmatrix}
y_{11} & y_{12} & \cdots & y_{1l}\\
y_{21} & y_{22} & \cdots & y_{2l}\\
\vdots & \vdots & \ddots & \vdots\\
y_{m1} & y_{m2} & \cdots & y_{ml}
\end{bmatrix}$,

  $\frac{\partial\textbf Y}{\partial\textbf x}\left ( m,l,n \right )=\begin{bmatrix}
\frac{\partial y_{11}}{\partial\textbf x} & \frac{\partial y_{12}}{\partial\textbf x} & \cdots & \frac{\partial y_{1l}}{\partial\textbf x}\\
\frac{\partial y_{21}}{\partial\textbf x} & \frac{\partial y_{22}}{\partial\textbf x} & \cdots & \frac{\partial y_{2l}}{\partial\textbf x}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{m1}}{\partial\textbf x} & \frac{\partial y_{m2}}{\partial\textbf x} & \cdots & \frac{\partial y_{ml}}{\partial\textbf x}
\end{bmatrix}$

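  PS: Differentiating a matrix with respect to a vector yields a 3-dimensional array; I have not fully worked this case out yet. Each entry $\frac{\partial y_{ij}}{\partial\textbf x}$ is a (1,n) row vector as in section 1, so arranging them over the (m,l) grid gives the shape (m,l,n).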
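
  A small worked example may make the 3-dimensional result less abstract; the function is an illustrative choice, not from the original post. Take n = 2 and $\textbf Y=\textbf x\textbf x^{T}=\begin{bmatrix}
x_{1}^{2} & x_{1}x_{2}\\
x_{1}x_{2} & x_{2}^{2}
\end{bmatrix}$. Each entry of Y is a scalar, and its derivative with respect to $\textbf x$ is a (1,2) row vector, so

  $\frac{\partial\textbf Y}{\partial\textbf x}\left ( 2,2,2 \right )=\begin{bmatrix}
\begin{bmatrix} 2x_{1} & 0 \end{bmatrix} & \begin{bmatrix} x_{2} & x_{1} \end{bmatrix}\\
\begin{bmatrix} x_{2} & x_{1} \end{bmatrix} & \begin{bmatrix} 0 & 2x_{2} \end{bmatrix}
\end{bmatrix}$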

8. Derivative of a matrix with respect to a matrix $\frac{\partial\textbf Y}{\partial\textbf X}$

  $\textbf X\left ( n,k \right )=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k}\\
x_{21} & x_{22} & \cdots & x_{2k}\\
\vdots & \vdots & \ddots & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}$, $\textbf Y\left ( m,l \right )=
\begin{bmatrix}
y_{11} & y_{12} & \cdots & y_{1l}\\
y_{21} & y_{22} & \cdots & y_{2l}\\
\vdots & \vdots & \ddots & \vdots\\
y_{m1} & y_{m2} & \cdots & y_{ml}
\end{bmatrix}$.

  PS: To be written once I understand this case.
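
  Until then, the shape table in section 0 gives (m,l,k,n) for this case, which can at least be checked numerically. The sketch below is not from the original post: it assumes the test function Y = A X (so l = k), reads the 4-dimensional result as an (m,l) grid of the (k,n) blocks from section 4, and uses central finite differences. The helper names `matmul` and `matrix_by_matrix` are hypothetical.

```python
def matmul(A, X):
    """Return A @ X for nested-list matrices A (m, n) and X (n, k)."""
    return [[sum(A[i][p] * X[p][j] for p in range(len(X)))
             for j in range(len(X[0]))] for i in range(len(A))]


def matrix_by_matrix(func, X, eps=1e-6):
    """Central-difference estimate of dY/dX with D[i][j][q][p] = d y_ij / d x_pq.

    Assumed reading of the shape (m, l, k, n): an (m, l) grid of (k, n) blocks.
    """
    n, k = len(X), len(X[0])
    Y = func(X)
    m, l = len(Y), len(Y[0])
    D = [[[[0.0] * n for _ in range(k)] for _ in range(l)] for _ in range(m)]
    for p in range(n):
        for q in range(k):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[p][q] += eps
            X_minus[p][q] -= eps
            Y_plus, Y_minus = func(X_plus), func(X_minus)
            for i in range(m):
                for j in range(l):
                    D[i][j][q][p] = (Y_plus[i][j] - Y_minus[i][j]) / (2 * eps)
    return D


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]                     # (m, n) = (2, 3)
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # (n, k) = (3, 2), so Y is (2, 2)
D = matrix_by_matrix(lambda M: matmul(A, M), X)
shape = (len(D), len(D[0]), len(D[0][0]), len(D[0][0][0]))
print(shape)  # (m, l, k, n)
```

  For this particular function, $\frac{\partial y_{ij}}{\partial x_{pq}}$ equals $a_{ip}$ when j = q and 0 otherwise, which is what the finite differences should recover.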
