Backpropagation is the core algorithm of deep learning: it is what makes training deep neural networks feasible. This article walks through the mathematics behind backpropagation.
Neural Network Basics
A simple neural network can be written as:
- Input layer: $x = (x_1, x_2, \ldots, x_n)$
- Hidden layer: $h = \sigma(W^{(1)} x + b^{(1)})$
- Output layer: $\hat{y} = \sigma(W^{(2)} h + b^{(2)})$
where $\sigma$ is the activation function and $\hat{y}$ denotes the network's prediction.
Forward Propagation
The forward pass computes, step by step:
$$z^{(1)} = W^{(1)} x + b^{(1)}$$
$$a^{(1)} = \sigma(z^{(1)})$$
$$z^{(2)} = W^{(2)} a^{(1)} + b^{(2)}$$
$$\hat{y} = \sigma(z^{(2)})$$
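To make the shapes concrete, here is a minimal sketch of these four steps for a single input vector, with hypothetical sizes (3 inputs, 4 hidden units, 2 outputs). Note the math above uses column vectors ($Wx$); the batched implementation later in this article uses row-major inputs ($XW$) instead.

```python
import numpy as np

# Forward pass on one example; sizes here are illustrative only.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

z1 = W1 @ x + b1       # z^(1), shape (4,)
a1 = sigmoid(z1)       # a^(1)
z2 = W2 @ a1 + b2      # z^(2), shape (2,)
y_hat = sigmoid(z2)    # prediction, shape (2,)
```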
Loss Functions
For classification problems, the cross-entropy loss is commonly used:
$$L = -\sum_c y_c \log(\hat{y}_c)$$
For regression problems, mean squared error is used (shown here for a single sample, with the conventional factor of $\frac{1}{2}$):
$$L = \frac{1}{2}(y - \hat{y})^2$$
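As a quick sketch of how these two losses look in NumPy (function names here are illustrative; `cross_entropy` assumes `y` is one-hot and `y_hat` is a probability vector):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # eps guards against log(0)
    return -np.sum(y * np.log(y_hat + eps))

def mse(y, y_hat):
    return 0.5 * np.sum((y - y_hat) ** 2)

print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))  # ~0.357
print(mse(np.array([1.0]), np.array([0.8])))                          # 0.02
```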
Backpropagation
Backpropagation applies the chain rule to compute gradients layer by layer, from the output back toward the input. Suppose we want to compute $\frac{\partial L}{\partial W^{(1)}}$:
Output Layer Gradient
$$\delta^{(2)} = \frac{\partial L}{\partial z^{(2)}} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z^{(2)}}$$
For the MSE loss:
$$\delta^{(2)} = (\hat{y} - y) \cdot \sigma'(z^{(2)})$$
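This follows directly from the chain rule: for the MSE loss, $\frac{\partial L}{\partial \hat{y}} = \hat{y} - y$, and since $\hat{y} = \sigma(z^{(2)})$, $\frac{\partial \hat{y}}{\partial z^{(2)}} = \sigma'(z^{(2)})$; multiplying the two factors gives the expression above. (With a cross-entropy loss and a sigmoid or softmax output, the two factors combine into the even simpler $\delta^{(2)} = \hat{y} - y$.)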
Hidden Layer Gradient
$$\delta^{(1)} = \left((W^{(2)})^T \delta^{(2)}\right) \odot \sigma'(z^{(1)})$$
where $\odot$ denotes the element-wise (Hadamard) product.
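A useful identity for the sigmoid is
$$\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$$
so the derivative can be computed directly from the activation $a = \sigma(z)$ as $a(1 - a)$. The implementation below exploits exactly this, which is why its `sigmoid_derivative` takes the activation rather than the pre-activation as input.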
Weight Gradients
$$\frac{\partial L}{\partial W^{(2)}} = \delta^{(2)} (a^{(1)})^T$$
$$\frac{\partial L}{\partial W^{(1)}} = \delta^{(1)} x^T$$
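The bias gradients are simply the corresponding error terms themselves (summed over the batch in the batched implementation below):
$$\frac{\partial L}{\partial b^{(2)}} = \delta^{(2)}, \qquad \frac{\partial L}{\partial b^{(1)}} = \delta^{(1)}$$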
Code Implementation
```python
import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Small random weights break symmetry; biases start at zero
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, a):
        # Expects the activation a = sigmoid(z): sigma'(z) = a * (1 - a)
        return a * (1 - a)

    def forward(self, X):
        self.z1 = X @ self.W1 + self.b1
        self.a1 = self.sigmoid(self.z1)
        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2

    def backward(self, X, y, learning_rate=0.1):
        m = X.shape[0]
        # Output layer error
        delta2 = (self.a2 - y) * self.sigmoid_derivative(self.a2)
        # Hidden layer error
        delta1 = (delta2 @ self.W2.T) * self.sigmoid_derivative(self.a1)
        # Gradient-descent update, averaged over the batch
        self.W2 -= learning_rate * (self.a1.T @ delta2) / m
        self.b2 -= learning_rate * np.sum(delta2, axis=0, keepdims=True) / m
        self.W1 -= learning_rate * (X.T @ delta1) / m
        self.b1 -= learning_rate * np.sum(delta1, axis=0, keepdims=True) / m

    def train(self, X, y, epochs=10000, learning_rate=0.1):
        for i in range(epochs):
            output = self.forward(X)
            self.backward(X, y, learning_rate)
            if i % 1000 == 0:
                loss = np.mean((output - y) ** 2)
                print(f"Epoch {i}, Loss: {loss:.4f}")
```
Example: the XOR Problem
```python
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(2, 4, 1)
nn.train(X, y)
print("Predictions:", nn.forward(X).round(2))
```
Gradient Descent Variants
In practice, plain gradient descent is usually replaced by optimizers such as momentum, RMSProp, or Adam. Adam, for example, maintains exponential moving averages of the gradient (first moment) and of its element-wise square (second moment):
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
After bias correction ($\hat{m}_t = m_t/(1-\beta_1^t)$, $\hat{v}_t = v_t/(1-\beta_2^t)$), the parameters are updated as $\theta_t = \theta_{t-1} - \alpha\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$.
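A minimal sketch of one Adam step (the function name and signature are illustrative, not from any particular library; hyperparameter defaults are the commonly used ones):

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g        # first moment: moving average of g
    v = beta2 * v + (1 - beta2) * g ** 2   # second moment: moving average of g^2
    m_hat = m / (1 - beta1 ** t)           # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```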
Summary
The core ideas of backpropagation:
- Use the chain rule to compute gradients efficiently
- Propagate errors backward, from the output layer toward the input
- Update parameters by gradient descent