PyTorch is a deep learning framework developed primarily by Facebook. Thanks to its efficient computation and ease of use, it is popular with many large companies and researchers.

Installation

Following the official instructions, it is recommended to use conda to manage an isolated environment. Taking macOS as an example:

conda create -n pytorch_env # create a new environment named pytorch_env
conda info -e # list existing environments
conda activate pytorch_env # activate the pytorch_env environment
conda install pytorch torchvision -c pytorch # install PyTorch
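
After installing, a quick sanity check (run inside the activated environment) confirms that PyTorch imports correctly:

python -c "import torch; print(torch.__version__)" # prints the installed version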

Tensors

Tensors in PyTorch are similar to NumPy's ndarray (multi-dimensional array), with the difference that PyTorch Tensors also support computation on GPUs.

Common methods:

import torch

torch.empty(6, 3) # tensor of uninitialized data, shape=[6,3]
"""
tensor([[-7.6027e+33,  4.5916e-41, -7.6027e+33],
        [ 4.5916e-41,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00]])
"""

torch.rand(6, 3) # random tensor, uniform on [0, 1)
"""
tensor([[0.5559, 0.6020, 0.7345],
        [0.0226, 0.1468, 0.5493],
        [0.0953, 0.0787, 0.1556],
        [0.7109, 0.9057, 0.1468],
        [0.2171, 0.2595, 0.1807],
        [0.2468, 0.2483, 0.9191]])
"""
torch.randn(6, 3) # random tensor from a standard normal distribution
"""
tensor([[ 0.2550, -0.2483, -1.0960],
        [-0.3968,  0.6721, -1.3530],
        [ 0.1528, -1.3270, -0.1585],
        [-1.0298,  0.8645, -1.0621],
        [-0.5864,  0.7020, -1.0625],
        [ 0.3827,  0.7369,  0.7417]])
"""

torch.zeros(6, 3, dtype=torch.long) # shape=[6,3], all zeros

"""
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
"""
torch.ones(6, 3, dtype=torch.float) # shape=[6,3], all ones
"""
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
"""
x = torch.tensor([[6, 666],[666,6]])
x
"""
tensor([[  6, 666],
        [666,   6]])
"""
# Create a tensor from an existing one. These methods reuse the input tensor's
# properties, such as dtype, unless new values are given to override them.
x = x.new_ones(6, 3, dtype=torch.double)  # the new_* methods create new tensors
x
"""
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
"""
x = torch.randn_like(x, dtype=torch.float)  # same shape as x, dtype overridden
x
"""
tensor([[-0.7134,  0.5978, -0.7040],
        [-2.5056, -1.1300, -0.9594],
        [ 1.6301, -1.1263, -0.5724],
        [ 0.6882,  0.2266, -1.0729],
        [-1.0688,  0.7179,  0.9303],
        [ 2.5750, -0.4685,  0.0134]])
"""
x.size() # get the tensor's size; returns torch.Size, which is essentially a tuple
"""
torch.Size([6, 3])
"""

Operations

There are many syntaxes for operating on Tensors.

1. Arithmetic operations (addition, subtraction, multiplication, division); take addition as an example:

Method 1:

y = torch.rand(6,3)
y
"""
tensor([[0.4862, 0.8127, 0.0867],
        [0.1323, 0.7562, 0.8797],
        [0.8187, 0.7725, 0.2687],
        [0.1116, 0.6149, 0.9046],
        [0.9227, 0.2562, 0.0596],
        [0.2610, 0.2048, 0.6576]])
"""
x = torch.ones(6,3)
x + y
"""
tensor([[1.4862, 1.8127, 1.0867],
        [1.1323, 1.7562, 1.8797],
        [1.8187, 1.7725, 1.2687],
        [1.1116, 1.6149, 1.9046],
        [1.9227, 1.2562, 1.0596],
        [1.2610, 1.2048, 1.6576]])
"""

Method 2:

torch.add(x, y)
"""
tensor([[1.4862, 1.8127, 1.0867],
        [1.1323, 1.7562, 1.8797],
        [1.8187, 1.7725, 1.2687],
        [1.1116, 1.6149, 1.9046],
        [1.9227, 1.2562, 1.0596],
        [1.2610, 1.2048, 1.6576]])
"""

Method 3: provide an output tensor as an argument

result = torch.empty(6, 3)
torch.add(x, y, out=result)
result
"""
tensor([[1.4862, 1.8127, 1.0867],
        [1.1323, 1.7562, 1.8797],
        [1.8187, 1.7725, 1.2687],
        [1.1116, 1.6149, 1.9046],
        [1.9227, 1.2562, 1.0596],
        [1.2610, 1.2048, 1.6576]])
"""

Method 4: in-place. Any operation that ends with an underscore replaces the original variable with the result. For example, x.add_(y), x.copy_(y), and x.t_() all change x.

y.add_(x)  # add x to y in place
y
"""
tensor([[1.4862, 1.8127, 1.0867],
        [1.1323, 1.7562, 1.8797],
        [1.8187, 1.7725, 1.2687],
        [1.1116, 1.6149, 1.9046],
        [1.9227, 1.2562, 1.0596],
        [1.2610, 1.2048, 1.6576]])
"""

2. Indexing and reshaping tensors

Tensors support NumPy-style indexing:

x[:, 1]
"""
tensor([1., 1., 1., 1., 1., 1.])
"""

The view method can change a tensor's shape and size:

x = torch.randn(6, 6)
y = x.view(36)
z = x.view(-1, 9)  # a size of -1 is inferred from the other dimensions
x.size(), y.size(), z.size()
"""
(torch.Size([6, 6]), torch.Size([36]), torch.Size([4, 9]))
"""

If a tensor has only one element, .item() extracts it as a plain Python number:

x = torch.randn(1)
x, x.item()
"""
(tensor([-1.1411]), -1.1410937309265137)
"""

Many more tensor operations are described in the official documentation.
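
As a small, non-exhaustive sampler (all standard PyTorch calls):

a = torch.arange(6).reshape(2, 3)  # tensor([[0, 1, 2], [3, 4, 5]])
torch.cat([a, a], dim=0)           # concatenate along rows -> shape [4, 3]
a.float().mean(dim=1)              # per-row mean -> tensor([1., 4.])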

3. NumPy Bridge

Converting between PyTorch tensors and NumPy arrays is trivial. A tensor and the array converted from it share the same underlying memory (on the CPU), so changing one also changes the other. Except for CharTensor, all CPU tensors support conversion to and from NumPy ndarrays.

# PyTorch tensor => NumPy ndarray
a = torch.ones(5)
a
"""
tensor([1., 1., 1., 1., 1.])
"""
b = a.numpy()
b
"""
array([1., 1., 1., 1., 1.], dtype=float32)
"""
a.add_(1)
a, b
"""
(tensor([2., 2., 2., 2., 2.]), array([2., 2., 2., 2., 2.], dtype=float32))
"""
# ========================================================= #
# NumPy ndarray => PyTorch tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
a, b
"""
(array([2., 2., 2., 2., 2.]),
 tensor([2., 2., 2., 2., 2.], dtype=torch.float64))
"""

4. CUDA Tensors

# let us run this cell only if CUDA is available
# We will use `torch.device` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")           # a CUDA device object
    y = torch.ones_like(x, device=device)   # directly create a tensor on the GPU
    x = x.to(device)                        # or just use strings: .to("cuda")
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))        # .to can also change the dtype at the same time

# Example output when a GPU is available:
"""
tensor([1.4566], device='cuda:0')
tensor([1.4566], dtype=torch.float64)
"""

Autograd: Automatic Differentiation

The core of all neural networks in PyTorch is autograd, which provides automatic differentiation for every operation on tensors. It is a define-by-run framework: backpropagation is determined by how your code runs. torch.Tensor is the central class of the package. If you set .requires_grad to True, all operations on the tensor are tracked. When the computation is finished, calling .backward() computes all the gradients automatically and accumulates them into the .grad attribute; this completes automatic differentiation.

.requires_grad_( ... ) changes an existing tensor's requires_grad flag in place. If not given, the flag defaults to False.

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)
"""
False
True
<SumBackward0 object at 0x7fffc2c6eba8>
"""

A complete example:

x = torch.ones(2, 2, requires_grad=True)
x
"""
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
"""
y = x + 2
y
"""
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
"""
y.grad_fn
"""
<AddBackward0 at 0x7fffc2c78b00>
"""
z = y * y * 3
out = z.mean()
z, out
"""
(tensor([[27., 27.],
         [27., 27.]], grad_fn=<MulBackward0>),
 tensor(27., grad_fn=<MeanBackward1>))
"""

So far we have only tracked the computation automatically; now backpropagate and print the gradient at the corresponding node. Because out is a scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()
x.grad
"""
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
"""

The derivation:

$$out = \frac{1}{4}\sum_i z_i$$ $$z_i = 3(x_i+2)^2$$ $$z_i\bigr\rvert_{x_i=1} = 27$$

Therefore:

$$\frac{\partial out}{\partial x_i} = \frac{3}{2}(x_i+2)$$ $$\frac{\partial out}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$$

If y is not a scalar, backward() requires an explicit gradient argument of matching shape (a vector-Jacobian product):

x = torch.randn(3, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)
x.grad

"""
tensor([-169.3331,  694.2395, 1232.7043], grad_fn=<MulBackward0>)
tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])
"""

If .requires_grad=True but you do not want the operations tracked by autograd, wrap the code in with torch.no_grad():

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

"""
True
True
False
"""

A Neural Network Example

The typical training procedure for a neural network is as follows:

  1. Define a neural network with learnable parameters (weights).
  2. Iterate over a dataset of inputs.
  3. Process the input through the network.
  4. Compute the loss (how far the output is from the correct answer).
  5. Backpropagate the gradients.
  6. Update the network's parameters, typically with an optimization method such as gradient descent: weight = weight - learning_rate * gradient (see the sketch below).
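
In code, these steps form a loop roughly like the following sketch, where model, data_loader, criterion, and optimizer are hypothetical placeholders for the objects defined step by step in the rest of this section:

for inputs, labels in data_loader:     # 2. iterate over the dataset
    optimizer.zero_grad()              # clear previously accumulated gradients
    outputs = model(inputs)            # 3. forward pass through the network
    loss = criterion(outputs, labels)  # 4. compute the loss
    loss.backward()                    # 5. backpropagate
    optimizer.step()                   # 6. update the parameters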

Take LeNet for handwritten digit recognition as an example:

[Figure: the LeNet architecture]

(1) Define the network structure:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

"""
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
"""

(2) Process the input and call backward

The model must define a forward function; based on it, the backward function (which computes the gradients) is created automatically by autograd. Any Tensor operation can be used inside forward. net.parameters() returns the model's learnable parameters (weights).

params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
"""
10
torch.Size([6, 1, 3, 3])
"""

Test with a random 32×32 input. Note that the expected input size of this network (LeNet) is 32×32; to train it on the MNIST dataset (28×28), the images must first be resized to 32×32.

input = torch.randn(1, 1, 32, 32)
print(input)
out = net(input)
out

"""
tensor([[[[ 1.9687,  1.0455,  0.7801,  ...,  1.3936,  0.6338, -0.0804],
          [-0.5434,  0.3152,  2.0965,  ...,  0.3341,  0.2929, -1.0626],
          [ 0.2217,  0.2723,  1.1797,  ...,  0.3956, -1.6249, -1.5242],
          ...,
          [-0.0432, -0.2831, -1.2752,  ..., -1.0822,  1.1868, -1.6513],
          [ 0.6551,  0.5037,  0.9120,  ..., -0.3927, -2.5998,  1.5201],
          [-1.7560,  0.8368,  0.0114,  ..., -0.3840, -0.2012,  1.4936]]]])
tensor([[ 0.0194, -0.0680, -0.1112, -0.0043,  0.0765,  0.0658,  0.0866, -0.1482,
         -0.0663, -0.0418]], grad_fn=<AddmmBackward>)
"""

# Zero the gradient buffers of all parameters, then backprop with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))


torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are mini-batches of samples, not single samples. For example, nn.Conv2d takes a 4-D tensor of nSamples x nChannels x Height x Width. For a single sample, use input.unsqueeze(0) to add a fake batch dimension, as shown below.
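
For example, a single grayscale image can be turned into a one-sample batch like this:

single = torch.randn(1, 32, 32)  # one image: nChannels x Height x Width
batch = single.unsqueeze(0)      # add a batch dimension
batch.size()
"""
torch.Size([1, 1, 32, 32])
"""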

Recap:

  - torch.Tensor: a multi-dimensional array that supports autograd operations such as backward() and holds the gradient w.r.t. itself.
  - nn.Module: a neural network module; a convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.
  - nn.Parameter: a kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
  - autograd.Function: implements the forward and backward definitions of an autograd operation.

(3) Compute the loss

A loss function takes the (output, target) pair and computes a value estimating how far the prediction is from the truth. There are many different loss functions under torch.nn.

Take nn.MSELoss() as an example:

output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

# tensor(0.7627, grad_fn=<MseLossBackward>)

The computational graph:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

When we call loss.backward(), the whole graph is differentiated, and the .grad attribute of every tensor in the graph with requires_grad=True accumulates its gradient.

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
"""
<MseLossBackward object at 0x7fffc2c1b588>
<AddmmBackward object at 0x7fffc2c1b1d0>
<AccumulateGrad object at 0x7fffc2c1b588>
"""

(4) Backpropagation

Call loss.backward() to backpropagate the error. Existing gradients must be cleared first, otherwise the new gradients will be accumulated on top of them.

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

"""
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0041, -0.0055,  0.0028,  0.0037, -0.0049, -0.0003])
"""

(5) Update the parameters (weights)

The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

weight = weight - learning_rate * gradient

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

To use other update rules such as SGD with Nesterov momentum, Adam, or RMSProp, use the corresponding methods in torch.optim.

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
# Note: the gradient buffers must be zeroed manually with optimizer.zero_grad(),
# because gradients are accumulated, as explained in the backpropagation section.

output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

Training a Classifier

When working with text, images, audio, or video, you can read the data into a NumPy ndarray with Python's standard libraries and then convert it to a PyTorch tensor.
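
For example (a minimal sketch; the random array below stands in for data loaded with PIL, SciPy, or similar):

import numpy as np
import torch

arr = np.random.rand(32, 32, 3).astype(np.float32)  # stand-in for a loaded image
t = torch.from_numpy(arr).permute(2, 0, 1)          # HWC -> CHW, the layout PyTorch expects
t.shape
"""
torch.Size([3, 32, 32])
"""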

For image tasks, PyTorch provides a dedicated package, torchvision, which includes loaders for common image datasets and basic image transformation utilities. Below we train a classifier on the CIFAR-10 dataset, which has 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. CIFAR-10 images are 3×32×32, i.e. 3 color channels at 32×32 pixels.

The basic steps for training an image classifier are:

  1. Load and normalize the CIFAR-10 training and test sets using torchvision.
  2. Define a convolutional neural network.
  3. Define a loss function.
  4. Train the network on the training data.
  5. Test the network on the test data.

!wget -nc "https://labfile.oss.aliyuncs.com/courses/1348/cifar-10-python.tar.gz" -P ./data/
import torchvision
import torchvision.transforms as transforms

# image preprocessing pipeline
transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# training data loader
trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=4, shuffle=True, num_workers=2)
# test data loader
testset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(
    testset, batch_size=4, shuffle=False, num_workers=2)
# image classes
classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

trainloader, testloader
"""
Using downloaded and verified file: ./data/cifar-10-python.tar.gz
Files already downloaded and verified
(<torch.utils.data.dataloader.DataLoader at 0x7fffc2c3aef0>,
 <torch.utils.data.dataloader.DataLoader at 0x7fffc0449f60>)
"""

Visualize some of the training images:

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

[Figure: a grid of four sample training images with their labels]

Copy the network definition from the previous section and change the input to 3-channel images.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
net

"""
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
"""

We use cross-entropy as the loss function and stochastic gradient descent with momentum to optimize the parameters.

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
optimizer
"""
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.001
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)
"""

Train the network:

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
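
Optionally, the trained weights can be saved so the network can be reloaded later (the file name here is arbitrary):

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
# restore later with:
# net = Net(); net.load_state_dict(torch.load(PATH))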

Test the network on the test set:

First, look at some images from the test set:

dataiter = iter(testloader)
images, labels = next(dataiter)
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

Then look at the network's predictions:

outputs = net(images)
outputs

Take the predicted classes:

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

Accuracy on the whole test set:

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Per-class prediction accuracy:

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
