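Installing CUDA for the first time was maddening. I ran into a string of baffling bugs and pitfalls: after installing CUDA and rebooting, the system would not reach the desktop and showed only a black screen; the mouse and keyboard stopped working; or everything installed but TensorFlow-GPU would not. After going through a round of online tutorials, I found that the official guide is still the most reliable.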

For my first post of the new year, here are my installation notes for Ubuntu 18.04 + CUDA 10.0 + cuDNN 7.6.5 + TensorFlow 2.0. I hope they help you avoid some of the pitfalls.

The overall flow is: install the graphics driver -> install CUDA -> install cuDNN -> install tensorflow-gpu and test it.

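1. Installing and updating Ubuntu

Starting from a fresh Ubuntu 18.04 system, do some basic installation and updating first. The OS installation itself is not covered here.

sudo apt-get update # refresh the package index
sudo apt-get upgrade # upgrade installed packages
sudo apt-get install vim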

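2. Installing the graphics driver

2.1 Disabling the Nouveau driver

Note: installing from a runfile requires manually disabling the Nouveau driver that ships with the system.

lsmod | grep nouveau # any output means Nouveau is currently loaded
sudo vim /etc/modprobe.d/blacklist-nouveau.conf

# Add the following two lines: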
#######################################################
blacklist nouveau
options nouveau modeset=0
#######################################################

# After saving, rebuild the initramfs and reboot:
sudo update-initramfs -u
sudo reboot

# Run the following again; no output means Nouveau is disabled
lsmod | grep nouveau
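
The same check can be scripted. The sketch below is only an illustration, not part of the official procedure: it reads /proc/modules (the file lsmod takes its data from) and reports whether nouveau is listed. The function names are my own.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


def nouveau_is_loaded(proc_modules_text):
    """True if the nouveau kernel module is among the loaded modules."""
    return "nouveau" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")  # exists on Linux only
    text = path.read_text() if path.exists() else ""
    print("nouveau loaded:", nouveau_is_loaded(text))
```

If this prints True after the reboot, the blacklist file was probably not picked up; re-check the file name and rerun update-initramfs.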

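2.2 Installing a suitable graphics driver

# First remove any existing NVIDIA driver and its dependencies, then reboot
sudo apt-get remove --purge 'nvidia*'
sudo apt autoremove
sudo reboot
# Add the PPA and install the latest driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
ubuntu-drivers devices # list the detected GPU and the recommended drivers
sudo apt install nvidia-driver-440
# To avoid compatibility problems caused by automatic driver updates, you can also pin the driver version:
sudo apt-mark hold nvidia-driver-440
# nvidia-driver-440 set on hold.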

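Then open the Additional Drivers tab in the Software & Updates menu, find the nvidia-driver-440 you just installed, and select it.

Reboot with sudo reboot, then run nvidia-smi. If it prints the driver version and a table listing your GPUs, the graphics driver is ready.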

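lsmod | grep nvidia # output listing the nvidia modules means success; no output means something is wrong

You can also download the matching installer from the NVIDIA website and install the driver manually.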

# Monitor the GPU continuously
watch -n 1 nvidia-smi # 1 means refresh every second
watch -n 0.1 nvidia-smi # 0.1 s is the shortest interval watch accepts
# gpustat is another option
pip install gpustat
gpustat -i 1 -P
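
If you would rather log GPU usage from a script than watch a terminal, the following sketch shows one way to do it. It assumes nvidia-smi's CSV query mode (--query-gpu with --format=csv,noheader,nounits) and that every queried field is numeric; the function names and the field selection are my own choices.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in this order
QUERY = "index,name,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(csv_text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, util, used, total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "util_percent": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi once; return [] when the tool is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Calling query_gpus() in a loop with time.sleep gives a simple logger; a field reported as [N/A] would need extra handling, which this sketch leaves out.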

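3. Installing CUDA

From Baidu Baike: CUDA (Compute Unified Device Architecture) is a computing platform from the GPU vendor NVIDIA. It is a general-purpose parallel computing architecture that lets GPUs solve complex computational problems.

On Linux there are two ways to install CUDA: Package Manager Installation (.deb) and Runfile Installation (.run). This post uses the first, which is also the officially recommended method.

CUDA has strict requirements on the system environment; the requirements for CUDA 10.0 are listed in the system requirements section of the official installation guide. For other versions, see the corresponding Online Documentation.

3.1 Before installing

Before installing CUDA, confirm that the environment is ready, so that you do not end up with a pile of bugs and no idea where to start. Quoting the official guide directly: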

Some actions must be taken before the CUDA Toolkit and Driver can be installed on Linux:

  • Verify the system has a CUDA-capable GPU.
  • Verify the system is running a supported version of Linux.
  • Verify the system has gcc installed.
  • Verify the system has the correct kernel headers and development packages installed.
  • Download the NVIDIA CUDA Toolkit.
  • Handle conflicting installation methods.

3.1.1 Verify that you have a CUDA-capable GPU

lspci | grep -i nvidia | grep VGA

3.1.2 Verify your Linux version

uname -m && cat /etc/*release
uname -a
# The x86_64 line indicates you are running on a 64-bit system.

3.1.3 Verify the gcc version

gcc --version
# gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

3.1.4 Install the kernel headers matching your kernel

Check the kernel version:

uname -r
# 5.0.0-37-generic

This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers.

Install the headers for the running kernel:

sudo apt-get install linux-headers-$(uname -r)
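
The checks in 3.1.2 to 3.1.4 can be gathered in one read-only script. This is a rough sketch rather than an official tool; it assumes that Ubuntu places kernel headers under /usr/src/linux-headers-<release>, and the function name preflight is made up for illustration.

```python
import platform
import shutil
from pathlib import Path


def preflight():
    """Collect the facts checked in 3.1.2-3.1.4 without changing the system."""
    release = platform.release()  # same value as `uname -r`
    return {
        "machine": platform.machine(),    # same value as `uname -m`
        "kernel_release": release,
        "gcc_path": shutil.which("gcc"),  # None when gcc is not on PATH
        # Assumption: Ubuntu installs headers in /usr/src/linux-headers-<release>
        "headers_present": Path(f"/usr/src/linux-headers-{release}").is_dir(),
    }


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
```

A gcc_path of None or headers_present of False points back to the corresponding step above.
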

3.1.5 Choose an installation method

Download the matching installer (this post uses the officially recommended Deb packages method):

The CUDA Toolkit can be installed using either of two different installation mechanisms: distribution-specific packages (RPM and Deb packages), or a distribution-independent package (runfile packages). The distribution-independent package has the advantage of working across a wider set of Linux distributions, but does not update the distribution’s native package management system. The distribution-specific packages interface with the distribution’s native package management system. It is recommended to use the distribution-specific packages, where possible.

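3.1.6 Completely remove earlier installations to avoid conflicts

On a fresh Ubuntu install you can skip this part; the steps in section 2.2 are enough.

If the earlier installation on Ubuntu used RPM/Deb packages: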

sudo apt-get --purge remove <package_name> 
sudo apt autoremove

If it was installed from a runfile:

sudo /usr/bin/nvidia-uninstall
sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl

3.2 Installation

Make sure the matching .deb file has been downloaded, then run:

sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub # use the path printed by the dpkg step; in my case:
# sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-toolkit-10-0 # note: not the cuda package; the driver was already installed in step 2, so cuda-toolkit-10-0 is enough

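3.3 After installing

After installation, some manual setup is needed before CUDA works properly.

# Applies to the current shell only; add the line to ~/.bashrc to make it permanent
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
nvcc -V # check that CUDA was installed successfully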

# OUTPUT:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
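
When scripting an environment check, the release number can be pulled out of the nvcc -V text. A minimal sketch, assuming the output keeps the "release X.Y, VX.Y.Z" shape shown above; cuda_release is a name I made up.

```python
import re


def cuda_release(nvcc_output):
    """Return (major, minor, full_version) parsed from `nvcc -V` text, or None."""
    match = re.search(r"release (\d+)\.(\d+), V(\d+\.\d+\.\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 10.0, V10.0.130"
    print(cuda_release(sample))  # (10, 0, '10.0.130')
```

To use it on a live system, feed it the stdout of subprocess.run(["nvcc", "-V"], capture_output=True, text=True).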

It is best to turn off automatic system updates, so that a working environment does not suddenly break:

sudo vi /etc/apt/apt.conf.d/10periodic

# Change the contents to:
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
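
To confirm the edit took effect, a small script can read the file back. The sketch below only handles simple one-line APT::Periodic::Key "value"; entries like the ones above, not the full apt.conf syntax, and its function names are my own.

```python
import re
from pathlib import Path

# Matches lines such as: APT::Periodic::Update-Package-Lists "0";
PERIODIC = re.compile(r'^\s*APT::Periodic::([\w-]+)\s+"([^"]*)"\s*;', re.MULTILINE)


def periodic_settings(conf_text):
    """Return {option: value} for every APT::Periodic line in the text."""
    return {key: value for key, value in PERIODIC.findall(conf_text)}


def auto_update_disabled(conf_text):
    """True only if at least one option is present and all are set to "0"."""
    settings = periodic_settings(conf_text)
    return bool(settings) and all(v == "0" for v in settings.values())


if __name__ == "__main__":
    path = Path("/etc/apt/apt.conf.d/10periodic")
    text = path.read_text() if path.exists() else ""
    print(periodic_settings(text))
    print("automatic updates disabled:", auto_update_disabled(text))
```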

You can also do this from the desktop: System Settings => Software & Updates => Updates

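4. Installing cuDNN

NVIDIA cuDNN is a GPU-accelerated library for deep neural networks.

First register on the NVIDIA developer site and download the cuDNN package that matches your CUDA version from the cuDNN download page.

For CUDA 10.0, for example, I downloaded cudnn-10.0-linux-x64-v7.6.5.32.tgz: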

tar -zxvf cudnn-10.0-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64 # -P keeps the .so symlinks as symlinks
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Verify the installation:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

# Output
"""
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
"""
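
The same information can be read programmatically. The sketch below parses the three #define lines shown above and also computes CUDNN_VERSION with the formula from the header; the function names are hypothetical.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if incomplete."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(rf"^#define\s+CUDNN_{name}\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def cudnn_version_code(version):
    """CUDNN_VERSION as the header defines it: MAJOR*1000 + MINOR*100 + PATCH."""
    major, minor, patch = version
    return major * 1000 + minor * 100 + patch


if __name__ == "__main__":
    sample = (
        "#define CUDNN_MAJOR 7\n"
        "#define CUDNN_MINOR 6\n"
        "#define CUDNN_PATCHLEVEL 5\n"
    )
    version = cudnn_version(sample)
    print(version, cudnn_version_code(version))  # (7, 6, 5) 7605
```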

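Installing from the Debian files is the better option, because the bundled samples let you verify that cuDNN was installed correctly. First download the following three files:

# Download each package, then install them in this order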
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb

# Verify after installing:
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd  $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN
# Test passed!

You can also install cudatoolkit and cuDNN with conda, provided the driver is already working. I have not tried this route myself.

conda install cudatoolkit=10.0
conda install -c anaconda cudnn

5. Installing TensorFlow 2.0 GPU

# Install conda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

# Add the Tsinghua mirror channels to conda (useful inside China):
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --set show_channel_urls yes

conda create -y -n tf2 python=3.7
conda activate tf2
# source activate tf2
pip install --upgrade pip
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
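pip install tensorflow-gpu==2.0.0 # pin to 2.0, the release built against CUDA 10.0; an unpinned install pulls the newest version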
pip install catboost


# Or:
conda create -y -n tf_2.1 python=3.7 tensorflow-gpu==2.1.0
conda create -y -n tf_2.0 python=3.7 tensorflow-gpu==2.0.0

# Or: installing TF 2.2 this way also works
conda create -y -n TF2.2 python=3.8
conda activate TF2.2
pip install --upgrade pip
pip install tensorflow-gpu==2.2.0
conda install cudatoolkit=10.1 cudnn=7.6.5
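
Most "installed but cannot use the GPU" failures come from a TensorFlow build that expects a different CUDA version than the one on the machine. The lookup below is a sketch of that pairing for the versions used in this post. The table is my recollection of TensorFlow's tested build configurations, not something stated in these notes, so verify it against the official table before relying on it.

```python
# (major, minor) of TensorFlow -> (CUDA version, cuDNN version it was built with).
# Assumption: values recalled from TensorFlow's "tested build configurations".
TESTED_CUDA = {
    (2, 0): ("10.0", "7.4"),
    (2, 1): ("10.1", "7.6"),
    (2, 2): ("10.1", "7.6"),
}


def required_cuda(tf_version):
    """Return (cuda, cudnn) for a version string like '2.0.0', or None if unknown."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return TESTED_CUDA.get((major, minor))


if __name__ == "__main__":
    for version in ("2.0.0", "2.1.0", "2.2.0"):
        print(version, "->", required_cuda(version))
```

This is why the TF 2.2 recipe above installs cudatoolkit=10.1 through conda instead of relying on the system-wide CUDA 10.0.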

Test:

import tensorflow as tf
print(tf.__version__)
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
"""
2.0.0
Num GPUs Available:  2
"""
"""
Test program.
Source: https://github.com/dragen1860/TensorFlow-2.x-Tutorials/blob/master/08-ResNet/main.py
"""
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # os.environ["CUDA_VISIBLE_DEVICES"] = "0,1" 
import tensorflow as tf
import numpy as np
from tensorflow import keras

tf.random.set_seed(22)
np.random.seed(22)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
assert tf.__version__.startswith('2.')

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(
    np.float32) / 255.
# [b, 28, 28] => [b, 28, 28, 1]
x_train, x_test = np.expand_dims(x_train, axis=3), np.expand_dims(x_test,
                                                                  axis=3)
# one hot encode the labels. convert back to numpy as we cannot use a combination of numpy
# and tensors as input to keras
y_train_ohe = tf.one_hot(y_train, depth=10).numpy()
y_test_ohe = tf.one_hot(y_test, depth=10).numpy()

print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

# 3x3 convolution
def conv3x3(channels, stride=1, kernel=(3, 3)):
    return keras.layers.Conv2D(
        channels,
        kernel,
        strides=stride,
        padding='same',
        use_bias=False,
        kernel_initializer=tf.random_normal_initializer())

class ResnetBlock(keras.Model):
    def __init__(self, channels, strides=1, residual_path=False):
        super(ResnetBlock, self).__init__()
        self.channels = channels
        self.strides = strides
        self.residual_path = residual_path
        self.conv1 = conv3x3(channels, strides)
        self.bn1 = keras.layers.BatchNormalization()
        self.conv2 = conv3x3(channels)
        self.bn2 = keras.layers.BatchNormalization()
        if residual_path:
            self.down_conv = conv3x3(channels, strides, kernel=(1, 1))
            self.down_bn = tf.keras.layers.BatchNormalization()
            
    def call(self, inputs, training=None):
        residual = inputs
        x = self.bn1(inputs, training=training)
        x = tf.nn.relu(x)
        x = self.conv1(x)
        x = self.bn2(x, training=training)
        x = tf.nn.relu(x)
        x = self.conv2(x)
        # this module can be added into self.
        # however, module in for can not be added.
        if self.residual_path:
            residual = self.down_bn(inputs, training=training)
            residual = tf.nn.relu(residual)
            residual = self.down_conv(residual)
        x = x + residual
        return x

class ResNet(keras.Model):
    def __init__(self, block_list, num_classes, initial_filters=16, **kwargs):
        super(ResNet, self).__init__(**kwargs)
        self.num_blocks = len(block_list)
        self.block_list = block_list
        self.in_channels = initial_filters
        self.out_channels = initial_filters
        self.conv_initial = conv3x3(self.out_channels)
        self.blocks = keras.models.Sequential(name='dynamic-blocks')
        # build all the blocks
        for block_id in range(len(block_list)):
            for layer_id in range(block_list[block_id]):

                if block_id != 0 and layer_id == 0:
                    block = ResnetBlock(self.out_channels,
                                        strides=2,
                                        residual_path=True)
                else:
                    if self.in_channels != self.out_channels:
                        residual_path = True
                    else:
                        residual_path = False
                    block = ResnetBlock(self.out_channels,
                                        residual_path=residual_path)
                self.in_channels = self.out_channels
                self.blocks.add(block)
            self.out_channels *= 2
        self.final_bn = keras.layers.BatchNormalization()
        self.avg_pool = keras.layers.GlobalAveragePooling2D()
        self.fc = keras.layers.Dense(num_classes)

    def call(self, inputs, training=None):
        out = self.conv_initial(inputs)
        out = self.blocks(out, training=training)
        out = self.final_bn(out, training=training)
        out = tf.nn.relu(out)
        out = self.avg_pool(out)
        out = self.fc(out)
        return out

def main():
    num_classes = 10
    batch_size = 128
    epochs = 2
    # build model and optimizer
    model = ResNet([2, 2, 2], num_classes)
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=keras.losses.CategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    model.build(input_shape=(None, 28, 28, 1))
    print("Number of variables in the model :", len(model.variables))
    model.summary()
    # train
    model.fit(x_train,
              y_train_ohe,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test_ohe),
              verbose=1)

    # evaluate on test set
    scores = model.evaluate(x_test, y_test_ohe, batch_size, verbose=1)
    print("Final test loss and accuracy :", scores)

if __name__ == '__main__':
    main()

Monitor GPU usage while it trains:

watch -n 0.1 nvidia-smi

Test CatBoost on the GPU:

from catboost.datasets import titanic
import numpy as np
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier, Pool, cv
from sklearn.metrics import accuracy_score

train_df, test_df = titanic()
null_value_stats = train_df.isnull().sum(axis=0)
null_value_stats[null_value_stats != 0]

train_df.fillna(-999, inplace=True)
test_df.fillna(-999, inplace=True)

X = train_df.drop('Survived', axis=1)
y = train_df.Survived

X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.75, random_state=42)
X_test = test_df

categorical_features_indices = np.where(X.dtypes != float)[0]  # np.float was removed in NumPy 1.24

model = CatBoostClassifier(
    task_type="GPU",
    custom_metric=['Accuracy'],
    random_seed=666,
    logging_level='Silent'
)

model.fit(
    X_train, y_train,
    cat_features=categorical_features_indices,
    eval_set=(X_validation, y_validation),
    logging_level='Verbose',  # you can comment this for no text output
    plot=True
);

Monitor GPU usage:

watch -n 0.1 nvidia-smi

6. A successful NVIDIA driver repair

Running nvidia-smi fails with:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

# detect the model of your nvidia graphic card and the recommended driver.
sudo ubuntu-drivers devices

sudo apt install nvidia-driver-440
# Or: sudo ubuntu-drivers autoinstall
sudo rmmod nvidia_uvm # not always necessary
sudo modprobe nvidia_uvm
nvidia-smi

# About modprobe and lsmod
# We use the lsmod command to list currently loaded drivers. This command actually obtains its data from the /proc/modules file.
lsmod | grep nvidia
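
For machines that lose the driver after a kernel update, it helps to detect this failure from a script. The sketch below runs nvidia-smi and maps the result to a short label; the labels and function names are invented for illustration, and the match relies only on the error text quoted above.

```python
import shutil
import subprocess


def classify(returncode, output):
    """Map an nvidia-smi exit code and its output to a short diagnosis label."""
    if returncode == 0:
        return "ok"
    if "communicate with the nvidia driver" in output.lower():
        return "driver-not-loaded"
    return "other-error"


def diagnose():
    """Run nvidia-smi if it is installed and classify the result."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi-missing"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return classify(result.returncode, result.stdout + result.stderr)


if __name__ == "__main__":
    print(diagnose())
```

A "driver-not-loaded" result corresponds to the situation repaired above: reinstall the driver, then reload the module with modprobe.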

REFERENCE

Official: NVIDIA CUDA Installation Guide for Linux

CUDA_Quick_Start_Guide (PDF)

CUDA_Installation_Guide_Linux (PDF)

Official: cuDNN installation guide

[How To] Install Latest NVIDIA Drivers In Linux