
[1] CUDA - Installation

11 March 2022, by Adam [zicherka] Nogły

Install CUDA (Compute Unified Device Architecture), the general-purpose GPU computing platform (GPGPU, General-Purpose computing on Graphics Processing Units) provided by NVIDIA.

To use CUDA, your computer must have an NVIDIA graphics card that supports CUDA. Check this on the following page (most products from the last few years are compatible): https://developer.nvidia.com/cuda-gpus.
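
A quick way to see whether the machine has an NVIDIA device at all is to filter the PCI device list. The sketch below assumes `lspci` (from the pciutils package) is installed. The helper name `list_nvidia_gpus` is made up for illustration. A match only shows that an NVIDIA display device is present, not that it supports CUDA, so the page linked above remains the reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `lspci` output on stdin and prints the lines
# that describe NVIDIA display devices (VGA or 3D controllers).
list_nvidia_gpus() {
    grep -Ei 'vga compatible controller|3d controller' | grep -i 'nvidia'
}

# Usage on a real machine (assumes pciutils is installed):
#   lspci | list_nvidia_gpus
```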

In addition, depending on your graphics card and the driver it requires, you need a matching version of CUDA: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html.

CUDA Toolkit	Linux x86_64 Driver Version	Windows x86_64 Driver Version
CUDA 11.6 Update 1	>=510.47.03	>=511.65
CUDA 11.6 GA	>=510.39.01	>=511.23
CUDA 11.5 Update 2	>=495.29.05	>=496.13
CUDA 11.5 Update 1	>=495.29.05	>=496.13
CUDA 11.5 GA	>=495.29.05	>=496.04
CUDA 11.4 Update 4	>=470.82.01	>=472.50
CUDA 11.4 Update 3	>=470.82.01	>=472.50
CUDA 11.4 Update 2	>=470.57.02	>=471.41
CUDA 11.4 Update 1	>=470.57.02	>=471.41
CUDA 11.4.0 GA	>=470.42.01	>=471.11
CUDA 11.3.1 Update 1	>=465.19.01	>=465.89
CUDA 11.3.0 GA	>=465.19.01	>=465.89
CUDA 11.2.2 Update 2	>=460.32.03	>=461.33
CUDA 11.2.1 Update 1	>=460.32.03	>=461.09
CUDA 11.2.0 GA	>=460.27.03	>=460.82
CUDA 11.1.1 Update 1	>=455.32	>=456.81
CUDA 11.1 GA	>=455.23	>=456.38
CUDA 11.0.3 Update 1	>= 450.51.06	>= 451.82
CUDA 11.0.2 GA	>= 450.51.05	>= 451.48
CUDA 11.0.1 RC	>= 450.36.06	>= 451.22
CUDA 10.2.89	>= 440.33	>= 441.22
CUDA 10.1 (10.1.105 general release, and updates)	>= 418.39	>= 418.96
CUDA 10.0.130	>= 410.48	>= 411.31
CUDA 9.2 (9.2.148 Update 1)	>= 396.37	>= 398.26
CUDA 9.2 (9.2.88)	>= 396.26	>= 397.44
CUDA 9.1 (9.1.85)	>= 390.46	>= 391.29
CUDA 9.0 (9.0.76)	>= 384.81	>= 385.54
CUDA 8.0 (8.0.61 GA2)	>= 375.26	>= 376.51
CUDA 8.0 (8.0.44)	>= 367.48	>= 369.30
CUDA 7.5 (7.5.16)	>= 352.31	>= 353.66
CUDA 7.0 (7.0.28)	>= 346.46	>= 347.62
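
The comparison against the table can be scripted. The following is a minimal sketch, assuming GNU `sort` (for `sort -V`) and, for the live check, that `nvidia-smi` from the driver package is on the PATH. The function name `version_ge` and the variable names are hypothetical, and the required version is simply the CUDA 11.6 Update 1 row from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to
# version $2, using GNU sort's version ordering (sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum Linux driver for CUDA 11.6 Update 1, taken from the table above.
required="510.47.03"

# Only run the live check when the driver tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$installed" "$required"; then
        echo "driver $installed is new enough for CUDA 11.6 Update 1"
    else
        echo "driver $installed is older than the required $required"
    fi
fi
```

The helper works on plain strings, so it can also be used to compare any two rows of the table without a GPU present.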

Both archived and the latest versions of the CUDA Toolkit can be downloaded from this page: https://developer.nvidia.com/cuda-toolkit-archive.

GCC support matters as well. The table below lists the maximum GCC version supported by each CUDA version.

CUDA version	max supported GCC version
11.4.1+, 11.5, 11.6	11
11.1, 11.2, 11.3, 11.4.0	10
11	9
10.1, 10.2	8
9.2, 10.0	7
9.0, 9.1	6
8	5.3
7	4.9
5.5, 6	4.8
4.2, 5	4.6
4.1	4.5
4.0	4.4
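
The GCC limit can be checked before compiling anything. The sketch below copies a few rows of the table into a lookup. The function names `max_gcc_for_cuda` and `gcc_ok_for_cuda` are hypothetical, and CUDA 11.4 is left out on purpose because its limit depends on the patch release (11.4.0 versus 11.4.1 and later).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the maximum supported GCC major version for a
# few CUDA releases, copied from the table above. Unknown versions print
# nothing and return a non-zero status.
max_gcc_for_cuda() {
    case "$1" in
        11.5|11.6)      echo 11 ;;
        11.1|11.2|11.3) echo 10 ;;
        11|11.0)        echo 9 ;;
        10.1|10.2)      echo 8 ;;
        9.2|10.0)       echo 7 ;;
        *)              return 1 ;;
    esac
}

# Succeeds when GCC major version $2 does not exceed the limit for CUDA $1.
gcc_ok_for_cuda() {
    local limit
    limit="$(max_gcc_for_cuda "$1")" || return 2
    [ "$2" -le "$limit" ]
}

# Live check; `gcc -dumpversion` may print "8" or "8.5.0" depending on how
# GCC was built, so keep only the part before the first dot.
if command -v gcc >/dev/null 2>&1; then
    gcc_major="$(gcc -dumpversion | cut -d. -f1)"
    if gcc_ok_for_cuda 11.6 "$gcc_major"; then
        echo "GCC $gcc_major is supported by CUDA 11.6"
    else
        echo "GCC $gcc_major is too new for CUDA 11.6"
    fi
fi
```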

[1] Install the NVIDIA graphics driver for your graphics card, see here.

[2] Install CUDA from the official NVIDIA repository.

This example relies on the official NVIDIA repository that you configured while installing the driver in the previous step [1].

[root@vlsr05 ~]# dnf install cuda
[root@vlsr05 ~]# mcedit /etc/profile.d/cuda116.sh
# create a new file with the following contents
export PATH=/usr/local/cuda-11.6/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

[root@vlsr05 ~]# source /etc/profile.d/cuda116.sh
[root@vlsr05 ~]# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Thu_Feb_10_18:23:41_PST_2022
Cuda compilation tools, release 11.6, V11.6.112
Build cuda_11.6.r11.6/compiler.30978841_0
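
The `${PATH:+:${PATH}}` expansion in the profile script appends `:$PATH` only when `PATH` is already set and non-empty. This avoids a stray trailing colon, which would create an empty entry that is interpreted as the current directory. The sketch below shows the same idiom on ordinary arguments; the function name `prepend_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "$1" followed by ":$2" only when $2 is
# non-empty, mirroring ${PATH:+:${PATH}} from the profile script.
prepend_dir() {
    local dir="$1" current="$2"
    printf '%s\n' "${dir}${current:+:${current}}"
}

prepend_dir /usr/local/cuda-11.6/bin ""        # prints /usr/local/cuda-11.6/bin
prepend_dir /usr/local/cuda-11.6/bin /usr/bin  # prints /usr/local/cuda-11.6/bin:/usr/bin
```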

[3] Verify the installation by running the commands below as a regular user.

Starting with CUDA 11.6, the samples are no longer shipped with the installer and must be downloaded and built separately from GitHub: https://github.com/NVIDIA/cuda-samples.

# clone the samples
[user01@vlsr05 ~]$ git clone https://github.com/NVIDIA/cuda-samples.git
[user01@vlsr05 ~]$ cd ./cuda-samples/Samples/1_Utilities/deviceQuery
# build the deviceQuery sample
[user01@vlsr05 deviceQuery]$ make
# run the deviceQuery sample
[user01@vlsr05 deviceQuery]$ ./deviceQuery
./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce GTX 1050"
  CUDA Driver Version / Runtime Version          11.6 / 11.6
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 2000 MBytes (2097152000 bytes)
  (005) Multiprocessors, (128) CUDA Cores/MP:    640 CUDA Cores
  GPU Max Clock rate:                            1468 MHz (1.47 GHz)
  Memory Clock rate:                             3504 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.6, CUDA Runtime Version = 11.6, NumDevs = 1
Result = PASS

# try running the bandwidthTest sample
[user01@vlsr05 deviceQuery]$ cd ~/cuda-samples/Samples/1_Utilities/bandwidthTest/
[user01@vlsr05 bandwidthTest]$ make
[user01@vlsr05 bandwidthTest]$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
 Device 0: NVIDIA GeForce GTX 1050
 Quick Mode
 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     12.7
 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     13.2
 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     98.4
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
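
Both samples end with a `Result = PASS` line, so the verification can be automated by looking for it. This is a minimal sketch: the function name `sample_passed` is hypothetical, and the commented loop assumes the samples were cloned to `~/cuda-samples` and already built with `make` as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a sample's output on stdin and succeeds only if
# some line starts with "Result = PASS".
sample_passed() {
    grep -q '^Result = PASS'
}

# Usage, assuming both samples were built in the directories used above:
#   for s in deviceQuery bandwidthTest; do
#       dir="$HOME/cuda-samples/Samples/1_Utilities/$s"
#       if "$dir/$s" | sample_passed; then echo "$s: PASS"; else echo "$s: FAIL"; fi
#   done
```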
