[4] TensorFlow@Docker + GPU
March 13, 2022
Install TensorFlow, a machine learning library.
In this example, pull the official TensorFlow Docker image with GPU support and run it in a container.
[1] Install the NVIDIA Container Toolkit, refer to here.
[2] Pull and use the TensorFlow Docker image (GPU) as the root user.
If you want regular users to run it, refer to section [4].
# pull the TensorFlow 2.4 container image
[root@vlsr05 ~]# podman pull tensorflow/tensorflow:2.4.1-gpu
[root@vlsr05 ~]# podman images
REPOSITORY                       TAG          IMAGE ID      CREATED        SIZE
docker.io/nvidia/cuda            11.4.0-base  13cf6a46b953  8 months ago   129 MB
docker.io/tensorflow/tensorflow  2.4.1-gpu    edb49f6a133b  13 months ago  5.55 GB

# run [nvidia-smi] to verify that it works
[root@vlsr05 ~]# podman run -e NVIDIA_VISIBLE_DEVICES=all --rm tensorflow:2.4.1-gpu nvidia-smi
Sat Mar 12 16:29:33 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 36%   37C    P0    N/A /  75W |      0MiB /  2048MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# run TensorFlow to verify that it works
[root@vlsr05 ~]# podman run -e NVIDIA_VISIBLE_DEVICES=all --rm tensorflow:2.4.1-gpu python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2022-03-12 16:40:30.011147: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-03-12 16:40:31.249456: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-03-12 16:40:31.250170: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-03-12 16:40:31.494350: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.494502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 5 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 104.43GiB/s
2022-03-12 16:40:31.494525: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-03-12 16:40:31.496518: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2022-03-12 16:40:31.496582: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2022-03-12 16:40:31.497457: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-03-12 16:40:31.497728: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-03-12 16:40:31.499870: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-03-12 16:40:31.500441: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2022-03-12 16:40:31.500649: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-03-12 16:40:31.500751: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.500913: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.501017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-03-12 16:40:31.501293: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-12 16:40:31.501488: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-03-12 16:40:31.501595: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.501719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 5 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 104.43GiB/s
2022-03-12 16:40:31.501740: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-03-12 16:40:31.501767: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2022-03-12 16:40:31.501784: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2022-03-12 16:40:31.501799: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-03-12 16:40:31.501814: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-03-12 16:40:31.501829: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-03-12 16:40:31.501844: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2022-03-12 16:40:31.501859: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-03-12 16:40:31.501926: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.502067: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.502168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-03-12 16:40:31.502199: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-03-12 16:40:31.871299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-12 16:40:31.871332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2022-03-12 16:40:31.871341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2022-03-12 16:40:31.871522: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.871758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.871933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 16:40:31.872073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1619 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
tf.Tensor(-363.40094, shape=(), dtype=float32)
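In the long log above, the line that actually confirms GPU use is the final "Created TensorFlow device" entry. The following is a minimal sketch for pulling that line out of captured container output. The function name find_gpu_devices is hypothetical, and the pattern is only matched against the TensorFlow 2.4.1 message format shown above; other versions may word it differently.

```python
import re

# Pattern for the summary line printed by TensorFlow 2.4.1 in the log above.
# Assumption: other TensorFlow versions may format this message differently.
_DEVICE_RE = re.compile(
    r"Created TensorFlow device \((?P<tf_name>\S+) with (?P<mem_mb>\d+) MB memory\)"
    r" -> physical GPU \(device: (?P<index>\d+), name: (?P<name>[^,]+),"
)


def find_gpu_devices(log_text):
    """Return one dict per 'Created TensorFlow device' line found in log_text."""
    devices = []
    for match in _DEVICE_RE.finditer(log_text):
        devices.append({
            "tf_name": match.group("tf_name"),
            "memory_mb": int(match.group("mem_mb")),
            "index": int(match.group("index")),
            "name": match.group("name").strip(),
        })
    return devices


# The sample below is the last log line from the run above.
SAMPLE_LINE = (
    "2022-03-12 16:40:31.872073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] "
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 "
    "with 1619 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050, "
    "pci bus id: 0000:01:00.0, compute capability: 6.1)"
)

for device in find_gpu_devices(SAMPLE_LINE):
    print(device["tf_name"], device["name"], device["memory_mb"])
```

An empty result means TensorFlow did not create a GPU device in that run, which usually points back to the toolkit installation in step [1].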
[3] If SELinux is enabled, change its policy.
[root@vlsr05 ~]# mcedit my-python.te
# create new
module my-python 1.0;

require {
        type container_t;
        type xserver_misc_device_t;
        type device_t;
        class chr_file { getattr ioctl map open read write };
}

#============= container_t ==============
allow container_t device_t:chr_file map;
allow container_t device_t:chr_file { getattr ioctl open read write };
allow container_t xserver_misc_device_t:chr_file map;

[root@vlsr05 ~]# checkmodule -m -M -o my-python.mod my-python.te
[root@vlsr05 ~]# semodule_package --outfile my-python.pp --module my-python.mod
[root@vlsr05 ~]# semodule -i my-python.pp
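The policy module above is only needed when SELinux is active. As a rough sketch, the current mode can be checked from Python before deciding whether to build the module. The helper name selinux_mode is hypothetical; it relies on the standard getenforce utility and returns None on hosts where that utility is not installed.

```python
import shutil
import subprocess


def selinux_mode():
    """Return the getenforce output ('Enforcing', 'Permissive' or 'Disabled').

    Returns None when the getenforce utility is not available on this host.
    """
    executable = shutil.which("getenforce")
    if executable is None:
        return None
    result = subprocess.run([executable], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


mode = selinux_mode()
if mode == "Enforcing":
    print("SELinux is enforcing: the policy module from step [3] applies.")
else:
    print("SELinux mode:", mode)
```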
[4] To let regular users run CUDA and TensorFlow containers, change the settings as follows.
[root@vlsr05 ~]# mcedit /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
# uncomment and change to [true]
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"

# log in as a regular user and run a container to verify
[user01@vlsr05 ~]$ podman pull tensorflow/tensorflow:2.4.1-gpu
[user01@vlsr05 ~]$ podman images
REPOSITORY                       TAG        IMAGE ID      CREATED        SIZE
docker.io/tensorflow/tensorflow  2.4.1-gpu  edb49f6a133b  13 months ago  5.55 GB

# run [nvidia-smi] to verify that it works
[user01@vlsr05 ~]$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ tensorflow:2.4.1-gpu /usr/bin/nvidia-smi
Sat Mar 12 17:09:13 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 36%   37C    P0    N/A /  75W |      0MiB /  2048MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# run a Hello World program to verify that the container works
[user01@vlsr05 ~]$ podman run -e NVIDIA_VISIBLE_DEVICES=all --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ tensorflow:2.4.1-gpu python3 -c "import tensorflow as tf; hello = tf.constant('Hello, TensorFlow World'); tf.print(hello)"
2022-03-12 17:12:02.954811: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-03-12 17:12:04.243748: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-03-12 17:12:04.244530: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-03-12 17:12:04.514011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.514161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 5 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 104.43GiB/s
2022-03-12 17:12:04.514191: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-03-12 17:12:04.516306: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2022-03-12 17:12:04.516536: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2022-03-12 17:12:04.517440: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-03-12 17:12:04.517713: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-03-12 17:12:04.519853: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-03-12 17:12:04.520428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2022-03-12 17:12:04.520613: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-03-12 17:12:04.520703: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.520906: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.521006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-03-12 17:12:04.521302: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-12 17:12:04.521511: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-03-12 17:12:04.521634: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.521770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 5 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 104.43GiB/s
2022-03-12 17:12:04.521791: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-03-12 17:12:04.521814: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2022-03-12 17:12:04.521828: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2022-03-12 17:12:04.521842: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-03-12 17:12:04.521857: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-03-12 17:12:04.521872: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-03-12 17:12:04.521887: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2022-03-12 17:12:04.521902: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-03-12 17:12:04.521977: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.522113: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.522198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2022-03-12 17:12:04.522222: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-03-12 17:12:04.894981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-12 17:12:04.895018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2022-03-12 17:12:04.895030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2022-03-12 17:12:04.895212: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.895517: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.895721: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-12 17:12:04.895858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1619 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Hello, TensorFlow World
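The rootless setup in step [4] depends on a single line in /etc/nvidia-container-runtime/config.toml: an uncommented no-cgroups = true. If a regular user's container cannot see the GPU, that line is worth checking first. Below is a minimal sketch of such a check. The function name no_cgroups_enabled is hypothetical, and a simple line-based match is used in place of a full TOML parser, which is an assumption that holds for the flat file shown above.

```python
import re

# An active (uncommented) "no-cgroups = <bool>" line.
# Lines starting with '#' are comments and are deliberately not matched.
_NO_CGROUPS_RE = re.compile(r"^[ \t]*no-cgroups[ \t]*=[ \t]*(true|false)\b", re.MULTILINE)


def no_cgroups_enabled(config_text):
    """Return True only if an uncommented 'no-cgroups = true' line is present."""
    match = _NO_CGROUPS_RE.search(config_text)
    return match is not None and match.group(1) == "true"


# Excerpt of the edited file from step [4].
EDITED = """\
[nvidia-container-cli]
environment = []
load-kmods = true
no-cgroups = true
#user = "root:video"
"""

# The same excerpt with the setting still commented out.
UNEDITED = EDITED.replace("no-cgroups = true", "#no-cgroups = false")

print(no_cgroups_enabled(EDITED))
print(no_cgroups_enabled(UNEDITED))
```

On a real host, the text would come from reading /etc/nvidia-container-runtime/config.toml instead of the inline excerpts.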