Tuesday, January 4, 2022

[SOLVED] W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries

Issue

How should I fix this in CentOS 7?

[jalal@goku ~]$ pip freeze | grep tensorflow
tensorflow-estimator==2.2.0
tensorflow-gpu==2.2.0
[jalal@goku ~]$ python
Python 3.8.5 (default, Mar 31 2021, 02:37:07) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-06-07 23:50:07.811271: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-07 23:50:07.867796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-07 23:50:07.869403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:06:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-07 23:50:07.870136: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:
2021-06-07 23:50:07.874249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-06-07 23:50:07.877819: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-06-07 23:50:07.878745: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-06-07 23:50:07.882687: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-06-07 23:50:07.884788: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-06-07 23:50:07.890952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-07 23:50:07.891011: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Num GPUs Available:  0

This is despite having two GPUs: enter image description here

[jalal@goku ~]$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.9.2009 (Core)
Release:    7.9.2009
Codename:   Core

also,

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

I tried the following as suggested by https://github.com/tensorflow/tensorflow/issues/38194#issuecomment-629801937 and didn't work:

[jalal@goku djrn]$ ls /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
lrwxrwxrwx. 1 root root 20 Sep 21  2020 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 -> libcudart.so.10.2.89
[jalal@goku djrn]$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/x86_64-linux-gnu/libcudart.so.10.1
[sudo] password for jalal: 
ln: failed to create symbolic link ‘/usr/lib/x86_64-linux-gnu/libcudart.so.10.1’: No such file or directory

since:

ls: cannot access /usr/lib/x86_64-linux-gnu: No such file or directory

To be specific, I need tensforflow to work with CUDA 10.2, I am fine with any version of tensorflow (preference is tensorflow 2+), however couldn't find a version that works with CUDA 10.2. https://www.tensorflow.org/install/source#tested_build_configurations

Also, based on this, my CUDA version is 10.2 which is different from both nvidia-smi and nvcc --version versions:

$ stat /usr/local/cuda
  File: ‘/usr/local/cuda’ -> ‘/usr/local/cuda-10.2’
  Size: 20          Blocks: 0          IO Block: 4096   symbolic link
Device: fd00h/64768d    Inode: 67157410    Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:usr_t:s0
Access: 2021-05-20 10:43:06.864530636 -0400
Modify: 2020-09-21 09:39:18.559883390 -0400
Change: 2020-09-21 09:39:18.559883390 -0400
 Birth: -

P.S.: I have made my virtual environment using python venv command and don't want to use conda or pyenv.

P.P.S.: I did this soft link and still doesn't work:

(djrn) [jalal@goku djrn]$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/libcudart.so.10.1
[sudo] password for jalal: 
(djrn) [jalal@goku djrn]$ ls /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
lrwxrwxrwx. 1 root root 20 Sep 21  2020 /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 -> libcudart.so.10.2.89
(djrn) [jalal@goku djrn]$ python
Python 3.8.5 (default, Mar 31 2021, 02:37:07) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-06-08 01:40:39.152040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-08 01:40:39.401399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-08 01:40:39.403106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:06:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-08 01:40:39.403438: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64:
2021-06-08 01:40:39.406985: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-06-08 01:40:39.410320: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-06-08 01:40:39.410912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-06-08 01:40:39.414628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-06-08 01:40:39.416297: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-06-08 01:40:39.422208: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-08 01:40:39.422260: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Num GPUs Available:  0
>>> 

Solution

Credits to jonno_FTW

$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/x86_64-linux-gnu/libcudart.so.10.1
$ export LD_LIBRARY_PATH=/usr/lib

solved the problem. Now I see the following as output:

(djrn) [jalal@goku djrn]$ python
Python 3.8.5 (default, Mar 31 2021, 02:37:07) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-06-08 01:45:59.138197: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-06-08 01:45:59.191833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-08 01:45:59.193773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:06:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-06-08 01:45:59.194216: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-06-08 01:45:59.197372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-06-08 01:45:59.200555: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-06-08 01:45:59.201078: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-06-08 01:45:59.204664: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-06-08 01:45:59.206295: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-06-08 01:45:59.212072: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-06-08 01:45:59.217509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
Num GPUs Available:  2


Answered By - Mona Jalal