OpenCV CUDA on 🛞s
I think we all know OpenCV - it’s a pretty old yet widely used and performant computer vision library with a lot of useful algorithms. One of its advantages is that you can configure the build however you want to get a really nice performance gain. One option is to compile it to run matrix math on GPUs, which speeds up both “old” CV filters (which are essentially convolutions) and deep learning inference (yes, you can do that in OpenCV with the special module added).
Preparing the environment with GPUs and CUDA
Here I use a machine with GPUs and Ubuntu as the base OS with CUDA drivers installed (I don’t want to put CUDA installation instructions here, but I suggest just using the “official” NVIDIA docker images).
It’s pretty common nowadays to use relatively cheap VMs with NVIDIA A10G GPUs both for training and for inference in prod. So to build our libs, I’ve used a g5.xlarge AWS EC2 instance with Ubuntu 20.04 as a base image, Python 3.8 and CUDA 11.6 - you can use any similar setup.
Why Wheel
A Python wheel is the standard of Python package distribution (Google’s definition). Essentially, it’s just a zip archive with all the files needed to install the package: Python code, *.pyc byte code and compiled platform-specific native shared libraries (e.g. *.so).
The core advantage that makes us care about wheels in the ML domain: no need to compile anything during installation, since the wheel already contains the compiled extension modules. This matters because almost all of the libraries we use are essentially Python bindings for lower-level code in C/C++/Fortran.
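To see that a wheel really is just a zip, here is a small sketch using only the standard library: it assembles a toy wheel-like archive in memory (all file and package names here are made up for illustration) and lists its contents, much like `unzip -l some.whl` would.

```python
import io
import zipfile

# Build a toy wheel-like archive in memory: a real wheel is just a zip
# holding python sources, optional byte code, native *.so extensions
# and a *.dist-info metadata folder.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as whl:
    whl.writestr("mypkg/__init__.py", "answer = 42\n")
    whl.writestr("mypkg/_native.so", b"\x7fELF")  # stand-in for a compiled extension
    whl.writestr("mypkg-1.0.dist-info/METADATA", "Name: mypkg\nVersion: 1.0\n")

# Installing a wheel is, roughly, unpacking this archive into site-packages
with zipfile.ZipFile(buf) as whl:
    names = whl.namelist()

print(names)
```

That `_native.so` entry is exactly where a pre-compiled extension module ends up, which is why no compiler is needed at install time.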
Ideally, a library’s developers should publish pre-built wheels for the specific platforms somewhere public, so everybody could just pip install them. But that’s not the case for a lot of important libraries, unfortunately ;(
It becomes especially important when we want to build libs with CUDA support.
Also, some libraries may be very slow to build on CI workers inside special containers, like triton. So, obviously, installing a pre-built wheel with a plain pip install saves a lot of time.
Compile
First, log in to the build machine and install the OS dependencies:
apt-get update -y && apt-get upgrade -y
apt-get install -y \
build-essential git cmake \
unzip pkg-config wget \
libavcodec-dev libavformat-dev libswscale-dev \
libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev \
libgtk-3-dev libpng-dev libjpeg-dev \
libopenexr-dev libtiff-dev libwebp-dev \
libv4l-dev libxvidcore-dev libx264-dev \
libgtk-3-dev libatlas-base-dev gfortran
Then, you need to pull the opencv-python repo and follow the instructions for the manual build. During the build, you need to provide cmake flags that suit your use case. In this particular case, we want CUDA support and the extra contrib modules (ENABLE_CONTRIB) with neural network support to be compiled. Here is the script:
cd opencv-python
rm -rf build && mkdir -p build
pip install numpy==1.23.4
export CMAKE_ARGS="-D CMAKE_BUILD_TYPE=RELEASE -D INSTALL_PYTHON_EXAMPLES=OFF -D INSTALL_C_EXAMPLES=OFF -D OPENCV_ENABLE_NONFREE=ON -D WITH_CUDA=ON -D WITH_CUDNN=ON -D OPENCV_DNN_CUDA=ON -D ENABLE_FAST_MATH=1 -D CUDA_FAST_MATH=1 -D CUDA_ARCH_BIN=8.6 -D WITH_CUBLAS=1 -D BUILD_EXAMPLES=OFF"
export ENABLE_CONTRIB=1
pip wheel . --verbose -w dist
It could take from 30 minutes to a couple of hours to compile. Be patient.
If everything works, you’ll have a precious wheel archive in dist:
opencv_contrib_python-4.7.0.77db6ba-cp38-cp38-linux_x86_64.whl
Install and add runtime dependencies
Now, you can just install the built wheel with pip:
pip install opencv_contrib_python-4.7.0.77db6ba-cp38-cp38-linux_x86_64.whl
After that, open a Python interpreter and try to import cv2 - most probably you’ll see errors about some *.so files that could not be found:
ImportError: libhdf5_serial.so.100: cannot open shared object file: No such file or directory
That’s because OpenCV relies on lots of shared libraries that should be installed at the OS level, and you don’t have them on your system yet.
In order to debug that issue, you’ll need two tools: ldd and apt-file. ldd is present on most Linux distributions and prints out all the shared object dependencies of a binary. apt-file is a utility that finds the apt package providing a file that matches a given string pattern. You can install it and update its cache like this:
sudo apt install apt-file && apt-file update
So first you run ldd against the OpenCV shared object, which should be located somewhere around here:
ldd /usr/local/lib/python3.X/dist-packages/cv2*.so
The output of ldd could look like this:
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f60a55bf000)
...
libhdf5_serial.so.100 => not found
So you’re interested in the “not found” lines (to keep it simple, just combine ldd with grep):
ldd cv2*.so | grep "not found"
And then apt-file can be used to find the packages those object files belong to:
apt-file search libhdf5_serial.so.100
It will output a list of apt packages (there could be duplicates - that’s normal):
libhdf5-100: /usr/lib/arm-linux-gnueabihf/libhdf5_serial.so.100
libhdf5-100: /usr/lib/arm-linux-gnueabihf/libhdf5_serial.so.100.0.1
So you just install that libhdf5-100 package with apt install libhdf5-100, and the library should be found next time you run ldd!
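By the way, the ldd-plus-apt-file cycle is easy to script. Here is a sketch: it parses sample ldd output (hardcoded for illustration) to pull out the unresolved libraries; in practice you would pipe the real ldd output into the same awk filter and feed each name to apt-file search.

```shell
# Sample ldd output - in practice you would capture this with:
#   ldd /usr/local/lib/python3.X/dist-packages/cv2*.so
ldd_out='libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f60a55bf000)
libhdf5_serial.so.100 => not found
libdc1394.so.22 => not found'

# Keep only the unresolved library names
missing=$(printf '%s\n' "$ldd_out" | awk '/not found/ {print $1}')
echo "$missing"

# Then look each one up (requires apt-file to be installed):
# for lib in $missing; do apt-file search "$lib"; done
```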
Repeat that for all the “not found” dependencies and OpenCV will finally work:
>>> import cv2
>>> cv2.__version__
'4.7.0'
>>> print(dir(cv2.cuda))
[... 'ORB', 'ORB_create', 'OpticalFlowDual_TVL1', 'OpticalFlowDual_TVL1_create', 'SHARED_ATOMICS', 'SURF_CUDA',...]
After doing all that, you’ll end up with a list of OS dependencies that should be installed in the container alongside the built wheel to use it in your apps - so just add them to your Dockerfile. Here are the real missing deps for the tritonserver container:
RUN apt-get update -y && apt-get install -y \
libhdf5-103 \
libgtk-3-0 \
libdc1394-22 \
libgstreamer-plugins-base1.0-0 \
libavcodec58 \
libavformat58 \
libswscale5
Test inference
Let’s imagine that we downloaded the TensorFlow pre-trained FSRCNN model and placed it in the same folder as the run script:
import os

import cv2
import numpy as np
from cv2 import dnn_superres

# Model file is expected next to this script
base_path = os.path.dirname(os.path.abspath(__file__))
model_path = os.path.join(base_path, "FSRCNN_x2.pb")

net = dnn_superres.DnnSuperResImpl_create()
net.readModel(model_path)

# Run inference on the GPU via the CUDA dnn backend
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
net.setModel("fsrcnn", 2)  # model name and upscale factor

test_img = (np.random.random((512, 512, 3)) * 255).astype(np.uint8)
result = net.upsample(test_img)
print(test_img.shape, result.shape)
Expected output:
(512, 512, 3) (1024, 1024, 3)
Congrats, it works!