This is a follow-up to https://hackweek.suse.com/projects/architecting-a-machine-learning-project-with-suse-caasp.
During the last Hack Week I learned that, in order to run machine learning workflows on top of SUSE CaaSP, the missing piece is to have libnvidia-container and nvidia-container-runtime-hook packaged.
Since then, NVIDIA has added a Leap 15 build to libnvidia-container and nvidia-container-runtime.
However, neither of them has been released into the libnvidia-container repository or into the nvidia-container-runtime repository.
This project is about packaging those two projects in the openSUSE Build Service for openSUSE Leap 15.1.
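To give an idea of the workflow, here is a minimal sketch of how the packaging could be driven with osc, the OBS command-line client. The project, package, repository, and file names below are placeholders, not the final locations:

    # Check out a (hypothetical) home project and create the package
    osc checkout home:USER:nvidia_container
    cd home:USER:nvidia_container
    osc mkpac libnvidia-container

    # Add the spec file and sources, then test the build locally for Leap 15.1
    cd libnvidia-container
    osc add libnvidia-container.spec libnvidia-container-VERSION.tar.gz
    osc build openSUSE_Leap_15.1 x86_64

    # Commit so the build service builds and publishes the RPM
    osc commit -m "Initial libnvidia-container package for openSUSE Leap 15.1"

The same steps would apply to nvidia-container-runtime.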
This project is part of Hack Week 19.
Activity
Comments
- over 5 years ago by jordimassaguerpla
First package ready: https://build.opensuse.org/package/show/home:jordimassaguerpla:nvidia_container/libnvidia-container
And a pull request to upstream: https://github.com/NVIDIA/libnvidia-container/pull/77
- over 5 years ago by jordimassaguerpla
Second package ready: https://build.opensuse.org/package/show/home:jordimassaguerpla:nvidia_container/nvidia-container-runtime-toolkit
- over 5 years ago by jordimassaguerpla
Proof that this worked:
On a workstation with a Quadro K2000 running SLE 15 SP1:
Installing the NVIDIA graphics driver kernel module

    zypper ar https://download.nvidia.com/suse/sle15sp1/ nvidia
    zypper ref
    zypper install nvidia-gfxG05-kmp-default
    modprobe nvidia
    lsmod | grep nvidia

Expected output:

    nvidia_drm             49152  0
    nvidia_modeset       1114112  1 nvidia_drm
    drm_kms_helper        204800  1 nvidia_drm
    drm                   536576  3 nvidia_drm,drm_kms_helper
    nvidia_uvm           1036288  0
    nvidia              20414464  2 nvidia_modeset,nvidia_uvm
    ipmi_msghandler       110592  2 nvidia,ipmi_devintf

Installing the NVIDIA driver for computing with GPUs using CUDA

    zypper install nvidia-computeG05
    nvidia-smi

Expected output:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
    | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

Installing libnvidia-container

    zypper ar https://download.opensuse.org/repositories/home:/jordimassaguerpla:/nvidia_container/SLE_15_SP1/ nvidia_container
    zypper install libnvidia-container
    usermod -G root USER
    usermod -G video USER

USER should be a user in your system which is not root.

    su - USER -c nvidia-container-cli info

Expected output:

    NVRM version:   440.59
    CUDA version:   10.2

    Device Index:   0
    Device Minor:   0
    Model:          Quadro K2000
    Brand:          Quadro
    GPU UUID:       GPU-6a04b812-c20e-aeb6-9047-6382930eef7d
    Bus Location:   00000000:05:00.0
    Architecture:   3.0

> NOTE: we need to use a user other than root for this test because the root user does not run with the video group by default. We will fix this later when installing the toolkit. If you use root, you will see this error message:

    nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected

Installing nvidia-container-toolkit

    zypper install nvidia-container-toolkit

Test with podman

    zypper install podman podman-cni-config
    podman run nvidia/cuda nvidia-smi

Expected output:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
    | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

So it works!
- over 5 years ago by jordimassaguerpla
As a result, I updated the docs: https://github.com/jordimassaguerpla/SUSEhackweek18/commit/5fca6c12034b4df34c403f14276be754e809b086#diff-2df0241dfedf44f37dcafae751ab29ae
- over 5 years ago by jordimassaguerpla
The previous link got broken ... damn markdown ;) docs
- over 5 years ago by jordimassaguerpla
Upstream (NVIDIA) uses Dockerfiles to build the packages for the other distros.
Here is a small experiment of building the openSUSE Leap RPM with a Dockerfile within OBS:
https://build.opensuse.org/package/show/home:jordimassaguerpla:branches:openSUSE:Templates:Images:15.1/libnvidia-containers
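For illustration, here is a minimal sketch of what such a Dockerfile could look like. The base image, build dependencies, and paths are assumptions, not the exact setup used in the experiment above:

    # Hypothetical Dockerfile: build the libnvidia-container RPM inside a Leap 15.1 container
    FROM opensuse/leap:15.1

    # Install the build tooling (the dependency list is a guess)
    RUN zypper --non-interactive install rpm-build gcc make libcap-devel libseccomp-devel

    # Place the spec file and sources in the default SUSE rpmbuild tree
    COPY libnvidia-container.spec /usr/src/packages/SPECS/
    COPY *.tar.gz /usr/src/packages/SOURCES/

    # Build the RPM; the result stays inside the image under /usr/src/packages/RPMS
    RUN rpmbuild -ba /usr/src/packages/SPECS/libnvidia-container.spec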
- over 5 years ago by jordimassaguerpla
Result of the experiment: using a Dockerfile works very well, because you can develop and debug with "docker build" and then commit that to OBS to get a build in a central location, store the sources, and so on.
The issue is that the result is an image; it can't be the RPM itself, since there is no "-v" option to mount a volume during the build. Thus, even though you can build the image in OBS, you then have to run the image to extract the RPM, as sketched below.
(OBS = build.opensuse.org)
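As a rough illustration of that workaround, here is a sketch of extracting the RPMs from the built image on a local machine with Docker; the image name and paths are placeholders matching the hypothetical Dockerfile above:

    # Build the image locally from the Dockerfile
    docker build -t libnvidia-container-build .

    # Create a stopped container from the image and copy the RPMs out of it
    docker create --name extract libnvidia-container-build
    docker cp extract:/usr/src/packages/RPMS ./RPMS
    docker rm extract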
Similar Projects
Technical talks at universities by agamez
Description
This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.
For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.
Goals
- Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
- Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
- Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.
Resources
- Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
- SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.