This is a follow-up to https://hackweek.suse.com/projects/architecting-a-machine-learning-project-with-suse-caasp.
During the last Hack Week I learned that, in order to run machine learning workflows on top of SUSE CaaSP, the missing piece is to have libnvidia-container and nvidia-container-runtime-hook packaged.
Since then, NVIDIA has added a Leap 15 build to libnvidia-container and nvidia-container-runtime.
However, neither of them has been released into the libnvidia-container repository or into the nvidia-container-runtime repository.
This project is about packaging those two projects in the openSUSE Build Service for openSUSE Leap 15.1.
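To give an idea of the workflow, here is a minimal sketch of how the packaging could be driven with osc, the OBS command-line client. The project, package, repository, and file names below are placeholders, not the final locations:

    # Check out a (hypothetical) home project and create the package
    osc checkout home:USER:nvidia_container
    cd home:USER:nvidia_container
    osc mkpac libnvidia-container

    # Add the spec file and sources, then test the build locally for Leap 15.1
    cd libnvidia-container
    osc add libnvidia-container.spec libnvidia-container-VERSION.tar.gz
    osc build openSUSE_Leap_15.1 x86_64

    # Commit so the build service builds and publishes the RPM
    osc commit -m "Initial libnvidia-container package for openSUSE Leap 15.1"

The same steps would apply to nvidia-container-runtime.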
This project is part of Hack Week 19.
Activity
Comments
- over 5 years ago by jordimassaguerpla
First package ready: https://build.opensuse.org/package/show/home:jordimassaguerpla:nvidia_container/libnvidia-container
And a pull request to upstream: https://github.com/NVIDIA/libnvidia-container/pull/77
- over 5 years ago by jordimassaguerpla
Second package ready: https://build.opensuse.org/package/show/home:jordimassaguerpla:nvidia_container/nvidia-container-runtime-toolkit
- over 5 years ago by jordimassaguerpla
Proof that this worked:
On a workstation with a Quadro K2000 running SLE 15 SP1:
Installing the NVIDIA graphics driver kernel module

    zypper ar https://download.nvidia.com/suse/sle15sp1/ nvidia
    zypper ref
    zypper install nvidia-gfxG05-kmp-default
    modprobe nvidia
    lsmod | grep nvidia

Expected output:

    nvidia_drm             49152  0
    nvidia_modeset       1114112  1 nvidia_drm
    drm_kms_helper        204800  1 nvidia_drm
    drm                   536576  3 nvidia_drm,drm_kms_helper
    nvidia_uvm           1036288  0
    nvidia              20414464  2 nvidia_modeset,nvidia_uvm
    ipmi_msghandler       110592  2 nvidia,ipmi_devintf

Installing the NVIDIA driver for computing with GPUs using CUDA

    zypper install nvidia-computeG05
    nvidia-smi

Expected output:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
    | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

Installing libnvidia-container

    zypper ar https://download.opensuse.org/repositories/home:/jordimassaguerpla:/nvidia_container/SLE_15_SP1/ nvidia_container
    zypper install libnvidia-container
    usermod -G root USER
    usermod -G video USER

USER should be a user in your system which is not root.

    su - USER -c nvidia-container-cli info

Expected output:

    NVRM version:   440.59
    CUDA version:   10.2

    Device Index:   0
    Device Minor:   0
    Model:          Quadro K2000
    Brand:          Quadro
    GPU UUID:       GPU-6a04b812-c20e-aeb6-9047-6382930eef7d
    Bus Location:   00000000:05:00.0
    Architecture:   3.0

> NOTE: we need to use a user other than root for this test because the root user does not run with the video group by default. We will fix this later when installing the toolkit. If you use root, you will see this error message:

    nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected

Installing nvidia-container-toolkit

    zypper install nvidia-container-toolkit

Test with podman

    zypper install podman podman-cni-config
    podman run nvidia/cuda nvidia-smi

Expected output:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
    | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

So it works!
- over 5 years ago by jordimassaguerpla
As a result, I updated the docs: https://github.com/jordimassaguerpla/SUSEhackweek18/commit/5fca6c12034b4df34c403f14276be754e809b086#diff-2df0241dfedf44f37dcafae751ab29ae
- over 5 years ago by jordimassaguerpla
The previous link got broken ... damn markdown ;) docs
- over 5 years ago by jordimassaguerpla
Upstream (NVIDIA) uses Dockerfiles to build the packages for the other distros.
Here is a small experiment of building the openSUSE Leap RPM with a Dockerfile within OBS:
https://build.opensuse.org/package/show/home:jordimassaguerpla:branches:openSUSE:Templates:Images:15.1/libnvidia-containers
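For illustration, here is a minimal sketch of what such a Dockerfile could look like. The base image, build dependencies, and paths are assumptions, not the exact setup used in the experiment above:

    # Hypothetical Dockerfile: build the libnvidia-container RPM inside a Leap 15.1 container
    FROM opensuse/leap:15.1

    # Install the build tooling (the dependency list is a guess)
    RUN zypper --non-interactive install rpm-build gcc make libcap-devel libseccomp-devel

    # Place the spec file and sources in the default SUSE rpmbuild tree
    COPY libnvidia-container.spec /usr/src/packages/SPECS/
    COPY *.tar.gz /usr/src/packages/SOURCES/

    # Build the RPM; the result stays inside the image under /usr/src/packages/RPMS
    RUN rpmbuild -ba /usr/src/packages/SPECS/libnvidia-container.spec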
- over 5 years ago by jordimassaguerpla
Result of the experiment: using a Dockerfile works very well, because you can develop and debug with "docker build" and then commit that to OBS to get a build in a central location, store the sources, and so on.
The issue is that the result is an image; it can't be the RPM itself, since there is no "-v" option to mount a volume during the build. Thus, even though you can build the image in OBS, you then have to run the image to extract the RPM, as sketched below.
(OBS = build.opensuse.org)
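As a rough illustration of that workaround, here is a sketch of extracting the RPMs from the built image on a local machine with Docker; the image name and paths are placeholders matching the hypothetical Dockerfile above:

    # Build the image locally from the Dockerfile
    docker build -t libnvidia-container-build .

    # Create a stopped container from the image and copy the RPMs out of it
    docker create --name extract libnvidia-container-build
    docker cp extract:/usr/src/packages/RPMS ./RPMS
    docker rm extract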
Similar Projects
Technical talks at universities by agamez
Description
This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.
For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.
Goals
- Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
- Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
- Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.
Resources
- Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
- SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.