This is a follow up to https://hackweek.suse.com/projects/architecting-a-machine-learning-project-with-suse-caasp.

During the last Hack Week I learned that, in order to run machine learning workloads on top of SUSE CaaSP, the missing piece is having libnvidia-container and nvidia-container-runtime-hook packaged.

Since then, NVIDIA has added Leap 15 builds to libnvidia-container and nvidia-container-runtime.

However, neither of them has been released into the libnvidia-container repository or the nvidia-container-runtime repository.

This project is about packaging those two projects in the openSUSE Build Service for openSUSE Leap 15.1.
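The packaging itself would presumably follow the usual OBS workflow: branch or create the packages, build locally, then commit. A dry-run sketch, where the project name `home:example:nvidia_container` is a placeholder assumption (each command is echoed rather than executed, so it can be reviewed first; drop the `echo` to run it for real with osc credentials configured):

```shell
# Dry-run sketch of the OBS packaging workflow for the two packages.
# NOTE: the project name is a placeholder assumption, not a real OBS project.
prj=home:example:nvidia_container
for pkg in libnvidia-container nvidia-container-runtime; do
  echo "osc checkout $prj $pkg"
  echo "cd $prj/$pkg && osc build openSUSE_Leap_15.1 x86_64"
done
```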

Looking for hackers with the skills:

nvidia, machinelearning, containers

This project is part of:

Hack Week 19

Activity

  • about 4 years ago: drdavis liked this project.
  • about 4 years ago: afesta liked this project.
  • about 4 years ago: jordimassaguerpla added keyword "nvidia" to this project.
  • about 4 years ago: jordimassaguerpla added keyword "machinelearning" to this project.
  • about 4 years ago: jordimassaguerpla added keyword "containers" to this project.
  • about 4 years ago: a_faerber liked this project.
  • about 4 years ago: jordimassaguerpla started this project.
  • about 4 years ago: jordimassaguerpla originated this project.

  Comments

    • jordimassaguerpla
      about 4 years ago by jordimassaguerpla | Reply

      Proof that this worked:

      On a workstation with a Quadro K2000 running SLE 15 SP1:

      Installing the NVIDIA graphics driver kernel module

      ```
      zypper ar https://download.nvidia.com/suse/sle15sp1/ nvidia
      zypper ref
      zypper install nvidia-gfxG05-kmp-default
      modprobe nvidia
      lsmod | grep nvidia
      ```

      Expected output:

      ```
      nvidia_drm             49152  0
      nvidia_modeset       1114112  1 nvidia_drm
      drm_kms_helper        204800  1 nvidia_drm
      drm                   536576  3 nvidia_drm,drm_kms_helper
      nvidia_uvm           1036288  0
      nvidia              20414464  2 nvidia_modeset,nvidia_uvm
      ipmi_msghandler       110592  2 nvidia,ipmi_devintf
      ```

      Installing NVIDIA driver for computing with GPUs using CUDA

      ```
      zypper install nvidia-computeG05
      nvidia-smi
      ```

      Expected output:

      ```
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
      | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+

      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+
      ```

      Installing libnvidia-container

      ```
      zypper ar https://download.opensuse.org/repositories/home:/jordimassaguerpla:/nvidia_container/SLE_15_SP1/ nvidia_container
      zypper install libnvidia-container
      usermod -aG root USER
      usermod -aG video USER
      ```

      (Note: `-a` is needed so the second usermod appends to, rather than replaces, the supplementary group list set by the first.)

      USER should be an existing non-root user on your system.

      ```
      su - USER -c "nvidia-container-cli info"
      ```

      Expected output:

      ```
      NVRM version:   440.59
      CUDA version:   10.2

      Device Index:   0
      Device Minor:   0
      Model:          Quadro K2000
      Brand:          Quadro
      GPU UUID:       GPU-6a04b812-c20e-aeb6-9047-6382930eef7d
      Bus Location:   00000000:05:00.0
      Architecture:   3.0
      ```

      NOTE: we need to use a non-root user for this test because the root user does not run with the video group by default. We will fix this later when installing the toolkit. If you run it as root, you will see this error message:

      ```
      nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected
      ```
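A quick way to sanity-check the group membership before running the CLI test; `has_group` here is a hypothetical helper, not part of any NVIDIA tooling:

```shell
# Sketch: check a user's supplementary groups for "video" before running
# nvidia-container-cli. "has_group" is a hypothetical helper.
has_group() {
  # $1 = space-separated group list, $2 = group name to look for
  echo "$1" | tr ' ' '\n' | grep -qx -- "$2"
}

user=${USER:-$(id -un)}
groups_of_user=$(id -nG "$user")
if has_group "$groups_of_user" video; then
  echo "video group present for $user"
else
  echo "missing video group: run usermod -aG video $user"
fi
```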

      Installing nvidia-container-toolkit

      ```
      zypper install nvidia-container-toolkit
      ```
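Before moving on to podman, one way to verify the toolkit install is to check that it dropped an OCI hook into podman's standard hooks directory. The exact hook filename is an assumption here, so the sketch just globs for `nvidia`:

```shell
# Sketch: look for the NVIDIA OCI hook in podman's standard hooks directory.
# The exact hook filename is an assumption, so we glob for "nvidia".
hookdir=/usr/share/containers/oci/hooks.d
if ls "$hookdir"/*nvidia* >/dev/null 2>&1; then
  echo "nvidia OCI hook installed"
else
  echo "no nvidia hook found in $hookdir"
fi
```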

      Test with podman

      ```
      zypper install podman podman-cni-config
      podman run nvidia/cuda nvidia-smi
      ```

      Expected output:

      ```
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
      | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+

      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+
      ```

      So it works!

    • jordimassaguerpla
      about 4 years ago by jordimassaguerpla | Reply

      The previous link got broken ... damn markdown ;) docs

    • jordimassaguerpla
      about 4 years ago by jordimassaguerpla | Reply

      Upstream (NVIDIA) uses Dockerfiles to build the packages for the other distros.

      Here is a small experiment of building the SUSE Leap RPM with a Dockerfile within OBS:

      https://build.opensuse.org/package/show/home:jordimassaguerpla:branches:openSUSE:Templates:Images:15.1/libnvidia-containers

    • jordimassaguerpla
      about 4 years ago by jordimassaguerpla | Reply

      Result of the experiment: using a Dockerfile works very well, because you can develop and debug with "docker build" and then commit it to OBS to get the build in a central location, store the sources, and so on.

      The issue is that the result is an image, not the RPM: there is no "-v" option to mount a volume during the build. So even though you can build the image in OBS, you then have to run the image to extract the RPM.

      (OBS = build.opensuse.org.)
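Since a stopped container can be copied from without being started, `docker create` plus `docker cp` is one way to get the RPMs out of the built image. A sketch, where the image tag and the RPM path inside the image are assumptions to be adjusted to the real Dockerfile:

```shell
# Sketch: extract RPMs from the built image without running it.
# ASSUMPTIONS: the image tag "libnvidia-container-build" and the RPM
# location /root/rpmbuild/RPMS inside the image are hypothetical.
img=libnvidia-container-build
out=./rpms
mkdir -p "$out"
if command -v docker >/dev/null 2>&1 && docker image inspect "$img" >/dev/null 2>&1; then
  cid=$(docker create "$img")             # create the container, but do not start it
  docker cp "$cid:/root/rpmbuild/RPMS" "$out"
  docker rm "$cid" >/dev/null
  echo "RPMs copied to $out"
else
  echo "docker or image not available; nothing copied"
fi
```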
