This is a follow up to https://hackweek.suse.com/projects/architecting-a-machine-learning-project-with-suse-caasp.

In the last Hack Week I learned that the missing piece for running machine learning workflows on top of SUSE CaaSP is having libnvidia-container and nvidia-container-runtime-hook packaged.

Since then, NVIDIA has added Leap 15 builds to libnvidia-container and nvidia-container-runtime.

However, neither build has been released into the libnvidia-container repo or the nvidia-container-runtime repo.

This project is about packaging those two projects in the openSUSE Build Service for openSUSE Leap 15.1.

Looking for hackers with the skills:

nvidia machinelearning containers

This project is part of:

Hack Week 19

Activity

  • almost 5 years ago: jordimassaguerpla started this project.
  • almost 5 years ago: jordimassaguerpla originated this project.

  • Comments

    • jordimassaguerpla
      almost 5 years ago by jordimassaguerpla

      First package ready: https://build.opensuse.org/package/show/home:jordimassaguerpla:nvidia_container/libnvidia-container

      And a pull request upstream: https://github.com/NVIDIA/libnvidia-container/pull/77

    • jordimassaguerpla
      almost 5 years ago by jordimassaguerpla

      Second package ready: https://build.opensuse.org/package/show/home:jordimassaguerpla:nvidia_container/nvidia-container-runtime-toolkit

    • jordimassaguerpla
      almost 5 years ago by jordimassaguerpla

      Proof that this worked:

      On a workstation with a Quadro K2000 running SLE 15 SP1:

      Installing the NVIDIA graphics driver kernel module

      zypper ar https://download.nvidia.com/suse/sle15sp1/ nvidia
      zypper ref
      zypper install nvidia-gfxG05-kmp-default
      modprobe nvidia
      lsmod | grep nvidia
      

      Expected output:

      nvidia_drm             49152  0
      nvidia_modeset       1114112  1 nvidia_drm
      drm_kms_helper        204800  1 nvidia_drm
      drm                   536576  3 nvidia_drm,drm_kms_helper
      nvidia_uvm           1036288  0
      nvidia              20414464  2 nvidia_modeset,nvidia_uvm
      ipmi_msghandler       110592  2 nvidia,ipmi_devintf
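
Comparing the `lsmod` listing by eye is easy to get wrong, so here is a minimal sketch of an automated check. It assumes the four NVIDIA modules shown in the expected output are the ones that matter. The function name `check_nvidia_modules` is made up for this example.

```shell
# Reads `lsmod` output on stdin and checks for the NVIDIA kernel modules
# that appear in the expected output above.
check_nvidia_modules() {
    loaded=$(awk '{ print $1 }')   # first column is the module name
    missing=""
    for mod in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
        # -x matches whole lines only, so "nvidia" does not match "nvidia_drm"
        if ! printf '%s\n' "$loaded" | grep -qxF "$mod"; then
            missing="$missing $mod"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing modules:$missing"
        return 1
    fi
    echo "all NVIDIA modules loaded"
}
```

Usage: `lsmod | check_nvidia_modules`. A non-zero exit status means at least one module is not loaded.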
      

      Installing the NVIDIA driver for GPU computing with CUDA

      zypper install nvidia-computeG05
      nvidia-smi
      

      Expected output:

      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
      | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+
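
To record which driver and CUDA versions a machine reports, the banner line can be parsed instead of read by eye. This is only a sketch: it assumes the banner keeps the "Driver Version: ... CUDA Version: ..." layout shown above, and `smi_versions` is a name made up for this example.

```shell
# Reads `nvidia-smi` output on stdin and prints "<driver version> <CUDA version>"
# taken from the banner line.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\) *CUDA Version: *\([0-9.]*\).*/\1 \2/p' | head -n 1
}
```

Usage: `nvidia-smi | smi_versions`. It prints nothing if the banner line is not found.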
      

      Installing libnvidia-container

      zypper ar https://download.opensuse.org/repositories/home:/jordimassaguerpla:/nvidia_container/SLE_15_SP1/ nvidia_container
      zypper install libnvidia-container
      usermod -a -G root USER
      usermod -a -G video USER
      

      USER should be a non-root user on your system.

      su - USER -c "nvidia-container-cli info"
      

      Expected output:

      NVRM version:   440.59
      CUDA version:   10.2
      
      Device Index:   0
      Device Minor:   0
      Model:          Quadro K2000
      Brand:          Quadro
      GPU UUID:       GPU-6a04b812-c20e-aeb6-9047-6382930eef7d
      Bus Location:   00000000:05:00.0
      Architecture:   3.0
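
The `info` output is a list of "Key: value" lines, so individual fields can be pulled out for scripting. This is a rough sketch assuming that layout. `info_field` is a name made up for this example, and the field name is used as a plain pattern, so it should not contain regex special characters.

```shell
# Reads `nvidia-container-cli info` output on stdin and prints the value of
# the field named in $1 (for example "NVRM version" or "Model").
info_field() {
    # Strip the "Key:" prefix and the padding after it; keep the rest of the line,
    # so values that contain colons (such as the bus location) stay intact.
    sed -n "s/^ *$1: *//p" | head -n 1
}
```

Usage: `su - USER -c "nvidia-container-cli info" | info_field "NVRM version"`. The result should match the driver version reported by `nvidia-smi`.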
      

      > NOTE: this test needs a non-root user because the root user does not run with the video group by default. We will fix this later when installing the toolkit. If you run the test as root, you will see this error message:

      nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected
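
Since the error above comes down to group membership, it can help to check the groups before running the test. This is a small sketch using `id -nG`. The helper name `in_group` is made up for this example.

```shell
# Succeeds if user $1 is a member of group $2.
in_group() {
    # id -nG prints the names of all groups the user belongs to, space separated
    id -nG "$1" | tr ' ' '\n' | grep -qxF "$2"
}
```

Usage: `in_group USER video || echo "USER is not in the video group"`. Group changes made with usermod only take effect in new login sessions.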
      

      Installing nvidia-container-toolkit

      zypper install nvidia-container-toolkit
      

      Test with podman

      zypper install podman podman-cni-config
      podman run nvidia/cuda nvidia-smi
      

      Expected output:

      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Quadro K2000        Off  | 00000000:05:00.0 Off |                  N/A |
      | 30%   43C    P0    N/A /  N/A |      0MiB /  1997MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+
      
      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+
      

      So it works!

    • jordimassaguerpla
      almost 5 years ago by jordimassaguerpla

      As a result, I updated the docs: https://github.com/jordimassaguerpla/SUSEhackweek18/commit/5fca6c12034b4df34c403f14276be754e809b086#diff-2df0241dfedf44f37dcafae751ab29ae

    • jordimassaguerpla
      almost 5 years ago by jordimassaguerpla

      The previous link got broken ... damn markdown ;) docs

    • jordimassaguerpla
      almost 5 years ago by jordimassaguerpla

      Upstream (NVIDIA) uses Dockerfiles to build the packages for the other distros.

      Here is a small experiment in building the openSUSE Leap RPM with a Dockerfile within OBS:

      https://build.opensuse.org/package/show/home:jordimassaguerpla:branches:openSUSE:Templates:Images:15.1/libnvidia-containers

    • jordimassaguerpla
      almost 5 years ago by jordimassaguerpla

      Result of the experiment: using a Dockerfile works very well, because you can develop and debug with "docker build" and then commit the Dockerfile to OBS to get builds in a central location, stored sources, and so on.

      The issue is that the build result is an image, not the RPM. There is no "-v" option to mount a volume during the build. So even though you can build the image in OBS, you then have to run the image to extract the RPM.

      obs=build.opensuse.org.
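
For the extraction step, one possible approach is to create a container from the image and copy the files out with `docker cp`, which does not need a volume mount. This is a sketch, not what was done in the experiment. The function name `extract_from_image`, the image name, and the `/dist` path in the usage line are hypothetical.

```shell
# Copy build artifacts out of an image.
# $1: image name, $2: path inside the image, $3: destination on the host
extract_from_image() {
    # "true" is never executed; it only satisfies images without a default command
    cid=$(docker create "$1" true) || return 1
    docker cp "$cid:$2" "$3"
    status=$?
    # Remove the temporary container whether or not the copy succeeded
    docker rm "$cid" >/dev/null
    return $status
}
```

Usage: `extract_from_image my-rpm-build-image /dist ./rpms`, where `/dist` stands for wherever the Dockerfile leaves the built RPMs.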

    Similar Projects

    FamilyTrip Planner: A Personalized Travel Planning Platform for Families by pherranz

    Description

    FamilyTrip Planner is an innovative travel planning application designed to optimize travel experiences for families with children. By integrating APIs for flights, accommodations, and local activities, the app generates complete itineraries tailored to each family’s unique interests and needs. Recommendations are based on customizable parameters such as destination, trip duration, children’s ages, and personal preferences. FamilyTrip Planner not only simplifies the travel planning process but also offers a comprehensive, personalized experience for families.

    Goals

    This project aims to:

    • Create a user-friendly platform that assists families in planning complete trips, from flight and accommodation options to recommended family-friendly activities.
    • Provide intelligent, personalized travel itineraries using artificial intelligence to enhance travel enjoyment and minimize time and cost.
    • Serve as an educational project for exploring Go programming and artificial intelligence, with the goal of building proficiency in both.

    Resources

    To develop FamilyTrip Planner, the project will leverage:

    • APIs such as Skyscanner, Google Places, and TripAdvisor to source real-time information on flights, accommodations, and activities.
    • Go programming language to manage data integration, API connections, and backend development.
    • Basic machine learning libraries to implement AI-driven itinerary suggestions tailored to family needs and preferences.


    Technical talks at universities by agamez

    Description

    This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.

    For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.

    Goals

    • Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
    • Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
    • Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.

    Resources

    • Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
    • SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.


    ADS-B receiver with MicroOS by epaolantonio

    I would like to put one of my spare Raspberry Pis to good use, and what better way to see what flies above my head at any time?

    There are various ready-to-use distros already set-up to provide feeder data to platforms like Flightradar24, ADS-B Exchange, FlightAware etc... The goal here would be to do it using MicroOS as a base and containerized decoding of ADS-B data (via tools like dump1090) and web frontend (tar1090).

    Goals

    • Create a working receiver using MicroOS as a base, and containers based on Tumbleweed
    • Make it easy to install
    • Optimize for maximum laziness (i.e. it should take care of itself with minimum intervention)

    Resources

    • 1x Small Board Computer capable of running MicroOS
    • 1x RTL2832U DVB-T dongle
    • 1x MicroSD card
    • https://github.com/antirez/dump1090
    • https://github.com/flightaware/dump1090 (dump1090 fork by FlightAware)
    • https://github.com/wiedehopf/tar1090

    Project status (2024-11-22)

    So I'd say that I'm pretty satisfied with how it turned out. I've packaged readsb (as a replacement for dump1090), tar1090, tar1090-db and mlat-client (not used yet).

    Current status:

    • Able to set-up a working receiver using combustion+ignition (web app based on Fuel Ignition)
    • Able to feed to various feeds using the Beast protocol (Airplanes.live, ADSB.fi, ADSB.lol, ADSBExchange.com, Flyitalyadsb.com, Planespotters.net)
    • Able to feed to Flightradar24 (initial-setup available but NOT tested! I've only tested using a key I already had)
    • Local web interface (tar1090) to easily visualize the results
    • Cockpit pre-configured to ease maintenance

    What's missing:

    • MLAT (Multilateration) support. I've packaged mlat-client already, but I have to wire it up
    • FlightAware support

    Give it a go at https://g7.github.io/adsbreceiver/ !



    Enable the containerized Uyuni server to run on different host OS by j_renner

    Description

    The Uyuni server is provided as a container, but we still require it to run on Leap Micro. This is not how people expect to use containerized applications, so it would be great if we tested other host OSs and enabled them by providing builds of the necessary tools (e.g. mgradm). Interesting candidates would be:

    • openSUSE Leap
    • CentOS 7
    • Ubuntu
    • ???

    Goals

    Make it really easy for anyone to run the Uyuni containerized server on whatever OS they want (with support for containers of course).


    Improve Development Environment on Uyuni by mbussolotto

    Description

    Currently, creating a dev environment for Uyuni can be complicated. The steps are:

    • add the correct repo
    • download packages
    • configure your IDE (checkstyle, format rules, sonarlint....)
    • setup debug environment
    • ...

    The current documentation can be improved: some information is hard to find, and some is missing completely.

    A Dev Container might solve this.

    Goals

    Uyuni development in no time:

    • using VSCode:
      • settings.json should contain all settings (for all languages in Uyuni, with all checkstyle rules etc...)
      • dev container should contain all dependencies
      • setup debug environment
    • implement a GitHub Workspace solution
    • re-write documentation

    Lots of pieces are already implemented: we need to connect them in a consistent solution.

    Resources

    • https://github.com/uyuni-project/uyuni/wiki


    SUSE AI Meets the Game Board by moio

    Use tabletopgames.ai’s open source TAG and PyTAG frameworks to apply Statistical Forward Planning and Deep Reinforcement Learning to two board games of our own design. On an all-green, all-open source, all-AWS stack!
    A chameleon playing chess in a train car, as a metaphor of SUSE AI applied to games


    Results: Infrastructure Achievements

    We successfully built and automated a containerized stack to support our AI experiments. This included:

    A screenshot of k9s and nvtop showing PyTAG running in Kubernetes with GPU acceleration

    ./deploy.sh and voilà - Kubernetes running PyTAG (k9s, above) with GPU acceleration (nvtop, below)

    Results: Game Design Insights

    Our project focused on modeling and analyzing two card games of our own design within the TAG framework:

    • Game Modeling: We implemented models for Dario's "Bamboo" and Silvio's "Totoro" and "R3" games, enabling AI agents to play thousands of games ...in minutes!
    • AI-driven optimization: By analyzing statistical data on moves, strategies, and outcomes, we iteratively tweaked the game mechanics and rules to achieve better balance and player engagement.
    • Advanced analytics: Leveraging AI agents with Monte Carlo Tree Search (MCTS) and random action selection, we compared performance metrics to identify optimal strategies and uncover opportunities for game refinement.

    Cards from the three games

    A family picture of our card games in progress. From the top: Bamboo, Totoro, R3

    Results: Learning, Collaboration, and Innovation

    Beyond technical accomplishments, the project showcased innovative approaches to coding, learning, and teamwork:

    • "Trio programming" with AI assistance: Our "trio programming" approach (two developers and GitHub Copilot) was a standout success, especially in handling slightly-repetitive but not-quite-exactly-copypaste tasks. Java as a language tends to be verbose, and we found this approach to fit it particularly well.
    • AI tools for reporting and documentation: We extensively used AI chatbots to streamline writing and reporting. (Including writing this report! ...but this note was added manually during edit!)
    • GPU compute expertise: Overcoming challenges with CUDA drivers and cloud infrastructure deepened our understanding of GPU-accelerated workloads in the open-source ecosystem.
    • Game design as a learning platform: By blending AI techniques with creative game design, we learned not only about AI strategies but also about making games fun, engaging, and balanced.

    Last but not least we had a lot of fun! ...and this was definitely not a chatbot generated line!

    The Context: AI + Board Games