Project Description

MONAI Deploy aims to become the de-facto standard for developing packaging, testing, deploying, and running medical AI applications in clinical production. MONAI Deploy creates a set of intermediate steps where researchers and physicians can build confidence in the techniques and approaches used with AI — allowing for an iterative workflow.

Contributors to MONAI include Nvidia, Mayo Clinic, King's College London, NHS, Standford University, ... and many others.

The core piece of MONAI Deploy is the Monai Application Package (MAP), which contains the ML model in a "runnable state". It is implemented as a container, and built with docker-nvidia. This is relevant because you need a GPU to build the MAP. You'll see later why this is relevant. This container can be later run on a k8s cluster.

Before you create a MAP, you need to train the ML models. They are trained with MONAI Core, another framework which is a piece of the whole MONAI "puzzle". Those models can be published in the MONAI Model Zoo. They are published using a very specific format, which is called "a bundle".

In the monai deploy app sdk project in github, you can see several examples on how to package a "model bundle" into a MAP. Plus in the documentation you can find a step by step guide on how to build them, meaning how to create the MAP (the container). Examples, both code and documentation, use the ML models in the MONAI model Zoo.

MONAI model Zoo is free, you can search for models and use them for your research. However, there is not such a repository for MAPs, even the docs and examples show how to build those models into MAPS.

And this is the motivation of the project, to create this "link", and release into a registry, at least one MAP based on a model in MONAI Model Zoo.

Goal for this Hackweek

The specific goal is to implement a Continuous Integration workflow that builds a MAP (Monai Application Package), based on the example in code and documentation. Specifically, it is to implement a github action workflow that releases it into github container registry.

Implementation

The github action workflow will be added to a fork of the monai-deploy-app-sdk project, given we will be using the code that is already in the examples directory. Later a Pull Request can be created to the upstream project.

A limitation of this project is that we need to run the github action in a GPU node. ASFAIK github does not support that, so we need to run this on an external runner. For that I will be using MS Azure cloud to host a vm with GPU. For 3 reasons: first, it should be faster to clone from github from azure; second, I will try to use the free 90 days; third, I want to get familiar with Azure.

Finally, most probably I will use terraform to deploy the node in Azure.

This way, every time we want to release a new model in the MAP format, we will deploy a vm in Azure, do the build with the GPU, release into the github container registry, and remove the vm.

Resources

https://monai.io/ https://monai.io/model-zoo.html https://docs.monai.io/projects/monai-deploy-app-sdk/en/latest/gettingstarted/tutorials/monaibundle_app.html https://github.com/Project-MONAI/monai-deploy-app-sdk/tree/main/examples/apps

This project is part of:

Hack Week 23

Activity

  • almost 2 years ago: jordimassaguerpla joined this project.
  • almost 2 years ago: vliaskovitis liked this project.
  • almost 2 years ago: rtsvetkov started this project.
  • about 2 years ago: jordimassaguerpla added keyword "github_actions" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "github-ci" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "ci" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "gpu" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "azure" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "cloud" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "monai" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "medical" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "containers" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "ml" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "mlops" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "ai" to this project.
  • about 2 years ago: jordimassaguerpla added keyword "artificial-intelligence" to this project.
  • about 2 years ago: jordimassaguerpla originated this project.

  • Comments

    • jordimassaguerpla
      almost 2 years ago by jordimassaguerpla | Reply

      I was able to create a terraform file and a workflow file but then I was not able to make the build work.

      Here the terraform file:

      https://github.com/jordimassaguerpla/monai-deploy-app-sdk/blob/main/main.tf

      Here the workflow file:

      https://github.com/jordimassaguerpla/monai-deploy-app-sdk/blob/main/.github/workflows/buildandpush_models.yml

    • jordimassaguerpla
      almost 2 years ago by jordimassaguerpla | Reply

      I think the issue is that it tries to load the container, but I had not installed nvidia-docker2, and thus it can't load the container.

    • jordimassaguerpla
      almost 2 years ago by jordimassaguerpla | Reply

      Here the fix for libseccomp, so nvidia-container-toolkit can be installed: https://build.opensuse.org/request/show/1128309 Here the fix for nvidia-holoscan: https://github.com/nvidia-holoscan/holoscan-sdk/pull/14

      With these 2 fixes and by increasing the Disc in Azure to 64GB, I was able to build the ML model as a container :)

    • jordimassaguerpla
      almost 2 years ago by jordimassaguerpla | Reply

      And voilà, here the MONAI Application Package ready to be used:

      https://github.com/jordimassaguerpla/monai-deploy-app-sdk/pkgs/container/monai-deploy-app-sdk%2Fsimple_app-x64-workstation-dgpu-linux-amd64

    Similar Projects

    SUSE Observability MCP server by drutigliano

    Description

    The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.

    This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.

    Goals

    • Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
    • Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
    • Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
    • Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.

    Resources

    • https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
    • https://www.datadoghq.com/blog/datadog-remote-mcp-server
    • https://modelcontextprotocol.io/specification/2025-06-18/index

     Basic implementation

    • https://github.com/drutigliano19/suse-observability-mcp-server


    Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios

    Description

    Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.

    This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.

    The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.

    Goals

    By the end of Hack Week, we aim to have a single, working Python script that:

    1. Connects to Prometheus and executes a query to fetch detailed test failure history.
    2. Processes the raw data into a format suitable for the Gemini API.
    3. Successfully calls the Gemini API with the data and a clear prompt.
    4. Parses the AI's response to extract a simple list of flaky tests.
    5. Saves the list to a JSON file that can be displayed in Grafana.
    6. New panel in our Dashboard listing the Flaky tests

    Resources


    SUSE Observability MCP server by drutigliano

    Description

    The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.

    This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.

    Goals

    • Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
    • Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
    • Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
    • Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.

    Resources

    • https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
    • https://www.datadoghq.com/blog/datadog-remote-mcp-server
    • https://modelcontextprotocol.io/specification/2025-06-18/index

     Basic implementation

    • https://github.com/drutigliano19/suse-observability-mcp-server


    Provide personal account sign-in for Himmelblau (Azure Entra Id) by dmulder

    Description

    Himmelblau currently does not support personal account sign-in, but only sign-in for business/school accounts. Adding personal account sign-in would broaden the userbase, and potentially attract more users to Linux from Windows (since they could easily migrate their existing Windows account, etc).

    Goals

    Implement personal account sign-in for libhimmelblau.

    Resources

    gitlab.com/samba-team/libhimmelblau


    Technical talks at universities by agamez

    Description

    This project aims to empower the next generation of tech professionals by offering hands-on workshops on containerization and Kubernetes, with a strong focus on open-source technologies. By providing practical experience with these cutting-edge tools and fostering a deep understanding of open-source principles, we aim to bridge the gap between academia and industry.

    For now, the scope is limited to Spanish universities, since we already have the contacts and have started some conversations.

    Goals

    • Technical Skill Development: equip students with the fundamental knowledge and skills to build, deploy, and manage containerized applications using open-source tools like Kubernetes.
    • Open-Source Mindset: foster a passion for open-source software, encouraging students to contribute to open-source projects and collaborate with the global developer community.
    • Career Readiness: prepare students for industry-relevant roles by exposing them to real-world use cases, best practices, and open-source in companies.

    Resources

    • Instructors: experienced open-source professionals with deep knowledge of containerization and Kubernetes.
    • SUSE Expertise: leverage SUSE's expertise in open-source technologies to provide insights into industry trends and best practices.