Description

A prerequisite for running applications in a cloud environment is the presence of a container registry. Another common scenario is users performing machine learning workloads in such environments. However, these types of workloads require dedicated infrastructure to run properly. We can leverage these two facts to help users save resources by storing their machine learning models in OCI registries, similar to how we handle some WebAssembly modules. This approach will save users the resources typically required for a machine learning model repository for the applications they need to run.

Goals

Allow PyTorch users to save and load machine learning models in OCI registries.

Resources

Looking for hackers with the skills:

ai mlops pytorch oci cloud

This project is part of:

Hack Week 24

Activity

  • about 1 year ago: horon liked this project.
  • about 1 year ago: jguilhermevanz started this project.
  • about 1 year ago: jguilhermevanz added keyword "ai" to this project.
  • about 1 year ago: jguilhermevanz added keyword "mlops" to this project.
  • about 1 year ago: jguilhermevanz added keyword "pytorch" to this project.
  • about 1 year ago: jguilhermevanz added keyword "oci" to this project.
  • about 1 year ago: jguilhermevanz added keyword "cloud" to this project.
  • about 1 year ago: jguilhermevanz originated this project.

  • Comments

    Be the first to comment!

    Similar Projects

    Docs Navigator MCP: SUSE Edition by mackenzie.techdocs

    MCP Docs Navigator: SUSE Edition

    Description

    Docs Navigator MCP: SUSE Edition is an AI-powered documentation navigator that makes finding information across SUSE, Rancher, K3s, and RKE2 documentation effortless. Built as a Model Context Protocol (MCP) server, it enables semantic search, intelligent Q&A, and documentation summarization using 100% open-source AI models (no API keys required!). The project also allows you to bring your own keys from Anthropic and Open AI for parallel processing.

    Goals

    • [ X ] Build functional MCP server with documentation tools
    • [ X ] Implement semantic search with vector embeddings
    • [ X ] Create user-friendly web interface
    • [ X ] Optimize indexing performance (parallel processing)
    • [ X ] Add SUSE branding and polish UX
    • [ X ] Stretch Goal: Add more documentation sources
    • [ X ] Stretch Goal: Implement document change detection for auto-updates

    Coming Soon!

    • Community Feedback: Test with real users and gather improvement suggestions

    Resources


    GenAI-Powered Systemic Bug Evaluation and Management Assistant by rtsvetkov

    Motivation

    What is the decision critical question which one can ask on a bug? How this question affects the decision on a bug and why?

    Let's make GenAI look on the bug from the systemic point and evaluate what we don't know. Which piece of information is missing to take a decision?

    Description

    To build a tool that takes a raw bug report (including error messages and context) and uses a large language model (LLM) to generate a series of structured, Socratic-style or Systemic questions designed to guide a the integration and development toward the root cause, rather than just providing a direct, potentially incorrect fix.

    Goals

    Set up a Python environment

    Set the environment and get a Gemini API key. 2. Collect 5-10 realistic bug reports (from open-source projects, personal projects, or public forums like Stack Overflow—include the error message and the initial context).

    Build the Dialogue Loop

    1. Write a basic Python script using the Gemini API.
    2. Implement a simple conversational loop: User Input (Bug) -> AI Output (Question) -> User Input (Answer to AI's question) -> AI Output (Next Question). Code Implementation

    Socratic/Systemic Strategy Implementation

    1. Refine the logic to ensure the questions follow a Socratic and Systemic path (e.g., from symptom-> context -> assumptions -> -> critical parts -> ).
    2. Implement Function Calling (an advanced feature of the Gemini API) to suggest specific actions to the user, like "Run a ping test" or "Check the database logs."
    3. Implement Bugzillla call to collect the
    4. Implement Questioning Framework as LLVM pre-conditioning
    5. Define set of instructions
    6. Assemble the Tool

    Resources

    What are Systemic Questions?

    Systemic questions explore the relationships, patterns, and interactions within a system rather than focusing on isolated elements.
    In IT, they help uncover hidden dependencies, feedback loops, assumptions, and side-effects during debugging or architecture analysis.

    Gitlab Project

    gitlab.suse.de/sle-prjmgr/BugDecisionCritical_Question


    "what is it" file and directory analysis via MCP and local LLM, for console and KDE by rsimai

    Description

    Users sometimes wonder what files or directories they find on their local PC are good for. If they can't determine from the filename or metadata, there should an easy way to quickly analyze the content and at least guess the meaning. An LLM could help with that, through the use of a filesystem MCP and to-text-converters for typical file types. Ideally this is integrated into the desktop environment but works as well from a console. All data is processed locally or "on premise", no artifacts remain or leave the system.

    Goals

    • The user can run a command from the console, to check on a file or directory
    • The filemanager contains the "analyze" feature within the context menu
    • The local LLM could serve for other use cases where privacy matters

    TBD

    • Find or write capable one-shot and interactive MCP client
    • Find or write simple+secure file access MCP server
    • Create local LLM service with appropriate footprint, containerized
    • Shell command with options
    • KDE integration (Dolphin)
    • Package
    • Document

    Resources


    Is SUSE Trending? Popularity and Developer Sentiment Insight Using Native AI Capabilities by terezacerna

    Description

    This project aims to explore the popularity and developer sentiment around SUSE and its technologies compared to Red Hat and their technologies. Using publicly available data sources, I will analyze search trends, developer preferences, repository activity, and media presence. The final outcome will be an interactive Power BI dashboard that provides insights into how SUSE is perceived and discussed across the web and among developers.

    Goals

    1. Assess the popularity of SUSE products and brand compared to Red Hat using Google Trends.
    2. Analyze developer satisfaction and usage trends from the Stack Overflow Developer Survey.
    3. Use the GitHub API to compare SUSE and Red Hat repositories in terms of stars, forks, contributors, and issue activity.
    4. Perform sentiment analysis on GitHub issue comments to measure community tone and engagement using built-in Copilot capabilities.
    5. Perform sentiment analysis on Reddit comments related to SUSE technologies using built-in Copilot capabilities.
    6. Use Gnews.io to track and compare the volume of news articles mentioning SUSE and Red Hat technologies.
    7. Test the integration of Copilot (AI) within Power BI for enhanced data analysis and visualization.
    8. Deliver a comprehensive Power BI report summarizing findings and insights.
    9. Test the full potential of Power BI, including its AI features and native language Q&A.

    Resources

    1. Google Trends: Web scraping for search popularity data
    2. Stack Overflow Developer Survey: For technology popularity and satisfaction comparison
    3. GitHub API: For repository data (stars, forks, contributors, issues, comments).
    4. Gnews.io API: For article volume and mentions analysis.
    5. Reddit: SUSE related topics with comments.


    SUSE Observability MCP server by drutigliano

    Description

    The idea is to implement the SUSE Observability Model Context Protocol (MCP) Server as a specialized, middle-tier API designed to translate the complex, high-cardinality observability data from StackState (topology, metrics, and events) into highly structured, contextually rich, and LLM-ready snippets.

    This MCP Server abstract the StackState APIs. Its primary function is to serve as a Tool/Function Calling target for AI agents. When an AI receives an alert or a user query (e.g., "What caused the outage?"), the AI calls an MCP Server endpoint. The server then fetches the relevant operational facts, summarizes them, normalizes technical identifiers (like URNs and raw metric names) into natural language concepts, and returns a concise JSON or YAML payload. This payload is then injected directly into the LLM's prompt, ensuring the final diagnosis or action is grounded in real-time, accurate SUSE Observability data, effectively minimizing hallucinations.

    Goals

    • Grounding AI Responses: Ensure that all AI diagnoses, root cause analyses, and action recommendations are strictly based on verifiable, real-time data retrieved from the SUSE Observability StackState platform.
    • Simplifying Data Access: Abstract the complexity of StackState's native APIs (e.g., Time Travel, 4T Data Model) into simple, semantic functions that can be easily invoked by LLM tool-calling mechanisms.
    • Data Normalization: Convert complex, technical identifiers (like component URNs, raw metric names, and proprietary health states) into standardized, natural language terms that an LLM can easily reason over.
    • Enabling Automated Remediation: Define clear, action-oriented MCP endpoints (e.g., execute_runbook) that allow the AI agent to initiate automated operational workflows (e.g., restarts, scaling) after a diagnosis, closing the loop on observability.

     Hackweek STEP

    • Create a functional MCP endpoint exposing one (or more) tool(s) to answer queries like "What is the health of service X?") by fetching, normalizing, and returning live StackState data in an LLM-ready format.

     Scope

    • Implement read-only MCP server that can:
      • Connect to a live SUSE Observability instance and authenticate (with API token)
      • Use tools to fetch data for a specific component URN (e.g., current health state, metrics, possibly topology neighbors, ...).
      • Normalize response fields (e.g., URN to "Service Name," health state DEVIATING to "Unhealthy", raw metrics).
      • Return the data as a structured JSON payload compliant with the MCP specification.

    Deliverables

    • MCP Server v0.1 A running Golang MCP server with at least one tool.
    • A README.md and a test script (e.g., curl commands or a simple notebook) showing how an AI agent would call the endpoint and the resulting JSON payload.

    Outcome A functional and testable API endpoint that proves the core concept: translating complex StackState data into a simple, LLM-ready format. This provides the foundation for developing AI-driven diagnostics and automated remediation.

    Resources

    • https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine
    • https://www.datadoghq.com/blog/datadog-remote-mcp-server
    • https://modelcontextprotocol.io/specification/2025-06-18/index
    • https://modelcontextprotocol.io/docs/develop/build-server

     Basic implementation

    • https://github.com/drutigliano19/suse-observability-mcp-server

    Results

    Successfully developed and delivered a fully functional SUSE Observability MCP Server that bridges language models with SUSE Observability's operational data. This project demonstrates how AI agents can perform intelligent troubleshooting and root cause analysis using structured access to real-time infrastructure data.

    Example execution


    Exploring Modern AI Trends and Kubernetes-Based AI Infrastructure by jluo

    Description

    Build a solid understanding of the current landscape of Artificial Intelligence and how modern cloud-native technologies—especially Kubernetes—support AI workloads.

    Goals

    Use Gemini Learning Mode to guide the exploration, surface relevant concepts, and structure the learning journey:

    • Gain insight into the latest AI trends, tools, and architectural concepts.
    • Understand how Kubernetes and related cloud-native technologies are used in the AI ecosystem (model training, deployment, orchestration, MLOps).

    Resources

    • Red Hat AI Topic Articles

      • https://www.redhat.com/en/topics/ai
    • Kubeflow Documentation

      • https://www.kubeflow.org/docs/
    • Q4 2025 CNCF Technology Landscape Radar report:

      • https://www.cncf.io/announcements/2025/11/11/cncf-and-slashdata-report-finds-leading-ai-tools-gaining-adoption-in-cloud-native-ecosystems/
      • https://www.cncf.io/wp-content/uploads/2025/11/cncfreporttechradar_111025a.pdf
    • Agent-to-Agent (A2A) Protocol

      • https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/


    Kubernetes-Based ML Lifecycle Automation by lmiranda

    Description

    This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.

    The pipeline will automate the lifecycle of a machine learning model, including:

    • Data ingestion/collection
    • Model training as a Kubernetes Job
    • Model artifact storage in an S3-compatible registry (e.g. Minio)
    • A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
    • A lightweight inference service that loads and serves the latest model
    • Monitoring of model performance and service health through Prometheus/Grafana

    The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.

    Goals

    By the end of Hack Week, the project should:

    1. Produce a fully functional ML pipeline running on Kubernetes with:

      • Data collection job
      • Training job container
      • Storage and versioning of trained models
      • Automated deployment of new model versions
      • Model inference API service
      • Basic monitoring dashboards
    2. Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.

    3. Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).

    4. Prepare a short demo explaining the end-to-end process and how new models flow through the system.

    Resources

    Project Repository

    Updates

    1. Training pipeline and datasets
    2. Inference Service py


    Create a Cloud-Native policy engine with notifying capabilities to optimize resource usage by gbazzotti

    Description

    The goal of this project is to begin the initial phase of development of an all-in-one Cloud-Native Policy Engine that notifies resource owners when their resources infringe predetermined policies. This was inspired by a current issue in the CES-SRE Team where other solutions seemed to not exactly correspond to the needs of the specific workloads running on the Public Cloud Team space.

    The initial architecture can be checked out on the Repository listed under Resources.

    Among the features that will differ this project from other monitoring/notification systems:

    • Pre-defined sensible policies written at the software-level, avoiding a learning curve by requiring users to write their own policies
    • All-in-one functionality: logging, mailing and all other actions are not required to install any additional plugins/packages
    • Easy account management, being able to parse all required configuration by a single JSON file
    • Eliminate integrations by not requiring metrics to go through a data-agreggator

    Goals

    • Create a minimal working prototype following the workflow specified on the documentation
    • Provide instructions on installation/usage
    • Work on email notifying capabilities

    Resources