Project Description

So you have an idea for a machine learning project for HackWeek. Have you thought about what tools you'll be using? Choosing the right set of machine learning tools and making them work together can be time consuming, not to mention the unavoidable learning curve. Perhaps you could use some help with that.

The SUSE AI/ML team has the answer: FuseML - an open source machine learning DevOps orchestrator that can get your machine learning projects up and running as easy as lighting a fuse.

FuseML started as a spin off project Carrier. Think "Carrier for Machine Learning": you write your ML application using one of the popular machine learning libraries (e.g. scikit-learn, TensorFlow, PyTorch, XGBoost) and FuseML takes care of all operations necessary to get your machine learning models in action, so you can concentrate on your code.

FuseML workflow

The catch: FuseML is still in a pre-alpha state, although it can already be used to showcase basic features. While using it, you may run into some corner cases we haven't covered yet, but you'll not be alone: we're here to help.

The rewards: access to expert knowledge in AI/ML and a chance to have your ML project published into the FuseML gallery of sample applications.

What you'll need: to install and use FuseML, you'll need a kubernetes cluster. If you don't already have one handy, or if you're low on hardware resources, you can install minikube, kind or k3s on your machine.

Goal for this Hackweek

  • discover new use cases and AI/ML tools to be enabled for FuseML
  • offer assistance and guidelines on AI/ML best practices and tools in the context of FuseML
  • pimp up FuseML's gallery of sample applications

Resources

This project is part of:

Hack Week 20

Activity

  • almost 4 years ago: acho liked this project.
  • almost 4 years ago: ories liked this project.
  • almost 4 years ago: afesta liked this project.
  • almost 4 years ago: jsuchome joined this project.
  • almost 4 years ago: flaviosr liked this project.
  • almost 4 years ago: flaviosr joined this project.
  • almost 4 years ago: stefannica started this project.
  • almost 4 years ago: stefannica added keyword "#fuseml" to this project.
  • almost 4 years ago: stefannica added keyword "#ai" to this project.
  • almost 4 years ago: stefannica added keyword "#machinelearning" to this project.
  • almost 4 years ago: stefannica added keyword "#kubernetes" to this project.
  • almost 4 years ago: stefannica added keyword "#artificial-intelligence" to this project.
  • almost 4 years ago: stefannica added keyword "#mlops" to this project.
  • almost 4 years ago: stefannica added keyword "#mlflow" to this project.
  • almost 4 years ago: stefannica added keyword "#sklearn" to this project.
  • almost 4 years ago: stefannica added keyword "#pytorch" to this project.
  • almost 4 years ago: stefannica added keyword "#ternsorflow" to this project.
  • almost 4 years ago: stefannica originated this project.

  • Comments

    Be the first to comment!

    Similar Projects

    AI for product management by a_jaeger

    Description

    Learn about AI and how it can help myself

    What are the jobs that a PM does where AI can help - and how?

    Goals

    • Investigate how AI can help with different tasks
    • Check out different AI tools, which one is best for which job
    • Summarize learning

    Resources

    • Reading some blog posts by PMs that looked into it
    • Popular and less popular AI tools

    Work is done SUSE internally at https://confluence.suse.com/display/~a_jaeger/Hackweek+25+-+AI+for+a+PM and subpages.


    COOTWbot by ngetahun

    Project Description

    At SCC, we have a rotating task of COOTW (Commanding Office of the Week). This task involves responding to customer requests from jira and slack help channels, monitoring production systems and doing small chores. Usually, we have documentation to help the COOTW answer questions and quickly find fixes. Most of these are distributed across github, trello and SUSE Support documentation. The aim of this project is to explore the magic of LLMs and create a conversational bot.

    Goal for this Hackweek

    • Build data ingestion Data source:
      • SUSE KB docs
      • scc github docs
      • scc trello knowledge board
    • Test out new RAG architecture

    • https://gitlab.suse.de/ngetahun/cootwbot


    Run local LLMs with Ollama and explore possible integrations with Uyuni by PSuarezHernandez

    Description

    Using Ollama you can easily run different LLM models in your local computer. This project is about exploring Ollama, testing different LLMs and try to fine tune them. Also, explore potential ways of integration with Uyuni.

    Goals

    • Explore Ollama
    • Test different models
    • Fine tuning
    • Explore possible integration in Uyuni

    Resources

    • https://ollama.com/
    • https://huggingface.co/
    • https://apeatling.com/articles/part-2-building-your-training-data-for-fine-tuning/


    Save pytorch models in OCI registries by jguilhermevanz

    Description

    A prerequisite for running applications in a cloud environment is the presence of a container registry. Another common scenario is users performing machine learning workloads in such environments. However, these types of workloads require dedicated infrastructure to run properly. We can leverage these two facts to help users save resources by storing their machine learning models in OCI registries, similar to how we handle some WebAssembly modules. This approach will save users the resources typically required for a machine learning model repository for the applications they need to run.

    Goals

    Allow PyTorch users to save and load machine learning models in OCI registries.

    Resources


    Gen-AI chatbots and test-automation of generated responses by mdati

    Description

    Start experimenting the generative SUSE-AI chat bot, asking questions on different areas of knowledge or science and possibly analyze the quality of the LLM model response, specific and comparative, checking the answers provided by different LLM models to a same query, using proper quality metrics or tools or methodologies.

    Try to define basic guidelines and requirements for quality test automation of AI-generated responses.

    First approach of investigation can be based on manual testing: methodologies, findings and data can be useful then to organize valid automated testing.

    Goals

    • Identify criteria and measuring scales for assessment of a text content.
    • Define quality of an answer/text based on defined criteria .
    • Identify some knowledge sectors and a proper list of problems/questions per sector.
    • Manually run query session and apply evaluation criteria to answers.
    • Draft requirements for test automation of AI answers.

    Resources

    • Announcement of SUSE-AI for Hack Week in Slack
    • Openplatform and related 3 LLM models gemma:2b, llama3.1:8b, qwen2.5-coder:3b.

    Notes

    • Foundation models (FMs):
      are large deep learning neural networks, trained on massive datasets, that have changed the way data scientists approach machine learning (ML). Rather than develop artificial intelligence (AI) from scratch, data scientists use a foundation model as a starting point to develop ML models that power new applications more quickly and cost-effectively.

    • Large language models (LLMs):
      are a category of foundation models pre-trained on immense amounts of data acquiring abilities by learning statistical relationships from vast amounts of text during a self- and semi-supervised training process, making them capable of understanding and generating natural language and other types of content , to perform a wide range of tasks.
      LLMs can be used for generative AI (artificial intelligence) to produce content based on input prompts in human language.

    Validation of a AI-generated answer is not an easy task to perform, as manually as automated.
    An LLM answer text shall contain a given level of informations: correcness, completeness, reasoning description etc.
    We shall rely in properly applicable and measurable criteria of validation to get an assessment in a limited amount of time and resources.


    FamilyTrip Planner: A Personalized Travel Planning Platform for Families by pherranz

    Description

    FamilyTrip Planner is an innovative travel planning application designed to optimize travel experiences for families with children. By integrating APIs for flights, accommodations, and local activities, the app generates complete itineraries tailored to each family’s unique interests and needs. Recommendations are based on customizable parameters such as destination, trip duration, children’s ages, and personal preferences. FamilyTrip Planner not only simplifies the travel planning process but also offers a comprehensive, personalized experience for families.

    Goals

    This project aims to: - Create a user-friendly platform that assists families in planning complete trips, from flight and accommodation options to recommended family-friendly activities. - Provide intelligent, personalized travel itineraries using artificial intelligence to enhance travel enjoyment and minimize time and cost. - Serve as an educational project for exploring Go programming and artificial intelligence, with the goal of building proficiency in both.

    Resources

    To develop FamilyTrip Planner, the project will leverage: - APIs such as Skyscanner, Google Places, and TripAdvisor to source real-time information on flights, accommodations, and activities. - Go programming language to manage data integration, API connections, and backend development. - Basic machine learning libraries to implement AI-driven itinerary suggestions tailored to family needs and preferences.


    Metrics Server viewer for Kubernetes by bkampen

    This project is finished please visit the github repo below for the tool.

    Description

    Build a CLI tools which can visualize Kubernetes metrics from the metrics-server, so you're able to watch these without installing Prometheus and Grafana on a cluster.

    Goals

    • Learn more about metrics-server
    • Learn more about the inner workings of Kubernetes.
    • Learn more about Go

    Resources

    https://github.com/bvankampen/metrics-viewer


    Small healthcheck tool for Longhorn by mbrookhuis

    Project Description

    We have often problems (e.g. pods not starting) that are related to PVCs not running, cluster (nodes) not all up or deployments not running or completely running. This all prevents administration activities. Having something that can regular be run to validate the status of the cluster would be helpful, and not as of today do a lot of manual tasks.

    As addition (read enough time), we could add changing reservation, adding new disks, etc. --> This didn't made it. But the scripts can easily be adopted.

    This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.

    Goal for this Hackweek

    At the end we should have a small python tool that is doing a (very) basic health check on nodes, deployments and PVCs. First attempt was to make it in golang, but that was taking to much time.

    Overview

    This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:

    • node check: This will check all nodes, and display the status and the k3s version. If the status of the nodes is not "Ready" (this should be only reported), the cluster will be reported as having problems

    • deployment check: This check will list all deployments, and display the number of expected replicas and the used replica. If there are unused replicas this will be displayed. The cluster will be reported as having problems.

    • pvc check: This check will list of all pvc's, and display the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.

    If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.

    The script has 1 mandatory parameter and that is the kubeconf of the cluster or of a node off the cluster.

    The code is writen for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that will contain all needed packages. Also, the script can be run on the cluster itself or any other linux server.

    Installation

    To install this project, perform the following steps:

    • Create the directory /opt/k8s-check

    mkdir /opt/k8s-check

    • Copy all the file to this directory and make the following changes:

    chmod +x k8s-check.py


    kubectl clone: Seamlessly Clone Kubernetes Resources Across Multiple Rancher Clusters and Projects by dpunia

    Description

    kubectl clone is a kubectl plugin that empowers users to clone Kubernetes resources across multiple clusters and projects managed by Rancher. It simplifies the process of duplicating resources from one cluster to another or within different namespaces and projects, with optional on-the-fly modifications. This tool enhances multi-cluster resource management, making it invaluable for environments where Rancher orchestrates numerous Kubernetes clusters.

    Goals

    1. Seamless Multi-Cluster Cloning
      • Clone Kubernetes resources across clusters/projects with one command.
      • Simplifies management, reduces operational effort.

    Resources

    1. Rancher & Kubernetes Docs

      • Rancher API, Cluster Management, Kubernetes client libraries.
    2. Development Tools

      • Kubectl plugin docs, Go programming resources.

    Building and Installing the Plugin

    1. Set Environment Variables: Export the Rancher URL and API token:
    • export RANCHER_URL="https://rancher.example.com"
    • export RANCHER_TOKEN="token-xxxxx:xxxxxxxxxxxxxxxxxxxx"
    1. Build the Plugin: Compile the Go program:
    • go build -o kubectl-clone ./pkg/
    1. Install the Plugin: Move the executable to a directory in your PATH:
    • mv kubectl-clone /usr/local/bin/

    Ensure the file is executable:

    • chmod +x /usr/local/bin/kubectl-clone
    1. Verify the Plugin Installation: Test the plugin by running:
    • kubectl clone --help

    You should see the usage information for the kubectl-clone plugin.

    Usage Examples

    1. Clone a Deployment from One Cluster to Another:
    • kubectl clone --source-cluster c-abc123 --type deployment --name nginx-deployment --target-cluster c-def456 --new-name nginx-deployment-clone
    1. Clone a Service into Another Namespace and Modify Labels:


    Setup Kanidm as OIDC provider on Kubernetes by jkuzilek

    Description

    I am planning to upgrade my homelab Kubernetes cluster to the next level and need an OIDC provider for my services, including K8s itself.

    Goals

    • Successfully configure and deploy Kanidm on homelab cluster
    • Integrate with K8s auth
    • Integrate with other services (Envoy Gateway, Container Registry, future deployment of Forgejo?)

    Resources


    Integrate Backstage with Rancher Manager by nwmacd

    Description

    Backstage (backstage.io) is an open-source, CNCF project that allows you to create your own developer portal. There are many plugins for Backstage.

    This could be a great compliment to Rancher Manager.

    Goals

    Learn and experiment with Backstage and look at how this could be integrated with Rancher Manager. Goal is to have some kind of integration completed in this Hack week.

    Progress

    Screen shot of home page at the end of Hackweek:

    Home

    Day One

    • Got Backstage running locally, understanding configuration with HTTPs.
    • Got Backstage embedded in an IFRAME inside of Rancher
    • Added content into the software catalog (see: https://backstage.io/docs/features/techdocs/getting-started/)
    • Understood more about the entity model

    Day Two

    • Connected Backstage to the Rancher local cluster and configured the Kubernetes plugin.
    • Created Rancher theme to make the light theme more consistent with Rancher

    Home

    Days Three and Day Four

    • Created two backend plugins for Backstage:

      1. Catalog Entity Provider - this imports users from Rancher into Backstage
      2. Auth Provider - uses the proxied sign-in pattern to check the Rancher session cookie, to user that to authenticate the user with Rancher and then log them into Backstage by connecting this to the imported User entity from the catalog entity provider plugin.
    • With this in place, you can single-sign-on between Rancher and Backstage when it is deployed within Rancher. Note this is only when running locally for development at present

    Home

    Home

    Day Five

    • Start to build out a production deployment for all of the above
    • Made some progress, but hit issues with the authentication and proxying when running proxied within Rancher, which needs further investigation


    Save pytorch models in OCI registries by jguilhermevanz

    Description

    A prerequisite for running applications in a cloud environment is the presence of a container registry. Another common scenario is users performing machine learning workloads in such environments. However, these types of workloads require dedicated infrastructure to run properly. We can leverage these two facts to help users save resources by storing their machine learning models in OCI registries, similar to how we handle some WebAssembly modules. This approach will save users the resources typically required for a machine learning model repository for the applications they need to run.

    Goals

    Allow PyTorch users to save and load machine learning models in OCI registries.

    Resources


    Save pytorch models in OCI registries by jguilhermevanz

    Description

    A prerequisite for running applications in a cloud environment is the presence of a container registry. Another common scenario is users performing machine learning workloads in such environments. However, these types of workloads require dedicated infrastructure to run properly. We can leverage these two facts to help users save resources by storing their machine learning models in OCI registries, similar to how we handle some WebAssembly modules. This approach will save users the resources typically required for a machine learning model repository for the applications they need to run.

    Goals

    Allow PyTorch users to save and load machine learning models in OCI registries.

    Resources


    Make more sense of openQA test results using AI by livdywan

    Description

    AI has the potential to help with something many of us spend a lot of time doing which is making sense of openQA logs when a job fails.

    User Story

    Allison Average has a puzzled look on their face while staring at log files that seem to make little sense. Is this a known issue, something completely new or maybe related to infrastructure changes?

    Goals

    • Leverage a chat interface to help Allison
    • Create a model from scratch based on data from openQA
    • Proof of concept for automated analysis of openQA test results

    Bonus

    • Use AI to suggest solutions to merge conflicts
      • This would need a merge conflict editor that can suggest solving the conflict
    • Use image recognition for needles

    Resources

    Timeline

    Day 1

    • Conversing with open-webui to teach me how to create a model based on openQA test results

    Day 2

    Highlights

    • I briefly tested compared models to see if they would make me more productive. Between llama, gemma and mistral there was no amazing difference in the results for my case.
    • Convincing the chat interface to produce code specific to my use case required very explicit instructions.
    • Asking for advice on how to use open-webui itself better was frustratingly unfruitful both in trivial and more advanced regards.
    • Documentation on source materials used by LLM's and tools for this purpose seems virtually non-existent - specifically if a logo can be generated based on particular licenses

    Outcomes

    • Chat interface-supported development is providing good starting points and open-webui being open source is more flexible than Gemini. Although currently some fancy features such as grounding and generated podcasts are missing.
    • Allison still has to be very experienced with openQA to use a chat interface for test review. Publicly available system prompts would make that easier, though.