Description

I want to setup a Radeon RX 9600 XT 16 GB at home with ROCm on Slowroll.

Goals

I want to test how fast AI inference can get with the GPU and if I can use LoRA to re-train an existing free model for some task.

Resources

  • https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
  • https://build.opensuse.org/project/show/science:GPU:ROCm
  • https://src.opensuse.org/ROCm/
  • https://www.suse.com/c/lora-fine-tuning-llms-for-text-classification/

Results

got inference working with llama.cpp:

export LLAMACPP_ROCM_ARCH=gfx1200
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$LLAMACPP_ROCM_ARCH \
-DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
-Dhipblas_DIR=/usr/lib64/cmake/hipblaslt/ \
&& cmake --build build --config Release -j8
m=models/gpt-oss-20b-mxfp4.gguf
cd $P/llama.cpp && build/bin/llama-server --model $m --threads 8 --port 8005 --host 0.0.0.0 --device ROCm0 --n-gpu-layers 999

Without the --device option it faulted. Maybe because my APU also appears there?

I updated/fixed various related packages: https://src.opensuse.org/ROCm/rocm-examples/pulls/1 https://src.opensuse.org/ROCm/hipblaslt/pulls/1 SR 1320959

benchmark

I benchmarked inference with llama.cpp + gpt-oss-20b-mxfp4.gguf and ROCm offloading to a Radeon RX 9060 XT 16GB. I varied the number of layers that went to the GPU:

  • 0 layers 14.49 tokens/s (8 CPU cores)
  • 9 layers 17.79 tokens/s 34% VRAM
  • 15 layers 22.39 tokens/s 51% VRAM
  • 20 layers 27.49 tokens/s 64% VRAM
  • 24 layers 41.18 tokens/s 74% VRAM
  • 25+ layers 86.63 tokens/s 75% VRAM (only 200% CPU load)

So there is a significant performance-boost if the whole model fits into the GPU's VRAM.

training data

I collected training data from my bugzilla archive: https://www.zq1.de/~bernhard/linux/opensuse/bugzilla/bugzillapkgdata.json.xz This contains one JSON per line for summary, description(body) and packages(pkgs) that got fixes. The extraction code lives in https://github.com/bmwiedemann/bugzillai

Looking for hackers with the skills:

ai training rocm

This project is part of:

Hack Week 25

Activity

  • 18 days ago: pgonin liked this project.
  • 18 days ago: mkoutny liked this project.
  • 19 days ago: bmwiedemann started this project.
  • 19 days ago: bmwiedemann liked this project.
  • 20 days ago: bmwiedemann added keyword "rocm" to this project.
  • 20 days ago: bmwiedemann added keyword "ai" to this project.
  • 20 days ago: bmwiedemann added keyword "training" to this project.
  • 20 days ago: bmwiedemann originated this project.

  • Comments

    • bmwiedemann
      7 days ago by bmwiedemann | Reply

      collected training data: https://www.zq1.de/~bernhard/linux/opensuse/bugzilla/

    Similar Projects

    Enable more features in mcp-server-uyuni by j_renner

    Description

    I would like to contribute to mcp-server-uyuni, the MCP server for Uyuni / Multi-Linux Manager) exposing additional features as tools. There is lots of relevant features to be found throughout the API, for example:

    • System operations and infos
    • System groups
    • Maintenance windows
    • Ansible
    • Reporting
    • ...

    At the end of the week I managed to enable basic system group operations:

    • List all system groups visible to the user
    • Create new system groups
    • List systems assigned to a group
    • Add and remove systems from groups

    Goals

    • Set up test environment locally with the MCP server and client + a recent MLM server [DONE]
    • Identify features and use cases offering a benefit with limited effort required for enablement [DONE]
    • Create a PR to the repo [DONE]

    Resources


    The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

    Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. A GitHub robot mascot trying to lasso a blue bull with a Kubernetes logo tatooed on it


    The Plan

    Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

    Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:


    The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

    The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

    Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.


    If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

    Why?

    We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

    The CONCLUSION!!!

    A add-emoji State of the Union add-emoji document was compiled to summarize lessons learned this week. For more gory details, just read on the diary below! add-emoji


    Extended private brain - RAG my own scripts and data into offline LLM AI by tjyrinki_suse

    Description

    For purely studying purposes, I'd like to find out if I could teach an LLM some of my own accumulated knowledge, to use it as a sort of extended brain.

    I might use qwen3-coder or something similar as a starting point.

    Everything would be done 100% offline without network available to the container, since I prefer to see when network is needed, and make it so it's never needed (other than initial downloads).

    Goals

    1. Learn something about RAG, LLM, AI.
    2. Find out if everything works offline as intended.
    3. As an end result have a new way to access my own existing know-how, but so that I can query the wisdom in them.
    4. Be flexible to pivot in any direction, as long as there are new things learned.

    Resources

    To be found on the fly.

    Timeline

    Day 1 (of 4)

    • Tried out a RAG demo, expanded on feeding it my own data
    • Experimented with qwen3-coder to add a persistent chat functionality, and keeping vectors in a pickle file
    • Optimizations to keep everything within context window
    • Learn and add a bit of PyTest

    Day 2

    • More experimenting and more data
    • Study ChromaDB
    • Add a Web UI that works from another computer even though the container sees network is down

    Day 3

    • The above RAG is working well enough for demonstration purposes.
    • Pivot to trying out OpenCode, configuring local Ollama qwen3-coder there, to analyze the RAG demo.
    • Figured out how to configure Ollama template to be usable under OpenCode. OpenCode locally is super slow to just running qwen3-coder alone.

    Day 4 (final day)

    • Battle with OpenCode that was both slow and kept on piling up broken things.
    • Call it success as after all the agentic AI was working locally.
    • Clean up the mess left behind a bit.

    Blog Post

    Summarized the findings at blog post.


    Song Search with CLAP by gcolangiuli

    Description

    Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both Audio and Text descriptions, making it possible to search for Audio using a Text input. Several pre-trained models for song search are already available on huggingface

    SUSE Hackweek AI Song Search

    Goals

    Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:

    • Music Tagging;
    • Free text search;
    • Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.

    The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.

    Result

    In this MVP we implemented:

    • Async Song Analysis with Clap model
    • Free Text Search of the songs
    • Similar song search based on vector representation
    • Containerised version with web interface

    We also documented what went well and what can be improved in the use of AI.

    You can have a look at the result here:

    Future implementation can be related to performance improvement and stability of the analysis.

    References


    MCP Trace Suite by r1chard-lyu

    Description

    This project plans to create an MCP Trace Suite, a system that consolidates commonly used Linux debugging tools such as bpftrace, perf, and ftrace.

    The suite is implemented as an MCP Server. This architecture allows an AI agent to leverage the server to diagnose Linux issues and perform targeted system debugging by remotely executing and retrieving tracing data from these powerful tools.

    • Repo: https://github.com/r1chard-lyu/systracesuite
    • Demo: Slides

    Goals

    1. Build an MCP Server that can integrate various Linux debugging and tracing tools, including bpftrace, perf, ftrace, strace, and others, with support for future expansion of additional tools.

    2. Perform testing by intentionally creating bugs or issues that impact system performance, allowing an AI agent to analyze the root cause and identify the underlying problem.

    Resources

    • Gemini CLI: https://geminicli.com/
    • eBPF: https://ebpf.io/
    • bpftrace: https://github.com/bpftrace/bpftrace/
    • perf: https://perfwiki.github.io/main/
    • ftrace: https://github.com/r1chard-lyu/tracium/