Description

I'm implementing a split-horizon DNS for my home Kubernetes cluster to be able to access my internal (and external) services over the local network through public domains. I managed to make a PoC with the k8s_gateway plugin for CoreDNS. However, I soon found out it responds with IPs for all Gateways assigned to HTTPRoutes, publishing public IPs as well as the internal Loadbalancer ones.

To remedy this issue, a simple filtering mechanism has to be implemented.

Goals

  • Learn an acceptable amount of Golang
  • Implement GatewayClass (and IngressClass) filtering for k8s_gateway
  • Deploy on homelab cluster
  • Profit?

Resources

EDIT: Feature mostly complete. An unfinished PR lies here. Successfully tested working on homelab cluster.

Looking for hackers with the skills:

kubernetes golang dns

This project is part of:

Hack Week 24

Activity

  • about 1 year ago: jmeza liked this project.
  • about 1 year ago: paulgonin liked this project.
  • about 1 year ago: pdostal liked this project.
  • about 1 year ago: fgiudici liked this project.
  • about 1 year ago: parag.jain joined this project.
  • about 1 year ago: jkuzilek added keyword "dns" to this project.
  • about 1 year ago: jkuzilek added keyword "golang" to this project.
  • about 1 year ago: jkuzilek added keyword "kubernetes" to this project.
  • about 1 year ago: jkuzilek started this project.
  • about 1 year ago: jkuzilek originated this project.

  • Comments

    • parag.jain
      about 1 year ago by parag.jain | Reply

      I am interested!! Can I join ?

      • jkuzilek
        about 1 year ago by jkuzilek | Reply

        Sorry, unfortunately, I already managed to finish most of the task and created a PR. Didn't take that much work as I thought.

    Similar Projects

    Self-Scaling LLM Infrastructure Powered by Rancher by ademicev0

    Self-Scaling LLM Infrastructure Powered by Rancher

    logo


    Description

    The Problem

    Running LLMs can get expensive and complex pretty quickly.

    Today there are typically two choices:

    1. Use cloud APIs like OpenAI or Anthropic. Easy to start with, but costs add up at scale.
    2. Self-host everything - set up Kubernetes, figure out GPU scheduling, handle scaling, manage model serving... it's a lot of work.

    What if there was a middle ground?

    What if infrastructure scaled itself instead of making you scale it?

    Can we use existing Rancher capabilities like CAPI, autoscaling, and GitOps to make this simpler instead of building everything from scratch?

    Project Repository: github.com/alexander-demicev/llmserverless


    What This Project Does

    A key feature is hybrid deployment: requests can be routed based on complexity or privacy needs. Simple or low-sensitivity queries can use public APIs (like OpenAI), while complex or private requests are handled in-house on local infrastructure. This flexibility allows balancing cost, privacy, and performance - using cloud for routine tasks and on-premises resources for sensitive or demanding workloads.

    A complete, self-scaling LLM infrastructure that:

    • Scales to zero when idle (no idle costs)
    • Scales up automatically when requests come in
    • Adds more nodes when needed, removes them when demand drops
    • Runs on any infrastructure - laptop, bare metal, or cloud

    Think of it as "serverless for LLMs" - focus on building, the infrastructure handles itself.

    How It Works

    A combination of open source tools working together:

    Flow:

    • Users interact with OpenWebUI (chat interface)
    • Requests go to LiteLLM Gateway
    • LiteLLM routes requests to:
      • Ollama (Knative) for local model inference (auto-scales pods)
      • Or cloud APIs for fallback


    Cluster API Provider for Harvester by rcase

    Project Description

    The Cluster API "infrastructure provider" for Harvester, also named CAPHV, makes it possible to use Harvester with Cluster API. This enables people and organisations to create Kubernetes clusters running on VMs created by Harvester using a declarative spec.

    The project has been bootstrapped in HackWeek 23, and its code is available here.

    Work done in HackWeek 2023

    • Have a early working version of the provider available on Rancher Sandbox : *DONE *
    • Demonstrated the created cluster can be imported using Rancher Turtles: DONE
    • Stretch goal - demonstrate using the new provider with CAPRKE2: DONE and the templates are available on the repo

    DONE in HackWeek 24:

    DONE in 2025 (out of Hackweek)

    • Support of ClusterClass
    • Add to clusterctl community providers, you can add it directly with clusterctl
    • Testing on newer versions of Harvester v1.4.X and v1.5.X
    • Support for clusterctl generate cluster ...
    • Improve Status Conditions to reflect current state of Infrastructure
    • Improve CI (some bugs for release creation)

    Goals for HackWeek 2025

    • FIRST and FOREMOST, any topic is important to you
    • Add e2e testing
    • Certify the provider for Rancher Turtles
    • Add Machine pool labeling
    • Add PCI-e passthrough capabilities.
    • Other improvement suggestions are welcome!

    Thanks to @isim and Dominic Giebert for their contributions!

    Resources

    Looking for help from anyone interested in Cluster API (CAPI) or who wants to learn more about Harvester.

    This will be an infrastructure provider for Cluster API. Some background reading for the CAPI aspect:


    The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

    Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it. A GitHub robot mascot trying to lasso a blue bull with a Kubernetes logo tatooed on it


    The Plan

    Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

    Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:


    The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

    The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

    Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.


    If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

    Why?

    We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

    The CONCLUSION!!!

    A add-emoji State of the Union add-emoji document was compiled to summarize lessons learned this week. For more gory details, just read on the diary below! add-emoji


    OpenPlatform Self-Service Portal by tmuntan1

    Description

    In SUSE IT, we developed an internal developer platform for our engineers using SUSE technologies such as RKE2, SUSE Virtualization, and Rancher. While it works well for our existing users, the onboarding process could be better.

    To improve our customer experience, I would like to build a self-service portal to make it easy for people to accomplish common actions. To get started, I would have the portal create Jira SD tickets for our customers to have better information in our tickets, but eventually I want to add automation to reduce our workload.

    Goals

    • Build a frontend website (Angular) that helps customers create Jira SD tickets.
    • Build a backend (Rust with Axum) for the backend, which would do all the hard work for the frontend.

    Resources (SUSE VPN only)

    • development site: https://ui-dev.openplatform.suse.com/login?returnUrl=%2Fopenplatform%2Fforms
    • https://gitlab.suse.de/itpe/core/open-platform/op-portal/backend
    • https://gitlab.suse.de/itpe/core/open-platform/op-portal/frontend


    Kubernetes-Based ML Lifecycle Automation by lmiranda

    Description

    This project aims to build a complete end-to-end Machine Learning pipeline running entirely on Kubernetes, using Go, and containerized ML components.

    The pipeline will automate the lifecycle of a machine learning model, including:

    • Data ingestion/collection
    • Model training as a Kubernetes Job
    • Model artifact storage in an S3-compatible registry (e.g. Minio)
    • A Go-based deployment controller that automatically deploys new model versions to Kubernetes using Rancher
    • A lightweight inference service that loads and serves the latest model
    • Monitoring of model performance and service health through Prometheus/Grafana

    The outcome is a working prototype of an MLOps workflow that demonstrates how AI workloads can be trained, versioned, deployed, and monitored using the Kubernetes ecosystem.

    Goals

    By the end of Hack Week, the project should:

    1. Produce a fully functional ML pipeline running on Kubernetes with:

      • Data collection job
      • Training job container
      • Storage and versioning of trained models
      • Automated deployment of new model versions
      • Model inference API service
      • Basic monitoring dashboards
    2. Showcase a Go-based deployment automation component, which scans the model registry and automatically generates & applies Kubernetes manifests for new model versions.

    3. Enable continuous improvement by making the system modular and extensible (e.g., additional models, metrics, autoscaling, or drift detection can be added later).

    4. Prepare a short demo explaining the end-to-end process and how new models flow through the system.

    Resources

    Project Repository

    Updates

    1. Training pipeline and datasets
    2. Inference Service py


    Q2Boot - A handy QEMU VM launcher by amanzini

    Description

    Q2Boot (Qemu Quick Boot) is a command-line tool that wraps QEMU to provide a streamlined experience for launching virtual machines. It automatically configures common settings like KVM acceleration, virtio drivers, and networking while allowing customization through both configuration files and command-line options.

    The project originally was a personal utility in D, now recently rewritten in idiomatic Go. It lives at repository https://github.com/ilmanzo/q2boot

    Goals

    Improve the project, testing with different scenarios , address issues and propose new features. It will benefit of some basic integration testing by providing small sample disk images.

    Updates

    • Dec 1, 2025 : refactor command line options, added structured logging. Released v0.0.2
    • Dec 2, 2025 : added external monitor via telnet option
    • Dec 4, 2025 : released v0.0.3 with architecture auto-detection
    • Dec 5, 2025 : filing new issues and general polishment. Designing E2E testing

    Resources


    Create a go module to wrap happy-compta.fr by cbosdonnat

    Description

    https://happy-compta.fr is a tool for french work councils simple book keeping. While it does the job, it has no API to work with and it is tedious to enter loads of operations.

    Goals

    Write a go client module to be used as an API to programmatically manipulate the tool.

    Writing an example tool to load data from a CSV file would be good too.


    SUSE Health Check Tools by roseswe

    SUSE HC Tools Overview

    A collection of tools written in Bash or Go 1.24++ to make life easier with handling of a bunch of tar.xz balls created by supportconfig.

    Background: For SUSE HC we receive a bunch of supportconfig tar balls to check them for misconfiguration, areas for improvement or future changes.

    Main focus on these HC are High Availability (pacemaker), SLES itself and SAP workloads, esp. around the SUSE best practices.

    Goals

    • Overall improvement of the tools
    • Adding new collectors
    • Add support for SLES16

    Resources

    csv2xls* example.sh go.mod listprodids.txt sumtext* trails.go README.md csv2xls.go exceltest.go go.sum m.sh* sumtext.go vercheck.py* config.ini csvfiles/ getrpm* listprodids* rpmdate.sh* sumxls* verdriver* credtest.go example.py getrpm.go listprodids.go sccfixer.sh* sumxls.go verdriver.go

    docollall.sh* extracthtml.go gethostnamectl* go.sum numastat.go cpuvul* extractcluster.go firmwarebug* gethostnamectl.go m.sh* numastattest.go cpuvul.go extracthtml* firmwarebug.go go.mod numastat* xtr_cib.sh*

    $ getrpm -r pacemaker >> Product ID: 2795 (SUSE Linux Enterprise Server for SAP Applications 15 SP7 x86_64), RPM Name: +--------------+----------------------------+--------+--------------+--------------------+ | Package Name | Version | Arch | Release | Repository | +--------------+----------------------------+--------+--------------+--------------------+ | pacemaker | 2.1.10+20250718.fdf796ebc8 | x86_64 | 150700.3.3.1 | sle-ha/15.7/x86_64 | | pacemaker | 2.1.9+20250410.471584e6a2 | x86_64 | 150700.1.9 | sle-ha/15.7/x86_64 | +--------------+----------------------------+--------+--------------+--------------------+ Total packages found: 2


    Play with the userfaultfd(2) system call and download on demand using HTTP Range Requests with Golang by rbranco

    Description

    The userfaultfd(2) is a cool system call to handle page faults in user-space. This should allow me to list the contents of an ISO or similar archive without downloading the whole thing. The userfaultfd(2) part can also be done in theory with the PROT_NONE mprotect + SIGSEGV trick, for complete Unix portability, though reportedly being slower.

    Goals

    1. Create my own library for userfaultfd(2) in Golang.
    2. Create my own library for HTTP Range Requests.
    3. Complete portability with Unix.
    4. Benchmarks.
    5. Contribute some tests to LTP.

    Resources

    1. https://docs.kernel.org/admin-guide/mm/userfaultfd.html
    2. https://www.cons.org/cracauer/cracauer-userfaultfd.html


    Create a Cloud-Native policy engine with notifying capabilities to optimize resource usage by gbazzotti

    Description

    The goal of this project is to begin the initial phase of development of an all-in-one Cloud-Native Policy Engine that notifies resource owners when their resources infringe predetermined policies. This was inspired by a current issue in the CES-SRE Team where other solutions seemed to not exactly correspond to the needs of the specific workloads running on the Public Cloud Team space.

    The initial architecture can be checked out on the Repository listed under Resources.

    Among the features that will differ this project from other monitoring/notification systems:

    • Pre-defined sensible policies written at the software-level, avoiding a learning curve by requiring users to write their own policies
    • All-in-one functionality: logging, mailing and all other actions are not required to install any additional plugins/packages
    • Easy account management, being able to parse all required configuration by a single JSON file
    • Eliminate integrations by not requiring metrics to go through a data-agreggator

    Goals

    • Create a minimal working prototype following the workflow specified on the documentation
    • Provide instructions on installation/usage
    • Work on email notifying capabilities

    Resources


    DNS management with DNSControl by itorres

    Description

    We use several systems to manage DNS at SUSE and openSUSE: BIND, external providers, PowerDNS... each of them is managed in a different way either with raw zones (BIND) or Terraform (external providers).

    DNSControl is an opinionated tool to manage DNS as code while being provider agnostic. It's developed and used by StackExchange, was spearheaded by Tom Limoncelly and is already being used to manage DNS for openSUSE.

    Implementing DNSControl should allow us to have a single DNS operations interface that end users can leverage.

    This would reduce complexity for end users as they can use a single simplified ECMAScript based DSL instead of BIND zones for internal and HCL config for external.

    Operations for our IT organization would be greatly reduced. DNSControl itself has several internal checks that reduce our need to do linting and we can concentrate on implementing logical checks based on ownership.

    This simplifies reviews a lot and the integration with BIND and providers allows our IT organization to implement an apply on merge.

    At an organizational level it will separate our DNS tasks from other IT operations, speeding up DNS changes and allowing us to delegate DNS reviews to service desk or even customer teams through CODEOWNERS.

    Goals

    • Create a test subdomain in one of our internal BIND servers to be managed with DNSControl.
    • Create an internal DNSControl repository to implement gitops for DNS.
    • Deploy DNS changes strictly through gitops.

    Extended goals

    • Implement CODEOWNERS.
    • Replicate main goals for external DNS.

    Resources