How it is
Currently the telegraf webhooks plugin for github produces rather dumb measurements. Just extracting data from the webhook and putting it into the TSDB. The interesting data for engineering metrics you have to calculate yourself then.
How it should be
Given a set of input variables (list of accounts and labels) the plugin should calculate
- cycle time (the time between certain PRs open and deploy)
- change failure rate (the amount of opened PRs divided by the PRs with label)
- change quality (the amount of issues with priority labels)
How it would be nice
While doing this, think a more about the data models for other use-cases.
This project is part of:
Hack Week 20
Activity
Comments
Similar Projects
Mammuthus - The NFS-Ganesha inside Kubernetes controller by vcheng
Description
As the user-space NFS provider, the NFS-Ganesha is wieldy use with serval projects. e.g. Longhorn/Rook. We want to create the Kubernetes Controller to make configuring NFS-Ganesha easy. This controller will let users configure NFS-Ganesha through different backends like VFS/CephFS.
Goals
- Create NFS-Ganesha Package on OBS: nfs-ganesha5, nfs-ganesha6
- Create NFS-Ganesha Container Image on OBS: Image
- Create a Kubernetes controller for NFS-Ganesha and support the VFS configuration on demand. Mammuthus
Resources
terraform-provider-feilong by e_bischoff
Project Description
People need to test operating systems and applications on s390 platform.
Installation from scratch solutions include:
- just deploy and provision manually
(with the help of
ftpboot
script, if you are at SUSE) - use
s3270
terminal emulation (used byopenQA
people?) - use
LXC
from IBM to start CP commands and analyze the results - use
zPXE
to do some PXE-alike booting (used by theorthos
team?) - use
tessia
to install from scratch using autoyast - use
libvirt
for s390 to do some nested virtualization on some already deployed z/VM system - directly install a Linux kernel on a LPAR and use
kvm
+libvirt
from there
Deployment from image solutions include:
- use
ICIC
web interface (openstack
in disguise, contributed by IBM) - use
ICIC
from theopenstack
terraform
provider (used byRancher
QA) - use
zvm_ansible
to controlSMAPI
- connect directly to
SMAPI
low-level socket interface
IBM Cloud Infrastructure Center (ICIC
) harnesses the Feilong API, but you can use Feilong
without installing ICIC
, provided you set up a "z/VM cloud connector" into one of your VMs following this schema.
What about writing a terraform Feilong
provider, just like we have the terraform
libvirt
provider? That would allow to transparently call Feilong
from your main.tf files to deploy and destroy resources on your system/z.
Other Feilong-based solutions include:
- make
libvirt
Feilong-aware - simply call
Feilong
from shell scripts withcurl
- use
zvmconnector
client python library from Feilong - use
zthin
part of Feilong to directly commandSMAPI
.
Goal for Hackweek 23
My final goal is to be able to easily deploy and provision VMs automatically on a z/VM system, in a way that people might enjoy even outside of SUSE.
My technical preference is to write a terraform provider plugin, as it is the approach that involves the least software components for our deployments, while remaining clean, and compatible with our existing development infrastructure.
Goals for Hackweek 24
Feilong provider works and is used internally by SUSE Manager team. Let's push it forward!
Let's add support for fiberchannel disks and multipath.
Possible goals for Hackweek 25
Modernization, maturity, and maintenance.
Flaky Tests AI Finder for Uyuni and MLM Test Suites by oscar-barrios
Description
Our current Grafana dashboards provide a great overview of test suite health, including a panel for "Top failed tests." However, identifying which of these failures are due to legitimate bugs versus intermittent "flaky tests" is a manual, time-consuming process. These flaky tests erode trust in our test suites and slow down development.
This project aims to build a simple but powerful Python script that automates flaky test detection. The script will directly query our Prometheus instance for the historical data of each failed test, using the jenkins_build_test_case_failure_age
metric. It will then format this data and send it to the Gemini API with a carefully crafted prompt, asking it to identify which tests show a flaky pattern.
The final output will be a clean JSON list of the most probable flaky tests, which can then be used to populate a new "Top Flaky Tests" panel in our existing Grafana test suite dashboard.
Goals
By the end of Hack Week, we aim to have a single, working Python script that:
- Connects to Prometheus and executes a query to fetch detailed test failure history.
- Processes the raw data into a format suitable for the Gemini API.
- Successfully calls the Gemini API with the data and a clear prompt.
- Parses the AI's response to extract a simple list of flaky tests.
- Saves the list to a JSON file that can be displayed in Grafana.
- New panel in our Dashboard listing the Flaky tests
Resources
- Jenkins Prometheus Exporter: https://github.com/uyuni-project/jenkins-exporter/
- Data Source: Our internal Prometheus server.
- Key Metric:
jenkins_build_test_case_failure_age{jobname, buildid, suite, case, status, failedsince}
. - Existing Query for Reference:
count by (suite) (max_over_time(jenkins_build_test_case_failure_age{status=~"FAILED|REGRESSION", jobname="$jobname"}[$__range]))
. - AI Model: The Google Gemini API.
- Example about how to interact with Gemini API: https://github.com/srbarrios/FailTale/
- Visualization: Our internal Grafana Dashboard.
- Internal IaC: https://gitlab.suse.de/galaxy/infrastructure/-/tree/master/srv/salt/monitoring