SUSE Hack Week: Tool to collect relevant data from images and containers tested in openQA

Project Description

This idea has been partially implemented for JeOS images: we collect some data from the images whenever a new build ends up in openQA. For instance, https://openqa.opensuse.org/tests/2419705#step/image_info/9 collects the size of the image, the total number of RPMs, the list of RPMs with their sizes, and some filesystem information. For each build, this information is pushed to an InfluxDB instance running on an EC2 instance. The data is then displayed using Grafana: http://35.158.113.219:3000/d/FEP4FZQnz/minimal-vm-image-size?orgId=1&from=now-30d&to=now
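
To make the collected data more concrete, here is a minimal sketch of how a package list could be turned into the numbers mentioned above (RPM count, per-package sizes, total). It assumes the input is the text printed by `rpm -qa --queryformat '%{NAME} %{SIZE}\n'`; the `ImageInfo` class, its field names, and the sample values are made up for illustration and are not taken from the actual openQA test module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ImageInfo:
    """Summary of one image (hypothetical structure, for illustration only)."""
    image_size_bytes: int
    rpm_count: int
    rpm_total_bytes: int
    packages: list[tuple[str, int]]


def summarize_rpms(rpm_output: str, image_size_bytes: int) -> ImageInfo:
    # Each non-empty line is expected to look like "<name> <size-in-bytes>".
    packages: list[tuple[str, int]] = []
    for line in rpm_output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, size = line.rsplit(maxsplit=1)
        packages.append((name, int(size)))
    return ImageInfo(
        image_size_bytes=image_size_bytes,
        rpm_count=len(packages),
        rpm_total_bytes=sum(size for _, size in packages),
        packages=packages,
    )


# Invented sample input, not real measurements.
sample = "bash 1000\nglibc 5000\nzypper 2500\n"
info = summarize_rpms(sample, image_size_bytes=300_000_000)
print(info.rpm_count, info.rpm_total_bytes)
```

A list of tuples is used instead of a dict so that packages installed more than once under the same name are still counted.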

I would like to extend this to container images (BCI) and track the history of their size, number of RPMs, etc. The current implementation could also be improved by using a better visualization tool than Grafana, which is quite limited when it comes to customizing the graphs.

The graphs could show the information per build (in ascending order) instead of per timestamp. When hovering over a data point, a popup could show all the information (metadata) for that image, not only the value of that point. For example, for a given build X the size of the image is Y, but we should also be able to display additional metadata, such as the image name, the openQA job URL, etc.
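
As a rough sketch of the "per build, with metadata" idea: the records could be sorted by build number and each point could carry its metadata along, so that a charting front end (for example Chart.js) can show it in a hover popup. The record layout, the `meta` key, and the sample values and URLs below are assumptions made for illustration only.

```python
import json


def build_key(build: str) -> tuple:
    """Sort key that orders builds numerically, so '9.15' comes after '9.2'."""
    return tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in build.split(".")
    )


def to_chart_points(records: list) -> list:
    """Return one point per build, ascending, with metadata attached."""
    ordered = sorted(records, key=lambda r: build_key(r["build"]))
    return [
        {
            "x": r["build"],
            "y": r["size_mb"],
            # Extra information a tooltip could display next to the value.
            "meta": {"image": r["image"], "job_url": r["job_url"]},
        }
        for r in ordered
    ]


# Invented sample records, not real measurements.
records = [
    {"build": "9.15", "size_mb": 282.0, "image": "example-image",
     "job_url": "https://openqa.example.org/tests/2"},
    {"build": "10.2", "size_mb": 290.5, "image": "example-image",
     "job_url": "https://openqa.example.org/tests/3"},
    {"build": "9.2", "size_mb": 280.0, "image": "example-image",
     "job_url": "https://openqa.example.org/tests/1"},
]
points = to_chart_points(records)
print(json.dumps(points, indent=2))
```

Sorting on the parsed build number rather than on the raw string avoids "10.2" being placed before "9.2".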

Goal for this Hackweek

Extend this functionality to collect data from BCI (container images)
Create a better representation of the information with more flexible graphs (maybe Chart.js)
Create a web dashboard to display this information
Reconsider which DB to use (InfluxDB vs PostgreSQL vs MySQL ...)

This project is part of:

Hack Week 21

Activity

over 3 years ago: ilausuch joined this project.

over 3 years ago: bchou liked this project.

over 3 years ago: mkoutny liked this project.

over 3 years ago: jlausuch joined this project.

over 3 years ago: maritawerner liked this project.

over 3 years ago: radolin liked this project.

over 3 years ago: mloviska started this project.

over 3 years ago: gameboy974 liked this project.

over 3 years ago: jlausuch originated this project.

Comments

over 3 years ago by jlausuch

We will start by evaluating which DB offers the most suitable solution for what we want.

InfluxDB (https://docs.influxdata.com/influxdb/v1.8/concepts/key_concepts/):
- Its main use case is storing performance metrics over time.
- It is purely timestamp based: every point has a timestamp (assigned at insertion time if the client does not send one), which together with the measurement and tags acts as the "primary key" of the table.
- You don't need to define the fields of the table when creating it; the fields are created when data is pushed to it.
- Each point carries two kinds of key/value data: tags (indexed metadata) and fields (the stored values, each made of a "field key" and a "field value").
- Challenging (or impossible) to define relations between tables.
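
To illustrate the InfluxDB data model, here is a small sketch that formats one data point in InfluxDB's line protocol (`measurement,tags fields [timestamp]`). The measurement name, tag names, field names, and values are invented for this example and are not necessarily the ones used by the existing JeOS implementation.

```python
def escape_key(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value) -> str:
    """Render a field value: booleans, integers (with 'i' suffix), floats, strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={format_field(v)}" for k, v in fields.items()
    )
    line = f"{name}{tag_part} {field_part}"
    # Without a timestamp, the server assigns one when the point is written.
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# Invented sample values, for illustration only.
line = to_line(
    "image_size",
    tags={"arch": "x86_64", "flavor": "JeOS-kvm", "build": "9.15"},
    fields={"size_mb": 281.5, "rpm_count": 312},
)
print(line)
```

Here the image identity (arch, build, flavor) ends up in the tags and the measured numbers in the fields.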

PostgreSQL (https://www.postgresqltutorial.com/):
- Closer to the traditional relational DB world.
- A timestamp is just another optional field, not mandatory.
- Possible to define primary keys (useful for our use case).
- Possible to define relations between tables (although we don't need this).

MySQL:
- Maybe too powerful for what we want? https://www.guru99.com/postgresql-vs-mysql-difference.html

So, from my point of view, InfluxDB is very limited and very much focused on collecting performance metrics over time (CPU usage, network, etc). Our use case is slightly different: we want to collect specific data of a different nature (image size, number of packages, etc) and, for each of these data values, we want a unique KEY that identifies the image itself (arch, build, flavor, etc). Therefore, I think Postgres is a better fit for us.
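
The "unique KEY per image" idea can be sketched with a composite primary key. To keep the example runnable without a server, it uses Python's built-in `sqlite3` module as a stand-in for Postgres; the table name, column names, and values are assumptions for illustration, not the schema we ended up with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The image is identified by (arch, build, flavor); a timestamp is not needed.
conn.execute(
    """
    CREATE TABLE image_info (
        arch      TEXT NOT NULL,
        build     TEXT NOT NULL,
        flavor    TEXT NOT NULL,
        size_mb   REAL,
        rpm_count INTEGER,
        PRIMARY KEY (arch, build, flavor)
    )
    """
)

insert_sql = "INSERT INTO image_info VALUES (?, ?, ?, ?, ?)"
row = ("x86_64", "9.15", "kvm", 281.5, 312)  # invented sample values
conn.execute(insert_sql, row)

# Pushing the same image a second time violates the primary key.
try:
    conn.execute(insert_sql, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM image_info").fetchone()[0]
print(duplicate_rejected, count)
```

With such a key, a re-run of the same openQA job cannot silently create a second row for the same image.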

Another important thing to consider is how well these databases integrate with non-Grafana visualization tools, like Chart.js.

over 3 years ago by jlausuch

I have created a Confluence page for the HowTo (setting up PostgreSQL, using Chart.js, ...): https://confluence.suse.com/display/~jlausuch/Hackweek+June+2022

over 3 years ago by jlausuch

I managed to get Postgres up and running in a container, with some secured access for pushing data to it. The problem with Postgres is that, unlike InfluxDB, it doesn't come with a REST API by default. Since we don't want to install client tools like psql on the client side (openQA), we need a REST API for Postgres. The solution is PostgREST (https://postgrest.org/en/stable/), which is basically a proxy between the DB and remote HTTP requests. It offers a simple API to get and push data with a simple curl command.
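
As a rough sketch of what talking to PostgREST looks like: a row is inserted by POSTing JSON to `/<table>`, and rows are read with GET plus filters such as `arch=eq.x86_64`. The example below only builds the requests and does not send them; the base URL, the `image_info` table, the column names, and the token are placeholders and do not describe the real setup.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(base_url: str, table: str, row: dict, token: str):
    """Build (but do not send) a POST request that inserts one row."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{table}",
        data=json.dumps(row).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # JWT accepted by PostgREST
            "Prefer": "return=minimal",          # do not echo the row back
        },
    )


def build_query_url(base_url: str, table: str, arch: str) -> str:
    """URL that selects all rows for one arch, ordered by build."""
    query = urllib.parse.urlencode({"arch": f"eq.{arch}", "order": "build.asc"})
    return f"{base_url.rstrip('/')}/{table}?{query}"


# Placeholder endpoint, table, values and token.
req = build_insert_request(
    "http://localhost:3000",
    "image_info",
    {"arch": "x86_64", "build": "9.15", "flavor": "kvm", "size_mb": 281.5},
    token="PLACEHOLDER_TOKEN",
)
print(req.get_method(), req.full_url)
print(build_query_url("http://localhost:3000", "image_info", "x86_64"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; from openQA, the same request can be made with a single curl command.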

In the Confluence page I am adding all the steps needed to reproduce it. Although the DB is running in a container, I'm using a volume to store the data locally so we don't lose it if the container goes down.

@mloviska is now trying to make the JS pg connector work, so we can load the data in JS to be plotted afterwards.

over 3 years ago by jlausuch

This is the code to push the container image information to the database: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15166

For the front end, we have created a new repository: https://github.com/ilausuch/image-data-ng

