Project Description

This idea has been partially implemented for JeOS images, where we collect some data from the images whenever a new build ends up in openQA. For instance, https://openqa.opensuse.org/tests/2419705#step/image_info/9 collects the size of the image, the total number of RPMs, the list of RPMs with their sizes, and some filesystem information. For each build, this information is pushed to an InfluxDB instance running on an EC2 instance. The data is then displayed using Grafana: http://35.158.113.219:3000/d/FEP4FZQnz/minimal-vm-image-size?orgId=1&from=now-30d&to=now
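As a rough illustration of what one pushed data point could look like, here is a minimal sketch that renders a record in InfluxDB line protocol. The measurement name `image_info`, the tag and field names, and the values are all hypothetical; the real openQA code may use different ones. The sketch also assumes the measurement name needs no escaping.

```python
def escape_key(value: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def render_field_value(value) -> str:
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; escape backslashes and double quotes inside them.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None) -> str:
    """Build one line: measurement,tag=value field=value [timestamp]."""
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={render_field_value(v)}" for k, v in fields.items()
    )
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # if omitted, the server assigns the timestamp
    return line


# Hypothetical data point for one image build.
print(
    to_line_protocol(
        "image_info",
        {"flavor": "JeOS-kvm", "arch": "x86_64"},
        {"size_bytes": 250000000, "rpm_count": 312},
    )
)
```

Tags are sorted by key here only to keep the output deterministic.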

I would like to extend this to container images (BCI) and track the history of their size, number of RPMs, etc. For this, the current implementation can be improved by using a better visualization tool than Grafana, which is quite limited when it comes to customizing the graphs.

The graphs could plot the information per build (in ascending order) instead of per timestamp. Hovering over a data point could show a popup with all the information (metadata) of that image, not only the value at that point. For example, for a given build X the size of the image is Y, but we should also be able to display additional metadata, such as the image name, the openQA job URL, etc.
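A minimal sketch of how the data for such a graph could be prepared: records are sorted by build number rather than by timestamp, and each point keeps its metadata next to the plotted value. The record layout, the dotted build number format, and the example URLs are assumptions for illustration only. The output is a dictionary shaped like a Chart.js configuration, using object data points with `x` and `y` keys.

```python
import json


def build_sort_key(build: str):
    """Sort dotted build numbers numerically, so that 9.9 comes before 9.10."""
    return tuple(int(part) for part in build.split("."))


def to_chart_points(records):
    """Return one point per build, in ascending build order, with metadata attached."""
    ordered = sorted(records, key=lambda r: build_sort_key(r["build"]))
    return [
        {
            "x": r["build"],  # build number on the x axis instead of a timestamp
            "y": r["size_mb"],  # plotted value
            "meta": {"image": r["image"], "job_url": r["job_url"]},
        }
        for r in ordered
    ]


# Hypothetical records; the URLs are placeholders, not real openQA jobs.
records = [
    {"build": "9.10", "size_mb": 251.0, "image": "example-image", "job_url": "https://example.org/tests/2"},
    {"build": "10.1", "size_mb": 249.5, "image": "example-image", "job_url": "https://example.org/tests/3"},
    {"build": "9.9", "size_mb": 250.2, "image": "example-image", "job_url": "https://example.org/tests/1"},
]

chart_config = {
    "type": "line",
    "data": {
        "datasets": [{"label": "Image size (MB)", "data": to_chart_points(records)}]
    },
}

print(json.dumps(chart_config, indent=2))
```

On the JavaScript side, a tooltip callback would presumably read the extra `meta` object from the raw data point to fill the popup; that part is not shown here.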

Goal for this Hackweek

  • Extend this functionality to collect data from BCI (container images)
  • Create better representation of the information with more flexible graphs (maybe Chart.js)
  • Create a web dashboard to display this information
  • Rethink which DB to use (InfluxDB vs PostgreSQL vs MySQL ...)

This project is part of:

Hack Week 21

Activity

  • almost 2 years ago: ilausuch joined this project.
  • almost 2 years ago: bchou liked this project.
  • almost 2 years ago: mkoutny liked this project.
  • almost 2 years ago: jlausuch joined this project.
  • almost 2 years ago: maritawerner liked this project.
  • almost 2 years ago: radolin liked this project.
  • almost 2 years ago: mloviska started this project.
  • almost 2 years ago: gameboy974 liked this project.
  • almost 2 years ago: jlausuch originated this project.

  • Comments

    • jlausuch
      almost 2 years ago by jlausuch | Reply

      We will start by evaluating which DB offers the most suitable solution for what we want.

      InfluxDB (https://docs.influxdata.com/influxdb/v1.8/concepts/key_concepts/):
      - Its main use case is storing performance metrics over time.
      - It is purely timestamp based: every point has a timestamp, assigned at insertion time unless one is supplied, which acts like the "primary key" of the table.
      - You don't need to define the fields of the table when creating it; fields are created when data is pushed to it.
      - Each point carries key-value data: fields (a "field key" with a "field value") and, optionally, tags.
      - Defining relations between tables is challenging (or impossible).

      PostgreSQL (https://www.postgresqltutorial.com/):
      - Closer to the relational DB world.
      - The timestamp is just another optional field, not a mandatory one.
      - Primary keys can be defined (useful for our use case).
      - Relations between tables can be defined (although we don't need this).

      MySQL:
      - Maybe too powerful for what we want? https://www.guru99.com/postgresql-vs-mysql-difference.html

      So, from my point of view, InfluxDB is very limited and focused on collecting performance metrics over time (CPU usage, network, etc.). Our use case is slightly different: we want to collect specific data of a different nature (image size, number of packages, etc.), and for each of these values we want a unique KEY that identifies the image itself (arch, build, flavor, etc.). Therefore, I think PostgreSQL is a better fit for us.
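The idea of a unique key per image can be sketched as a table with a composite primary key. The table and column names below are hypothetical. To keep the example runnable without a server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the `PRIMARY KEY (...)` clause has the same form in both.

```python
import sqlite3

# Hypothetical schema: one row per (image, arch, flavor, build).
SCHEMA = """
CREATE TABLE image_size (
    image     TEXT    NOT NULL,
    arch      TEXT    NOT NULL,
    flavor    TEXT    NOT NULL,
    build     TEXT    NOT NULL,
    size_mb   REAL    NOT NULL,
    rpm_count INTEGER NOT NULL,
    PRIMARY KEY (image, arch, flavor, build)
)
"""


def insert_row(conn, row):
    """Insert one measurement; fails if the same image key already exists."""
    conn.execute("INSERT INTO image_size VALUES (?, ?, ?, ?, ?, ?)", row)


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
conn.execute(SCHEMA)

row = ("example-image", "x86_64", "example-flavor", "9.10", 251.0, 312)
insert_row(conn, row)

# Pushing the same build twice violates the primary key instead of
# silently creating a second data point.
try:
    insert_row(conn, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM image_size").fetchone()[0]
print(duplicate_rejected, row_count)
```

With a timestamp-keyed store, re-running a job for the same build would typically add another point; here the key makes the duplicate explicit.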

      Another important thing to consider is how well these databases integrate with non-Grafana visualization tools, like Chart.js.

    • jlausuch
      almost 2 years ago by jlausuch | Reply

      I have created a Confluence page for the HowTo (setting up PostgreSQL, using Chart.js, ...). https://confluence.suse.com/display/~jlausuch/Hackweek+June+2022

    • jlausuch
      almost 2 years ago by jlausuch | Reply

      I managed to get PostgreSQL up and running in a container, with access control in place for pushing data to it. The problem with PostgreSQL is that, unlike InfluxDB, it doesn't come with a REST API by default. Since we don't want to install tools such as psql on the client side (openQA), we need a REST API for PostgreSQL. The solution is PostgREST (https://postgrest.org/en/stable/), which is basically a proxy between the DB and remote HTTP requests. It offers a simple API to get and push data with a simple curl command.
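A minimal sketch of what such requests could look like from a client, using only the Python standard library. PostgREST exposes each table as a URL path, accepts inserts as a JSON `POST`, and filters rows with query parameters such as `column=eq.value`. The base URL, table name, column names, and token below are placeholders, not the values used in this project. The sketch only builds the requests; sending them would be a call to `urllib.request.urlopen`.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(base_url, table, row, token):
    """Build a POST request that inserts one row through PostgREST."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{table}",
        data=json.dumps(row).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # JWT accepted by the PostgREST instance
        },
    )


def build_query_url(base_url, table, filters):
    """Build a GET URL that selects rows whose columns equal the given values."""
    query = urllib.parse.urlencode({col: f"eq.{val}" for col, val in filters.items()})
    return f"{base_url.rstrip('/')}/{table}?{query}"


# Placeholder endpoint, table and token (assumptions for illustration).
request = build_insert_request(
    "http://localhost:3000",
    "image_size",
    {"image": "example-image", "arch": "x86_64", "build": "9.10", "size_mb": 251.0},
    token="<jwt-token>",
)
print(request.get_method(), request.full_url)
print(build_query_url("http://localhost:3000", "image_size", {"arch": "x86_64"}))
```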

      In the confluence page I am adding all the needed steps to reproduce it. Although the db is running in a container, I'm using a volume to store the data locally so we don't lose it if the container goes down.

      @mloviska is now trying to make the JS pg connector work, so we can load the data in JS to be plotted afterwards.

    • jlausuch
      almost 2 years ago by jlausuch | Reply

      This is the code to push the container image information to the database: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15166

      For the Front-End, we have created a new repository: https://github.com/ilausuch/image-data-ng
