Investigate possibilities for distributed builds of Ceph to speed up build times.

This task could have two scopes:

  • replace build and vstart for developers so they run in containers for any target base (openSUSE, Ubuntu, etc.)

  • distribute build jobs across nodes, probably on a k8s cluster

There are already some projects that distribute builds:

  • distcc

  • icecream

As a first step, it may make sense to get to know the already existing systems and measure how much they speed up the build, to have a reference point.
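A minimal sketch of such a reference measurement with icecream, assuming openSUSE hosts; the package, service and path names are assumptions and may differ per distribution:

    # sketch of an icecream baseline; service names and paths are assumptions
    # on the scheduler node
    sudo zypper install icecream
    sudo systemctl enable --now icecc-scheduler iceccd

    # on each additional worker node
    sudo zypper install icecream
    sudo systemctl enable --now iceccd

    # on the build node: put the icecc compiler wrappers first in PATH
    export PATH=/usr/lib/icecc/bin:$PATH

    # build Ceph once and time it, as the reference point
    git clone https://github.com/ceph/ceph.git && cd ceph
    ./install-deps.sh
    ./do_cmake.sh
    time make -C build -j 33    # or ninja -C build, depending on the generator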

The idea is to build a container (and reuse it later) with all dependencies installed (./install-deps.sh on some base for developers, or the chroot prepared before the build phase for osc/obs) and start a number of build jobs on a k8s cluster (or with podman in a local dev environment).

That could help:

  • developers to build and test their changes on any distro base, locally or on k8s

  • speed up builds for IBS/OBS by distributing the build across a k8s cluster

For the dev environment, buildah could be used to pull a base OS image, run ./install-deps.sh in it, and produce the current base for the build. For osc/obs, that tool already prepares a chroot, so it might be consumed as the container base.
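A minimal sketch of that buildah flow; the base image and the resulting image name are illustrative assumptions:

    # start from a base image and install the Ceph build dependencies in it
    ctr=$(buildah from registry.opensuse.org/opensuse/leap:15.2)
    buildah copy "$ctr" ./ceph /ceph
    buildah run "$ctr" -- bash -c "cd /ceph && ./install-deps.sh"

    # commit the prepared base so many build jobs can reuse it later
    buildah commit "$ctr" ceph-build-base:leap15.2
    buildah rm "$ctr"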

Some tool needs to be used or developed to generate k8s manifests that run the build based on that container, to run those manifests locally with podman play kube or schedule them remotely on a k8s cluster, and to somehow gather the results from the jobs and store the resulting binaries/RPMs/containers somewhere.
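A rough sketch of what one generated manifest and a local run could look like; the image name and build command are assumptions carried over from the buildah sketch above:

    # ceph-build-pod.yaml -- one generated build job (on a real k8s cluster
    # this would more likely be wrapped in a Job for retries/completions)
    apiVersion: v1
    kind: Pod
    metadata:
      name: ceph-build
    spec:
      restartPolicy: Never
      containers:
        - name: build
          image: ceph-build-base:leap15.2
          command: ["bash", "-c",
                    "cd /ceph && ./do_cmake.sh && make -C build -j $(nproc)"]

    # run the manifest locally ...
    podman play kube ceph-build-pod.yaml

    # ... or schedule it on the k8s cluster and collect logs/results afterwards
    kubectl apply -f ceph-build-pod.yaml
    kubectl logs -f ceph-build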

Looking for hackers with the skills:

icecream distcc ceph osc obs

This project is part of:

Hack Week 19

Activity

  • almost 5 years ago: rgrigorev started this project.
  • almost 5 years ago: tbechtold liked this project.
  • almost 5 years ago: denisok added keyword "icecream" to this project.
  • almost 5 years ago: denisok added keyword "distcc" to this project.
  • almost 5 years ago: denisok added keyword "ceph" to this project.
  • almost 5 years ago: denisok added keyword "osc" to this project.
  • almost 5 years ago: denisok added keyword "obs" to this project.
  • almost 5 years ago: denisok originated this project.

  • Comments

    • rgrigorev
      almost 5 years ago by rgrigorev

      I did some measurements on our node:

      Main build node + icecream manager: ses-client-6
      Main storage device: /dev/nvme0n1p3
      Drive speed: 'Timing buffered disk reads: 5388 MB in 3.00 seconds = 1795.90 MB/sec'
      CPU: Intel(R) Xeon(R) Silver 4110 @ 2.10GHz, 16 threads (https://ark.intel.com/content/www/us/en/ark/products/123547/intel-xeon-silver-4110-processor-11m-cache-2-10-ghz.html)
      64G RAM
      Minion nodes: ses-client-7 and ses-client-8

      description                real          user          sys
      1 node                     210m38.879s   202m4.993s    12m45.631s
      1 node -j 6                47m44.742s    254m36.362s   15m27.561s
      1 node -j 11               34m38.795s    318m48.671s   17m22.380s
      2 nodes -j 12              26m44.379s    156m59.053s   11m44.222s
      3 nodes -j 18              25m4.973s     63m48.105s    7m53.588s
      3 nodes -j 33              17m27.299s    55m18.695s    7m38.526s
      3 nodes -j 33, 25G network 17m0.235s     46m24.425s    7m13.615s
      .... + ram disk            17m2.661s     47m0.784s     7m11.908s
      .... + fix in boost        13m22.895s    45m16.003s    4m55.285s

    • rpenyaev
      almost 5 years ago by rpenyaev

      Roman, can the following kernel patch also speed up our Ceph builds? Seems worth trying:

      https://www.phoronix.com/scan.php?page=news_item&px=Linux-Pipe-Parallel-Job-Opt
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ddad21d3e99c743a3aa473121dc5561679e26bb

    • rgrigorev
      almost 5 years ago by rgrigorev

      3 nodes -j 33 (~11 per node), hsm network, ram disk, manual fixes in boost files + kernel 5.6.7 pre

      real 13m22.895s
      user 45m16.003s
      sys  4m55.285s

      Yes, there is also some improvement.

      • rgrigorev
        almost 5 years ago by rgrigorev

        -j 11, kernel 5.6.0-rc1-197.29 (includes the new patch from Linux)
        real 31m13.175s
        user 321m5.955s
        sys  16m13.355s

        baseline -j 11:
        real 34m38.795s
        user 318m48.671s
        sys  17m22.380s

    Similar Projects

    Testing and adding GNU/Linux distributions on Uyuni by juliogonzalezgil

    Join the Gitter channel! https://gitter.im/uyuni-project/hackweek

    Uyuni is a configuration and infrastructure management tool that saves you time and headaches when you have to manage and update tens, hundreds or even thousands of machines. It also manages configuration, can run audits, build image containers, monitor and much more!

    Currently there are a few distributions that are completely untested on Uyuni or SUSE Manager (AFAIK), or not tested for a long time, and it could be interesting to find out how hard working with them would be and, if possible, to fix whatever is broken.

    For newcomers, the easiest distributions are those based on DEB or RPM packages. Distributions with other package formats are doable, but will require adapting the Python and Java code to be able to sync and analyze such packages (and if salt does not support those packages, it will need changes as well). So if you want a distribution with other packages, make sure you are comfortable handling such changes.

    No developer experience? No worries! We have had non-developer contributors in the past, and we are ready to help as long as you are willing to learn. If you don't want to code at all, you can also help us prepare the documentation after someone else has the initial code ready, or you could help with testing :-)

    The idea is testing Salt and Salt-ssh clients, but NOT traditional clients, which are deprecated.

    To consider that a distribution has basic support, we should cover at least (points 3-6 are to be tested for both salt minions and salt ssh minions):

    1. Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
    2. Onboarding (salt minion from UI, salt minion from bootstrap script, and salt-ssh minion) (this will probably require adding the OS to the bootstrap repository creator)
    3. Package management (install, remove, update...)
    4. Patching
    5. Applying any basic salt state (including a formula)
    6. Salt remote commands
    7. Bonus point: Java part for product identification, and monitoring enablement
    8. Bonus point: sumaform enablement (https://github.com/uyuni-project/sumaform)
    9. Bonus point: Documentation (https://github.com/uyuni-project/uyuni-docs)
    10. Bonus point: testsuite enablement (https://github.com/uyuni-project/uyuni/tree/master/testsuite)

    If something is breaking, we can try to fix it, but the main idea is to research how well supported it is right now. Beyond that, it's up to each project member how much to hack :-)

    • If you don't have knowledge about some of the steps: ask the team
    • If you still don't know what to do: switch to another distribution and keep testing.

    This card is for EVERYONE, not just developers. Seriously! We had people from other teams helping that were not developers, and added support for Debian and new SUSE Linux Enterprise and openSUSE Leap versions :-)

    Pending

    FUSS

    FUSS is a complete GNU/Linux solution (server, client and desktop/standalone) based on Debian for managing an educational network.

    https://fuss.bz.it/

    Seems to be a Debian 12 derivative, so adding it could be quite easy.

    • [W] Reposync (this will require using spacewalk-common-channels and adding channels to the .ini file)
    • [W] Onboarding (salt minion from UI, salt minion from bootstrap script, and salt-ssh minion) (this will probably require adding OS to the bootstrap repository creator) --> Working for all 3 options (salt minion UI, salt minion bootstrap script and salt-ssh minion from the UI).
    • [W] Package management (install, remove, update...) --> Installing a new package works; the rest still needs testing.
    • [I] Patching (if patch information is available, could require writing some code to parse it, but IIRC we have support for Ubuntu already). No patches detected. Do we support patches for Debian at all?
    • [W] Applying any basic salt state (including a formula)
    • [W] Salt remote commands
    • [ ] Bonus point: Java part for product identification, and monitoring enablement


    Implement a full OBS api client in Rust by nbelouin

    Description

    I recently started to work on tooling for OBS using Rust. To do so, I started a Rust crate to interact with the OBS API, but I only implemented the few routes/resources I needed. What about making it a full-fledged OBS client library?

    Goals

    • Implement more routes/resources
    • Implement a test suite against the actual OBS implementation
    • Bonus: Create an osc-like CLI in Rust using the library

    Resources

    • https://github.com/suse-edge/obs-tools/tree/main/obs-client
    • https://api.opensuse.org/apidocs/
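
    For reference, the kind of routes such a library wraps can be exercised directly against api.opensuse.org from the shell; the project/package names below are only examples:

      # list the files of a package and query its build results
      curl -u "$OBS_USER:$OBS_PASS" https://api.opensuse.org/source/openSUSE:Factory/ceph
      curl -u "$OBS_USER:$OBS_PASS" "https://api.opensuse.org/build/openSUSE:Factory/_result?package=ceph"

      # osc exposes the same API, handy for comparing responses while developing
      osc api /source/openSUSE:Factory/ceph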


    Research openqa-trigger-from-obs and openqa-trigger-from-ibs-plugin by qwang

    Description

    The openqa-trigger-from-obs project is a framework that OSD uses to automatically sync the defined images and repositories from OBS/IBS into its assets for testing. This framework will very likely be used for synchronizing to each location's openQA, including openqa.qa2.suse.asia. The Beijing local SCC proxy, scc-proxy.suse.asia (although it is not a MUST for our testing), now rewrites requests to openqa.qa2.suse.asia instead of openqa.suse.de, so the assets/repos should be kept in a consistent format. The Beijing local openQA maintains its own script, but it still needs many manual activities when a new build comes and is not consistent with OSD, which will require many test code changes due to the CC network change.

    Goals

    Research this framework in case it is re-used for the Beijing local openQA and needs to be set up and maintained by ourselves.

    Resources

    https://github.com/os-autoinst/openqa-trigger-from-obs/tree/master
    https://gitlab.suse.de/openqa/openqa-trigger-from-ibs-plugin

    beijing :rainbow machine


    Git CI to automate the creation of product definition by gyribeiro

    Description

    Automate the creation of product definition

    Goals

    Create a Git CI that will:

    • be triggered automatically once a change (commit) to the package list is done.
    • run the tool responsible for updating the product definition based on the changes in the package list
    • test the updated product definition in OBS
    • submit a pull request updating the product definition in the repository

    NOTE: this Git CI may also be triggered manually

    Resources

    • https://docs.gitlab.com/ee/ci/
    • https://openbuildservice.org/2021/05/31/scm-integration/
    • https://github.com/openSUSE/openSUSE-release-tools


    Switch software-o-o to parse repomd data by hennevogel

    Currently software.opensuse.org search is using the OBS binary search for everything, even for packages inside the openSUSE distributions. Let's switch this to use repomd data from download.opensuse.org
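
    For a quick look at the repomd data this would parse, the metadata index can be fetched directly (the repository path is just an example):

      # repomd.xml points at primary.xml.(gz|zst), which lists every package in the repo
      curl -s https://download.opensuse.org/tumbleweed/repo/oss/repodata/repomd.xml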