SUSE Hack Week: How software creation process can save energy and CO2 emissions

Project Description

Everything we do at SUSE requires a certain amount of energy. This energy has a cost and also causes a certain amount of CO2 emissions. In particular, as the Kernel QA team, we run kernel tests quite often, which consumes energy that could be saved by introducing optimizations in LTP testing.

In this project we use a new parallel execution implementation as a case study for how the software creation process can save energy and reduce CO2 emissions inside a software company.

Goal for this Hackweek

We want to answer the following questions:

How many tests can run in parallel?
How much energy do we save per LTP execution in a virtualized system such as openQA?
Can we improve the parallelization model to save more energy?

Resources

runltp-ng: https://github.com/linux-test-project/runltp-ng/
runltp-ng with parallelization support: https://github.com/acerv/runltp-ng/tree/parallel_coroutines

Jan 31

I had some issues with the runltp-ng parallel execution, caused by the choice of moving the UI thread into the coroutines thread. With the previous code, tests took 30% more time to complete, but now the UI thread is working again. I also created a script to check how many tests can run in parallel in each testing suite.

```
Suite: can  Total tests: 3  Parallelizable tests: 2
Suite: cap_bounds  Total tests: 1  Parallelizable tests: 0
Suite: commands  Total tests: 37  Parallelizable tests: 0
Suite: connectors  Total tests: 1  Parallelizable tests: 0
Suite: containers  Total tests: 86  Parallelizable tests: 0
Suite: controllers  Total tests: 346  Parallelizable tests: 1
Suite: cpuhotplug  Total tests: 6  Parallelizable tests: 0
Suite: crashme  Total tests: 4  Parallelizable tests: 0
Suite: crypto  Total tests: 10  Parallelizable tests: 6
Suite: cve  Total tests: 77  Parallelizable tests: 5
Suite: dio  Total tests: 30  Parallelizable tests: 0
Suite: dma_thread_diotest  Total tests: 7  Parallelizable tests: 0
Suite: fcntl-locktests  Total tests: 1  Parallelizable tests: 0
Suite: filecaps  Total tests: 1  Parallelizable tests: 0
Suite: fs  Total tests: 68  Parallelizable tests: 0
Suite: fs_bind  Total tests: 95  Parallelizable tests: 0
Suite: fs_perms_simple  Total tests: 18  Parallelizable tests: 0
Suite: fs_readonly  Total tests: 55  Parallelizable tests: 0
Suite: fsx  Total tests: 1  Parallelizable tests: 0
Suite: hugetlb  Total tests: 50  Parallelizable tests: 0
Suite: hyperthreading  Total tests: 2  Parallelizable tests: 0
Suite: ima  Total tests: 9  Parallelizable tests: 0
Suite: input  Total tests: 6  Parallelizable tests: 0
Suite: io  Total tests: 2  Parallelizable tests: 1
Suite: ipc  Total tests: 8  Parallelizable tests: 0
Suite: irq  Total tests: 1  Parallelizable tests: 1
Suite: kernel_misc  Total tests: 16  Parallelizable tests: 0
Suite: kvm  Total tests: 1  Parallelizable tests: 0
Suite: ltp-aio-stress  Total tests: 54  Parallelizable tests: 0
Suite: ltp-aiodio.part1  Total tests: 140  Parallelizable tests: 0
Suite: ltp-aiodio.part2  Total tests: 83  Parallelizable tests: 0
Suite: ltp-aiodio.part3  Total tests: 48  Parallelizable tests: 0
Suite: ltp-aiodio.part4  Total tests: 57  Parallelizable tests: 0
Suite: math  Total tests: 10  Parallelizable tests: 0
Suite: mm  Total tests: 75  Parallelizable tests: 2
Suite: net.features  Total tests: 62  Parallelizable tests: 0
Suite: net.ipv6  Total tests: 11  Parallelizable tests: 0
Suite: net.ipv6_lib  Total tests: 6  Parallelizable tests: 2
Suite: net.multicast  Total tests: 4  Parallelizable tests: 0
Suite: net.nfs  Total tests: 84  Parallelizable tests: 0
Suite: net.rpc_tests  Total tests: 51  Parallelizable tests: 0
Suite: net.sctp  Total tests: 41  Parallelizable tests: 0
Suite: net.tcp_cmds  Total tests: 21  Parallelizable tests: 0
Suite: net.tirpc_tests  Total tests: 41  Parallelizable tests: 0
Suite: net_stress.appl  Total tests: 10  Parallelizable tests: 0
Suite: net_stress.broken_ip  Total tests: 11  Parallelizable tests: 0
Suite: net_stress.interface  Total tests: 25  Parallelizable tests: 0
Suite: net_stress.ipsec_dccp  Total tests: 104  Parallelizable tests: 0
Suite: net_stress.ipsec_icmp  Total tests: 86  Parallelizable tests: 0
Suite: net_stress.ipsec_sctp  Total tests: 104  Parallelizable tests: 0
Suite: net_stress.ipsec_tcp  Total tests: 104  Parallelizable tests: 0
Suite: net_stress.ipsec_udp  Total tests: 106  Parallelizable tests: 0
Suite: net_stress.multicast  Total tests: 24  Parallelizable tests: 0
Suite: net_stress.route  Total tests: 14  Parallelizable tests: 0
Suite: nptl  Total tests: 1  Parallelizable tests: 0
Suite: numa  Total tests: 20  Parallelizable tests: 2
Suite: power_management_tests  Total tests: 5  Parallelizable tests: 0
Suite: power_management_tests_exclusive  Total tests: 5  Parallelizable tests: 0
Suite: pty  Total tests: 9  Parallelizable tests: 1
Suite: s390x_tests  Total tests: 1  Parallelizable tests: 0
Suite: sched  Total tests: 11  Parallelizable tests: 0
Suite: scsi_debug.part1  Total tests: 140  Parallelizable tests: 0
Suite: securebits  Total tests: 3  Parallelizable tests: 0
Suite: smack  Total tests: 10  Parallelizable tests: 0
Suite: smoketest  Total tests: 15  Parallelizable tests: 5
Suite: staging  Total tests: 1  Parallelizable tests: 0
Suite: syscalls  Total tests: 1384  Parallelizable tests: 526
Suite: syscalls-ipc  Total tests: 61  Parallelizable tests: 26
Suite: tpm_tools  Total tests: 12  Parallelizable tests: 0
Suite: tracing  Total tests: 9  Parallelizable tests: 0
Suite: uevent  Total tests: 3  Parallelizable tests: 0
Suite: watchqueue  Total tests: 9  Parallelizable tests: 9

Total tests: 4017  Parallelizable tests: 589

14.66% of the tests are parallelizable
```
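
The totals in the report can be recomputed from the per-suite counts. The following is a minimal sketch, assuming the report uses the `Suite:` / `Total tests:` / `Parallelizable tests:` labels shown above; `summarize` and `RECORD` are hypothetical names and are not part of runltp-ng or of the original script.

```python
import re

# One record per suite; whitespace (including newlines) may separate the fields.
RECORD = re.compile(
    r"Suite:\s*(\S+)\s+Total tests:\s*(\d+)\s+Parallelizable tests:\s*(\d+)"
)


def summarize(report: str) -> tuple[int, int, float]:
    """Return (total tests, parallelizable tests, percentage) for a report."""
    total = parallel = 0
    for _suite, tests, par in RECORD.findall(report):
        total += int(tests)
        parallel += int(par)
    percentage = 100.0 * parallel / total if total else 0.0
    return total, parallel, percentage


# Three suites copied from the report, used only as sample input.
sample = """
Suite: crypto Total tests: 10 Parallelizable tests: 6
Suite: syscalls Total tests: 1384 Parallelizable tests: 526
Suite: syscalls-ipc Total tests: 61 Parallelizable tests: 26
"""

total, parallel, pct = summarize(sample)
print(f"{parallel}/{total} tests are parallelizable ({pct:.2f}%)")
```

The summary line at the end of the report has no `Suite:` prefix, so the pattern skips it and the totals are not counted twice.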

Feb 1

Added a new option, runltp-ng --force-parallel, to force parallelization even if tests don't enable it, but using it causes application crashes, especially for the more important suites such as syscalls or syscalls-ipc, so it is not a good idea to use it. In general, I ran a few suites and collected the time needed to complete them. It seems the current rule for selecting tests for parallel execution is not smart enough: most of the selected tests finish in a second or less. This is reflected in the timing results, where important testing suites such as syscalls finish just a few minutes earlier than with normal execution. We can probably do better by optimizing the rule, which is currently implemented here.

```
Qemu:
    Distro: Tumbleweed
    Kernel: 6.1.8-1-default
    SMP:    16
    RAM:    2GB

syscalls:
    tests:    1384
    parallel: 526 (38% of the tests)

    16 workers: 31m 54s
    1 worker:   36m 18s

syscalls-ipc:
    tests:    61
    parallel: 26 (42.62% of the tests)

    16 workers: 2m 4s
    1 worker:   2m 7s

mm:
    tests:    75
    parallel: 2 (2.67% of the tests)

    16 workers: 8m 2s
    1 worker:   8m 10s

cve:
    tests:    77
    parallel: 5 (6.49% of the tests)

    16 workers: 29m 53s
    1 worker:   29m 57s
```
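
To put these timings in perspective, the sketch below computes the fraction of time saved per suite and, using Amdahl's law, the share of the serial runtime that the parallelizable tests would have to occupy to explain that saving. It assumes perfect scaling over 16 workers, which is an idealization and not something measured here; `to_seconds` and `implied_parallel_share` are hypothetical helper names.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xm Ys' timing into seconds."""
    return minutes * 60 + seconds


def implied_parallel_share(t_serial: float, t_parallel: float, workers: int) -> float:
    """Fraction p of the serial runtime spent in parallelizable tests.

    Derived from Amdahl's law, T_par = T * ((1 - p) + p / n),
    assuming the parallelizable part scales perfectly over n workers.
    """
    saved = 1.0 - t_parallel / t_serial
    return saved / (1.0 - 1.0 / workers)


# (suite, 1 worker, 16 workers), taken from the measurements above
runs = [
    ("syscalls", to_seconds(36, 18), to_seconds(31, 54)),
    ("syscalls-ipc", to_seconds(2, 7), to_seconds(2, 4)),
    ("mm", to_seconds(8, 10), to_seconds(8, 2)),
    ("cve", to_seconds(29, 57), to_seconds(29, 53)),
]

for suite, serial, parallel in runs:
    saved = 100.0 * (serial - parallel) / serial
    share = 100.0 * implied_parallel_share(serial, parallel, workers=16)
    print(f"{suite:13s} saved {saved:5.2f}%  implied parallel share {share:5.2f}%")
```

For syscalls this gives roughly 12% of the runtime saved. Under the ideal-scaling assumption, the parallelizable tests would then account for only about 13% of the serial runtime even though they are 38% of the tests, which is consistent with most of them finishing in a second or less.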

02-03 Feb

I focused more on the syscalls testing suite, since it's the most important suite that can be easily parallelized. All power consumption measurements were taken with the powerstat -a -R -d 0 1 3600 command, collecting data from the start of the testing suite execution until its end. All stats were collected on my own laptop, since I wasn't able to physically access the openQA workers. Measurements would also be more accurate with an external device for measuring power consumption. All tests ran inside a Qemu instance. According to openQA stats, syscalls was executed 35 times in the last month (Jan 2023), so we take this value into account.
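
Turning a series of power readings into an energy figure is a simple integration. The sketch below assumes the powerstat log has already been reduced to a list of power readings in watts taken at a fixed interval (one per second here); parsing powerstat's output is not shown, and `energy_wh` is a hypothetical helper name.

```python
def energy_wh(power_samples_w: list[float], interval_s: float = 1.0) -> float:
    """Integrate equally spaced power samples (watts) into watt-hours."""
    joules = sum(power_samples_w) * interval_s  # W * s = J
    return joules / 3600.0                      # 1 Wh = 3600 J


# Hypothetical input: a constant 14 W draw for the duration of one run.
samples = [14.0] * 2337
print(f"{energy_wh(samples):.2f} Wh")
```

With real data the samples vary over time, but the same sum applies as long as the sampling interval is constant.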

Environment

```
Laptop:
    Model:     Lenovo T14s Gen 1
    CPU:       AMD Ryzen 7 PRO 4750U
    Memory:    16GB DDR4
    Hard disk: NVMe SSD

Qemu:
    CPUs: 16
    RAM:  4096MB
```

Data

CO2 emission per kWh     -> W = 0.244 kg CO2/kWh (5% uncertainty)
Avg idle consumption     -> I = 2.50 W
Energy cost in Germany   -> P = 0.534 $/kWh
syscalls runs per month  -> R = 35

Normal execution

execution time:      T1 = 38m 57s = 2337s
energy consumption:  E1 = 9 Wh
monthly consumption: C1 = 35 * 9 Wh = 0.315 kWh

Parallel execution (16 workers)

execution time:      T2 = 35m 22s = 2122s (about 9% less than T1)
energy consumption:  E2 = 10 Wh
monthly consumption: C2 = 35 * 10 Wh = 0.350 kWh

Results

As we can see, there's only a small difference between parallel and normal execution, and it is too small to noticeably affect CO2 emissions or costs. In fact, the parallel run consumes slightly more energy. Over one year we have:

diff:      D = (C2 - C1) * 12 = (0.350 - 0.315) * 12 = +0.42 kWh
cost:      C = D * P = 0.42 * 0.534 = +0.224 $
emissions: CO2 = D * W = 0.42 * 0.244 = +0.102 kg
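
The yearly figures can be reproduced with a few lines of arithmetic. This is a minimal sketch using the constants from the Data section; `yearly_delta` is a hypothetical helper, and a positive result means the parallel runs consume more than the normal ones.

```python
CO2_KG_PER_KWH = 0.244  # W, kg CO2 per kWh
PRICE_PER_KWH = 0.534   # P, $ per kWh
RUNS_PER_MONTH = 35     # R, syscalls executions per month


def yearly_delta(e_normal_wh: float, e_parallel_wh: float) -> tuple[float, float, float]:
    """Return (kWh, $, kg CO2) that parallel runs add per year.

    Negative values would mean that parallel execution saves energy.
    """
    delta_kwh = (e_parallel_wh - e_normal_wh) * RUNS_PER_MONTH * 12 / 1000.0
    return delta_kwh, delta_kwh * PRICE_PER_KWH, delta_kwh * CO2_KG_PER_KWH


kwh, cost, co2 = yearly_delta(e_normal_wh=9, e_parallel_wh=10)
print(f"energy: {kwh:+.2f} kWh, cost: {cost:+.3f} $, CO2: {co2:+.3f} kg")
# prints "energy: +0.42 kWh, cost: +0.224 $, CO2: +0.102 kg"
```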

Servers might consume somewhat more energy during execution, so the real values might be larger, but they would still be small. The reason parallel execution doesn't save energy is that it draws more power while running many tests at once.
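
The higher power draw can be estimated from the measured energy and execution time of each run. This sketch simply divides energy by duration; `average_power_w` is a hypothetical helper, and since E1 and E2 are rounded to whole watt-hours the results are rough estimates only.

```python
def average_power_w(energy_wh: float, duration_s: float) -> float:
    """Average power in watts for a run that used energy_wh over duration_s."""
    return energy_wh * 3600.0 / duration_s


normal = average_power_w(energy_wh=9, duration_s=2337)     # E1, T1
parallel = average_power_w(energy_wh=10, duration_s=2122)  # E2, T2

print(f"normal:   {normal:.1f} W")
print(f"parallel: {parallel:.1f} W")
```

With these inputs the parallel run averages about 17 W against about 14 W for the normal run, so the shorter execution time does not compensate for the extra power.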

Optimizations

In the end, we can see that parallel execution doesn't have a big impact in terms of costs or emissions, but in terms of time it can still have a significant impact over one year. We can release openQA workers sooner and also complete other jobs a bit faster, and that will of course have an impact on production, energy consumption and emissions. Based on our data, in one year we will save:

(T1 - T2) * R * 12 = (2337 - 2122) * 35 * 12 = 90300 s ≈ 25 hours
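
The same estimate as a small function, which makes it easy to redo the calculation with other timings or run counts. A minimal sketch; `yearly_time_saved_hours` is a hypothetical helper name.

```python
def yearly_time_saved_hours(t_normal_s: float, t_parallel_s: float,
                            runs_per_month: int) -> float:
    """Worker time freed per year, in hours, by the faster execution."""
    saved_seconds = (t_normal_s - t_parallel_s) * runs_per_month * 12
    return saved_seconds / 3600.0


hours = yearly_time_saved_hours(t_normal_s=2337, t_parallel_s=2122, runs_per_month=35)
print(f"{hours:.1f} hours per year")
```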

If we are able to introduce a smarter rule for selecting tests that can run in parallel, the amount of time saved per year might increase significantly. Also, we still have 332 syscalls tests (about 24%) that use the old API and therefore can't currently run in parallel.

Looking for hackers with the skills:

optimization energy kernel ltp runltp co2 testing

This project is part of:

Hack Week 22

Activity

over 2 years ago: acervesato started this project.

over 2 years ago: acervesato originated this project.

Similar Projects

kernel

early stage kdump support by mbrugger

Project Description

When we experience an early boot crash, we are not able to analyze the kernel dump, as user-space wasn't able to load the crash system. The idea is to compile the crash system into the host kernel (think of initramfs) so that we can create a kernel dump really early in the boot process.

Goal for the Hackweeks

Investigate if this is possible and the implications it would have (done in HW21)
Hack up a PoC (done in HW22 and HW23)
Prepare RFC series (given it's only one week, we are entering wishful thinking territory here).

update HW23

I was able to include the crash kernel into the kernel Image.
I'll need to find a way to load that from init/main.c:start_kernel() probably after kcsan_init()
A workaround for a smoke test was to hack the kexec_file_load() system call, which has two problems:
1. My initramfs in the production kernel does not have a new enough kexec version; that's not a blocker, but it's where the week ended
2. As the crash kernel is part of init.data it will be already stale once I can call kexec_file_load() from user-space.

The solution is probably to rewrite the POC so that the invocation can be done from init.text (that's my theory) but I'm not sure if I can reuse the kexec infrastructure in the kernel from there, which I rely on heavily.

update HW24

Day1
- rebased on v6.12 with no problems other than me breaking the config
- setting up a new compilation and qemu/virtme env
- getting desperate as nothing works that used to work
Day 2
- getting to call the invocation of loading the early kernel from __init after kcsan_init()
Day 3
- fix problem of memdup not being able to alloc so much memory... use 64K page sizes for now
- code refactoring
- I'm now able to load the crash kernel
- When using virtme I can boot into the crash kernel, although it doesn't boot completely (major milestone!); it crashes in elfcorehdr_read_notes()
Day 4
- crash system crashes (no pun intended) in copy_oldmem_page(); will need to understand elfcorehdr...
- call path vmcore_init() -> parse_crash_elf_headers() -> elfcorehdr_read() -> read_from_oldmem() -> copy_oldmem_page() -> copy_to_iter()
Day 5
- hacking arch/arm64/kernel/crash_dump.c:copy_oldmem_page() to see if the crash system really starts. It does.
- fun fact: retested with more reserved memory and with UEFI FW, host kernel crashes in init but directly starts the crash kernel, so it works (somehow) \o/
TODOs
- fix elfcorehdr so that we actually can make use of all this...
- test where in the boot __init() chain we can/should call kexec_early_dump()
