Problem statement
Once a kernel is built, a developer or janitor may want to boot it for various reasons, such as performing a simple boot test, running tests and workloads from user space, or simply playing around in a shell. However, to our knowledge, no easy-to-use, descriptive tool exists for these tasks.
We talked to kernel developers and were told to have a look at the following resources:
The approach
We plan to address this issue in the upcoming Hackweek. Our idea is to leverage LinuxKit as a driver to boot a given kernel image in different environments (QEMU, Hyper-V, VMware and public clouds). As LinuxKit is container-based, it is trivial to boot the kernel with rootfs images from all kinds of distributions, and custom rootfs images are easy to create.
The tool we seek to implement should wrap everything up into something useful for developers and CI systems to use from the command line as well as from configuration files. The benefits of using a container-based infrastructure include:
Reproducibility: We bundle kernel images together with the desired rootfs and store them for a given amount of time. Re-running and re-creating them becomes trivial.
Declarative approach: All steps to create the desired image are baked into configuration files. The benefits are again reproducibility and documentation.
Flexibility: In theory, we can bundle any kernel image with any rootfs and add as many files, binaries and directories on top as we please. Supporting different kinds of environments, including public clouds, makes the tool attractive for a broader audience.
This project is part of:
Hack Week 16
Similar Projects
early stage kdump support by mbrugger
Project Description
When we experience an early boot crash, we are not able to analyze the kernel dump, as user-space wasn't able to load the crash system. The idea is to compile the crash system into the host kernel (think of initramfs) so that we can create a kernel dump really early in the boot process.
Goal for the Hackweeks
- Investigate if this is possible and the implications it would have (done in HW21)
- Hack up a PoC (done in HW22 and HW23)
- Prepare RFC series (given it's only one week, we are entering wishful thinking territory here).
update HW23
- I was able to include the crash kernel into the kernel Image.
- I'll need to find a way to load that from init/main.c:start_kernel(), probably after kcsan_init()
- A workaround for a smoke test was to hack the kexec_file_load() system call, which has two problems:
  - My initramfs in the production kernel does not have a new enough kexec version; that's not a blocker, but it's where the week ended
  - As the crash kernel is part of init.data, it will already be stale once I can call kexec_file_load() from user-space.
The solution is probably to rewrite the PoC so that the invocation can be done from init.text (that's my theory), but I'm not sure whether I can reuse the in-kernel kexec infrastructure from there, which I rely on heavily.
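For context on the kexec_file_load() workaround mentioned above, here is a minimal sketch of how user space normally asks the kernel to load a crash kernel through that system call. It is not the PoC's in-kernel code: the helper name load_crash_kernel() and its parameters are hypothetical, and the sketch assumes a Linux system whose headers define SYS_kexec_file_load, a caller with CAP_SYS_BOOT, and memory reserved via the crashkernel= boot parameter.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback values, matching the uapi header <linux/kexec.h>. */
#ifndef KEXEC_FILE_ON_CRASH
#define KEXEC_FILE_ON_CRASH 0x00000002
#endif
#ifndef KEXEC_FILE_NO_INITRAMFS
#define KEXEC_FILE_NO_INITRAMFS 0x00000004
#endif

/*
 * Hypothetical helper: load kernel_path as the crash (panic) kernel.
 * initrd_path may be NULL. Returns 0 on success, -1 with errno set.
 */
int load_crash_kernel(const char *kernel_path, const char *initrd_path,
                      const char *cmdline)
{
    unsigned long flags = KEXEC_FILE_ON_CRASH;
    int kernel_fd, initrd_fd = -1, saved_errno;
    long ret;

    kernel_fd = open(kernel_path, O_RDONLY | O_CLOEXEC);
    if (kernel_fd < 0)
        return -1;

    if (initrd_path) {
        initrd_fd = open(initrd_path, O_RDONLY | O_CLOEXEC);
        if (initrd_fd < 0) {
            saved_errno = errno;
            close(kernel_fd);
            errno = saved_errno;
            return -1;
        }
    } else {
        /* Without an initramfs the kernel ignores initrd_fd. */
        flags |= KEXEC_FILE_NO_INITRAMFS;
    }

#ifdef SYS_kexec_file_load
    /* cmdline_len must count the terminating NUL byte. */
    ret = syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
                  (unsigned long)strlen(cmdline) + 1, cmdline, flags);
#else
    errno = ENOSYS;
    ret = -1;
#endif

    /* Close the descriptors without clobbering the syscall's errno. */
    saved_errno = errno;
    close(kernel_fd);
    if (initrd_fd >= 0)
        close(initrd_fd);
    errno = saved_errno;
    return ret == 0 ? 0 : -1;
}
```

A caller would invoke it as, for example, load_crash_kernel("/boot/vmlinuz", NULL, "console=ttyS0"), where the path and command line are placeholders. Because this call can only happen once user space is running, it illustrates why the approach comes too late for an image stored in init.data.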
update HW24
- Day 1
  - rebased on v6.12 with no problems other than me breaking the config
  - setting up a new compilation and qemu/virtme env
  - getting desperate as nothing works that used to work
- Day 2
  - getting to call the invocation of loading the early kernel from __init, after kcsan_init()
- Day 3
  - fix problem of memdup not being able to alloc so much memory... use 64K page sizes for now
  - code refactoring
  - I'm now able to load the crash kernel
  - When using virtme I can boot into the crash kernel, although it doesn't boot completely (major milestone!); it crashes in elfcorehdr_read_notes()
- Day 4
  - crash system crashes (no pun intended) in copy_oldmem_page(); will need to understand elfcorehdr...
  - call path: vmcore_init() -> parse_crash_elf_headers() -> elfcorehdr_read() -> read_from_oldmem() -> copy_oldmem_page() -> copy_to_iter()
- Day 5
  - hacking arch/arm64/kernel/crash_dump.c:copy_oldmem_page() to see if the crash system really starts. It does.
  - fun fact: retested with more reserved memory and with UEFI FW, host kernel crashes in init but directly starts the crash kernel, so it works (somehow) \o/
TODOs
- fix elfcorehdr so that we actually can make use of all this...
- test where in the boot __init() chain we can/should call kexec_early_dump()