Project Description
When we experience a early boot crash, we are not able to analyze the kernel dump, as user-space wasn't able to load the crash system. The idea is to make the crash system compiled into the host kernel (think of initramfs) so that we can create a kernel dump really early in the boot process.
Goal for the Hackweeks
- Investigate if this is possible and the implications it would have (done in HW21)
- Hack up a PoC (done in HW22 and HW23)
- Prepare RFC series (giving it's only one week, we are entering wishful thinking territory here).
update HW23
- I was able to include the crash kernel into the kernel Image.
- I'll need to find a way to load that from
init/main.c:start_kernel()
probably afterkcsan_init()
- I workaround for a smoke test was to hack
kexec_file_load()
systemcall which has two problems:- My initramfs in the porduction kernel does not have a new enough kexec version, that's not a blocker but where the week ended
- As the crash kernel is part of init.data it will be already stale once I can call
kexec_file_load()
from user-space.
The solution is probably to rewrite the POC so that the invocation can be done from init.text (that's my theory) but I'm not sure if I can reuse the kexec infrastructure in the kernel from there, which I rely on heavily.
update HW24
- Day1
- rebased on v6.12 with no problems others then me breaking the config
- setting up a new compilation and qemu/virtme env
- getting desperate as nothing works that used to work
- Day 2
- getting to call the invocation of loading the early kernel from
__init
afterkcsan_init()
- getting to call the invocation of loading the early kernel from
Day 3
- fix problem of memdup not being able to alloc so much memory... use 64K page sizes for now
- code refactoring
- I'm now able to load the crash kernel
- When using virtme I can boot into the crash kernel, also it doesn't boot completely (major milestone!), crash in
elfcorehdr_read_notes()
Day 4
- crash systems crashes (no pun intended) in
copy_old_mempage()
link; will need to understand elfcorehdr... - call path
vmcore_init() -> parse_crash_elf_headers() -> elfcorehdr_read() -> read_from_oldmem() -> copy_oldmem_page() -> copy_to_iter()
- crash systems crashes (no pun intended) in
Day 5
- hacking
arch/arm64/kernel/crash_dump.c:copy_old_mempage()
to see if crash system really starts. It does. - fun fact: retested with more reserved memory and with UEFI FW, host kernel crashes in init but directly starts the crash kernel, so it works (somehow) \o/
- hacking
TODOs
- fix elfcorehdr so that we actually can make use of all this...
- test where in the boot
__init()
chain we can/should callkexec_early_dump()
- do we really need
memdup()
or can we used the complied kernel for creating the segments? - refactor and rename everything (Kconfig menu shows in wrong place, Kconfig entry needs to go somewhere else, ekdump vs early_dump vs early_kdump
Resources
This project is part of:
Hack Week 21 Hack Week 22 Hack Week 23 Hack Week 24
Activity
Comments
-
-
8 months ago by ptesarik | Reply
FWIW I was contemplating a similar scheme back in 2016. My idea was to load: 1. kdump kernel 2. kdump initrd 3. production kernel 4. production initrd Then boot into the kdump kernel, update memory maps and kexec to the production kernel. When the production kernel crashes, pass control back to the kdump kernel. For the return to the kdump kernel, I was looking at the
KEXEC_PRESERVE_CONTEXT
flag, but in the end I doubt it's really helpful without further modifications to the production kernel. At this point, it's probably easier to boot the production kernel first and set up an initial crash kernel at early boot.Good luck!
-
4 months ago by mbrugger | Reply
[ 0.221029] Unable to handle kernel level 3 address size fault at virtual address ffff800080aa0000
[ 0.222848] Mem abort info:
[ 0.223419] ESR = 0x0000000096000003
[ 0.224187] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.225300] SET = 0, FnV = 0
[ 0.225933] EA = 0, S1PTW = 0
[ 0.226587] FSC = 0x03: level 3 address size fault
[ 0.227600] Data abort info:
[ 0.228198] ISV = 0, ISS = 0x00000003, ISS2 = 0x00000000
[ 0.229351] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.230385] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.231466] swapper pgtable: 64k pages, 48-bit VAs, pgdp=000000005d6b0000
[ 0.232850] [ffff800080aa0000] pgd=100000005ef80003, p4d=100000005ef80003, pud=100000005ef80003, pmd=100000005ef90003, pte=00681591c0000f03
[ 0.235548] Internal error: Oops: 0000000096000003 [#1] PREEMPT SMP
[ 0.236828] Modules linked in:
[ 0.237504] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-dirty #9
[ 0.239037] Hardware name: linux,dummy-virt (DT)
[ 0.240047] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.241563] pc : _memcpy+0x110/0x230
[ 0.242374] lr : _copytoiter+0x374/0x670
[ 0.243276] sp : ffff8000800efa20
[ 0.244009] x29: ffff8000800efa70 x28: 0000000000000000 x27: ffff800080aa0000
[ 0.245577] x26: ffff8000800efba0 x25: 00000000000001a8 x24: ffffd7d750365000
[ 0.247124] x23: ffff8000800efbb0 x22: 0000000000000000 x21: ffff8000800efba0
[ 0.248672] x20: 00000000000001a8 x19: 0000000000000000 x18: ffffffffffffffff
[ 0.250234] x17: 0000000087130253 x16: 00000000e547dfaa x15: 0720072007200720
[ 0.251792] x14: ffffa2fe3f221a00 x13: ffffd7d750365fb8 x12: ffffd7d75162c9c8
[ 0.253349] x11: ffffd7d75169ca30 x10: ffffd7d7516849f0 x9 : ffffd7d751684a48
[ 0.254900] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
[ 0.256457] x5 : ffff22febfcc1ba8 x4 : ffff800080aa01a8 x3 : 00000000ffffefff
[ 0.258013] x2 : 00000000000001a8 x1 : ffff800080aa0000 x0 : ffff22febfcc1a00
[ 0.259564] Call trace:
[ 0.260109] _memcpy+0x110/0x230
[ 0.260842] copyoldmempage+0xc8/0x110
[ 0.261713] readfromoldmem+0x1bc/0x268
[ 0.262595] elfcorehdrreadnotes+0x9c/0xd0
[ 0.263536] mergenoteheaderself64.constprop.15+0x110/0x3b0
[ 0.264813] vmcoreinit+0x298/0x794
[ 0.265612] dooneinitcall+0x64/0x1e8
[ 0.266455] kernelinitfreeable+0x238/0x288
Similar Projects
Improve various phones kernel mainline support (Qualcomm, Exynos, MediaTek) by pvorel
Similar to previous hackweeks ( https://hackweek.opensuse.org/projects/improve-qualcomm-soc-msm8994-slash-msm8992-kernel-mainline-support, https://hackweek.opensuse.org/projects/test-mainline-kernel-on-an-older-qualcomm-soc-msm89xx-explore-mainline-kernel-qualcomm-mainlining) try to improve kernel mainline support of various phones.
Result
In the end I concentrated again to msm8994:
- 507aae9a3549c ("arm64: dts: qcom: msm8994-angler: Enable power key, volume up/down") (will be in kernel 6.14)
- Testing of c910544d22347 ("arm64: dts: qcom: msm8994: Describe USB interrupts") (will be in kernel 6.14)
- WIP USB support for msm8994
Create a DRM driver for VGA video cards by tdz
Yes, those VGA video cards. The goal of this project is to implement a DRM graphics driver for such devices. While actual hardware is hard to obtain or even run today, qemu emulates VGA output.
VGA has a number of limitations, which make this project interesting.
- There are only 640x480 pixels (or less) on the screen. That resolution is also a soft lower limit imposed by DRM. It's mostly a problem for desktop environments though.
- Desktop environments assume 16 million colors, but there are only 16 colors with VGA. VGA's 256 color palette is not available at 640x480. We can choose those 16 colors freely. The interesting part is how to choose them. We have to build a palette for the displayed frame and map each color to one of the palette's 16 entries. This is called dithering, and VGA's limitations are a good opportunity to learn about dithering algorithms.
- VGA has an interesting memory layout. Most graphics devices use linear framebuffers, which store the pixels byte by byte. VGA uses 4 bitplanes instead. Plane 0 holds all bits 0 of all pixels. Plane 1 holds all bits 1 of all pixels, and so on.
The driver will probably not be useful to many people. But, if finished, it can serve as test environment for low-level hardware. There's some interest in supporting old Amiga and Atari framebuffers in DRM. Those systems have similar limitations as VGA, but are harder to obtain and test with. With qemu, the VGA driver could fill this gap.
Apart from the Wikipedia entry, good resources on VGA are at osdev.net and FreeVGA
Modularization and Modernization of cifs.ko for Enhanced SMB Protocol Support by hcarvalho
Creator:
Enzo Matsumiya ematsumiya@suse.de @ SUSE Samba team
Members:
Henrique Carvalho henrique.carvalho@suse.com @ SUSE Samba team
Description
Split cifs.ko in 2 separate modules; one for SMB 1.0 and 2.0.x, and another for SMB 2.1, 3.0, and 3.1.1.
Goals
Primary
Start phasing out/deprecation of older SMB versions
Secondary
- Clean up of the code (with focus on the newer versions)
- Update cifs-utils
- Update documentation
- Improve backport workflow (see below)
Technical details
Ideas for the implementation.
- fs/smb/client/{old,new}.c to generate the respective modules
- Maybe don't create separate folders? (re-evaluate as things progresses!)
- Remove server->{ops,vals} if possible
- Clean up fs_context.* -- merge duplicate options into one, handle them in userspace utils
- Reduce code in smb2pdu.c -- tons of functions with very similar init/setup -> send/recv -> handle/free flow
- Restructure multichannel
- Treat initial connection as "channel 0" regardless of multichannel enabled/negotiated status, proceed with extra channels accordingly
- Extra channel just point to "channel 0" as the primary server, no need to allocate an extra TCPServerInfo for each one
- Authentication mechanisms
- Modernize algorithms (references: himmelblau, IAKERB/Local KDC, SCRAM, oauth2 (Azure), etc.
Model checking the BPF verifier by shunghsiyu
Project Description
BPF verifier plays a crucial role in securing the system (though less so now that unprivileged BPF is disabled by default in both upstream and SLES), and bugs in the verifier has lead to privilege escalation vulnerabilities in the past (e.g. CVE-2021-3490).
One way to check whether the verifer has bugs to use model checking (a formal verification technique), in other words, build a abstract model of how the verifier operates, and then see if certain condition can occur (e.g. incorrect calculation during value tracking of registers) by giving both the model and condition to a solver.
For the solver I will be using the Z3 SMT solver to do the checking since it provide a Python binding that's relatively easy to use.
Goal for this Hackweek
Learn how to use the Z3 Python binding (i.e. Z3Py) to build a model of (part of) the BPF verifier, probably the part that's related to value tracking using tristate numbers (aka tnum), and then check that the algorithm work as intended.
Resources
- Formal Methods for the Informal Engineer: Tutorial #1 - The Z3 Theorem Prover and its accompanying notebook is a great introduction into Z3
- Has a section specifically on model checking
- Software Verification and Analysis Using Z3 a great example of using Z3 for model checking
- Sound, Precise, and Fast Abstract Interpretation with Tristate Numbers - existing work that use formal verification to prove that the multiplication helper used for value tracking work as intended
- [PATCH v5 net-next 00/12] bpf: rewrite value tracking in verifier - initial patch set that adds tristate number to the verifier
Officially Become a Kernel Hacker! by m.crivellari
Description
My studies as well my spare time are dedicated to the Linux Kernel. Currently I'm focusing on interrupts on x86_64, but my interests are not restricted to one specific topic, for now.
I also "played" a little bit with kernel modules (ie lantern, a toy packet analyzer) and I've added a new syscall in order read from a task A, the memory of a task B.
Maybe this will be a good chance to...
Goals
- create my first kernel patch
Resources
- https://www.kernel.org/doc/html/latest/process/submitting-patches.html
- https://git-send-email.io/ (mentioned also in the kernel doc)
- https://javiercarrascocruz.github.io/kernel-contributor-1
Achivements
- found while working on a bug, this is the 1st patch: cifs: avoid deadlocks while updating iface [✅ has been merged]