SUSE Hack Week: drgn: implement crash top-level commands

Project Description

The goal of the project is to implement a collection of top-level crash commands in drgn tool. The commands should provide a top-level overview for anybody who opens a kernel core dump. I plan to select a similar set of commands as seen in crash-python tool.

Goal for this Hackweek

Implement basic commands and play with the drgn internals.

Resources

https://crash-utility.github.io/help.html
https://drgn.readthedocs.io/en/latest/
https://crash-python.readthedocs.io/en/latest/index.html

Join this project Leave this project

Looking for hackers with the skills:

crash python3 kernel debug

This project is part of:

Hack Week 22

Activity

over 2 years ago: marxin added keyword "crash" to this project.

over 2 years ago: marxin added keyword "python3" to this project.

over 2 years ago: marxin added keyword "kernel" to this project.

over 2 years ago: marxin added keyword "debug" to this project.

over 2 years ago: marxin started this project.

over 2 years ago: marxin originated this project.

Comments

over 2 years ago by marxin | Reply

I decided to implement the basic commands as part of contrib (^1) sub-folder of the project. It's the location intended for more complex listing-like (or analysis) scripts and I was able to introduce (or extend) the following commands even though my kernel knowledge is very poor. That's a good sign the drgn tool provides a friendly API and usable helper functions (^2):

ps (extended to provide memory-related stats ^3):

PID PPID CPU ST VMS RSS MEM% COMM 1 0 0 S 10.4M 6.5M 0.4 init 2 0 0 S 0 0 0.0 [kthreadd] ... 263 1 4 S 2.4G 163.5M 9.5 python3 264 1 5 S 2.4G 163.5M 9.5 python3 265 1 6 S 2.4G 163.5M 9.5 python3 266 1 10 S 2.4G 163.5M 9.5 python3 267 1 12 S 2.4G 163.5M 9.5 python3 268 1 13 S 2.4G 163.5M 9.5 python3 269 1 14 S 2.4G 163.5M 9.5 python3 270 1 15 S 2.4G 163.5M 9.5 python3 271 1 16 S 2.4G 163.5M 9.5 python3 ...

sys (newly added as ^4)

CPUS 16 DATE Fri Jan 27 20:26:24 2023 UPTIME 1 day, 7:29:37 LOAD AVERAGE 0.00, 0.00, 0.00 TASKS 317 NODENAME tw RELEASE 6.1.7-1-default VERSION #1 SMP PREEMPT_DYNAMIC Wed Jan 18 11:12:34 UTC 2023 (872045c) MACHINE x86_64 MEMORY 12.67 GiB

vmstat (newly added ^5)

``` Event Count VMZONESTAT: NRFREEPAGES 512147 NRZONELRUBASE 234271 NRZONEINACTIVEANON 234271 NRZONEACTIVEANON 196 NRZONEINACTIVEFILE 97200 NRZONEACTIVEFILE 110611 NRZONEUNEVICTABLE 1000 NRZONEWRITEPENDING 84 NRMLOCK 0 NRBOUNCE 0 NRZSPAGES 0 NRFREECMAPAGES 0

VMNODESTAT: NRLRUBASE 234322 NRINACTIVEANON 234322 NRACTIVEANON 196 NRINACTIVEFILE 97200 ... ```

vmmap (newly added ^6)

Start End Flgs Offset Dev Inode File path 55dee5284000-55dee53f3000 r-xp 00000000 fd:02 10515 /usr/lib/systemd/systemd 55dee53f3000-55dee5441000 r--p 0016f000 fd:02 10515 /usr/lib/systemd/systemd 55dee5441000-55dee5442000 rw-p 001bd000 fd:02 10515 /usr/lib/systemd/systemd 55dee5f4c000-55dee615d000 rw-p 00000000 00:00 0 7f5fc801c000-7f5fc8024000 r-xp 00000000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 7f5fc8024000-7f5fc8224000 ---p 00008000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 7f5fc8224000-7f5fc8225000 r--p 00008000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 ...

mount (newly added ^7):

Mount Type Devname Dirname ffff8fed001d8500 rootfs rootfs / ffff8fed06a197c0 proc proc /proc ffff8fed06a192c0 sysfs sysfs /sys ffff8fed06a18c80 devtmpfs devtmpfs /dev ffff8fed06a18b40 securityfs securityfs /sys/kernel/security ffff8fed06a19cc0 tmpfs tmpfs /dev/shm ffff8fed06a18500 devpts devpts /dev/pts ffff8fed06a18dc0 tmpfs tmpfs /run ...

Existing contrib scripts

There are other existing commands that can:
- list TCP connections
- list loaded kernel modules
- list all the files on a mounted device
- cgroup 2 listing

over 2 years ago by marxin | Reply

When it comes to more complex verification scripts, I was able to port Vlastimil's page table walker (^1) and a verification script (^2) used for debugging of a customer bug. Please see the following git branch: ^3.

over 2 years ago by marxin | Reply

Misc drgn observations
- One can write scripts that work for many kernel releases. One can use symbol_name in prog technique or simple wrap a code in try ... catch block and provide a fallback for older/newer releases.
- The project contains prebuilt vmlinux binaries for various versions (^1) and one can easily run a contrib script in QEMU for a selected Linux version:
$ python3 -m vmtest.vm -k '5.10.*' python3 -Bm drgn contrib/ps.py Linux version 5.10.166-vmtest18.1default (drgn@drgn) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1 SMP Mon Feb 6 08:12:05 UTC 2023 Command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 ro console=0,115200 panic=-1 crashkernel=256M init=/tmp/drgn-vmtest-_6sh_xhu/init x86/fpu: x87 FPU will use FXSAVE BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved ... PID PPID CPU ST COMM 1 0 6 S init 2 0 15 S [kthreadd] 3 2 0 I [rcu_gp] ...

Similar Projects

kernel

pudc - A PID 1 process that barks to the internet by mssola

Description

As a fun exercise in order to dig deeper into the Linux kernel, its interfaces, the RISC-V architecture, and all the dragons in between; I'm building a blog site cooked like this:

The backend is written in a mixture of C and RISC-V assembly.
The backend is actually PID1 (for real, not within a container).
We poll and parse incoming HTTP requests ourselves.
The frontend is a mere HTML page with htmx.

The project is meant to be Linux-specific, so I'm going to use io_uring, pidfs, namespaces, and Linux-specific features in order to drive all of this.

I'm open for suggestions and so on, but this is meant to be a solo project, as this is more of a learning exercise for me than anything else.

Goals

Have a better understanding of different Linux features from user space down to the kernel internals.
Most importantly: have fun.

Resources

https://github.com/mssola/pudc

early stage kdump support by mbrugger

Project Description

When we experience a early boot crash, we are not able to analyze the kernel dump, as user-space wasn't able to load the crash system. The idea is to make the crash system compiled into the host kernel (think of initramfs) so that we can create a kernel dump really early in the boot process.

Goal for the Hackweeks

Investigate if this is possible and the implications it would have (done in HW21)
Hack up a PoC (done in HW22 and HW23)
Prepare RFC series (giving it's only one week, we are entering wishful thinking territory here).

update HW23

I was able to include the crash kernel into the kernel Image.
I'll need to find a way to load that from init/main.c:start_kernel() probably after kcsan_init()
I workaround for a smoke test was to hack kexec_file_load() systemcall which has two problems:
1. My initramfs in the porduction kernel does not have a new enough kexec version, that's not a blocker but where the week ended
2. As the crash kernel is part of init.data it will be already stale once I can call kexec_file_load() from user-space.

The solution is probably to rewrite the POC so that the invocation can be done from init.text (that's my theory) but I'm not sure if I can reuse the kexec infrastructure in the kernel from there, which I rely on heavily.

update HW24

Day1
- rebased on v6.12 with no problems others then me breaking the config
- setting up a new compilation and qemu/virtme env
- getting desperate as nothing works that used to work
Day 2
- getting to call the invocation of loading the early kernel from __init after kcsan_init()
Day 3
- fix problem of memdup not being able to alloc so much memory... use 64K page sizes for now
- code refactoring
- I'm now able to load the crash kernel
- When using virtme I can boot into the crash kernel, also it doesn't boot completely (major milestone!), crash in elfcorehdr_read_notes()
Day 4
- crash systems crashes (no pun intended) in copy_old_mempage() link; will need to understand elfcorehdr...
- call path vmcore_init() -> parse_crash_elf_headers() -> elfcorehdr_read() -> read_from_oldmem() -> copy_oldmem_page() -> copy_to_iter()
Day 5
- hacking arch/arm64/kernel/crash_dump.c:copy_old_mempage() to see if crash system really starts. It does.
- fun fact: retested with more reserved memory and with UEFI FW, host kernel crashes in init but directly starts the crash kernel, so it works (somehow) \o/
TODOs
- fix elfcorehdr so that we actually can make use of all this...
- test where in the boot __init() chain we can/should call kexec_early_dump()

Project Description

Goal for this Hackweek

Resources

Looking for hackers with the skills:

This project is part of:

Activity

Comments

over 2 years ago by marxin | Reply

ps (extended to provide memory-related stats ^3):

sys (newly added as ^4)

vmstat (newly added ^5)

vmmap (newly added ^6)

mount (newly added ^7):

Existing contrib scripts

over 2 years ago by marxin | Reply

over 2 years ago by marxin | Reply

Misc `drgn` observations

Similar Projects

kernel

pudc - A PID 1 process that barks to the internet by mssola

Description

Goals

Resources

early stage kdump support by mbrugger

Project Description

Goal for the Hackweeks

update HW23

update HW24

Project Description

Goal for this Hackweek

Resources

Looking for hackers with the skills:

This project is part of:

Activity

Comments

over 2 years ago by marxin | Reply

ps (extended to provide memory-related stats ^3):

sys (newly added as ^4)

vmstat (newly added ^5)

vmmap (newly added ^6)

mount (newly added ^7):

Existing contrib scripts

over 2 years ago by marxin | Reply

over 2 years ago by marxin | Reply

Misc drgn observations

Similar Projects

kernel

pudc - A PID 1 process that barks to the internet by mssola

Description

Goals

Resources

early stage kdump support by mbrugger

Project Description

Goal for the Hackweeks

update HW23

update HW24

Misc `drgn` observations