Project Description
The goal of the project is to implement a collection of top-level crash commands in drgn tool. The commands should provide a top-level overview for anybody who opens a kernel core dump. I plan to select a similar set of commands as seen in crash-python tool.
Goal for this Hackweek
Implement basic commands and play with the drgn internals.
Resources
- https://crash-utility.github.io/help.html
- https://drgn.readthedocs.io/en/latest/
- https://crash-python.readthedocs.io/en/latest/index.html
This project is part of:
Hack Week 22
Activity
Comments
-
almost 3 years ago by marxin | Reply
I decided to implement the basic commands as part of
contrib(^1) sub-folder of the project. It's the location intended for more complex listing-like (or analysis) scripts and I was able to introduce (or extend) the following commands even though my kernel knowledge is very poor. That's a good sign thedrgntool provides a friendly API and usable helper functions (^2):ps (extended to provide memory-related stats ^3):
PID PPID CPU ST VMS RSS MEM% COMM 1 0 0 S 10.4M 6.5M 0.4 init 2 0 0 S 0 0 0.0 [kthreadd] ... 263 1 4 S 2.4G 163.5M 9.5 python3 264 1 5 S 2.4G 163.5M 9.5 python3 265 1 6 S 2.4G 163.5M 9.5 python3 266 1 10 S 2.4G 163.5M 9.5 python3 267 1 12 S 2.4G 163.5M 9.5 python3 268 1 13 S 2.4G 163.5M 9.5 python3 269 1 14 S 2.4G 163.5M 9.5 python3 270 1 15 S 2.4G 163.5M 9.5 python3 271 1 16 S 2.4G 163.5M 9.5 python3 ...sys (newly added as ^4)
CPUS 16 DATE Fri Jan 27 20:26:24 2023 UPTIME 1 day, 7:29:37 LOAD AVERAGE 0.00, 0.00, 0.00 TASKS 317 NODENAME tw RELEASE 6.1.7-1-default VERSION #1 SMP PREEMPT_DYNAMIC Wed Jan 18 11:12:34 UTC 2023 (872045c) MACHINE x86_64 MEMORY 12.67 GiBvmstat (newly added ^5)
``` Event Count VMZONESTAT: NRFREEPAGES 512147 NRZONELRUBASE 234271 NRZONEINACTIVEANON 234271 NRZONEACTIVEANON 196 NRZONEINACTIVEFILE 97200 NRZONEACTIVEFILE 110611 NRZONEUNEVICTABLE 1000 NRZONEWRITEPENDING 84 NRMLOCK 0 NRBOUNCE 0 NRZSPAGES 0 NRFREECMAPAGES 0
VMNODESTAT: NRLRUBASE 234322 NRINACTIVEANON 234322 NRACTIVEANON 196 NRINACTIVEFILE 97200 ... ```
vmmap (newly added ^6)
Start End Flgs Offset Dev Inode File path 55dee5284000-55dee53f3000 r-xp 00000000 fd:02 10515 /usr/lib/systemd/systemd 55dee53f3000-55dee5441000 r--p 0016f000 fd:02 10515 /usr/lib/systemd/systemd 55dee5441000-55dee5442000 rw-p 001bd000 fd:02 10515 /usr/lib/systemd/systemd 55dee5f4c000-55dee615d000 rw-p 00000000 00:00 0 7f5fc801c000-7f5fc8024000 r-xp 00000000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 7f5fc8024000-7f5fc8224000 ---p 00008000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 7f5fc8224000-7f5fc8225000 r--p 00008000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 ...mount (newly added ^7):
Mount Type Devname Dirname ffff8fed001d8500 rootfs rootfs / ffff8fed06a197c0 proc proc /proc ffff8fed06a192c0 sysfs sysfs /sys ffff8fed06a18c80 devtmpfs devtmpfs /dev ffff8fed06a18b40 securityfs securityfs /sys/kernel/security ffff8fed06a19cc0 tmpfs tmpfs /dev/shm ffff8fed06a18500 devpts devpts /dev/pts ffff8fed06a18dc0 tmpfs tmpfs /run ...Existing contrib scripts
There are other existing commands that can:
- list TCP connections
- list loaded kernel modules
- list all the files on a mounted device
- cgroup 2 listing
-
-
almost 3 years ago by marxin | Reply
Misc
drgnobservations- One can write scripts that work for many kernel releases. One can use
symbol_name in progtechnique or simple wrap a code intry ... catch blockand provide a fallback for older/newer releases. - The project contains prebuilt
vmlinuxbinaries for various versions (^1) and one can easily run a contrib script in QEMU for a selected Linux version:
$ python3 -m vmtest.vm -k '5.10.*' python3 -Bm drgn contrib/ps.py Linux version 5.10.166-vmtest18.1default (drgn@drgn) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1 SMP Mon Feb 6 08:12:05 UTC 2023 Command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 ro console=0,115200 panic=-1 crashkernel=256M init=/tmp/drgn-vmtest-_6sh_xhu/init x86/fpu: x87 FPU will use FXSAVE BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved ... PID PPID CPU ST COMM 1 0 6 S init 2 0 15 S [kthreadd] 3 2 0 I [rcu_gp] ... - One can write scripts that work for many kernel releases. One can use
Similar Projects
Improve chore and screen time doc generator script `wochenplaner` by gniebler
Description
I wrote a little Python script to generate PDF docs, which can be used to track daily chore completion and screen time usage for several people, with one page per person/week.
I named this script wochenplaner and have been using it for a few months now.
It needs some improvements and adjustments in how the screen time should be tracked and how chores are displayed.
Goals
- Fix chore field separation lines
- Change screen time tracking logic from "global" (week-long) to daily subtraction and weekly addition of remainders (more intuitive than current "weekly time budget method)
- Add logic to fill in chore fields/lines, ideally with pictures, falling back to text.
Resources
tbd (Gitlab repo)
Improve/rework household chore tracker `chorazon` by gniebler
Description
I wrote a household chore tracker named chorazon, which is meant to be deployed as a web application in the household's local network.
It features the ability to set up different (so far only weekly) schedules per task and per person, where tasks may span several days.
There are "tokens", which can be collected by users. Tasks can (and usually will) have rewards configured where they yield a certain amount of tokens. The idea is that they can later be redeemed for (surprise) gifts, but this is not implemented yet. (So right now one needs to edit the DB manually to subtract tokens when they're redeemed.)
Days are not rolled over automatically, to allow for task completion control.
We used it in my household for several months, with mixed success. There are many limitations in the system that would warrant a revisit.
It's written using the Pyramid Python framework with URL traversal, ZODB as the data store and Web Components for the frontend.
Goals
- Add admin screens for users, tasks and schedules
- Add models, pages etc. to allow redeeming tokens for gifts/surprises
- …?
Resources
tbd (Gitlab repo)
openQA log viewer by mpagot
Description
*** Warning: Are You at Risk for VOMIT? ***
Do you find yourself staring at a screen, your eyes glossing over as thousands of lines of text scroll by? Do you feel a wave of text-based nausea when someone asks you to "just check the logs"?
You may be suffering from VOMIT (Verbose Output Mental Irritation Toxicity).
This dangerous, work-induced ailment is triggered by exposure to an overwhelming quantity of log data, especially from parallel systems. The human brain, not designed to mentally process 12 simultaneous autoinst-log.txt files, enters a state of toxic shock. It rejects the "Verbose Output," making it impossible to find the one critical error line buried in a 50,000-line sea of "INFO: doing a thing."
Before you're forced to rm -rf /var/log in a fit of desperation, we present the digital antacid.
No panic: The openQA Log Visualizer (Also known as the "VOMIT-B-Gone 9000")
This is your web-based hazmat suit for handling toxic log environments. It bravely dives into the chaotic, multi-machine mess of your openQA test runs, finds all the related, verbose logs, and force-feeds them into a parser.
Goals
Work on the existing POC openqa-log-visualizer and change it to something usable
Resources
early stage kdump support by mbrugger
Project Description
When we experience a early boot crash, we are not able to analyze the kernel dump, as user-space wasn't able to load the crash system. The idea is to make the crash system compiled into the host kernel (think of initramfs) so that we can create a kernel dump really early in the boot process.
Goal for the Hackweeks
- Investigate if this is possible and the implications it would have (done in HW21)
- Hack up a PoC (done in HW22 and HW23)
- Prepare RFC series (giving it's only one week, we are entering wishful thinking territory here).
update HW23
- I was able to include the crash kernel into the kernel Image.
- I'll need to find a way to load that from
init/main.c:start_kernel()probably afterkcsan_init() - I workaround for a smoke test was to hack
kexec_file_load()systemcall which has two problems:- My initramfs in the porduction kernel does not have a new enough kexec version, that's not a blocker but where the week ended
- As the crash kernel is part of init.data it will be already stale once I can call
kexec_file_load()from user-space.
The solution is probably to rewrite the POC so that the invocation can be done from init.text (that's my theory) but I'm not sure if I can reuse the kexec infrastructure in the kernel from there, which I rely on heavily.
update HW24
- Day1
- rebased on v6.12 with no problems others then me breaking the config
- setting up a new compilation and qemu/virtme env
- getting desperate as nothing works that used to work
- Day 2
- getting to call the invocation of loading the early kernel from
__initafterkcsan_init()
- getting to call the invocation of loading the early kernel from
Day 3
- fix problem of memdup not being able to alloc so much memory... use 64K page sizes for now
- code refactoring
- I'm now able to load the crash kernel
- When using virtme I can boot into the crash kernel, also it doesn't boot completely (major milestone!), crash in
elfcorehdr_read_notes()
Day 4
- crash systems crashes (no pun intended) in
copy_old_mempage()link; will need to understand elfcorehdr... - call path
vmcore_init() -> parse_crash_elf_headers() -> elfcorehdr_read() -> read_from_oldmem() -> copy_oldmem_page() -> copy_to_iter()
- crash systems crashes (no pun intended) in
Day 5
- hacking
arch/arm64/kernel/crash_dump.c:copy_old_mempage()to see if crash system really starts. It does. - fun fact: retested with more reserved memory and with UEFI FW, host kernel crashes in init but directly starts the crash kernel, so it works (somehow) \o/
- hacking
TODOs
- fix elfcorehdr so that we actually can make use of all this...
- test where in the boot
__init()chain we can/should callkexec_early_dump()
pudc - A PID 1 process that barks to the internet by mssola
Description
As a fun exercise in order to dig deeper into the Linux kernel, its interfaces, the RISC-V architecture, and all the dragons in between; I'm building a blog site cooked like this:
- The backend is written in a mixture of C and RISC-V assembly.
- The backend is actually PID1 (for real, not within a container).
- We poll and parse incoming HTTP requests ourselves.
- The frontend is a mere HTML page with htmx.
The project is meant to be Linux-specific, so I'm going to use io_uring, pidfs, namespaces, and Linux-specific features in order to drive all of this.
I'm open for suggestions and so on, but this is meant to be a solo project, as this is more of a learning exercise for me than anything else.
Goals
- Have a better understanding of different Linux features from user space down to the kernel internals.
- Most importantly: have fun.
Resources