Project Description

The goal of the project is to implement a collection of top-level crash commands in drgn tool. The commands should provide a top-level overview for anybody who opens a kernel core dump. I plan to select a similar set of commands as seen in crash-python tool.

Goal for this Hackweek

Implement basic commands and play with the drgn internals.

Resources

  • https://crash-utility.github.io/help.html
  • https://drgn.readthedocs.io/en/latest/
  • https://crash-python.readthedocs.io/en/latest/index.html

Looking for hackers with the skills:

crash python3 kernel debug

This project is part of:

Hack Week 22

Activity

  • almost 3 years ago: marxin added keyword "crash" to this project.
  • almost 3 years ago: marxin added keyword "python3" to this project.
  • almost 3 years ago: marxin added keyword "kernel" to this project.
  • almost 3 years ago: marxin added keyword "debug" to this project.
  • almost 3 years ago: marxin started this project.
  • almost 3 years ago: marxin originated this project.

  • Comments

    • marxin
      almost 3 years ago by marxin | Reply

      I decided to implement the basic commands as part of contrib (^1) sub-folder of the project. It's the location intended for more complex listing-like (or analysis) scripts and I was able to introduce (or extend) the following commands even though my kernel knowledge is very poor. That's a good sign the drgn tool provides a friendly API and usable helper functions (^2):

      ps (extended to provide memory-related stats ^3):

      PID PPID CPU ST VMS RSS MEM% COMM 1 0 0 S 10.4M 6.5M 0.4 init 2 0 0 S 0 0 0.0 [kthreadd] ... 263 1 4 S 2.4G 163.5M 9.5 python3 264 1 5 S 2.4G 163.5M 9.5 python3 265 1 6 S 2.4G 163.5M 9.5 python3 266 1 10 S 2.4G 163.5M 9.5 python3 267 1 12 S 2.4G 163.5M 9.5 python3 268 1 13 S 2.4G 163.5M 9.5 python3 269 1 14 S 2.4G 163.5M 9.5 python3 270 1 15 S 2.4G 163.5M 9.5 python3 271 1 16 S 2.4G 163.5M 9.5 python3 ...

      sys (newly added as ^4)

      CPUS 16 DATE Fri Jan 27 20:26:24 2023 UPTIME 1 day, 7:29:37 LOAD AVERAGE 0.00, 0.00, 0.00 TASKS 317 NODENAME tw RELEASE 6.1.7-1-default VERSION #1 SMP PREEMPT_DYNAMIC Wed Jan 18 11:12:34 UTC 2023 (872045c) MACHINE x86_64 MEMORY 12.67 GiB

      vmstat (newly added ^5)

      ``` Event Count VMZONESTAT: NRFREEPAGES 512147 NRZONELRUBASE 234271 NRZONEINACTIVEANON 234271 NRZONEACTIVEANON 196 NRZONEINACTIVEFILE 97200 NRZONEACTIVEFILE 110611 NRZONEUNEVICTABLE 1000 NRZONEWRITEPENDING 84 NRMLOCK 0 NRBOUNCE 0 NRZSPAGES 0 NRFREECMAPAGES 0

      VMNODESTAT: NRLRUBASE 234322 NRINACTIVEANON 234322 NRACTIVEANON 196 NRINACTIVEFILE 97200 ... ```

      vmmap (newly added ^6)

      Start End Flgs Offset Dev Inode File path 55dee5284000-55dee53f3000 r-xp 00000000 fd:02 10515 /usr/lib/systemd/systemd 55dee53f3000-55dee5441000 r--p 0016f000 fd:02 10515 /usr/lib/systemd/systemd 55dee5441000-55dee5442000 rw-p 001bd000 fd:02 10515 /usr/lib/systemd/systemd 55dee5f4c000-55dee615d000 rw-p 00000000 00:00 0 7f5fc801c000-7f5fc8024000 r-xp 00000000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 7f5fc8024000-7f5fc8224000 ---p 00008000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 7f5fc8224000-7f5fc8225000 r--p 00008000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0 ...

      mount (newly added ^7):

      Mount Type Devname Dirname ffff8fed001d8500 rootfs rootfs / ffff8fed06a197c0 proc proc /proc ffff8fed06a192c0 sysfs sysfs /sys ffff8fed06a18c80 devtmpfs devtmpfs /dev ffff8fed06a18b40 securityfs securityfs /sys/kernel/security ffff8fed06a19cc0 tmpfs tmpfs /dev/shm ffff8fed06a18500 devpts devpts /dev/pts ffff8fed06a18dc0 tmpfs tmpfs /run ...

      Existing contrib scripts

      There are other existing commands that can:

      • list TCP connections
      • list loaded kernel modules
      • list all the files on a mounted device
      • cgroup 2 listing

    • marxin
      almost 3 years ago by marxin | Reply

      When it comes to more complex verification scripts, I was able to port Vlastimil's page table walker (^1) and a verification script (^2) used for debugging of a customer bug. Please see the following git branch: ^3.

    • marxin
      almost 3 years ago by marxin | Reply

      Misc drgn observations

      • One can write scripts that work for many kernel releases. One can use symbol_name in prog technique or simple wrap a code in try ... catch block and provide a fallback for older/newer releases.
      • The project contains prebuilt vmlinux binaries for various versions (^1) and one can easily run a contrib script in QEMU for a selected Linux version:

      $ python3 -m vmtest.vm -k '5.10.*' python3 -Bm drgn contrib/ps.py Linux version 5.10.166-vmtest18.1default (drgn@drgn) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1 SMP Mon Feb 6 08:12:05 UTC 2023 Command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 ro console=0,115200 panic=-1 crashkernel=256M init=/tmp/drgn-vmtest-_6sh_xhu/init x86/fpu: x87 FPU will use FXSAVE BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved ... PID PPID CPU ST COMM 1 0 6 S init 2 0 15 S [kthreadd] 3 2 0 I [rcu_gp] ...

    Similar Projects

    Improve chore and screen time doc generator script `wochenplaner` by gniebler

    Description

    I wrote a little Python script to generate PDF docs, which can be used to track daily chore completion and screen time usage for several people, with one page per person/week.

    I named this script wochenplaner and have been using it for a few months now.

    It needs some improvements and adjustments in how the screen time should be tracked and how chores are displayed.

    Goals

    • Fix chore field separation lines
    • Change screen time tracking logic from "global" (week-long) to daily subtraction and weekly addition of remainders (more intuitive than current "weekly time budget method)
    • Add logic to fill in chore fields/lines, ideally with pictures, falling back to text.

    Resources

    tbd (Gitlab repo)


    Improve/rework household chore tracker `chorazon` by gniebler

    Description

    I wrote a household chore tracker named chorazon, which is meant to be deployed as a web application in the household's local network.

    It features the ability to set up different (so far only weekly) schedules per task and per person, where tasks may span several days.

    There are "tokens", which can be collected by users. Tasks can (and usually will) have rewards configured where they yield a certain amount of tokens. The idea is that they can later be redeemed for (surprise) gifts, but this is not implemented yet. (So right now one needs to edit the DB manually to subtract tokens when they're redeemed.)

    Days are not rolled over automatically, to allow for task completion control.

    We used it in my household for several months, with mixed success. There are many limitations in the system that would warrant a revisit.

    It's written using the Pyramid Python framework with URL traversal, ZODB as the data store and Web Components for the frontend.

    Goals

    • Add admin screens for users, tasks and schedules
    • Add models, pages etc. to allow redeeming tokens for gifts/surprises
    • …?

    Resources

    tbd (Gitlab repo)


    openQA log viewer by mpagot

    Description

    *** Warning: Are You at Risk for VOMIT? ***

    Do you find yourself staring at a screen, your eyes glossing over as thousands of lines of text scroll by? Do you feel a wave of text-based nausea when someone asks you to "just check the logs"?

    You may be suffering from VOMIT (Verbose Output Mental Irritation Toxicity).

    This dangerous, work-induced ailment is triggered by exposure to an overwhelming quantity of log data, especially from parallel systems. The human brain, not designed to mentally process 12 simultaneous autoinst-log.txt files, enters a state of toxic shock. It rejects the "Verbose Output," making it impossible to find the one critical error line buried in a 50,000-line sea of "INFO: doing a thing."

    Before you're forced to rm -rf /var/log in a fit of desperation, we present the digital antacid.

    No panic: The openQA Log Visualizer (Also known as the "VOMIT-B-Gone 9000")

    This is your web-based hazmat suit for handling toxic log environments. It bravely dives into the chaotic, multi-machine mess of your openQA test runs, finds all the related, verbose logs, and force-feeds them into a parser.

    image

    Goals

    Work on the existing POC openqa-log-visualizer and change it to something usable

    Resources

    openqa-log-visualizer


    early stage kdump support by mbrugger

    Project Description

    When we experience a early boot crash, we are not able to analyze the kernel dump, as user-space wasn't able to load the crash system. The idea is to make the crash system compiled into the host kernel (think of initramfs) so that we can create a kernel dump really early in the boot process.

    Goal for the Hackweeks

    1. Investigate if this is possible and the implications it would have (done in HW21)
    2. Hack up a PoC (done in HW22 and HW23)
    3. Prepare RFC series (giving it's only one week, we are entering wishful thinking territory here).

    update HW23

    • I was able to include the crash kernel into the kernel Image.
    • I'll need to find a way to load that from init/main.c:start_kernel() probably after kcsan_init()
    • I workaround for a smoke test was to hack kexec_file_load() systemcall which has two problems:
      1. My initramfs in the porduction kernel does not have a new enough kexec version, that's not a blocker but where the week ended
      2. As the crash kernel is part of init.data it will be already stale once I can call kexec_file_load() from user-space.

    The solution is probably to rewrite the POC so that the invocation can be done from init.text (that's my theory) but I'm not sure if I can reuse the kexec infrastructure in the kernel from there, which I rely on heavily.

    update HW24

    • Day1
      • rebased on v6.12 with no problems others then me breaking the config
      • setting up a new compilation and qemu/virtme env
      • getting desperate as nothing works that used to work
    • Day 2
      • getting to call the invocation of loading the early kernel from __init after kcsan_init()
    • Day 3

      • fix problem of memdup not being able to alloc so much memory... use 64K page sizes for now
      • code refactoring
      • I'm now able to load the crash kernel
      • When using virtme I can boot into the crash kernel, also it doesn't boot completely (major milestone!), crash in elfcorehdr_read_notes()
    • Day 4

      • crash systems crashes (no pun intended) in copy_old_mempage() link; will need to understand elfcorehdr...
      • call path vmcore_init() -> parse_crash_elf_headers() -> elfcorehdr_read() -> read_from_oldmem() -> copy_oldmem_page() -> copy_to_iter()
    • Day 5

      • hacking arch/arm64/kernel/crash_dump.c:copy_old_mempage() to see if crash system really starts. It does.
      • fun fact: retested with more reserved memory and with UEFI FW, host kernel crashes in init but directly starts the crash kernel, so it works (somehow) \o/
    • TODOs

      • fix elfcorehdr so that we actually can make use of all this...
      • test where in the boot __init() chain we can/should call kexec_early_dump()


    pudc - A PID 1 process that barks to the internet by mssola

    Description

    As a fun exercise in order to dig deeper into the Linux kernel, its interfaces, the RISC-V architecture, and all the dragons in between; I'm building a blog site cooked like this:

    • The backend is written in a mixture of C and RISC-V assembly.
    • The backend is actually PID1 (for real, not within a container).
    • We poll and parse incoming HTTP requests ourselves.
    • The frontend is a mere HTML page with htmx.

    The project is meant to be Linux-specific, so I'm going to use io_uring, pidfs, namespaces, and Linux-specific features in order to drive all of this.

    I'm open for suggestions and so on, but this is meant to be a solo project, as this is more of a learning exercise for me than anything else.

    Goals

    • Have a better understanding of different Linux features from user space down to the kernel internals.
    • Most importantly: have fun.

    Resources