Project Description

The goal of this project is to implement a collection of top-level crash commands in the drgn tool. The commands should give anybody who opens a kernel core dump a quick top-level overview. I plan to select a set of commands similar to the one provided by the crash-python tool.

Goal for this Hackweek

Implement basic commands and play with the drgn internals.

Resources

  • https://crash-utility.github.io/help.html
  • https://drgn.readthedocs.io/en/latest/
  • https://crash-python.readthedocs.io/en/latest/index.html

Looking for hackers with the skills:

crash, python3, kernel, debug

This project is part of:

Hack Week 22

Activity

  • almost 2 years ago: marxin added keyword "crash" to this project.
  • almost 2 years ago: marxin added keyword "python3" to this project.
  • almost 2 years ago: marxin added keyword "kernel" to this project.
  • almost 2 years ago: marxin added keyword "debug" to this project.
  • almost 2 years ago: marxin started this project.
  • almost 2 years ago: marxin originated this project.

  • Comments

    • marxin
      almost 2 years ago by marxin | Reply

I decided to implement the basic commands in the contrib (^1) sub-folder of the project. That is the location intended for more complex listing-like (or analysis) scripts, and I was able to introduce (or extend) the following commands even though my kernel knowledge is very poor. That's a good sign that the drgn tool provides a friendly API and usable helper functions (^2):

      ps (extended to provide memory-related stats ^3):

```
PID  PPID  CPU  ST  VMS    RSS     MEM%  COMM
1    0     0    S   10.4M  6.5M    0.4   init
2    0     0    S   0      0       0.0   [kthreadd]
...
263  1     4    S   2.4G   163.5M  9.5   python3
264  1     5    S   2.4G   163.5M  9.5   python3
265  1     6    S   2.4G   163.5M  9.5   python3
266  1     10   S   2.4G   163.5M  9.5   python3
267  1     12   S   2.4G   163.5M  9.5   python3
268  1     13   S   2.4G   163.5M  9.5   python3
269  1     14   S   2.4G   163.5M  9.5   python3
270  1     15   S   2.4G   163.5M  9.5   python3
271  1     16   S   2.4G   163.5M  9.5   python3
...
```
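For reference, a minimal sketch of how such a task listing can be produced with drgn's helpers. This is my rough approximation, not the actual contrib/ps.py: VMS is reported in pages from mm.total_vm, and RSS/MEM% are left out.

```python
# Minimal ps-like listing; "prog" is provided by the drgn CLI
# (e.g. "python3 -Bm drgn this_script.py" against a core dump).
from drgn.helpers.linux.pid import for_each_task
from drgn.helpers.linux.sched import task_cpu, task_state_to_char

print(f"{'PID':>6} {'PPID':>6} {'CPU':>3} {'ST':>2} {'VMS(pages)':>10} COMM")
for task in for_each_task(prog):
    mm = task.mm
    vms = mm.total_vm.value_() if mm else 0   # kernel threads have no mm
    print(f"{task.pid.value_():>6} {task.parent.pid.value_():>6} "
          f"{task_cpu(task):>3} {task_state_to_char(task):>2} "
          f"{vms:>10} {task.comm.string_().decode()}")
```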

sys (newly added ^4):

```
CPUS          16
DATE          Fri Jan 27 20:26:24 2023
UPTIME        1 day, 7:29:37
LOAD AVERAGE  0.00, 0.00, 0.00
TASKS         317
NODENAME      tw
RELEASE       6.1.7-1-default
VERSION       #1 SMP PREEMPT_DYNAMIC Wed Jan 18 11:12:34 UTC 2023 (872045c)
MACHINE       x86_64
MEMORY        12.67 GiB
```

      vmstat (newly added ^5)

```
Event                  Count
VM_ZONE_STAT:
NR_FREE_PAGES          512147
NR_ZONE_LRU_BASE       234271
NR_ZONE_INACTIVE_ANON  234271
NR_ZONE_ACTIVE_ANON       196
NR_ZONE_INACTIVE_FILE   97200
NR_ZONE_ACTIVE_FILE    110611
NR_ZONE_UNEVICTABLE      1000
NR_ZONE_WRITE_PENDING      84
NR_MLOCK                    0
NR_BOUNCE                   0
NR_ZSPAGES                  0
NR_FREE_CMA_PAGES           0

VM_NODE_STAT:
NR_LRU_BASE            234322
NR_INACTIVE_ANON       234322
NR_ACTIVE_ANON            196
NR_INACTIVE_FILE        97200
...
```

      vmmap (newly added ^6)

```
Start         End           Flgs  Offset    Dev    Inode    File path
55dee5284000  55dee53f3000  r-xp  00000000  fd:02  10515    /usr/lib/systemd/systemd
55dee53f3000  55dee5441000  r--p  0016f000  fd:02  10515    /usr/lib/systemd/systemd
55dee5441000  55dee5442000  rw-p  001bd000  fd:02  10515    /usr/lib/systemd/systemd
55dee5f4c000  55dee615d000  rw-p  00000000  00:00  0
7f5fc801c000  7f5fc8024000  r-xp  00000000  fd:02  1181379  /usr/lib64/libffi.so.7.1.0
7f5fc8024000  7f5fc8224000  ---p  00008000  fd:02  1181379  /usr/lib64/libffi.so.7.1.0
7f5fc8224000  7f5fc8225000  r--p  00008000  fd:02  1181379  /usr/lib64/libffi.so.7.1.0
...
```

      mount (newly added ^7):

```
Mount             Type        Devname     Dirname
ffff8fed001d8500  rootfs      rootfs      /
ffff8fed06a197c0  proc        proc        /proc
ffff8fed06a192c0  sysfs       sysfs       /sys
ffff8fed06a18c80  devtmpfs    devtmpfs    /dev
ffff8fed06a18b40  securityfs  securityfs  /sys/kernel/security
ffff8fed06a19cc0  tmpfs       tmpfs       /dev/shm
ffff8fed06a18500  devpts      devpts      /dev/pts
ffff8fed06a18dc0  tmpfs       tmpfs       /run
...
```

      Existing contrib scripts

There are other existing commands that can:

• list TCP connections
• list loaded kernel modules
• list all the files on a mounted device
• list cgroup v2 groups

    • marxin
      almost 2 years ago by marxin | Reply

When it comes to more complex verification scripts, I was able to port Vlastimil's page table walker (^1) and a verification script (^2) used for debugging a customer bug. Please see the following git branch: ^3.

    • marxin
      almost 2 years ago by marxin | Reply

      Misc drgn observations

• One can write scripts that work across many kernel releases, either by using the `symbol_name in prog` technique or by simply wrapping code in a try ... except block and providing a fallback for older/newer releases (a minimal sketch follows after this comment).
• The project contains prebuilt vmlinux binaries for various versions (^1), and one can easily run a contrib script in QEMU for a selected Linux version:

```
$ python3 -m vmtest.vm -k '5.10.*' python3 -Bm drgn contrib/ps.py
Linux version 5.10.166-vmtest18.1default (drgn@drgn) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1 SMP Mon Feb 6 08:12:05 UTC 2023
Command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 ro console=0,115200 panic=-1 crashkernel=256M init=/tmp/drgn-vmtest-_6sh_xhu/init
x86/fpu: x87 FPU will use FXSAVE
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
...
PID  PPID  CPU  ST  COMM
1    0     6    S   init
2    0     15   S   [kthreadd]
3    2     0    I   [rcu_gp]
...
```
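A minimal sketch of the two compatibility techniques from the first bullet above, using the well-known `state`/`__state` rename in `task_struct` as the example. This is illustrative only, not one of the contrib scripts:

```python
# Hedged sketch: handle symbols/fields that differ between kernel releases.
from drgn.helpers.linux.pid import for_each_task

def task_count(prog):
    if "nr_threads" in prog:                  # "symbol_name in prog" technique
        return int(prog["nr_threads"])
    # fallback for releases where the symbol is missing or renamed
    return sum(1 for _ in for_each_task(prog))

def task_state_raw(task):
    try:
        return int(task.__state)              # kernels >= 5.14 renamed the field
    except AttributeError:
        return int(task.state)                # older kernels use "state"
```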

    Similar Projects

    Symbol Relations by hli

    Description

    There are tools to build function call graphs based on parsing source code, for example, cscope.

This project aims to achieve a similar goal by directly parsing the disassembly (i.e. objdump output) of a compiled binary. The assembly code is what the CPU sees, and is therefore more "direct". This may be useful in certain scenarios, such as gdb/crash debugging.
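As a rough illustration of the approach (not the SymbolRelations code itself; the regular expressions are approximations for typical x86 objdump output), a caller/callee map can be sketched like this:

```python
# Build a caller -> callees map from "objdump -d" output by matching call instructions.
import re
import subprocess
import sys
from collections import defaultdict

SYM_RE = re.compile(r'^[0-9a-f]+ <([^>]+)>:$')          # start of a function body
CALL_RE = re.compile(r'\bcall\w*\s+[0-9a-f]+ <([^>+]+)')  # direct call target

def call_graph(binary):
    out = subprocess.check_output(["objdump", "-d", binary], universal_newlines=True)
    graph, current = defaultdict(set), None
    for line in out.splitlines():
        sym = SYM_RE.match(line)
        if sym:
            current = sym.group(1)
            continue
        call = CALL_RE.search(line)
        if call and current:
            graph[current].add(call.group(1))
    return graph

if __name__ == "__main__":
    for caller, callees in sorted(call_graph(sys.argv[1]).items()):
        print(caller, "->", ", ".join(sorted(callees)))
```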

A detailed description and demos can be found in the README file.

    Supports x86 for now (because my customers only use x86 machines), but support for other architectures can be added easily.

    Tested with python3.6

    Goals

    Any comments are welcome.

    Resources

    https://github.com/lhb-cafe/SymbolRelations

symrellib.py: implements the symbol relation graph and the disassembly parser

    symrel_tracer*.py: implements tracing (-t option)

    symrel.py: "cli parser"


    Enhance UV openQA helper script by mdonis

    Description

A couple of months ago, a UV openQA helper script was created to help/automate the search phase inside openQA for a given MU to test. The script searches all our openQA job groups (qam-sle) related to a given MU and generates output suitable for adding (copy & paste) to the update log.

    This is still a WIP and could use some enhancements.

    Goals

• Move the script from bash to Python: this would be useful in case we want to include it in MTUI in the future. The script will stay separate from MTUI for now. The idea is to have this as a CLI tool using the click library or something similar (see the sketch after this list).
    • Add option to look for jobs in other sections inside aggregated updates: right now, when looking for regression tests under aggregated updates for a given MU, the script only looks inside the Core MU job group. This is where most of the regression tests we need are located, but some MUs have their regression tests under the YaST/Containers/Security MU job groups. We should keep the Core MU group as a default, but add an option to be able to look into other job groups under aggregated updates.
• Remove the -a option: this option is used to indicate the update ID and is currently mandatory. This is a bit weird and goes against POSIX standards. It was developed this way in order to avoid using positional parameters. This problem should be fixed if we move the script to Python.
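A minimal sketch of what the proposed click-based CLI could look like; the argument and option names here are hypothetical, not the final interface:

```python
# Hypothetical CLI skeleton using click: positional update ID instead of "-a".
import click

@click.command()
@click.argument("update_id")
@click.option("--job-group", "-g", multiple=True, default=("Core MU",),
              help="Aggregated-update job group(s) to search besides the default")
def main(update_id, job_group):
    """Search openQA jobs related to UPDATE_ID and print update-log entries."""
    click.echo("Searching jobs for {} in groups: {}".format(update_id, ", ".join(job_group)))

if __name__ == "__main__":
    main()
```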

    Some other ideas to consider:

• Look into the QAM dashboard API. This has more info on each MU; we could use it to link general openQA build results, show whether the related RR is approved or not, etc.
• Make it easier to see if there are regression tests for a package in an openQA test build. Check whether it's possible to search for tests that have the package name in them inside each test suite.
    • Unit testing?

    More ideas TBD

    Resources

    https://github.com/os-autoinst/scripts/blob/master/openqa-search-maintenance-core-jobs

    https://confluence.suse.com/display/maintenanceqa/Guide+on+how+to+test+Updates

    Post-Hackweek update

All major features were implemented. Unit tests are still in progress, and the project will be moved to the SUSE GitHub org once everything's done: https://github.com/mjdonis/oqa-search


    Selenium with Python by xguo

    Description

Try to create test cases for Selenium based on Python.

    Goals

• Gain knowledge about Selenium with Python
• Create new test cases with Selenium (a minimal sketch follows below)
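For illustration, a minimal Selenium test case in Python; the target URL is just an example:

```python
# Minimal Selenium example: open a page and verify its title.
from selenium import webdriver

driver = webdriver.Firefox()        # requires geckodriver; Chrome works similarly
try:
    driver.get("https://www.selenium.dev/")
    assert "Selenium" in driver.title
finally:
    driver.quit()
```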

    Resources

https://selenium-python.readthedocs.io/
https://www.selenium.dev/


    Small healthcheck tool for Longhorn by mbrookhuis

    Project Description

We often have problems (e.g. pods not starting) that are related to PVCs not running, cluster nodes not all being up, or deployments not running or not running completely. This all prevents administration activities. Having something that can be run regularly to validate the status of the cluster would be helpful, instead of the many manual tasks needed today.

As an addition (given enough time), we could add changing reservations, adding new disks, etc. --> This didn't make it, but the scripts can easily be adapted.

This tool would decrease troubleshooting time, reduce the need to give admins rights to the Rancher GUI, and could be used in automation.

    Goal for this Hackweek

In the end we should have a small Python tool that does a (very) basic health check on nodes, deployments and PVCs. The first attempt was to write it in Go, but that was taking too much time.

    Overview

    This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:

• node check: This will check all nodes and display their status and k3s version. If the status of a node is not "Ready", this will be reported and the cluster will be flagged as having problems (a sketch of this check follows below).

• deployment check: This will list all deployments and display the number of expected replicas and the number of replicas in use. If replicas are missing, this will be displayed and the cluster will be reported as having problems.

• pvc check: This will list all PVCs and display their status and robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.

If any check registers a problem, a warning that the cluster is not healthy is printed and the program exits with status 1.
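A minimal sketch of what the node check could look like with the kubernetes Python client; the function name and output format are my own illustration, not the actual k8s-check.py:

```python
# Hedged sketch: report nodes that are not Ready, using the kubernetes client.
import sys
from kubernetes import client, config

def check_nodes(kubeconfig):
    config.load_kube_config(config_file=kubeconfig)
    v1 = client.CoreV1Api()
    problems = False
    for node in v1.list_node().items:
        ready = next((c.status for c in node.status.conditions
                      if c.type == "Ready"), "Unknown")
        version = node.status.node_info.kubelet_version
        print("{:<30} Ready={:<7} {}".format(node.metadata.name, ready, version))
        if ready != "True":
            problems = True
    return problems

if __name__ == "__main__":
    if check_nodes(sys.argv[1]):
        print("WARNING: cluster is not healthy")
        sys.exit(1)
```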

The script has one mandatory parameter: the kubeconfig of the cluster or of a node of the cluster.

The code is written for Python 3.11, but will also work on 3.6 (the default with SLES 15.x). There is a venv present that contains all needed packages. The script can be run on the cluster itself or on any other Linux server.

    Installation

    To install this project, perform the following steps:

    • Create the directory /opt/k8s-check

    mkdir /opt/k8s-check

• Copy all the files to this directory and make the script executable:

    chmod +x k8s-check.py


    RISC-V emulator in GLSL capable of running Linux by favogt

    Description

    There are already numerous ways to run Linux and some programs through emulation in a web browser (e.g. x86 and riscv64 on https://bellard.org/jslinux/), but none use WebGL/WebGPU to run the emulation on the GPU.

I already made a PoC of an AArch64 (64-bit Arm) emulator in OpenCL, which is unfortunately hindered by a multitude of OpenCL compiler bugs on all platforms (Intel with Beignet or the new compute runtime, and AMD with Mesa Clover and rusticl). With the more widespread and thus less broken GLSL (vs. OpenCL) and the less complex implementation requirements of RV32 (especially 32-bit instead of 64-bit integers), that should no longer be a major problem.

    Goals

Write a RISC-V system emulator in GLSL that is capable of booting Linux and running some userspace programs interactively. Ideally it is small enough to work on online test platforms like Shaderoo, with a custom texture that contains the bootstrap code, kernel and initrd.

    Minimum:

riscv32 without FPU (RV32IMA) and without MMU (µClinux), running Linux in M-mode and userspace in U-mode.

    Stretch goals:

    FPU support, S-Mode support with MMU, SMP. Custom web frontend with more possibilities for I/O (disk image, network?).

    Resources

    RISC-V ISA Specifications
    Shaderoo
    OpenGL 4.5 Quick Reference Card

    Result as of Hackweek 2024

WebGL turned out to be insufficient: it only supports OpenGL ES 3.0, but imageLoad/imageStore needs ES 3.1. So we switched directions and had to write a native C++ host for the shaders.

    As of Hackweek Friday, the kernel attempts to boot and outputs messages, but panics due to missing memory regions.

Since then, some bugs were fixed and enough hardware emulation was implemented that Linux now boots with framebuffer support and it's possible to log in and run programs!

    The repo with a demo video is available at https://github.com/Vogtinator/risky-v


    Model checking the BPF verifier by shunghsiyu

    Project Description

The BPF verifier plays a crucial role in securing the system (though less so now that unprivileged BPF is disabled by default in both upstream and SLES), and bugs in the verifier have led to privilege escalation vulnerabilities in the past (e.g. CVE-2021-3490).

One way to check whether the verifier has bugs is to use model checking (a formal verification technique): in other words, build an abstract model of how the verifier operates, and then see whether a certain condition can occur (e.g. an incorrect calculation during value tracking of registers) by giving both the model and the condition to a solver.

For the solver I will be using the Z3 SMT solver to do the checking, since it provides a Python binding that's relatively easy to use.

    Goal for this Hackweek

Learn how to use the Z3 Python binding (i.e. Z3Py) to build a model of (part of) the BPF verifier, probably the part related to value tracking using tristate numbers (aka tnum), and then check that the algorithm works as intended.
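For a flavour of the approach, a minimal Z3Py sketch that checks the soundness of tnum addition at a small bit width. The tnum_add() body is transcribed from kernel/bpf/tnum.c; the membership encoding and bit width are my own choices for illustration:

```python
from z3 import BitVec, Not, Solver, unsat

W = 8  # small bit width keeps the check fast

def tnum_add(av, am, bv, bm):
    # Transcription of the kernel's tnum_add() (kernel/bpf/tnum.c)
    sm = am + bm
    sv = av + bv
    sigma = sm + sv
    chi = sigma ^ sv
    mu = chi | am | bm
    return sv & ~mu, mu

def member(x, v, m):
    # x is represented by tnum (v, m) iff x agrees with v on all known bits
    return (x & ~m) == v

av, am = BitVec("av", W), BitVec("am", W)
bv, bm = BitVec("bv", W), BitVec("bm", W)
x, y = BitVec("x", W), BitVec("y", W)
rv, rm = tnum_add(av, am, bv, bm)

s = Solver()
s.add(av & am == 0, bv & bm == 0)              # well-formed input tnums
s.add(member(x, av, am), member(y, bv, bm))    # concrete values they represent
s.add(Not(member(x + y, rv, rm)))              # look for a sum that escapes the result
print("tnum_add sound for %d-bit values:" % W, s.check() == unsat)
```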

    Resources


    Contributing to Linux Kernel security by pperego

    Description

    A couple of weeks ago, I found this blog post by Gustavo Silva, a Linux Kernel contributor.

I have always wanted to get back into hacking the Linux Kernel, so I asked for Coverity scan dashboard access, and I want to contribute to the Linux Kernel by fixing some minor issues.

I also want to create a Linux Kernel fuzzing lab using QEMU and syzkaller.

    Goals

    1. Fix at least 2 security bugs
2. Create the fuzzing lab and have it running

    The story so far

• Day 1: setting up a virtual machine for kernel development using Tumbleweed. Reading a lot of documentation, getting familiar with the Coverity dashboard and with the procedures for submitting a kernel patch.
• Day 2: I read a lot of documentation and triaged some findings on the Coverity SAST dashboard. I can confirm that SAST tools are great false-positive generators, even for low-hanging fruit.
• Day 3: working on trivial changes after reading this blog post: https://www.toblux.com/posts/2024/02/linux-kernel-patches.html. I still have to get familiar with the patch preparation and submission process.
  • First trivial patch sent: using the str_true_false() macro instead of hard-coded strings in a staging driver for an LCD display
  • Fix for a dereference-before-NULL-check issue discovered by Coverity (CID 1601566): https://scan7.scan.coverity.com/#/project-view/52110/11354?selectedIssue=1601566
• Day 4: triaging more issues found by Coverity.
  • The patch for CID 1601566 was rejected. The check against the NULL pointer was pointless, so I prepared a version 2 of the patch removing the check.
  • Fixed another dereference-before-NULL-check in the iwl_mvm_parse_wowlan_info_notif() routine (CID 1601547). This one had already been submitted by another kernel hacker :(
• Day 5: wrapping up. I had to do some minor rework on the patch for CID 1601566. I found a stalker bothering me in private emails; people I had interacted with advised that he is a well-known nuisance. Markus Elfring, for the record.
• Wrapping up: being back doing kernel hacking is amazing and I don't want to stop. My battery pack is completely drained, but changing the scope gave me a great twist and I really want to keep this energy rather than doing a single task for months.

I failed to set up the fuzzing lab, and I was too optimistic about the patch submission process.

    The patches

    1


    Improve UML page fault handler by ptesarik

    Description

Improve UML handling of segmentation faults in kernel mode. Although such page faults are generally caused by a kernel bug, it is annoying if they cause an infinite loop or panic the kernel. More importantly, a robust implementation makes it possible to write KUnit tests for various guard pages, preventing potential kernel self-protection regressions.

    Goals

    Convert the UML page fault handler to use oops_* helpers, go through a few review rounds and finally get my patch series merged in 6.14.

    Resources

    Wrong initial attempt: https://lore.kernel.org/lkml/20231215121431.680-1-petrtesarik@huaweicloud.com/T/


    Create a DRM driver for VGA video cards by tdz

    Yes, those VGA video cards. The goal of this project is to implement a DRM graphics driver for such devices. While actual hardware is hard to obtain or even run today, qemu emulates VGA output.

    VGA has a number of limitations, which make this project interesting.

• There are only 640x480 pixels (or fewer) on the screen. That resolution is also a soft lower limit imposed by DRM. It's mostly a problem for desktop environments, though.
• Desktop environments assume 16 million colors, but VGA offers only 16, since VGA's 256-color palette is not available at 640x480. We can choose those 16 colors freely; the interesting part is how to choose them. We have to build a palette for the displayed frame and map each color to one of the palette's 16 entries. This is called dithering, and VGA's limitations are a good opportunity to learn about dithering algorithms.
• VGA has an interesting memory layout. Most graphics devices use linear framebuffers, which store the pixels byte by byte. VGA instead uses 4 bit planes: plane 0 holds bit 0 of every pixel, plane 1 holds bit 1 of every pixel, and so on (a small encoding sketch follows after this list).
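To illustrate the planar layout, a tiny Python sketch (for illustration only, not driver code) that splits a scanline of 4-bit color indices into the four VGA bit planes:

```python
def planar_encode(pixels, width):
    """Split one scanline of 4-bit color indices into 4 VGA bit planes."""
    planes = [bytearray((width + 7) // 8) for _ in range(4)]
    for x, color in enumerate(pixels):
        byte, bit = divmod(x, 8)
        mask = 0x80 >> bit            # leftmost pixel maps to the MSB
        for p in range(4):
            if color & (1 << p):      # bit p of the pixel goes into plane p
                planes[p][byte] |= mask
    return planes

# Example: 8 pixels alternating between palette entries 0 and 15
print([bytes(p).hex() for p in planar_encode([0, 15] * 4, 8)])
```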

The driver will probably not be useful to many people. But, if finished, it can serve as a test environment for low-level hardware support. There's some interest in supporting old Amiga and Atari framebuffers in DRM. Those systems have similar limitations to VGA, but are harder to obtain and test with. With qemu, the VGA driver could fill this gap.

Apart from the Wikipedia entry, good resources on VGA are at osdev.net and FreeVGA.