Project Description
The goal of the project is to implement a collection of top-level crash commands in the drgn tool. The commands should provide a top-level overview for anybody who opens a kernel core dump. I plan to select a set of commands similar to those in the crash-python tool.
Goal for this Hackweek
Implement basic commands and play with the drgn internals.
Resources
- https://crash-utility.github.io/help.html
- https://drgn.readthedocs.io/en/latest/
- https://crash-python.readthedocs.io/en/latest/index.html
This project is part of:
Hack Week 22
Activity
Comments
- almost 2 years ago by marxin
I decided to implement the basic commands as part of the contrib (^1) sub-folder of the project. It's the location intended for more complex listing-like (or analysis) scripts, and I was able to introduce (or extend) the following commands even though my kernel knowledge is very poor. That's a good sign that the drgn tool provides a friendly API and usable helper functions (^2):
ps (extended to provide memory-related stats ^3):
    PID  PPID  CPU  ST  VMS    RSS     MEM%  COMM
    1    0     0    S   10.4M  6.5M    0.4   init
    2    0     0    S   0      0       0.0   [kthreadd]
    ...
    263  1     4    S   2.4G   163.5M  9.5   python3
    264  1     5    S   2.4G   163.5M  9.5   python3
    265  1     6    S   2.4G   163.5M  9.5   python3
    266  1     10   S   2.4G   163.5M  9.5   python3
    267  1     12   S   2.4G   163.5M  9.5   python3
    268  1     13   S   2.4G   163.5M  9.5   python3
    269  1     14   S   2.4G   163.5M  9.5   python3
    270  1     15   S   2.4G   163.5M  9.5   python3
    271  1     16   S   2.4G   163.5M  9.5   python3
    ...
sys (newly added as ^4)
    CPUS          16
    DATE          Fri Jan 27 20:26:24 2023
    UPTIME        1 day, 7:29:37
    LOAD AVERAGE  0.00, 0.00, 0.00
    TASKS         317
    NODENAME      tw
    RELEASE       6.1.7-1-default
    VERSION       #1 SMP PREEMPT_DYNAMIC Wed Jan 18 11:12:34 UTC 2023 (872045c)
    MACHINE       x86_64
    MEMORY        12.67 GiB
vmstat (newly added ^5)
```
Event                    Count
VM_ZONE_STAT:
NR_FREE_PAGES            512147
NR_ZONE_LRU_BASE         234271
NR_ZONE_INACTIVE_ANON    234271
NR_ZONE_ACTIVE_ANON      196
NR_ZONE_INACTIVE_FILE    97200
NR_ZONE_ACTIVE_FILE      110611
NR_ZONE_UNEVICTABLE      1000
NR_ZONE_WRITE_PENDING    84
NR_MLOCK                 0
NR_BOUNCE                0
NR_ZSPAGES               0
NR_FREE_CMA_PAGES        0

VM_NODE_STAT:
NR_LRU_BASE              234322
NR_INACTIVE_ANON         234322
NR_ACTIVE_ANON           196
NR_INACTIVE_FILE         97200
...
```
vmmap (newly added ^6)
    Start        End          Flgs Offset   Dev   Inode   File path
    55dee5284000-55dee53f3000 r-xp 00000000 fd:02 10515   /usr/lib/systemd/systemd
    55dee53f3000-55dee5441000 r--p 0016f000 fd:02 10515   /usr/lib/systemd/systemd
    55dee5441000-55dee5442000 rw-p 001bd000 fd:02 10515   /usr/lib/systemd/systemd
    55dee5f4c000-55dee615d000 rw-p 00000000 00:00 0
    7f5fc801c000-7f5fc8024000 r-xp 00000000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0
    7f5fc8024000-7f5fc8224000 ---p 00008000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0
    7f5fc8224000-7f5fc8225000 r--p 00008000 fd:02 1181379 /usr/lib64/libffi.so.7.1.0
    ...
mount (newly added ^7):
    Mount             Type        Devname     Dirname
    ffff8fed001d8500  rootfs      rootfs      /
    ffff8fed06a197c0  proc        proc        /proc
    ffff8fed06a192c0  sysfs       sysfs       /sys
    ffff8fed06a18c80  devtmpfs    devtmpfs    /dev
    ffff8fed06a18b40  securityfs  securityfs  /sys/kernel/security
    ffff8fed06a19cc0  tmpfs       tmpfs       /dev/shm
    ffff8fed06a18500  devpts      devpts      /dev/pts
    ffff8fed06a18dc0  tmpfs       tmpfs       /run
    ...
Existing contrib scripts
There are other existing commands that can:
- list TCP connections
- list loaded kernel modules
- list all the files on a mounted device
- cgroup 2 listing
- almost 2 years ago by marxin
Misc drgn observations
- One can write scripts that work for many kernel releases. One can use the "symbol_name in prog" technique or simply wrap the code in a try ... except block and provide a fallback for older/newer releases.
- The project contains prebuilt vmlinux binaries for various versions (^1) and one can easily run a contrib script in QEMU for a selected Linux version:
    $ python3 -m vmtest.vm -k '5.10.*' python3 -Bm drgn contrib/ps.py
    Linux version 5.10.166-vmtest18.1default (drgn@drgn) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1 SMP Mon Feb 6 08:12:05 UTC 2023
    Command line: rootfstype=9p rootflags=trans=virtio,cache=loose,msize=1048576 ro console=0,115200 panic=-1 crashkernel=256M init=/tmp/drgn-vmtest-_6sh_xhu/init
    x86/fpu: x87 FPU will use FXSAVE
    BIOS-provided physical RAM map:
    BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
    BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
    BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
    ...
    PID  PPID  CPU  ST  COMM
    1    0     6    S   init
    2    0     15   S   [kthreadd]
    3    2     0    I   [rcu_gp]
    ...
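The two compatibility techniques from the first observation can be sketched as follows. This is a minimal illustration, not code from the drgn contrib scripts: it relies only on the fact that a drgn Program supports membership tests and indexing by name, and it runs against any mapping-like stand-in. The helper names and the symbol names used in the usage note are hypothetical.

```python
def lookup_first(prog, names):
    """Return (name, object) for the first name that exists in prog.

    Mirrors the '"symbol_name" in prog' technique: probe for a symbol
    before touching it, so one script can serve several kernel releases.
    """
    for name in names:
        if name in prog:
            return name, prog[name]
    raise LookupError(f"none of {names!r} found in program")


def lookup_with_fallback(prog, primary, fallback):
    """Same idea with try/except: try the newer name, fall back to the older.

    LookupError is caught because a missing key in a mapping (and, as an
    assumption here, a missing object in drgn) raises a KeyError subclass.
    """
    try:
        return prog[primary]
    except LookupError:
        return prog[fallback]
```

With a plain dict standing in for the program, `lookup_first({"old_counter": 42}, ["new_counter", "old_counter"])` picks the older symbol because the newer one is absent.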
Similar Projects
Symbol Relations by hli
Description
There are tools to build function call graphs based on parsing source code, for example, cscope.
This project aims to achieve a similar goal by directly parsing the disassembly (i.e. objdump output) of a compiled binary. The assembly code is what the CPU sees, therefore more "direct". This may be useful in certain scenarios, such as gdb/crash debugging.
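The idea of deriving call relations from disassembly can be sketched in a few lines. This is not the implementation in symrellib.py; it is a simplified illustration that assumes the usual `objdump -d` text layout for x86 (a `<symbol>:` header line followed by instruction lines) and only follows direct `call` instructions. The function name `build_call_graph` is made up for this example.

```python
import re

# "0000000000001139 <main>:" starts the listing of a new symbol.
SYMBOL_RE = re.compile(r"^[0-9a-f]+ <([^>]+)>:\s*$")
# "call   1129 <helper>" or "callq  1129 <helper+0x4>": a direct call.
CALL_RE = re.compile(r"\bcallq?\s+[0-9a-f]+ <([^>+]+)(?:\+0x[0-9a-f]+)?>")


def build_call_graph(disassembly):
    """Map each symbol to the set of symbols it calls directly."""
    graph = {}
    current = None
    for line in disassembly.splitlines():
        header = SYMBOL_RE.match(line)
        if header:
            current = header.group(1)
            graph.setdefault(current, set())
            continue
        call = CALL_RE.search(line)
        if call and current is not None:
            graph[current].add(call.group(1))
    return graph


SAMPLE = """\
0000000000001129 <helper>:
    1129:\t55                   \tpush   %rbp
    112d:\tc3                   \tret
0000000000001139 <main>:
    1139:\t55                   \tpush   %rbp
    113d:\te8 e7 ff ff ff       \tcall   1129 <helper>
    1142:\te8 09 ff ff ff       \tcall   1050 <puts@plt>
    1147:\tc3                   \tret
"""
```

Calling `build_call_graph(SAMPLE)` yields an entry for `main` containing `helper` and `puts@plt`. Indirect calls (for example through a register) carry no symbol in the listing and are ignored by this sketch.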
Detailed description and Demos can be found in the README file:
Supports x86 for now (because my customers only use x86 machines), but support for other architectures can be added easily.
Tested with python3.6
Goals
Any comments are welcome.
Resources
https://github.com/lhb-cafe/SymbolRelations
symrellib.py: implements the symbol relation graph and the disassembly parser
symrel_tracer*.py: implements tracing (-t option)
symrel.py: "cli parser"
Enhance UV openQA helper script by mdonis
Description
A couple of months ago, a UV openQA helper script was created to help automate the search phase inside openQA for a given MU under test. The script searches all our openQA job groups (qam-sle) related to a given MU and generates output suitable for adding (copy & paste) to the update log.
This is still a WIP and could use some enhancements.
Goals
- Move script from bash to python: this would be useful in case we want to include this into MTUI in the future. The script will be separate from MTUI for now. The idea is to have this as a CLI tool using the click library or something similar.
- Add option to look for jobs in other sections inside aggregated updates: right now, when looking for regression tests under aggregated updates for a given MU, the script only looks inside the Core MU job group. This is where most of the regression tests we need are located, but some MUs have their regression tests under the YaST/Containers/Security MU job groups. We should keep the Core MU group as a default, but add an option to be able to look into other job groups under aggregated updates.
- Remove the -a option: this option is used to indicate the update ID and is mandatory right now. This is a bit weird and goes against POSIX standards. It was developed this way in order to avoid using positional parameters. This problem should be fixed if we move the script to python.
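The command-line shape these goals describe (update ID as a positional parameter, Core MU as the default job group, other aggregated groups selectable by option) could look roughly like the sketch below. It uses the standard-library argparse in place of click so that it has no dependencies, and every name in it (the program name, the `-g/--aggregated-group` flag, the group keys) is hypothetical rather than the interface of the actual script.

```python
import argparse

# Hypothetical keys for the aggregated-update job groups named in the goals.
AGGREGATED_GROUPS = ("core", "yast", "containers", "security")


def build_parser():
    parser = argparse.ArgumentParser(
        prog="oqa-search-sketch",
        description="Search openQA job groups for jobs related to an update.",
    )
    # Positional parameter replaces the mandatory -a option.
    parser.add_argument("update_id", help="ID of the maintenance update to look up")
    parser.add_argument(
        "-g",
        "--aggregated-group",
        dest="groups",
        action="append",
        choices=AGGREGATED_GROUPS,
        help="aggregated job group to search (repeatable, default: core)",
    )
    return parser


def parse_args(argv):
    args = build_parser().parse_args(argv)
    # Apply the default here: with action="append", a list default would
    # be extended by user-supplied values instead of being replaced.
    if not args.groups:
        args.groups = ["core"]
    return args
```

For example, `parse_args(["12345", "-g", "yast"])` searches only the YaST group, while `parse_args(["12345"])` falls back to Core.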
Some other ideas to consider:
- Look into the QAM dashboard API. It has more info on each MU and could be used to link general openQA build results, show whether the related RR is approved or not, etc.
- Make it easier to see if there are regression tests for a package in an openQA test build. Check whether it is possible to search each test suite for tests that have the package name in them.
- Unit testing?
More ideas TBD
Resources
https://github.com/os-autoinst/scripts/blob/master/openqa-search-maintenance-core-jobs
https://confluence.suse.com/display/maintenanceqa/Guide+on+how+to+test+Updates
Post-Hackweek update
All major features were implemented. Unit tests are still in progress, and the project will be moved to the SUSE GitHub org once everything's done. https://github.com/mjdonis/oqa-search
Small healthcheck tool for Longhorn by mbrookhuis
Project Description
We often have problems (e.g. pods not starting) that are related to PVCs not running, cluster nodes not all being up, or deployments not running or not running completely. All of this prevents administration activities. Having something that can be run regularly to validate the status of the cluster would be helpful, instead of doing a lot of manual tasks as we do today.
As an addition (read: given enough time), we could add changing reservations, adding new disks, etc. --> This didn't make it. But the scripts can easily be adapted.
This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.
Goal for this Hackweek
At the end we should have a small python tool that does a (very) basic health check on nodes, deployments and PVCs. The first attempt was to write it in golang, but that was taking too much time.
Overview
This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:
node check: This check lists all nodes and displays the status and the k3s version. If the status of a node is not "Ready" (this should only be reported), the cluster will be reported as having problems.
deployment check: This check lists all deployments and displays the number of expected replicas and the number of used replicas. If there are unused replicas, this will be displayed and the cluster will be reported as having problems.
pvc check: This check lists all PVCs and displays the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.
If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.
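The decision logic of the three checks can be sketched independently of how the data is fetched. The sketch below is not the k8s-check.py code: it assumes the cluster state has already been collected into plain dictionaries, and the field names (`status`, `expected`, `ready`, `robustness`) as well as the function names are made up for illustration.

```python
def unhealthy_nodes(nodes):
    """Names of nodes whose status is anything other than 'Ready'."""
    return [n["name"] for n in nodes if n["status"] != "Ready"]


def incomplete_deployments(deployments):
    """Names of deployments with fewer ready replicas than expected."""
    return [d["name"] for d in deployments if d["ready"] < d["expected"]]


def unhealthy_pvcs(pvcs):
    """Names of PVCs whose robustness is not 'healthy' (case-insensitive)."""
    return [p["name"] for p in pvcs if p["robustness"].lower() != "healthy"]


def run_checks(nodes, deployments, pvcs):
    """Return only the checks that found problems, keyed by check name."""
    findings = {
        "nodes": unhealthy_nodes(nodes),
        "deployments": incomplete_deployments(deployments),
        "pvcs": unhealthy_pvcs(pvcs),
    }
    return {check: names for check, names in findings.items() if names}


def exit_code(problems):
    """1 if any check registered a problem, otherwise 0."""
    return 1 if problems else 0
```

A caller would print a warning for each entry in the returned dictionary and pass `exit_code(problems)` to `sys.exit()`, which gives automation a simple pass/fail signal.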
The script has one mandatory parameter: the kubeconf of the cluster or of a node of the cluster.
The code is written for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that will contain all needed packages. Also, the script can be run on the cluster itself or on any other Linux server.
Installation
To install this project, perform the following steps:
- Create the directory /opt/k8s-check
mkdir /opt/k8s-check
- Copy all the files to this directory and make the following changes:
chmod +x k8s-check.py
Selenium with Python by xguo
Description
Try to create test cases using Selenium with Python
Goals
- Knowledge about Selenium with Python
- Create new test case about Selenium
Resources
https://selenium-python.readthedocs.io/
https://www.selenium.dev/
Modularization and Modernization of cifs.ko for Enhanced SMB Protocol Support by hcarvalho
Creator:
Enzo Matsumiya ematsumiya@suse.de @ SUSE Samba team
Members:
Henrique Carvalho henrique.carvalho@suse.com @ SUSE Samba team
Description
Split cifs.ko into two separate modules: one for SMB 1.0 and 2.0.x, and another for SMB 2.1, 3.0, and 3.1.1.
Goals
Primary
Start phasing out/deprecation of older SMB versions
Secondary
- Clean up of the code (with focus on the newer versions)
- Update cifs-utils
- Update documentation
- Improve backport workflow (see below)
Technical details
Ideas for the implementation.
- fs/smb/client/{old,new}.c to generate the respective modules
- Maybe don't create separate folders? (re-evaluate as things progress!)
- Remove server->{ops,vals} if possible
- Clean up fs_context.* -- merge duplicate options into one, handle them in userspace utils
- Reduce code in smb2pdu.c -- tons of functions with very similar init/setup -> send/recv -> handle/free flow
- Restructure multichannel
- Treat initial connection as "channel 0" regardless of multichannel enabled/negotiated status, proceed with extra channels accordingly
- Extra channels just point to "channel 0" as the primary server, no need to allocate an extra TCPServerInfo for each one
- Authentication mechanisms
- Modernize algorithms (references: himmelblau, IAKERB/Local KDC, SCRAM, oauth2 (Azure), etc.)
Kill DMA and DMA32 memory zones by ptesarik
Description
Provide a better allocator for DMA-capable buffers, making the DMA and DMA32 zones obsolete.
Goals
Make a PoC kernel which can boot an x86 VM and a Raspberry Pi (because early RPi4 boards have some of the weirdest DMA constraints).
Resources
- LPC2024 talk:
- video:
Create a DRM driver for VGA video cards by tdz
Yes, those VGA video cards. The goal of this project is to implement a DRM graphics driver for such devices. While actual hardware is hard to obtain or even run today, qemu emulates VGA output.
VGA has a number of limitations, which make this project interesting.
- There are only 640x480 pixels (or fewer) on the screen. That resolution is also a soft lower limit imposed by DRM. It's mostly a problem for desktop environments though.
- Desktop environments assume 16 million colors, but there are only 16 colors with VGA. VGA's 256 color palette is not available at 640x480. We can choose those 16 colors freely. The interesting part is how to choose them. We have to build a palette for the displayed frame and map each color to one of the palette's 16 entries. This is called dithering, and VGA's limitations are a good opportunity to learn about dithering algorithms.
- VGA has an interesting memory layout. Most graphics devices use linear framebuffers, which store the pixels byte by byte. VGA uses 4 bitplanes instead. Plane 0 holds all bits 0 of all pixels. Plane 1 holds all bits 1 of all pixels, and so on.
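The palette mapping and the planar layout from the list above can be illustrated with a short sketch. It is not driver code: it maps each RGB pixel to the nearest entry of a given palette (the simplest possible mapping, without the error diffusion a real dithering algorithm would add) and then splits the resulting 4-bit color indices into four bitplanes. The function names are made up, and the choice of storing the leftmost pixel in the most significant bit of each byte is an assumption of this sketch.

```python
def nearest_palette_index(rgb, palette):
    """Index of the palette entry closest to rgb (squared RGB distance)."""
    r, g, b = rgb
    return min(
        range(len(palette)),
        key=lambda i: (palette[i][0] - r) ** 2
        + (palette[i][1] - g) ** 2
        + (palette[i][2] - b) ** 2,
    )


def pack_bitplanes(indices):
    """Split 4-bit color indices (one per pixel) into 4 bitplanes.

    Plane p collects bit p of every pixel; 8 pixels share one byte,
    with the leftmost pixel in the most significant bit.
    """
    if len(indices) % 8:
        raise ValueError("pixel count must be a multiple of 8")
    planes = [bytearray(len(indices) // 8) for _ in range(4)]
    for pos, index in enumerate(indices):
        byte, bit = divmod(pos, 8)
        mask = 0x80 >> bit
        for plane in range(4):
            if index & (1 << plane):
                planes[plane][byte] |= mask
    return [bytes(plane) for plane in planes]
```

For a row of eight pixels with the color indices 0 to 7, plane 0 becomes 0x55, plane 1 becomes 0x33, plane 2 becomes 0x0f and plane 3 stays 0x00, which shows how a single pixel's value is spread across all four planes.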
The driver will probably not be useful to many people. But, if finished, it can serve as test environment for low-level hardware. There's some interest in supporting old Amiga and Atari framebuffers in DRM. Those systems have similar limitations as VGA, but are harder to obtain and test with. With qemu, the VGA driver could fill this gap.
Apart from the Wikipedia entry, good resources on VGA are at osdev.net and FreeVGA
Contributing to Linux Kernel security by pperego
Description
A couple of weeks ago, I found this blog post by Gustavo Silva, a Linux Kernel contributor.
I have always wanted to get back into hacking on the Linux kernel, so I requested access to the Coverity Scan dashboard, and I want to contribute to the Linux kernel by fixing some minor issues.
I also want to create a Linux kernel fuzzing lab using qemu and syzkaller
Goals
- Fix at least 2 security bugs
- Create the fuzzing lab and have it running
The story so far
- Day 1: setting up a virtual machine for kernel development using Tumbleweed. Reading a lot of documentation, getting familiar with the Coverity dashboard and with the procedures to submit a kernel patch
- Day 2: I read a lot of documentation and triaged some findings on the Coverity SAST dashboard. I have to confirm that SAST tools are great false-positive generators, even for low-hanging fruit.
- Day 3: Working on trivial changes after I read this blog post: https://www.toblux.com/posts/2024/02/linux-kernel-patches.html. I still have to get comfortable with the patch preparation and submission process.
- First trivial patch sent: using the str_true_false() macro instead of hard-coded strings in a staging driver for an LCD display
- Fix for a dereference before null check issue discovered by Coverity (CID 1601566) https://scan7.scan.coverity.com/#/project-view/52110/11354?selectedIssue=1601566
- Day 4: Triaging more issues found by Coverity.
- The patch for CID 1601566 was refused. The check against the NULL pointer was pointless so I prepared a version 2 of the patch removing the check.
- Fixed another dereference before NULL check in the iwl_mvm_parse_wowlan_info_notif() routine (CID 1601547). This one was already submitted by another kernel hacker :(
- Day 5: Wrapping up. I had to do some minor rework on patch for CID 1601566. I found a stalker bothering me in private emails and people I interacted with me, advised he is a well known bothering person. Markus Elfring for the record.
Wrapping up: being back doing kernel hacking is amazing and I don't want to stop it. My battery pack is completely drained but changing the scope gave me a great twist and I really want to feel this energy not doing a single task for months.
I failed to set up the fuzzing lab, as I was too optimistic about the patch submission process.
The patches
Create DRM drivers for VESA and EFI framebuffers by tdz
Description
We already have simpledrm for firmware framebuffers. But the driver is originally for ARM boards, not PCs. It is already overloaded with code to support both use cases. At the same time it is missing possible features for VESA and EFI, such as palette modes or EDID support. We should have DRM drivers for VESA and EFI interfaces. The infrastructure exists already and initial drivers can be forked from simpledrm.
Goals
- Initially, a bare driver for VESA or EFI should be created. It can take functionality from simpledrm.
- Then we can begin to add additional features. The boot loader can provide EDID data. With VGA hardware, VESA can support paletted modes or color management. Example code exists in vesafb.