The supportconfig tool is a great resource for troubleshooting common system issues on SLES, but its functionality might not be enough to troubleshoot issues specific to cloud solutions. I would like to invite you to contribute to this project by creating new plugins/tools that complement supportconfig and ease the troubleshooting process for the SUSE OpenStack Cloud product.
Main goal:
This project will be considered "successful" if we are able to develop the new features listed below and include them in the main supportconfig tool:
Develop some sort of "hb_report" tool for cloud, which could include the following:
- Organize the collected information in a better directory structure (directories and subdirectories instead of one huge file containing everything). We have some "splitter" tools, which recreate the original directory structure of the server (scsplitter.py), but it would be interesting to make this split structure the default one.
- Include a way to "trim" or "toggle" the supportconfig so that it only collects the information relevant to errors that occurred on specific components or dates. This way we would avoid having huge files containing data we don't necessarily need. The idea is to have a nice and easy way to filter information: by instance id, request id, timestamp or any other attribute passed to the "supportconfig" command
- Include commands like "openstack (...) list" and "openstack (...) show $id"
- HA-specific checks (pacemaker and pacemaker-remote if any)
- Services report (up or in error state), checking the status from the openstack command, from systemctl status and from the resource status in the cluster. I had a case where a neutron agent (if I remember correctly) was in down ":-(" status while systemctl and crm_mon reported that the service was up and running
- Database dump
- Switch a selected component to debug mode and collect the logs produced by customer actions
- Collect storage background and configuration
- Query APIs and generate a report on the activities/requests
- Ping endpoints and resolve hostnames as a check
- Add /var/lib/neutron to supportconfig (Bogdano in Rocket Chat)
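The "trim" idea from the list above (filtering by request id, instance id or timestamp) can be sketched in a few lines of Python. This is only a minimal illustration, not part of supportconfig: the function name `filter_lines` and its parameters are hypothetical, and it assumes log lines that start with a `YYYY-MM-DD HH:MM:SS` timestamp, as OpenStack service logs usually do.

```python
import re
from datetime import datetime

# Assumption: each log record starts with "YYYY-MM-DD HH:MM:SS",
# optionally followed by milliseconds, PID, level, logger and "[req-... ]".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def filter_lines(lines, needle=None, start=None, end=None):
    """Yield the lines that contain `needle` and fall inside [start, end].

    `needle` is any substring (request id, instance id, ...).
    `start` and `end` are datetime objects; None means "no limit".
    """
    for line in lines:
        if needle is not None and needle not in line:
            continue
        match = TS_RE.match(line)
        if match is None:
            # Lines without a timestamp (e.g. traceback bodies) are kept
            # only when no time window was requested.
            if start is None and end is None:
                yield line
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if start is not None and stamp < start:
            continue
        if end is not None and stamp > end:
            continue
        yield line


if __name__ == "__main__":
    sample = [
        "2017-11-06 10:00:00.123 1234 INFO nova.api [req-abc ] start",
        "2017-11-06 11:00:00.000 1234 ERROR nova.compute [req-abc ] failed",
        "2017-11-06 11:30:00.000 1234 ERROR nova.compute [req-def ] other",
    ]
    for hit in filter_lines(sample, needle="req-abc",
                            start=datetime(2017, 11, 6, 10, 30)):
        print(hit)
```

A real implementation would more likely apply such a filter while collecting the logs, so that the unwanted data never ends up in the archive in the first place.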
Optional Goals:
A tree-like graphical tool (or ASCII art) that shows the complete infrastructure and allows breaking each node down by component/service in order to review its config/logs
Get info from supportconfig as part of a "Best Practice" document.
Compare versions: the versions in the supportconfig against the current versions in the SCC repos
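The version comparison could start out as simply as the sketch below. It is a minimal illustration under loud assumptions: both inputs are plain dictionaries mapping a package name to a version string (how they would be extracted from the supportconfig and from the SCC repos is not covered here), the function name `compare_versions` is hypothetical, and the check is plain string inequality rather than real RPM version ordering.

```python
def compare_versions(installed, available):
    """Return {package: (installed_version, repo_version)} for every
    package whose version string differs between the two inputs.

    Assumption: both arguments are dicts of name -> "version-release".
    Packages missing from `available` are skipped. This does NOT
    implement RPM version ordering; it only detects differences.
    """
    report = {}
    for name, version in sorted(installed.items()):
        repo_version = available.get(name)
        if repo_version is not None and repo_version != version:
            report[name] = (version, repo_version)
    return report


if __name__ == "__main__":
    # Hypothetical package data, for illustration only.
    from_supportconfig = {"pkg-a": "1.0-1", "pkg-b": "2.0-1", "pkg-c": "3.1-4"}
    from_repos = {"pkg-a": "1.0-2", "pkg-b": "2.0-1"}
    for pkg, (old, new) in compare_versions(from_supportconfig, from_repos).items():
        print(f"{pkg}: supportconfig has {old}, repo has {new}")
```

Deciding whether a differing version is actually older would need a proper RPM-aware comparison, which is deliberately left out of this sketch.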
Currently identified tools which could be included:
SOSREPORT: https://github.com/sosreport/sos. Sos is an extensible, portable, support data collection tool primarily aimed at Linux distributions and other UNIX-like operating systems. Perhaps we should consider a well-established tool with plugins for every possible situation before reinventing the wheel
GitHub search for supportconfig: https://github.com/search?utf8=%E2%9C%93&q=supportconfig&type=
ELK Tool: https://github.com/denisok/elk_supportconfig
Support Config Utils from A. Spiers: https://build.opensuse.org/package/show/home:aspiers/supportconfig-utils
Crowbar Macs: https://github.com/aspiers/SUSE-dist/blob/master/bin/crowbar-macs
scsplitter (no link known)
lnav monitoring: https://software.opensuse.org/download.html?project=server:monitoring&package=lnav
This project is part of:
Hack Week 17
Comments
over 7 years ago by aspiers
Please see https://github.com/aspiers/SUSE-dist/tree/master/bin for several other tools in this space. Unfortunately I will be away on FTO for this hackweek but it would be good to share my thoughts and maybe demo everything I have built before I leave (end of next week).
Similar Projects
OS self documentation, health check and troubleshooting by roseswe
Project Description
The aim of this hackweek project is to improve the utility "cfg2html" so that it is even more usable under SLES and perhaps also under Rancher.
cfg2html (see also https://github.com/cfg2html/cfg2html) itself is a very mature utility for collecting and documenting information about operating systems such as Linux, AIX, HP-UX and others.
Goal for this Hackweek
The aim is to extend cfg2html
- for SLES and SLES-for-SAP apps, high availability
- Improve code for MicroOS 5.x, SUMA, Edge and k8s environments
- fix shellbeauity warnings
- possibly add more plugins
- SUMA/Salt integration to collect.
Resources
Required skills: Bash, shell script and the SUSE products mentioned.
https://github.com/cfg2html/cfg2html
https://www.cfg2html.com/
SUSE Health Check Tools by roseswe
SUSE HC Tools Overview
A collection of tools written in Bash or Go 1.24++ that make it easier to handle a bunch of tar.xz balls created by supportconfig.
Background: For SUSE HC we receive a bunch of supportconfig tar balls and check them for misconfiguration, areas for improvement or future changes.
The main focus of these HCs is High Availability (pacemaker), SLES itself and SAP workloads, especially around the SUSE best practices.
Goals
- Overall improvement of the tools
- Adding new collectors
- Add support for SLES16
Resources
csv2xls* example.sh go.mod listprodids.txt sumtext* trails.go README.md csv2xls.go exceltest.go go.sum m.sh* sumtext.go vercheck.py* config.ini csvfiles/ getrpm* listprodids* rpmdate.sh* sumxls* verdriver* credtest.go example.py getrpm.go listprodids.go sccfixer.sh* sumxls.go verdriver.go
docollall.sh* extracthtml.go gethostnamectl* go.sum numastat.go cpuvul* extractcluster.go firmwarebug* gethostnamectl.go m.sh* numastattest.go cpuvul.go extracthtml* firmwarebug.go go.mod numastat* xtr_cib.sh*
Bring to Cockpit + System Roles capabilities from YAST by miguelpc
Bring to Cockpit + System Roles features from YAST
Cockpit and System Roles have been added to SLES 16. There are several capabilities in YAST that are not yet present in Cockpit and System Roles. We will follow the principle of "automate first, UI later", with System Roles being the automation component and Cockpit the UI one.
Goals
The idea is to implement service configuration in System Roles and then add a UI to manage it in Cockpit. Some capabilities will require a specific Cockpit module, as they will interact with a resource that is already configured.
Resources
A plan on capabilities missing and suggested implementation is available here: https://docs.google.com/spreadsheets/d/1ZhX-Ip9MKJNeKSYV3bSZG4Qc5giuY7XSV0U61Ecu9lo/edit
Linux System Roles: https://linux-system-roles.github.io/