Project Description
Nowadays most customers are looking for multi-cloud and container solutions. The main critical point for their business is providing a better service and make the customer happy. The efficiency of the IT Ops team key to the superior customer experience. In most case customers reports the issue and support will fix the issue but support is not aware of the problems (like node failures, resource crunch limits) in the multi-container environment until customers report them. Even though monitoring and alerts systems exist in the current market that only provide alerts when an issue occurs BUT we need smarter solutions to analyze existing systems and predict future anomalies.
The proposed system will do:
- Data collection (unstructured data) from k8s components across the environments
- Identifies the common pattern happens in the failure cases.
- Creates a Knowledge base for the identified patterns with related components . (Structured data)
- Uses a specific data model for the prediction
- Use the output from data model to predict the analysis.
- Send the alerts and reports
This is further classified as 3 main components in the proposed architecture:
- Data collection
- Data Prediction
- Alers & Reports
Resources that can be considered for the analysis and prediction: Storage devices- Capacity, State Network devices ( LB, Firewalls)- Like Link status , Packet drops Compute Nodes: CPU,Memory,I/O, Storage
Solution Approach: -- Create data model -- Scan & Filter Data -- Extract Entity -- Annotate Data and Input to Model -- Process Output from Model -- Notify / Recommend / Self Heal
Goal for this Hackweek
Use existing log collector to collect the data from rancher k8s clusters and come up with a appropriate data model.
https://support.rancher.com/hc/en-us/articles/360039113911-The-Rancher-v2-x-log-collector-script
Resources
ML engineer,
ML, Python, kubernetes, data model, monitoring tools. @
No Hackers yet
Looking for hackers with the skills:
This project is part of:
Hack Week 20
Activity
Comments
Be the first to comment!
Similar Projects
Symbol Relations by hli
Description
There are tools to build function call graphs based on parsing source code, for example, cscope
.
This project aims to achieve a similar goal by directly parsing the disasembly (i.e. objdump) of a compiled binary. The assembly code is what the CPU sees, therefore more "direct". This may be useful in certain scenarios, such as gdb/crash debugging.
Detailed description and Demos can be found in the README file:
Supports x86 for now (because my customers only use x86 machines), but support for other architectures can be added easily.
Tested with python3.6
Goals
Any comments are welcome.
Resources
https://github.com/lhb-cafe/SymbolRelations
symrellib.py: mplements the symbol relation graph and the disassembly parser
symrel_tracer*.py: implements tracing (-t option)
symrel.py: "cli parser"
Enhance UV openQA helper script by mdonis
Description
A couple months ago an UV openQA helper script was created to help/automate the searching phase inside openQA for a given MU to test. The script searches inside all our openQA job groups (qam-sle) related with a given MU and generates an output suitable to add (copy & paste) inside the update log.
This is still a WIP and could use some enhancements.
Goals
- Move script from bash to python: this would be useful in case we want to include this into MTUI in the future. The script will be separate from MTUI for now. The idea is to have this as a CLI tool using the click library or something similar.
- Add option to look for jobs in other sections inside aggregated updates: right now, when looking for regression tests under aggregated updates for a given MU, the script only looks inside the Core MU job group. This is where most of the regression tests we need are located, but some MUs have their regression tests under the YaST/Containers/Security MU job groups. We should keep the Core MU group as a default, but add an option to be able to look into other job groups under aggregated updates.
- Remove the
-a
option: this option is used to indicate the update ID and is mandatory right now. This is a bit weird and goes against posix stardards. It was developed this way in order to avoid using positional parameters. This problem should be fixed if we move the script to python.
Some other ideas to consider:
- Look into the QAM dashboard API. This has more info on each MU, could use this to link general openQA build results, whether the related RR is approved or not, etc
- Make it easier to see if there's regression tests for a package in an openQA test build. Check if there's a possibility to search for tests that have the package name in them inside each testsuite.
- Unit testing?
More ideas TBD
Resources
https://github.com/os-autoinst/scripts/blob/master/openqa-search-maintenance-core-jobs
https://confluence.suse.com/display/maintenanceqa/Guide+on+how+to+test+Updates
Post-Hackweek update
All major features were implemented. Unit tests are still in progress, and project will be moved to the SUSE github org once everything's done. https://github.com/mjdonis/oqa-search
Selenium with Python by xguo
Description
Try to create test case about Selenium base on Python
Goals
- Knowledge about Selenium with Python
- Create new test case about Selenium
Resources
https://selenium-python.readthedocs.io/ https://www.selenium.dev/
Small healthcheck tool for Longhorn by mbrookhuis
Project Description
We have often problems (e.g. pods not starting) that are related to PVCs not running, cluster (nodes) not all up or deployments not running or completely running. This all prevents administration activities. Having something that can regular be run to validate the status of the cluster would be helpful, and not as of today do a lot of manual tasks.
As addition (read enough time), we could add changing reservation, adding new disks, etc. --> This didn't made it. But the scripts can easily be adopted.
This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.
Goal for this Hackweek
At the end we should have a small python tool that is doing a (very) basic health check on nodes, deployments and PVCs. First attempt was to make it in golang, but that was taking to much time.
Overview
This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:
node check: This will check all nodes, and display the status and the k3s version. If the status of the nodes is not "Ready" (this should be only reported), the cluster will be reported as having problems
deployment check: This check will list all deployments, and display the number of expected replicas and the used replica. If there are unused replicas this will be displayed. The cluster will be reported as having problems.
pvc check: This check will list of all pvc's, and display the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.
If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.
The script has 1 mandatory parameter and that is the kubeconf of the cluster or of a node off the cluster.
The code is writen for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that will contain all needed packages. Also, the script can be run on the cluster itself or any other linux server.
Installation
To install this project, perform the following steps:
- Create the directory /opt/k8s-check
mkdir /opt/k8s-check
- Copy all the file to this directory and make the following changes:
chmod +x k8s-check.py
FamilyTrip Planner: A Personalized Travel Planning Platform for Families by pherranz
Description
FamilyTrip Planner is an innovative travel planning application designed to optimize travel experiences for families with children. By integrating APIs for flights, accommodations, and local activities, the app generates complete itineraries tailored to each family’s unique interests and needs. Recommendations are based on customizable parameters such as destination, trip duration, children’s ages, and personal preferences. FamilyTrip Planner not only simplifies the travel planning process but also offers a comprehensive, personalized experience for families.
Goals
This project aims to: - Create a user-friendly platform that assists families in planning complete trips, from flight and accommodation options to recommended family-friendly activities. - Provide intelligent, personalized travel itineraries using artificial intelligence to enhance travel enjoyment and minimize time and cost. - Serve as an educational project for exploring Go programming and artificial intelligence, with the goal of building proficiency in both.
Resources
To develop FamilyTrip Planner, the project will leverage: - APIs such as Skyscanner, Google Places, and TripAdvisor to source real-time information on flights, accommodations, and activities. - Go programming language to manage data integration, API connections, and backend development. - Basic machine learning libraries to implement AI-driven itinerary suggestions tailored to family needs and preferences.