Project Description

Most customers today are looking for multi-cloud and container solutions. What matters most to their business is providing a better service and keeping their own customers happy, and the efficiency of the IT Ops team is key to that customer experience. In most cases the customer reports an issue and support fixes it, but support is not aware of problems in the multi-container environment (such as node failures or resource limits being hit) until customers report them. The monitoring and alerting systems on the market only raise alerts once an issue has occurred; we need smarter solutions that analyze existing systems and predict future anomalies.

The proposed system will:

  1. Collect (unstructured) data from k8s components across the environments.
  2. Identify the common patterns that occur in failure cases.
  3. Create a knowledge base (structured data) of the identified patterns and their related components.
  4. Use a specific data model for the prediction.
  5. Use the output of the data model to predict anomalies.
  6. Send alerts and reports.
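
Steps 1 to 3 above can be illustrated with a minimal sketch that turns unstructured log lines into structured records and counts recurring failure patterns. The log line layout (`<timestamp> <component> <LEVEL> <message>`), the function names and the sample lines are assumptions made for illustration; the real output of the log collector may look different.

```python
import re
from collections import Counter

# Assumed (hypothetical) line layout: "<timestamp> <component> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<component>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Turn one raw log line into a structured record, or None if it does not fit."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def normalize(message):
    """Replace digit runs with a placeholder so similar messages share one pattern."""
    return re.sub(r"\d+", "<n>", message)


def failure_patterns(lines, levels=("ERROR", "WARN")):
    """Count (component, normalized message) pairs among failure-level records."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] in levels:
            counts[(record["component"], normalize(record["message"]))] += 1
    return counts


# Made-up sample lines, not real collector output.
SAMPLE = [
    "2021-03-22T10:00:01Z kubelet ERROR node worker-1 not ready after 30s",
    "2021-03-22T10:05:09Z kubelet ERROR node worker-2 not ready after 45s",
    "2021-03-22T10:06:00Z etcd INFO compaction finished",
    "garbage line",
]

if __name__ == "__main__":
    for (component, pattern), count in failure_patterns(SAMPLE).most_common():
        print(count, component, pattern)
```

The resulting counter is one possible starting point for the knowledge base: each key pairs a component with a recurring failure pattern.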

These steps are grouped into 3 main components in the proposed architecture:

  1. Data collection
  2. Data Prediction
  3. Alerts & Reports

Resources that can be considered for the analysis and prediction:

  • Storage devices: capacity, state
  • Network devices (LB, firewalls): link status, packet drops
  • Compute nodes: CPU, memory, I/O, storage
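
As a rough idea of what a structured data model for these resources could look like, the sketch below defines one record type per resource class. The class names, field names and units (percentages, counters) are assumptions for illustration only; the project has not fixed a schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class StorageSample:
    device: str
    capacity_used_pct: float  # assumed unit: percent of capacity in use
    state: str


@dataclass
class NetworkSample:
    device: str               # e.g. a load balancer or firewall
    link_up: bool
    packet_drops: int


@dataclass
class ComputeSample:
    node: str
    cpu_pct: float
    memory_pct: float
    io_wait_pct: float
    storage_pct: float


def to_feature_row(sample):
    """Flatten a sample into a plain dict, tagged with its record type."""
    row = asdict(sample)
    row["kind"] = type(sample).__name__
    return row


if __name__ == "__main__":
    print(to_feature_row(NetworkSample("lb-1", True, 12)))
```

Flat rows like these can be written to CSV or loaded into a data frame before being fed to whatever model is chosen.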

Solution Approach:

  1. Create data model
  2. Scan & Filter Data
  3. Extract Entity
  4. Annotate Data and Input to Model
  5. Process Output from Model
  6. Notify / Recommend / Self Heal
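
To make the "process output from model, then notify" part concrete, here is a deliberately simple stand-in for a prediction model: a least-squares trend line over equally spaced samples, used to estimate how many sampling intervals remain before a metric reaches a limit. This is not the project's model (none has been chosen); the function names and the sample numbers are hypothetical.

```python
def fit_trend(values):
    """Least-squares slope and intercept for values sampled at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until(values, limit):
    """Sampling intervals after the last sample until the trend reaches `limit`.

    Returns None when the trend is flat or falling, i.e. no breach is predicted.
    """
    slope, intercept = fit_trend(values)
    if slope <= 0:
        return None
    x_hit = (limit - intercept) / slope
    return max(0.0, x_hit - (len(values) - 1))


if __name__ == "__main__":
    disk_used_pct = [50, 55, 60, 65, 70]  # made-up samples
    remaining = steps_until(disk_used_pct, limit=90)
    if remaining is not None:
        print(f"limit predicted to be reached in about {remaining:.1f} intervals")
```

A real model would likely need to handle seasonality and noise; the point here is only the shape of the flow from collected samples to a prediction that can trigger a notification.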

Goal for this Hackweek

Use the existing log collector to collect data from Rancher k8s clusters and come up with an appropriate data model.

https://support.rancher.com/hc/en-us/articles/360039113911-The-Rancher-v2-x-log-collector-script

Resources

ML engineer

ML, Python, Kubernetes, data model, monitoring tools.

Looking for hackers with the skills:

python3 machinelearning

This project is part of:

Hack Week 20

Activity

  • over 3 years ago: sbabusadhu added keyword "python3" to this project.
  • over 3 years ago: sbabusadhu added keyword "machinelearning" to this project.
  • over 3 years ago: sbabusadhu originated this project.


    Similar Projects

    Small healthcheck tool for Longhorn by mbrookhuis

    Project Description

    We often have problems (e.g. pods not starting) that are related to PVCs not running, cluster nodes not all being up, or deployments not running or not completely running. All of this prevents administration activities. Having something that can be run regularly to validate the status of the cluster would be helpful, instead of the many manual tasks needed today.

    As an addition (read: given enough time), we could add changing reservations, adding new disks, etc. --> This didn't make it, but the scripts can easily be adapted.

    This tool would decrease troubleshooting time, giving admins rights to the rancher GUI and could be used in automation.

    Goal for this Hackweek

    At the end we should have a small Python tool that does a (very) basic health check on nodes, deployments and PVCs. The first attempt was in golang, but that was taking too much time.

    Overview

    This tool will run a simple healthcheck on a kubernetes cluster. It will perform the following actions:

    • node check: This check goes through all nodes and displays the status and the k3s version. If the status of a node is not "Ready" (this is only reported), the cluster will be reported as having problems.

    • deployment check: This check lists all deployments and displays the number of expected replicas and the number of used replicas. If there are unused replicas, this will be displayed and the cluster will be reported as having problems.

    • pvc check: This check lists all PVCs and displays the status and the robustness. If the robustness is not "Healthy", the cluster will be reported as having problems.

    If there is a problem registered in the checks, there will be a warning that the cluster is not healthy and the program will exit with 1.
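
The three checks and the exit code can be sketched as below. This is not the tool's actual code: the sketch works on plain dictionaries instead of querying a cluster through a kubeconf, and the dictionary layout, function names and sample data are hypothetical.

```python
def check_nodes(nodes):
    """Report every node whose status is not "Ready"."""
    return [
        f"node {n['name']}: status {n['status']}"
        for n in nodes
        if n["status"] != "Ready"
    ]


def check_deployments(deployments):
    """Report deployments with fewer ready replicas than expected."""
    return [
        f"deployment {d['name']}: {d['ready']}/{d['expected']} replicas"
        for d in deployments
        if d["ready"] < d["expected"]
    ]


def check_pvcs(pvcs):
    """Report PVCs whose robustness is not "Healthy"."""
    return [
        f"pvc {p['name']}: robustness {p['robustness']}"
        for p in pvcs
        if p["robustness"] != "Healthy"
    ]


def run_checks(cluster):
    """Run all three checks and return the combined list of problems."""
    return (
        check_nodes(cluster["nodes"])
        + check_deployments(cluster["deployments"])
        + check_pvcs(cluster["pvcs"])
    )


def exit_code(cluster):
    """Print a warning per problem and return 1 if any check failed, else 0."""
    problems = run_checks(cluster)
    for problem in problems:
        print("WARNING:", problem)
    if problems:
        print("WARNING: cluster is not healthy")
    return 1 if problems else 0


# Made-up cluster state for illustration.
SAMPLE = {
    "nodes": [
        {"name": "node-1", "status": "Ready"},
        {"name": "node-2", "status": "NotReady"},
    ],
    "deployments": [{"name": "web", "expected": 3, "ready": 2}],
    "pvcs": [{"name": "data", "robustness": "Healthy"}],
}

if __name__ == "__main__":
    import sys

    sys.exit(exit_code(SAMPLE))
```

Keeping each check as a function that returns a list of problems makes it easy to add further checks and to reuse the result in automation.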

    The script has 1 mandatory parameter: the kubeconf of the cluster or of a node of the cluster.

    The code is written for Python 3.11, but will also work on 3.6 (the default with SLES15.x). There is a venv present that contains all needed packages. The script can be run on the cluster itself or on any other Linux server.

    Installation

    To install this project, perform the following steps:

    • Create the directory /opt/k8s-check

    mkdir /opt/k8s-check

    • Copy all the files to this directory and make the following changes:

    chmod +x k8s-check.py


    Symbol Relations by hli

    Description

    There are tools to build function call graphs based on parsing source code, for example, cscope.

    This project aims to achieve a similar goal by directly parsing the disassembly (i.e. objdump output) of a compiled binary. The assembly code is what the CPU sees and is therefore more "direct". This may be useful in certain scenarios, such as gdb/crash debugging.
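
The general idea can be sketched in a few lines: walk through `objdump -d` style text, track the current function from its `<name>:` header, and record the target of every x86 `call` instruction. This is an illustrative sketch, not the SymbolRelations implementation; the regular expressions, the function name and the hand-written sample disassembly are assumptions, and indirect calls are ignored.

```python
import re

# Function header as printed by objdump -d, e.g. "0000000000001139 <helper>:"
FUNC_RE = re.compile(r"^[0-9a-f]+ <([^>]+)>:\s*$")
# Direct x86 call with a resolved symbol, e.g. "call   1139 <helper>"
CALL_RE = re.compile(r"\bcallq?\s+[0-9a-f]+ <([^>+]+)(?:\+0x[0-9a-f]+)?>")


def build_call_graph(disassembly):
    """Map each function name to the sorted list of symbols it calls directly."""
    graph = {}
    current = None
    for line in disassembly.splitlines():
        header = FUNC_RE.match(line)
        if header:
            current = header.group(1)
            graph.setdefault(current, set())
            continue
        call = CALL_RE.search(line)
        if call and current is not None:
            graph[current].add(call.group(1))
    return {name: sorted(callees) for name, callees in graph.items()}


# Hand-written sample in objdump style, for illustration only.
SAMPLE = """
0000000000001139 <helper>:
    1139:  55                      push   %rbp
    113a:  c3                      ret

000000000000114a <main>:
    114a:  55                      push   %rbp
    114e:  e8 e6 ff ff ff          call   1139 <helper>
    1153:  e8 d8 fe ff ff          call   1030 <puts@plt>
    1158:  c3                      ret
"""

if __name__ == "__main__":
    for caller, callees in build_call_graph(SAMPLE).items():
        print(caller, "->", callees)
```

Supporting another architecture would mostly mean swapping the call pattern for that architecture's branch-and-link mnemonic.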

    A detailed description and demos can be found in the README file.

    Supports x86 for now (because my customers only use x86 machines), but support for other architectures can be added easily.

    Tested with python3.6

    Goals

    Any comments are welcome.

    Resources

    https://github.com/lhb-cafe/SymbolRelations

    symrellib.py: implements the symbol relation graph and the disassembly parser

    symrel_tracer*.py: implements tracing (-t option)

    symrel.py: "cli parser"


    Selenium with Python by xguo

    Description

    Try to create test cases with Selenium, based on Python

    Goals

    • Gain knowledge about Selenium with Python
    • Create new test cases with Selenium

    Resources

    https://selenium-python.readthedocs.io/ https://www.selenium.dev/


    FamilyTrip Planner: A Personalized Travel Planning Platform for Families by pherranz

    Description

    FamilyTrip Planner is an innovative travel planning application designed to optimize travel experiences for families with children. By integrating APIs for flights, accommodations, and local activities, the app generates complete itineraries tailored to each family’s unique interests and needs. Recommendations are based on customizable parameters such as destination, trip duration, children’s ages, and personal preferences. FamilyTrip Planner not only simplifies the travel planning process but also offers a comprehensive, personalized experience for families.

    Goals

    This project aims to:

    • Create a user-friendly platform that assists families in planning complete trips, from flight and accommodation options to recommended family-friendly activities.
    • Provide intelligent, personalized travel itineraries using artificial intelligence to enhance travel enjoyment and minimize time and cost.
    • Serve as an educational project for exploring Go programming and artificial intelligence, with the goal of building proficiency in both.

    Resources

    To develop FamilyTrip Planner, the project will leverage:

    • APIs such as Skyscanner, Google Places, and TripAdvisor to source real-time information on flights, accommodations, and activities.
    • Go programming language to manage data integration, API connections, and backend development.
    • Basic machine learning libraries to implement AI-driven itinerary suggestions tailored to family needs and preferences.