Project Description

Most customers today are looking for multi-cloud and container solutions. What matters most to their business is providing better service and keeping their own customers happy, and the efficiency of the IT Ops team is key to that customer experience. In most cases, customers report an issue and support then fixes it; support is not aware of problems in a multi-container environment (such as node failures or resources hitting their limits) until customers report them. The monitoring and alerting systems currently on the market only raise alerts once an issue has occurred, but we need smarter solutions that analyze existing systems and predict future anomalies.

The proposed system will:

  1. Collect data (unstructured) from k8s components across the environments.
  2. Identify the common patterns that occur in failure cases.
  3. Create a knowledge base (structured data) of the identified patterns and their related components.
  4. Use a specific data model for the prediction.
  5. Use the output of the data model to predict anomalies.
  6. Send alerts and reports.
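
Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration, not the project's implementation: the log line shape (timestamp, component, message), the pattern names, and the regular expressions are all assumptions made for the example, and real failure signatures would have to come from the collected data.

```python
import re
from collections import Counter, defaultdict

# Hypothetical failure signatures (assumption); real ones would be
# learned or curated from the logs actually collected.
FAILURE_PATTERNS = {
    "oom_kill": re.compile(r"OOMKilled|Out of memory", re.IGNORECASE),
    "node_not_ready": re.compile(r"node .* status is now: NodeNotReady", re.IGNORECASE),
    "image_pull": re.compile(r"ErrImagePull|ImagePullBackOff"),
}

# Assumed line shape: "<timestamp> <component> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<component>\S+)\s+(?P<message>.*)$")


def extract_records(lines):
    """Turn unstructured log lines into structured failure records."""
    records = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed shape
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(match["message"]):
                records.append({
                    "timestamp": match["ts"],
                    "component": match["component"],
                    "pattern": name,
                })
                break  # record only the first matching pattern per line
    return records


def build_knowledge_base(records):
    """Count how often each failure pattern is seen per component."""
    kb = defaultdict(Counter)
    for record in records:
        kb[record["pattern"]][record["component"]] += 1
    return {pattern: dict(counts) for pattern, counts in kb.items()}


# Made-up sample lines, for illustration only.
sample_lines = [
    "2021-03-22T10:00:01Z kubelet Container web was OOMKilled",
    "2021-03-22T10:00:05Z kube-controller-manager Node worker-2 status is now: NodeNotReady",
    "2021-03-22T10:00:09Z kubelet Back-off pulling image: ImagePullBackOff",
    "2021-03-22T10:00:12Z kubelet Started container web",
    "2021-03-22T10:01:00Z kubelet Container api was OOMKilled",
]
knowledge_base = build_knowledge_base(extract_records(sample_lines))
```

With the sample lines above, the healthy "Started container" line is dropped and the remaining four lines are grouped by pattern and component, which is the kind of structured data a prediction model could consume.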

These steps are grouped into three main components in the proposed architecture:

  1. Data collection
  2. Data prediction
  3. Alerts & reports

Resources that can be considered for the analysis and prediction:

  • Storage devices: capacity, state
  • Network devices (LB, firewalls): link status, packet drops
  • Compute nodes: CPU, memory, I/O, storage
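
As one possible way to predict an issue from a metric such as storage capacity, the sketch below fits a straight line to recent usage samples and estimates when a limit would be reached. The linear trend, the function names, and the sample values are assumptions for illustration; the project does not specify a model, and real usage is rarely this regular.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_full(hours, used_pct, limit_pct=100.0):
    """Estimate hours from the last sample until usage reaches limit_pct.

    Returns None when usage is flat or shrinking, since no limit
    would be reached under a linear trend.
    """
    slope, intercept = fit_line(hours, used_pct)
    if slope <= 0:
        return None
    time_at_limit = (limit_pct - intercept) / slope
    return max(0.0, time_at_limit - hours[-1])


# Made-up samples: disk usage (%) observed once per hour.
sample_hours = [0, 1, 2, 3, 4]
sample_usage = [50.0, 52.0, 54.0, 56.0, 58.0]
remaining = hours_until_full(sample_hours, sample_usage)  # 21.0 hours
```

Here usage grows by 2 percentage points per hour from 50%, so the fitted line reaches 100% at hour 25, which is 21 hours after the last sample. The same idea could be applied to memory or CPU trends, with an alert raised when the estimate drops below some chosen lead time.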

Solution Approach:

  1. Create data model
  2. Scan & filter data
  3. Extract entity
  4. Annotate data and input to model
  5. Process output from model
  6. Notify / recommend / self heal
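
The final step of the approach (notify, recommend, or self heal) could be driven by the model output along these lines. Everything here is hypothetical: the score range of 0 to 1, the two thresholds, and the rule that self healing is attempted only when a known fix exists are assumptions chosen to make the sketch concrete, not decisions taken by the project.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    NOTIFY = "notify"
    RECOMMEND = "recommend"
    SELF_HEAL = "self_heal"


def choose_action(score, has_known_fix, notify_at=0.5, act_at=0.8):
    """Map an anomaly score in [0, 1] to a follow-up action.

    notify_at and act_at are hypothetical thresholds (assumption).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < notify_at:
        return Action.NONE
    if score < act_at:
        return Action.NOTIFY
    # High-confidence prediction: act automatically only if a fix is known,
    # otherwise hand a recommendation to the operator.
    return Action.SELF_HEAL if has_known_fix else Action.RECOMMEND
```

For example, `choose_action(0.9, has_known_fix=False)` yields `Action.RECOMMEND`, while a score of 0.6 only triggers a notification regardless of whether a fix is known.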

Goal for this Hackweek

Use the existing log collector to collect data from Rancher k8s clusters and come up with an appropriate data model.

https://support.rancher.com/hc/en-us/articles/360039113911-The-Rancher-v2-x-log-collector-script

Resources

ML engineer

ML, Python, Kubernetes, data model, monitoring tools.

Looking for hackers with the skills:

python3 machinelearning

This project is part of:

Hack Week 20

Activity

  • about 3 years ago: sbabusadhu added keyword "python3" to this project.
  • about 3 years ago: sbabusadhu added keyword "machinelearning" to this project.
  • about 3 years ago: sbabusadhu originated this project.
