Introduction

TensorFlow™ is an open-source software library for Machine Intelligence written in Python. It was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well. (https://www.tensorflow.org/)

Using values recorded by SUSE Manager, it should be possible to predict the outcome of certain operations if machine learning is applied. We are especially interested in the time it takes to apply patches to systems. A neural network should be trained on historical values to predict this for future operations. We need to find out which values can and should be provided, which classifier(s) to use, and so on.

Goals:

  • Monday:

    • Learn about TensorFlow: definitions, how to create a model, the different frameworks, etc.
    • Define the set of features that can be gathered from the SUSE Manager DB to create our dataset.
    • Explore the values of the dataset: min-max values, boundaries, type of data (categorical, continuous).
    • Define crossed relations between features (crossed columns); see the feature-column sketch after this list.
    • Is our dataset good enough?
  • Tuesday:

    • Create and test different TensorFlow models: DNNLinearCombinedClassifier, DNNClassifier, etc.
    • Are those models' estimations good enough?
    • Is TensorFlow suitable for achieving the project goal? Are the estimations good enough for us?
    • Upload a working example.
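
As a concrete illustration of the feature-column work above, here is a minimal sketch using the TensorFlow 1.x tf.feature_column API. The column names come from the dataset described in the outcomes below; the hash bucket sizes are illustrative assumptions, not values taken from the project.

    import tensorflow as tf

    # Categorical identifiers, hashed into a fixed number of buckets.
    # Bucket sizes are guesses; they should be tuned to the real id ranges.
    server_id = tf.feature_column.categorical_column_with_hash_bucket(
        "server_id", hash_bucket_size=1000)
    errata_id = tf.feature_column.categorical_column_with_hash_bucket(
        "errata_id", hash_bucket_size=10000)
    package_id = tf.feature_column.categorical_column_with_hash_bucket(
        "package_id", hash_bucket_size=100000)

    # Continuous hardware features.
    nrcpu = tf.feature_column.numeric_column("nrcpu")
    mhz = tf.feature_column.numeric_column("mhz")
    ram = tf.feature_column.numeric_column("ram")

    # A crossed column models the interaction between server and package:
    # the same package may take a different time on different machines.
    server_x_package = tf.feature_column.crossed_column(
        ["server_id", "package_id"], hash_bucket_size=1000000)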

Outcomes:

  • The initial dataset was not good enough, so we modified the SQL query to also collect package IDs.
  • At first we restricted the dataset to actions for erratas containing only one package, but the resulting dataset was not big enough.
  • We implemented a DNNRegressor (a sketch follows this list).
  • Dataset: COLUMNS = ["server_id","errata_id","nrcpu","mhz","ram","package_id","size","time"] (currently we only use server_id, errata_id and package_id).
  • The dataset is currently based on patch installation actions that contain a single errata, although that errata can have multiple packages associated.
  • We don't know the installation time per package, because the "time" value we have covers the complete action, so we make a rough estimate by dividing the total time by the number of packages the errata contains.
  • The estimations seem to be good enough, although the dataset still needs to be improved, as does the model itself, where the feature column definitions can be adjusted to get better results.
  • Current estimations are at least good enough to tell whether the action you are planning is going to take ~10 seconds, ~30 seconds, ~1 minute, ~5 minutes, etc.
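
The following is a minimal, hypothetical sketch of what such a DNNRegressor setup could look like with the TensorFlow 1.x estimator API, including the rough per-package time estimate described above. The file name, the grouping used to count packages per action, the embedding dimensions and the hidden layer sizes are all assumptions; the project's actual code lives in the repository linked below.

    import pandas as pd
    import tensorflow as tf

    COLUMNS = ["server_id", "errata_id", "nrcpu", "mhz", "ram",
               "package_id", "size", "time"]

    # Hypothetical CSV export of the SQL query results.
    df = pd.read_csv("dataset.csv", names=COLUMNS)

    # Rough per-package estimate: split the action time evenly over the
    # packages of the errata (one action = one errata in this dataset).
    packages_per_action = df.groupby(
        ["server_id", "errata_id"])["package_id"].transform("count")
    df["time"] = df["time"] / packages_per_action

    # Hashed categorical ids, wrapped in embeddings so the DNN can use them.
    def embedded(name, buckets, dim=8):
        col = tf.feature_column.categorical_column_with_hash_bucket(name, buckets)
        return tf.feature_column.embedding_column(col, dimension=dim)

    feature_columns = [
        embedded("server_id", 1000),
        embedded("errata_id", 10000),
        embedded("package_id", 100000),
    ]

    # Only the three id columns are used for now, as noted above.
    input_fn = tf.estimator.inputs.pandas_input_fn(
        x=df[["server_id", "errata_id", "package_id"]].astype(str),
        y=df["time"], num_epochs=None, shuffle=True)

    regressor = tf.estimator.DNNRegressor(
        feature_columns=feature_columns, hidden_units=[64, 32])
    regressor.train(input_fn=input_fn, steps=2000)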

Some sample estimations:

expected -> estimated

0.233874837557475 -> 0.230502188205719
0.233874837557475 -> 0.25423765182495117
0.233874837557475 -> 0.1823016107082367
0.979458148662861 -> 0.8299890756607056
0.979458148662861 -> 0.8462812900543213
0.211660345395406 -> 0.22346541285514832
1.70577935377757 -> 1.9606330394744873
2.60000002384186 -> 2.39455509185791
0.976182460784912 -> 0.1866598129272461
0.976182460784912 -> 0.614652693271637
2.80241966247559 -> 1.0975050926208496
0.6621074676513671 -> 0.6865990161895752
0.0968895809991019 -> 0.041620612144470215
0.0968895809991019 -> 0.1236574649810791
0.0968895809991019 -> 0.05707252025604248
1.3669094741344499 -> 2.2393956184387207
1.3669094741344499 -> 2.2393956184387207

"Actual" vs "Predicted" screenshots:

[Screenshot: actual vs. predicted values; full graph omitted]

Next steps:

  • Refine the model and the dataset.
  • Add actions with multiple errata to the dataset.
  • Also implement a DNNClassifier to classify directly into buckets (possible classes: seconds, minutes, hours) instead of producing a float number; see the sketch after this list.
  • POC of integration with the SUSE Manager UI.
  • Feed the actual results of new actions on SUSE Manager back into the neural network.
  • Replace package_id with something consistent across customers (e.g. package name).
  • Try to find a way to avoid averaging the time per package for erratas that point to multiple packages.
  • Estimate the actual action (not per package).
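
A hypothetical sketch of that classifier variant follows, assuming the dataset's time column is expressed in minutes; the class thresholds, bucket size and layer sizes are illustrative guesses, not project code.

    import tensorflow as tf

    CLASSES = ["seconds", "minutes", "hours"]

    def time_to_class(minutes):
        # Map a duration to a coarse class index:
        # < 1 minute -> "seconds", < 60 minutes -> "minutes", else "hours".
        if minutes < 1.0:
            return 0
        if minutes < 60.0:
            return 1
        return 2

    # Same style of feature column as in the regressor sketch above.
    package_id = tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_hash_bucket(
            "package_id", hash_bucket_size=100000),
        dimension=8)

    classifier = tf.estimator.DNNClassifier(
        feature_columns=[package_id],
        hidden_units=[64, 32],
        n_classes=len(CLASSES))
    # Training would reuse an input_fn like the regressor's, with the labels
    # mapped through time_to_class instead of the raw float durations.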

Code repository: Internal GitLab

Looking for hackers with the skills:

tensorflow python machinelearning susemanager

This project is part of:

Hack Week 16
