Bug reports can be a great source of information, but finding that information usually requires reading through entire discussions and understanding their details.

Could machine learning be used to extract meaningful information from them? That is what this project is about: exploring a few different methods and seeing what results they give.

Here are some rough ideas on what to try:

  • clustering
  • sentiment analysis
  • filtering

As a dataset, the plan is to collect SLE and openSUSE bugs from our own Bugzilla and use this data to train and validate some models.
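To make the clustering idea a bit more concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual method: the bug summaries are made up for illustration, the function names (`tfidf_vectors`, `cosine`, `cluster_bugs`) and the 0.3 threshold are hypothetical, and the greedy single-pass grouping is a simple stand-in for a real algorithm such as k-means. Real input would come from Bugzilla exports.

```python
import math
import re
from collections import Counter

# Made-up bug summaries, for illustration only (not real Bugzilla data).
SAMPLE_BUGS = [
    "kernel panic on boot after kernel update",
    "zypper fails to refresh repository",
    "kernel panic on boot with new kernel",
    "zypper refresh fails for update repository",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_bugs(docs, threshold=0.3):
    """Greedy grouping: a document joins the first cluster whose seed
    (its first member) is similar enough, otherwise it starts a new one.
    Returns a list of clusters, each a list of document indices."""
    vectors = tfidf_vectors(docs)
    clusters = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            if cosine(vector, vectors[members[0]]) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


if __name__ == "__main__":
    for members in cluster_bugs(SAMPLE_BUGS):
        print([SAMPLE_BUGS[i] for i in members])
```

On the sample data this groups the two kernel reports together and the two zypper reports together. The result depends on document order and on the threshold, which is one reason a proper clustering method would be worth comparing against.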

This project is part of:

Hack Week 20

Activity

  • almost 2 years ago: gboiko started this project.
  • almost 2 years ago: gboiko originated this project.

  • Comments

    • alnovak
      almost 2 years ago

      I see two large sources of data that would be useful to include:

      • supportconfigs - these are either attached to Bugzilla or available (short-term) on a filesystem, and they give a great overview of our customers' environments

      • L3 metadata - for L3 bugs (~3000 per year), there is data that may be highly relevant for the clustering as well, among others:

        • customer identification
        • which PTFs (fixed packages) were delivered for the case, and what the feedback on them was

      • gboiko
        almost 2 years ago

        Hi @alnovak

        Thank you for your feedback! I will try to include those as well in the analysis.

        I already had L3 metadata in mind, but I hadn't thought of supportconfigs. Nice hint, thank you!

    • mslacken
      almost 2 years ago

      I had the same idea last year, but did not really succeed. You might want to have a look at: https://github.com/mslacken/ml-bugs

      I also gave a talk at Supercomputing 2019: https://gitlab.suse.de/mslacken/sc-2019

      Feel free to ping me if you need any additional information.

      • gboiko
        over 1 year ago

        Hi @mslacken

        Thank you for the pointers. I will take a quick look at them and then certainly ping you about it.

        Enjoy hackweek :)
