Use machine learning and natural language processing techniques to analyze the changes made in a project, and classify them into:

  • Small / unimportant fix
  • Big / important fix
  • Small / important feature
  • Big / important feature

For this project I will

  1. Generate a basic corpus of labeled data from a set of different projects related to openSUSE
  2. Evaluate the best features to make a proper classification: n-gram, PoS tag, TF-IDF (with and without stemmer)
  3. Evaluate and measure the best classification model: Naive Bayes, Linear SVM, Max Entropy, ...
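As a rough illustration of steps 2 and 3, here is a minimal sketch that pairs word n-gram features with a multinomial Naive Bayes classifier, written with the Python standard library only. It is not the project's actual code: the training messages, the label names, and the helper names (`ngrams`, `NaiveBayes`) are all made up for the example, and TF-IDF weighting, PoS tags, and stemming are left out.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(message, n_max=2):
    """Lowercase, tokenize on word characters, and return all 1..n_max-grams."""
    tokens = re.findall(r"[a-z0-9_]+", message.lower())
    features = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for message, label in zip(messages, labels):
            feats = ngrams(message)
            self.feature_counts[label].update(feats)
            self.vocabulary.update(feats)
        return self

    def predict(self, message):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for feat in ngrams(message):
                if feat not in self.vocabulary:
                    continue  # ignore features never seen in training
                score += math.log((counts[feat] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written commit messages. A real corpus would be
# labeled from openSUSE-related repositories.
TRAIN = [
    ("fix typo in README", "small fix"),
    ("fix typo in comment", "small fix"),
    ("fix crash when parsing empty config file", "big fix"),
    ("fix data corruption on concurrent write", "big fix"),
    ("add --verbose flag", "small feature"),
    ("add color to log output", "small feature"),
    ("add plugin system with new public API", "big feature"),
    ("add new storage backend with public API", "big feature"),
]

model = NaiveBayes().fit([m for m, _ in TRAIN], [l for _, l in TRAIN])
prediction = model.predict("fix typo in docs")
print(prediction)
```

With so little toy data the result only shows the mechanics. Comparing feature sets and models, as the plan describes, would need a held-out labeled set and a metric such as accuracy or F1.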

Looking for hackers with the skills:

nlp machinelearning git github

This project is part of:

Hack Week 10 Hack Week 11 Hack Week 12

Activity

  • about 11 years ago: froh joined this project.
  • about 12 years ago: aplanas started this project.
  • about 12 years ago: aplanas added keyword "nlp" to this project.
  • about 12 years ago: aplanas added keyword "machinelearning" to this project.
  • about 12 years ago: aplanas added keyword "git" to this project.
  • about 12 years ago: aplanas added keyword "github" to this project.
  • about 12 years ago: aplanas originated this project.

  • Comments

    • aplanas
      about 11 years ago by aplanas

      Yeah. Hackweek 10 collided with openSUSE 13.1, so I will try to work on this during this new Hackweek instance : )

    • froh
      about 11 years ago by froh

      Would it be hard to train for regression fix vs new feature, based on the comment? I'd be curious how much energy projects have to put into regression fixes vs feature additions.

    • osynge
      over 10 years ago by osynge

      Have you considered looking at ELK and integrating this work into the ELK stack?

    Similar Projects

    Song Search with CLAP by gcolangiuli

    Description

    Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables training a neural network on both audio and text descriptions, making it possible to search for audio using a text input. Several pre-trained models for song search are already available on Hugging Face.

    SUSE Hackweek AI Song Search

    Goals

    Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:

    • Music Tagging;
    • Free text search;
    • Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.

    The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.

    Resources


    go-git: unlocking SHA256-based repository cloning ahead of git v3 by pgomes

    Description

    The go-git library implements the git internals in pure Go, so that any Go application can handle not only Git repositories, but also lower-level primitives (e.g. packfiles, idxfiles, etc) without needing to shell out to the git binary.

    The focus for this Hackweek is to fast-track key improvements for the project ahead of the upstream release of Git v3, which may take place at some point next year.

    Goals

    Stretch goals

    Resources

    • https://github.com/go-git/go-git/
    • https://go-git.github.io/docs/


    Create a page with all devel:languages:perl packages and their versions by tinita

    Description

    Perl projects now live in git: https://src.opensuse.org/perl

    It would be useful to have an easy way to check which version of which Perl module is in devel:languages:perl. We also have meta overrides and patches for various modules, and it would be good to have them in a central place, so they are easier to look up and we can share them with other vendors.

    I did an initial data dump here a while ago: https://github.com/perlpunk/cpan-meta

    But I never had the time to automate this.

    I can also use the data to check if there are necessary updates (currently it uses data from download.opensuse.org, so there is some delay and it depends on building).

    Goals

    • Have a script that updates a central repository (e.g. https://src.opensuse.org/perl/_metadata) with metadata by looking at https://src.opensuse.org/perl/_ObsPrj (check if there are any changes from the last run)
    • Create an HTML page with the list of packages (use JavaScript and some table library to make it easily searchable)

    Resources


    Mail client with mailing list workflow support in Rust by acervesato

    Description

    Create a mail user interface in the Rust programming language that supports the mailing-list patch workflow. I know, aerc is already there, but I would like to create something simpler, without integrated protocols: just a plain user interface that uses some crates to read and create emails, which are fetched and sent via external tools.

    I already know Rust, but not the async support, which is needed in this case in order to handle events inside the mail folder and to send notifications.

    Goals

    • simple user interface in the style of aerc, with some vim keybindings for motions and search
    • automatically run external tools (like mbsync) to check for emails
    • automatically run commands for notifications
    • apply patch sets from the mailing list
    • tree-sitter support with styles

    Resources

    • ratatui: user interface (https://ratatui.rs/)
    • notify: folder watcher (https://docs.rs/notify/latest/notify/)
    • mail-parser: parser for emails (https://crates.io/crates/mail-parser)
    • mail-builder: create emails in proper format (https://docs.rs/mail-builder/latest/mail_builder/)
    • gitpatch: ML support (https://crates.io/crates/gitpatch)
    • tree-sitter-rust: support for mail format (https://crates.io/crates/tree-sitter)


    The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

    Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it.

    [Image: a GitHub robot mascot trying to lasso a blue bull with a Kubernetes logo tattooed on it]


    The Plan

    Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

    Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:


    The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

    The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

    Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.


    If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

    Why?

    We know AI can write "Hello World" and even moderately complex greenfield programs. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine whether and how they can help us reduce our technical debt and work faster and better, and to find out about their pitfalls and shortcomings along the way.

    The Outputs

    ❥ A "State of the Agentic Union" for SUSE engineers, detailing what works, what explodes, and how much coffee we can drink while the robots do the rebasing.

    ❥ Honest, Daily Updates With All the Gory Details