Use machine learning and natural language processing techniques to analyze the changes made in a project, and classify each change as one of:

Small / unimportant fix
Big / important fix
Small / unimportant feature
Big / important feature

For this project I will:

Generate a basic corpus of labeled data from a set of different projects related to openSUSE
Evaluate the best features for a proper classification: n-grams, PoS tags, TF-IDF (with and without a stemmer)
Evaluate and measure the best classification model: Naive Bayes, Linear SVM, Max Entropy, ...
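
As a rough illustration of the plan above, the following is a minimal sketch of one feature/model combination: word unigrams and bigrams fed into a multinomial Naive Bayes classifier, written with the standard library only. The toy corpus, the label names and the helper names (ngrams, NaiveBayes) are assumptions made up for this example; the project itself does not specify them, and a real evaluation would also compare PoS tags, TF-IDF weighting and the other models.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercase word n-grams (n = 1..n_max) for one commit message."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for message, label in zip(messages, labels):
            feats = ngrams(message)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, message):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for feat in ngrams(message):
                if feat in self.vocab:  # skip features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled corpus; a real one would come from project history.
train = [
    ("fix typo in README", "small fix"),
    ("fix whitespace in spec file", "small fix"),
    ("fix crash when the repository is empty", "big fix"),
    ("fix security issue in the login handler", "big fix"),
    ("add option to hide the sidebar", "small feature"),
    ("add new REST API for package search", "big feature"),
]
model = NaiveBayes().fit([m for m, _ in train], [l for _, l in train])
print(model.predict("fix typo in changelog"))
```

With six training messages this only shows the mechanics. Measuring which features and models work best would need a much larger labeled corpus and a held-out test set.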

Looking for hackers with the skills:

nlp machinelearning git github

This project is part of:

Hack Week 10 Hack Week 11 Hack Week 12

Activity

about 9 years ago: jordimassaguerpla liked this project.

over 9 years ago: nicolasbock liked this project.

almost 11 years ago: ZRen disliked this project.

almost 11 years ago: ZRen liked this project.

almost 11 years ago: bkutil liked this project.

over 11 years ago: cschum liked this project.

over 11 years ago: froh joined this project.

over 11 years ago: vitezslav_cizek liked this project.

over 11 years ago: froh liked this project.

over 11 years ago: oholecek liked this project.

about 12 years ago: aplanas liked this project.

about 12 years ago: aplanas started this project.

about 12 years ago: aplanas added keyword "nlp" to this project.

about 12 years ago: aplanas added keyword "machinelearning" to this project.

about 12 years ago: aplanas added keyword "git" to this project.

about 12 years ago: aplanas added keyword "github" to this project.

about 12 years ago: aplanas originated this project.

Comments

over 11 years ago by aplanas

Yeah. Hackweek 10 collided with openSUSE 13.1, so I will try to work on this during this new Hackweek instance :)

over 11 years ago by froh

Would it be hard to train for regression fix vs new feature, based on the comment? I'd be curious how much energy projects have to put into regression fixes vs feature additions.

almost 11 years ago by osynge

Have you considered looking at ELK and integrating this work into the ELK stack?

Similar Projects

machinelearning

Song Search with CLAP by gcolangiuli

Description

Contrastive Language-Audio Pretraining (CLAP) is an open-source library that enables the training of a neural network on both audio and text descriptions, making it possible to search for audio using a text input. Several pre-trained models for song search are already available on Hugging Face.

Goals

Evaluate how CLAP can be used for song searching and determine which types of queries yield the best results by developing a Minimum Viable Product (MVP) in Python. Based on the results of this MVP, future steps could include:

Music Tagging;
Free text search;
Integration with an LLM (for example, with MCP or the OpenAI API) for music suggestions based on your own library.

The code for this project will be entirely written using AI to better explore and demonstrate AI capabilities.

Result

In this MVP we implemented:

Async song analysis with the CLAP model
Free Text Search of the songs
Similar song search based on vector representation
Containerised version with web interface
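
The similar-song search listed above can be pictured as a nearest-neighbour lookup over embedding vectors. The sketch below is an assumption about how such a lookup might work, not the MVP's actual code: it ranks songs by cosine similarity, and the song titles and tiny 3-dimensional vectors are invented for illustration (real CLAP embeddings have many more dimensions).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, library, top_k=2):
    """Return the top_k titles whose vectors are closest to the query vector."""
    ranked = sorted(library, key=lambda title: cosine(query, library[title]), reverse=True)
    return ranked[:top_k]


# Hypothetical embeddings, invented for this example.
library = {
    "calm piano": [0.9, 0.1, 0.0],
    "soft strings": [0.8, 0.2, 0.1],
    "heavy drums": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for an embedded text query or a reference song
print(most_similar(query, library))
```

Because text and audio share one embedding space in CLAP, the same ranking function would serve both free text search and similar-song search; only the source of the query vector differs.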

We also documented what went well and what can be improved in the use of AI.

You can have a look at the result here:

Future work could focus on improving the performance and stability of the analysis.

References

CLAP: The main model being researched;
huggingface: Pre-trained models for CLAP;
Free Music Archive: Creative Commons songs that can be used for testing;

git

Mail client with mailing list workflow support in Rust by acervesato

Description

The goal is to create a mail user interface in Rust that supports the mailing list patch workflow. I know aerc already exists, but I would like to create something simpler, without integrated protocols: just a plain user interface that uses a few crates to read and compose emails, which are fetched and sent by external tools.

I already know Rust, but not its async support, which is needed here to handle events inside the mail folder and to send notifications.

Goals

simple user interface in the style of aerc, with some vim keybindings for motions and search
automatic run of external tools (like mbsync) for checking emails
automatically run commands for notifications
apply patch sets from the mailing list
tree-sitter support with styles
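
To make the "apply patch set" goal more concrete, here is a small sketch of the first step such a workflow usually needs: recognising subjects like "[PATCH v2 1/3] ..." and putting the patches of one series in order. It is written in Python for brevity rather than in Rust, it is not based on the gitpatch crate, and the regular expression and function names are assumptions of this example. Prefixes such as "[RFC PATCH]" are deliberately not handled.

```python
import re

# Matches "[PATCH] title", "[PATCH 1/3] title" and "[PATCH v2 1/3] title".
SUBJECT_RE = re.compile(
    r"\[PATCH(?:\s+v(?P<version>\d+))?(?:\s+(?P<index>\d+)/(?P<total>\d+))?\]\s*(?P<title>.*)"
)


def parse_subject(subject):
    """Return version, index, total and title of a patch mail, or None."""
    match = SUBJECT_RE.search(subject)
    if match is None:
        return None
    return {
        "version": int(match["version"] or 1),  # no "vN" means version 1
        "index": int(match["index"] or 1),      # no "n/m" means a single patch
        "total": int(match["total"] or 1),
        "title": match["title"],
    }


def order_series(subjects, version):
    """Titles of one series version in apply order, skipping the 0/N cover letter."""
    patches = [p for p in map(parse_subject, subjects) if p and p["version"] == version]
    return [p["title"] for p in sorted(patches, key=lambda p: p["index"]) if p["index"] > 0]


# Hypothetical subjects as they might appear in a mail folder.
subjects = [
    "[PATCH v2 2/2] add tests",
    "[PATCH v2 0/2] parser cleanup",
    "Re: meeting notes",
    "[PATCH v2 1/2] fix parser",
    "[PATCH 1/2] fix parser",
]
print(order_series(subjects, version=2))
```

The ordered list could then be handed to an external tool for the actual apply step, which fits the design of keeping protocols and git operations outside the user interface.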

Resources

ratatui: user interface (https://ratatui.rs/)
notify: folder watcher (https://docs.rs/notify/latest/notify/)
mail-parser: parser for emails (https://crates.io/crates/mail-parser)
mail-builder: create emails in proper format (https://docs.rs/mail-builder/latest/mail_builder/)
gitpatch: ML support (https://crates.io/crates/gitpatch)
tree-sitter-rust: support for mail format (https://crates.io/crates/tree-sitter)

go-git: unlocking SHA256-based repository cloning ahead of git v3 by pgomes

Description

The go-git library implements the git internals in pure Go, so that any Go application can handle not only Git repositories, but also lower-level primitives (e.g. packfiles, idxfiles, etc) without needing to shell out to the git binary.

The focus for this Hackweek is to fast track key improvements for the project ahead of the upstream release of Git V3, which may take place at some point next year.

Goals

Add support for cloning SHA256 repositories.
Decrease memory churn for very large repositories (e.g. Linux Kernel repository).
Cut the first alpha version for go-git/v6.
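
For background on what SHA256 support touches: a git object ID is the hash of a short header ("type size" followed by a NUL byte) plus the object's content, and SHA256 repositories use the same layout with a different hash function. The sketch below shows that calculation in Python; it is an illustration of the object format only and says nothing about how go-git implements it. The function name is made up for this example.

```python
import hashlib


def git_object_id(content, obj_type="blob", algorithm="sha1"):
    """Hash 'type size NUL content' the way git names loose objects."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.new(algorithm, header + content).hexdigest()


data = b"hello\n"
print(git_object_id(data))                            # 40 hex characters
print(git_object_id(data, algorithm="sha256"))        # 64 hex characters
```

The longer IDs are why cloning SHA256 repositories affects packfile and index handling as well: every place that stores or compares an object ID has to cope with 32-byte hashes instead of 20-byte ones.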

Stretch goals

Review and update the official documentation.
Optimise use of go-git in Fleet.
Create RFC/example for go-git plugins to improve extensibility.
Investigate performance bottlenecks for Blame and Status.

Resources

https://github.com/go-git/go-git/
https://go-git.github.io/docs/

Create a page with all devel:languages:perl packages and their versions by tinita

Description

Perl projects now live in git: https://src.opensuse.org/perl

It would be useful to have an easy way to check which version of which perl module is in devel:languages:perl. We also have meta overrides and patches for various modules, and it would be good to have them in a central place, so they are easier to look up and we can share them with other vendors.

I did some initial data dump here a while ago: https://github.com/perlpunk/cpan-meta

But I never had the time to automate this.

I can also use the data to check if there are necessary updates (currently it uses data from download.opensuse.org, so there is some delay and it depends on building).

Goals

Have a script that updates a central repository (e.g. https://src.opensuse.org/perl/_metadata) with metadata by looking at https://src.opensuse.org/perl/_ObsPrj (check if there are any changes from the last run)
Create an HTML page with the list of packages (use JavaScript and some table library to make it easily searchable)

Resources

Results

Day 1

First part of the code which retrieves data from https://src.opensuse.org/perl/_ObsPrj with submodules and creates a YAML and a JSON file.
Repo: https://github.com/perlpunk/opensuse-perl-meta
Also a first version of the HTML is live: https://perlpunk.github.io/opensuse-perl-meta/

Day 2

The HTML page now has links to src.opensuse.org and the date of the last update, plus a short info at the top
Code is now 100% covered by tests: https://app.codecov.io/gh/perlpunk/opensuse-perl-meta
I used the modern perl class feature, which makes perl classes even nicer and shorter. See example
Tests
- I tried out the mocking feature of the modern Test2::V0 library which provides call tracking. See example
- I tried out comparing data structures with the new Test2::V0 library. It lets you compare parts of the structure with the like function, which only compares the data that is mentioned in the expected data. example

Day 3

Added various things to the table
- Dependencies column
- Show popup with info for cpanspec, patches and dependencies
- Added last date / commit to the data export.

Plan: With the added date / commit we can now daily check _ObsPrj for changes and only fetch the data for changed packages.
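
The incremental update in this plan boils down to comparing the commit recorded for each package in the last export with the commit seen now. The following is a minimal sketch of that comparison in Python (the project itself is written in Perl); the function name and the shape of the data, a mapping from package name to commit hash, are assumptions of this example.

```python
def changed_packages(previous, current):
    """Return (packages to re-fetch, packages that disappeared) between two runs."""
    changed = [name for name, commit in current.items() if previous.get(name) != commit]
    removed = [name for name in previous if name not in current]
    return sorted(changed), sorted(removed)


# Hypothetical package names and shortened commit hashes.
last_run = {"perl-Foo": "a1b2c3", "perl-Bar": "d4e5f6", "perl-Old": "000111"}
this_run = {"perl-Foo": "a1b2c3", "perl-Bar": "99aa88", "perl-New": "777eee"}

to_fetch, gone = changed_packages(last_run, this_run)
print(to_fetch)
print(gone)
```

New packages show up in the first list automatically, because a missing entry in the previous run never equals the current commit.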

Day 4

git-fs: file system representation of a git repository by fgonzalez

Description

This project aims to create a Linux equivalent to the git/fs concept from git9. Now, I'm aware that git provides worktrees, but they are not enough for many use cases. Having a read-only representation of the whole repository simplifies scripting by quite a bit and, most importantly, reduces disk space usage. For instance, during kernel livepatching development, we need to process and analyze the source code of hundreds of kernel versions simultaneously. This is rather painful with git-worktrees, as each kernel branch requires no less than 1G of disk space.

As for the technical details, I'll implement the file system using FUSE. The project itself should not take much time to complete, but let's see where it takes me.

I'll try to keep the same design as git9, so the file system will look something like:


/mnt/git
      +-- ctl
      +-- HEAD
      |    +-- tree
      |    |    +--files
      |    |    +--in
      |    |    +--head
      |    |
      |    +-- hash
      |    +-- msg
      |    +-- parent
      |
      +-- branch
      |      |
      |      +-- heads
      |      |      +-- master
      |      |            +-- [commit files, see HEAD]
      |      +-- remotes
      |             +-- origin
      |                     +-- master
      |                            +-- [commit files, see HEAD]
      +-- object
            +-- 00051fd3f066e8c05ae7d3cf61ee363073b9535f # blob contents
            +-- 00051fd3f066e8c05ae7d3cf61ee363073b9535c
                  +-- [tree contents, see HEAD/tree]
            +-- 3f5dbc97ae6caba9928843ec65fb3089b96c9283
                  +-- [commit files, see HEAD]

So, if you wanted to look at the commit message of the current branch, you could simply do:

cat /mnt/git/HEAD/msg
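
One piece any implementation of this layout needs is a way to turn a path under the mount point into "which revision, which commit file, which path inside the tree". The sketch below shows one possible routing function in Python. It is an assumption for illustration, not the project's design: it covers only HEAD and the branch directories, ignores the object directory and ctl, and does not handle branch names that contain slashes.

```python
from pathlib import PurePosixPath

# Entries that appear under every commit directory in the layout above.
COMMIT_FILES = {"tree", "hash", "msg", "parent"}


def resolve(path):
    """Map a path relative to the mount point to (revision, commit file, subpath)."""
    parts = PurePosixPath(path).parts
    if parts[:1] == ("HEAD",):
        revision, rest = "HEAD", parts[1:]
    elif parts[:2] == ("branch", "heads") and len(parts) > 2:
        revision, rest = f"refs/heads/{parts[2]}", parts[3:]
    elif parts[:2] == ("branch", "remotes") and len(parts) > 3:
        revision, rest = f"refs/remotes/{parts[2]}/{parts[3]}", parts[4:]
    else:
        raise FileNotFoundError(path)
    if rest and rest[0] not in COMMIT_FILES:
        raise FileNotFoundError(path)
    entry = rest[0] if rest else None  # None means the commit directory itself
    return revision, entry, "/".join(rest[1:])


print(resolve("HEAD/msg"))
print(resolve("branch/heads/master/tree/src/main.c"))
```

A FUSE handler could call a function like this from its getattr and read callbacks and then look up the requested data in the repository, so no checkout ever has to be written to disk.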

No collaboration needed. This is a solo project.

Goals

Implement a working prototype.
Measure and improve the performance if possible. This step will be the most crucial one. User space filesystems are slower by nature.

Resources

https://orib.dev/git9.html

https://docs.kernel.org/filesystems/fuse/fuse.html

github

Is SUSE Trending? Popularity and Developer Sentiment Insight Using Native AI Capabilities by terezacerna

Description

This project aims to explore the popularity and developer sentiment around SUSE and its technologies compared to Red Hat and their technologies. Using publicly available data sources, I will analyze search trends, developer preferences, repository activity, and media presence. The final outcome will be an interactive Power BI dashboard that provides insights into how SUSE is perceived and discussed across the web and among developers.

Goals

Assess the popularity of SUSE products and brand compared to Red Hat using Google Trends.
Analyze developer satisfaction and usage trends from the Stack Overflow Developer Survey.
Use the GitHub API to compare SUSE and Red Hat repositories in terms of stars, forks, contributors, and issue activity.
Perform sentiment analysis on GitHub issue comments to measure community tone and engagement using built-in Copilot capabilities.
Perform sentiment analysis on Reddit comments related to SUSE technologies using built-in Copilot capabilities.
Use Gnews.io to track and compare the volume of news articles mentioning SUSE and Red Hat technologies.
Test the integration of Copilot (AI) within Power BI for enhanced data analysis and visualization.
Deliver a comprehensive Power BI report summarizing findings and insights.
Test the full potential of Power BI, including its AI features and native language Q&A.
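
For the GitHub part of these goals, the repository comparison could start from a small script like the sketch below, which lists an organization's public repositories through the GitHub REST API and sums the stargazers_count, forks_count and open_issues_count fields. This is an assumption about one possible approach, not the project's actual pipeline: it reads only the first page of results, makes unauthenticated requests (which are rate limited), and the sample data at the end is invented.

```python
import json
import urllib.request


def fetch_org_repos(org):
    """Fetch the first page (up to 100) of an organization's public repositories."""
    url = f"https://api.github.com/orgs/{org}/repos?per_page=100"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(repos):
    """Aggregate the popularity fields of a list of repository objects."""
    return {
        "repositories": len(repos),
        "stars": sum(r["stargazers_count"] for r in repos),
        "forks": sum(r["forks_count"] for r in repos),
        # Note: GitHub counts open pull requests in open_issues_count as well.
        "open_issues": sum(r["open_issues_count"] for r in repos),
    }


# Invented sample data with the same field names the API returns.
sample = [
    {"stargazers_count": 120, "forks_count": 30, "open_issues_count": 12},
    {"stargazers_count": 45, "forks_count": 5, "open_issues_count": 3},
]
print(summarize(sample))
# For live data: summarize(fetch_org_repos("SUSE"))
```

The resulting dictionaries could be exported as CSV or JSON and loaded into Power BI; contributor counts and issue comments would need additional API calls per repository.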

Resources

Google Trends: Web scraping for search popularity data
Stack Overflow Developer Survey: For technology popularity and satisfaction comparison
GitHub API: For repository data (stars, forks, contributors, issues, comments).
Gnews.io API: For article volume and mentions analysis.
Reddit: SUSE related topics with comments.

The Agentic Rancher Experiment: Do Androids Dream of Electric Cattle? by moio

Rancher is a beast of a codebase. Let's investigate if the new 2025 generation of GitHub Autonomous Coding Agents and Copilot Workspaces can actually tame it.

The Plan

Create a sandbox GitHub Organization, clone in key Rancher repositories, and let the AI loose to see if it can handle real-world enterprise OSS maintenance - or if it just hallucinates new breeds of Kubernetes resources!

Specifically, throw "Agentic Coders" some typical tasks in a complex, long-lived open-source project, such as:

❥ The Grunt Work: generate missing GoDocs, unit tests, and refactorings. Rebase PRs.

❥ The Complex Stuff: fix actual (historical) bugs and feature requests to see if they can traverse the complexity without (too much) human hand-holding.

❥ Hunting Down Gaps: find areas lacking in docs, areas of improvement in code, dependency bumps, and so on.

If time allows, also experiment with Model Context Protocol (MCP) to give agents context on our specific build pipelines and CI/CD logs.

Why?

We know AI can write "Hello World." and also moderately complex programs from a green field. But can it rebase a 3-month-old PR with conflicts in rancher/rancher? I want to find the breaking point of current AI agents to determine if and how they can help us to reduce our technical debt, work faster and better. At the same time, find out about pitfalls and shortcomings.

The CONCLUSION!!!

A State of the Union document was compiled to summarize lessons learned this week. For more gory details, read the diary below!

issuefs: FUSE filesystem representing issues (e.g. JIRA) for the use with AI agents code-assistants by llansky3

Description

issuefs is a FUSE filesystem that mounts issues from various ticketing systems (GitHub, Jira, Bugzilla, Redmine) as files in your local file system.

Why is this a good idea?

Users can use their favorite command line tools to view and search tickets from various sources.
Users can use the AI agent capabilities of their favorite IDE or CLI to ask questions about the issues, the project or its functionality, providing relevant tickets as context without extra work.
Users can use it while developing new features, letting the AI agent jump-start the solution. issuefs gives the AI agent the context about the bug or requested feature (the agent just reads a few more files). There is no need to copy and paste issues into the user prompt or to use extra MCP tools to access the issues. You can still do these things, but this approach is different on purpose.
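
To illustrate the idea of "issues as files", the sketch below turns one issue record into a file path and a plain-text body that command line tools or an AI agent could read. The directory layout, the field names and the function name are assumptions invented for this example; they are not taken from the issuefs prototype.

```python
def issue_to_file(issue):
    """Render one issue as (relative path, text content) for the mounted tree."""
    path = f"{issue['tracker']}/{issue['key']}.txt"
    lines = [
        f"{issue['key']}: {issue['title']}",
        f"Status: {issue['status']}",
        "",
        issue["description"],
    ]
    for comment in issue.get("comments", []):
        lines += ["", f"Comment by {comment['author']}:", comment["body"]]
    return path, "\n".join(lines) + "\n"


# Hypothetical issue record, as a ticketing-system backend might return it.
issue = {
    "tracker": "jira",
    "key": "PROJ-42",
    "title": "Crash on empty input",
    "status": "Open",
    "description": "The parser crashes when the input file is empty.",
    "comments": [{"author": "alice", "body": "Reproduced on the main branch."}],
}
path, text = issue_to_file(issue)
print(path)
print(text)
```

With files like this, a plain grep over the mount point finds tickets by keyword, and an agent only needs to be pointed at the relevant path to get the issue as context.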

Goals

Add GitHub issue support
Prove the concept by applying the approach to itself, using GitHub issues for tracking and development of new features
Add support for Bugzilla and Redmine, using this approach while doing so. Record a video of it.
Clean up and test the implementation and create some documentation
Create a blog post about this approach

Resources

There is a prototype implementation here. This currently sort of works with JIRA only.
