Use machine learning and natural language processing techniques to analyze the changes made in a project, and classify them as:
- Small / unimportant fix
- Big / important fix
- Small / important feature
- Big / important feature
For this project I will:
- Generate a basic corpus of labeled data from a set of different projects related to openSUSE
- Evaluate which features give the best classification: n-grams, PoS tags, TF-IDF (with and without stemming)
- Evaluate and measure the best classification model: Naive Bayes, Linear SVM, Max Entropy, ...
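As a rough illustration of the plan above, here is a minimal sketch of one of the listed combinations: word n-gram features fed to a multinomial Naive Bayes classifier, written in plain Python so it runs without extra libraries. The tiny corpus, the `ngrams` helper, and the `NaiveBayes` class are all assumptions made for this example; a real corpus would be built from openSUSE project changes, and the other features and models in the list would need their own evaluation.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercase word n-grams (sizes 1..n_max) of a change description."""
    words = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocabulary.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(doc_count / total_docs)
            for feat in ngrams(text):
                if feat in self.vocabulary:  # unseen features are ignored
                    score += math.log((counts[feat] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus, invented for illustration only.
TRAIN = [
    ("fix typo in spec file", "Small / unimportant fix"),
    ("fix typo in description", "Small / unimportant fix"),
    ("fix security vulnerability in parser", "Big / important fix"),
    ("fix crash and data loss on startup", "Big / important fix"),
    ("add option to silence warnings", "Small / important feature"),
    ("add new storage backend", "Big / important feature"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict("fix typo in readme"))
```

With only six training examples the result says nothing about real accuracy; the point is just to show how features and a model plug together so that different feature sets and classifiers can be swapped in and measured on a held-out part of the corpus.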
This project is part of: Hack Week 10, Hack Week 11, Hack Week 12
Similar Projects
Create a page with all devel:languages:perl packages and their versions by tinita
Description
Perl projects now live in git: https://src.opensuse.org/perl
It would be useful to have an easy way to check which version of each Perl module is in devel:languages:perl. We also have meta overrides and patches for various modules, and it would be good to keep them in a central place so that they are easier to look up and can be shared with other vendors.
I did an initial data dump here a while ago: https://github.com/perlpunk/cpan-meta
But I never had the time to automate this.
I can also use the data to check if there are necessary updates (currently it uses data from download.opensuse.org, so there is some delay and it depends on building).
Goals
- Have a script that updates a central repository (e.g. https://src.opensuse.org/perl/_metadata) with metadata by looking at https://src.opensuse.org/perl/_ObsPrj (check if there are any changes since the last run)
- Create an HTML page with the list of packages (use JavaScript and some table library to make it easily searchable)
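The two goals could be sketched roughly as follows, in Python for illustration. This assumes the metadata has already been collected into a simple package-to-version mapping; how that mapping is read from the repositories is not shown, and the function names, package names, and versions here are hypothetical.

```python
import html


def changed_packages(previous, current):
    """Return packages that are new or whose version differs from the last run."""
    return {
        name: version
        for name, version in current.items()
        if previous.get(name) != version
    }


def render_table(packages):
    """Render a package -> version mapping as a minimal HTML table."""
    rows = "\n".join(
        f"<tr><td>{html.escape(name)}</td><td>{html.escape(version)}</td></tr>"
        for name, version in sorted(packages.items())
    )
    return (
        "<table>\n<tr><th>Package</th><th>Version</th></tr>\n"
        f"{rows}\n</table>"
    )


# Hypothetical snapshots; names and versions are made up for illustration.
last_run = {"perl-Example-A": "1.0", "perl-Example-B": "0.3"}
this_run = {"perl-Example-A": "1.1", "perl-Example-B": "0.3", "perl-Example-C": "2.0"}

print(changed_packages(last_run, this_run))
print(render_table(this_run))
```

A plain table like this is easy to hand to a JavaScript table library afterwards for sorting and searching.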
Resources