Use machine learning and natural language processing techniques to analyze the changes made in a project, and classify them as:

  • Small / unimportant fix
  • Big / important fix
  • Small / unimportant feature
  • Big / important feature

For this project I will

  1. Generate a basic corpus of labeled data from a set of different projects related to openSUSE
  2. Evaluate the best features to make a proper classification: n-gram, PoS tag, TF-IDF (with and without stemmer)
  3. Evaluate and measure the best classification model: Naive Bayes, Linear SVM, Max Entropy, ...
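
As a rough illustration of steps 2 and 3, here is a minimal sketch that extracts word n-grams from commit messages and trains a multinomial Naive Bayes classifier with add-one smoothing. It uses only the Python standard library. The `ngrams` helper, the `NaiveBayes` class, the shortened label names and the toy commit messages are all assumptions made up for this example; a real evaluation would use the labeled corpus from step 1 and compare several feature sets and models.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased commit message."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for message, label in zip(messages, labels):
            self.feature_counts[label].update(ngrams(message))
        self.vocabulary = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, message):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for gram in ngrams(message):
                if gram in self.vocabulary:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy corpus; a real one would come from labeled commit logs.
train = [
    ("Fix typo in README", "small fix"),
    ("Fix typo in error message", "small fix"),
    ("Fix crash when parsing empty config file", "big fix"),
    ("Fix data corruption and crash on shutdown", "big fix"),
    ("Add option to hide toolbar", "small feature"),
    ("Add option to change font size", "small feature"),
    ("Add new plugin system with network support", "big feature"),
    ("Add new storage backend with encryption support", "big feature"),
]
model = NaiveBayes().fit([m for m, _ in train], [l for _, l in train])
print(model.predict("Fix typo in docs"))
print(model.predict("Add option to hide sidebar"))
```

Swapping `ngrams` for another feature extractor (PoS tags, TF-IDF weights, stemmed tokens) while keeping the classifier fixed is one simple way to compare the feature sets listed in step 2.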

Looking for hackers with the skills:

nlp, machinelearning, git, github

This project is part of:

Hack Week 10, Hack Week 11, Hack Week 12

Activity

  • almost 9 years ago: jordimassaguerpla liked this project.
  • over 9 years ago: nicolasbock liked this project.
  • over 10 years ago: ZRen disliked this project.
  • over 10 years ago: ZRen liked this project.
  • over 10 years ago: bkutil liked this project.
  • about 11 years ago: cschum liked this project.
  • about 11 years ago: froh joined this project.
  • about 11 years ago: vitezslav_cizek liked this project.
  • about 11 years ago: froh liked this project.
  • about 11 years ago: oholecek liked this project.
  • almost 12 years ago: aplanas liked this project.
  • almost 12 years ago: aplanas started this project.
  • almost 12 years ago: aplanas added keyword "nlp" to this project.
  • almost 12 years ago: aplanas added keyword "machinelearning" to this project.
  • almost 12 years ago: aplanas added keyword "git" to this project.
  • almost 12 years ago: aplanas added keyword "github" to this project.
  • almost 12 years ago: aplanas originated this project.

  • Comments

    • aplanas
      about 11 years ago by aplanas

      Yeah. Hackweek 10 collided with openSUSE 13.1, so I will try to work on this during this new Hackweek instance :)

    • froh
      about 11 years ago by froh

      Would it be hard to train for regression fix vs new feature, based on the comment? I'd be curious how much energy projects have to put into regression fixes vs feature additions.

    • osynge
      over 10 years ago by osynge

      Have you considered looking at ELK and integrating this work into the ELK stack?

    Similar Projects

    Create a page with all devel:languages:perl packages and their versions by tinita

    Description

    Perl projects now live in git: https://src.opensuse.org/perl

    It would be useful to have an easy way to check which version of which perl module is in devel:languages:perl. We also have meta overrides and patches for various modules, and it would be good to have them in a central place, so they are easier to look up and we can share them with other vendors.

    I did some initial data dump here a while ago: https://github.com/perlpunk/cpan-meta

    But I never had the time to automate this.

    I can also use the data to check if there are necessary updates (currently it uses data from download.opensuse.org, so there is some delay and it depends on building).

    Goals

    • Have a script that updates a central repository (e.g. https://src.opensuse.org/perl/_metadata) with metadata by looking at https://src.opensuse.org/perl/_ObsPrj (check if there are any changes from the last run)
    • Create an HTML page with the list of packages (use JavaScript and some table library to make it easily searchable)

    Resources