Use machine learning and natural language processing techniques to analyze the changes made to a project, and classify them as:

  • Small / unimportant fix
  • Big / important fix
  • Small / important feature
  • Big / important feature

For this project I will

  1. Generate a basic corpus of labeled data from a set of different projects related to openSUSE
  2. Evaluate the best features for a proper classification: n-grams, PoS tags, TF-IDF (with and without stemming)
  3. Evaluate and measure the best classification model: Naive Bayes, Linear SVM, Maximum Entropy, ...
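
As a rough illustration of steps 2 and 3, here is a minimal sketch of one of the candidate setups: word n-gram features fed into a multinomial Naive Bayes classifier. It is not the project's implementation. It uses only the Python standard library, and the `ngrams` helper, the `NaiveBayes` class, the shortened label names and the tiny hand-labeled corpus are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercase word n-grams (1..n_max) of a commit message."""
    words = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for msg, label in zip(messages, labels):
            feats = ngrams(msg)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, message):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for feat in ngrams(message):
                if feat in self.vocab:  # skip n-grams never seen in training
                    score += math.log((self.feat_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled commit messages, for illustration only.
corpus = [
    ("fix typo in README", "small fix"),
    ("fix typo in comment", "small fix"),
    ("fix crash when parsing empty spec file", "big fix"),
    ("fix security issue in login handler", "big fix"),
    ("add option to hide sidebar", "small feature"),
    ("add tooltip to search button", "small feature"),
    ("add new build backend with plugin support", "big feature"),
    ("add support for new package format", "big feature"),
]
model = NaiveBayes().fit([m for m, _ in corpus], [l for _, l in corpus])
print(model.predict("fix typo in changelog"))
```

A real evaluation would need a much larger corpus, a held-out test set, and a comparison against the other feature sets and models listed above.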

Looking for hackers with the skills:

nlp machinelearning git github

This project is part of:

Hack Week 10, Hack Week 11, Hack Week 12

Activity

  • over 6 years ago: jordimassaguerpla liked this project.
  • almost 7 years ago: nicolasbock liked this project.
  • about 8 years ago: ZRen disliked this project.
  • about 8 years ago: ZRen liked this project.
  • about 8 years ago: bkutil liked this project.
  • over 8 years ago: cschum liked this project.
  • over 8 years ago: froh joined this project.
  • over 8 years ago: vitezslav_cizek liked this project.
  • over 8 years ago: froh liked this project.
  • over 8 years ago: oholecek liked this project.
  • over 9 years ago: aplanas liked this project.
  • over 9 years ago: aplanas started this project.
  • over 9 years ago: aplanas added keyword "nlp" to this project.
  • over 9 years ago: aplanas added keyword "machinelearning" to this project.
  • over 9 years ago: aplanas added keyword "git" to this project.
  • over 9 years ago: aplanas added keyword "github" to this project.
  • over 9 years ago: aplanas originated this project.

  • Comments

    • aplanas
      over 8 years ago by aplanas

      Yeah. Hackweek 10 collided with openSUSE 13.1, so I will try to work on this during this new Hackweek instance :)

    • froh
      over 8 years ago by froh

      Would it be hard to train for regression fix vs. new feature, based on the comment? I'd be curious how much energy projects have to put into regression fixes vs. feature additions.

    • osynge
      about 8 years ago by osynge

      Have you considered looking at ELK and integrating this work into the ELK stack?
