Use machine learning and natural language processing techniques to analyze the changes made in a project, and classify them as:
- Small / unimportant fix
- Big / important fix
- Small / unimportant feature
- Big / important feature
For this project I will:
- Generate a basic corpus of labeled data from a set of different projects related to openSUSE
- Evaluate the best features for a proper classification: n-grams, PoS tags, TF-IDF (with and without stemming)
- Evaluate and compare classification models to find the best one: Naive Bayes, Linear SVM, Max Entropy, ...
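
To make the plan above more concrete, here is a minimal sketch of one possible combination: word n-grams weighted by TF-IDF, fed into a multinomial Naive Bayes classifier. It uses only the Python standard library. Everything in it is an assumption for illustration: the class name `TfidfNaiveBayes`, the helper `ngrams`, the short label strings, and the toy change messages are all hypothetical and do not come from the real openSUSE corpus. Stemming, PoS tags and the other models are left out.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, return all word n-grams of size 1..n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF-weighted n-gram counts (illustrative)."""

    def fit(self, texts, labels, alpha=1.0):
        docs = [Counter(ngrams(t)) for t in texts]
        n_docs = len(docs)
        # Document frequency of each n-gram, turned into a smoothed IDF.
        df = Counter(term for doc in docs for term in doc)
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        # Accumulate the TF-IDF weight of every n-gram per class.
        weights = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            for term, tf in doc.items():
                weights[label][term] += tf * self.idf[term]
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}
        # Laplace-smoothed log-likelihood of each n-gram given the class.
        self.log_likelihood = {}
        for c in class_counts:
            total = sum(weights[c].values()) + alpha * len(self.idf)
            self.log_likelihood[c] = {
                t: math.log((weights[c][t] + alpha) / total) for t in self.idf
            }
        return self

    def predict(self, text):
        doc = Counter(ngrams(text))
        scores = {}
        for c, prior in self.log_prior.items():
            score = prior
            for term, tf in doc.items():
                if term in self.idf:  # n-grams never seen in training are ignored
                    score += tf * self.idf[term] * self.log_likelihood[c][term]
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy corpus: invented change messages with invented labels.
corpus = [
    ("fix typo in README", "small fix"),
    ("fix typo in comment", "small fix"),
    ("fix crash when parsing repository metadata", "big fix"),
    ("fix security crash in package installer", "big fix"),
    ("add option to hide toolbar", "small feature"),
    ("add option to change colour", "small feature"),
    ("add new backend for btrfs snapshots", "big feature"),
    ("add new plugin system for modules", "big feature"),
]
texts, labels = zip(*corpus)
model = TfidfNaiveBayes().fit(texts, labels)
print(model.predict("fix typo in changelog"))
```

With a real labeled corpus, the same `fit` / `predict` interface could be kept while swapping the feature set or the model, so that each combination is measured on the same held-out data.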
This project is part of:
Hack Week 10, Hack Week 11, Hack Week 12