Project Description

Speech Emotion Recognition (SER) uses acoustic/prosodic features of speech to classify words, sentences or whole audio files into emotions such as happiness, anger and sadness [1]. Emotions can also be mapped into a two-dimensional physiological space of emotional positivity (valence) and strength (arousal) [2].
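To make the valence/arousal representation concrete, here is a minimal sketch that places a (valence, arousal) estimate in the circumplex and reports its quadrant, angle and intensity. The [-1, 1] axis scaling, the quadrant labels and the helper name `circumplex_position` are illustrative assumptions, not definitions taken from [2]. In the circumplex, anger is typically placed in the negative-valence/high-arousal quadrant, sadness in the negative-valence/low-arousal quadrant and happiness on the positive-valence side.

```python
import math


def circumplex_position(valence, arousal):
    """Locate a (valence, arousal) point in the 2-D circumplex.

    Assumes both axes are scaled to [-1, 1] with 0 as neutral (an illustrative choice).
    Returns the quadrant label, the polar angle in degrees (0 = pure positive valence,
    measured counter-clockwise) and the distance from the neutral origin.
    """
    side = "positive" if valence >= 0 else "negative"
    level = "high" if arousal >= 0 else "low"
    quadrant = f"{side} valence / {level} arousal"
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0
    intensity = math.hypot(valence, arousal)
    return quadrant, angle, intensity


# Hypothetical model outputs for three clips.
for v, a in [(0.8, 0.6), (-0.7, 0.7), (-0.5, -0.5)]:
    q, ang, r = circumplex_position(v, a)
    print(f"valence={v:+.1f} arousal={a:+.1f} -> {q}, {ang:.1f} deg, intensity {r:.2f}")
```

Working in polar form like this is one way to turn continuous model outputs back into coarse emotion regions while keeping the intensity information that discrete classes discard.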

Semantic/lexical information may also be used, either on its own or in addition to acoustic features, but it is not the primary focus here (for a more lexical approach, see another Hack Week project [3]).

Goal for this Hackweek

I will attempt to use pyAudioAnalysis [4] and PyTorch/scikit-learn for supervised learning on audio files from well-known audio/emotion databases [5], mapping them to emotion classes. I will also study how continuous variables such as valence and arousal can be extracted, and I will investigate SVMs and neural networks. My initial focus will be to understand the methods at a conceptual level, but I will also try to use a GPU (NVIDIA CUDA or AMD ROCm) instead of a CPU where possible.
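As a rough illustration of the supervised pipeline (short-term features per frame, one summary vector per clip, then an SVM), the following sketch uses only the Python standard library. It is a stand-in, not pyAudioAnalysis or scikit-learn: the two features (RMS energy and zero-crossing rate), the 16 kHz sample rate, the 400/160-sample frame and hop sizes, the Pegasos training loop for a linear SVM and the synthetic "loud noise vs. quiet tone" clips are all assumptions chosen so the example runs without data or extra packages.

```python
import math
import random

SR = 16000  # assumed sample rate (Hz) of the synthetic clips


def frame_features(signal, frame_len=400, hop=160):
    """Short-term features per frame: RMS energy and zero-crossing rate."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((rms, crossings / (frame_len - 1)))
    return feats


def clip_vector(signal):
    """One fixed-length vector per clip: mean and standard deviation of each frame feature."""
    vec = []
    for column in zip(*frame_features(signal)):
        mean = sum(column) / len(column)
        std = math.sqrt(sum((c - mean) ** 2 for c in column) / len(column))
        vec.extend([mean, std])
    return vec


def standardize(X, stats=None):
    """Z-score each column; reuse the training statistics when `stats` is given."""
    if stats is None:
        cols = list(zip(*X))
        means = [sum(c) / len(c) for c in cols]
        stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                for c, m in zip(cols, means)]
        stats = (means, stds)
    means, stds = stats
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    Z = [[(v - m) / s for v, m, s in zip(row, means, stds)] + [1.0] for row in X]
    return Z, stats


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if margin < 1.0:  # hinge loss is active: step towards this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def synthetic_clip(label, rng, seconds=0.5):
    """Toy stand-in for a labelled recording: loud noise (+1) vs. a quiet sine tone (-1)."""
    n = int(SR * seconds)
    if label == 1:
        amp = rng.uniform(0.5, 0.9)
        return [rng.uniform(-amp, amp) for _ in range(n)]
    amp, freq, phase = rng.uniform(0.05, 0.15), rng.uniform(150, 250), rng.uniform(0, 2 * math.pi)
    return [amp * math.sin(2 * math.pi * freq * k / SR + phase) for k in range(n)]


rng = random.Random(42)
train_labels = [1, -1] * 10
X_train, stats = standardize([clip_vector(synthetic_clip(l, rng)) for l in train_labels])
w = train_linear_svm(X_train, train_labels)

test_labels = [1, -1] * 5
X_test, _ = standardize([clip_vector(synthetic_clip(l, rng)) for l in test_labels], stats)
accuracy = sum(predict(w, x) == l for x, l in zip(X_test, test_labels)) / len(test_labels)
print(f"test accuracy: {accuracy:.2f}")
```

With real recordings, the feature step would come from pyAudioAnalysis and the classifier would more likely be scikit-learn's `sklearn.svm.SVC`, which also offers non-linear kernels. The synthetic clips only demonstrate the data flow and say nothing about accuracy on real emotion databases.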

Unsupervised learning would be a further step. An even further step would be to do this for online/real-time voice recordings such as [6].
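One common starting point for the unsupervised step is k-means clustering of per-clip feature vectors, which groups recordings without using any emotion labels. The sketch below is plain Lloyd's k-means with a deterministic farthest-first initialisation. The toy 2-D points (imagined as, e.g., mean energy and mean zero-crossing rate per clip) are synthetic assumptions, and the clusters it finds would still have to be interpreted against labelled data afterwards.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means; returns (centroids, labels). Initialisation is farthest-first."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical unlabelled clip features: two synthetic groups of 20 points each.
rng = random.Random(7)
points = [[rng.gauss(0.2, 0.05), rng.gauss(0.1, 0.05)] for _ in range(20)]
points += [[rng.gauss(0.8, 0.05), rng.gauss(0.6, 0.05)] for _ in range(20)]
centroids, labels = kmeans(points, k=2)
print("cluster sizes:", [labels.count(j) for j in range(2)])
```

k-means assumes roughly spherical clusters of similar size; for emotion data, where classes overlap heavily, it is mainly useful as a baseline for exploring structure in the features.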



[2] Posner, J., Russell, J., & Peterson, B. (2005). The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17(3), 715-734. doi:10.1017/S0954579405050340


This project is part of:

Hack Week 20


  • over 3 years ago: vliaskovitis originated this project.