In this project, I use computational methods to predict human ratings of singing and speech samples based solely on audio input. This work will help future audio researchers assess the quality of their recordings without relying on human listeners. The project consists of two phases. First, over the summer, I built an online user study in Qualtrics, with participants recruited through the NYU Psychology SONA program. I will gather 4000 user evaluations of 1000 singing voice and speech recordings. The singing voice recordings are sourced from the DAMP (Digital Archive of Mobile Performances) karaoke dataset, and the spoken recordings come from the LibriSpeech dataset. Users will evaluate each recording by answering five questions that address the recording’s quality and the performer’s skill level, likeability, enthusiasm, authenticity, emotional expression, and intensity. Second, I will use audio signal processing tools and statistical methods to estimate these user ratings from the audio files alone. This project builds directly upon my dissertation on digital analyses of singing voice and speech, exploring the intersection of music, speech, sound recording, and statistical methods. The singing voice portion of the project will advance the field of musicology and demonstrate the potential of digital humanities methods in that field, while the speech portion is relevant to the technology industry and highlights parallels between singing and speaking voices.
Evaluating Voices
A Computational Analysis of Skill and Expression in Singing and Narration
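
As a rough illustration of the second phase described in the abstract, the sketch below averages listener evaluations per recording and fits a one-feature linear model that predicts the mean rating from loudness. It is not the project's actual pipeline: the function names (`rms`, `mean_rating_per_recording`, `fit_line`), the recording IDs, the synthetic tones, and the ratings are all hypothetical, and a real analysis would presumably use a dedicated audio library, many more features, and a richer statistical model.

```python
import math
from statistics import mean


def rms(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mean_rating_per_recording(evaluations):
    """Collapse (recording_id, rating) pairs into one mean rating per recording."""
    by_recording = {}
    for recording_id, rating in evaluations:
        by_recording.setdefault(recording_id, []).append(rating)
    return {rec_id: mean(ratings) for rec_id, ratings in by_recording.items()}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate for the synthetic signals

    def tone(amplitude):
        # One second of a 440 Hz sine wave standing in for a real recording.
        return [
            amplitude * math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(sample_rate)
        ]

    # Hypothetical recordings and made-up listener evaluations.
    recordings = {"rec_a": tone(0.2), "rec_b": tone(0.5), "rec_c": tone(0.8)}
    evaluations = [
        ("rec_a", 2), ("rec_a", 3),
        ("rec_b", 3), ("rec_b", 4),
        ("rec_c", 5), ("rec_c", 4),
    ]

    targets = mean_rating_per_recording(evaluations)
    ids = sorted(recordings)
    loudness = [rms(recordings[i]) for i in ids]
    slope, intercept = fit_line(loudness, [targets[i] for i in ids])

    # Compare the model's estimate with the observed mean rating.
    for rec_id, x in zip(ids, loudness):
        print(rec_id, round(slope * x + intercept, 2), targets[rec_id])
```

Averaging before fitting reflects the study design of several evaluations per recording. Modeling individual ratings directly, for example with per-listener effects, would be an alternative design.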