The Human Voice Confers Identity

Barbara Schuppler and Martin Hagmüller are working on the human voice at TU Graz – from very different perspectives and with a particular focus on the female voice.

The voice is an important carrier of one's own identity. Image source: AdobeStock

Language and voice create identity. We communicate using words, but communication is about much more than individual words or sentences. Through the way we speak, how our voice sounds and how comfortable we feel, we also convey many things paralinguistically (that is, apart from what we say), such as our self-confidence, our feelings and the presence we project. We can fill spaces with our voice, or lose it because we had to strain too hard to be heard at all. For biological reasons, women and men face very different requirements and challenges here.

Barbara Schuppler and Martin Hagmüller from the Institute of Signal Processing and Speech Communication at Graz University of Technology (TU Graz) work with voice, language and communication, but on different levels. While Barbara Schuppler primarily researches what constitutes speech and dialogue, Martin Hagmüller tries to give people with a laryngeal impairment as personal a voice as possible.

Artificially natural dialogues

Among other things, Barbara Schuppler investigates dialogues between people in order to make conversations with social robots more natural. It is important that the system both understands human language and responds in a natural way. “These models are primarily trained with standard language. And especially with English examples,” says Schuppler. “It is not the gender and therefore the pitch of the speaker’s voice that is decisive for comprehension, but factors such as speaking rate, slips of the tongue and dialects.” Over the past few years, she and her team have built up one of the largest databases of spontaneous speech, recording unscripted conversations between people in different gender combinations. Based on this data, she was able to significantly reduce the error rate of speech recognition systems for dialectal speech.

Barbara Schuppler also analyses how women and men behave differently in conversations: “People behave differently in conversations depending on who they are talking to, and we sometimes adapt more to our dialogue partners and sometimes less,” she explains. “We’ve found out different things in this process. For example, in the recorded data, there was significantly less overlapping speech in the conversations between two women than in those between two men and those between men and women. And there was significantly more laughter in the conversations between women and men than in same-sex conversations.”

If a chatbot is used to practise job interviews, for example, it would be important for the systems of the future to create dialogues that are as realistic as possible. These systems must then be able to structure dialogues accordingly and recognise how much time should be given to which topic and how turn-taking should take place. “We definitely need more research to be able to better adapt such systems to the social gender of the person they are talking to and make the dialogues more pleasant for everyone.”

A personal voice that matches your own identity

“Laryngeal diseases affect far fewer women than men, but the effects are usually worse for women,” explains Martin Hagmüller. People who no longer have a larynx after an illness have only limited options for speaking. What all these options have in common is that the generated voice rarely sounds natural. And definitely not female. “For example, there is an electrolarynx that works very well. But because this problem mainly affects men, it is designed for low frequencies and also sounds very robotic. As we have observed in our projects, this is particularly bad for women because this new voice doesn’t match their physical identity at all.” The problem is that, because so few women are affected, there has been little research and development into systems for female voices. Hagmüller and his team now want to change this using a new system that could make female and male voices more natural. “We want to use recordings of our own voices to reproduce them realistically.” In these times of social media and audio messages left on phones, such recordings very often exist, and the researchers want to use them for this purpose. “We want to achieve direct voice conversion without major latency. And this is precisely the big challenge. The project is just starting and we are excited to see what we can achieve.”

In addition to direct voice conversion, there is another goal. The project aims to make it possible to predict the voice colour or timbre that can be achieved after an upcoming operation or other treatment. This could make it easier to decide in favour of or against a particular intervention.

Female and male voices in the room

The researchers explain that the voice gives us identity and charisma, especially when acoustic conditions are inadequate. “Every sound consists of a fundamental frequency and multiples of this frequency. If my fundamental frequency is low at 100 Hertz, then I also have components of my voice at 200, 300 and 400 Hertz. But if my fundamental frequency is higher, say at 300 Hertz, then I only have components at 300, 600 and 900 Hertz,” says Martin Hagmüller, explaining the fundamentals. What does this mean? Higher voices, typically female voices, carry less energy in the room, and it takes considerably more power and effort to fill an entire room acoustically. And the consequences? The speaker sounds strained and becomes slightly hoarse. This in turn influences the effect, the charisma and self-confidence that the person speaking exudes. This also means that acoustically poorly planned rooms or a lack of microphones are more detrimental to the health of women’s vocal cords. A problem that can be easily solved. Martin Hagmüller: “Better acoustics or microphones, even in small rooms, fundamentally help both genders when speaking. It’s always much less strenuous and a better listening experience for the audience.”
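
The arithmetic behind the harmonic series described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `harmonics` and the 1000 Hertz upper limit are chosen here for the example and do not come from the researchers, and the sketch counts idealised harmonics only. It does not model how much energy each component carries or how a room responds.

```python
def harmonics(f0_hz, max_hz):
    """Return the harmonic frequencies k * f0 (k = 1, 2, ...) up to max_hz.

    Hypothetical helper for illustration: an idealised harmonic series,
    not a model of a real voice or of room acoustics.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    count = int(max_hz // f0_hz)  # number of harmonics that fit below the limit
    return [k * f0_hz for k in range(1, count + 1)]


# Assumed example values: a low voice at 100 Hz and a higher voice at 300 Hz,
# compared over the same (arbitrarily chosen) range up to 1000 Hz.
low_voice = harmonics(100, 1000)
high_voice = harmonics(300, 1000)

print(low_voice)   # [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(high_voice)  # [300, 600, 900]
```

In the same frequency range, the lower voice has ten components and the higher voice only three, which is the spacing effect the quote describes.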

Contact

Barbara SCHUPPLER
Ass.Prof. Mag.rer.nat. Dr.
b.schuppler@tugraz.at

Martin HAGMÜLLER
Dipl.-Ing. Dr.techn.
hagmueller@tugraz.at

Institute of Signal Processing and Speech Communication
Inffeldgasse 16c | 8010 Graz
