A team of researchers at the University of Eastern Finland says voice biometrics are vulnerable to spoofing attacks. The research warns that the vulnerability of speaker recognition systems poses significant security concerns.
Nowadays, mobile devices are increasingly equipped with applications that function with voice commands, the researchers note. The user is able to dictate messages, translate phrases and run search queries by voice alone. The widespread use of electronic services has increased the demand for applications that use voice to recognize the speaker, either for authentication purposes or for public safety. However, as voice applications grow more popular, their misuse may also increase.
Attacks against speaker recognition can be carried out using technical means, such as voice conversion, speech synthesis and replay attacks, the researchers say. The scientific community is systematically developing techniques and countermeasures against technically generated attacks. However, voice modifications produced by a human, such as impersonation and voice disguise, cannot be easily detected with existing countermeasures.
"Voice impersonation is common in the entertainment industry where professionals and amateurs are able to copy voice characteristics and speech behavior of other speakers, usually public figures," according to the research. "An easier way of voice modification is voice disguise where speakers modify their voices to avoid being recognized as themselves. The latter type of modification is common in situations that do not require face-to-face communications and may vary from innocent prank calls to crimes such as blackmailing or threatening calls. Consequently, this issue prompts an interest to improve the robustness of speaker recognition against human-induced voice modifications."
The study analyzed speech from two professional impersonators who mimicked eight Finnish public figures. Additionally, the study of voice disguise included acted speech from 60 Finnish speakers who participated in two recording sessions. The speakers were asked to modify their voices to fake their age, attempting to sound like an old person and like a child. The study found that impersonators were able to fool automatic systems and listeners when mimicking some speakers. In the case of acted speech, a successful strategy for voice modification was to sound like a child, as both automatic systems' and listeners' performance degraded with this type of disguise.