Ever wish you could automatically dub foreign film dialogue into another tongue? Amazon is on the case. In a paper published this week on the preprint server arXiv.org, researchers from the tech giant detailed a novel “speech-to-speech” pipeline that taps AI to align translated speech with original speech and fine-tune speech duration before adding background noise and reverberation. They say that it improves the perceived naturalness of dubbing and that their results highlight the relative importance of each proposed step.
As the paper’s coauthors note, automatic dubbing involves transcribing speech to text, translating that text into another language, and then generating speech from the translated text. The challenge isn’t simply conveying the content of the source audio, but also matching the original timbre, emotion, duration, prosody (i.e., patterns of rhythm and sound), background noise, and reverberation.
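To make the three stages concrete, here is a minimal Python sketch of a transcribe, translate, synthesize chain. It is an illustration only, not Amazon's implementation: the `DubbingPipeline` class and the `fake_asr`, `fake_mt`, and `fake_tts` stand-ins are hypothetical names invented for this example, and the sketch leaves out the alignment, duration, noise, and reverberation steps the researchers describe.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Hypothetical three-stage dubbing chain; each stage is a pluggable callable."""
    transcribe: Callable[[bytes], str]   # source audio -> source-language text
    translate: Callable[[str], str]      # source-language text -> target-language text
    synthesize: Callable[[str], bytes]   # target-language text -> target-language audio

    def dub(self, source_audio: bytes) -> bytes:
        text = self.transcribe(source_audio)
        translated = self.translate(text)
        return self.synthesize(translated)


# Toy stand-ins (assumptions) so the sketch runs end to end.
# A real system would plug in speech recognition, machine translation,
# and text-to-speech models here.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    return {"hello": "ciao"}.get(text, text)


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = DubbingPipeline(fake_asr, fake_mt, fake_tts)
dubbed = pipeline.dub(b"hello")
```

Keeping each stage as a separate callable mirrors the way the article describes the process, and it shows why the hard part lies elsewhere: nothing in this chain preserves timbre, emotion, duration, or prosody from the source audio.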