AI could make dodgy lip sync dubbing
a thing of the past
August 20, 2018
Researchers have developed a system using artificial intelligence that can edit the
facial expressions of actors to accurately match dubbed voices, saving
time and reducing costs for the film industry. It can also be used to
correct gaze and head pose in video conferencing, and enables new
possibilities for video postproduction and visual effects.
The technique was developed by an international team led by a group from
the Max Planck Institute for Informatics and including researchers from
the University of Bath, Technicolor, TU Munich and Stanford University.
The work, called Deep Video Portraits, was presented for the first time
at the SIGGRAPH 2018 conference in Vancouver on 16th August.
Unlike previous methods that are focused on movements of the face
interior only, Deep Video Portraits can also animate the whole face
including eyes, eyebrows, and head position in videos, using controls
known from computer graphics face animation. It can even synthesise a
plausible static video background if the head is moved around.
Hyeongwoo Kim from the Max Planck Institute for Informatics explains:
"It works by using model-based 3D face performance capture to record the
detailed movements of the eyebrows, mouth, nose, and head position of
the dubbing actor in a video. It then transposes these movements onto
the 'target' actor in the film to accurately sync the lips and facial
movements with the new audio."
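As a rough illustration of the transfer step Kim describes, the idea can be sketched as copying motion parameters from the dubbing actor onto the target actor while leaving the target's own face shape untouched. This is a minimal sketch under stated assumptions, not the researchers' code: the names FaceParams, transfer_motion and transfer_sequence are hypothetical, the parameter layout is invented for illustration, and the step that renders the edited parameters back into video frames is omitted.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class FaceParams:
    """Hypothetical per-frame face parameters (illustrative layout only)."""
    identity: Tuple[float, ...]            # face shape, specific to one person
    expression: Tuple[float, ...]          # eyebrow, mouth and similar movements
    head_pose: Tuple[float, float, float]  # head rotation angles
    eye_gaze: Tuple[float, float]          # gaze direction


def transfer_motion(source: FaceParams, target: FaceParams) -> FaceParams:
    """Keep the target's identity; take expression, pose and gaze from the source."""
    return replace(
        target,
        expression=source.expression,
        head_pose=source.head_pose,
        eye_gaze=source.eye_gaze,
    )


def transfer_sequence(
    source_frames: List[FaceParams], target_frames: List[FaceParams]
) -> List[FaceParams]:
    """Apply the transfer frame by frame; stops at the shorter sequence."""
    return [transfer_motion(s, t) for s, t in zip(source_frames, target_frames)]
```

In this sketch the dubbing actor's frames would be passed as source_frames and the film actor's frames as target_frames; the result describes the film actor's face performing the dubbing actor's movements.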
The research is currently at the proof-of-concept stage and does not yet
run in real time; however, the researchers anticipate the approach could
make a real difference to the visual entertainment industry.
Professor Christian Theobalt, from the Max Planck Institute for
Informatics, said: "Despite extensive post-production manipulation,
dubbing films into foreign languages always presents a mismatch between
the actor on screen and the dubbed voice.
"Our new Deep Video Portrait approach enables us to modify the
appearance of a target actor by transferring head pose, facial
expressions, and eye motion with a high level of realism."
Co-author of the paper, Dr Christian Richardt, from the University of
Bath's motion capture research centre CAMERA, adds: "This technique
could also be used for post-production in the film industry where
computer graphics editing of faces is already widely used in today's
feature films."
A great example is 'The Curious Case of Benjamin Button' where the face
of Brad Pitt was replaced with a modified computer graphics version in
nearly every frame of the movie. This work remains a very time-consuming
process, often requiring many weeks of work by trained artists.
"Deep Video Portraits shows how such a visual effect could be created
with less effort in the future. With our approach even the positioning
of an actor's head and their facial expression could be easily edited to
change camera angles or subtly change the framing of a scene to tell the
story better."
The approach can also be used in other applications, which the authors
show on their project website, for instance in video and VR
teleconferencing, where it can correct gaze and head pose so that a more
natural conversation setting is achieved. The
software enables many new creative applications in visual media
production, but the authors are also aware of the potential for misuse of
modern video editing technology.
Michael Zollhöfer, from Stanford University, explains: "The media
industry has been touching up photos with photo-editing software for
many years, meaning most of us have learned to take what we see in
photos with a pinch of salt. With ever improving video editing
technology, we must also start being more critical about the video
content we consume every day, especially if there is no proof of origin.
We believe that the field of digital forensics should and will receive a
lot more attention in the future to develop approaches that can
automatically prove the authenticity of a video clip. This will lead to
ever better approaches that can spot such modifications even if we
humans might not be able to spot them with our own eyes."
To address this, the research team is using the same technology to
develop, in tandem, neural networks trained to detect synthetically
generated or edited video with high precision, making forgeries easier
to spot. The authors have no plans to make the software publicly
available but state that any software implementing the many creative use
cases should include watermarking schemes to clearly mark modifications.