AI and Deepfake Videos: Speech Synthesis and Lip Sync Model

Deepfake videos enable an Asian news broadcaster to bolster its expansion strategy with minimal investment

Designing an AI model for automated lip synchronization and image synthesis.

The Client: Asian broadcaster

An Asian broadcaster with 1.3 billion viewers across 173 countries and channels spanning entertainment, news, and sports wanted to expand its reach. It needed an out-of-the-box solution to deliver current affairs programs to audiences watching content in 12+ regional languages.

Top Benefits

  • Lower costs for video campaigns
  • Omnichannel content
  • Hyper-personalized content for the audience

Executive Summary

Industry Overview: Disruption

In the post-pandemic ecosystem, with changing consumer habits, the industry is likely to focus on cost efficiency, revenue enhancement opportunities, and profit protection through greater technology integration. Over a four-to-five-year period, global revenue is projected to grow at a 4.5% CAGR. Over the same period, Asia is expected to grow at a 17% CAGR and India at an 11% CAGR, reaching INR 4.5 trillion by 2023.
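CAGR figures compound year over year, so the gap between regional rates widens quickly. The minimal sketch below (function names are illustrative) converts each quoted rate into the total growth multiple it implies over four and five years. It describes relative growth only and says nothing about absolute market sizes.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` (e.g. 0.17 for 17%) for `years` years."""
    return (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Inverse of growth_multiple: the annual rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Rates quoted in the industry overview above.
    rates = {"Global": 0.045, "India": 0.11, "Asia": 0.17}
    for region, rate in rates.items():
        multiples = ", ".join(
            f"{years} yrs: x{growth_multiple(rate, years):.2f}" for years in (4, 5)
        )
        print(f"{region} ({rate:.1%} CAGR) -> {multiples}")
```

For example, a 17% CAGR sustained for five years roughly doubles revenue (about 2.19x), whereas 4.5% over the same period yields about 1.25x.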

Business Challenge: Customer Experience

This prominent English-language news brand wanted to expand into the fast-growing regional market and establish itself as a premier regional news source. It wanted to broadcast news programs focused on local events and to test its regional expansion strategy by reusing newsroom footage with AI-powered synthetic speech and lip sync.

The Akaike Edge

Built-in libraries and deep learning (DL) models with transfer learning capabilities

Impact Delivered

  • Solution deployed in 12 regional languages
  • More than 260,000 videos processed

Blend of Vision AI and Deep Learning

Text-to-Speech (TTS) and Video Synthesis

The broadcaster had more than 260,000 hours of video in its archives. To maximize the reuse of the client's media assets, a few anchors from the newsroom's panel were selected from the available footage.
Study and analysis of currently available video footage.

Post video selection

After video selection, an AI recipe for image synthesis and automated lip synchronization was developed, blending Computer Vision, Deep Learning, and Generative Adversarial Network (GAN) technology.
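The case study does not specify which GAN architecture or losses were used. For reference only, the original GAN formulation trains a generator G and a discriminator D against each other on the minimax objective below, where x is a real sample (here, a real video frame), z is the generator's input, and D(·) is the discriminator's estimated probability that its input is real.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

Lip-sync models commonly replace the random input z with conditioning information, such as an audio segment and reference frames of the target face, so that G learns to generate mouth movements matching the speech. This generic form omits that conditioning and any additional reconstruction or synchronization losses.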

Custom Speech Solution Built for the Speaking-Face Video

The custom solution converted written text into natural-sounding speech, using deep neural networks trained on human speech to generate expressive, human-like output. The target speech segment was then accurately adapted to the video of a speaking face using a GAN.
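Adapting a synthesized speech segment to a speaking-face video requires pairing each video frame with the slice of audio spoken during it. The sketch below assumes 25 fps video, 16 kHz audio, a mel-spectrogram hop of 200 samples (80 mel frames per second), and a 16-mel-frame window per video frame. These settings are common in open-source lip-sync work but are not stated in this case study, and all names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AVConfig:
    """Assumed audio/video rates; the case study does not state the real ones."""
    fps: float = 25.0          # video frames per second
    sample_rate: int = 16_000  # audio samples per second
    hop_length: int = 200      # audio samples between consecutive mel frames

    @property
    def mel_fps(self) -> float:
        """Mel-spectrogram frames per second (80.0 with the defaults)."""
        return self.sample_rate / self.hop_length


DEFAULT = AVConfig()


def samples_for_frame(frame_idx: int, cfg: AVConfig = DEFAULT) -> tuple[int, int]:
    """Raw audio sample range [start, end) spoken during one video frame."""
    start = round(frame_idx * cfg.sample_rate / cfg.fps)
    end = round((frame_idx + 1) * cfg.sample_rate / cfg.fps)
    return start, end


def mel_window_for_frame(
    frame_idx: int, window: int = 16, cfg: AVConfig = DEFAULT
) -> tuple[int, int]:
    """Mel-frame range [start, end) of length `window`, starting at the frame's timestamp."""
    start = int(cfg.mel_fps * frame_idx / cfg.fps)
    return start, start + window


def frame_audio_windows(
    n_video_frames: int, n_mel_frames: int, window: int = 16, cfg: AVConfig = DEFAULT
) -> list[tuple[int, int]]:
    """Pair every video frame with its mel window, stopping once the speech runs out."""
    windows = []
    for i in range(n_video_frames):
        start, end = mel_window_for_frame(i, window, cfg)
        if end > n_mel_frames:
            break  # windows only move forward, so every later frame would overflow too
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Two seconds of video (50 frames) against two seconds of speech (160 mel frames).
    pairs = frame_audio_windows(n_video_frames=50, n_mel_frames=160)
    print(len(pairs), pairs[:3], pairs[-1])
```

In a GAN-based lip-sync pipeline, each window would typically be fed to the generator alongside the corresponding face frame. A wider window gives the generator more phonetic context around each frame. When the synthesized speech is shorter than the source clip, the trailing frames get no audio window, so the clip would need to be trimmed or the speech re-timed.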
The solution is custom-built for different faces using deepfake AI methods.
Author: Tarun Lohchab
Date: September 19, 2022