Video manipulation keeps getting easier, especially with advances in artificial intelligence (AI). Researchers from the Max Planck Institute for Informatics, Princeton University, and Stanford University, in collaboration with Adobe, have created a new algorithm that changes what a person says in a video simply by editing the transcript text.
The speaker's characteristics are preserved while the video is altered. First, the algorithm analyzes the phonemes and pronunciation of the words in the original video, then builds a model so that the speaker's mouth reproduces the corresponding movements.
After the transcript is edited, the algorithm searches the footage for segments containing the lip movements for the new words. Those movements then replace the original ones in the edited portion.
A replaced portion can contain pauses and distortions, so the algorithm smooths the result to make it look fluid and natural.
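For readers who want a feel for the search-and-smooth idea described above, here is a minimal Python sketch. It is not the researchers' implementation: the phoneme track format, the function names `find_matching_span` and `crossfade`, and the use of a simple linear blend are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical format: one (phoneme, start_sec, end_sec) entry per sound
# spoken in the source video, in time order.
Segment = Tuple[str, float, float]


def find_matching_span(track: List[Segment],
                       target: List[str]) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first run whose phonemes equal `target`."""
    n = len(target)
    if n == 0:
        return None
    for i in range(len(track) - n + 1):
        if [p for p, _, _ in track[i:i + n]] == target:
            return track[i][1], track[i + n - 1][2]
    return None  # the speaker never produced this exact phoneme sequence


def crossfade(a: List[float], b: List[float], overlap: int) -> List[float]:
    """Join two mouth-parameter curves, linearly blending `overlap` samples."""
    overlap = max(0, min(overlap, len(a), len(b)))
    head = a[:len(a) - overlap]
    blended = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # weight ramps from a toward b
        blended.append((1 - w) * a[len(a) - overlap + k] + w * b[k])
    return head + blended + b[overlap:]


if __name__ == "__main__":
    track = [("HH", 0.0, 0.1), ("AH", 0.1, 0.2), ("L", 0.2, 0.3), ("OW", 0.3, 0.5)]
    print(find_matching_span(track, ["L", "OW"]))
    print(crossfade([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 1))
```

The first function stands in for the search step (finding footage where the speaker already made the needed sounds), and the second for the smoothing step that hides the seam between old and borrowed mouth movements.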
Currently, the algorithm needs at least 40 minutes of original video for training. A video (featured below) has also been released in which Stanford’s Ohad Fried explains how phrases can be changed easily while maintaining the footage's quality.
The biggest drawback of this technology is that people may spread misinformation or fake videos by editing the speeches of politicians or other influential people. However, Fried points out that photo-editing software allows the same kind of manipulation, and we have been living with it.
On the other hand, Fried described positive uses: the technique can save the time of re-shooting when a speaker makes a minor fumble, or when any other footage of people talking needs a correction.
He also advises using various methods to distinguish original content from manipulated content, such as adding watermarks, and suggests that researchers can develop ways to check whether a video has been edited. Improved forensics, such as digital or non-digital fingerprinting techniques, can help identify manipulation in videos.
Researchers hope this algorithm will be used positively, but every technology comes with its downsides.