Abstract:
With accurate, speaker-independent lip-sync, we synthesize several high-quality videos and composite them to generate the expected target video clip. In this work we review related approaches to lip-syncing, out-of-sync correction, talking-face generation, and speaker-independent video content creation from an input audio stream, noting their limitations and failure cases, and we achieve improved lip-sync by training models that do not rely on a specific speaker. We further identify the key causes of the aforementioned problems and address the difficult factors with new evaluation strategies, producing improved outputs in the manner of the Wav2Lip model. Moreover, pairing any individual's speaking video with any voice or input audio clip through a mapping RNN yields a notably more realistic, well-matched lip-sync appearance, since the model can generate proper mouth texture.