Abstract:
The main objective of this project is to generate a human face from a given textual description. This task is a sub-domain of text-to-image synthesis and has potential applications in public safety. The project is based on deep learning and uses two recent architectures, StackGAN and ProGAN, to synthesize images from the given description. We use a dataset named 'Face2Text', which contains 400 facial images, each paired with textual captions. The key idea is to grow both the generator and the discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. The final generated output is of modest quality but is consistent with the given description.
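The progressive-growing schedule described above can be sketched as follows. This is a minimal, framework-free illustration (the class name and block labels are hypothetical); a real ProGAN implementation uses convolutional blocks and a fade-in coefficient that smoothly blends in each newly added layer.

```python
# Hypothetical sketch of ProGAN's progressive-growing schedule.
# Both generator and discriminator start at a low resolution and
# gain one new block each time the output resolution doubles.

class ProgressiveNet:
    def __init__(self, start_res=4, target_res=64):
        self.res = start_res          # current output resolution
        self.target_res = target_res
        self.blocks = ["block_4x4"]   # initial low-resolution block

    def grow(self):
        """Double the resolution by appending a new block; return False when done."""
        if self.res >= self.target_res:
            return False
        self.res *= 2
        self.blocks.append(f"block_{self.res}x{self.res}")
        return True

# Training proceeds in phases: train at each resolution, then grow.
net = ProgressiveNet(start_res=4, target_res=64)
schedule = [net.res]
while net.grow():
    schedule.append(net.res)

print(schedule)  # resolutions visited in order: [4, 8, 16, 32, 64]
```

In the actual architecture, each `grow()` step would also linearly fade the new block in over many training iterations so the network is not disrupted by the sudden change in capacity.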