Advances in artificial intelligence research, particularly in computer vision,
have enabled previously unimaginable applications, such as generating new
content from text descriptions. In this work, we focus on text-to-image
synthesis (TIS), which transforms descriptive sentences into realistic images.
Our goal is to generate probable human faces from descriptions provided by
eyewitnesses, in order to assist law enforcement in their investigations. To
this end, we use unsupervised deep learning networks that can generate
high-quality images from text descriptions.
We analyzed several existing approaches and selected the one that performed
best: the Deep Fusion Generative Adversarial Network (DF-GAN), which outperforms
its peers on several criteria, such as the quality of the generated images and
their fidelity to the given text description. We train our model on the CelebA
dataset, paired with text descriptions that our algorithm generates from the
dataset's existing facial attributes. The results of our implementation show
that the learned generative model achieves excellent quantitative and visual
performance: it generates realistic and diverse human faces and produces
complete portraits that respect the given text descriptions.
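The abstract does not specify how the text descriptions are generated from the
CelebA attributes. The following is a minimal template-based sketch of that
kind of step, assuming attributes are encoded as +1 (present) and -1 (absent),
as in CelebA's attribute annotations. The phrase tables, the sentence template
and the function name `attributes_to_sentence` are hypothetical illustrations,
not the authors' algorithm; only the attribute names themselves come from
CelebA.

```python
from typing import Dict, List

# Hypothetical phrase tables: map a subset of CelebA attribute names to
# descriptive phrases. The wording is an illustrative assumption.
HAIR_PHRASES = {
    "Black_Hair": "black hair",
    "Blond_Hair": "blond hair",
    "Brown_Hair": "brown hair",
    "Gray_Hair": "gray hair",
}
FEATURE_PHRASES = {
    "Eyeglasses": "eyeglasses",
    "Mustache": "a mustache",
    "Goatee": "a goatee",
    "Bangs": "bangs",
    "Wearing_Hat": "a hat",
    "Heavy_Makeup": "heavy makeup",
}


def attributes_to_sentence(attrs: Dict[str, int]) -> str:
    """Build one description from CelebA-style attributes (+1 present, -1 absent)."""
    present = {name for name, value in attrs.items() if value == 1}

    # Subject phrase from the "Male" and "Young" attributes.
    subject = "man" if "Male" in present else "woman"
    age = "young" if "Young" in present else "older"
    article = "An" if age == "older" else "A"
    sentence = f"{article} {age} {subject}"

    # Collect hair phrases first, then other facial features, in table order.
    details: List[str] = [p for name, p in HAIR_PHRASES.items() if name in present]
    details += [p for name, p in FEATURE_PHRASES.items() if name in present]
    if details:
        if len(details) == 1:
            joined = details[0]
        else:
            joined = ", ".join(details[:-1]) + " and " + details[-1]
        sentence += f" with {joined}"

    if "Smiling" in present:
        sentence += ", smiling"
    return sentence + "."


if __name__ == "__main__":
    example = {"Male": 1, "Young": 1, "Black_Hair": 1, "Eyeglasses": 1, "Smiling": 1}
    print(attributes_to_sentence(example))
```

Run on the example above, the sketch prints "A young man with black hair and
eyeglasses, smiling." A fixed template keeps descriptions consistent with the
labels; a real pipeline would likely randomize phrasing and word order so the
text encoder sees more varied sentences per attribute combination.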