This document proposes AttnGAN, an Attentional Generative Adversarial Network for fine-grained text-to-image generation. AttnGAN stacks multiple generators, each synthesizing an image at a higher resolution than the previous stage. At each stage, an attention model lets the generator focus on the words most relevant to the image region being drawn. In addition, a Deep Attentional Multimodal Similarity Model (DAMSM) computes a fine-grained image-text matching loss that is used to train the generators. Experiments show that AttnGAN significantly outperforms prior text-to-image methods on the CUB and COCO benchmarks.
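To make the word-to-region attention concrete, the following is a minimal PyTorch-style sketch of that step: each image region attends over the words of the caption and receives a word-context vector. The module name, tensor names, and dimensions are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordRegionAttention(nn.Module):
    """Sketch of word-to-region attention (illustrative, not the authors' code)."""

    def __init__(self, word_dim: int, region_dim: int):
        super().__init__()
        # Project word features into the image-region feature space.
        self.proj = nn.Linear(word_dim, region_dim)

    def forward(self, word_emb: torch.Tensor, region_feat: torch.Tensor) -> torch.Tensor:
        # word_emb:    (batch, T, word_dim)   -- word features from a text encoder
        # region_feat: (batch, N, region_dim) -- hidden image-region features
        words = self.proj(word_emb)                             # (batch, T, region_dim)
        # Similarity between every image region and every word.
        scores = torch.bmm(region_feat, words.transpose(1, 2))  # (batch, N, T)
        # Each region attends over the words of the caption.
        attn = F.softmax(scores, dim=-1)                        # (batch, N, T)
        # Word-context vector per region: attention-weighted sum of word features.
        context = torch.bmm(attn, words)                        # (batch, N, region_dim)
        return context

# Usage sketch: the context vectors would be combined with the region features
# and fed to the next-stage generator. Shapes below are purely illustrative.
attn = WordRegionAttention(word_dim=256, region_dim=128)
word_emb = torch.randn(2, 18, 256)          # 18 words per caption
region_feat = torch.randn(2, 64 * 64, 128)  # 64x64 spatial grid, flattened
context = attn(word_emb, region_feat)       # (2, 4096, 128)
```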