Text-to-Image Generation is the task of generating an image conditioned on the input text.
An extension to LXMERT with three training refinements: discretizing visual representations, using uniform masking with a large range of masking ratios, and aligning the right pre-training datasets to the right objectives. Together, these refinements enable the model to paint, that is, to generate images from text.
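To make the uniform masking refinement more concrete, here is a minimal sketch in plain Python. It assumes the visual representation has already been discretized into a flat list of integer codes. The function name `uniform_mask`, the `MASK` placeholder value, and the ratio bounds of 0.1 and 1.0 are illustrative assumptions, not details taken from the model's actual implementation.

```python
import random

MASK = -1  # hypothetical placeholder id standing in for a [MASK] token


def uniform_mask(codes, min_ratio=0.1, max_ratio=1.0, rng=None):
    """Mask a random subset of discrete visual codes.

    The masking ratio is itself drawn uniformly from [min_ratio, max_ratio]
    for every call, rather than being fixed at one small value. The bounds
    here are illustrative assumptions.
    """
    rng = rng or random.Random()
    ratio = rng.uniform(min_ratio, max_ratio)
    # Mask at least one position, but never more than the sequence holds.
    n_mask = min(len(codes), max(1, round(ratio * len(codes))))
    positions = set(rng.sample(range(len(codes)), n_mask))
    masked = [MASK if i in positions else c for i, c in enumerate(codes)]
    return masked, sorted(positions)


# 64 hypothetical cluster ids, e.g. a flattened grid of discretized features.
_demo_rng = random.Random(0)
grid_codes = [_demo_rng.randrange(1000) for _ in range(64)]

masked_codes, masked_positions = uniform_mask(grid_codes, rng=random.Random(1))
print(f"masked {len(masked_positions)} of {len(grid_codes)} positions")
```

Because the ratio can reach 1.0, some training examples under this scheme would have every visual position masked, which resembles the generation setting where the image must be predicted from the text alone.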