Visual Question Answering (VQA) is the task of generating an answer to a natural language question about the contents of an image. VQA models are typically trained and evaluated on datasets such as VQA 2.0, GQA, Visual7W, and VizWiz.
This is a modular re-implementation of the bottom-up top-down (up-down) model (Anderson et al.), with subtle but important changes: a modified model architecture and learning rate schedule, fine-tuned image features, and added data augmentation. This model was the winning entry to the 2018 VQA Challenge.
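To give a feel for the core mechanism, here is a rough NumPy sketch of the top-down attention step at the heart of the up-down model: image-region features produced by a bottom-up detector are scored against the encoded question, and a softmax over those scores yields a single attended image vector. The single-layer elementwise-product scorer below is a hypothetical simplification; the actual model uses learned projections and gated nonlinearities.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def top_down_attention(region_feats, question_emb, w):
    # region_feats: (k, d) bottom-up features, one row per detected region
    # question_emb: (d,) encoded question vector
    # w: (d,) weights of a hypothetical single-layer scorer
    fused = region_feats * question_emb   # elementwise fusion, shape (k, d)
    scores = fused @ w                    # one scalar relevance score per region
    alpha = softmax(scores)               # attention weights summing to 1
    v_hat = alpha @ region_feats          # attended image feature, shape (d,)
    return alpha, v_hat

# Toy usage: 36 regions with 512-dim features, as in common up-down setups.
rng = np.random.default_rng(0)
k, d = 36, 512
feats = rng.standard_normal((k, d))
q = rng.standard_normal(d)
w = rng.standard_normal(d)
alpha, v_hat = top_down_attention(feats, q, w)
```

The attended vector `v_hat` would then be fused with the question embedding and passed to an answer classifier over the candidate-answer vocabulary.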