Grounded Situation Recognition is the task of identifying the situation observed in an image and visually grounding the identified roles within that image.
JSL is a method that simultaneously classifies a situation and localizes the objects in that situation. Predicting both jointly allows a role’s noun and grounding to be conditioned on the verb and on the nouns and groundings of previous roles. It also allows features to be shared and potential patterns between nouns and their positions to be exploited.
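
The conditioning described above can be illustrated with a minimal sketch. This is not the JSL architecture itself: `decode_roles`, `toy_predictor`, the box format, and the example verb, roles, and nouns are all hypothetical, chosen only to show how each role's noun and grounding can depend on the verb and on the roles predicted before it.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]
# One prediction per role: (role, noun, box); box is None if the role is not grounded.
Prediction = Tuple[str, str, Optional[Box]]
# A predictor receives the verb, the current role, and all earlier predictions.
Predictor = Callable[[str, str, Sequence[Prediction]], Tuple[str, Optional[Box]]]


def decode_roles(verb: str, roles: List[str], predict_role: Predictor) -> List[Prediction]:
    """Predict roles one at a time, feeding earlier predictions to later ones."""
    history: List[Prediction] = []
    for role in roles:
        # Each step is conditioned on the verb and on the previous nouns and groundings.
        noun, box = predict_role(verb, role, tuple(history))
        history.append((role, noun, box))
    return history


def toy_predictor(verb: str, role: str, history: Sequence[Prediction]):
    """Hand-written stand-in for a learned model (assumption, for illustration only)."""
    if role == "agent":
        return "person", (10.0, 20.0, 110.0, 220.0)
    if role == "tool":
        # Use the agent's grounding, if available, to place the tool near the agent.
        agent_box = next((b for r, _, b in history if r == "agent"), None)
        if agent_box is None:
            return "knife", None
        x1, y1, x2, y2 = agent_box
        return "knife", (x2 - 30.0, y1 + 50.0, x2, y1 + 80.0)
    return "unknown", None


result = decode_roles("cutting", ["agent", "tool"], toy_predictor)
for role, noun, box in result:
    print(role, noun, box)
```

In this toy setup the tool's box is derived from the agent's box, so the tool is left ungrounded when no agent has been predicted first. A learned model would replace the hand-written rules with a network that consumes the same history.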