Object detection is the task of identifying and locating objects in an image. Object detection models are typically trained and evaluated on the MS-COCO dataset that has 80 object categories.
As the name “You Only Look Once” suggests, this is a standard single shot model. Only examining the features of the image once allows this model to detect objects incredibly quickly. While this comes at a small cost to accuracy, Yolo is nearly comparable in terms of accuracy to the larger two stage detectors.
This model is an example of a two stage detector, meaning it first selects regions which are likely to contain objects and then locates the objects within these regions. However, unlike some of its predecessors, Faster R-CNN is able to maintain a reasonable speed by sharing image features between its two stages. While much slower than single shot detector like yolo, it has a slightly higher accuracy.