Scale Normalized Image Pyramids with AutoFocus for Object Detection

Singh, Bharat, Najibi, Mahyar, Sharma, Abhishek, Davis, Larry S.

Feb-10-2021–arXiv.org Artificial Intelligence

We present an efficient foveal framework for object detection. A scale normalized image pyramid (SNIP) is generated that, like human vision, attends only to objects within a fixed size range at each scale. Restricting object sizes in this way during training affords better learning of object-sensitive filters and therefore yields better accuracy. However, the use of an image pyramid increases the computational cost. Hence, we propose an efficient spatial sub-sampling scheme that operates only on fixed-size sub-regions likely to contain objects (as object locations are known during training). The resulting approach, referred to as Scale Normalized Image Pyramid with Efficient Resampling, or SNIPER, yields up to a 3 times speed-up during training. Unfortunately, as object locations are unknown during inference, the entire image pyramid still needs processing. To address this, we adopt a coarse-to-fine approach and predict the locations and extent of object-like regions, which are then processed at successive scales of the image pyramid. Intuitively, this is akin to active human vision, which first skims over the field of view to spot interesting regions for further processing and only recognizes objects at the right resolution. The resulting algorithm is referred to as AutoFocus and yields a 2.5-5 times speed-up during inference when used with SNIP.
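The scale-normalization idea in the abstract (each pyramid level attends only to objects whose rescaled size falls in a fixed range) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `valid_objects_per_scale`, the use of the square root of box area as the size measure, and the example scales and size range are all assumptions chosen for illustration.

```python
from math import sqrt


def valid_objects_per_scale(boxes, scales, size_range):
    """Assign each box to the pyramid scales at which it is 'in range'.

    boxes:      list of (x1, y1, x2, y2) in original-image pixels.
    scales:     list of image rescaling factors (hypothetical values).
    size_range: (lo, hi) bounds on the rescaled box side length
                (hypothetical values; the abstract only says "fixed size range").
    Returns a dict mapping each scale to the indices of boxes kept at that scale.
    """
    lo, hi = size_range
    assignment = {}
    for s in scales:
        keep = []
        for i, (x1, y1, x2, y2) in enumerate(boxes):
            # Size measure (an assumption): sqrt of box area, after rescaling.
            side = sqrt((x2 - x1) * (y2 - y1)) * s
            if lo <= side <= hi:
                keep.append(i)
        assignment[s] = keep
    return assignment


# One small, one medium, and one large object.
boxes = [(0, 0, 20, 20), (0, 0, 100, 100), (0, 0, 400, 400)]
result = valid_objects_per_scale(boxes, scales=[0.25, 1.0, 3.0], size_range=(32, 160))
print(result)  # {0.25: [2], 1.0: [1], 3.0: [0]}
```

In this toy setup the large object is attended to only in the down-sampled image, and the small object only in the up-sampled one, so every level sees objects of roughly the same apparent size.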


