Paper ID | ARS-4.10 | ||
Paper Title | LEARNING REGIONAL ATTENTION OVER MULTI-RESOLUTION DEEP CONVOLUTIONAL FEATURES FOR TRADEMARK RETRIEVAL | ||
Authors | Osman Tursun, Simon Denman, Sridha Sridharan, Clinton Fookes, Queensland University of Technology, Australia | ||
Session | ARS-4: Re-Identification and Retrieval | ||
Location | Area I | ||
Session Time | Wednesday, 22 September, 08:00 - 09:30 | ||
Presentation Time | Wednesday, 22 September, 08:00 - 09:30 | ||
Presentation | Poster | ||
Topic | Image and Video Analysis, Synthesis, and Retrieval: Image & Video Storage and Retrieval | ||
Abstract | Large-scale trademark retrieval is an important content-based image retrieval task. A recent study shows that off-the-shelf deep features aggregated with Regional-Maximum Activation of Convolutions (R-MAC) achieve state-of-the-art results. However, R-MAC suffers in the presence of background clutter/trivial regions and scale variance, and discards important spatial information. We introduce three simple but effective modifications to R-MAC to overcome these drawbacks. First, we propose the use of both sum and max pooling to minimise the loss of spatial information. We also employ domain-specific unsupervised soft-attention to eliminate background clutter and unimportant regions. Finally, we add multi-resolution inputs to enhance the scale-invariance of R-MAC. We evaluate these three modifications on the million-scale METU dataset. Our results show that all modifications bring non-trivial improvements, and surpass previous state-of-the-art results. |
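
The three modifications named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how sum and max pooling are combined, how the soft-attention is computed, or how resolutions are merged. The sketch assumes concatenation of the two pooled vectors, a channel-summed activation map as the unsupervised attention signal, and summation of per-resolution descriptors. The simple grid of regions is also a stand-in for R-MAC's actual region sampling, and all function names are hypothetical.

```python
import numpy as np


def l2_normalise(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)


def grid_regions(h, w, levels=2):
    """Return (y0, y1, x0, x1) windows from an l x l grid at each level.

    Assumption: a plain non-overlapping grid, simpler than R-MAC's
    overlapping square regions.
    """
    regions = []
    for l in range(1, levels + 1):
        ys = np.linspace(0, h, l + 1).astype(int)
        xs = np.linspace(0, w, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                regions.append((ys[i], ys[i + 1], xs[j], xs[j + 1]))
    return regions


def regional_descriptor(fmap, levels=2):
    """Aggregate a (C, H, W) non-negative feature map into a 2C-dim vector."""
    c, h, w = fmap.shape
    # Assumed attention: channel-summed activations rescaled to [0, 1].
    saliency = fmap.sum(axis=0)
    saliency = saliency / (saliency.max() + 1e-12)
    agg = np.zeros(2 * c)
    for y0, y1, x0, x1 in grid_regions(h, w, levels):
        patch = fmap[:, y0:y1, x0:x1]
        max_vec = l2_normalise(patch.max(axis=(1, 2)))  # max pooling
        sum_vec = l2_normalise(patch.sum(axis=(1, 2)))  # sum pooling
        # Soft weight: regions with low mean saliency contribute less.
        weight = saliency[y0:y1, x0:x1].mean()
        agg += weight * np.concatenate([max_vec, sum_vec])
    return l2_normalise(agg)


def multi_resolution_descriptor(fmaps, levels=2):
    """Sum descriptors from feature maps of one image at several resolutions."""
    return l2_normalise(sum(regional_descriptor(f, levels) for f in fmaps))
```

In use, `fmaps` would hold the convolutional feature maps of one trademark image fed to the network at several input sizes; retrieval then ranks database images by the dot product of their unit-norm descriptors with the query descriptor.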