My ICIP 2021 Schedule

Paper Detail

Paper ID MLR-APPL-IVSMR-2.4
Paper Title Zero-Shot Object Detection with Transformers
Authors Ye Zheng, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China; Li Cui, Institute of Computing Technology, Chinese Academy of Sciences, China
Session MLR-APPL-IVSMR-2: Machine learning for image and video sensing, modeling and representation 2
Location Area D
Session Time: Tuesday, 21 September, 15:30 - 17:00
Presentation Time: Tuesday, 21 September, 15:30 - 17:00
Presentation Poster
Topic Applications of Machine Learning: Machine learning for image & video sensing, modeling, and representation
Abstract Deep learning has significantly improved the precision of object detection when abundant labeled data is available. However, collecting and labeling sufficient data is extremely hard. Zero-shot object detection (ZSD), which aims to simultaneously recognize and localize both seen and unseen objects, has been proposed to solve this problem. Recently, the transformer and its variant architectures have shown their effectiveness over conventional methods in many natural language processing and computer vision tasks. In this paper, we study the ZSD task and develop a new framework named zero-shot object detection with transformers (ZSDTR). ZSDTR consists of a head network, a transformer encoder, a transformer decoder, and a vision-semantic-attention tail network. We find that the transformer is very effective at improving the recall of unseen objects, while the tail is used to discriminate between seen and unseen objects. As far as we know, ZSDTR is the first method to use transformers in the ZSD task. Extensive experimental results on various zero-shot object detection benchmarks show that ZSDTR outperforms the current state-of-the-art methods.