IEEE ICIP 2021 || Anchorage, Alaska, USA || 19-22 September 2021

Paper Detail

Paper ID

ARS-9.4

Paper Title

INTER-MODALITY FUSION BASED ATTENTION FOR ZERO-SHOT CROSS-MODAL RETRIEVAL

Authors

Bela Chakraborty, Peng Wang, Lei Wang, University of Wollongong, Australia

Session

ARS-9: Interpretation, Understanding, Retrieval

Location

Area I

Session Time:

Tuesday, 21 September, 13:30 - 15:00

Presentation Time:

Tuesday, 21 September, 13:30 - 15:00

Presentation

Poster

Topic

Image and Video Analysis, Synthesis, and Retrieval: Image & Video Interpretation and Understanding

Abstract

Zero-shot cross-modal retrieval (ZS-CMR) is the task of cross-modal retrieval in which the test categories differ in scope from the training categories. It borrows its intuition from zero-shot learning, which aims to transfer knowledge inferred for seen classes during training to unseen classes at test time. This mimics the real-world scenario in which new object categories continuously populate the multimedia data corpus. Unlike existing ZS-CMR approaches, which use generative adversarial networks (GANs) to generate more data, we propose Inter-Modality Fusion based Attention (IMFA) and a framework, ZS_INN_FUSE (Zero-Shot cross-modal retrieval using INNer product with image-text FUSEd). The framework exploits the rich semantics of textual data as guidance to infer additional knowledge during the training phase. This is achieved by generating attention weights through the fusion of the image and text modalities, so that the model focuses on the important regions in an image. We carefully create zero-shot splits based on the large-scale MS-COCO and Flickr30k datasets to perform experiments. The results show that our method improves on both the ZS-CMR baseline and a self-attention mechanism, demonstrating the effectiveness of inter-modality fusion in a zero-shot scenario.
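The abstract describes attention weights generated by fusing image and text features, followed by inner-product matching, but it does not give the exact formulation. The sketch below is only an illustration of that general idea, not the authors' implementation. It assumes element-wise multiplication as the fusion step, a single hypothetical scoring vector `w`, and random vectors standing in for real region and text features. The function names `fusion_attention` and `inner_product_similarity` are also invented for this example.

```python
import numpy as np


def fusion_attention(regions, text, w):
    """Text-guided attention over image regions (illustrative only).

    regions: (R, D) array of image region features
    text:    (D,)   text embedding
    w:       (D,)   hypothetical scoring vector (would be learned in practice)
    """
    # Assumed fusion: element-wise product of each region with the text vector.
    fused = regions * text                      # shape (R, D)
    scores = fused @ w                          # one score per region, shape (R,)
    scores = scores - scores.max()              # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    attended = weights @ regions                # weighted sum of regions, shape (D,)
    return attended, weights


def inner_product_similarity(image_vec, text_vec):
    """Image-text matching score as a plain inner product."""
    return float(image_vec @ text_vec)


# Random stand-ins for real features: 5 regions, 8-dimensional embeddings.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
text = rng.normal(size=8)
w = rng.normal(size=8)

attended, weights = fusion_attention(regions, text, w)
score = inner_product_similarity(attended, text)
```

Here `weights` sums to one across the regions, so regions whose fused representation scores higher contribute more to the attended image vector that is then matched against the text.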

2021 IEEE International Conference on Image Processing

19-22 September 2021 • Anchorage, Alaska, USA

Imaging Without Borders
