Paper ID | ARS-4.9
Paper Title | SEMANTIC-PRESERVING METRIC LEARNING FOR VIDEO-TEXT RETRIEVAL
Authors | Sungkwon Choo, Seong Jong Ha, Joonsoo Lee, NCSOFT, Republic of Korea
Session | ARS-4: Re-Identification and Retrieval
Location | Area I
Session Time | Wednesday, 22 September, 08:00 - 09:30
Presentation Time | Wednesday, 22 September, 08:00 - 09:30
Presentation | Poster
Topic | Image and Video Analysis, Synthesis, and Retrieval: Image & Video Storage and Retrieval
Abstract | Video-text retrieval requires finding an optimal space in which to compare the similarity of two different modalities. Most approaches adopt a ranking loss as the primary training objective for learning this space. The loss is concerned only with bringing samples annotated as pairs closer together, without considering the semantic relevance of other samples; as a result, even semantically similar pairs may fail to get close. To address this problem, we propose semantic-preserving metric learning. The proposed method yields a metric space in which the similarity ratio between samples is proportional to the semantic relevance between their annotations. Extensive experiments on video-text datasets show that the learned metric space aligns closely with the semantic space, and that the proposed method achieves state-of-the-art retrieval performance.
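The abstract's core idea, that similarity ratios between samples should be proportional to the semantic relevance between annotations, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration, not the paper's actual formulation: it matches the softmax distribution over cross-modal similarities to a normalized relevance distribution via a KL-divergence term. The relevance matrix, the temperature value, and all function names are assumptions introduced here for illustration only.

```python
import torch
import torch.nn.functional as F

def semantic_preserving_loss(video_emb, text_emb, relevance, temperature=0.07):
    """Hypothetical sketch of a semantic-preserving metric objective.

    video_emb : (B, D) video embeddings
    text_emb  : (B, D) text embeddings
    relevance : (B, B) semantic relevance between annotations in [0, 1],
                e.g. a precomputed caption-similarity matrix (assumption;
                the paper's exact relevance measure may differ)
    """
    v = F.normalize(video_emb, dim=1)
    t = F.normalize(text_emb, dim=1)
    sim = v @ t.T  # (B, B) cosine-similarity matrix

    # Turn similarities and relevances into distributions over candidates,
    # then match them: similarity ratios between samples are pushed toward
    # the corresponding semantic-relevance ratios.
    log_p = F.log_softmax(sim / temperature, dim=1)
    q = relevance / relevance.sum(dim=1, keepdim=True)
    return F.kl_div(log_p, q, reduction="batchmean")

# Usage with random tensors:
B, D = 8, 256
video = torch.randn(B, D, requires_grad=True)
text = torch.randn(B, D, requires_grad=True)
rel = torch.rand(B, B)
rel = 0.5 * (rel + rel.T)   # symmetric relevance
rel.fill_diagonal_(1.0)     # each annotated pair is fully relevant to itself
loss = semantic_preserving_loss(video, text, rel)
loss.backward()
```

Unlike a plain ranking loss, which only pulls the annotated pair to the top, this kind of distribution-matching term also rewards placing semantically related non-paired samples at proportionally high similarity.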