Paper ID | ARS-8.2
Paper Title | UNSUPERVISED DISCRIMINATIVE EMBEDDING FOR SUB-ACTION LEARNING IN COMPLEX ACTIVITIES
Authors | Swetha Sirnam, University of Central Florida, United States; Hilde Kuehne, MIT-IBM Watson AI Lab / CVAI Group, Goethe University Frankfurt, United States; Yogesh S Rawat, Mubarak Shah, University of Central Florida, United States
Session | ARS-8: Image and Video Mid-Level Analysis
Location | Area I
Session Time | Monday, 20 September, 13:30 - 15:00
Presentation Time | Monday, 20 September, 13:30 - 15:00
Presentation | Poster
Topic | Image and Video Analysis, Synthesis, and Retrieval: Image & Video Mid-Level Analysis
Abstract | This paper proposes a novel approach for unsupervised sub-action learning in complex activities. The proposed method maps both visual and temporal representations to a latent space where sub-actions are learned discriminatively in an end-to-end fashion. To this end, we model sub-actions as latent concepts, and a novel discriminative latent concept learning (DLCL) module aids in learning them. The DLCL module builds on the idea of latent concepts to learn compact representations in the latent embedding space in an unsupervised way. The result is a set of latent vectors that can be interpreted as cluster centers in the embedding space. Our joint embedding learning with the discriminative latent concept module is novel and eliminates the need for explicit clustering. We validate our approach on three benchmark datasets and show that the proposed combination of visual-temporal embedding and discriminative latent concepts allows learning robust action representations in an unsupervised setting.
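To make the idea of latent concepts acting as learnable cluster centers more concrete, the sketch below shows a minimal PyTorch-style module under stated assumptions. It is not the paper's actual DLCL formulation: the class name LatentConceptModule, the cosine-similarity soft assignment, the temperature parameter, and the pseudo-label cross-entropy loss are all illustrative choices standing in for the method described only at a high level in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentConceptModule(nn.Module):
    """Hypothetical sketch: learnable latent vectors serve as cluster centers
    in the embedding space, and frame embeddings are softly assigned to them
    by similarity, avoiding an explicit clustering step."""

    def __init__(self, embed_dim: int, num_concepts: int, temperature: float = 0.1):
        super().__init__()
        # Learnable latent concept vectors (interpretable as cluster centers).
        self.concepts = nn.Parameter(torch.randn(num_concepts, embed_dim))
        self.temperature = temperature

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (num_frames, embed_dim) joint visual-temporal features.
        z = F.normalize(embeddings, dim=-1)
        c = F.normalize(self.concepts, dim=-1)
        # Cosine-similarity logits for soft assignment of frames to concepts.
        return z @ c.t() / self.temperature  # (num_frames, num_concepts)


def discriminative_loss(logits: torch.Tensor) -> torch.Tensor:
    # Assumed surrogate objective: pseudo-label each frame with its most
    # similar concept and train the assignments with cross-entropy, which
    # encourages compact, well-separated concepts in the embedding space.
    pseudo_labels = logits.argmax(dim=-1).detach()
    return F.cross_entropy(logits, pseudo_labels)


if __name__ == "__main__":
    # Toy usage with random features standing in for visual-temporal embeddings.
    frames = torch.randn(64, 128)  # 64 frames, 128-d embeddings
    module = LatentConceptModule(embed_dim=128, num_concepts=8)
    logits = module(frames)
    loss = discriminative_loss(logits)
    loss.backward()
    print(logits.shape, loss.item())
```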