Paper ID | ARS-7.6
Paper Title | Disentangling Latent Groups of Factors
Authors | Nakamasa Inoue, Ryota Yamada, Tokyo Institute of Technology, Japan; Rei Kawakami, Ikuro Sato, Tokyo Institute of Technology / Denso IT Laboratory, Inc., Japan
Session | ARS-7: Image and Video Interpretation and Understanding 2
Location | Area H
Session Time | Wednesday, 22 September, 08:00 - 09:30
Presentation Time | Wednesday, 22 September, 08:00 - 09:30
Presentation | Poster
Topic | Image and Video Analysis, Synthesis, and Retrieval: Image & Video Interpretation and Understanding
Abstract | This paper proposes a framework for training variational autoencoders (VAEs) on image distributions that have latent groups of factors. Our key idea is to introduce a mechanism that predicts the factor group an image belongs to while simultaneously disentangling the factors within it. More specifically, we propose an architecture consisting of three components: an encoder, a decoder, and a factor-group prediction header. The first two components are trained with a VAE objective, and the last one is trained with the proposed algorithm, which uses an unsupervised contrastive learning loss. In experiments, we designed a task in which more than one group of factors was entangled by combining multiple datasets, and demonstrated the effectiveness of the proposed framework. The Mutual Information Gap score improved from 0.089 to 0.125 on a merged dataset of Color-dSprites, 3DShapes, and MPI3D.
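For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a Mutual Information Gap (MIG) score is commonly computed. It is not the authors' evaluation code. It assumes the latent codes have already been discretized (for example by binning), and the function names `entropy`, `mutual_information`, and `mig` are illustrative only. For each ground-truth factor, the score takes the gap between the two latent dimensions with the highest mutual information, normalizes it by the factor's entropy, and averages over factors.

```python
import math
from collections import Counter


def entropy(xs):
    """Empirical Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical mutual information, using I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mig(latents, factors):
    """Mutual Information Gap over discrete data.

    latents: list of latent dimensions, each a list of discretized codes
             (one entry per sample). At least two dimensions are required.
    factors: list of ground-truth factors, each a list of discrete values
             (one entry per sample). Each factor must take more than one value.
    """
    gaps = []
    for v in factors:
        # Rank latent dimensions by how informative they are about this factor.
        mi = sorted((mutual_information(z, v) for z in latents), reverse=True)
        # Gap between the best and second-best dimension, normalized by H(v).
        gaps.append((mi[0] - mi[1]) / entropy(v))
    return sum(gaps) / len(gaps)


# Toy data: two binary factors observed over four samples (assumed values).
shape = [0, 0, 1, 1]
color = [0, 1, 0, 1]

# Each latent dimension captures exactly one factor.
disentangled_score = mig([shape, color], [shape, color])

# Both latent dimensions capture the same factor, so neither is exclusive.
entangled_score = mig([shape, shape], [shape, color])
```

A higher score means that each factor is captured mainly by a single latent dimension. In this toy setting the first configuration scores higher than the second, which is the direction of improvement the abstract reports.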