Paper ID | ARS-7.3 | ||
Paper Title | CONTEXTUAL LABEL TRANSFORMATION FOR SCENE GRAPH GENERATION | ||
Authors | Wonhee Lee, Samsung Electronics, Republic of Korea; Sungeun Kim, Gunhee Kim, Seoul National University, Republic of Korea | ||
Session | ARS-7: Image and Video Interpretation and Understanding 2 | ||
Location | Area H | ||
Session Time | Wednesday, 22 September, 08:00 - 09:30 | ||
Presentation Time | Wednesday, 22 September, 08:00 - 09:30 | ||
Presentation | Poster | ||
Topic | Image and Video Analysis, Synthesis, and Retrieval: Image & Video Interpretation and Understanding | ||
Abstract | For scene graph generation, it is crucial to properly understand the relationships between objects within the context of the image. We design a label transformation method using a Transformer-VAE (Variational Autoencoder) structure, which converts bounding box labels into auxiliary labels that capture each object's context in an unsupervised manner. The model is then trained jointly on the auxiliary labels, bounding box labels, and relation labels in a multi-task manner. Our approach does not require any external datasets or language priors and is applicable to any graph generation model that infers the relationship between pairs of objects. We validate our method's effectiveness and scalability with state-of-the-art scene graph generation models on the VRD and VG datasets. |
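
The joint multi-task training mentioned in the abstract could look roughly like the sketch below. It is not the authors' implementation. It assumes, purely for illustration, that the auxiliary labels are discrete class indices and that the three objectives are combined as a weighted sum of cross-entropy terms. The function names and the `aux_weight` parameter are hypothetical and do not come from the paper.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(obj_logits, obj_label,
                   rel_logits, rel_label,
                   aux_logits, aux_label,
                   aux_weight=1.0):
    """Weighted sum of object, relation, and auxiliary-label losses.

    Assumption: a plain weighted sum; the paper may combine the
    objectives differently.
    """
    return (cross_entropy(obj_logits, obj_label)
            + cross_entropy(rel_logits, rel_label)
            + aux_weight * cross_entropy(aux_logits, aux_label))


if __name__ == "__main__":
    loss = multitask_loss([2.0, 0.5], 0, [0.1, 1.2, 0.3], 1, [0.0, 0.0], 0,
                          aux_weight=0.5)
    print(f"joint loss: {loss:.4f}")
```

Under this assumption the auxiliary term acts as an extra supervision signal: setting `aux_weight` to zero recovers training on bounding box and relation labels only.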