Paper ID | ARS-2.11
Paper Title | LEARNING EVENT REPRESENTATIONS FOR TEMPORAL SEGMENTATION OF IMAGE SEQUENCES BY DYNAMIC GRAPH EMBEDDING
Authors | Mariella Dimiccoli, Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Spain; Herwig Wendt, University of Toulouse, France
Session | ARS-2: Image and Video Segmentation
Location | Area I
Session Time | Monday, 20 September, 15:30 - 17:00
Presentation Time | Monday, 20 September, 15:30 - 17:00
Presentation | Poster
Topic | Image and Video Analysis, Synthesis, and Retrieval: Image & Video Interpretation and Understanding
Abstract | Recently, self-supervised learning has proved effective for learning event representations suitable for the temporal segmentation of image sequences, where events are understood as sets of temporally adjacent images semantically perceived as a whole. However, although this approach does not require manual annotations, it is data-hungry and suffers from domain adaptation problems. As an alternative, we propose a novel approach for learning event representations, named Dynamic Graph Embedding (DGE), which does not require any training set. The assumption underlying our model is that a sequence of images can be represented by a graph that encodes both semantic and temporal similarity. The key novelty of DGE is that it learns the graph and its embedding jointly. At its core, DGE iterates over two steps: 1) updating the graph that encodes the semantic and temporal data similarity based on the current data representation, and 2) updating the data representation to account for the current graph structure. Experimental results on the EDUBSeg and EDUBSeg-Desc benchmark datasets demonstrate that the proposed DGE leads to event representations effective for temporal segmentation, outperforming the state of the art. Additional experiments on two Human Motion Segmentation datasets demonstrate the generalization capabilities of the proposed DGE.
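The abstract only sketches the alternating procedure, so the snippet below gives a minimal, hypothetical reading of the two-step loop: the cosine/temporal similarity mixture, the k-nearest-neighbour sparsification, the smoothing-based representation update, and all names and parameters (dge_sketch, alpha, k, n_iters) are assumptions made for illustration, not the authors' actual DGE formulation or objective.

```python
import numpy as np

def dge_sketch(features, n_iters=10, k=5, alpha=0.5):
    """Illustrative alternating scheme; all update rules here are hypothetical.

    features: (T, d) array with one feature vector per image in the sequence.
    """
    X = features.astype(float).copy()
    T = X.shape[0]
    for _ in range(n_iters):
        # Step 1: update the graph from the current representation,
        # mixing semantic similarity (cosine) with temporal adjacency.
        Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
        semantic = Xn @ Xn.T
        temporal = np.exp(-np.abs(np.arange(T)[:, None] - np.arange(T)[None, :]))
        W = alpha * semantic + (1.0 - alpha) * temporal
        # Keep only the k strongest neighbours per node (illustrative sparsification).
        drop = np.argsort(-W, axis=1)[:, k + 1:]
        np.put_along_axis(W, drop, 0.0, axis=1)
        W = (W + W.T) / 2.0
        # Step 2: update the representation from the current graph
        # by simple neighbourhood averaging (graph smoothing).
        deg = W.sum(axis=1, keepdims=True) + 1e-12
        X = 0.5 * X + 0.5 * (W @ X) / deg
    return X

# Event boundaries could then be placed where consecutive embeddings change most,
# e.g. at peaks of np.linalg.norm(np.diff(X, axis=0), axis=1).
```

The diffusion-style representation update in step 2 is only one simple way to make the embedding respect the current graph; the objective actually optimized by DGE is not specified in this program entry.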