Paper ID: ARS-7.9
Paper Title: ACTION RELATIONAL GRAPH FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION
Authors: Yi Cheng, Ying Sun, Dongyun Lin, Joo-Hwee Lim, Institute for Infocomm Research, Singapore
Session: ARS-7: Image and Video Interpretation and Understanding 2
Location: Area H
Session Time: Wednesday, 22 September, 08:00 - 09:30
Presentation Time: Wednesday, 22 September, 08:00 - 09:30
Presentation: Poster
Topic: Image and Video Analysis, Synthesis, and Retrieval: Image & Video Interpretation and Understanding
IEEE Xplore Open Preview: Available in IEEE Xplore
Abstract: The task of weakly-supervised temporal action localization (WTAL) is to localize and recognize unstructured actions in untrimmed videos using only video-level class labels. Since multiple actions may occur in an untrimmed video, it is desirable to capture the correlations among different actions to effectively identify the target actions. In this paper, we propose a novel Action Relational Graph Network (ARG-Net) to model the correlations between action labels. Specifically, we build a co-occurrence graph whose nodes and edges are represented by the word embeddings of action labels and the relations between pairs of labels, respectively. We then apply Graph Convolutional Networks (GCNs) to project the action label embeddings into a set of correlated action classifiers, which are multiplied with the learned video representations for video-level classification. To facilitate discriminative video representation learning, we employ an attention mechanism to model the probability that a frame contains action instances. A new Action Normalization Loss (ANL) is proposed to further alleviate the confusion caused by irrelevant background frames (i.e., frames containing no actions). Experimental results on the THUMOS14 and ActivityNet1.2 datasets demonstrate that our ARG-Net outperforms state-of-the-art methods.
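The abstract's pipeline — GCN over a label co-occurrence graph to produce per-class classifiers, attention-pooled video features, and a dot product for video-level scores — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all dimensions, the random co-occurrence matrix, and the single-layer GCN are assumptions for demonstration.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric GCN normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def gcn_layer(H, A_norm, W):
    # One graph-convolution layer with ReLU: propagate node features over the graph
    return np.maximum(A_norm @ H @ W, 0.0)

rng = np.random.default_rng(0)
num_classes, embed_dim, feat_dim, T = 5, 16, 32, 10

# Graph nodes: word embeddings of the action labels (stand-ins here)
label_embed = rng.normal(size=(num_classes, embed_dim))

# Graph edges: label co-occurrence relations (random symmetric stand-in)
cooccur = (rng.random((num_classes, num_classes)) > 0.5).astype(float)
cooccur = np.maximum(cooccur, cooccur.T)
A_norm = normalize_adjacency(cooccur)

# GCN projects label embeddings into correlated action classifiers
W1 = rng.normal(size=(embed_dim, feat_dim))
classifiers = gcn_layer(label_embed, A_norm, W1)      # (num_classes, feat_dim)

# Attention over frames models how likely each frame contains an action
frame_feats = rng.normal(size=(T, feat_dim))
attn_logits = frame_feats @ rng.normal(size=(feat_dim,))
attn = np.exp(attn_logits - attn_logits.max())
attn /= attn.sum()
video_repr = attn @ frame_feats                       # attention-pooled video feature

# Video-level class scores: classifiers multiplied with the video representation
scores = classifiers @ video_repr                     # (num_classes,)
```

The key design choice the sketch mirrors is that the classifier weights are not learned independently per class; they are functions of the label embeddings propagated over the co-occurrence graph, so correlated actions share evidence.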