ICIP 2021 Paper Detail

Paper ID: MLR-APPL-IVASR-1.8
Paper Title: HIGH-ORDER JOINT INFORMATION INPUT FOR GRAPH CONVOLUTIONAL NETWORK BASED ACTION RECOGNITION
Authors: Wen-Nung Lie, Yong-Jhu Huang, Jui-Chiu Chiang, Zhen-Yu Fang, National Chung Cheng University, Taiwan
Session: MLR-APPL-IVASR-1: Machine learning for image and video analysis, synthesis, and retrieval 1
Location: Area D
Session Time: Monday, 20 September, 13:30 - 15:00
Presentation Time: Monday, 20 September, 13:30 - 15:00
Presentation: Poster
Topic: Applications of Machine Learning: Machine learning for image & video analysis, synthesis, and retrieval
Abstract: Graph Convolutional Network (GCN)-based models for human action recognition, which accept 3D skeleton sequences as input, have recently attracted much attention and achieved strong performance. In this paper, joint information enriched with higher-order features/attributes is proposed to improve their recognition performance. Each joint in a spatio-temporal skeleton is described by a set of 3-component vectors computed by referring to up to 3 neighboring joints in the spatio-temporal domain; the referenced joints are either physically connected in the spatial domain or corresponding in the temporal domain. This rich high-order joint information is fed as input to two kinds of GCN-based networks in two ways: early fusion and late fusion. Early fusion concatenates the 3-component vectors as different channels at the input nodes, while late fusion feeds each 3-component vector to a separate stream of a multi-stream GCN network and then fuses the stream outputs for the final action recognition decision. We also propose cascading a view-adaptive (VA) sub-network to further improve performance. Experiments on the NTU RGB-D 60 dataset show that our approach boosts the accuracy of the original GCN networks by up to 1.57% in the early fusion style and 2.55% in the late fusion style under the cross-subject (CS) protocol.
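The two fusion styles in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch illustration, not the authors' code: the names GCNStream, early_fusion, and late_fusion are hypothetical, the backbone is a trivial stand-in for a real spatio-temporal GCN, and the shapes (N samples, 3 channels, T frames, V=25 joints, K=3 high-order vectors, 60 classes as in NTU RGB-D 60) are assumed for the example.

    import torch
    import torch.nn as nn

    class GCNStream(nn.Module):
        """Placeholder backbone: maps a (N, C, T, V) skeleton tensor to class logits.
        A real model would use spatio-temporal graph convolutions instead."""
        def __init__(self, in_channels, num_classes):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, 64, kernel_size=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(64, num_classes),
            )

        def forward(self, x):
            return self.net(x)

    def early_fusion(vectors, num_classes=60):
        # Early fusion: concatenate the K 3-component vectors per joint as
        # extra input channels, then run a single stream.
        x = torch.cat(vectors, dim=1)              # (N, 3*K, T, V)
        model = GCNStream(x.shape[1], num_classes)
        return model(x)

    def late_fusion(vectors, num_classes=60):
        # Late fusion: one stream per 3-component vector; fuse (here, sum)
        # the per-stream outputs for the final recognition decision.
        streams = [GCNStream(3, num_classes) for _ in vectors]
        return torch.stack([s(v) for s, v in zip(streams, vectors)]).sum(0)

    # Usage: K=3 high-order joint vectors for a batch of skeleton sequences.
    vecs = [torch.randn(8, 3, 64, 25) for _ in range(3)]
    print(early_fusion(vecs).shape, late_fusion(vecs).shape)  # both (8, 60)

Summing the stream logits is only one possible fusion rule; averaging or weighted score fusion would fit the same late-fusion structure.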