Paper ID | ARS-3.6 | ||
Paper Title | Temporal-spatial Deformable Pose Network for Skeleton-based Gesture Recognition | ||
Authors | Honghui Lin, Jiale Cheng, Yu Li, Xin Zhang, South China University of Technology, China | ||
Session | ARS-3: Image and Video Biometric Analysis | ||
Location | Area H | ||
Session Time | Monday, 20 September, 13:30 - 15:00 | ||
Presentation Time | Monday, 20 September, 13:30 - 15:00 | ||
Presentation | Poster | ||
Topic | Image and Video Analysis, Synthesis, and Retrieval: Image & Video Interpretation and Understanding | ||
IEEE Xplore Open Preview | Available on IEEE Xplore | ||
Abstract | Gesture recognition is a challenging research topic with a wide range of potential applications in daily life. With advances in hardware and algorithms, skeleton data can now be easily extracted from video sequences and applied to the recognition task. In this paper, we propose a novel temporal-spatial deformable pose network that leverages spatial and temporal information jointly. Our proposed network automatically locates the most correlated joints across multiple frames and extracts features accordingly. Additionally, we introduce a parallel multi-scale convolutional layer with different dilation rates, which captures multi-term temporal information efficiently. We have conducted experiments on the MSRC-12, ChaLearn 2013, and ChaLearn 2016 datasets, and our proposed method outperforms state-of-the-art methods. Moreover, additional experiments show that our proposed module is more robust to noisy data and to dynamic gestures with various temporal scales. |
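The parallel multi-scale temporal layer mentioned in the abstract can be illustrated with a minimal NumPy sketch: several 1D convolution branches over per-frame features, each with a different dilation rate, run in parallel and stacked. The shapes, kernel size, and dilation rates here are illustrative assumptions, not the authors' actual network configuration.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid 1D convolution along time with a given dilation rate.
    x: (T, C) per-frame feature sequence; w: (K, C) kernel.
    A dilation of d spaces the K kernel taps d frames apart,
    enlarging the temporal receptive field without extra weights."""
    K, C = w.shape
    span = (K - 1) * dilation            # frames covered by one output
    T = x.shape[0]
    out = np.empty(T - span)
    for t in range(T - span):
        taps = x[t : t + span + 1 : dilation]  # K dilated taps, each (C,)
        out[t] = np.sum(taps * w)
    return out

def multi_scale_layer(x, w, dilations=(1, 2, 4)):
    """Parallel branches with different dilation rates (short-, mid-,
    and long-term temporal context); each output is right-padded back
    to the input length and the branches are stacked."""
    outs = []
    for d in dilations:
        y = dilated_conv1d(x, w, d)
        outs.append(np.pad(y, (0, x.shape[0] - y.shape[0])))
    return np.stack(outs, axis=0)        # (len(dilations), T)
```

With a 10-frame sequence and a kernel of size 3, the branch with dilation 4 sees a 9-frame window while the dilation-1 branch sees only 3 frames, so the stacked output mixes short- and long-term temporal cues at every time step.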