Paper ID | MLR-APPL-IVASR-1.7
Paper Title | JOINT LEARNING ON THE HIERARCHY REPRESENTATION FOR FINE-GRAINED HUMAN ACTION RECOGNITION
Authors | Mei Chee Leong, Hui Li Tan, Institute for Infocomm Research (I2R), A*STAR, Singapore; Haosong Zhang, Nanyang Technological University, Singapore; Liyuan Li, Institute for Infocomm Research (I2R), A*STAR, Singapore; Feng Lin, Nanyang Technological University, Singapore; Joo-Hwee Lim, Institute for Infocomm Research (I2R), A*STAR, Singapore
Session | MLR-APPL-IVASR-1: Machine learning for image and video analysis, synthesis, and retrieval 1
Location | Area D
Session Time | Monday, 20 September, 13:30 - 15:00
Presentation Time | Monday, 20 September, 13:30 - 15:00
Presentation | Poster
Topic | Applications of Machine Learning: Machine learning for image & video analysis, synthesis, and retrieval
Abstract | Fine-grained human action recognition is a core research topic in computer vision. Inspired by the recently proposed hierarchy representation of fine-grained actions in FineGym and by the SlowFast network for action recognition, we propose a novel multi-task network that exploits the FineGym hierarchy representation to achieve effective joint learning and prediction for fine-grained human action recognition. The multi-task network consists of three SlowOnly pathways with gradually increased frame rates for the events, sets and elements of fine-grained actions, followed by our proposed integration layers for joint learning and prediction. It is a two-stage approach: the network first learns a deep feature representation at each hierarchical level, then performs feature encoding and fusion for multi-task learning. Our empirical results on the FineGym dataset set a new state-of-the-art performance, with 91.80% Top-1 accuracy and 88.46% mean accuracy for element actions, which are 3.40% and 7.26% higher than the previous best results.
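The three-pathway, three-head structure described in the abstract can be sketched minimally as follows. This is an illustrative assumption of the data flow only, not the paper's implementation: the helper names (`sample_frames`, `pathway`), the feature dimensions, and the per-level class counts are all hypothetical, and the real pathways are SlowOnly convolutional backbones rather than the mean-pool stand-ins used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_frames(video, stride):
    """Temporally subsample a (T, D) clip; a larger stride means a lower frame rate."""
    return video[::stride]

def pathway(frames, weight):
    """Stand-in for a SlowOnly backbone: mean-pool frames, then project."""
    return np.tanh(frames.mean(axis=0) @ weight)

# Hypothetical sizes: a 64-frame clip with 32-dim frame features, 16-dim embeddings.
video = rng.standard_normal((64, 32))
W_event, W_set, W_elem = (rng.standard_normal((32, 16)) for _ in range(3))

# Three pathways with gradually increased frame rates (coarse -> fine hierarchy).
f_event = pathway(sample_frames(video, 8), W_event)  # events: lowest frame rate
f_set = pathway(sample_frames(video, 4), W_set)      # sets: medium frame rate
f_elem = pathway(sample_frames(video, 1), W_elem)    # elements: full frame rate

# Integration step: fuse the three hierarchy levels for joint prediction.
fused = np.concatenate([f_event, f_set, f_elem])     # shape (48,)

# One classification head per hierarchy level (class counts are illustrative).
heads = {"event": 4, "set": 15, "element": 99}
logits = {name: fused @ rng.standard_normal((48, n)) for name, n in heads.items()}
print({name: out.shape for name, out in logits.items()})
```

In the actual two-stage approach, each pathway would first be trained for its own level before the fused features are used for joint multi-task prediction; here the fusion is shown in a single forward pass for brevity.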