Paper ID | SS-MRII.5
Paper Title | A HYBRID TWO-STREAM APPROACH FOR MULTI-PERSON ACTION RECOGNITION IN TOP-VIEW 360 DEGREE VIDEOS
Authors | Karen Stephen, Jianquan Liu, Vivek Barsopia, NEC Corporation, Japan
Session | SS-MRII: Special Session: Models and Representations for Immersive Imaging
Location | Area A
Session Time | Wednesday, 22 September, 08:00 - 09:30
Presentation Time | Wednesday, 22 September, 08:00 - 09:30
Presentation | Poster
Topic | Special Sessions: Models and Representations for Immersive Imaging
Abstract | Action recognition in top-view 360 degree videos is an emerging research topic in computer vision. Existing work utilizes a global projection method to transform 360 degree video frames into panorama frames for further processing. However, this unwrapping suffers from geometric distortion: people near the centre of the 360 degree video frames appear highly stretched and distorted in the corresponding panorama frames (observed in 37.5% of the panorama frames of the 360Action dataset). Recognizing the actions of people near the centre therefore becomes difficult, degrading the overall action recognition performance. In this work, we overcome this challenge by utilizing distortion-free person-centric images of the persons near the centre, extracted directly from the input 360 degree video frames. We propose a simple yet effective hybrid two-stream architecture consisting of a panorama stream and a person-centric stream, whose predictions are combined to detect the overall actions in a video. Experiments on the recently introduced 360Action dataset validate the efficacy of the proposed method, yielding an overall improvement of 2.3% mAP over the state-of-the-art method and a maximum improvement of 22.7% AP for the pickup action, which occurs mostly near the centre.
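The abstract describes combining per-action predictions from the panorama stream and the person-centric stream into a single video-level result. Below is a minimal, hypothetical sketch of such a late score fusion; the class count, function names, fusion weight, and the weighted-averaging rule itself are illustrative assumptions, not the authors' exact method.

```python
# Hypothetical sketch of late fusion of two-stream predictions, as described
# in the abstract. All names and values below are illustrative assumptions.
import numpy as np

NUM_ACTIONS = 19  # hypothetical number of action classes


def fuse_predictions(panorama_scores: np.ndarray,
                     person_centric_scores: np.ndarray,
                     weight: float = 0.5) -> np.ndarray:
    """Weighted late fusion of per-action scores from the two streams."""
    assert panorama_scores.shape == person_centric_scores.shape == (NUM_ACTIONS,)
    return weight * panorama_scores + (1.0 - weight) * person_centric_scores


# Example usage with random placeholder scores for one video clip.
rng = np.random.default_rng(0)
pano_scores = rng.random(NUM_ACTIONS)      # scores from the panorama stream
person_scores = rng.random(NUM_ACTIONS)    # scores from the person-centric stream
video_scores = fuse_predictions(pano_scores, person_scores)
detected_actions = np.flatnonzero(video_scores > 0.5)  # actions above a score threshold
print(detected_actions)
```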