Paper ID | 3D-3.5
Paper Title | STAD: STABLE VIDEO DEPTH ESTIMATION
Authors | Hyunmin Lee, Jaesik Park, Pohang University of Science and Technology, Republic of Korea
Session | 3D-3: Stereoscopic and multiview processing
Location | Area J
Session Time | Wednesday, 22 September, 14:30 - 16:00
Presentation Time | Wednesday, 22 September, 14:30 - 16:00
Presentation | Poster
Topic | Three-Dimensional Image and Video Processing: Stereoscopic and multiview processing and display
Abstract | We present a method for estimating temporally stable depth video from a sequence of images. We extend Neural-RGBD, a prior work on video depth estimation that exploits temporal information by accumulating a depth probability volume over time. We propose three simple yet effective ideas: (1) a temporal attention module that selects and propagates only meaningful temporal information, (2) a geometric warping operation that warps neighboring features while preserving geometric cues, and (3) a scale-invariant loss that alleviates the inherent scale ambiguity of monocular depth estimation. We demonstrate the effectiveness of the proposed ideas by comparing our network, STAD, with state-of-the-art methods. Moreover, we compare STAD with its per-frame counterpart, STAD-frame, to show the importance of utilizing temporal information. The experimental results show that STAD significantly improves the baseline accuracy without a large increase in parameters.
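The abstract mentions a scale-invariant loss to counter the scale ambiguity of monocular depth estimation but does not spell out its form. As a rough illustration only, the sketch below implements the widely used scale-invariant log loss of Eigen et al.; the function name, the `lam` weight, and the mask handling are assumptions and may differ from the paper's exact formulation.

```python
import torch


def scale_invariant_log_loss(pred_depth, gt_depth, valid_mask, lam=0.5):
    """Illustrative scale-invariant log loss (Eigen et al., 2014).

    A constant scaling of pred_depth only shifts the log-residuals by a
    constant, which the second term largely cancels, so the loss is
    insensitive to global scale. This is a stand-in sketch, not the
    paper's confirmed loss.
    """
    # Log-residuals on valid (positive, annotated) pixels only.
    d = torch.log(pred_depth[valid_mask]) - torch.log(gt_depth[valid_mask])
    n = d.numel()
    # First term: mean squared log error; second term: scale-invariance correction.
    return (d ** 2).mean() - lam * (d.sum() ** 2) / (n ** 2)


# Example usage with random positive depths and a validity mask (hypothetical shapes).
pred = torch.rand(2, 1, 64, 64) + 0.1
gt = torch.rand(2, 1, 64, 64) + 0.1
mask = gt > 0
loss = scale_invariant_log_loss(pred, gt, mask)
```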