Paper Detail

Paper ID: MLR-APPL-IVASR-2.8
Paper Title: HIERARCHICAL EMBEDDING GUIDED NETWORK FOR VIDEO OBJECT SEGMENTATION
Authors: Chin-Hsuan Shih, Wen-Jiin Tsai, National Yang Ming Chiao Tung University, Taiwan
Session: MLR-APPL-IVASR-2: Machine learning for image and video analysis, synthesis, and retrieval 2
Location: Area D
Session Time: Monday, 20 September, 15:30 - 17:00
Presentation Time: Monday, 20 September, 15:30 - 17:00
Presentation: Poster
Topic: Applications of Machine Learning: Machine learning for image & video analysis, synthesis, and retrieval
Abstract: Semi-supervised video object segmentation aims to segment the target objects in a video given the ground-truth annotation of the first frame. Most previous successful methods rely on online learning or pre-training on static images to improve accuracy. However, online learning incurs a large time cost at inference, which restricts practical use, while pre-training on static images requires heavy data augmentation that is complicated and time-consuming. This paper presents a fast Hierarchical Embedding Guided Network (HEGNet) that is trained only on Video Object Segmentation (VOS) datasets and does not use online learning. HEGNet integrates propagation-based and matching-based methods: it propagates the predicted mask of the previous frame as a soft cue and extracts hierarchical embeddings at both deep and shallow layers for feature matching. The label map produced at the deep layer is also used to guide the matching at the shallow layer. Evaluated on the DAVIS-2016 and DAVIS-2017 validation sets, our method achieves overall scores of 84.9% and 71.9%, respectively. It surpasses methods that use neither online learning nor static-image pre-training, and runs at 0.08 seconds per frame.
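
For readers who want a concrete picture of the matching scheme the abstract describes, the following is a minimal PyTorch-style sketch, not the authors' implementation: the cosine-similarity soft matching, the softmax temperature, and the way the deep-layer label map "guides" the shallow layer (a simple convex combination here) are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def resize(m, size):
    # Bilinear resize of a 2-D mask or label map to the given (H, W).
    return F.interpolate(m[None, None], size=size, mode='bilinear',
                         align_corners=False)[0, 0]

def soft_match(query, ref, ref_mask, tau=0.07):
    """Transfer the reference mask to query pixels by feature similarity.

    query, ref: (C, H, W) embeddings at the same resolution;
    ref_mask:   (H, W) soft foreground map of the reference frame.
    Returns an (H, W) soft label map for the query frame.
    """
    C, H, W = query.shape
    q = F.normalize(query.reshape(C, -1), dim=0)      # (C, HW)
    r = F.normalize(ref.reshape(C, -1), dim=0)        # (C, HW)
    attn = torch.softmax((q.t() @ r) / tau, dim=1)    # (HW, HW) soft assignment
    return (attn @ ref_mask.reshape(-1)).reshape(H, W)

def hierarchical_head(deep_q, deep_r, shal_q, shal_r, ref_mask, prev_mask):
    # 1) Matching at the deep layer yields a coarse label map.
    deep_map = soft_match(deep_q, deep_r, resize(ref_mask, deep_r.shape[-2:]))
    # 2) Matching at the shallow layer, guided by the upsampled deep label
    #    map (modeled here as an equal-weight blend; an assumption).
    shal_map = soft_match(shal_q, shal_r, resize(ref_mask, shal_r.shape[-2:]))
    guided = 0.5 * shal_map + 0.5 * resize(deep_map, shal_map.shape)
    # 3) The previous frame's predicted mask is propagated as a soft cue
    #    and fused with the matching result.
    return 0.5 * guided + 0.5 * resize(prev_mask, guided.shape)

# Toy usage: deep features at 1/16 resolution, shallow features at 1/4.
deep_q, deep_r = torch.randn(256, 24, 24), torch.randn(256, 24, 24)
shal_q, shal_r = torch.randn(64, 96, 96), torch.randn(64, 96, 96)
ref_mask = (torch.rand(384, 384) > 0.5).float()
prev_mask = torch.rand(384, 384)
out = hierarchical_head(deep_q, deep_r, shal_q, shal_r, ref_mask, prev_mask)
print(out.shape)  # torch.Size([96, 96])

The sketch only illustrates why guiding helps: the deep layer's semantically strong but coarse prediction constrains the shallow layer's higher-resolution but noisier matching.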