
Paper Detail

Paper ID: ARS-1.4
Paper Title: GUIDANCE AND TEACHING NETWORK FOR VIDEO SALIENT OBJECT DETECTION
Authors: Yingxia Jiao, Wuhan University, China; Xiao Wang, Jiangxi University of Finance and Economics, China; Yu-Cheng Chou, Wuhan University, China; Shouyuan Yang, Jiangxi University of Finance and Economics, China; Ge-Peng Ji, Rong Zhu, Ge Gao, Wuhan University, China
Session: ARS-1: Object Detection
Location: Area I
Session Time: Tuesday, 21 September, 15:30 - 17:00
Presentation Time: Tuesday, 21 September, 15:30 - 17:00
Presentation: Poster
Topic: Image and Video Analysis, Synthesis, and Retrieval: Image & Video Interpretation and Understanding
Abstract: Owing to the difficulty of mining spatial-temporal cues, existing approaches for video salient object detection (VSOD) are limited in understanding complex and noisy scenarios, and often fail to infer prominent objects. To alleviate these shortcomings, we propose a simple yet efficient architecture, termed the Guidance and Teaching Network (GTNet), to independently distil effective spatial and temporal cues with implicit guidance and explicit teaching at the feature and decision levels, respectively. To be specific, we (a) introduce a temporal modulator to implicitly bridge features from the motion branch into the appearance branch, which is capable of fusing cross-modal features collaboratively, and (b) utilise a motion-guided mask to propagate explicit cues during feature aggregation. This novel learning strategy achieves satisfactory results by decoupling the complex spatial-temporal cues and mapping informative cues across different modalities. Extensive experiments on three challenging benchmarks show that the proposed method can run at ∼28 fps on a single TITAN Xp GPU and performs competitively against 14 cutting-edge baselines.
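
To make the two mechanisms named in the abstract concrete, below is a minimal PyTorch sketch of how a temporal modulator (implicit, feature-level guidance) and a motion-guided mask (explicit, decision-level teaching) could be wired together. All module names, layer choices, and channel sizes here are illustrative assumptions, not the paper's actual GTNet implementation.

import torch
import torch.nn as nn

# Hypothetical sketch of the two ideas from the abstract; the paper's actual
# layer choices, channel widths, and wiring are not specified on this page.

class TemporalModulator(nn.Module):
    """Implicit guidance: motion features modulate the appearance branch."""
    def __init__(self, channels):
        super().__init__()
        # Predict a per-channel scale and shift from the motion features.
        self.to_scale = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.to_shift = nn.Conv2d(channels, channels, 1)

    def forward(self, appearance_feat, motion_feat):
        # Feature-level cross-modal fusion: motion cues re-weight and
        # offset the appearance features.
        return appearance_feat * self.to_scale(motion_feat) + self.to_shift(motion_feat)

class MaskGuidedAggregation(nn.Module):
    """Explicit teaching: a motion-derived mask gates feature aggregation."""
    def __init__(self, channels):
        super().__init__()
        self.to_mask = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, fused_feat, motion_feat):
        mask = self.to_mask(motion_feat)  # coarse motion-guided mask in [0, 1]
        # Emphasise the regions the mask highlights, keeping a residual path
        # so unmasked regions are not discarded entirely.
        return self.fuse(fused_feat * mask + fused_feat)

if __name__ == "__main__":
    appearance = torch.randn(2, 64, 44, 44)  # appearance-branch features
    motion = torch.randn(2, 64, 44, 44)      # motion-branch features
    modulated = TemporalModulator(64)(appearance, motion)
    out = MaskGuidedAggregation(64)(modulated, motion)
    print(out.shape)  # torch.Size([2, 64, 44, 44])

The separation into two small modules mirrors the abstract's decoupling idea: the modulator fuses cues at the feature level, while the mask propagates explicit cues at the decision level.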