Paper ID | IMT-1.1
Paper Title | Semantic-based Sentence Recognition in Images Using Bimodal Deep Learning
Authors | Yi Zheng, Boston University, United States; Qitong Wang, Virginia Tech, United States; Margrit Betke, Boston University, United States
Session | IMT-1: Computational Imaging Learning-based Models
Location | Area J
Session Time | Tuesday, 21 September, 08:00 - 09:30
Presentation Time | Tuesday, 21 September, 08:00 - 09:30
Presentation | Poster
Topic | Computational Imaging Methods and Models: Learning-Based Models
Abstract | The accuracy of computer vision systems that interpret sentences in images containing text can be improved when semantic information about the text is utilized. Yet the semantic coherence within a region of text in natural or document images is typically ignored by state-of-the-art systems, which identify isolated words or interpret text word by word. When analyzed together, however, seemingly isolated words may be easier to recognize. On this basis, we propose a novel “Semantic-based Sentence Recognition” (SSR) deep learning model that reads text in images with the help of contextual understanding. SSR consists of a Word Ordering and Grouping Algorithm (WOGA) that finds sentences in images and a Sequence-to-Sequence Recognition Correction (SSRC) model that extracts semantic information from these sentences to improve their recognition. To show the effectiveness and generality of SSR in recognizing text, we present experiments on three notably distinct datasets, two of which we created ourselves; they contain scanned catalog images of interior designs and photographs of protesters with hand-written signs, respectively. Our results show that SSR outperforms, with statistical significance, a baseline that uses state-of-the-art single-word-recognition techniques on all three datasets. By successfully combining computer vision and natural language processing methodologies, we reveal the important opportunity bimodal deep learning can provide in addressing a task that was previously considered a single-modality computer vision task.
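The abstract describes a two-stage pipeline but gives no implementation details. The following is a minimal, hypothetical Python sketch of the kind of word ordering and grouping heuristic a WOGA-style step might use to turn isolated word detections into sentence candidates; the `Word` class, the `group_words_into_sentences` function, and the half-box-height line threshold are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of a WOGA-style ordering/grouping step.
# Assumption: each detected word arrives with its recognized string
# and an axis-aligned bounding box.
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    text: str
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box
    h: float  # box height, used to decide if two words share a line

def group_words_into_sentences(words: List[Word]) -> List[str]:
    """Group single-word detections into ordered sentence candidates."""
    # Order detections top-to-bottom first, then left-to-right.
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    lines: List[List[Word]] = []
    for w in ordered:
        # Append to the current line if the vertical offset is within
        # half a box height; otherwise start a new line.
        if lines and abs(w.y - lines[-1][-1].y) < 0.5 * w.h:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Read each line left-to-right and join it into one string.
    return [" ".join(w.text for w in sorted(line, key=lambda v: v.x))
            for line in lines]

if __name__ == "__main__":
    detections = [
        Word("SAVE", 10, 12, 20), Word("OUR", 80, 10, 20),
        Word("PLANET", 140, 11, 20), Word("NOW", 60, 60, 20),
    ]
    for sentence in group_words_into_sentences(detections):
        print(sentence)  # "SAVE OUR PLANET", then "NOW"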