Conference Proceeding

Multi-oriented text extraction from information graphics

Details

Citation

Böschen F & Scherp A (2015) Multi-oriented text extraction from information graphics. In: Proceedings of the 2015 ACM Symposium on Document Engineering (DocEng '15). 2015 ACM Symposium on Document Engineering, Lausanne, Switzerland, 08.09.2015-11.09.2015. New York: ACM, pp. 35-38. https://doi.org/10.1145/2682571.2797092

Abstract
Existing research on analyzing information graphics assumes that perfect text detection and extraction are available. However, text extraction from information graphics is far from solved. To fill this gap, we propose a novel processing pipeline for multi-oriented text extraction from infographics. The pipeline combines data mining and computer vision techniques to identify text elements, cluster them into text lines, and compute their orientation, and then uses a state-of-the-art open source OCR engine to perform the text recognition. We evaluate our method on 121 infographics extracted from an open access corpus of scientific publications. The results show that our approach is effective and significantly outperforms a state-of-the-art baseline.

Keywords
Infographics; OCR; multi-oriented text extraction

Journal
DocEng 2015 - Proceedings of the 2015 ACM Symposium on Document Engineering

Status
Published

Publication date
31/12/2015

URL
http://hdl.handle.net/1893/28052

Publisher
ACM

Place of publication
New York

ISBN
9781450333078

Conference
2015 ACM Symposium on Document Engineering

Conference location
Lausanne, Switzerland

Dates
08.09.2015-11.09.2015