iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing

In recent years, ◂...▸optical character recognition (OCR) systems have been used to digitally preserve historical archives. To transcribe historical archives into machine-readable form, the documents are first scanned and then OCR is applied. To digitize documents without removing them from where they are archived, a portable device that combines scanning and OCR capabilities is valuable. Many commercial and open-source document digitization techniques exist today, but they are optimized for contemporary documents and fail to give sufficient text recognition accuracy on historical documents because of the severe quality degradation of such documents. In contrast, the anyOCR system, which is designed mainly to digitize historical documents, provides high accuracy. However, this comes at the cost of high computational complexity, resulting in long runtime and high power consumption. To tackle these challenges, we propose iDocChip, a low-power, energy-efficient accelerator with real-time capabilities. It is a configurable hybrid hardware-software programmable ◂...▸System-on-Chip (SoC) based on anyOCR for digitizing historical documents. In this paper, we focus on one of the most crucial processing steps in the anyOCR system: Text and Image Segmentation, which makes use of a multi-resolution morphology-based algorithm. We present an optimized FPGA-based hybrid architecture of this anyOCR step along with its optimized software implementations. We demonstrate our results on multiple embedded and general-purpose platforms with respect to runtime and power consumption. The resulting hardware accelerator outperforms the existing anyOCR by 6.2×, while achieving 207× higher energy efficiency and maintaining its high accuracy.

Metadata
Author:Menbere Kina Tekleyohannes, Vladimir Rybalkin, Muhammed Mohsin Ghaffar, Javier Alejandro Varela, Norbert Wehn, Andreas Dengel
URN:urn:nbn:de:hbz:386-kluedo-78376
DOI:https://doi.org/10.1007/s10766-020-00690-y
ISSN:1573-7640
Parent Title (English):International Journal of Parallel Programming
Publisher:Springer Nature - Springer
Document Type:Article
Language of publication:English
Date of Publication (online):2024/03/18
Year of first Publication:2021
Publishing Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Date of the Publication (Server):2024/03/18
Issue:49
Page Number:32
First Page:253
Last Page:284
Source:https://link.springer.com/article/10.1007/s10766-020-00690-y
Faculties / Organisational entities:Kaiserslautern - Fachbereich Elektrotechnik und Informationstechnik
DDC Classification:6 Technology, medicine, applied sciences / 621.3 Electrical engineering, electronics
Collections:Open Access Publication Fund
Licence (German):Secondary publication