Publication details for Dr Chris Willcocks

Medhat, Fady, Mohammadi, Mahnaz, Jaf, Sardar, Willcocks, Chris, Breckon, Toby, Matthews, Peter, McGough, Andrew Stephen, Theodoropoulos, Georgios & Obara, Boguslaw (2018), TMIXT: A process flow for Transcribing MIXed handwritten and machine-printed Text, IEEE International Conference on Big Data. Seattle, WA, USA, IEEE.

Abstract

Text recognition of scanned documents is
usually dependent upon the type of text, being handwritten
or machine-printed. Accordingly, the recognition
involves prior classification of the text category,
before deciding on the recognition method to be applied.
This poses a more challenging task if a document
contains both handwritten and machine-printed text.
In this work, we present a generic process flow for text
recognition in scanned documents containing mixed
handwritten and machine-printed text without the
need to classify text in advance. We have realized the
proposed process flow using several open-source image
processing and text recognition packages. The proposed
workflow can efficiently and effectively handle the volume
of text documents found in organizations such as defense,
and the processing speed they require, which humans cannot
manage without a considerable amount of automation. The
evaluation was performed
using a specially developed variant, presented in this
work, of the IAM handwriting database, where we have
achieved an average transcription accuracy of nearly
80% for pages containing both printed and handwritten
text.