Abstract
Cursive text detection is essential in various fields, including
document analysis, scene character recognition, and optical character
recognition (OCR). Despite technological advancements, accurate
detection and recognition of cursive text in natural scene images
remains difficult because of differences in font sizes, styles,
orientations, alignments, and resolutions, as well as blur, complex
backgrounds, and the presence of multilingual text. Urdu is a cursive
language widely
spoken in many South-Asian countries. There has been a persistent need
for Urdu text recognition because of its appearance in natural scenes
such as signboards, car number plates, newspapers, and magazines.
Moreover, Urdu text detection is challenging due to its complex writing
style, which includes joined writing, variations in the same characters,
numerous ligatures, multiple baselines, and other factors. This paper
proposes two hybrid models for resolution-free cursive text detection
and recognition. First, a convolutional neural network (CNN) is used
for text detection, and detection is then repeated with the Visual
Geometry Group network (VGG-16). Second, for text recognition, a Long
Short-Term Memory (LSTM) model is applied separately to the features
extracted by the CNN and by VGG-16.
The proposed hybrid models, CNN combined with LSTM and VGG-16 combined
with LSTM, outperform the existing ones, achieving 91% and 96%
accuracy, respectively.