Segmentation-free optical character recognition for printed Urdu text
© The Author(s) 2017
Received: 25 February 2017
Accepted: 17 August 2017
Published: 6 September 2017
This paper presents a segmentation-free optical character recognition system for printed Urdu text in the Nastaliq font, using ligatures as units of recognition. The proposed technique relies on statistical features and employs Hidden Markov Models (HMMs) for classification. A total of 1525 unique high-frequency Urdu ligatures from the standard Urdu Printed Text Images (UPTI) database are considered in our study. Ligatures extracted from text lines are first split into primary (main body) and secondary (dots and diacritics) ligatures, and multiple instances of the same ligature are grouped into clusters using a sequential clustering algorithm. A separate HMM is trained for each ligature on the examples in its cluster, using statistical features extracted from overlapping windows slid from right to left over the ligature image. Given the query text, the primary and secondary ligatures are recognized separately and then associated using a set of heuristics to recognize the complete ligature. Evaluated on the standard UPTI Urdu database, the system achieved a ligature recognition rate of 92% on more than 6000 query ligatures.
Keywords: Optical character recognition; Printed Urdu text; Ligature; Hidden Markov models; Clustering
With the tremendous advancements in computation and communication technologies, the amount of information available in digital form has increased manifold over recent years. Consequently, there has been an increasing tendency over the last decade to digitize existing paper documents such as books, magazines, newspapers, and notes. This, in turn, has increased the need for efficient Optical Character Recognizers (OCRs) that convert the digitized images into text. OCR is one of the most researched pattern classification problems. Today, commercially mature OCRs achieve high recognition rates on a number of scripts, for instance those based on the Latin and Chinese alphabets [1, 2]. Despite these developments, OCRs for many languages have either not yet been developed or are at a very early stage; cursive Urdu, investigated in this study, is one such example.
The Urdu alphabet is a superset of the Arabic alphabet, borrows some characters from Pashto, and comprises a total of 39 characters. Unlike Pashto and Arabic, which are mostly written in the Naskh style, Urdu generally employs the Nastaliq style, which runs diagonally from right to left. The major challenges posed by Urdu document images include non-uniform inter- and intra-word spacing, overlapping of neighboring words and partial words, filled or false loops, and the absence of a fixed baseline [2–4]. OCR finds applications in a wide range of problem areas, including printed (religious, poetry, and literature books, newspapers, passports, utility bills), handwritten (office records, historical manuscripts, input to mobile devices), and mixed (traffic challans, bank checks, driving licenses, hand-filled forms) documents. Other applications include text-guided autonomous vehicle navigation, as well as text-to-speech, text-to-text, and speech-to-text conversion for people with visual, speech, and hearing impairments, respectively.
This paper is organized as follows. Section 2 reviews some promising techniques proposed for recognition of text in Urdu and in other languages based on similar scripts. Section 3 presents the proposed methodology with an in-depth discussion of the training and recognition modules. Section 4 describes the experiments carried out to validate the proposed technique, and conclusions are drawn in the last section.
2 Literature review
Character recognition is one of the most investigated pattern classification problems. Recognition systems for Urdu and similar languages, however, have yet to mature compared with those for other scripts. Most of the work carried out on Urdu either deals with individual characters [6–9, 11, 12] or employs separate recognition of primary and secondary ligatures [5, 10, 13]. Traditionally, recognition techniques can be categorized into analytical and holistic approaches, as discussed in the following.
2.1 Analytical approaches
Analytical (segmentation-based) methods rely on segmentation of ligatures into characters, either explicitly [14, 15] or implicitly [16–18]. Among these methods, Javed and Hussain propose a recognition system for the Urdu Noori Nastaliq font that extracts discrete cosine transform (DCT) features from skeletonized, already segmented characters. Classification is carried out using HMMs, and fewer than 20 classes are considered in their study. Evaluated on 1692 Urdu ligatures, the system achieved a recognition rate of 92.7%. Malik and Fahiem presented a line and character segmentation scheme that mainly relies on projection profiles; under- and over-segmentation errors are handled through a set of heuristics. A segmentation accuracy of 99% is reported on a custom dataset in the Urdu Batool font. In a similar work, Uddin et al. presented a technique for segmenting overlapping and joined text lines, with subsequent extraction of complete ligatures from printed Urdu document images. Evaluated on 30 document images, the system reports line and ligature segmentation accuracies of 98.79% and 92.49%, respectively. Hussain et al. proposed a segmentation-based OCR for printed Urdu Nastaliq ligatures. The original grapheme shape classes are increased from 47 to 250 in order to achieve better recognition. A window is slid over the contour of a segmented grapheme to extract low-frequency DCT coefficients. The coefficients serve to train separate HMMs for 250 grapheme classes, each having 30 instances. A query ligature is separated into main and secondary bodies and then segmented into graphemes for individual recognition. Once recognized individually, the graphemes are joined to form the main body ligatures, which are associated with the secondary bodies using a lookup table. Evaluation on 93,018 words (from the Center of Language Engineering (CLE) database) in font size 14 yields an 87.76% ligature recognition rate.
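Projection-profile line segmentation, the core idea behind schemes such as that of Malik and Fahiem, can be sketched as follows. This is a minimal illustration of the generic technique, not their published method: it assumes a binarized page stored as a NumPy array with ink pixels equal to 1, and `min_gap` (the number of consecutive empty rows that closes a line) is a hypothetical parameter.

```python
import numpy as np


def segment_lines(binary: np.ndarray, min_gap: int = 1) -> list:
    """Split a binary page image (ink = 1) into text lines using the horizontal
    projection profile. Returns (start_row, end_row) pairs, end exclusive."""
    profile = binary.sum(axis=1)  # horizontal projection: ink pixels per row
    lines, start, gap = [], None, 0
    for row, count in enumerate(profile):
        if count > 0:             # ink in this row: open a line or continue it
            if start is None:
                start = row
            gap = 0
        elif start is not None:   # empty row inside an open line
            gap += 1
            if gap >= min_gap:    # enough empty rows: close the line
                lines.append((start, row - gap + 1))
                start, gap = None, 0
    if start is not None:         # a line touching the bottom edge of the page
        lines.append((start, len(profile) - gap))
    return lines


page = np.zeros((10, 5), dtype=np.uint8)
page[1:3, :] = 1   # first toy text line
page[6:9, 2] = 1   # second toy text line
print(segment_lines(page))  # [(1, 3), (6, 9)]
```

Because Nastaliq text lines frequently overlap vertically, a plain profile like this one can merge adjacent lines, which is the difficulty that overlap-aware techniques such as that of Uddin et al. target.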
A number of recent studies [16, 22] have employed deep learning-based implicit segmentation for recognition of cursive text, leaving it to the classifier to find the segmentation points of characters implicitly. Ahmed et al. applied a Bidirectional Long Short-Term Memory (BLSTM) classifier to Urdu character recognition. Using raw pixels, their approach achieved 89% accuracy on the UPTI dataset. In a similar study, Naz et al. employ the more advanced multi-dimensional LSTM (MDLSTM) with a Connectionist Temporal Classification (CTC) layer. Trained on a set of statistical features extracted from normalized grayscale images, their system reports a character recognition rate of 96.4% on the clean text portion of the UPTI dataset.
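To illustrate how a CTC layer lets the network avoid explicit segmentation, the sketch below shows best-path (greedy) decoding of per-frame label probabilities: the most likely label is taken for each frame, consecutive repeats are collapsed, and blanks are removed. It is a generic illustration of CTC decoding, not the decoder used by Ahmed et al. or Naz et al.; the label set (with index 0 as the blank) and the probabilities are made up.

```python
def ctc_greedy_decode(frame_probs, labels, blank=0):
    """Best-path CTC decoding of a list of per-frame probability vectors."""
    # Most likely label index in every frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for k in best:
        # Emit a label only when it differs from the previous frame and is not blank.
        if k != prev and k != blank:
            out.append(labels[k])
        prev = k
    return "".join(out)


labels = ["-", "a", "b"]  # index 0 is the CTC blank
probs = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05],
         [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
print(ctc_greedy_decode(probs, labels))  # aab
```

The blank frame between the two runs of `a` is what allows a doubled character to survive the collapsing step; character boundaries are never computed explicitly.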
Segmentation-based approaches have the advantage of a reduced number of training classes, equal to the number of characters in the alphabet (counting their different context-dependent letter shapes). However, segmenting Urdu and similar cursive scripts into characters is a challenging task in itself. Recently, implicit segmentation using deep learning has been successfully investigated for recognition of Urdu text [23–26]. These techniques, however, require large amounts of training data and employ characters, rather than ligatures or words, as units of recognition.
2.2 Holistic approaches
Holistic (segmentation-free) approaches employ partial words (ligatures), rather than characters, as units of recognition. The ligatures themselves have to be extracted from text lines, but since they are not further segmented, these techniques are termed segmentation-free. Holistic techniques have been reported to be more robust for Urdu text in a number of studies [5, 9, 10, 12, 19, 27–32].
Among notable holistic approaches, Javed et al. present a system in which DCT features extracted from sliding windows are used to train HMM-based classifiers for 1282 high-frequency Urdu ligatures. System evaluation reports a 92% recognition rate on 3655 ligatures. In a similar work, a modified Tesseract OCR engine is adapted for the Urdu Nastaliq font. With a reduced search space, the system achieves a recognition rate of around 97% for primary ligatures in font sizes 14 and 16. Sabbour and Shafait contributed a large database of Urdu text line images, UPTI, now considered a benchmark for the evaluation of Urdu OCR systems. Their recognition methodology relies on extracting shape descriptors from the contours of ligature images. Classification using a k-nearest neighbor classifier achieved a recognition rate of 89% on 10,000 primary ligatures.
A major drawback of most ligature-based systems is their sensitivity to font size and their inability to handle dots and diacritics. Among the systems addressing these issues, Khattak et al. proposed a holistic system for separate recognition of primary and secondary ligatures irrespective of font size. Features based on projection profiles of edges, concavity, and curvature are extracted by sliding windows from right to left over the ligature image. The features are fed to HMM classifiers, with separate HMMs trained for 2028 unique ligatures. Evaluation on 6084 query ligatures reports a recognition rate of 97.93%. The system, however, does not associate primary and secondary ligatures after recognition. Likewise, in continuation of their previous work, Akram et al. employ the open-source Tesseract engine for recognition of ligatures in multiple font sizes. Primary and secondary ligatures are recognized individually and associated later to form the complete ligatures. Evaluated on 224 documents, the system achieves 86.15% end-to-end ligature recognition accuracy. The main drawback of the system is the need for separate training for each font size.
A summary of notable contributions on recognition of Urdu text
Pal and Sarkar 
Shamsher et al. 
Tariq et al. 
Sardar and Wahab 
Nawaz et al. 
Ahmed et al. 
Hussain et al. 
Hassan et al. 
Ahmed et al. 
Naz et al. 
Hussain et al. 
Sabbour and Shafait 
10,000 primary ligatures
Javed and Hussain 
1282 unique primary ligatures
Akram et al. 
1475 unique primary ligatures
Akram et al. 
Javed et al. 
1692 unique ligatures
Khattak et al. 
2028 unique ligatures
A critical analysis of the literature on Urdu OCR systems reveals that the problem has attracted significant research attention during the last 10 years. While the initial endeavors primarily focused on recognition of isolated characters [6–8], a number of robust deep learning-based solutions [17, 23–26] have been proposed in recent years. These methods mainly rely on implicit segmentation of characters and report high recognition rates. However, as discussed earlier, such systems do not address recognition of complete ligatures and require large amounts of training data. Ligatures represent the most natural unit of recognition for cursive scripts like Urdu Nastaliq, and from the viewpoint of end-to-end recognition systems, ligature recognition rates are more significant than character recognition rates. Urdu has a huge set of unique complete ligatures, around 26,000 in total. Most of these complete ligatures, however, are very rarely used. Moreover, many complete ligatures differ only in the position and number of dots, while the primary component of the ligature remains the same. Splitting ligatures into primary and secondary ligatures therefore results in a significant reduction in the number of classes. It has been shown that more than 99% of the complete Urdu corpus can be covered with only around 2300 unique primary and secondary ligatures. Association of secondary ligatures with primary ligatures after recognition, however, is a very challenging task and has mostly been ignored in ligature-based studies [13, 28, 29] on Urdu OCR. The present research aims to develop a robust technique for recognition of primary and secondary ligatures and their association, allowing recognition of complete ligatures.
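The coverage argument above amounts to a cumulative-frequency computation: sort the recognition units by corpus frequency and count how many of the most frequent ones are needed to reach a target share of all occurrences. The sketch below assumes a hypothetical dictionary of unit counts; the toy numbers are illustrative and are not corpus statistics.

```python
def units_for_coverage(frequencies: dict, target: float = 0.99) -> int:
    """Smallest number of most-frequent units whose occurrences make up at
    least `target` of all unit occurrences in the corpus."""
    counts = sorted(frequencies.values(), reverse=True)
    total = sum(counts)
    covered = 0
    for n, c in enumerate(counts, start=1):
        covered += c
        if covered / total >= target:  # cumulative share reached the target
            return n
    return len(counts)


toy_counts = {"u1": 50, "u2": 30, "u3": 15, "u4": 4, "u5": 1}  # hypothetical counts
print(units_for_coverage(toy_counts, target=0.9))  # 3
```

Applied to real frequency lists, the same computation yields the kind of unit count quoted for 99% coverage.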
The main contributions of this study are as follows:
- A scale-invariant, statistical features-based holistic OCR system for the Urdu Nastaliq font is proposed that employs ligatures as units of recognition.
- A semi-automatic and scalable sequential clustering technique is presented to group ligatures into clusters for preparing the training data.
- Primary and secondary ligatures are recognized separately using HMMs, and the recognized ligatures are combined to form complete ligatures using a comprehensive re-association technique.
- High ligature recognition rates are realized on a benchmark dataset of Urdu text lines.
3 Proposed methodology
Training is carried out to make the models learn to discriminate between different ligature classes. The first 3000 text lines from the UPTI database are used as the training set in our study. The key steps in training involve extraction of ligatures from text lines, clustering of ligatures and training of hidden Markov models on clusters of ligatures as detailed in the following.
3.1.1 Ligature extraction
3.1.2 Ligature clustering
Training a classifier requires labeled examples of each ligature class. Manually generating and labeling the training data is naturally expensive in terms of time and effort. We therefore carry out a semi-automatic clustering of the extracted ligatures to generate the training data for the classifiers. Errors in the generated clusters are then corrected through visual inspection so that the clusters are error-free before serving as training data.
Summary of features employed for clustering of ligatures using DTW
Sum of horizontal edges in each sector
Sum of vertical edges in each sector
Initially, a ligature is chosen at random and taken as the mean of the first cluster. Each of the remaining ligatures is then picked one by one, and its distance from the mean of each cluster is computed using dynamic time warping (DTW). If the distance to the nearest cluster is below a predefined threshold, the current ligature is added to that cluster and the cluster mean is updated; otherwise, a new cluster is created with the current ligature as its center. The sequential clustering algorithm does not impose any constraint on the number of clusters, so the system remains scalable as more clusters are added. The algorithm is, however, sensitive to the order in which the ligatures are presented. Note that the goal is only to generate an approximate set of clusters, which are corrected by visual inspection before training the models (as they form the training data for the classifiers). To keep only the high-frequency ligature (HFL) clusters, the Unicode string associated with each cluster is compared with those in the standard frequency list of HF ligatures compiled by the Center of Language Engineering (CLE). A cluster that finds a match in the HFL list is kept in the database, while the remaining clusters are discarded. The process produced a total of 1525 HFL clusters, each containing at least 10 instances.
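The sequential clustering procedure can be sketched as below. This is a minimal illustration under stated assumptions: each ligature is represented as a NumPy array of per-frame feature vectors (one row per frame), the threshold value is hypothetical, and ligatures are processed in the given order rather than starting from a randomly chosen one. Because the text does not say how the mean of variable-length sequences is formed, the sketch substitutes the cluster medoid (the member with the smallest total DTW distance to the other members) as the cluster representative.

```python
import numpy as np


def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """Classic DTW distance between feature sequences a (n x d) and b (m x d)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def medoid(seqs):
    """Member with the smallest total DTW distance to all members (stand-in for a mean)."""
    totals = [sum(dtw(s, t) for t in seqs) for s in seqs]
    return seqs[int(np.argmin(totals))]


def sequential_cluster(seqs, threshold):
    """Add each sequence to the nearest cluster if its DTW distance to that cluster's
    representative is below threshold; otherwise open a new cluster."""
    reps, members = [], []           # representative and member indices per cluster
    for idx, s in enumerate(seqs):
        dists = [dtw(s, r) for r in reps]
        if dists and min(dists) < threshold:
            k = int(np.argmin(dists))
            members[k].append(idx)
            reps[k] = medoid([seqs[i] for i in members[k]])  # update representative
        else:
            reps.append(s)
            members.append([idx])
    return members


ligatures = [np.array([[0.0], [1.0], [2.0]]), np.array([[5.0], [6.0]]),
             np.array([[0.0], [1.0], [1.0], [2.0]])]
print(sequential_cluster(ligatures, threshold=1.0))  # [[0, 2], [1]]
```

A medoid keeps the representative a real ligature and avoids averaging sequences of different lengths; a DTW-based averaging method would be a closer match to a true mean.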
Character classes with members and respective Unicodes
Table of dots/diacritics with example images and corresponding number values
Absence of dot/diacritic
One dot above
One dot above baseline
One dot below
One dot below baseline
Two dots above
Two dots above baseline
Two dots below
Two dots below baseline
Three dots above
Three dots above baseline
Three dots below
Three dots below baseline
Secondary stroke of “Aik-chashmi-hey” when used as joiner
Secondary stroke of “Gaaf”
Secondary stroke appearing with “Alif” forming “Alif-mad-aa”
Secondary stroke appearing with “Bay” class when used as joiner
Arabic-like Thashdid
Secondary stroke of “Thay,” “Rday,” and “Dhaal”
Secondary stroke with “Bay” class, sometimes appears in isolation
Once the primary and secondary ligatures are grouped into clusters and each cluster is assigned its Unicode string, we proceed to train models that learn the ligature classes. We chose hidden Markov models, which have been successfully applied to a number of diverse problems including gesture recognition [36–38], speech recognition, handwriting recognition [38, 40], musical score recognition, and optical character recognition [10, 42–44]. The steps of feature extraction from ligature clusters and subsequent training are discussed in the following subsections.
3.1.3 Feature extraction
The set of features extracted from each window (frame) comprises Hu’s moments, together with the horizontal and vertical projections, and their respective mean values, of the two-dimensional fast Fourier transform (FFT) energy and of the Zernike moment energy. These features are briefly described in the following.
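The right-to-left framing that produces these per-frame features can be sketched as follows. This is an illustration under assumptions, not the authors' exact settings: the ligature image is taken to be already height-normalized with background pixels equal to 0, the frame width of 9 columns is inferred from the 43-dimensional FFT vector (32 + 9 + 1 + 1, suggesting frames of 32 rows by 9 columns), and the step of 3 columns, which controls the overlap, is hypothetical.

```python
import numpy as np


def sliding_frames(img: np.ndarray, width: int = 9, step: int = 3) -> list:
    """Cut a height-normalized ligature image into overlapping frames of
    `width` columns, starting at the right edge and moving left by `step`."""
    _, w = img.shape
    if w < width:
        # Pad narrow ligatures with background (0) on the left so one full frame fits.
        img = np.pad(img, ((0, 0), (width - w, 0)))
        w = width
    frames, right = [], w
    while True:
        left = max(right - width, 0)   # the last frame is clamped to the left edge
        frames.append(img[:, left:left + width])
        if left == 0:
            break
        right -= step
    return frames


print([f.shape for f in sliding_frames(np.ones((32, 15)))])  # [(32, 9), (32, 9), (32, 9)]
```

Ordering the frames from the right edge keeps the observation sequence in Urdu reading order.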
3.1.3.1 Hu’s moments:
Hu derived a set of seven rotation-invariant moments, M1, …, M7, for an effective representation of the shapes under study. In our implementation, the seven Hu’s moments are computed from each frame and employed as features. The computational details of these moments can be found in the literature on moment invariants.
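A per-frame computation of the seven moments can be sketched with NumPy alone, using the standard normalized central moments and Hu's formulas. The sketch assumes a grayscale or binary frame with ink as nonzero values and, as an assumption of this illustration, returns zeros for frames that contain no ink (for example, gaps between strokes), where the moments are undefined.

```python
import numpy as np


def hu_moments(frame: np.ndarray) -> np.ndarray:
    """Seven Hu invariant moments of one frame (zeros if the frame has no ink)."""
    f = frame.astype(float)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]   # row (y) and column (x) coordinates
    m00 = f.sum()
    if m00 == 0:
        return np.zeros(7)
    xc, yc = (x * f).sum() / m00, (y * f).sum() / m00   # centroid

    def eta(p, q):
        """Normalized central moment eta_pq."""
        mu = ((x - xc) ** p * (y - yc) ** q * f).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = a ** 2 + b ** 2
    h5 = (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2)
    h6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    h7 = (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2)
    return np.array([h1, h2, h3, h4, h5, h6, h7])


frame = np.zeros((32, 9))
frame[5:20, 2:6] = 1.0
print(hu_moments(frame).shape)  # (7,)
```

Since the moments are rotation invariant, rotating a frame by 90 degrees leaves the output unchanged up to floating-point error, which makes a convenient sanity check.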
3.1.3.2 Zernike energy features:
Zernike moments of order four [47, 48] are computed from each frame after it is resized to 32×32. For a discrete image of symmetric size N×N, the computation comprises three main steps: computing the radial polynomial, computing the Zernike basis function, and computing the Zernike moments.
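The three steps can be sketched as follows for a square frame. The sketch uses the common discrete approximation Z_nm = (n + 1)/π · Σ f(x, y) V*_nm(x, y) · ΔA over pixel centers mapped into the unit disk, with ΔA = (2/N)² the area of one pixel in those coordinates; the exact normalization and pixel mapping used in this work are not given in the text, so both are assumptions. Resizing the frame to 32×32 is left to an image library.

```python
from math import factorial

import numpy as np


def radial_poly(n: int, m: int, rho: np.ndarray) -> np.ndarray:
    """Step 1: Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R


def zernike_moments(frame: np.ndarray, max_order: int = 4) -> dict:
    """Zernike moments Z_nm (n <= max_order, 0 <= m <= n, n - m even) of an N x N frame."""
    N = frame.shape[0]
    if frame.shape != (N, N):
        raise ValueError("frame must be square")
    coords = (2 * np.arange(N) - N + 1) / N      # pixel centers in (-1, 1)
    x, y = np.meshgrid(coords, coords)           # x varies along columns, y along rows
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    f = frame.astype(float) * (rho <= 1.0)       # keep only pixels inside the unit disk
    moments = {}
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):         # m has the same parity as n
            V = radial_poly(n, m, rho) * np.exp(1j * m * theta)   # step 2: basis function
            moments[(n, m)] = (n + 1) / np.pi * np.sum(f * np.conj(V)) * (2.0 / N) ** 2  # step 3
    return moments


print(len(zernike_moments(np.ones((32, 32)))))  # 9
```

Up to order four this yields nine moments (Z00, Z11, Z20, Z22, Z31, Z33, Z40, Z42, Z44); their magnitudes are rotation invariant.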
The Zernike energy is then used to find the horizontal projection (the sum of each row in the energy matrix; dimension, 32), vertical projection (the sum of each column in the energy matrix; dimension, 32), and mean values of the two projections (dimension, 2) forming a 66 dimensional Zernike Energy feature vector for each frame.
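The projection step is the same for any energy matrix, and a short sketch makes the 66-value count explicit. It assumes the 32×32 Zernike energy matrix has already been computed; how that matrix is derived from the moments is not spelled out in the text, so it is taken here as a given input.

```python
import numpy as np


def projection_features(energy: np.ndarray) -> np.ndarray:
    """Row sums, column sums, and their two means, concatenated into one vector."""
    horizontal = energy.sum(axis=1)   # horizontal projection: one value per row
    vertical = energy.sum(axis=0)     # vertical projection: one value per column
    return np.concatenate([horizontal, vertical, [horizontal.mean(), vertical.mean()]])


features = projection_features(np.ones((32, 32)))
print(features.shape)  # (66,)
```

For a 32×32 matrix this gives 32 + 32 + 1 + 1 = 66 values per frame.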
3.1.3.3 Two-dimensional FFT energy features:
The FFT energy of a frame f is computed as E = |FFT2(f)|.^2 / (frame size), where FFT2 denotes the two-dimensional FFT, the dot operator with the power of 2 represents the element-wise square of absolute values, and division by the frame size normalizes the FFT energy. The sum of each row of the FFT energy matrix gives its horizontal projection, while the sum of each column gives its vertical projection. The horizontal and vertical projections and their mean values form a 43-dimensional (32 + 9 + 1 + 1) FFT energy-based feature vector.
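A NumPy sketch of this computation follows. It assumes frames of 32 rows by 9 columns, which is what the 32 + 9 split of the projections implies, takes "frame size" to mean the number of pixels in the frame, and uses `numpy.fft.fft2` for the two-dimensional FFT.

```python
import numpy as np


def fft_energy_features(frame: np.ndarray) -> np.ndarray:
    """43-value FFT energy vector for a 32 x 9 frame: horizontal and vertical
    projections of the normalized FFT energy plus their two means."""
    spectrum = np.fft.fft2(frame.astype(float))
    energy = np.abs(spectrum) ** 2 / frame.size   # element-wise |F|^2 divided by frame size
    horizontal = energy.sum(axis=1)               # 32 row sums
    vertical = energy.sum(axis=0)                 # 9 column sums
    return np.concatenate([horizontal, vertical, [horizontal.mean(), vertical.mean()]])


frame = np.zeros((32, 9))
frame[8:24, 3:6] = 1.0   # a toy vertical stroke
print(fft_energy_features(frame).shape)  # (43,)
```

With this normalization the total energy equals the sum of squared pixel values (Parseval's theorem). If all three groups are concatenated, each frame yields 7 (Hu) + 66 (Zernike energy) + 43 (FFT energy) = 116 values.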
Summary of frame features employed in our study
Horizontal projection of Zernike energy
Vertical projection of Zernike energy
Mean of horizontal projection of Zernike energy
Mean of vertical projection of Zernike energy
Horizontal projection of FFT energy
Vertical projection of FFT energy
Mean of horizontal projection of FFT energy
Mean of vertical projection of FFT energy
3.1.4 Hidden Markov Model (HMM) training
3.2 Ligature recognition
Prior to recognition, the position and access-order information of the secondary ligatures with respect to the upper and lower profiles of the primary ligature is extracted and stored. Recognition is carried out by feeding the feature sequence extracted from a query ligature to all the trained models; the model reporting the maximum probability determines the label of the query ligature. The position and access-order information is later used to associate the secondary ligatures with their primary ligature to form the complete ligature, which is finally validated against a dictionary and written to a text file.
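Recognition by maximum likelihood can be sketched with the forward algorithm in log space, which avoids numerical underflow on long frame sequences. The sketch assumes each ligature model is a tuple (log initial probabilities, log transition matrix, per-state means, per-state variances) with diagonal-Gaussian emissions whose parameters would come from training (not shown); the text does not state the emission model, so this choice is an assumption, and `models` is a hypothetical dictionary mapping labels to such tuples.

```python
import numpy as np


def _logsumexp(a: np.ndarray, axis=None):
    """Numerically stable log(sum(exp(a))) that tolerates all -inf inputs."""
    m = np.max(a, axis=axis, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    with np.errstate(divide="ignore"):
        out = np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)) + m
    return out.item() if axis is None else np.squeeze(out, axis=axis)


def log_gauss(x, means, variances):
    """Log density of one feature vector x under each state's diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)


def log_likelihood(obs, log_start, log_trans, means, variances):
    """Forward algorithm: log P(obs | model) for a (T x d) observation sequence."""
    alpha = log_start + log_gauss(obs[0], means, variances)
    for x in obs[1:]:
        # alpha_j <- log sum_i exp(alpha_i + log_trans[i, j]) + log b_j(x)
        alpha = _logsumexp(alpha[:, None] + log_trans, axis=0) + log_gauss(x, means, variances)
    return float(_logsumexp(alpha))


def recognize(obs, models):
    """Label of the model with the highest log-likelihood for the query sequence."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


def toy_model(mu):
    """Hypothetical two-state model whose state means are centered on mu."""
    log_start = np.log([0.9, 0.1])
    log_trans = np.log([[0.7, 0.3], [0.1, 0.9]])
    means = np.array([[mu, mu], [mu + 1.0, mu + 1.0]])
    return log_start, log_trans, means, np.ones((2, 2))


models = {"A": toy_model(0.0), "B": toy_model(5.0)}
query = np.array([[0.1, -0.2], [0.5, 0.4], [1.1, 0.9]])
print(recognize(query, models))  # A
```

The same scoring applies unchanged to primary and secondary ligature models; only the set of models passed to `recognize` differs.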
3.2.1 Secondary ligature position and access-order information
- 1. Primary ligature with no loop(s):
  - Secondary ligature is considered above the baseline if it is above the upper profile.
  - Secondary ligature is considered above the baseline if it is between the upper and lower profiles.
  - Secondary ligature is considered below the baseline if it is below the lower profile.
- 2. Primary ligature with loop(s):
  - Secondary ligature is considered above the baseline if it is above the upper profile, and below the baseline otherwise.
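The rules above can be sketched as follows. The sketch makes several assumptions not stated in the text: images are binary NumPy arrays with row indices growing downward, a secondary ligature is represented by the (row, column) of its centroid, the profiles are read at the centroid's column (or the nearest column containing ink), and the access order is right-to-left by centroid column. The between-profiles case is mapped to "above" exactly as the rule for loop-free ligatures is stated.

```python
import numpy as np


def profiles(primary: np.ndarray):
    """Upper and lower profiles: topmost and bottommost ink row per column (-1 if none)."""
    ink = primary.astype(bool)
    has_ink = ink.any(axis=0)
    upper = np.where(has_ink, ink.argmax(axis=0), -1)
    lower = np.where(has_ink, ink.shape[0] - 1 - ink[::-1].argmax(axis=0), -1)
    return upper, lower


def secondary_position(primary, centroid, has_loop):
    """Return 'above' or 'below' for a secondary ligature centered at (row, col)."""
    row, col = centroid
    upper, lower = profiles(primary)
    inked = np.flatnonzero(upper >= 0)
    col = inked[np.argmin(np.abs(inked - col))]   # nearest column that contains ink
    if row < upper[col]:          # above the upper profile
        return "above"
    if has_loop:                  # loop(s): anything not above the upper profile is below
        return "below"
    return "above" if row <= lower[col] else "below"   # between profiles counts as above


def access_order(centroids):
    """Indices of secondary ligatures in right-to-left order of their centroid columns."""
    return sorted(range(len(centroids)), key=lambda i: -centroids[i][1])


primary = np.zeros((10, 10), dtype=np.uint8)
primary[4:7, :] = 1               # a flat toy stroke occupying rows 4-6
print(secondary_position(primary, (1, 5), has_loop=False))  # above
print(secondary_position(primary, (9, 5), has_loop=False))  # below
print(access_order([(1, 2), (9, 7), (3, 5)]))               # [1, 2, 0]
```

The loop flag matters only for secondaries lying between the two profiles, since that is the one case the two rule sets classify differently.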
3.2.2 Primary and secondary ligature recognition
After retrieval of the position and access-order information, the primary and secondary ligatures of the complete ligature are individually fed to the trained HMM classifiers. The HMM producing the highest probability for the queried ligature determines the recognized class, and its label is returned as output. These labels are not the Unicode strings of complete ligatures: for primary ligatures they are sequences of the corresponding character-class Unicodes (Table 3), and for secondary ligatures they are numeric codes (Table 4). A post-processing step then associates the secondary ligatures with the primary ligature and assigns each secondary ligature to a character class of the primary ligature to form complete characters, as described in the following.
3.2.3 Secondary ligature association and complete ligature formation
Lookup table for character classes—possible occurrences of dots/diacritics
Once recognized, the Unicode strings of the true characters are concatenated to form the Unicode string of the complete ligature, which, after verification against a lexicon, is written to a text file in UTF-8 encoding.
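The final assembly step can be sketched as a table lookup followed by concatenation, a lexicon check, and UTF-8 encoding. Everything named here is hypothetical: the numeric secondary codes are placeholders for the values of Table 4, which are not reproduced in this text; the lookup entries cover only a few members of the "Bay" class plus "Alif"; and each character of the primary ligature is assumed to have already been assigned its secondary code (or the no-dot code).

```python
# Hypothetical numeric codes for secondary ligatures (placeholders, not the paper's values).
NO_DOT, ONE_DOT_BELOW, TWO_DOTS_ABOVE, THREE_DOTS_ABOVE = 0, 1, 2, 3

# Hypothetical lookup: (character class, secondary code) -> true Urdu character.
LOOKUP = {
    ("BAY", ONE_DOT_BELOW): "\u0628",     # ب
    ("BAY", TWO_DOTS_ABOVE): "\u062A",    # ت
    ("BAY", THREE_DOTS_ABOVE): "\u062B",  # ث
    ("ALIF", NO_DOT): "\u0627",           # ا
}


def form_ligature(char_classes, secondary_codes, lexicon):
    """Map each (class, code) pair to a true character, concatenate, and keep the
    result only if it appears in the lexicon; returns None otherwise."""
    chars = [LOOKUP.get(pair) for pair in zip(char_classes, secondary_codes)]
    if None in chars:             # an impossible class/dot combination
        return None
    ligature = "".join(chars)
    return ligature if ligature in lexicon else None


def encode_ligatures(ligatures):
    """UTF-8 bytes for the output text file, one ligature per line."""
    return ("\n".join(ligatures) + "\n").encode("utf-8")


lexicon = {"\u0628\u0627", "\u062A\u0627"}   # toy lexicon of two ligatures
word = form_ligature(["BAY", "ALIF"], [ONE_DOT_BELOW, NO_DOT], lexicon)
print(encode_ligatures([word]))  # b'\xd8\xa8\xd8\xa7\n'
```

Rejecting combinations that are absent from the lookup table or the lexicon is what turns a wrongly associated dot into a detectable error rather than a silent misspelling.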
4 Results and analysis
Recognition rates of (primary, secondary, and punctuation ligature) clusters
Number of ligatures
Total complete ligatures
Complete ligatures (per number of characters) with respective recognition rates
Characters in ligature
Two character ligatures
Three character ligatures
Four character ligatures
Five character ligatures
Six character ligatures
Seven character ligatures
Total true ligatures
Comparison of proposed method with notable studies
Complete ligature recognition
Ahmed et al. 
Hassan et al. 
Naz et al. 
Naz et al. 
Javed et al. 
Akram et al. 
Javed and Hussain 
Khattak et al. 
Sabbour and Shafait 
Akram et al. 
Hussain et al. 
Ligatures with visually similar shapes
False joining of secondary ligatures with the respective or neighboring primary ligature
False re-association of dots/diacritics with the primary ligatures
We presented a holistic optical character recognition system for the printed Urdu Nastaliq font that uses statistical features and Hidden Markov Models and employs ligatures as units of recognition. The system is trained on 1525 unique high-frequency Urdu ligature clusters from the standard UPTI database. Complete ligatures are first split into primary and secondary ligatures, which are recognized separately. The secondary ligatures are then associated with the primary ligature using a set of heuristics to recognize the complete ligature. Evaluated through a series of experiments, the system achieved high recognition rates comparable to those of recent studies on this problem.
In future work, we intend to incorporate the entire set of Urdu HFLs (around 2300) to cover almost the complete (99%) Urdu vocabulary. Likewise, we presently consider 16 frequently occurring dots and diacritics, and this number can be extended as well. The post-processing step that associates secondary components with their primary components can also be improved to reduce recognition errors for the true ligatures.
The presented research is not part of any funded project and was carried out during the PhD research work of Israr Ud-Din.
IS and SK contributed to the algorithmic development while IUD and TA contributed to the implementation and paper writing. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- S Shabbir, I Siddiqi, Optical character recognition system for Urdu words in nastaliq font. Int. J. Adv. Comput. Sci. Appl. 7(5), 567–576 (2016).
- S Naz, AI Umar, SB Ahmed, SH Shirazi, MI Razzak, I Siddiqi, in Multi-Topic Conference (INMIC), 2014 IEEE 17th International. An OCR system for printed nasta’liq script: a segmentation based approach (IEEE, Pakistan, 2014), pp. 255–259.
- ST Javed, Investigation into a segmentation based OCR for the nastaleeq writing system, Master’s thesis, National University of Computer and Emerging Sciences Lahore, Pakistan (2007).
- DA Satti, Offline Urdu nastaliq OCR for printed text using analytical approach, unpublished master’s thesis, Quaid-i-Azam University Islamabad, Pakistan (2013).
- N Sabbour, F Shafait, in IS&T/SPIE Electronic Imaging. A segmentation-free approach to Arabic and Urdu OCR (International Society for Optics and Photonics, USA, 2013), pp. 86580–86580.
- U Pal, A Sarkar, in Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR’03). Recognition of printed Urdu script (UK, 2003), pp. 1183–1187.
- I Shamsher, Z Ahmad, JK Orakzai, A Adnan, OCR for printed Urdu script using feed forward neural network. Proc. World Acad. Sci. Eng. Technol. 23, 172–175 (2007).
- J Tariq, U Nauman, MU Naru, in Computer Engineering and Technology (ICCET), 2010 2nd International Conference On, 3. Softconverter: a novel approach to construct OCR for printed Urdu isolated characters (IEEE, China, 2010), pp. V3–495.
- S Sardar, A Wahab, in Information and Emerging Technologies (ICIET), 2010 International Conference On. Optical character recognition system for Urdu (IEEE, Pakistan, 2010), pp. 1–5.
- ST Javed, S Hussain, A Maqbool, S Asloob, S Jamil, H Moin, Segmentation free nastalique Urdu OCR. World Acad. Sci. Eng. Technol. 46, 456–461 (2010).
- Z Ahmad, JK Orakzai, I Shamsher, A Adnan, in Proceedings of World Academy of Science, Engineering and Technology, 26. Urdu nastaleeq optical character recognition (Citeseer, 2007), pp. 249–252.
- T Nawaz, S Naqvi, H ur Rehman, A Faiz, Optical character recognition system for Urdu (naskh font) using pattern matching technique. Int. J. Image Process. (IJIP) 3(3), 92 (2009).
- QUA Akram, S Hussain, A Niazi, U Anjum, F Irfan, in Document Analysis Systems (DAS), 2014 11th IAPR International Workshop On. Adapting tesseract for complex scripts: an example for Urdu nastalique (IEEE, France, 2014), pp. 191–195.
- Z Ahmad, JK Orakzai, I Shamsher, in Computer Science and Information Technology, 2009. ICCSIT 2009. 2nd IEEE International Conference On. Urdu compound character recognition using feed forward neural networks (IEEE, China, 2009), pp. 457–462.
- H Malik, MA Fahiem, in Visualisation, 2009. VIZ’09. Second International Conference In. Segmentation of printed Urdu scripts using structural features (IEEE, 2009), pp. 191–195.
- A Ul-Hasan, SB Ahmed, F Rashid, F Shafait, TM Breuel, in 2013 12th International Conference on Document Analysis and Recognition. Offline printed Urdu nastaleeq script recognition with bidirectional LSTM networks (IEEE, USA, 2013), pp. 1061–1065.
- S Naz, AI Umar, R Ahmad, SB Ahmed, SH Shirazi, I Siddiqi, MI Razzak, Offline cursive Urdu-nastaliq script recognition using multidimensional recurrent neural networks. Neurocomputing 177, 228–241 (2016).
- S Naz, AI Umar, R Ahmad, SB Ahmed, SH Shirazi, MI Razzak, Urdu nastaliq text recognition system based on multi-dimensional recurrent neural network and statistical features. Neural Comput. Appl. 28(2), 1–13 (2015).
- ST Javed, S Hussain, in Iberoamerican Congress on Pattern Recognition. Segmentation based Urdu nastalique OCR (Springer, Cuba, 2013), pp. 41–49.
- Line and ligature segmentation in printed Urdu document images. J. Appl. Environ. Biol. Sc. 6(3S), 114–120 (2016).
- S Hussain, S Ali, QU Akram, Nastalique segmentation-based approach for Urdu OCR. Int. J. Doc. Anal. Recognit. (IJDAR) 18(4), 357–374 (2015).
- SB Ahmed, S Naz, MI Razzak, SF Rashid, MZ Afzal, TM Breuel, Evaluation of cursive and non-cursive scripts using recurrent neural networks. Neural Comput. Appl. 27(3), 603–613 (2016).
- MR Yousefi, MR Soheili, TM Breuel, E Kabir, D Stricker, in Document Analysis and Recognition (ICDAR), 2015 13th International Conference On. Binarization-free OCR for historical documents using LSTM networks (IEEE, France, 2015), pp. 1121–1125.
- A Ul-Hasan, SS Bukhari, A Dengel, in 2016 12th IAPR Workshop on Document Analysis Systems (DAS). Ocroract: a sequence learning OCR system trained on isolated characters (Greece, 2016), pp. 174–179.
- R Messina, J Louradour, in Document Analysis and Recognition (ICDAR), 2015 13th International Conference On. Segmentation-free handwritten Chinese text recognition with LSTM-RNN (IEEE, France, 2015), pp. 171–175.
- A Ray, S Rajeswar, S Chaudhury, in Advances in Pattern Recognition (ICAPR), 2015 Eighth International Conference On. Text recognition using deep BLSTM networks (IEEE, India, 2015), pp. 1–6.
- M Akram, S Hussain, in Proceedings of the 8th Workshop on Asian Language Resources. Word segmentation for Urdu OCR system (Beijing, 2010), pp. 88–94.
- Q Akram, S Hussain, F Adeeba, S Rehman, M Saeed, in Proceedings of the Conference on Language and Technology (CLT 14). Framework of Urdu nastalique optical character recognition system (Karachi, 2014).
- IU Khattak, I Siddiqi, S Khalid, C Djeddi, in Document Analysis and Recognition (ICDAR), 2015 13th International Conference On. Recognition of Urdu ligatures-a holistic approach (IEEE, France, 2015), pp. 71–75.
- MW Sagheer, CL He, N Nobile, CY Suen, in Pattern Recognition (ICPR), 2010 20th International Conference On. Holistic Urdu handwritten word recognition using support vector machine (IEEE, Turkey, 2010), pp. 1900–1903.
- SA Sattar, S Haque, MK Pathan, in Proceedings of the 46th Annual Southeast Regional Conference on XX. Nastaliq optical character recognition (ACM, USA, 2008), pp. 329–331.
- R Hussain, HA Khan, I Siddiqi, K Khurshid, A Masood, in 2015 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). Keyword based information retrieval system for Urdu document images (IEEE, Thailand, 2015), pp. 27–33.
- GS Lehal, in Proceeding of the Workshop on Document Analysis and Recognition. Choice of recognizable units for Urdu OCR (ACM, India, 2012), pp. 79–85.
- A Bensefia, T Paquet, L Heutte, A writer identification and verification system. Pattern Recognit. Lett. 26(13), 2080–2092 (2005).
- I Siddiqi, N Vincent, Text independent writer recognition using redundant writing patterns with contour-based orientation and curvature features. Pattern Recognit. 43(11), 3853–3865 (2010).
- CW Ng, S Ranganath, Real-time gesture recognition system and application. Image Vis. Comput. 20(13), 993–1007 (2002).
- J Triesch, C von der Malsburg, Classification of hand postures against complex backgrounds using elastic graph matching. Image Vis. Comput. 20(13), 937–943 (2002).
- HS Yoon, J Soh, YJ Bae, HS Yang, Hand gesture recognition using combined features of location, angle and velocity. Pattern Recognit. 34(7), 1491–1501 (2001).
- XD Huang, Y Ariki, MA Jack, Hidden Markov Models for Speech Recognition, vol. 2004 (Edinburgh University Press, Edinburgh, 1990).
- E Kavallieratou, E Stamatatos, N Fakotakis, G Kokkinakis, in International Conference on Pattern Recognition, 15. Handwritten character segmentation using transformation-based learning (Spain, 2000), pp. 63–637.
- B Pardo, W Birmingham, in Proceeding of the National Conference on Artificial Intelligence, 20. Modeling form for on-line following of musical performances (USA, 2005), p. 1018.
- T Plotz, GA Fink, Markov models for offline handwriting recognition: a survey. Int. J. Document Anal. Recognit. (IJDAR) 12(4), 269–298 (2009).
- A Khemiri, AK Echi, A Belaid, M Elloumi, in Document Analysis and Recognition (ICDAR), 2015 13th International Conference On. Arabic handwritten words offline recognition based on HMMs and DBNs (IEEE, France, 2015), pp. 51–55.
- E Chammas, C Mokbel, L Likforman-Sulem, in Document Analysis and Recognition (ICDAR), 2015 13th International Conference On. Arabic handwritten document preprocessing and recognition (IEEE, France, 2015), pp. 451–455.
- M-K Hu, Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory 8(2), 179–187 (1962).
- D Yu, H Yan, Separation of touching handwritten multi-numeral strings based on morphological structural features. Pattern Recognit. 34(3), 587–599 (2001).
- A Tahmasbi, F Saki, SB Shokouhi, Classification of benign and malignant masses based on Zernike moments. J. Comput. Biol. Med. 41(8), 726–735 (2011).
- F Saki, A Tahmasbi, H Soltanian-Zadeh, SB Shokouhi, Fast opposite weight learning rules with application in breast cancer diagnosis. J. Comput. Biol. Med. 43(1), 32–41 (2013).
- GS Lehal, in Document Analysis and Recognition (ICDAR), 2013 12th International Conference On. Ligature segmentation for Urdu OCR (IEEE, USA, 2013), pp. 1130–1134.