A novel method for straightening curved text-lines in stylistic documents
© Singh et al.; licensee Springer. 2014
Received: 17 October 2013
Accepted: 2 July 2014
Published: 19 July 2014
Stylistic text can be found on sign boards, street and organization boards and logos, bulletin boards, announcements, advertisements, dangerous goods plates, warning notices, etc. In stylistic text images, the text-lines within an image may have different orientations: they may be curved or not parallel to each other. As a result, extraction and subsequent recognition of individual text-lines and words in such images is a difficult task. In this paper, we propose a novel scheme for straightening curved text-lines using dilation, flood-fill, robust thinning, and B-spline curve fitting. In the proposed scheme, dilation is first applied to individual text-lines to cover the area within a certain boundary. Next, thinning is applied to obtain the path of the text, the path is approximated with a B-spline, and the angle between the normal at each point on the curve and the vertical line is computed. Finally, each point on the text is visited and rotated by its corresponding angle. The proposed methodology is tested on a variety of text images containing text-lines in Devanagari, English, and Chinese scripts and is evaluated on the basis of visual perception and the mean square error (MSE). The MSE is calculated by fitting a line to the input and output images. On the basis of the evaluation results obtained in our experiments, the proposed method is promising.
Keywords: OCR; Dilation; B-spline; Flood-fill; Feature extraction; Recognition; Segmentation; Thinning
I. Introduction
A large volume of research effort has been dedicated to OCR systems. A number of algorithms [1–6] are available for this purpose, and many commercial OCR systems [7, 8] are now on the market, but most of these systems can recognize only text images having straight (horizontal) text-lines and are designed for a specific script or language. In addition, there are a few limitations with regard to the source materials and character formatting that make feature extraction and recognition difficult.
II. Related works
Many works are available on the recognition of document images [2, 5, 6] having straight text-lines and words. In the literature, however, only a few works address the recognition of stylistic documents [19–44].
In 1991, Xie et al. [19] proposed a pattern recognition system invariant to translation, scale change, and rotation of the pattern, and obtained 97% recognition accuracy on the 10 Arabic numerals. Also in 1991, Tang et al. [20] used a translation-ring-projection algorithm to handle multi-oriented English alphabets in another work on English stylistic text recognition.
In 2000, Adam et al. [21] proposed an approach for the recognition of multi-oriented and multi-scaled characters in engineering drawings, in which the Fourier-Mellin transform is used to recognize the characters. This approach is limited to the recognition of a few characters and is time consuming. In 2001, Yang et al. [22] proposed a three-stage approach for multi-oriented Chinese character recognition in which the features are mainly based on geometric measures of the foreground pixels of the characters. It is limited to the Chinese language only.
In 2003, Hase et al. [23] proposed a multi-oriented character handling approach based on character types such as inclined, horizontal, vertical, and curved; it realigns the characters horizontally before recognition. The main drawback of the approach of Hase et al. [23] is the distortion introduced by realigning curved text. For rotated and/or inclined English character recognition, Hase et al. [23] used a parametric Eigen-space-based approach. This method is also limited to English character recognition and does not handle variation in font style, size, or script.
In 2005, Pal et al. [24] proposed a recognition-based approach to handle Indian multi-oriented and curved text. It uses the water reservoir concept to segment characters from stylistic documents without any skew correction, and the individual characters are then recognized. This approach is limited to recognizing Bangla and Devanagari script text only. In 2006, Hayashi et al. [25] proposed a rotation-invariant Arabic numeral recognition system in which a numeral is divided into elementary sub-patterns such as a straight line, a C-shaped line, and a 0-shaped line using a thinning algorithm, and is then recognized on the basis of features of the sub-patterns such as curvature, angle information, length, and arc-length. Also in 2006, Pal et al. [26] proposed a method for the recognition of multi-oriented and multi-sized English characters based on the modified quadratic discriminant function (MQDF). The main drawback of this approach is that it cannot distinguish similar-looking characters such as ‘b’ and ‘q’, ‘p’ and ‘d’, and ‘n’ and ‘u’, because it uses rotation-invariant features.
In 2007, Monwar et al. [27] proposed an approach for recognizing printed alphanumeric characters at different angles, in which each character is described by a small set of two-dimensional characteristic views at different angles for feature extraction. In 2008, Roy et al. [28] proposed an approach for the recognition of English characters in graphical documents containing multi-scale and multi-oriented text, using a scheme based on a support vector machine (SVM).
In 2010, Pal et al. [29] proposed an approach for the recognition of multi-oriented Bangla and Devanagari characters. It is also a recognition-based system, and it fails on easily confused characters of the Bangla and Devanagari scripts.
In 2011, Chiang et al. [30] proposed a general text recognition technique that handles non-homogeneous text by exploiting dynamic character grouping criteria based on the character sizes and the maximum desired string curvature. Also in 2011, Shivakumara et al. [31] proposed an approach to detect multi-oriented text in videos. The input image is first filtered with a Fourier-Laplacian filter, and K-means clustering is then used to identify candidate text regions based on the maximum difference. The skeleton of each connected component helps to separate the different text strings from each other. Finally, text string straightness and edge density are used for false-positive elimination. The method of Shivakumara et al. [31] is limited to the orientation of English-language text.
All the approaches discussed above are based on the recognition of multi-oriented characters and are limited to specific scripts. No approach in the literature is based on straightening curved text-lines or words in a script-independent way. In contrast, this work proposes an approach for curved text-line straightening that can handle multiple font sizes and types and multi-script text-lines in a single document.
III. Proposed approach
Curved or stylistic text in document images poses problems for segmentation and recognition, so it is important to straighten the text-lines before recognition. The proposed method is based on the following steps:
- The binary image is inverted so that black (text) pixels take the value ‘1’ and white (background) pixels take the value ‘0’, as MATLAB's morphological operations expect.
- Erosion is applied to the dilated image using a disk-shaped structuring element of size 17.
- The morphological thinning operation is applied to the eroded image for 10 iterations.
- The output image is inverted again so that the subsequent curve-straightening steps work correctly.
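The inversion, dilation, and erosion steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' MATLAB implementation: the helper names (`disk`, `invert`, `dilate`, `erode`), the toy page, and the small radii are assumptions chosen so that the example stays readable, and the thinning step is omitted.

```python
def disk(radius):
    """Offsets (dy, dx) covered by a disk-shaped structuring element."""
    span = range(-radius, radius + 1)
    return [(dy, dx) for dy in span for dx in span
            if dy * dy + dx * dx <= radius * radius]


def invert(img):
    """Swap 0 and 1 so that text pixels become the foreground value 1."""
    return [[1 - v for v in row] for row in img]


def dilate(img, offsets):
    """Set a pixel if any foreground pixel lies under the structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy, dx in offsets:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = 1
    return out


def erode(img, offsets):
    """Keep a pixel only if the whole structuring element fits in the foreground.

    Pixels outside the image are treated as background.
    """
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in offsets))
             for x in range(w)] for y in range(h)]


# Toy page (assumption): 1 = white background, 0 = black text.
# Two "characters" on row 4, separated by a three-pixel gap.
page = [[1] * 13 for _ in range(9)]
for x in (3, 4, 8, 9):
    page[4][x] = 0

inverted = invert(page)                # text pixels are now 1
dilated = dilate(inverted, disk(2))    # bridges the gap between the characters
eroded = erode(dilated, disk(1))       # shrinks the blob back towards the text
```

After dilation the two characters form a single connected region along the text-line, which is what allows a later thinning step to produce one continuous path per line.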
In this step, curve fitting is applied to the result of the previous step. A B-spline curve [35–41] is used instead of a polynomial curve to approximate the pixel data, because a polynomial curve fails to approximate a complex curve, as shown in Figure 2. B-spline and Bezier curves have a very similar form, but a B-spline curve carries more information (a degree and a knot vector in addition to the control points). Approximating even a simple curve with a single polynomial suffers from Runge's phenomenon [42–44], so the accuracy does not always increase when the degree of the polynomial is raised. Hence, the proposed approach uses a B-spline curve, which is built from low-degree polynomial pieces and therefore avoids Runge's phenomenon. A B-spline curve of degree p is described as follows:
$$C(u) = \sum_{i=0}^{h} N_{i,p}(u)\, P_i$$
where the $N_{i,p}(u)$ are the B-spline basis functions of degree p and $P_0, P_1, \ldots, P_h$ are the h + 1 unknown control points.
The control points are obtained by least-squares approximation, which leads to a system of linear equations in P whose coefficients come from the matrix N of basis-function values and whose right-hand side Q is computed from the data points. Since N and Q are known, solving this system of linear equations for P gives the desired control points.
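A minimal sketch of such a fit is given below. It is not the authors' code: it assumes a clamped, uniformly spaced knot vector, uniform parameter values for the skeleton points, and plain unconstrained least squares solved through the normal equations. All function names and the synthetic arc are hypothetical.

```python
import math


def clamped_knots(num_ctrl, degree):
    """Clamped, uniformly spaced knot vector on [0, 1]."""
    inner = num_ctrl - degree - 1
    return ([0.0] * (degree + 1)
            + [j / (inner + 1) for j in range(1, inner + 1)]
            + [1.0] * (degree + 1))


def basis(i, p, u, knots):
    """Cox-de Boor recursion for the basis function N_{i,p}(u)."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u = 1 is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == u:
            return 1.0
        return 0.0
    value = 0.0
    if knots[i + p] > knots[i]:
        value += ((u - knots[i]) / (knots[i + p] - knots[i])
                  * basis(i, p - 1, u, knots))
    if knots[i + p + 1] > knots[i + 1]:
        value += ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                  * basis(i + 1, p - 1, u, knots))
    return value


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting; rhs rows hold (x, y) pairs."""
    n = len(matrix)
    aug = [row[:] + list(r) for row, r in zip(matrix, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    sol = [None] * n
    for r in range(n - 1, -1, -1):
        sol[r] = tuple(
            (aug[r][n + j] - sum(aug[r][k] * sol[k][j] for k in range(r + 1, n)))
            / aug[r][r] for j in range(len(aug[r]) - n))
    return sol


def fit_bspline(points, num_ctrl, degree=3):
    """Least-squares control points for an ordered list of skeleton points."""
    n = len(points)
    params = [k / (n - 1) for k in range(n)]   # uniform parameters (assumption)
    knots = clamped_knots(num_ctrl, degree)
    N = [[basis(i, degree, u, knots) for i in range(num_ctrl)] for u in params]
    NtN = [[sum(row[i] * row[j] for row in N) for j in range(num_ctrl)]
           for i in range(num_ctrl)]
    NtD = [[sum(row[i] * pt[c] for row, pt in zip(N, points)) for c in range(2)]
           for i in range(num_ctrl)]
    return solve(NtN, NtD), knots


def curve_point(u, ctrl, degree, knots):
    """Evaluate C(u) = sum_i N_{i,p}(u) P_i."""
    weights = [basis(i, degree, u, knots) for i in range(len(ctrl))]
    return (sum(w * p[0] for w, p in zip(weights, ctrl)),
            sum(w * p[1] for w, p in zip(weights, ctrl)))


# Hypothetical skeleton: pixels along a gentle arc.
skeleton = [(float(x), 0.02 * (x - 20) ** 2) for x in range(41)]
ctrl, knots = fit_bspline(skeleton, num_ctrl=6)
max_err = max(math.dist(curve_point(k / 40, ctrl, 3, knots), pt)
              for k, pt in enumerate(skeleton))
```

Because the synthetic arc is a parabola, a cubic B-spline can represent it exactly and `max_err` is at the level of floating-point rounding. Real skeletons are noisier, and the number of control points then trades smoothness against fidelity.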
However, the segmentation step can produce errors when extracting individual text-lines from multi-line images. These errors arise during the morphological operations in the following cases:
- The inter-character spacing in a stylistic text-line is large. However, it is assumed that the text image is subsequently processed by isolated character recognition, so a large inter-character spacing does not significantly affect the recognition rate.
- Two or more text-lines are very close to each other, which results in merged text-lines. Because of this limitation of morphology, our experiments use only images that have a sufficient gap between text-lines.
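The second case can be reproduced with a toy example. The sketch below is an assumption-laden illustration rather than the authors' segmentation code: it dilates a synthetic page with a square structuring element and counts the resulting regions with a 4-connected flood fill. The helper names and the page layout are hypothetical.

```python
from collections import deque


def dilate_square(img, r):
    """Binary dilation with a (2r + 1) x (2r + 1) square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = 1
    return out


def count_regions(img):
    """Count 4-connected foreground regions with a queue-based flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                regions += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions


def two_lines(row_a, row_b, height=16, width=10):
    """Hypothetical page with two one-pixel-thick text-lines (1 = text)."""
    img = [[0] * width for _ in range(height)]
    for x in range(2, 8):
        img[row_a][x] = img[row_b][x] = 1
    return img


close = count_regions(dilate_square(two_lines(3, 7), 2))    # lines 4 rows apart
far = count_regions(dilate_square(two_lines(3, 12), 2))     # lines 9 rows apart
```

With a dilation radius of 2, `close` is 1 because the two lines merge into a single region, while `far` is 2 because the lines stay separate. This is why the gap between text-lines has to be large relative to the structuring element.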
IV. Experimental results and discussions
The method is evaluated on the basis of visual perception and the mean square error (MSE). The MSE is calculated by fitting a straight line to the input and output images, where the line approximates the given set of data (x_1, y_1), (x_2, y_2), …, (x_n, y_n), with n ≥ 2.
Table: Evaluation results with respect to the measure used (columns: document image (bmp), line fitting error, percentage error removed).
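The evaluation measure can be sketched as follows. The original equations are not reproduced in this text, so the code assumes an ordinary least-squares line y = a + b·x and defines "percentage error removed" as the relative drop in line-fitting error from input to output. Both are assumptions, and the function names and sample points are hypothetical.

```python
def fit_line_mse(points):
    """Fit y = a + b*x by least squares; return (a, b, mean squared residual)."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are equal; the line is vertical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    mse = sum((y - (a + b * x)) ** 2 for x, y in points) / n
    return a, b, mse


def percentage_error_removed(mse_in, mse_out):
    """Relative reduction of the line-fitting error, in percent (assumed definition)."""
    return 100.0 * (mse_in - mse_out) / mse_in


# Hypothetical text-line points before and after straightening.
curved = [(float(x), 0.02 * (x - 20) ** 2) for x in range(41)]
straight = [(float(x), 5.0) for x in range(41)]

_, _, mse_in = fit_line_mse(curved)
_, _, mse_out = fit_line_mse(straight)
removed = percentage_error_removed(mse_in, mse_out)
```

A perfectly straightened line gives a line-fitting error of zero and therefore 100% error removed; residual curvature in the output lowers this figure.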
In this paper, we presented a technique for straightening (correcting) curved text-lines in text images with wide variations in font, layout, and size, covering machine-printed as well as handwritten Devanagari, English, and Chinese text-lines. The proposed straightening algorithm was tested on 140 images; more test samples may reveal further strengths and weaknesses of the algorithm. The results for a few images need further correction to give better OCR performance. Although our experiments cover only Devanagari, English, and Chinese text-lines, the proposed algorithm can also be useful for other languages and scripts such as Brahmi, Grantha, Sinhalese, and Bali. The main contribution of this work is towards the development of language/script-independent OCR systems.
The approach proposed in this work straightens curved text-lines and can handle multiple font sizes and types and multi-script text-lines within a single document. It is limited to straightening text-lines and words, and it does not work well on text-lines that have no gap between them. The size of the structuring element is currently set manually; automating this selection is required to improve the accuracy of the proposed approach.
To the best of our knowledge, no method for correcting the orientation of curved text-lines has been reported previously.
Most of the works reported on Indian languages deal with documents having straight text-lines. Detailed studies on documents with curved, multi-oriented, or skewed text-lines have rarely been undertaken in the development of script/language-independent OCR systems.
1. Singh BM, Mittal A, Ghosh D: An evaluation of different feature extractors and classifiers for offline handwritten Devanagari character recognition. Journal of Pattern Recognition and Research 2011, 06(2):269-277. 10.13176/11.302
2. Ghosh D, Dube T, Shivaprasad AP: Script recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32: 2142-2161.
3. Pal U, Chaudhuri BB: Indian script character recognition: a survey. Pattern Recognition 2004, 37: 1887-1899. 10.1016/j.patcog.2004.02.003
4. Doermann D, Liang J, Li H: Progress in camera-based document image analysis. Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR), IEEE, Volume 1 2003, 606-610.
5. Arica N, Yarman-Vural FT: An overview of character recognition focused on off-line handwriting. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 2001, 31(2):216-233.
6. Amin A: Off-line Arabic character-recognition: the state of the art. Pattern Recognition 1998, 31: 517-530. 10.1016/S0031-3203(97)00084-8
7. ABBYY Software. http://www.finereader.com
8. Free OCR Software. http://www.softi.co.uk
9. Kumano S, Miyamoto K, Tamagawa M, Ikeda H, Kan K: Development of container identification mark recognition system. Trans. Inst. Electron. Inform. Commun. Eng. D-I 2001, J84D-II(6):1073-1083.
10. Cui Y, Huang Q: Character extraction of license plates from video. Proceeding of International Conference on CVPR 1997, 502-507.
11. Hua S, Liu W, Zhang HJ: Automatic performance evaluation for video text detection. In International Conference on Document Analysis and Recognition (ICDAR). Seattle, WA, USA; 2001:545-550.
12. Jain AK, Yu B: Automatic text location in images and video frames. Pattern Recognition 1998, 31(12):2055-2076. 10.1016/S0031-3203(98)00067-3
13. Lim Y, Choi S, Lee S: Text extraction in MPEG compressed video for content-based indexing. Proceeding of 15th International Conference on Pattern Recognition (ICPR), Volume 4 2000, 409-412.
14. Pavlidis T: Recognition of printed text under realistic conditions. Pattern Recognition Letters 1993, 14(4):317-326. 10.1016/0167-8655(93)90097-W
15. Sato T, Kanade T, Hughes EK, Smith MA: Video OCR for digital news archives. IEEE International Workshop on Content-Based Access of Image and Video Database 1998.
16. Sun Q, Lu Y: Text location in camera-captured guide post images. Proceeding of Chinese Conference on Pattern Recognition (CCPR), IEEE 2010, 1-4.
17. Xilin C, Jie Y, Jing Z, Waibel A: Automatic detection and recognition of signs from natural scenes. IEEE Trans. Image Process 2004, 13(1):87-99. 10.1109/TIP.2003.819223
18. Bascon SM, Arroyo SL, Jimenez PG, Moreno HG, Ferreras LF: Road-sign detection and recognition based on support vector machines. IEEE Transactions on Intelligent Transportation Systems 2007, 8(2):264-278.
19. Xie Q, Kobayashi A: A construction of pattern recognition system invariant of translation, scale-change and rotation transformation of pattern. Transactions of the Society of Instrument and Control Engineers 1991, 27: 1167-1174.
20. Tang YY, Cheng HD, Suen CY: Translation-ring-projection (TRP) algorithm and its VLSI implementations. Character and Hand-writing Recognition, World Scientific 1991, 25-56.
21. Adam S, Ogier JM, Carlon C, Mullot R, Labiche J, Gardes J: Symbol and character recognition: application to engineering drawing. International Journal of Document Analysis and Recognition 2000, 3: 89-101.
22. Yang TN, Wang SD: A rotation invariant printed Chinese character recognition system source. Pattern Recognition Letters 2001, 22: 85-95. 10.1016/S0167-8655(00)00089-1
23. Hase H, Shinokawa T, Yoneda M, Suen CY: Recognition of rotated characters by Eigen-space. Proceeding of 7th International Conference on Document Analysis and Recognition (ICDAR) 2003, 731-735.
24. Pal U, Tripathy N: Recognition of Indian multi-oriented and curved text. Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR) 2005, 141-145.
25. Hayashi T, Takagi N: A consideration on rotation invariant character recognition. World Automation Congress 2006, 1-6.
26. Pal U, Kimura F, Roy K, Pal T: Recognition of English multi-oriented characters. Proceedings of the International Conference on Pattern Recognition 2006, 873-876.
27. Monwar M, Haque W, Paul PP: A new approach for rotation invariant optical character recognition using Eigen digit. Proceedings of the Canadian Conference on Electrical and Computer Engineering 2007, 1317-1320.
28. Roy PP, Pal U, Lladós J, Kimura F: Convex Hull based approach for multi-oriented character recognition from graphical documents. Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), IEEE 2008, 1-4.
29. Pal U, Roy PP, Tripathy N, Llados J: Multi-oriented Bangla and Devnagari text recognition. Pattern Recognition 2010, 43: 4124-4136. 10.1016/j.patcog.2010.06.017
30. Chiang YY, Knoblock CA: Recognition of multi-oriented, multi-sized, and curved text. Proceeding of International Conference on Document Analysis and Recognition (ICDAR) 2011, 1399-1403.
31. Shivakumara P, Phan TQ, Tan CL: A Laplacian approach to multi-oriented text detection in video. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33(2):412-419.
32. Gatos B, Pratikakis I, Perantonis SJ: Adaptive degraded document image binarization. Pattern Recognition 2006, 39: 317-327. 10.1016/j.patcog.2005.09.010
33. Gonzalez RC, Woods RE: Digital image processing. 3rd edition. Pearson Education Asia; 2008.
34. Singh BM, Goswami S, Goyal P, Mittal A: A robust thinning algorithm for straightening of curved text line. In Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011), December 20–22, Advances in Intelligent and Soft Computing. Springer 2012, 131: 903-910.
35. de Boor C: A practical guide to splines. Springer Verlag, Berlin - Heidelberg - New York; 1978:113-114.
36. Knott GD: Interpolating cubic splines. Springer; 2000:151.
37. Lee ETY: A simplified B-spline computation routine. Computing, Springer-Verlag 1982, 29(4):365-371.
38. Lee ETY: Comments on some B-spline algorithms. Computing, Springer-Verlag 1986, 36(3):229-238.
39. Brinks R: On the convergence of derivatives of B-splines to derivatives of the Gaussian function. Comp. Appl. Math 2008, 27: 1.
40. Prautzsch H, Boehm W, Paluszny M: Bezier and B-spline techniques. Springer, Berlin-Heidelberg-New York; 2002:60-66.
41. Splitting a uniform B-spline curve. http://www.idav.ucdavis.edu/education/CAGDNotes/Quadratic-Uniform-B-Spline-Curve-Splitting/Quadratic-Uniform-B-Spline-Curve-Splitting.html
42. Runge C: Über empirische Funktionen und die Interpolation zwischen äquidistanten Ordinaten. Zeitschrift für Mathematik und Physik 1901, 46: 224-243. Available at http://www.archive.org/details/zeitschriftfrma12runggoog
43. Berrut JP, Trefethen LN: Barycentric Lagrange interpolation. SIAM Review 2004, 46: 501-517. 10.1137/S0036144502417715
44. Dahlquist G, Bjork A: Equidistant Interpolation and the Runge Phenomenon. Numerical Methods 1974, 101-103.
45. Zhang D: Least squares approximation by splines with free knots: optimization by hybrid of global and local search. A Thesis in Mathematics and Statistics 2010.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.