Abstract—This paper proposes a line-based script identification method that uses a hierarchical classification scheme to identify Indian scripts, including Hindi, Gurumukhi, and Bangla. We model the task as a topological and structural classification problem and examine features inspired by human visual perception. The basic algorithm uses a different feature set at each level of the classifier to optimize the tradeoff between accuracy and speed. Feature extraction is performed on subsets of the image, which in turn improves the performance of the algorithm. The proposed system attains an overall classification accuracy of 90% on a data set of more than 2,500 text images.
Index Terms—Feature extraction, hierarchical classification, script identification.
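The idea of using a different feature set at each classifier level, computed on subsets of the line image, can be illustrated with a minimal sketch. The features (`headline_density`, `vertical_stroke_ratio`), the thresholds, and the output labels below are hypothetical placeholders chosen for illustration only; the abstract does not specify the actual features or decision rules used by the proposed system.

```python
from typing import Sequence

# A binarized text-line image: 1 = ink pixel, 0 = background (assumed input format).
Image = Sequence[Sequence[int]]


def top_band(img: Image, fraction: float = 0.4) -> Image:
    """Return only the upper rows of the line image (a subset of the image)."""
    rows = max(1, int(len(img) * fraction))
    return img[:rows]


def headline_density(img: Image) -> float:
    """Hypothetical level-1 feature: highest ink fraction of any row in the top band."""
    return max(sum(row) / len(row) for row in top_band(img))


def vertical_stroke_ratio(img: Image) -> float:
    """Hypothetical level-2 feature: fraction of columns that are at least 80% ink."""
    height, width = len(img), len(img[0])
    full_columns = sum(
        1
        for c in range(width)
        if sum(img[r][c] for r in range(height)) >= 0.8 * height
    )
    return full_columns / width


def classify_line(
    img: Image,
    headline_threshold: float = 0.7,  # illustrative value, not from the paper
    stroke_threshold: float = 0.2,    # illustrative value, not from the paper
) -> str:
    """Two-level cascade: a cheap feature first, a costlier one only when needed."""
    # Level 1: inspects only the top band of the image.
    if headline_density(img) < headline_threshold:
        return "no_headline"
    # Level 2: reached only by lines that passed level 1; scans the full image.
    if vertical_stroke_ratio(img) >= stroke_threshold:
        return "headline_many_strokes"
    return "headline_few_strokes"
```

In this sketch, lines rejected at the first level never incur the cost of the second feature, which is one way a hierarchical scheme can trade accuracy against speed. Mapping the placeholder labels to actual scripts would require the features and training data described in the full paper.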
Bhupendra Kumar is from IIIT Allahabad, India, with a specialization in wireless communication and computing.
Tushar Patnaik leads the consortium-based project “Development of Robust Document Analysis and Recognition System for Printed Indian Scripts” in India.
Aniket Bera is a graduate student at Jaypee Institute of Information Technology, India.
Cite: Bhupendra Kumar, Aniket Bera, and Tushar Patnaik, "Line Based Robust Script Identification for Indian Languages," International Journal of Information and Electronics Engineering, vol. 2, no. 2, pp. 189-192, 2012.