Few-Shot Hindi Handwritten Text Recognition Using Meta-Optimized ResNet-Transformer Networks
A Md Tahseen Equbal
M.Tech Student, Department of Computer Science and Engineering
All Saints College of Technology, Bhopal, India
Affiliated to Rajiv Gandhi Proudyogiki Vishwavidyalaya (RGPV)
mdtahseen278@gmail.com
B Prof. Sarwesh Site
Associate Professor, Department of Computer Science and Engineering
All Saints College of Technology, Bhopal, India
Affiliated to Rajiv Gandhi Proudyogiki Vishwavidyalaya (RGPV)
er.sarwesh@gmail.com
ABSTRACT
Handwritten Text Recognition (HTR) in Indic scripts such as Hindi is challenging because of diverse handwriting styles, limited annotated data, and the inherent complexity of the Devanagari script. This dissertation proposes a few-shot learning framework, RT-MAML (ResNet-Transformer with Model-Agnostic Meta-Learning), to address these challenges. The architecture combines a ResNet-18 encoder for visual feature extraction with a Transformer-based decoder for sequence modeling, and is trained with a meta-learning strategy so that it can adapt rapidly to new writers from minimal data. Unlike conventional CTC-based HTR systems, our model uses a cross-entropy-based sequence decoder with beam search, which improves recognition accuracy. The approach is evaluated on the IIIT-HW Hindi handwritten word dataset, where it achieves a Character Error Rate (CER) of 6% and a Word Error Rate (WER) of 13.0%, outperforming several existing baselines, including CNN-BiLSTM-CTC and Vision Transformer models. Meta-optimization through MAML improves the model's generalization to unseen writing styles, while orthogonality regularization and curriculum learning further refine the learning process. The study highlights the potential of combining deep visual encoders, attention-based decoding, and meta-learning for low-resource script recognition tasks. Future work will explore extending this framework to multilingual handwritten datasets and integrating large language models for semantic-aware decoding. This research contributes to the advancement of intelligent handwriting recognition systems for Indian languages in limited-resource settings.
Keywords: Handwritten Text Recognition (HTR), Hindi Script, ResNet, Transformer Decoder, Connectionist Temporal Classification (CTC), Beam Search Decoding, Character Error Rate (CER), Word Error Rate (WER), Indic NLP, Deep Learning, Regularization, Script Complexity
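The two metrics reported in the abstract, CER and WER, are commonly defined as the edit distance between the predicted and reference text divided by the reference length, measured in characters and in words respectively. The sketch below illustrates that common definition; it is not the evaluation code used in this work. The function names `edit_distance`, `cer`, and `wer` are hypothetical, and the sketch assumes that a "character" is a single Unicode code point, whereas an evaluation on Devanagari may instead count grapheme clusters or normalize the text first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character Error Rate over a corpus (assumption: code-point level)."""
    errors = sum(edit_distance(list(r), list(h)) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def wer(refs, hyps):
    """Word Error Rate over a corpus, with words split on whitespace."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return errors / total
```

On a word-level dataset each sample is a single word, so under this definition WER reduces to the fraction of words that are not recognized exactly, while CER still gives partial credit for nearly correct words.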