Palm Leaf Digitization: A Blockchain-Based Approach for Finding Hidden Treasures
Dr. R. PREMA1, Jallepalli Aditya Sai2, Kamireddy Hemanth3
1Assistant Professor, Department of Computer Science and Engineering, SCSVMV, Kanchipuram
2B.E. graduate (IV year), Department of Computer Science and Engineering, SCSVMV, Kanchipuram
3B.E. graduate (IV year), Department of Computer Science and Engineering, SCSVMV, Kanchipuram
Abstract - Palm-leaf manuscripts hold many hidden treasures and much useful information on medicine, astronomy, music, culture, and other subjects. They are written in different languages and scripts. Identifying the characters and scripts in palm-leaf manuscripts is time-consuming and challenging, because the manuscripts are extremely fragile and susceptible to deterioration from both natural causes and human activity. Identifying the characters, script, or language is nevertheless a prerequisite for extracting information for knowledge retrieval and dissemination, and the problem has attracted many researchers over the last decade. It is therefore necessary to digitize and store the manuscripts. The main aim of this work is to automatically detect, identify, and recognize the metadata (Telugu and Tamil) in digitized palm-leaf manuscripts. EasyOCR is the step that recognizes the text in pictures, tags, and so on. We used a Generative Pre-trained Transformer (GPT) to detect or predict the data missing from the text collected in the second stage. We relied on trained language models based on image and character recognition; each model is trained using GPT to detect and predict missing words or characters. To translate the extracted text from its original language into the user's language, we used Google Translate. Finally, all the metadata is secured in a blockchain to keep the data tamper-proof. The Metadata Blockchain Model (MDBC) ensures that the metadata and the translated and transliterated data are not erased and are secured in a trusted environment.
Key Words: Blockchain Technology; EasyOCR; Generative Pre-trained Transformer; Google Translate; Optical Character Recognition.
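The tamper-proofing idea behind storing metadata in a blockchain can be illustrated with a minimal hash-chain sketch. This is not the MDBC described in the abstract; it is a simplified illustration, assuming each record is a small dictionary of extracted metadata and that every block stores the SHA-256 hash of its predecessor. The function names (block_hash, append_block, verify_chain) and the record fields are hypothetical and chosen only for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor hash for the first block


def block_hash(index, metadata, prev_hash):
    """Hash the block contents deterministically (hypothetical helper)."""
    payload = json.dumps(
        {"index": index, "metadata": metadata, "prev_hash": prev_hash},
        sort_keys=True,
        ensure_ascii=False,  # keep Telugu/Tamil text as-is before UTF-8 encoding
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, metadata):
    """Append a metadata record, linking it to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "metadata": metadata,
        "prev_hash": prev_hash,
        "hash": block_hash(index, metadata, prev_hash),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Return True only if every block's hash and back-link are intact."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["metadata"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

In this sketch, changing any stored record changes its recomputed hash, so verify_chain reports the modification. A real deployment would additionally need distributed consensus and access control, which are outside the scope of this illustration.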