Towards Intelligent Legal Information Retrieval: A Transformer-Based Framework
Karthik Surya N A
Department of Artificial Intelligence and Data Science
Panimalar Institute of Technology, Chennai, Tamil Nadu, India
n.a.karthiksurya@gmail.com
Vedabhishekh T
Department of Artificial Intelligence and Data Science
Panimalar Institute of Technology, Chennai, Tamil Nadu, India
vedabhi246@gmail.com
Mrs. Vidhya Muthulakshimi
Assistant Professor
Department of Artificial Intelligence and Data Science
Panimalar Engineering College, Chennai, Tamil Nadu, India
vidhyamuthulakshimi@gmail.com
Purushothaman G
Department of Artificial Intelligence and Data Science
Panimalar Institute of Technology, Chennai, Tamil Nadu, India
g.purushoth643@gmail.com
Abstract - Retrieving relevant laws and case judgments efficiently remains a critical challenge in legal technology because legal language is complex, ambiguous, and context-dependent. Traditional keyword-based legal search engines often fail to capture the semantic relevance required for precise legal reasoning. This paper introduces a Transformer-based Legal Information Retrieval System tailored to the Indian legal domain and built on a Retrieval-Augmented Generation (RAG) architecture. The proposed system integrates Google's Gemma 2B-IT language model with semantic embeddings from all-MiniLM-L6-v2 for dense vector indexing and similarity matching. A curated corpus of Indian statutes and over 4,000 case judgments is pre-processed, embedded, and indexed using LlamaIndex to enable contextual document retrieval. User queries are interpreted semantically and matched against the most relevant legal content, which is then synthesized into a natural language response. The system demonstrates significant improvements in relevance and response quality compared to rule-based approaches and generic LLM outputs. This research aims to give legal professionals, students, and the public fast, accurate, and interpretable legal insights, reducing dependence on manual legal research and improving access to justice.
Keywords - Legal Information Retrieval, Retrieval-Augmented Generation (RAG), Gemma 2B-IT, Semantic Search, Legal NLP, Case Law Matching, Deep Learning, Legal Document Understanding, Transformer Models.
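The retrieve-then-generate flow summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the system's implementation: the `embed` function is a toy bag-of-words stand-in for the all-MiniLM-L6-v2 encoder, the three passages in `CORPUS` are hypothetical placeholders rather than real statutes or judgments, and the prompt built by `build_prompt` is only an assumed format for what would be passed to a generator such as Gemma 2B-IT.

```python
import math
from collections import Counter

# Hypothetical placeholder passages; the real corpus holds statutes and judgments.
CORPUS = [
    "punishment for theft of movable property",
    "procedure for filing a consumer complaint",
    "rules on bail in non-bailable offences",
]

def embed(text):
    """Toy stand-in for a sentence encoder: a sparse term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(corpus):
    """Embed every passage once, ahead of query time."""
    return [(doc, embed(doc)) for doc in corpus]

def retrieve(query, index, top_k=2):
    """Return the top_k passages ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_prompt(query, passages):
    """Assemble retrieved passages and the query into a generator prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

INDEX = build_index(CORPUS)
QUERY = "what is the punishment for theft"
PROMPT = build_prompt(QUERY, retrieve(QUERY, INDEX))
```

In a full system, `embed` would be replaced by a dense sentence encoder and `PROMPT` would be sent to the language model; the ranking and prompt-assembly steps keep the same shape.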