Fine-Tuning Small LLMs for High-Quality Semantic Search: A Cost-Efficient Alternative to Foundation Models
PURIPANDA SHARAT CHANDRA
Department of Artificial Intelligence and Machine Learning, R V College of Engineering
Abstract - Large language models (LLMs) have demonstrated remarkable performance in natural language understanding, yet their deployment for real-time semantic search and recommendation tasks remains impractical due to significant computational demands. This paper introduces a cost-efficient framework for fine-tuning small-scale models tailored for high-quality semantic movie recommendation. We leverage Gemma 3, a compact generative model, to produce enriched natural language descriptions of movies from structured metadata, and Granite Embedder, a lightweight transformer-based encoder, to compute dense vector representations for semantic similarity retrieval. Fine-tuning is performed using contrastive learning on curated triplet datasets derived from public movie data sources, enabling the model to learn meaningful semantic distances between similar and dissimilar movie entries.
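The contrastive objective described above can be sketched as a hinge-style triplet loss over cosine distance. The snippet below is a minimal illustrative sketch, not the paper's actual training code: in the real pipeline this would operate on batches of Granite Embedder outputs via PyTorch, whereas here plain Python lists stand in for embedding vectors.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on cosine distance: the positive (a similar movie)
    must sit closer to the anchor than the negative (a dissimilar movie)
    by at least `margin`, otherwise a penalty is incurred."""
    pos_dist = 1.0 - cosine(anchor, positive)
    neg_dist = 1.0 - cosine(anchor, negative)
    return max(pos_dist - neg_dist + margin, 0.0)

# Toy triplet: the positive already matches the anchor exactly and the
# negative is orthogonal, so the margin is satisfied and the loss is zero.
loss = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Minimizing this loss over many curated triplets is what pulls embeddings of semantically similar movies together while pushing dissimilar ones apart; the margin value here (0.2) is an illustrative choice, not one reported by the paper.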
Our pipeline, built in Python with Hugging Face Transformers, PyTorch, and Qdrant, supports end-to-end generation, embedding, and retrieval of semantically similar movies. All experiments were conducted on an AWS EC2 instance equipped with a 24 GB GPU, allowing efficient training and inference at scale. We demonstrate a notable improvement in recommendation quality: after fine-tuning, Recall@10 increased from 0.56 to 0.81, and mean cosine similarity between relevant movie vectors improved from 0.43 to 0.72. A sample system output, such as “If you enjoyed Avengers: Age of Ultron, you might love Eternals for its mind-bending story and similar sci-fi execution,” showcases the model’s contextual sensitivity and domain-specific relevance. This research highlights a scalable, low-cost alternative to large foundation models for semantic search and recommendation tasks, as of June 2, 2025.
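The retrieval and evaluation steps can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: a brute-force cosine ranking over an in-memory catalog takes the place of the Qdrant similarity search, and the `top_k` and `recall_at_k` helper names are assumptions introduced here for clarity.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k(query_vec, catalog, k=10):
    """Rank (movie_id, vector) pairs by cosine similarity to the query.
    A brute-force stand-in for a Qdrant approximate-nearest-neighbor search."""
    ranked = sorted(catalog, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [movie_id for movie_id, _ in ranked[:k]]

def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Recall@k: fraction of ground-truth relevant movies found in the top-k."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Toy catalog of pre-computed movie embeddings (2-D for readability).
catalog = [("movie_a", [1.0, 0.0]), ("movie_b", [0.9, 0.1]), ("movie_c", [0.0, 1.0])]
results = top_k([1.0, 0.0], catalog, k=2)
```

Reporting the mean of `recall_at_k` across a held-out set of query movies yields the Recall@10 figures quoted above; in the full system the query and catalog vectors would come from the fine-tuned Granite Embedder.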
Key Words: Semantic Search, Fine-Tuning, Small Language Models, Vector Embeddings, Cost-Efficiency.