AI-Powered Web Scraping and Parsing: A Browser Extension Using LLMs for Adaptive Data Extraction
Shubham Nevgi
Information Technology
Finolex Academy of Management and Technology
Ratnagiri, India
nevgishubham03@gmail.com
Sahil Kadam
Information Technology
Finolex Academy of Management and Technology
Ratnagiri, India
sahilk1503@gmail.com
Sahil Haldankar
Information Technology
Finolex Academy of Management and Technology
Ratnagiri, India
sahilph46@gmail.com
Sakshi Jadhav
Information Technology
Finolex Academy of Management and Technology
Ratnagiri, India
sakshijadhav0316@gmail.com
Prof. Rashmi More
Information Technology
Finolex Academy of Management and Technology
Ratnagiri, India
rashmi.more@famt.ac.in
Abstract— The growing volume of web content has increased the need to extract meaningful, structured data from unstructured web sources. Traditional web scraping tools often demand considerable manual effort to parse and format data, especially on complex or dynamic websites. To address this challenge, this work presents a Generative AI-based Web Scraping Browser Extension that combines browser automation, HTML parsing, and generative artificial intelligence to extract and interpret data. Through the extension's interface, users enter a URL and receive structured information extracted from that page. Unlike traditional scrapers, which rely heavily on predefined rules or regular expressions, the system uses Generative AI models to interpret the structure and context of web content. The backend, developed with FastAPI, integrates BeautifulSoup and Selenium to handle both static and dynamic web pages, while AI parsing is powered by transformer models (e.g., LLaMA 3.3). Extracted data can be viewed in tabular form and downloaded in CSV, JSON, XML, or Excel format. A key feature of the system is its ability to learn patterns from previously scraped data and adapt to new page layouts, reducing the need for manual intervention. This improves productivity and gives both technical and non-technical users an accessible way to obtain structured data for research, analytics, or business intelligence.
Keywords— Web Scraping, Generative AI, Data Extraction, FastAPI, Selenium, AI Parsing, Browser Extension, Automation.
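As an illustration of the multi-format export mentioned in the abstract, the following is a minimal sketch of how a list of extracted records might be serialized to CSV, JSON, and XML. It is not the system's actual implementation: the function names (to_csv, to_json, to_xml), the record layout, and the sample rows are hypothetical, and only the Python standard library is used. Excel export is omitted because it would require a third-party package.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_csv(rows):
    """Serialize a list of flat dicts to CSV text (header taken from the first row)."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize the records as a pretty-printed JSON array."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_xml(rows, root_tag="records", row_tag="record"):
    """Serialize the records as XML; assumes every key is a valid XML tag name."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical records standing in for data extracted from a page.
rows = [
    {"title": "Widget A", "price": "19.99"},
    {"title": "Widget B", "price": "24.50"},
]

if __name__ == "__main__":
    print(to_csv(rows))
    print(to_json(rows))
    print(to_xml(rows))
```

Keeping the extracted data as a plain list of dictionaries means each export format is a thin serializer over the same structure, so adding a further format would not affect the extraction step.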