ScrapeFlow: An AI-Augmented Visual Web Scraping and Data Workflow Platform
Prof. Seema Pawar
Dept. of Artificial Intelligence and Data Science, Vasantdada Patil Pratishthan’s College of Engineering, Mumbai, India
Ayush Gupta
Dept. of Artificial Intelligence and Data Science, Vasantdada Patil Pratishthan’s College of Engineering, Mumbai, India
Omkar Manjrekar
Dept. of Artificial Intelligence and Data Science, Vasantdada Patil Pratishthan’s College of Engineering, Mumbai, India
Ritesh Trimukhe
Dept. of Artificial Intelligence and Data Science, Vasantdada Patil Pratishthan’s College of Engineering, Mumbai, India
Ganesh Dubey
Dept. of Artificial Intelligence and Data Science, Vasantdada Patil Pratishthan’s College of Engineering, Mumbai, India
Abstract—Building a complete web data pipeline today means choosing between two incompatible options: API orchestrators such as n8n and Zapier offer visual DAG editors but no native browser automation, while scraping frameworks such as Apify provide headless browser control but require developer expertise [4], [5]. Prior academic work on web automation scripting [1], natural-language parameterization [2], and hierarchical scraping visualization [3] advanced individual concerns, yet it remained dependent on brittle DOM-structural selectors and did not unify visual workflow design, browser execution, and AI-driven extraction in a single system. To close this gap, we built ScrapeFlow, an open-source platform on Next.js 16 that combines a typed-handle DAG editor, a phase-based Puppeteer execution engine with merge-point-aware branching and three loop variants, and a provider-agnostic AI layer covering seven LLM backends, including local Ollama. A capability evaluation against eight commercial platforms across ten feature dimensions shows that ScrapeFlow is the only system in the comparison that satisfies all ten dimensions, and a case study demonstrates an end-to-end pipeline from AI-powered data extraction through sentiment analysis to conditional webhook delivery. Taken together, these results suggest that semantic web orchestration is a practical design approach, one that gives non-technical users access to enterprise-grade control flow without writing code.
Index Terms—web scraping, workflow automation, visual programming, large language models, directed acyclic graph, browser automation