“Website Scraping Using the Requests Library and BeautifulSoup for Extracting Data from Malicious Websites”
Ms. Shreya Thakur
MSc. Cyber Forensics
Shri Vaishnav Vidyapeeth Vishwavidyalaya, Indore, Madhya Pradesh, India
Ms. Shilpa Jossy
Assistant Professor, Department of Forensic Science
Shri Vaishnav Vidyapeeth Vishwavidyalaya, Indore, Madhya Pradesh, India
Abstract
The main objective of this research is to use the Python libraries Requests and BeautifulSoup for web scraping in order to extract data from malicious websites. The Requests library issues HTTP requests to retrieve web pages, while BeautifulSoup parses and navigates the HTML content so that relevant information can be extracted quickly. The process involves identifying potentially harmful URLs, examining webpage components, and extracting pertinent information, including URLs, IP addresses, and possibly malicious scripts. As part of the study, the extracted data will also be stored in a structured format for further examination in digital forensics.
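The workflow described above can be illustrated with a minimal sketch. It is not the implementation used in this study; the function names (`fetch_html`, `extract_indicators`, `save_report`), the simplified IPv4 pattern, and the choice of JSON as the storage format are assumptions made for illustration only.

```python
import json
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Simplified IPv4 pattern (assumption): matches dotted quads without
# checking that each octet is in the range 0-255.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def fetch_html(url, timeout=10):
    """Retrieve the raw HTML of a page with an HTTP GET request."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_indicators(html, base_url):
    """Parse HTML and collect links, script references, and IPv4-like strings."""
    soup = BeautifulSoup(html, "html.parser")

    # Resolve relative hrefs against the page URL and remove duplicates.
    links = sorted(
        {urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)}
    )

    scripts = soup.find_all("script")
    # Scripts loaded from a separate file, identified by their src attribute.
    external_scripts = sorted(
        {urljoin(base_url, s["src"]) for s in scripts if s.get("src")}
    )
    # Scripts embedded directly in the page.
    inline_scripts = [
        s.string for s in scripts if not s.get("src") and s.string and s.string.strip()
    ]

    # Search the raw markup so addresses inside attributes are also found.
    ip_addresses = sorted(set(IPV4_PATTERN.findall(html)))

    return {
        "source_url": base_url,
        "links": links,
        "external_scripts": external_scripts,
        "inline_scripts": inline_scripts,
        "ip_addresses": ip_addresses,
    }


def save_report(record, path):
    """Store the extracted data as JSON for later forensic examination."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
```

A typical call sequence would be `html = fetch_html(url)`, then `record = extract_indicators(html, url)`, then `save_report(record, "report.json")`, where `url` is a suspect address supplied by the analyst. Keeping the parsing step separate from the network request allows previously saved pages to be analyzed offline.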
The project shows how web scraping can be applied to cybersecurity, giving analysts and researchers information about malware distribution channels, phishing URLs, and other potential threats. The results highlight how useful these tools are for automating data collection, which can improve threat intelligence and help identify cyberthreats early. The ethical and legal considerations of web scraping are also emphasized, particularly in sensitive contexts.
In conclusion, combining BeautifulSoup with the Requests library provides a practical method for obtaining actionable information from malicious websites, which can help cybersecurity professionals reduce risk. To strengthen cybersecurity defenses further, future work may examine vulnerability scanning of the gathered data using tools such as Nessus.
Keywords
Web Scraping, Malicious Websites, Requests Library, BeautifulSoup, Cybersecurity, Digital Forensics, Data Extraction, Threat Intelligence, HTML Parsing, Vulnerability Scanning.