Data Poison Detection in Autonomous Vehicles
CH. China Subba Reddy1, K. Jhansi2, M. Raghu varna3, S. Praveen4, D. Vazeer Hussain5
1CH. China Subba Reddy, CSE, Joginpally B.R. Engineering College
2K. Jhansi, CSE, Joginpally B.R. Engineering College
3M. Raghu varna, CSE, Joginpally B.R. Engineering College
4S. Praveen, CSE, Joginpally B.R. Engineering College
5D. Vazeer Hussain, CSE, Joginpally B.R. Engineering College
Abstract - The rise of autonomous vehicles (AVs) has transformed modern transportation, offering enhanced safety, efficiency, and convenience. However, these intelligent systems rely heavily on machine learning models trained on vast datasets, which makes them vulnerable to data poisoning attacks. Data poisoning is a form of adversarial attack where malicious data is injected into the training set to manipulate the behavior of a model, potentially leading to catastrophic consequences in real-world applications.
This project, titled "Data Poison Detection in Autonomous Vehicles", aims to develop a reliable method to identify and mitigate poisoned data during the training phase of autonomous driving models. Using the German Traffic Sign Recognition Benchmark (GTSRB) dataset, we simulate data poisoning by injecting adversarial patterns into a subset of training images. We then apply feature extraction techniques, including Histogram of Oriented Gradients (HOG) and color histograms, to transform the image data into numerical feature vectors.
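The poisoning and feature-extraction steps described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the project itself: the adversarial pattern is modeled as a small white square in the bottom-right corner, images are 32x32 RGB arrays, and the gradient descriptor is a simplified, NumPy-only stand-in for HOG (per-cell orientation histograms without block normalization). All function names and parameter values are hypothetical.

```python
import numpy as np


def inject_trigger(image, size=4, value=255):
    """Return a copy of `image` with a square patch stamped in the
    bottom-right corner (a hypothetical adversarial pattern)."""
    poisoned = image.copy()
    poisoned[-size:, -size:, :] = value
    return poisoned


def color_histogram(image, bins=8):
    """Per-channel intensity histogram, each channel normalized to sum to 1."""
    feats = []
    for c in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


def orientation_histogram(image, cell=8, bins=9):
    """Simplified HOG-like descriptor (assumption: no block normalization).

    Gradient orientations are binned per cell, weighted by gradient
    magnitude, and the concatenated vector is L2-normalized.
    """
    gray = image.astype(np.float64).mean(axis=2)
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180].
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    height, width = gray.shape
    feats = []
    for r in range(0, height - cell + 1, cell):
        for c in range(0, width - cell + 1, cell):
            hist, _ = np.histogram(
                angle[r:r + cell, c:c + cell],
                bins=bins,
                range=(0.0, 180.0),
                weights=magnitude[r:r + cell, c:c + cell],
            )
            feats.append(hist)
    vec = np.concatenate(feats)
    return vec / (np.linalg.norm(vec) + 1e-9)


def extract_features(image):
    """Concatenate the gradient and color descriptors into one vector."""
    return np.concatenate([orientation_histogram(image), color_histogram(image)])
```

For a 32x32 RGB image with these default settings, `extract_features` yields 16 cells x 9 orientation bins plus 3 channels x 8 color bins, i.e. a 168-dimensional vector. A full HOG implementation from an image-processing library would normally replace the simplified descriptor here.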
A Support Vector Machine (SVM) classifier, integrated with a standardization pipeline, is trained to differentiate between clean and poisoned data. The model is evaluated using accuracy, a confusion matrix, and a classification report. The proposed system detects poisoned data with high precision, indicating its potential for deployment in real-world AV training pipelines.
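The classification and evaluation steps can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the project's actual code: the RBF kernel, the 75/25 train/test split, and the synthetic feature vectors are all hypothetical choices, and the synthetic data merely stands in for features extracted from GTSRB images. The numbers it produces say nothing about the results reported here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_features(n_per_class=200, n_features=16, seed=0):
    """Hypothetical stand-in for extracted features: poisoned samples
    (label 1) are shifted in the first four dimensions."""
    rng = np.random.default_rng(seed)
    clean = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    poisoned = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    poisoned[:, :4] += 3.0
    X = np.vstack([clean, poisoned])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


def train_poison_detector(X, y, seed=0):
    """Fit a standardization + SVM pipeline and evaluate it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # StandardScaler is fitted on the training split only, inside the pipeline.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=[0, 1]),
        "classification_report": classification_report(
            y_test,
            y_pred,
            labels=[0, 1],
            target_names=["clean", "poisoned"],
            zero_division=0,
        ),
    }
    return model, report
```

Calling `train_poison_detector(*make_synthetic_features())` returns the fitted pipeline together with the three evaluation outputs named above. Wrapping the scaler and the classifier in a single pipeline keeps the scaling statistics from leaking out of the test split.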
Our results highlight the importance of validating data before training and offer an effective approach to making autonomous vehicle systems more robust against poisoning attacks. Future improvements could include real-time detection during data collection and the integration of deep learning-based anomaly detection techniques.