DATA POISONING ATTACKS ON REGRESSION MODELS AND THEIR DEFENSES
Spoorti P Patgar
Akshata Jawade
Abhishek R Karanje
Students, Dept. of Cyber Security
APS College of Engineering
spoortipatgar@gmail.com
akshatajawade12@gmail.com
abhishekrk0001@gmail.com
Abstract
Machine learning models are widely deployed in critical domains such as healthcare, finance, transportation, and cyber-physical systems, making their security and reliability a paramount concern. Among the most significant threats to these systems are data poisoning attacks, which compromise model integrity by injecting malicious or manipulated data into training datasets, leading to degraded performance and unreliable predictions. While data poisoning has been extensively studied in classification tasks, its impact on regression models — which are equally critical for applications such as medication dosage management, power supply regulation, and financial forecasting — remains comparatively underexplored.
Experimental evidence consistently demonstrates that even a small fraction of poisoned data, as little as 2%, can increase prediction error by up to 150% in mean squared error (MSE), underscoring the severity of this threat. Common attack strategies include label manipulation, feature perturbation, adversarial data injection, and optimization-based black-box attacks that operate without prior knowledge of the target model. In response to these vulnerabilities, researchers have proposed several defense mechanisms, including Trimmed Loss, Differential Privacy, and the Iterative Trim (I-Trim) method, which detects and removes poisoned samples without requiring prior knowledge of the attack's intensity.
Additionally, adaptive strategies such as dynamic network structure adjustment and adaptive learning weights have shown strong potential for reducing the influence of poisoned data during training, thereby preserving model accuracy and robustness. Evaluations across diverse datasets confirm that these defense frameworks significantly mitigate poisoning effects while maintaining practical model performance. These findings collectively emphasize the urgent need for secure data pipelines, robust learning algorithms, and adaptive defense strategies to safeguard machine learning systems against evolving adversarial threats in real-world applications.
Keywords: Data Poisoning, Regression Models, Adversarial Machine Learning, Outlier Detection, RANSAC, Robust Learning.