Detection of Data Manipulation in Datasets Using Machine Learning
RIMSHA ARFEEN1, RAJKUMAR KENCHA2, SRI LATHA ARIGE3,
MOHAMMAD ABDUL RASHEED4, P. BALAKISHAN5
1,4 UG STUDENT, CSE Department & Jyothishmathi Institute of Technology and Science
5 ASSOCIATE PROFESSOR, CSE Department & Jyothishmathi Institute of Technology and Science
Abstract - Data integrity is pivotal for model performance and accuracy, and for making credible decisions in today's data science and analytics environment. This project implemented and evaluated a machine learning-driven framework that detects data tampering through a comparative analysis of a structured dataset in its original and modified states. By transforming both datasets to a matching structure and computing a feature-wise difference vector, the system identified possible tampering in the modified dataset through statistical analysis of selected features, such as the Interquartile Range (IQR), entropy analysis, and the Local Outlier Factor (LOF). These derived features were then fed into a Random Forest classifier that accurately labelled each record as either tampered or not tampered. The resulting system showed significant promise in capturing anomalies such as outliers, null inserts, mismatched types, and subtle shifts in value. The results indicated high precision and recall across a range of manipulated datasets. Through successive experimentation, the system has shown promise for data validation and inspection and for extension into forensic auditing systems. The solution is modular and scalable, which gives the added benefit of sound data integrity in critical domains such as finance, healthcare, and defense.
Key Words: Data manipulation detection, data quality, anomaly detection, Interquartile Range (IQR), Local Outlier Factor (LOF), entropy analysis, skewness imputation, Shannon entropy, outlier detection, Random Forest, supervised classification, feature engineering, descriptive feature extraction, difference vectors, machine learning pipeline, data validation, data forensics, structured data comparison, ETL validation, automated dataset checking, classification accuracy.
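To make the detection pipeline concrete, the following Python sketch (using pandas and scikit-learn) illustrates the main steps described in the abstract: aligning the original and modified datasets, computing a feature-wise difference vector, deriving IQR-based outlier flags, null-insert counts, and Local Outlier Factor scores, and training a Random Forest classifier. The dataset, column names, and tampering labels below are synthetic assumptions introduced purely for illustration, and the entropy descriptor mentioned in the abstract is omitted for brevity; this is a minimal sketch under those assumptions, not the authors' exact implementation.

# Minimal sketch of the tampering-detection pipeline described in the abstract.
# Column names, synthetic data, and labels are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)

# Synthetic stand-ins for the "original" and "modified" datasets.
n, cols = 1000, ["amount", "age", "score"]
original = pd.DataFrame(rng.normal(loc=[100, 40, 0.5], scale=[20, 10, 0.1],
                                   size=(n, 3)), columns=cols)
modified = original.copy()

# Tamper with a random 10% of the records (value shifts and null inserts).
tampered_idx = rng.choice(n, size=n // 10, replace=False)
modified.loc[tampered_idx, "amount"] *= rng.uniform(1.5, 3.0, size=len(tampered_idx))
modified.loc[tampered_idx[: len(tampered_idx) // 4], "score"] = np.nan
labels = np.zeros(n, dtype=int)
labels[tampered_idx] = 1

# Step 1: align structure and compute a feature-wise difference vector.
modified = modified[original.columns]
diff = (modified.fillna(0) - original.fillna(0)).abs()
diff.columns = [f"diff_{c}" for c in cols]

# Step 2: statistical descriptors per record.
# IQR rule: count how many of a record's differences fall outside 1.5 * IQR.
q1, q3 = diff.quantile(0.25), diff.quantile(0.75)
iqr = q3 - q1
iqr_flags = ((diff < (q1 - 1.5 * iqr)) | (diff > (q3 + 1.5 * iqr))).sum(axis=1)

# Null-insert indicator: values missing in the modified copy but not the original.
null_inserts = (modified.isna() & ~original.isna()).sum(axis=1)

# LOF: density-based outlier score computed on the difference vectors.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(diff.values)
lof_score = -lof.negative_outlier_factor_  # larger means more anomalous

features = diff.assign(iqr_flags=iqr_flags,
                       null_inserts=null_inserts,
                       lof_score=lof_score)

# Step 3: supervised classification with a Random Forest.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test),
                            target_names=["clean", "tampered"]))

In this sketch the per-record descriptors are concatenated with the raw difference vector before classification; any comparable record-level feature set derived from the original-versus-modified comparison could be substituted without changing the overall structure of the pipeline.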