Data Duplication Detection and Removal System Using Machine Learning





Create Date 05/05/2025
Last Updated 06/05/2025

ANSH BALGOTRA

Department of Information Technology, Maharaja Agrasen Institute of Technology, New Delhi, India
anshbalgotra@gmail.com

Abstract— Missing data is a critical problem in many domains because it can lead to inaccurate analysis and flawed decision-making. Traditional methods for handling missing values are increasingly being replaced by machine learning techniques, which can offer more efficient solutions. Research in this area has explored various approaches to data imputation and analyzed their strengths and limitations. A systematic literature review of studies from 2016 to 2021 identified key factors influencing the effectiveness of these methods, providing valuable insights for researchers and data analysts. In parallel, the rapid expansion of data storage and processing has created challenges in managing large-scale information, particularly in deduplication. Duplicate data originating from multiple sources reduces storage efficiency and retrieval accuracy. Cloud service providers have adopted data deduplication techniques to reduce storage costs and bandwidth usage. However, encryption for security conflicts with deduplication efficiency: identical data encrypted under different keys produces different ciphertexts, so duplicates can no longer be recognized. To address this, hybrid chunking methods, such as the Two Threshold Two Divisor (TTTD) and Dynamic Prime Coding (DPC) algorithms, have been proposed. These techniques improve deduplication performance while balancing security requirements. Furthermore, entity resolution plays a crucial role in information integration, aiming to consolidate and organize data from diverse sources. Deduplication, as a key step in this process, enhances data quality by identifying and eliminating redundant records. Research in this domain spans machine learning, data mining, and information retrieval, and covers both supervised and unsupervised approaches. By analyzing various methodologies, researchers can refine existing techniques to improve accuracy, processing speed, and computational efficiency.
Overall, advancements in machine learning, deduplication, and entity resolution contribute to more effective data management, addressing challenges in missing data imputation, secure deduplication, and large-scale information integration.
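
To make the chunk-level deduplication mentioned in the abstract more concrete, the following is a minimal sketch of two-threshold, two-divisor chunking followed by hash-based chunk storage. It is not the paper's implementation. The function names, the parameter values (t_min, t_max, d_main, d_backup, window) and the use of CRC32 over a small window in place of a true rolling fingerprint are all assumptions chosen for illustration, and the sketch does not address the interaction with encryption.

```python
import hashlib
import zlib


def tttd_chunks(data, t_min=64, t_max=512, d_main=128, d_backup=64, window=16):
    """Split bytes into content-defined chunks (simplified TTTD sketch).

    All parameter values are illustrative assumptions, not tuned settings.
    """
    assert window <= t_min <= t_max
    chunks, start, backup, pos = [], 0, None, 0
    while pos < len(data):
        length = pos - start + 1
        if length < t_min:
            # Never cut before the minimum threshold.
            pos += 1
            continue
        # Fingerprint of the last `window` bytes (CRC32 stands in for a
        # rolling hash; it is recomputed per position for simplicity).
        fp = zlib.crc32(data[pos - window + 1 : pos + 1])
        cut = None
        if fp % d_main == d_main - 1:
            cut = pos  # main divisor matched: regular breakpoint
        else:
            if fp % d_backup == d_backup - 1:
                backup = pos  # remember a fallback breakpoint
            if length >= t_max:
                # Maximum threshold reached: use the backup if one exists,
                # otherwise force a cut at the current position.
                cut = backup if backup is not None else pos
        if cut is None:
            pos += 1
            continue
        chunks.append(data[start : cut + 1])
        start = pos = cut + 1
        backup = None
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def store_deduplicated(data, store):
    """Store each unique chunk once; return the digests needed to rebuild data."""
    recipe = []
    for chunk in tttd_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # skip chunks that are already stored
        recipe.append(digest)
    return recipe
```

In this sketch, storing the same input twice leaves the chunk store unchanged, and the original bytes can be rebuilt by concatenating store[d] for each digest d in the returned recipe.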

Keywords— Missing Data, Data Quality, Machine Learning, Processing Speed, Computational Efficiency, Structured Data, Unstructured Data, Database Management, Encryption, Accuracy, Performance
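
As an illustration of the record-level duplicate detection and removal that the abstract describes, here is a minimal unsupervised sketch. It is not the author's system: a plain string-similarity threshold from Python's standard difflib module stands in for a learned model, and the field names, sample records and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(record, fields):
    """Lowercase and collapse whitespace in the chosen fields of a record."""
    parts = (" ".join(str(record.get(f, "")).lower().split()) for f in fields)
    return " | ".join(parts)


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, a, b).ratio()


def remove_duplicates(records, fields, threshold=0.85):
    """Keep the first record of each near-duplicate group.

    threshold is a hypothetical value; a real system would tune or learn it.
    Returns (kept, dropped), where dropped pairs each removed record with
    the kept record it matched.
    """
    kept, keys, dropped = [], [], []
    for rec in records:
        key = normalize(rec, fields)
        match = next(
            (i for i, k in enumerate(keys) if similarity(key, k) >= threshold),
            None,
        )
        if match is None:
            kept.append(rec)
            keys.append(key)
        else:
            dropped.append((rec, kept[match]))
    return kept, dropped


# Hypothetical sample data for illustration only.
records = [
    {"name": "Asha Rao", "city": "New Delhi"},
    {"name": "asha   rao", "city": "New Delhi"},   # case and spacing variant
    {"name": "Asha Raoo", "city": "New Delhi"},    # one-character typo
    {"name": "Vikram Mehta", "city": "Mumbai"},
]
kept, dropped = remove_duplicates(records, ["name", "city"])
```

The pairwise comparison is quadratic in the number of kept records, so a larger dataset would normally add a blocking step that compares only records sharing a cheap key.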
