Ripple: Localized Content-Centric Deduplication at the Edge
Mamidi Chinmayi, Department of CSE, GNITC-HYD, 22-5H3, 22wj1a05h3@gniindia.org
Madupalli Venu, Department of CSE, GNITC-HYD, 22-5H0, 22wj1a05h0@gniindia.org
Mohammad Azhar, Department of CSE, GNITC-HYD, 23-521, 23wj5a0521@gniindia.org
Karinga Shankar, Assistant Professor, Department of CSE, GNITC-HYD, shankar.csegnitc@gniindia.org
Abstract – Edge computing has become an important foundation for latency-sensitive applications because it enables faster data access and reduces the traffic sent to distant cloud servers. However, the rapid growth of data generated at the edge places significant pressure on the limited storage capacity of edge servers, making efficient resource management a major challenge. Existing solutions attempt to improve storage efficiency through optimized data placement, partitioning, and data sharing, but they often overlook data redundancy: multiple copies of similar data may be stored across edge nodes, leading to unnecessary storage consumption and increased network overhead. Data deduplication at the edge can address this problem by ensuring that only unique data is stored. However, many current deduplication approaches rely on centralized coordination, which is not always suitable for real-world edge environments, where a central controller may be unavailable or may introduce a single point of failure.
To overcome these limitations, this work proposes Ripple, a decentralized edge-based data deduplication framework that allows each edge server to perform deduplication independently. Ripple maintains a local data index on every edge node, enabling servers to detect duplicate data chunks, eliminate redundant copies while maintaining low latency, and preserve data availability through controlled replication strategies. The system is implemented using a full-stack Java architecture that integrates a RESTful service layer, a distributed storage layer, and a lightweight coordination protocol for exchanging deduplication metadata among edge nodes. Experimental evaluations using trace-driven workloads on a real-world edge testbed demonstrate the effectiveness of the proposed approach. The results show that Ripple reduces average data retrieval latency by approximately 60% and improves the deduplication ratio by up to 17% compared with existing edge and edge-assisted cloud deduplication techniques. These findings indicate that decentralized deduplication can significantly enhance storage efficiency while maintaining system reliability and quality of service in next-generation edge computing environments.
Key Words: Edge computing, data redundancy, data retrieval latency, data deduplication, data index.
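As an illustration of the per-node data index described in the abstract, the following is a minimal sketch and not Ripple's actual implementation. It assumes fixed-size chunking and SHA-256 fingerprints, neither of which is specified above; the class name LocalChunkIndex and all of its methods are hypothetical. It shows only local duplicate detection on one node, and leaves out the metadata exchange between nodes and the controlled replication.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-node chunk index (assumption: fixed-size chunks, SHA-256 fingerprints). */
class LocalChunkIndex {
    private final int chunkSize;
    // Fingerprint -> chunk bytes; each unique chunk is stored exactly once.
    private final Map<String, byte[]> chunks = new HashMap<>();
    private long logicalBytes = 0; // bytes written by clients
    private long storedBytes = 0;  // bytes actually kept after deduplication

    LocalChunkIndex(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Stores data and returns its recipe: the ordered fingerprints needed to rebuild it. */
    List<String> store(byte[] data) {
        List<String> recipe = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            byte[] chunk = Arrays.copyOfRange(data, off, end);
            String fp = fingerprint(chunk);
            // putIfAbsent returns null only when the fingerprint was not yet indexed.
            if (chunks.putIfAbsent(fp, chunk) == null) {
                storedBytes += chunk.length;
            }
            logicalBytes += chunk.length;
            recipe.add(fp);
        }
        return recipe;
    }

    /** Rebuilds the original bytes from a recipe using only the local index. */
    byte[] retrieve(List<String> recipe) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String fp : recipe) {
            byte[] chunk = chunks.get(fp);
            if (chunk == null) {
                throw new IllegalStateException("missing chunk " + fp);
            }
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    int uniqueChunks() { return chunks.size(); }
    long logicalBytes() { return logicalBytes; }
    long storedBytes() { return storedBytes; }

    private static String fingerprint(byte[] chunk) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(chunk);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

In this sketch, a node would call store on incoming data, keep the returned recipe as the object's metadata, and compare logicalBytes() with storedBytes() to estimate how much space deduplication saved locally.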