HaRD: a heterogeneity-aware replica deletion for HDFS

DC Field: Value [Language]
dc.contributor.author: Ciritoglu, Hilmi Egemen [-]
dc.contributor.author: Murphy, John [-]
dc.contributor.author: Thorpe, Christina [-]
dc.date.copyright: 2019 the Authors [en_US]
dc.identifier.citation: Journal of Big Data [en_US]
dc.description.abstract: The Hadoop distributed file system (HDFS) is responsible for storing very large data-sets reliably on clusters of commodity machines. The HDFS takes advantage of replication to serve data requested by clients with high throughput. Data replication is a trade-off between better data availability and higher disk usage. Recent studies propose different data replication management frameworks that alter the replication factor of files dynamically in response to the popularity of the data, keeping more replicas for in-demand data to enhance the overall performance of the system. When data gets less popular, these schemes reduce the replication factor, which changes the data distribution and leads to unbalanced data distribution. Such an unbalanced data distribution causes hot spots, low data locality and excessive network usage in the cluster. In this work, we first confirm that reducing the replication factor causes unbalanced data distribution when using Hadoop’s default replica deletion scheme. Then, we show that even keeping a balanced data distribution using WBRD (data-distribution-aware replica deletion scheme) that we proposed in previous work performs sub-optimally on heterogeneous clusters. In order to overcome this issue, we propose a heterogeneity-aware replica deletion scheme (HaRD). HaRD considers the nodes’ processing capabilities when deleting replicas; hence it stores more replicas on the more powerful nodes. We implemented HaRD on top of HDFS and conducted a performance evaluation on a 23-node dedicated heterogeneous cluster. Our results show that HaRD reduced execution time by up to 60%, and 17% when compared to Hadoop and WBRD, respectively. [en_US]
dc.description.sponsorship: European Commission - European Regional Development Fund [en_US]
dc.description.sponsorship: Science Foundation Ireland [en_US]
dc.rights: The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. [en_US]
dc.subject: Hadoop distributed file system (HDFS) [en_US]
dc.subject: Replication factor [en_US]
dc.subject: Replica management framework [en_US]
dc.subject: Software performance [en_US]
dc.title: HaRD: a heterogeneity-aware replica deletion for HDFS [en_US]
dc.type: Journal Article [en_US]
dc.status: Peer reviewed [en_US]
dc.citation.other: Article Number: 94 [en_US]
dc.neeo.contributor: Ciritoglu|Hilmi Egemen|aut| [-]
dc.description.admin: Check citation details during checkdate report - AC [en_US]
item.fulltext: With Fulltext [-]
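
The abstract states only that HaRD takes node processing capability into account when deleting replicas, so that more powerful nodes end up holding more replicas. As a rough illustration of that idea, and not the authors' algorithm, the Python sketch below removes the replica held by the node with the highest load relative to its capability. All names (`pick_replica_to_delete`, `reduce_replication`, `blocks_on_node`, `capability`) and the scoring rule (blocks stored divided by a capability weight) are assumptions made for this example; the scheme described in the article may score nodes differently.

```python
from typing import Dict, List


def pick_replica_to_delete(
    replica_nodes: List[str],
    blocks_on_node: Dict[str, int],
    capability: Dict[str, float],
) -> str:
    """Return the node whose copy of the block should be removed.

    Hypothetical rule: drop the replica on the node that stores the most
    blocks per unit of processing capability.
    """
    if not replica_nodes:
        raise ValueError("block has no replicas")
    # Load normalised by capability; ties are broken by node name so the
    # choice is deterministic.
    return max(
        replica_nodes,
        key=lambda node: (blocks_on_node[node] / capability[node], node),
    )


def reduce_replication(
    replica_nodes: List[str],
    blocks_on_node: Dict[str, int],
    capability: Dict[str, float],
    target: int,
) -> List[str]:
    """Delete replicas one at a time until only `target` remain."""
    remaining = list(replica_nodes)
    counts = dict(blocks_on_node)  # work on a copy, leave the input untouched
    while len(remaining) > target:
        victim = pick_replica_to_delete(remaining, counts, capability)
        remaining.remove(victim)
        counts[victim] -= 1  # the victim node now stores one block fewer
    return remaining
```

For instance, with replicas on three nodes that hold 10, 8 and 9 blocks and have assumed capability weights of 2.0, 1.0 and 1.5, the normalised loads are 5.0, 8.0 and 6.0, so lowering the replication factor from three to two removes the copy on the second (weakest and relatively most loaded) node.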
Appears in Collections:Computer Science Research Collection
Files in This Item:
File: s40537-019-0256-6(1).pdf
Size: 1.61 MB
Format: Adobe PDF

This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland license. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this item is made available, or to notes contained in the item itself. Other terms may apply.