HaRD: a heterogeneity-aware replica deletion for HDFS

File(s)
s40537-019-0256-6(1).pdf (PDF, 1.57 MB)
Author(s)
Ciritoglu, Hilmi Egemen 
Murphy, John 
Thorpe, Christina 
URI
http://hdl.handle.net/10197/11328
Date Issued
21 October 2019
Date Available
20 March 2020
Abstract
The Hadoop distributed file system (HDFS) is responsible for storing very large data-sets reliably on clusters of commodity machines. The HDFS takes advantage of replication to serve data requested by clients with high throughput. Data replication is a trade-off between better data availability and higher disk usage. Recent studies propose different data replication management frameworks that alter the replication factor of files dynamically in response to the popularity of the data, keeping more replicas for in-demand data to enhance the overall performance of the system. When data gets less popular, these schemes reduce the replication factor, which changes the data distribution and leads to unbalanced data distribution. Such an unbalanced data distribution causes hot spots, low data locality and excessive network usage in the cluster. In this work, we first confirm that reducing the replication factor causes unbalanced data distribution when using Hadoop’s default replica deletion scheme. Then, we show that even keeping a balanced data distribution using WBRD (data-distribution-aware replica deletion scheme) that we proposed in previous work performs sub-optimally on heterogeneous clusters. In order to overcome this issue, we propose a heterogeneity-aware replica deletion scheme (HaRD). HaRD considers the nodes’ processing capabilities when deleting replicas; hence it stores more replicas on the more powerful nodes. We implemented HaRD on top of HDFS and conducted a performance evaluation on a 23-node dedicated heterogeneous cluster. Our results show that HaRD reduced execution time by up to 60%, and 17% when compared to Hadoop and WBRD, respectively.
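The mechanism the abstract describes can be made concrete with a short sketch. In stock HDFS, lowering a file's replication factor (for example with "hdfs dfs -setrep 2 /path/to/file") leaves the choice of which replicas to delete to the NameNode, and that choice is not heterogeneity-aware. The minimal Java sketch below is illustrative only, not the paper's implementation: the Node record, the capability scores, and the replicasToDelete method are all hypothetical. It captures only the ordering idea HaRD is built on, namely that when replicas must be removed, they should be removed from the least capable nodes first, so that more replicas survive on the more powerful nodes.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HardSketch {

    // Hypothetical record: a DataNode plus a relative measure of its
    // processing capability. HaRD derives something like this from the
    // nodes' hardware; the exact metric here is an assumption.
    record Node(String name, double capability) {}

    // When a file's replication factor is reduced, pick the replicas to
    // delete from the least capable holders first, so the surviving
    // replicas sit on the more powerful nodes.
    static List<Node> replicasToDelete(List<Node> holders, int newReplicationFactor) {
        List<Node> byCapability = new ArrayList<>(holders);
        byCapability.sort(Comparator.comparingDouble(Node::capability)); // weakest first
        int surplus = Math.max(0, holders.size() - newReplicationFactor);
        return byCapability.subList(0, surplus); // these replicas get deleted
    }

    public static void main(String[] args) {
        List<Node> holders = List.of(
                new Node("dn1", 1.0),  // e.g. an older 4-core machine
                new Node("dn2", 2.5),  // a newer 8-core machine
                new Node("dn3", 1.5));
        // Dropping the replication factor from 3 to 2 removes the replica
        // on the weakest node, dn1.
        System.out.println(replicasToDelete(holders, 2));
    }
}

The real HaRD scheme is implemented inside HDFS's replica-deletion path, per the abstract; this sketch shows only the "keep more replicas on more powerful nodes" selection rule, not the actual deletion protocol or how capability is measured.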
Sponsorship
European Commission - European Regional Development Fund
Science Foundation Ireland
Type of Material
Journal Article
Publisher
Springer
Journal
Journal of Big Data
Volume
6
Issue
1
Copyright (Published Version)
2019 the Authors
Keywords
  • Hadoop distributed file system
  • Replication factor
  • Replica management framework
  • Software performance
DOI
10.1186/s40537-019-0256-6
Dataset(s)
https://www.ncdc.noaa.gov/cdo-web/datasets
Language
English
Status of Item
Peer reviewed
ISSN
2196-1115
This item is made available under a Creative Commons License
https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
Owning collection
Computer Science Research Collection
Scopus© citations
4 (as of Jan 28, 2023)
Views
709 (12 in the last month)
Downloads
150 (4 in the last week; 8 in the last month)
University College Dublin Research Repository UCD
The Library, University College Dublin, Belfield, Dublin 4
Phone: +353 (0)1 716 7583
Fax: +353 (0)1 283 7667
Email: research.repository@ucd.ie
Guide: http://libguides.ucd.ie/rru
