Distributed Spatial Data Clustering as a New Approach for Big Data Analysis
Date Issued
2017-08-20
Date Available
2019-03-21T15:28:48Z
Abstract
In this paper we propose a new approach for Big Data mining and analysis. The approach works well on distributed datasets and addresses the data clustering task of the analysis. It consists of two main phases. The first phase executes a clustering algorithm on local data, assuming that the dataset has already been distributed among the system's processing nodes. The second phase aggregates the local clusters to generate global clusters. This approach not only generates local clusters on each processing node in parallel, but also facilitates the formation of global clusters without prior knowledge of the number of clusters, which many partitioning clustering algorithms require. In this study, the approach was applied to spatial datasets. The proposed aggregation phase is very efficient and does not involve the exchange of large amounts of data between the processing nodes. The experimental results show that the approach has super-linear speed-up, scales up very well, and can take advantage of recent programming models, such as the MapReduce model, as its results are not affected by the types of communications.
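The two-phase structure described in the abstract can be illustrated with a minimal sketch. The abstract does not name the local clustering algorithm or the form in which local clusters are exchanged, so everything concrete here is an assumption made for illustration: distance-threshold linking stands in for the local algorithm, an axis-aligned bounding box stands in for the compact cluster summary, and the names `local_clusters`, `summarize`, `aggregate`, `distributed_cluster`, and `eps` are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations
from math import dist, hypot


def _components(n, linked):
    """Group indices 0..n-1 into connected components with union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(range(n), 2):
        if linked(a, b):
            parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def local_clusters(points, eps):
    """Phase 1, run independently on each node (assumed stand-in algorithm):
    link points that lie within eps of each other. No cluster count is needed."""
    comps = _components(len(points),
                        lambda a, b: dist(points[a], points[b]) <= eps)
    return [[points[i] for i in comp] for comp in comps]


def summarize(cluster):
    """Assumed compact summary: bounding box (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*cluster)
    return (min(xs), min(ys), max(xs), max(ys))


def box_gap(a, b):
    """Euclidean gap between two boxes; 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def aggregate(summaries, eps):
    """Phase 2: merge local summaries whose boxes lie within eps of each other.
    Only the small summaries are exchanged, never the raw points."""
    comps = _components(len(summaries),
                        lambda a, b: box_gap(summaries[a], summaries[b]) <= eps)
    merged = []
    for comp in comps:
        boxes = [summaries[i] for i in comp]
        merged.append((min(b[0] for b in boxes), min(b[1] for b in boxes),
                       max(b[2] for b in boxes), max(b[3] for b in boxes)))
    return sorted(merged)


def distributed_cluster(partitions, eps):
    """Run phase 1 per partition (sequentially here), then phase 2."""
    summaries = [summarize(c) for part in partitions
                 for c in local_clusters(part, eps)]
    return aggregate(summaries, eps)


# Two hypothetical nodes, each holding part of the same two spatial clusters.
node_a = [(0, 0), (0.5, 0), (1, 0), (10, 10)]
node_b = [(1.5, 0), (2, 0), (10.3, 10)]
global_clusters = distributed_cluster([node_a, node_b], eps=0.6)
```

In this toy run each node finds two local clusters, and the aggregation step joins the four summaries into two global clusters. In a real deployment the per-partition loop would run in parallel on the processing nodes, and only the summaries would cross the network.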
Sponsorship
Science Foundation Ireland
Type of Material
Conference Publication
Publisher
Springer
Journal
Communications in Computer and Information Science
Volume
845
Start Page
38
End Page
56
Copyright (Published Version)
2018 Springer Nature Singapore
Language
English
Status of Item
Peer reviewed
Conference Details
The 15th Australasian Data Mining Conference, Melbourne, Australia, 19-20 August 2017
This item is made available under a Creative Commons License
File(s)
Name
insight_publication.pdf
Size
739.72 KB
Format
Adobe PDF
Checksum (MD5)
4ef315ff97a95e1e867d39d7fa8e9b56
Owning collection