Study of Distributed Dynamic Clustering Framework for Spatial Data Mining

Files in This Item:
Bendechache_ucd_5090D_10195.pdf, 10.44 MB, Adobe PDF
Title: Study of Distributed Dynamic Clustering Framework for Spatial Data Mining
Authors: Bendechache, Malika
Permanent link: http://hdl.handle.net/10197/10614
Date: 2017
Online since: 2019-05-22T12:59:17Z
Abstract: The amount of data generated per year will reach more than 44,000 billion gigabytes in 2020, ten times more than in 2003, and this growth is likely to continue according to current trends. This corresponds to more than 10,000 gigabytes of data per person per year, generated by daily life. Hence the term "Big Data" was introduced. Big Data refers to very large datasets that are collected from different fields, are heterogeneous, and continue to grow at a rapid pace. Analysing and extracting relevant information from these datasets is one of the biggest challenges, because they require huge storage capacity, processing power, and efficient mining algorithms that deal not only with size but also with heterogeneity, noise, and learning capacity. These needs call for architectural modifications in data storage and data management, as well as the development of new algorithms for efficient Big Data mining. In fact, the analysis of Big Data requires powerful, scalable, and accurate data analytics techniques that traditional data mining and machine learning do not offer as a whole. Therefore, new data analytics frameworks are needed to deal with Big Data challenges such as the volume, velocity, veracity, and variety of the data. Distributed data mining constitutes a promising approach for Big Data analytics, as datasets are usually produced in distributed locations, and processing them at their local sites significantly reduces response times, communications, etc. In this thesis, we developed and implemented a data mining framework that can analyse Big Data within a reasonable response time, produce accurate results, and use existing and current computing and storage infrastructure, such as cloud computing. The framework is distributed and deals with issues of high-performance computing. The proposed approach was developed and implemented for spatial data mining. 
It is general, can handle very large datasets, and deals with their heterogeneity and velocity. The approach consists of two phases. The first phase generates local models, and the second aggregates the local results to obtain global models. It is capable of analysing the datasets located at each site using different clustering techniques. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. The approach was thoroughly tested and compared to well-known clustering algorithms. The results show that the approach not only produces high-quality results compared to existing approaches but also achieves super-linear speed-up and scales up very well by taking advantage of the Hadoop MapReduce paradigm.
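The two-phase scheme described in the abstract (local models per site, followed by aggregation into global models) can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not specify how local clusters are represented or merged, so the sketch assumes a simple greedy centroid-based clustering per site and a count-weighted centroid merge for aggregation. The function names `local_clusters` and `aggregate` and the `radius` parameter are hypothetical, chosen only for illustration.

```python
from math import dist


def local_clusters(points, radius):
    """Phase 1 (run independently at each site).

    Greedy single-pass clustering: a point joins the first cluster whose
    current centroid lies within `radius`, otherwise it starts a new cluster.
    Returns a compact local model: a list of (centroid, point_count) pairs.
    """
    clusters = []  # each entry is [sum_x, sum_y, count]
    for x, y in points:
        for c in clusters:
            centroid = (c[0] / c[2], c[1] / c[2])
            if dist((x, y), centroid) <= radius:
                c[0] += x
                c[1] += y
                c[2] += 1
                break
        else:
            clusters.append([x, y, 1])
    return [((sx / n, sy / n), n) for sx, sy, n in clusters]


def aggregate(local_models, radius):
    """Phase 2 (assumed merge rule, not taken from the thesis).

    Merges local centroids that lie within `radius` of an existing global
    centroid, weighting each local centroid by its point count.
    Only the local models are exchanged, never the raw points.
    """
    merged = []  # each entry is [weighted_sum_x, weighted_sum_y, count]
    for model in local_models:
        for (cx, cy), n in model:
            for m in merged:
                centroid = (m[0] / m[2], m[1] / m[2])
                if dist((cx, cy), centroid) <= radius:
                    m[0] += cx * n
                    m[1] += cy * n
                    m[2] += n
                    break
            else:
                merged.append([cx * n, cy * n, n])
    return [((sx / n, sy / n), n) for sx, sy, n in merged]


# Two hypothetical sites holding parts of the same two spatial clusters.
site_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
site_b = [(1.0, 0.0), (10.0, 11.0)]

models = [local_clusters(site, radius=3.0) for site in (site_a, site_b)]
global_model = aggregate(models, radius=3.0)
```

The point of the sketch is the data flow: each site reduces its data to a small model, and only those models travel to the aggregation step, which is what keeps communication low in a distributed setting.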
Type of material: Doctoral Thesis
Publisher: University College Dublin. School of Computer Science  
Qualification Name: Ph.D.
Copyright (published version): 2017 the author
Keywords: Big Data; DBSCAN; Distributed Clustering; Dynamic K-means; Parallel Clustering; Spatial Data Mining
Other versions: http://dissertations.umi.com/ucd:10195
Language: en
Status of Item: Peer reviewed
Appears in Collections: Computer Science Theses

This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.