Distributed Clustering Algorithm for Spatial Data Mining

Authors: Bendechache, Malika; Kechadi, Tahar; Chen, Chong Cheng
Dates on record: 2015-04-30; 2015-03-19
Year: 2015
Handle: http://hdl.handle.net/10197/6526
Conference: International conference on Integrated Geo-spatial Information Technology and its Application to Resource and Environmental Management towards GEOSS (IGIT 2015), Alba Regia Technical Faculty of Óbuda University, Hungary, 16-17 January 2015
Type: Conference Publication
Language: en
Keywords: Machine learning; Statistics; Spatial data; Clustering; Distributed mining; Data analysis; K-mean
License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/

Abstract: Distributed data mining techniques, and mainly distributed clustering, have been widely used in the last decade because they deal with very large and heterogeneous datasets that cannot be gathered centrally. Current distributed clustering approaches normally generate global models by aggregating local results obtained on each site. While this approach analyses the datasets at their locations, the aggregation phase is complex and time consuming, and may produce incorrect and ambiguous global clusters and therefore incorrect knowledge. In this paper we propose a new clustering approach for very large spatial datasets that are heterogeneous and distributed. The approach is based on the K-means algorithm, but it generates the number of global clusters dynamically, so it is not necessary to fix the number of clusters in advance. Moreover, this approach uses a very sophisticated aggregation phase, designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. Preliminary results show that the proposed approach scales up well in terms of running time and result quality. We also compared it to two other clustering algorithms, BIRCH and CURE, and show that this approach is much more efficient than both.
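
The abstract describes a two-stage scheme: K-means runs locally on each site, and an aggregation phase then combines the local results so that the number of global clusters emerges dynamically. The abstract does not specify how the aggregation works, so the following Python sketch is only an illustration under stated assumptions. It stands in a simple rule (merge local clusters whose centroids lie within a distance threshold) for the paper's aggregation phase. The function names `local_kmeans` and `merge_local_clusters`, the `threshold` parameter, and the sample points are all hypothetical and do not come from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Cluster = Tuple[Point, int]  # (centroid, number of points)


def local_kmeans(points: List[Point], k: int, iters: int = 20) -> List[Cluster]:
    """Plain Lloyd's K-means on one site's data (hypothetical helper)."""
    assert 0 < k <= len(points)
    # Deterministic initialisation: k evenly spaced points from the input.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    groups: List[List[Point]] = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: mean of each group; an empty group keeps its centroid.
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    # Sizes come from the last assignment pass; empty clusters are dropped.
    return [(c, len(g)) for c, g in zip(centroids, groups) if g]


def merge_local_clusters(local: List[Cluster], threshold: float) -> List[Cluster]:
    """ASSUMED aggregation rule, not the paper's: union local clusters whose
    centroids are within `threshold`, so the global count is not fixed."""
    parent = list(range(len(local)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(local)):
        for j in range(i + 1, len(local)):
            if math.dist(local[i][0], local[j][0]) <= threshold:
                parent[find(i)] = find(j)

    # Combine each merged group into one size-weighted centroid.
    sums = {}
    for i, ((x, y), n) in enumerate(local):
        sx, sy, total = sums.get(find(i), (0.0, 0.0, 0))
        sums[find(i)] = (sx + x * n, sy + y * n, total + n)
    return [((sx / t, sy / t), t) for sx, sy, t in sums.values()]


# Two hypothetical sites; only the (centroid, size) summaries are aggregated.
site_a = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
site_b = [(0.1, 0.2), (0.0, 0.1), (9.0, 0.0), (9.2, 0.1)]
local_models = local_kmeans(site_a, k=2) + local_kmeans(site_b, k=2)
global_clusters = merge_local_clusters(local_models, threshold=1.0)
```

In this toy run each site produces two local clusters, and the two clusters near the origin are merged, leaving three global clusters without that number being set anywhere. Only centroids and sizes leave each site, which reflects the abstract's point that the raw data cannot be gathered centrally.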