High-Level Data Partitioning for Parallel Computing on Heterogeneous Hierarchical HPC Platforms
File(s)
File | Description | Size | Format
---|---|---|---
UCD-CSI-2011-10.pdf | | 1.61 MB |
Author(s)
Date Issued
2011
Date Available
2021-08-11T11:00:50Z
Abstract
The current state and foreseeable future of high performance scientific computing (HPC) can be described in three words: heterogeneous, parallel and distributed. These three simple words have a great impact on the architecture and design of HPC platforms and on the creation and execution of efficient algorithms and programs designed to run on them. As a result of the inherent heterogeneity, parallelism and distribution which promise to continue to pervade scientific computing in the coming years, the issue of data distribution, and therefore data partitioning, is unavoidable. The need for data distribution and partitioning arises from the inherent parallelism of almost all scientific computing platforms. Cluster computing has become all but ubiquitous, with the development of clusters of clusters and grids becoming increasingly popular. Even at a lower level, high performance symmetric multiprocessor (SMP) machines, General Purpose Graphics Processing Unit (GPGPU) computing, and multiprocessor parallel machines play an important role. At a very low level, multicore technology is now widespread, increasing in heterogeneity, and promises to be omnipresent in the near future. The prospect of prevalent manycore architectures will inevitably bring yet more heterogeneity.
Scientific computing is undergoing a paradigm shift like none before. Only a decade ago most high performance scientific architectures were homogeneous in design, and heterogeneity was seen as a difficult and somewhat limiting feature of some architectures. However, this past decade has seen the rapid development of architectures designed not only to exploit heterogeneity but to be heterogeneous. Grid and massively distributed computing have led the way on this front. The current shift is moving from this to architectures that are not heterogeneous by definition, but heterogeneous by necessity. Cloud and exascale computing architectures and platforms are not designed to be heterogeneous as much as they are heterogeneous by necessity. Indeed, such architectures cannot be homogeneous on any large (and useful) scale. In fact, more and more researchers see heterogeneity as the natural state of computing.
Further to hardware advances, scientific problems have become so large that the use of more than one of any of the above platforms in parallel has become necessary, if not unavoidable. Problems in fields such as climatology, and projects including the Large Hadron Collider, necessitate the use of extreme-scale parallel platforms, often encompassing more than one geographically centralized supercomputer or cluster. Even at the core level large amounts of information must be shared efficiently.
One of the greatest difficulties in solving problems on such architectures is the distribution of data between the different components in a way that optimizes runtime. Numerous algorithms have been developed to do so over the years. Most seek to optimize runtime by reducing the total volume of communication between processing entities. Much research has been conducted on doing so between distinct processors or nodes, less so between distributed clusters.
This report presents new data partitioning algorithms for matrix and linear algebra operations. These algorithms would in fact work with little or no modification for any application with similar communication patterns. In practice these partitionings distribute data between a small number of computing entities, each of which can have great computational power itself, and an even greater aggregate power. These partitionings may also be deployed in a hierarchical manner, which allows them to be employed in a great range of problem domains and computational platforms. These partitionings, in hybrid form, working together with more traditional partitionings, minimize the total volume of communication between entities in a manner proven to be optimal. This is done regardless of the power ratio that exists between the entities, thus minimizing execution time. There is also no restriction on the algorithms or methods employed locally on the clusters themselves, thus maximizing flexibility.
Finally, most heterogeneous algorithms and partitionings are designed by modifying existing homogeneous ones. With this in mind, the ultimate contribution of this report is to demonstrate that non-traditional and perhaps unintuitive algorithms and partitionings designed with heterogeneity in mind from the start can result in better, and in many cases optimal, algorithms and partitionings for heterogeneous platforms. The importance of this, given the current outlook for, and trends in, the future of high performance scientific computing, is obvious.
Sponsorship
Science Foundation Ireland
Other Sponsorship
University College Dublin School of Computer Science and Informatics
Type of Material
Technical Report
Publisher
University College Dublin. School of Computer Science and Informatics
Series
UCD CSI Technical Reports
UCD-CSI-2011-10
Copyright (Published Version)
2011 the Authors
Language
English
Status of Item
Not peer reviewed
This item is made available under a Creative Commons License
Owning collection
Views
278
Acquisition Date
Mar 22, 2023
Downloads
29
Last Week
1
Last Month
2
Acquisition Date
Mar 22, 2023