High-Level Data Partitioning for Parallel Computing on Heterogeneous Hierarchical HPC Platforms
Authors: Becker, Brett A.
Permanent link: http://hdl.handle.net/10197/12404
Date: 2011
Online since: 2021-08-11T11:00:50Z

Abstract:
The current state and foreseeable future of high-performance scientific computing (HPC) can be described in three words: heterogeneous, parallel and distributed. These three simple words have a great impact on the architecture and design of HPC platforms, and on the creation and execution of efficient algorithms and programs designed to run on them. As a result of the heterogeneity, parallelism and distribution that promise to continue to pervade scientific computing in the coming years, the issue of data distribution, and therefore data partitioning, is unavoidable, because almost all scientific computing platforms are inherently parallel. Cluster computing has become all but ubiquitous, with clusters of clusters and grids becoming increasingly popular. Even at a lower level, high-performance symmetric multiprocessor (SMP) machines, general-purpose graphics processing unit (GPGPU) computing, and multiprocessor parallel machines play an important role. At a very low level, multicore technology is now widespread, increasing in heterogeneity, and promises to be omnipresent in the near future. The prospect of prevalent manycore architectures will inevitably bring yet more heterogeneity. Scientific computing is undergoing a paradigm shift like none before. Only a decade ago most high-performance scientific architectures were homogeneous in design, and heterogeneity was seen as a difficult and somewhat limiting feature of some architectures. However, the past decade has seen the rapid development of architectures designed not only to exploit heterogeneity but to be heterogeneous.
Grid and massively distributed computing have led the way on this front. The current shift is toward architectures that are heterogeneous not by definition but by necessity. Cloud and exascale computing architectures and platforms are not so much designed to be heterogeneous as they are heterogeneous by necessity: such architectures cannot be homogeneous on any large (and useful) scale. In fact, more and more researchers see heterogeneity as the natural state of computing. Beyond hardware advances, scientific problems have become so large that using more than one of any of the above platforms in parallel has become necessary, if not unavoidable. Problem domains such as climatology, and projects such as the Large Hadron Collider, necessitate extreme-scale parallel platforms, often encompassing more than one geographically central supercomputer or cluster. Even at the core level, large amounts of information must be shared efficiently. One of the greatest difficulties in solving problems on such architectures is distributing data between the different components in a way that optimizes runtime. Numerous algorithms have been developed to do so over the years, most of which seek to optimize runtime by reducing the total volume of communication between processing entities. Much of this research addresses communication between distinct processors or nodes; far less addresses communication between distributed clusters. This report presents new data partitioning algorithms for matrix and linear algebra operations. These algorithms would work with little or no modification for any application with similar communication patterns. In practice these partitionings distribute data between a small number of computing entities, each of which can have great computational power itself, and which together have an even greater aggregate power.
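The speed-proportional data distribution the abstract alludes to can be illustrated with a minimal sketch (a hypothetical example, not an algorithm from the report; the function name and the speed values are assumptions): a one-dimensional partitioning assigns each computing entity a contiguous block of matrix rows in proportion to its relative speed, so all entities finish their share of the work at roughly the same time.

```python
# Hypothetical sketch: 1D row partitioning of an N x N matrix among
# heterogeneous entities, proportional to their relative speeds.

def partition_rows(n, speeds):
    """Assign each entity a contiguous row range proportional to its speed."""
    total = sum(speeds)
    # Ideal (fractional) shares, rounded down; leftover rows go one by one
    # to the entities with the largest fractional remainders.
    shares = [n * s / total for s in speeds]
    counts = [int(share) for share in shares]
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[: n - sum(counts)]:
        counts[i] += 1
    # Convert counts to (start, end) row ranges.
    bounds, start = [], 0
    for c in counts:
        bounds.append((start, start + c))
        start += c
    return bounds

# Example: a 1000 x 1000 matrix split between entities with a 3:1 power ratio.
print(partition_rows(1000, [3.0, 1.0]))  # -> [(0, 750), (750, 1000)]
```

Partitionings like this one reduce load imbalance, but, as the report's abstract notes, the shape of each entity's portion also determines how much data must cross entity boundaries.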
These partitionings may also be deployed in a hierarchical manner, which gives them the flexibility to be employed across a great range of problem domains and computational platforms. In hybrid form, working together with more traditional partitionings, they minimize the total volume of communication between entities in a manner proven to be optimal, regardless of the power ratio that exists between the entities, thus minimizing execution time. There is also no restriction on the algorithms or methods employed locally on the clusters themselves, thus maximizing flexibility. Finally, most heterogeneous algorithms and partitionings are designed by modifying existing homogeneous ones. With this in mind, the ultimate contribution of this report is to demonstrate that non-traditional, and perhaps unintuitive, algorithms and partitionings designed with heterogeneity in mind from the start can result in better, and in many cases optimal, algorithms and partitionings for heterogeneous platforms. The importance of this, given the current outlook for and trends in the future of high-performance scientific computing, is obvious.

Funding Details: Science Foundation Ireland; University College Dublin School of Computer Science and Informatics
Type of material: Technical Report
Publisher: University College Dublin. School of Computer Science and Informatics
Series/Report no.: UCD CSI Technical Reports; UCD-CSI-2011-10
Copyright (published version): 2011 the Authors
Keywords: Parallel computing; Heterogeneous computing; High performance computing; Scientific computing; Data partitioning; Minimising communication; Matrix-matrix multiplication
Other versions: https://web.archive.org/web/20080226040105/http:/csiweb.ucd.ie/Research/TechnicalReports.html
Language: en
Status of Item: Not peer reviewed
License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/ (Creative Commons BY-NC-ND 3.0 IE)
Appears in Collections: Computer Science and Informatics Technical Reports
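The abstract's claim that partitionings designed with heterogeneity in mind from the start can beat modified homogeneous ones can be made concrete with a back-of-the-envelope comparison (an illustrative sketch only: the shapes, the communication model, and the crossover ratio below are assumptions of this sketch, not results quoted from the report). For an N x N matrix multiplication split between two entities of speed ratio r : 1, compare a traditional straight-line split with a non-traditional split that gives the slower entity a square block in one corner.

```python
# Illustrative sketch (assumed model, not figures from the report): total
# volume of communication (TVC) for an N x N matrix multiplication split
# between two entities whose speeds are in the ratio r : 1.

import math

def tvc_straight_line(n):
    # A straight-line (slab) partition communicates on the order of
    # N^2 elements, independent of the speed ratio.
    return n * n

def tvc_square_corner(n, r):
    # Give the slower entity a square of area N^2 / (1 + r) in one corner;
    # the data crossing its two interior edges totals 2 * s * N elements,
    # where s is the side of the square.
    s = n / math.sqrt(1 + r)
    return 2 * s * n

# Under this model the corner-square shape wins once the faster entity is
# sufficiently dominant: 2sN < N^2 exactly when s < N/2, i.e. r > 3.
n = 1000
for r in (1, 3, 9):
    print(r, tvc_straight_line(n), round(tvc_square_corner(n, r)))
```

The point of the comparison is the one made in the abstract: the better shape for a heterogeneous pair is not a scaled version of the homogeneous split, and which shape is optimal depends on the power ratio between the entities.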