Parallel Basic Linear Algebra Subprograms for Heterogeneous Computational Clusters of Multicore Processors
Title: Parallel Basic Linear Algebra Subprograms for Heterogeneous Computational Clusters of Multicore Processors
Authors: Alonso, Pedro; Reddy, Ravi; Lastovetsky, Alexey
Permanent link: http://hdl.handle.net/10197/12377
Date: 2009
Online since: 2021-08-05T09:47:22Z

Abstract:
In this document, we describe two strategies of distribution of computations that can be used to implement parallel solvers for dense linear algebra problems on Heterogeneous Computational Clusters of Multicore Processors (HCoMs): the Heterogeneous Process Distribution Strategy (HPS) and the Heterogeneous Data Distribution Strategy (HDS). Neither strategy is novel; both have already been researched thoroughly. However, the advent of multicores necessitates enhancements to them. We conduct experiments with six applications that use these distribution strategies to perform parallel matrix-matrix multiplication (PMM) on a local HCoM. The first application calls the ScaLAPACK PBLAS routine PDGEMM, which uses the traditional homogeneous strategy of distributing computations. The second is an MPI application that uses HDS to perform the PMM; it requires one input, the two-dimensional processor grid arrangement to use during execution of the PMM. The third is also an MPI application, but one that uses HPS; it requires two inputs, the number of threads to run per process and the two-dimensional process grid arrangement to use during execution of the PMM. The fourth is a HeteroMPI application using the HDS strategy; it calls the HeteroMPI group management routines to determine the optimal two-dimensional processor grid arrangement and uses it during execution of the PMM. The fifth is a HeteroMPI application using the HPS strategy; it likewise calls the HeteroMPI group management routines to determine the optimal two-dimensional process grid arrangement and uses it during execution of the PMM. The final application is the Heterogeneous ScaLAPACK application, which applies the HPS strategy and reuses the ScaLAPACK PBLAS routine PDGEMM. For the last two applications, the number of threads to run per process must be preconfigured. We compare the results of executing these six applications. The results reveal that the two strategies can compete with each other. The MPI applications employing HDS perform best, since they fully exploit the increased thread-level parallelism (TLP) provided by the multicore processors. However, for large problem sizes, the non-Cartesian nature of the data distribution may lead to excessive communications that can be very expensive. In such cases, the HPS strategy has been shown to equal and even outperform the HDS strategy. We also conclude that HeteroMPI is a valuable tool for implementing heterogeneous parallel algorithms on HCoMs, because it provides desirable features that determine optimal values of algorithmic parameters such as the total number of processors and the 2D processor grid arrangement.

Type of material: Technical Report
Publisher: University College Dublin. School of Computer Science and Informatics
Series/Report no.: UCD CSI Technical Reports; ucd-csi-2009-1b
Copyright (published version): 2009 the Authors
Keywords: High performance computing; Multicore processors; Linear algebra problems; Parallel matrix multiplication algorithms
Other versions: https://web.archive.org/web/20080226040105/http:/csiweb.ucd.ie/Research/TechnicalReports.html
Language: en
Status of Item: Not peer reviewed
This item is made available under a Creative Commons License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
Appears in Collections: Computer Science and Informatics Technical Reports
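The abstract contrasts HDS, which partitions the data among processors in proportion to their speeds, with HPS, which varies processes and threads per node. As a rough illustration of the idea behind HDS only, the sketch below splits the columns of a matrix among processors proportionally to relative speeds. The speed values and the leftover-column rounding rule are illustrative assumptions, not the distribution algorithm used in the report.

```c
#include <assert.h>

/* Illustrative sketch (not the report's algorithm): the core idea of the
 * Heterogeneous Data Distribution Strategy (HDS) is to give each processor
 * a share of the matrix proportional to its measured relative speed.
 * Here n columns are split among p processors; `speed` holds relative
 * speeds (e.g. benchmarked rates), and `share[i]` receives the number
 * of columns assigned to processor i. */
void hds_partition(int n, int p, const double *speed, int *share)
{
    double total = 0.0;
    for (int i = 0; i < p; i++)
        total += speed[i];

    /* First pass: floor of each processor's ideal share. */
    int assigned = 0;
    for (int i = 0; i < p; i++) {
        share[i] = (int)((double)n * speed[i] / total);
        assigned += share[i];
    }

    /* Second pass: hand the few leftover columns out round-robin,
     * starting from processor 0 (assumed sorted fastest-first),
     * so that exactly n columns end up assigned. */
    for (int i = 0; assigned < n; i = (i + 1) % p) {
        share[i]++;
        assigned++;
    }
}
```

For example, splitting 10 columns among three processors with relative speeds 3:2:1 yields shares of 6, 3, and 1 columns, so the fastest processor does roughly half the work.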