  • Publication
    SmartGridRPC: The new RPC model for high performance Grid computing
    (University College Dublin. School of Computer Science and Informatics, 2009-10)
    The paper presents the SmartGridRPC model, an extension of the GridRPC model, which aims to achieve higher performance. The traditional GridRPC provides a programming model and API for mapping individual tasks of an application in a distributed Grid environment, which is based on the client-server model characterised by the star network topology. SmartGridRPC provides a programming model and API for mapping a group of tasks of an application in a distributed Grid environment, which is based on the fully connected network topology. The SmartGridRPC programming model and API, its implementation in SmartGridSolve and its performance advantages over the GridRPC model are outlined in this paper. In addition, experimental results using a real-world application are also presented.
      122
  • Publication
    A Comparative Study of Methods for Measurement of Energy of Computing
    Energy of computing is a serious environmental concern and mitigating it is an important technological challenge. Accurate measurement of energy consumption during an application execution is key to application-level energy minimization techniques. There are three popular approaches to providing it: (a) system-level physical measurements using external power meters; (b) measurements using on-chip power sensors; and (c) energy predictive models. In this work, we present a comprehensive study comparing the accuracy of state-of-the-art on-chip power sensors and energy predictive models against system-level physical measurements using external power meters, which we consider to be the ground truth. We show that the average error of the dynamic energy profiles obtained using on-chip power sensors can be as high as 73% and the maximum reaches 300% for two scientific applications, matrix-matrix multiplication and 2D fast Fourier transform, for a wide range of problem sizes. The applications are executed on three modern Intel multicore CPUs, two Nvidia GPUs and an Intel Xeon Phi accelerator. The average error of the energy predictive models employing performance monitoring counters (PMCs) as predictor variables can be as high as 32% and the maximum reaches 100% for a diverse set of seventeen benchmarks executed on two Intel multicore CPUs (one Haswell and the other Skylake). We also demonstrate that using inaccurate energy measurements provided by on-chip sensors for dynamic energy optimization can result in significant energy losses of up to 84%. We show that, owing to the nature of the deviations of the energy measurements provided by on-chip sensors from the ground truth, calibration cannot improve the accuracy of the on-chip sensors to an extent that would allow them to be used in optimization of applications for dynamic energy. Finally, we present the lessons learned, our recommendations for the use of on-chip sensors and energy predictive models, and future directions.
    Scopus© Citations 43  434
  • Publication
    Grid-Enabled Hydropad: a Scientific Application for Benchmarking GridRPC-Based Programming Systems
    (University College Dublin. School of Computer Science and Informatics, 2008-12-12)
    GridRPC is a standard API that allows an application to easily interface with a Grid environment. It implements a remote procedure call with a single task map and client-server communication model. In addition to non-performance-related benefits, scientific applications having large computation and small communication tasks can also obtain important performance gains by being implemented in GridRPC. However, such convenient applications are not representative of the majority of scientific applications and therefore cannot serve as fair benchmarks for comparison of the performance of different GridRPC-based systems. In this paper, we present Hydropad, a real-life astrophysical simulation, which is composed of tasks that have a balanced ratio between computation and communication. While Hydropad is not the ideal application for performance benefits from its implementation with GridRPC middleware, we show how even its performance can be improved by using GridSolve and SmartGridSolve. We believe that the Grid-enabled Hydropad is a good candidate application to benchmark GridRPC-based programming systems in order to justify their use for high performance scientific computing.
      90
  • Publication
    Towards Data Partitioning for Parallel Computing on Three Interconnected Clusters
    We present a new data partitioning strategy for parallel computing on three interconnected clusters. This partitioning has two advantages over existing partitionings. First, it can reduce communication time due to a lower total volume of communication and a more efficient communication schedule. When the network topology is a linear array, this partitioning always results in a lower total volume of communication compared to existing partitionings, provided the most powerful node is at the center of the array. When the topology is fully connected, this partitioning results in a lower total volume of communication for all but a few power ratios. Second, it allows for the overlapping of communication and computation. These two inherent advantages work together to reduce overall execution time significantly.
    Scopus© Citations 12  318
  • Publication
    A Parallel Algorithm for the Solution of the Deconvolution Problem in Heterogeneous Networks
    (University College Dublin. School of Computer Science and Informatics, 2005-12-19)
    In this work we present two parallel algorithms for the solution of a given least squares problem with structured matrices. This problem arises in many applications, most of them related to digital signal processing; an example is given. Both parallel algorithms have been designed to speed up the sequential one in a heterogeneous network of computers. They differ in the approach followed to implement parallel algorithms on heterogeneous networks of computers, known as the HeHo and HoHe strategies. However, our study goes beyond the practical usefulness of our heterogeneous parallel application. On the one hand, the results obtained validate the recently developed HeteroMPI as a very useful tool for programming heterogeneous parallel algorithms. On the other hand, although HeteroMPI was initially designed to apply the HeHo strategy, we propose a way this tool can be used in the HoHe strategy. The pros and cons of using HeteroMPI for both strategies are studied in depth through the application example.
      111
  • Publication
    Matrix Multiplication on Two Interconnected Processors
    This paper presents a new partitioning algorithm to perform matrix multiplication on two interconnected heterogeneous processors. Data is partitioned in a way which minimizes the total volume of communication between the processors compared to more general partitionings, resulting in a lower total execution time whenever the power ratio between the processors is greater than 3:1. The algorithm has interesting and important applicability, particularly as the top-level partitioning in a hierarchical algorithm that is to perform matrix multiplication on two interconnected clusters of computers.
    Scopus© Citations 9  359
  • Publication
    A Novel Statistical Learning-Based Methodology for Measuring the Goodness of Energy Profiles of Applications Executing on Multicore Computing Platforms
    Accurate energy profiles are essential to the optimization of parallel applications for energy through workload distribution. Since there are many model-based methods available for efficient construction of energy profiles, we need an approach to measure the goodness of the profiles compared with the ground-truth profile, which is usually built by a time-consuming but reliable method. Correlation coefficient and relative error are two such popular statistical approaches, but they assume that profiles be linear or at least very smooth functions of workload size. This assumption does not hold true in the multicore era. Due to the complex shapes of energy profiles of applications on modern multicore platforms, the statistical methods can often rank inaccurate energy profiles higher than more accurate ones and employing such profiles in the energy optimization loop of an application leads to significant energy losses (up to 54% in our case). In this work, we present the first method specifically designed for goodness measurement of energy profiles. First, it analyses the underlying energy consumption trend of each energy profile and removes the profiles that exhibit a trend different from that of the ground truth. Then, it ranks the remaining energy profiles using the Euclidean distances as a metric. We demonstrate that the proposed method is more accurate than the statistical approaches and can save a significant amount of energy.
    Scopus© Citations 3  103
  • Publication
    Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution
    The problem of matrix partitioning for parallel matrix-matrix multiplication on heterogeneous processors has been extensively studied since the mid 1990s. During this time, previous research focused mainly on the design of efficient partitioning algorithms, optimally or sub-optimally partitioning matrices into rectangles. The optimality of the rectangular partitioning shape itself has never been studied or even seriously questioned. The accepted approach is that consideration of non-rectangular shapes will not significantly improve the optimality of the solution, but can significantly complicate the partitioning problem, which is already NP-complete even for the restricted case of rectangular shapes. There is no published research, however, supporting this approach. The shape of the globally optimal partitioning, and how the best rectangular partitioning compares with this global optimum, are still wide open problems. Solution of these problems will decide if new partitioning algorithms searching for truly optimal, and not necessarily rectangular, solutions are needed. This paper presents the first results of our research on the problem of optimal partitioning shapes for parallel matrix-matrix multiplication on heterogeneous processors. Namely, the case of two interconnected processors is comprehensively studied. We prove that, depending on performance characteristics of the processors and the communication link, the globally optimal partitioning will have one of just two well-specified shapes, one of which is rectangular and the other is non-rectangular. The theoretical analysis is conducted using an original mathematical technique proposed in the paper. It is shown that the technique can also be applied in the case of arbitrary numbers of processors. 
    While comprehensive analysis of the cases of three and more processors is more complicated and is a subject for future work, the paper does prove the optimality of some particular non-rectangular partitioning shapes for some combinations of performance characteristics of heterogeneous processors and communication links. The paper also presents experimental results demonstrating that the optimal non-rectangular partitioning can significantly outperform the optimal rectangular one on real-life heterogeneous HPC platforms.
    Scopus© Citations 12  333
  • Publication
    A Non-Intrusive and Incremental Approach to Enabling Direct Communications in RPC-based Grid Programming Systems
    (University College Dublin. School of Computer Science and Informatics, 2005-04)
    This paper advocates a non-intrusive and incremental approach to enabling existing Grid programming systems with new features. In particular, it presents a software component enabling NetSolve applications with direct communications between remote tasks. The software component is a supplementary one working on top of the basic NetSolve system. Its design also allows remote tasks to be freely mixed in a single application, regardless of whether each particular task is enabled for direct communications or not. Experiments with this software are also presented.
      91
  • Publication
    Heterogeneous PBLAS: A Set of Parallel Basic Linear Algebra Subprograms for Heterogeneous Computational Clusters
    (University College Dublin. School of Computer Science and Informatics, 2008)
    We present a package, called Heterogeneous PBLAS (HeteroPBLAS), which is built on top of PBLAS and provides optimized parallel basic linear algebra subprograms for Heterogeneous Computational Clusters. We present the user interface and the software hierarchy of the first research implementation of HeteroPBLAS. This is the first step towards the development of a parallel linear algebra package for Heterogeneous Computational Clusters. We demonstrate the efficiency of the HeteroPBLAS programs on a homogeneous computing cluster and a heterogeneous computing cluster.
      52