Now showing 1 - 6 of 6
No Thumbnail Available
Publication

Graph Partitioning for Reconfigurable Topology

2012-08-16, Ajwani, Deepak, Ali, Shoukat, Morrison, John P.

Optical circuit switches have recently been proposed as a low-cost, low-power and high-bandwidth alternative to electronic switches for the design of high-performance compute clusters. An added advantage of these switches is that they allow for a reconfiguration of the network topology to suit the requirements of the application. To realize the full potential of a high-performance computing system with a reconfigurable interconnect, there is a need to design algorithms for computing a topology that will allow for a high-throughput load distribution, while simultaneously partitioning the computational task graph of the application for the computed topology. In this paper, we propose a new framework that exploits such reconfigurable interconnects to achieve these interdependent goals, i.e., to iteratively co-optimize the network topology configuration, application partitioning and network flow routing to maximize throughput for a given application. We also present a novel way of computing a high-throughput initial topology based on the structural properties of the application to seed our co-optimizing framework. We show the value of our approach on synthetic graphs that emulate the key characteristics of a class of stream computing applications that require high throughput. Our experiments show that the proposed technique is fast and computes high-quality partitions of such graphs for a broad range of hardware parameters that varies the bottleneck from computation to communication.

No Thumbnail Available
Publication

Seeds for a heterogeneous interconnect

2013-05-24, Hackett, Adam, Ajwani, Deepak, Ali, Shoukat, Kirkland, Steve, Morrison, John P.

Traditionally, a parallel application is partitioned, mapped and then routed on a network of compute nodes where the topology of the interconnection network is known beforehand and is homogeneous. However, such homogeneity in interconnects is rarely required or needed for several important classes of applications. Nevertheless such interconnects are designed this way, i.e., with redundant links, to accommodate the communication patterns of a wide range of applications. However, with recent advances in technology for optical circuit switches, it is now possible to construct network with much fewer links, and to make the link endpoints configurable to suit the communication pattern of a given application. While this is economical (saving both links and the power to run them), it raises the difficult problem of how to configure the network and how to reconfigure it quickly when the application's communication pattern changes. Since the space of all configurable topologies is large and determining the quality of a topology is a time-consuming process, it is not feasible to explore the entire space. One way of dealing with this limitation is to start the search from a "good" initial topology and then conduct a restricted search around it. The success of such a strategy crucially depends on the choice of the initial or seed topology. In the past, such an initial topology was computed by mimicking the communication requirements of the application. In this paper, we propose a different approach by showing that interconnect topologies such as chordal rings(circulant graphs) chosen based on metrics such as bisection width and average shortest path length can provide a better starting point. The topology obtained by searching around such an initial topology provides almost as good a performance as an application-specific initial topology, and the search time is significantly reduced.

No Thumbnail Available
Publication

A Network Configuration Algorithm Based on Optimization of Kirchhoff Index

2013-07-30, Hackett, Adam, Ajwani, Deepak, Ali, Shoukat, et al.

Traditionally, a parallel application is partitioned, mapped and then routed on a network of compute nodes where the topology of the interconnection network is fixed and known beforehand. Such a topology often comes with redundant links to accommodate the communication patterns of a wide range of applications. With recent advances in technology for optical circuit switches, it is now possible to construct a network with much fewer links, and to make the link endpoints configurable to suit the communication pattern of a given application. While this is economical (saving both links and the power to run them), it raises the difficult problem of how to configure the network and how to reconfigure it quickly when the application's communication pattern changes. In this paper, we propose the Kirchhoff index (KI) of a certain weighted graph related to the interconnection network as a proxy for its communication throughput. Our usage of this metric is based on a theoretical analogy between resistances in an electrical network and communication loads in the interconnection network. We show how mathematical techniques for reducing KI can be used to configure a network in a dramatically shorter time as compared to the current state-of-the-art scheme.

No Thumbnail Available
Publication

Generating synthetic task graphs for simulating stream computing systems

2013-10, Ajwani, Deepak, Ali, Shoukat, Katrinis, Kostas, et al.

Stream-computing is an emerging computational model for performing complex operations on and across multi-source, high-volume data flows. The pool of mature publicly available applications employing this model is fairly small, and therefore the availability of workloads for various types of applications is scarce. Thus, there is a need for synthetic generation of large-scale workloads to drive simulations and estimate the performance of stream-computing applications at scale. We identify the key properties shared by most task graphs of stream-computing applications and use them to extend known random graph generation concepts with stream computing specific features, providing researchers with realistic input stream graphs. Our graph generation techniques serve the purpose of covering a disparity of potential applications and user input. Our first "domain-specific" framework exhibits high user-controlled configurability while the second "application- agnostic" framework focuses solely on emulating the key properties of general stream-computing systems, at the loss of domain-specific fine-tuning. © 2013 Elsevier Inc. All rights reserved.

No Thumbnail Available
Publication

Preferences in college applications - a nonparametric Bayesian analysis of top-10 rankings

2010-12-10, Ali, Alnur, Murphy, Thomas Brendan, Meila, Marina, Chen, Harr

Applicants to degree courses in Irish colleges and universities rank up to ten degree courses from a list of over five hundred. These data provide a wealth of information concerning applicant degree choices. A Dirichlet process mixture of generalized Mallows models are used to explore data from a cohort of applicants. We find strong and diverse clusters, which in turn gains us important insights into the workings of the system. No previously tried models or analysis technique are able to model the data with comparable accuracy.

No Thumbnail Available
Publication

Co-optimizing application partitioning and network topology for a reconfigurable interconnect

2016-10, Ajwani, Deepak, Hackett, Adam, Ali, Shoukat

To realize the full potential of a high-performance computing system with a reconfigurable interconnect, there is a need to design algorithms for computing a topology that will allow for a high-throughput load distribution, while simultaneously partitioning the computational task graph of the application for the computed topology. In this paper, we propose a new framework that exploits such reconfigurable interconnects to achieve these interdependent goals, i.e., to iteratively co-optimize the network topology configuration, application partitioning and network flow routing to maximize throughput for a given application. We also present a novel way of computing a high-throughput initial topology based on the structural properties of the application to seed our co-optimizing framework. We show the value of our approach on synthetic graphs that emulate the key characteristics of a class of stream computing applications that require high throughput. Our experiments show that the proposed technique is fast and computes high-quality partitions of such graphs for a broad range of hardware parameters that varies the bottleneck from computation to communication. Finally, we show how using a particular topology as a seed to our framework significantly reduces the time to compute the final topology.