Now showing 1 - 9 of 9
  • Publication
    I/O-efficient approximation of graph diameters by parallel cluster growing - A first experimental study
    A fundamental step in the analysis of a massive graph is to compute its diameter. In the RAM model, the diameter of a connected undirected unweighted graph can be efficiently 2-approximated using a Breadth-First Search (BFS) traversal from an arbitrary node. However, if the graph is stored on disk, even an external memory BFS traversal is prohibitive, owing to the large number of I/Os it incurs. Meyer (2008) proposed a parametrized algorithm to compute an approximation of graph diameter with fewer I/Os than that required for exact BFS traversal of the graph. The approach is based on growing clusters around randomly chosen vertices `in parallel' until their fringes meet. We present an implementation of this algorithm and compare it with some simple heuristics and external-memory BFS in order to determine the trade-off between the approximation ratio and running-time achievable in practice. Our experiments show that with carefully chosen parameters, the new approach is indeed capable to produce surprisingly good diameter approximations in shorter time. We also confirm experimentally, that there are graph-classes where the parametrized approach runs into bad approximation ratios just as the theoretical analysis in (Meyer, 2008) suggests.
  • Publication
    Breadth first search on massive graphs
    (American Mathematical Society, 2009) ; ; ;
    We consider the problem of Breadth First Search (BFS) traversal on massive sparse undirected graphs. Despite the existence of simple linear time algorithms in the RAM model, it was considered non-viable for massive graphs because of the I/O cost it incurs. Munagala and Ranade [29] and later Mehlhorn and Meyer [27] gave efficient algorithms (refered to as MR BFS and MM BFS, respectively) for computing BFS level decompositions in an external memory model. Ajwani et al. [3] implemented MR BFS and the randomized variant of MM BFS using the external memory library STXXL and gave a comparative study of the two algorithms on various graph classes. In this paper, we review and extend that result demonstrating the effectiveness and viability of the BFS implementations on various other synthetic and real world benchmarks. Furthermore, we present the implementation of the deterministic variant of MM BFS and show that in most cases, it outperforms the randomized variant.
  • Publication
    I/O-efficient Hierarchical Diameter Approximation
    Computing diameters of huge graphs is a key challenge in complex network analysis. As long as the graphs fit into main memory, diameters can be efficiently approximated (and frequently even exactly determined) using heuristics that apply a limited number of BFS traversals. If the input graphs have to be kept and processed on external storage, even a single BFS run may cause an unacceptable amount of time-consuming I/O-operations. Meyer [17] proposed the first parameterized diameter approximation algorithm with fewer I/Os than that required for exact BFS traversal. In this paper we derive hierarchical extensions of this randomized approach and experimentally compare their trade-offs between actually achieved running times and approximation ratios. We show that the hierarchical approach is frequently capable of producing surprisingly good diameter approximations in shorter time than BFS. We also provide theoretical and practical insights into worst-case input classes.
      420Scopus© Citations 3
  • Publication
    Improved external memory BFS implementations
    (Society for Industrial and Applied Mathematics Philadelphia, 2007-01-06) ; ;
    Breadth first search (BPS) traversal on massive graphs in external memory was considered non-viable until recently, because of the large number of I/Os it incurs. Ajwani et al. [3] showed that the randomized variant of the o(n) I/O algorithm of Mehlhorn and Meyer [24] (MM.BFS) can compute the BPS level decomposition for large graphs (around a billion edges) in a few hours for small diameter graphs and a few days for large diameter graphs. We improve upon their implementation of this algorithm by reducing the overhead associated with each BPS level, thereby improving the results for large diameter graphs which are more difficult for BPS traversal in external memory. Also, we present the implementation of the deterministic variant of MM.BFS and show that in most cases, it outperforms the randomized variant. The running time for BPS traversal is further improved with a heuristic that preserves the worst case guarantees of MM_BFS, Together, they reduce the time for BFS on large diameter graphs from days shown in [3] to hours. In particular, on line graphs with random layout on disks, our implementation of the deterministic variant of MM_BFS with the proposed heuristic is more than 75 times faster than the previous best result for the randomized variant of MM.BFS in [3].
  • Publication
    Characterizing the Performance of Flash Memory Storage Devices and Its Impact on Algorithm Design
    Initially used in digital audio players, digital cameras, mobile phones, and USB memory sticks, flash memory may become the dominant form of end-user storage in mobile computing, either completely replacing the magnetic hard disks or being an additional secondary storage. We study the design of algorithms and data structures that can exploit the flash memory devices better. For this, we characterize the performance of NAND flash based storage devices, including many solid state disks. We show that these devices have better random read performance than hard disks, but much worse random write performance. We also analyze the effect of misalignments, aging and past I/O patterns etc. on the performance obtained on these devices. We show that despite the similarities between flash memory and RAM (fast random reads) and between flash disk and hard disk (both are block based devices), the algorithms designed in the RAM model or the external memory model do not realize the full potential of the flash memory devices. We later give some broad guidelines for designing algorithms which can exploit the comparative advantages of both a flash memory device and a hard disk, when used together.
      458Scopus© Citations 36
  • Publication
    An I/O-efficient distance oracle for evolving real-world graphs
    Computing shortest path distance is a fundamental primitive in many graph applications. On graphs that do not fit in the main memory of the computing device, computing such distances requires hours to months even with the best I/O-efficient shortest path implementations. For applications requiring many such shortest path distances, one would ideally like to preprocess the input graph into a space-efficient data structure I/O-efficiently, such that the distance queries can be answered with a small additive distortion using only O(1) I/Os. Furthermore, in a batch setting, one would like to answer O(n) such distance queries in Õ(n/B) I/Os. In this paper, we focus on engineering an I/O-efficient distance oracle for large graphs that model real-world interactions. Our engineered oracle (i) preprocesses graphs with multi-billion edges in less than an hour using a single core of a typical PC, (ii) answers online shortest path queries in milliseconds using a SSD, (iii) answers batched shortest path queries using HDDs with an average time per query of a few microseconds, (iv) results in a highly accurate shortest path estimate and (v) uses space linear in the number of nodes. Our implementation creates small oracle labels (i.e., they can still be kept in internal memory for rather large graphs) but also efficiently handles the case when both the graph and these labels have to reside on external storage. Dynamic settings where new edges are continuously inserted into the graph are efficiently supported, too.
      445Scopus© Citations 2
  • Publication
    A computational study of external-memory BFS algorithms
    Breadth First Search (BFS) traversal is an archetype for many important graph problems. However, computing a BFS level decomposition for massive graphs was considered nonviable so far, because of the large number of I/Os it incurs. This paper presents the first experimental evaluation of recent external-memory BFS algorithms for general graphs. With our STXXL based implementations exploiting pipelining and disk-parallelism, we were able to compute the BFS level decomposition of a web-crawl based graph of around 130 million nodes and 1.4 billion edges in less than 4 hours using single disk and 2.3 hours using 4 disks. We demonstrate that some rather simple external-memory algorithms perform significantly better (minutes as compared to hours) than internal-memory BFS, even if more than half of the input resides internally.
      453Scopus© Citations 45
  • Publication
    Average-Case Behavior of k-Shortest Path Algorithms
    The k-shortest path problem is a generalization of the fundamental shortest path problem, where the goal is to compute k simple paths from a given source to a target node, in non-decreasing order of their weight. With numerous applications modeling various optimization problems and as a feature in some learning systems, there is a need for efficient algorithms for this problem. Unfortunately, despite many decades of research, the best directed graph algorithm still has a worst-case asymptotic complexity of Õ(k n(n + m)). In contrast to the worst-case complexity, many algorithms have been shown to perform well on small diameter directed graphs in practice. In this paper, we prove that the average-case complexity of the popular Yen’s algorithm on directed random graphs with edge probability p = Ω(log n)/n in the unweighted and uniformly distributed weight setting is O(kmlog n), thus explaining the gap between the worst-case complexity and observed empirical performance. While we also provide a weaker bound of O(kmlog4 n) for sparser graphs with p ≥ 4/n, we show empirical evidence that the stronger bound should also hold in the sparser setting. We then prove that Feng’s directed k-shortest path algorithm computes the second shortest path in expected O(m) time on random graphs with edge probability p = Ω(log n)/n. Empirical evidence suggests that the average-case result for the Feng's algorithm holds even for k > 2 and sparser graphs.
      1019Scopus© Citations 2
  • Publication
    On Computational Models for Flash Memory Devices
    Flash memory-based solid-state disks are fast becoming the dominant form of end-user storage devices, partly even replacing the traditional hard-disks. Existing two-level memory hierarchy models fail to realize the full potential of flash-based storage devices. We propose two new computation models, the general flash model and the unit-cost model, for memory hierarchies involving these devices. Our models are simple enough for meaningful algorithm design and analysis. In particular, we show that a broad range of existing external-memory algorithms and data structures based on the merging paradigm can be adapted efficiently into the unit-cost model. Our experiments show that the theoretical analysis of algorithms on our models corresponds to the empirical behavior of algorithms when using solid-state disks as external memory.
      434Scopus© Citations 14