  • Publication
    Geometric Algorithms for Private-Cache Chip Multiprocessors
    We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors.
  • Publication
    Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures
    (Springer, 2013-09-04)
    In this paper, we perform an empirical evaluation of the Parallel External Memory (PEM) model in the context of geometric problems. In particular, we implement the parallel distribution sweeping framework of Ajwani, Sitchinava and Zeh to solve the batched one-dimensional stabbing-max problem. While modern processors have sophisticated memory systems (multiple levels of caches, set associativity, TLB, prefetching), we empirically show that algorithms designed in simple models that focus on minimizing the I/O transfers between shared memory and a single-level cache can lead to efficient software on current multicore architectures. Our implementation performs significantly fewer accesses to slow DRAM and, therefore, outperforms traditional approaches based on plane sweep and two-way divide and conquer.
  • Publication
    I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors
    The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. As a tool for developing geometric algorithms in this model, a parallel version of the I/O-efficient distribution sweeping framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework. The obtained algorithms were efficient but not optimal. In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of $O(\mathrm{sort}_P(N) + K/(PB))$ for a number of problems on axis-aligned objects, where P denotes the number of cores/processors, B denotes the number of elements that fit in a cache line, N and K denote the sizes of the input and output, respectively, and $\mathrm{sort}_P(N)$ denotes the I/O complexity of sorting N items using P processors in the PEM model. To obtain this improvement, we present a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of $O((N + K)/(PB))$, where K is the sum of the counts of all the ranges. The key to achieving efficient load balancing among the processors in this algorithm is a new method to count the output without enumerating it, which might be of independent interest.
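To make the batched one-dimensional stabbing-max problem from the "Empirical Evaluation" abstract concrete, the following sequential plane-sweep sketch answers, for each query point, the maximum weight among the weighted closed intervals that contain it. It is a minimal sketch under stated assumptions (closed intervals, numeric weights, None for a query that no interval stabs; the function name batched_stabbing_max is hypothetical). It corresponds to the traditional plane-sweep baseline the abstract compares against, not to the authors' PEM distribution sweeping implementation.

```python
import heapq
from typing import List, Optional, Tuple

# Assumed input shape: (left, right, weight) describing the closed interval [left, right].
Interval = Tuple[float, float, float]


def batched_stabbing_max(intervals: List[Interval],
                         queries: List[float]) -> List[Optional[float]]:
    """Hypothetical helper: for each query q, return the max weight of the
    intervals [l, r] with l <= q <= r, or None if no interval contains q.
    Sequential plane sweep in O((N + Q) log(N + Q)) time."""
    # Events sorted by x. Interval starts (kind 0) sort before queries (kind 1)
    # at the same x, so closed left endpoints are counted.
    events = [(l, 0, idx) for idx, (l, _r, _w) in enumerate(intervals)]
    events += [(q, 1, qi) for qi, q in enumerate(queries)]
    events.sort()

    heap: List[Tuple[float, float]] = []  # max-heap on weight via (-weight, right endpoint)
    answers: List[Optional[float]] = [None] * len(queries)
    for x, kind, i in events:
        if kind == 0:
            _l, r, w = intervals[i]
            heapq.heappush(heap, (-w, r))
        else:
            # Lazy deletion: drop intervals that ended before x. Queries arrive in
            # increasing x, so such intervals cannot contain any later query either.
            while heap and heap[0][1] < x:
                heapq.heappop(heap)
            answers[i] = -heap[0][0] if heap else None
    return answers


if __name__ == "__main__":
    ivs = [(0, 5, 3), (2, 8, 7), (6, 10, 1)]
    print(batched_stabbing_max(ivs, [1, 3, 5.5, 9, 11, 0, 2]))  # [3, 7, 7, 1, None, 3, 7]
```

The heap only needs its top to be a live interval, so expired intervals buried below the top can stay there until they surface.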
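The key idea in the "I/O-Optimal Distribution Sweeping" abstract, counting the output without enumerating it so that reporting work can be balanced across processors, can be illustrated with a small sequential sketch. Assuming sorted points and closed ranges, range_counts computes each range's count by binary search, and split_output uses prefix sums of those counts to give each of p processors a starting position, so that each one reports roughly K/p of the K output pairs. Both function names and this splitting scheme are illustrative assumptions; this is not the paper's I/O-optimal PEM algorithm.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate
from typing import List, Tuple


def range_counts(points: List[float], ranges: List[Tuple[float, float]]) -> List[int]:
    """Hypothetical helper: number of points in each closed range [a, b].
    Assumes `points` is sorted; counts are computed without listing the points."""
    return [bisect_right(points, b) - bisect_left(points, a) for a, b in ranges]


def split_output(counts: List[int], p: int) -> List[Tuple[int, int]]:
    """Hypothetical load-balancing step: given per-range counts (summing to K),
    return for each of p processors the (range index, offset within that range)
    at which its share of the K output pairs begins."""
    prefix = list(accumulate(counts))  # prefix[i] = pairs produced by ranges 0..i
    total = prefix[-1] if prefix else 0
    starts = []
    for j in range(p):
        target = (j * total) // p          # global index of processor j's first pair
        i = bisect_right(prefix, target)   # first range whose prefix sum exceeds target
        before = prefix[i - 1] if i > 0 else 0
        starts.append((i, target - before))
    return starts


if __name__ == "__main__":
    pts = [1, 2, 4, 4, 7, 9]
    counts = range_counts(pts, [(0, 3), (5, 6), (4, 9), (9, 20)])
    print(counts)                   # [2, 0, 4, 1]
    print(split_output(counts, 3))  # [(0, 0), (2, 0), (2, 2)]
```

Because each processor's start is derived from the prefix sums alone, empty ranges are skipped automatically and no processor has to see the other processors' output before it begins reporting.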