  • Publication
    Multi-Layer-Mesh: A Novel Topology and SDN-based Path Switching for Big Data Cluster Networks
    Big Data technologies and tools have been used for the past decade to solve several scientific and industry problems, with Hadoop/YARN becoming the "de facto" standard for these applications, although other technologies run on top of it. Like any other distributed application, these big data technologies rely heavily on the network infrastructure to read and move data from hundreds or thousands of cluster nodes. Although these technologies are based on reliable and efficient distributed algorithms, there are scenarios and conditions that can generate bottlenecks and inefficiencies, e.g., when a high number of concurrent users creates data access contention. In this paper, we propose a novel network topology called Multi-Layer-Mesh and an SDN-based path switching algorithm that can increase the performance of a big data cluster while reducing the amount of utilized resources (network equipment), in turn reducing energy and cooling consumption. A thorough simulation-based evaluation of our algorithms shows an average improvement in performance of 31.77% and an average decrease in resource utilization of 36.03% compared to a traditional Spine-Leaf topology in the selected test scenarios.
    Scopus© Citations 4  522
  • Publication
    BDTest, a System to Test Big Data Frameworks
    Testing Big Data processing systems is a challenging task, as these systems are usually distributed across various virtual machines (potentially hosted by remote servers). In this poster, we present a platform for testing non-functional properties of Big Data frameworks and a first implementation with Hadoop, a well-known big data management and processing platform.
    Scopus© Citations 2  475
  • Publication
    BigDataNetSim: A Simulator for Data and Process Placement in Large Big Data Platforms
    Big Data platforms are complex distributed systems whose solution development is commonly skill- and labour-intensive because of the challenges inherent to Big Data applications. Several tools have been proposed to help developers and engineers overcome the complexities involved in coordinating the execution of many processes/threads on multiple machines. However, no work so far has been able to combine both an accurate representation of Big Data jobs and realistic modeling of the behaviour of Big Data platforms at scale, including networking elements and data and job placement. In this paper, we propose BigDataNetSim, the first simulator that accurately models all the main components of the data movements in Big Data platforms (e.g., HDFS, YARN/MapReduce, network topologies, switching/routing protocols) in a large-scale system. BigDataNetSim can serve as a valuable tool for engineering Big Data solutions, including the set-up of systems, prototyping of jobs, and improvement of components/algorithms for Big Data platforms. We also demonstrate that BigDataNetSim can simulate a real Hadoop cluster with a high degree of accuracy in terms of data and job placements, while being able to scale up to very large systems.
    Scopus© Citations 4  683