  • Publication
    Multi-Layer-Mesh: A Novel Topology and SDN-based Path Switching for Big Data Cluster Networks
    Big Data technologies and tools have been used for the past decade to solve several scientific and industry problems, with Hadoop/YARN becoming the "de facto" standard for these applications, although other technologies run on top of it. Like any other distributed application, these big data technologies rely heavily on the network infrastructure to read and move data from hundreds or thousands of cluster nodes. Although these technologies are based on reliable and efficient distributed algorithms, there are scenarios and conditions that can generate bottlenecks and inefficiencies, e.g., when a high number of concurrent users creates data access contention. In this paper, we propose a novel network topology called Multi-Layer-Mesh and an SDN-based path switching algorithm that can increase the performance of a big data cluster while reducing the amount of utilized resources (network equipment), in turn reducing energy and cooling consumption. A thorough simulation-based evaluation of our algorithms shows an average improvement in performance of 31.77% and an average decrease in resource utilization of 36.03% compared to a traditional Spine-Leaf topology in the selected test scenarios.
    Scopus© Citations 4  516
  • Publication
    Scalable Correlation-aware Virtual Machine Consolidation Using Two-phase Clustering
    (Institute of Electrical and Electronics Engineers (IEEE), 2015-07-24)
    Server consolidation is the most common and effective method to save energy and increase resource utilization in data centers, and virtual machine (VM) placement is the usual way of achieving server consolidation. VM placement is, however, challenging given the scale of today's IT infrastructures and the risk of resource contention among co-located VMs after consolidation. Therefore, the correlation among VMs to be co-located needs to be considered. However, existing solutions do not address the scalability issue that arises once the number of VMs increases to an order of magnitude that makes it unrealistic to calculate the correlation between each pair of VMs. In this paper, we propose a correlation-aware VM consolidation solution, ScalCCon, which uses a novel two-phase clustering scheme to address the aforementioned scalability problem. We propose and demonstrate the benefits of using the two-phase clustering scheme in comparison to solutions using one-phase clustering (up to 84% reduction of execution time when 17,446 VMs are considered). Moreover, our solution manages to reduce the number of physical machines (PMs) required, as well as the number of performance violations, compared to existing correlation-based approaches.
    Scopus© Citations 11  585
  • Publication
    A Fair Comparison of VM Placement Heuristics and a More Effective Solution
    (Institute of Electrical and Electronics Engineers (IEEE), 2014-06-27)
    Data center optimization, mainly through virtual machine (VM) placement, has received considerable attention in the past years. Many heuristics have been proposed to give quick and reasonably good solutions to this problem. However, it is difficult to compare them because they use different datasets, and the distribution of resources in the datasets has a big impact on the results. In this paper, we propose the first benchmark for VM placement heuristics and we define a novel heuristic. Our benchmark is inspired by a real data center and explores different possible demographics of data centers, which makes it suitable for comparing the behaviour of heuristics. Our new algorithm, RBP, outperforms the state-of-the-art heuristics and provides close to optimal results quickly.
    Scopus© Citations 11  473
  • Publication
    iVMp: an Interactive VM Placement Algorithm for Agile Capital Allocation
    (Institute of Electrical and Electronics Engineers (IEEE), 2013-06-03)
    Server consolidation is an important problem in any enterprise, where capital allocators (CAs) must approve any cost-saving plans involving the acquisition or allocation of new assets and the decommissioning of inefficient assets. Our paper describes iVMp, an interactive VM placement algorithm that allows CAs to become 'agile' capital allocators who can interactively propose and update constraints and preferences as placements are recommended by the system. To the best of our knowledge, this is the first time that this interactive VM placement recommendation problem has been addressed in the academic literature. Our results show that the proposed algorithm finds near-optimal solutions in a highly efficient manner.
    Scopus© Citations 6  412
  • Publication
    ROThAr: Real-time On-line Traffic Assignment with Load Estimation
    (Institute of Electrical and Electronics Engineers (IEEE), 2013-11-01)
    More and more drivers use on-board units to help them navigate in the increasingly urbanised environment they live and work in. These systems (e.g., routing applications on smart phones) are now very often on-line, and use information about the traffic situation (e.g., accidents, congestion) to get the best route. We can now envisage a world where all trips are assigned and updated by such an on-line system, making the best routing decisions based on traffic conditions. The problem is that current systems consider only 'local' elements (e.g., driver preference and current traffic condition) and do not make routing decisions from a global perspective. This can produce many similar routing assignments, which in turn could cause further traffic congestion. The objective of the next-generation on-line navigation systems is then to come up with a 'smart', real-time route assignment, which balances the load between the different road segments and offers the best quality to the drivers. However, every routing decision made has an impact on the traffic conditions (one more vehicle on the road segments selected) and computing the load induced by the trips is a computationally heavy problem. This paper addresses this question of real-time on-line traffic assignment, and shows that under certain conditions it is possible to have (i) an accurate estimation of the load and travel time on every road segment and (ii) an optimised traffic assignment that adapts to divergence and evolutions (e.g., accidents) of the system.
    Scopus© Citations 9  420
  • Publication
    SOC: Satisfaction-Oriented Virtual Machine Consolidation in Enterprise Data Centers
    Server sprawl is a problem faced by data centers, which causes unnecessary waste of hardware resources, collateral costs of space, power and cooling systems, and administration. This is usually combated by virtualization-based consolidation, and both industry and academia have put much effort into solving the underlying virtual machine (VM) placement problem. However, IT managers’ preferences are seldom considered when making VM placement decisions. This paper proposes a satisfaction-oriented VM consolidation mechanism (SOC) to plan VM consolidation while taking IT managers’ preferences into consideration. In the mechanism, we propose: (1) an XML-based description language to express managers’ preferences and metrics to evaluate the satisfaction degree; (2) to apply matchmaking to locate entities [i.e., VMs and physical machines (PMs)] that best match each other’s preferences; (3) to employ the VM placement algorithm proposed in our previous work to minimize the number of hosts required and the resource wastage on allocated hosts. SOC is compared with two baselines: placement-only and matchmaking-only. The simulation results show that most of the VM-to-PM mappings output by placement-only violate given preferences, while SOC achieves a satisfaction degree close to that of matchmaking-only without requiring as many PMs as matchmaking-only does, needing only a number close to that of placement-only. In brief, SOC is effective in minimizing the number of hosts required to support a certain set of VMs, while maximizing the satisfaction degree of managers on both the provider and requester sides.
    Scopus© Citations 8  549
  • Publication
    BigDataNetSim: A Simulator for Data and Process Placement in Large Big Data Platforms
    Big Data platforms are convoluted distributed systems which commonly require skill- and labour-intensive solution development to address inherent Big Data application challenges. Several tools have been proposed to help developers and engineers overcome the complexities involved in coordinating the execution of many processes/threads on multiple machines. However, no work so far has been able to combine both an accurate representation of Big Data jobs and realistic modeling of the behaviour of Big Data platforms at scale, including networking elements and data and job placement. In this paper, we propose BigDataNetSim, the first simulator which accurately models all the main components of the data movements in Big Data platforms (e.g., HDFS, YARN/MapReduce, network topologies, switching/routing protocols) in a large-scale system. BigDataNetSim can serve as a valuable tool for engineering Big Data solutions, which includes set-up of systems, prototyping of jobs, and improvement of components/algorithms for Big Data platforms. We also demonstrate that BigDataNetSim can simulate a real Hadoop cluster with a high degree of accuracy in terms of data and job placements, while being able to scale up to very large systems.
    Scopus© Citations 4  681