Authors: Batista de Almeida, Leandro; Cunha de Almeida, Eduardo; Murphy, John; De Grande, Robson E.; Ventresque, Anthony
Date: 2019-05-22
Date: 2019-05-22
Copyright: 2018 IEEE
Date: 2018-10-17
URI: http://hdl.handle.net/10197/10594
Conference: The 2018 IEEE/ACM 22nd International Symposium on Distributed Simulation and Real Time Applications (DS-RT)
Abstract: Big Data platforms are convoluted distributed systems which commonly comprise skill- and labour-intensive solution development to treat inherent Big Data application challenges. Several tools have been proposed to help developers and engineers to overcome the involved complexities in coordinating the execution of plenty processes/threads on multiple machines. However, no work so far has been able to combine both an accurate representation of Big Data jobs and realistic modeling of the behaviour of Big Data platforms at scale, including networking elements and data and job placement. In this paper, we propose BigDataNetSim, the first simulator which models accurately all the main components of the data movements in Big Data platforms (e.g., HDFS, YARN/MapReduce, network topologies, switching/routing protocols) in a large scale system. BigDataNetSim can serve as a valuable tool for engineering Big Data solutions, which includes set-up of systems, prototyping of jobs, and improvement of components/algorithms for Big Data platforms. We also demonstrate that BigDataNetSim can simulate a real Hadoop cluster with a high degree of accuracy in terms of data and job placements, being able to scale up to very large systems.
Language: en
Rights: © 2018 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Keywords: Big data; Hadoop; Simulation; Task analysis; Data models; Tools; Network topology; Protocols; YARN
Title: BigDataNetSim: A Simulator for Data and Process Placement in Large Big Data Platforms
Type: Conference Publication
DOI: 10.1109/DISTRA.2018.8601018
ISBN: 978-1-5386-5048-6
Date: 2019-02-09
License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/