COCOA: A Synthetic Data Generator for Testing Anonymization Techniques

Authors: Ayala-Rivera, Vanessa; Portillo Dominguez, Andres Omar; Murphy, Liam, B.E.; Thorpe, Christina

Dates: 2017-09-13; 2017-09-13; 2017 Sprin; 2016-09-16

URI: http://hdl.handle.net/10197/8763

Conference: UNESCO Chair in Data Privacy, International Conference, PSD 2016, Dubrovnik, Croatia, September 14–16, 2016

Abstract: Conducting extensive testing of anonymization techniques is critical to assess their robustness and identify the scenarios where they are most suitable. However, access to real microdata is highly restricted, and the microdata that is publicly available is usually anonymized or aggregated, which reduces its value for testing purposes. In this paper, we present a framework (COCOA) for the generation of realistic synthetic microdata that allows users to define multi-attribute relationships in order to preserve the functional dependencies of the data. We show how COCOA strengthens the testing of anonymization techniques by broadening the number and diversity of the test scenarios. Results also show that COCOA is practical for generating large datasets.

Language: en

Note: The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-45381-1_13

Keywords: Synthetic data; Anonymization; Testing; Data privacy

Type: Conference Publication

DOI: 10.1007/978-3-319-45381-1_13

Date: 2017-08-11

License: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/