  • Publication
    A Systematic Comparison and Evaluation of k-Anonymization Algorithms for Practitioners
    The vast amount of data being collected about individuals has brought new challenges in protecting their privacy when this data is disseminated. As a result, Privacy-Preserving Data Publishing has become an active research area, in which multiple anonymization algorithms have been proposed. However, given the large number of algorithms available and the limited information regarding their performance, it is difficult, especially for practitioners, to identify and select the most appropriate algorithm for a particular publishing scenario. In this paper, we perform a systematic comparison of three well-known k-anonymization algorithms to measure their efficiency (in terms of resource usage) and their effectiveness (in terms of data utility). We extend the scope of their original evaluation by employing a more comprehensive set of scenarios: different parameters, metrics, and datasets. Using publicly available implementations of those algorithms, we conduct a series of experiments and a comprehensive analysis to identify the factors that influence their performance, in order to guide practitioners in the selection of an algorithm. We demonstrate, through experimental evaluation, the conditions under which one algorithm outperforms the others for a particular metric, depending on the input dataset and privacy requirements. Our findings motivate the need for methodologies that recommend the best algorithm for a particular publishing scenario.
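As a rough illustration of the privacy model these algorithms enforce (a sketch of the k-anonymity property itself, not of any of the three compared algorithms; the attribute names and toy records below are invented), a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (the k-anonymity property)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Toy dataset: age and ZIP code are quasi-identifiers, already generalized.
data = [
    {"age": "20-30", "zip": "416**", "disease": "flu"},
    {"age": "20-30", "zip": "416**", "disease": "cold"},
    {"age": "30-40", "zip": "417**", "disease": "flu"},
]
print(is_k_anonymous(data, ["age", "zip"], 2))  # False: the 30-40 group has only 1 record
```

The compared algorithms differ in *how* they generalize or suppress values to reach this property while keeping the data useful, which is exactly where the efficiency/utility trade-offs studied in the paper arise.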
  • Publication
    Synthetic Data Generation using Benerator Tool
    (University College Dublin. School of Computer Science and Informatics, 2013-10-29)
    Datasets of different characteristics are needed by the research community for experimental purposes. However, real data may be difficult to obtain due to privacy concerns. Moreover, real data may not meet specific characteristics which are needed to verify new approaches under certain conditions. Given these limitations, the use of synthetic data is a viable alternative to complement the real data. In this report, we describe the process followed to generate synthetic data using Benerator, a publicly available tool. The results show that the synthetic data preserves a high level of accuracy compared to the original data. The generated datasets correspond to microdata containing records with social, economic and demographic data which mimics the distribution of aggregated statistics from the 2011 Irish Census data.
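Benerator itself is driven by its own descriptor files; as a language-agnostic sketch of the underlying idea (drawing attribute values so they follow published marginal distributions; the weights below are made up for illustration, not the 2011 Census figures):

```python
import random

# Hypothetical aggregated statistics (e.g. population shares from a census table).
county_weights = {"Dublin": 0.28, "Cork": 0.11, "Galway": 0.05, "Other": 0.56}
age_band_weights = {"0-17": 0.25, "18-39": 0.32, "40-64": 0.30, "65+": 0.13}

def synth_record(rng):
    """Draw one synthetic microdata record whose attributes independently
    follow the published marginal distributions."""
    county = rng.choices(list(county_weights), weights=list(county_weights.values()))[0]
    age = rng.choices(list(age_band_weights), weights=list(age_band_weights.values()))[0]
    return {"county": county, "age_band": age}

rng = random.Random(7)  # fixed seed so the generated dataset is reproducible
sample = [synth_record(rng) for _ in range(10000)]
share_dublin = sum(r["county"] == "Dublin" for r in sample) / len(sample)
```

With enough records, the empirical shares converge to the configured ones, which is the sense in which such synthetic microdata "mimics the distribution of aggregated statistics".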
  • Publication
    Enhancing the utility of anonymized data in privacy-preserving data publishing
    (University College Dublin. School of Computer Science, 2017)
    The collection, publication, and mining of personal data have become key drivers of innovation and value creation. In this context, it is vital that organizations comply with the pertinent data protection laws to safeguard the privacy of individuals and prevent the uncontrolled disclosure of their information (especially of sensitive data). However, data anonymization is a time-consuming, error-prone, and complex process that requires a high level of expertise in data privacy and domain knowledge; otherwise, the quality of the anonymized data and the robustness of its privacy protection are compromised. This thesis contributes to the area of Privacy-Preserving Data Publishing by proposing a set of techniques that help users to make informed decisions on publishing safe and useful anonymized data, while reducing the expert knowledge and effort required to apply anonymization. In particular, the main contributions of this thesis are: (1) A novel method to evaluate, in an objective, quantifiable, and automatic way, the semantic quality of Value Generalization Hierarchies (VGHs) for categorical data. By improving the specification of the VGHs, the quality of the anonymized data is also improved. (2) A framework for the automatic construction and multi-dimensional evaluation of VGHs. The aim is to generate VGHs more efficiently, and of better quality, than when done manually. Moreover, the evaluation of VGHs is enhanced as users can compare VGHs from various perspectives and select the ones that better fit their preferences to drive the anonymization of data. (3) A practical approach for the generation of realistic synthetic datasets which preserves the functional dependencies of the data. The aim is to strengthen the testing of anonymization techniques by broadening the number and diversity of the test scenarios. (4) A conceptual framework that describes a set of relevant elements that underlie the assessment and selection of anonymization algorithms, together with a systematic comparison and analysis of a set of anonymization algorithms to identify the factors that influence their performance, in order to guide users in the selection of a suitable algorithm.
  • Publication
    Improving the Testing of Clustered Systems Through the Effective Usage of Java Benchmarks
    Nowadays, cluster computing has become a cost-effective and powerful solution for enterprise-level applications. Nevertheless, this architecture model also increases the complexity of the applications, complicating all activities related to performance optimisation. Thus, much research has pursued advancements for improving the performance of clusters. Comprehensively evaluating such advancements is key to understanding the conditions under which they can be most useful. However, the creation of an appropriate test environment, that is, one which offers different application behaviours (so that the obtained conclusions can be better generalised), is typically an effort-intensive task. To help tackle this problem, this paper presents a tool that decreases the effort and expertise needed to build useful test environments for more robust cluster testing. This is achieved by enabling the effective usage of Java benchmarks to easily create clustered test environments, hence diversifying the application behaviours that can be evaluated. We also present the results of a practical validation of the proposed tool, in which it was successfully applied to the evaluation of two cluster-related advancements.
  • Publication
    Protecting organizational data confidentiality in the cloud using a high-performance anonymization engine
    Data security remains a top concern for the adoption of cloud-based delivery models, especially in the case of Software as a Service (SaaS). This concern is primarily caused by the lack of transparency on how customer data is managed. Clients depend on the security measures implemented by the service providers to keep their information protected. However, few practical solutions exist to protect data from malicious insiders working for the cloud providers, a factor that represents a high potential for data breaches. This paper presents the High-Performance Anonymization Engine (HPAE), an approach that allows companies to protect their sensitive information from SaaS providers in a public cloud. This approach uses data anonymization to prevent the exposure of sensitive data in its original form, thus reducing the risk of misuse of customer information. This work involved the implementation of a prototype and an experimental validation phase, which assessed the performance of the HPAE in the context of a cloud-based log management service. The results showed that the architecture of the HPAE is a practical solution and can efficiently handle large volumes of data.
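A minimal sketch of the kind of transformation such an engine applies before data leaves the organization (the log format, field name, and salt below are hypothetical, not the HPAE's actual design): identifiers are pseudonymized on-premises, so the SaaS provider never sees the raw values.

```python
import hashlib
import re

# Secret salt kept on-premises and never shared with the cloud provider,
# so the provider cannot reverse or brute-force the pseudonyms easily.
SECRET_SALT = b"rotate-me-regularly"
USER_RE = re.compile(r"user=(\w+)")

def pseudonymize(line):
    """Replace user identifiers in a log line with salted-hash pseudonyms.
    The same user always maps to the same pseudonym, preserving the
    ability to correlate log entries without exposing the identity."""
    return USER_RE.sub(
        lambda m: "user=" + hashlib.sha256(SECRET_SALT + m.group(1).encode()).hexdigest()[:12],
        line,
    )

print(pseudonymize("2024-01-05 login user=alice ok"))
```

Because the mapping is deterministic, the anonymized logs remain useful for the provider's analytics (counting events per user, sessionization) while the raw identity stays inside the company.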
  • Publication
    Enhancing the Utility of Anonymized Data by Improving the Quality of Generalization Hierarchies
    The dissemination of textual personal information has become an important driver of innovation. However, as this data may contain sensitive information, it must be anonymized. A commonly used technique to anonymize data is generalization. Nevertheless, its effectiveness can be hampered by the Value Generalization Hierarchies (VGHs) used, as poorly specified VGHs can decrease the usefulness of the resulting data. To tackle this problem, in our previous work we presented the Generalization Semantic Loss (GSL), a metric that captures the quality of categorical VGHs in terms of semantic consistency and taxonomic organization. We validated the accuracy of GSL using an intrinsic evaluation with respect to a gold-standard ontology. In this paper, we extend our previous work by conducting an extrinsic evaluation of GSL with respect to the performance that VGHs have in anonymization (using data utility metrics). We show how GSL can be used to perform an a priori assessment of the VGHs' effectiveness for anonymization. In this manner, data publishers can quantitatively compare the quality of various VGHs and identify (before anonymization) those that better retain the semantics of the original data. Consequently, the utility of the anonymized datasets can be improved without sacrificing the privacy goal. Our results demonstrate the accuracy of GSL, as the quality of VGHs measured with GSL strongly correlates with the utility of the anonymized data. Results also show the benefits that an a priori VGH assessment strategy brings to the anonymization process in terms of time savings and a reduced dependency on expert knowledge. Finally, GSL also proved to be lightweight in terms of computational resources.
  • Publication
    Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization
    In privacy-preserving data publishing, approaches using Value Generalization Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs play a key role in the utility of published datasets as they dictate how the anonymization of the data occurs. For categorical attributes, it is imperative to preserve the semantics of the original data in order to achieve a higher utility. Despite this, semantics have not been formally considered in the specification of VGHs. Moreover, there are no methods that allow users to assess the quality of their VGHs. In this paper, we propose a measurement scheme, based on ontologies, to quantitatively evaluate the quality of VGHs in terms of semantic consistency and taxonomic organization, with the aim of producing higher-quality anonymizations. We demonstrate, through a case study, how our evaluation scheme can be used to compare the quality of multiple VGHs and help to identify faulty ones.
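To make the notion concrete, a VGH can be sketched as a parent map over categorical values, where generalizing a record means climbing the hierarchy (the occupation hierarchy below is a hypothetical example, not one from the paper):

```python
# A toy VGH for an "occupation" attribute: each value maps to its parent
# one level up; the root "Any" generalizes everything.
VGH = {
    "nurse": "healthcare", "surgeon": "healthcare",
    "teacher": "education", "lecturer": "education",
    "healthcare": "Any", "education": "Any",
}

def generalize(value, levels):
    """Replace a value by its ancestor `levels` steps up the hierarchy."""
    for _ in range(levels):
        value = VGH.get(value, value)  # the root generalizes to itself
    return value

print(generalize("nurse", 1))    # healthcare
print(generalize("surgeon", 2))  # Any
```

A semantically poor VGH would, say, group "nurse" under "education": the generalized data would still satisfy the privacy model, but its meaning, and therefore its utility, would degrade, which is the kind of fault the proposed scheme aims to detect.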
  • Publication
    "The Grace Period Has Ended": An Approach to Operationalize GDPR Requirements
    The General Data Protection Regulation (GDPR) aims to protect personal data of EU residents and can impose severe sanctions for non-compliance. Organizations are currently implementing various measures to ensure their software systems fulfill GDPR obligations such as identifying a legal basis for data processing or enforcing data anonymization. However, as regulations are formulated vaguely, it is difficult for practitioners to extract and operationalize legal requirements from the GDPR. This paper aims to help organizations understand the data protection obligations imposed by the GDPR and identify measures to ensure compliance. To achieve this goal, we propose GuideMe, a 6-step systematic approach that supports elicitation of solution requirements that link GDPR data protection obligations with the privacy controls that fulfill these obligations and that should be implemented in an organization's software system. We illustrate and evaluate our approach using an example of a university information system. Our results demonstrate that the solution requirements elicited using our approach are aligned with the recommendations of privacy experts and are expressed correctly.
  • Publication
    Towards an Efficient Performance Testing Through Dynamic Workload Adaptation
    Performance testing is a critical task to ensure an acceptable user experience with software systems, especially when there are high numbers of concurrent users. Selecting an appropriate test workload is a challenging and time-consuming process that relies heavily on the testers’ expertise. Not only are workloads application-dependent, but it is also usually unclear how large a workload must be to expose any performance issues that exist in an application. Previous research has proposed to dynamically adapt the test workloads in real-time based on the application behavior. By reducing the need for the trial-and-error test cycles required when using static workloads, dynamic workload adaptation can reduce the effort and expertise needed to carry out performance testing. However, such approaches usually require testers to properly configure several parameters in order to be effective in identifying workload-dependent performance bugs, which may hinder their usability among practitioners. To address this issue, this paper examines the different criteria needed to conduct performance testing efficiently using dynamic workload adaptation. We present the results of comprehensively evaluating one such approach, providing insights into how to tune it properly in order to obtain better outcomes based on different scenarios. We also study how varying its configuration affects the results obtained.
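The core feedback loop of such adaptation can be sketched as follows (a simplified model under stated assumptions, not the evaluated approach: the parameter names, the geometric growth policy, and the quadratic latency stand-in are all invented for illustration):

```python
def adapt_workload(measure_latency_ms, start_users=10, step=1.5,
                   slo_ms=500, max_users=10_000):
    """Grow the number of concurrent users geometrically until the measured
    latency violates the service-level objective, then report the last
    workload that still met it (a simple form of dynamic adaptation)."""
    users, last_ok = start_users, None
    while users <= max_users:
        if measure_latency_ms(users) > slo_ms:
            return last_ok, users  # (largest passing workload, first failing one)
        last_ok = users
        users = int(users * step)
    return last_ok, None  # SLO never violated within the cap

# Stand-in for a real measurement: latency grows quadratically with load.
fake_measure = lambda u: 0.002 * u * u
ok, failing = adapt_workload(fake_measure)
```

The configuration parameters (`start_users`, `step`, `slo_ms`) play the same role as the tuning knobs discussed in the paper: a step that is too coarse overshoots the breaking point, while one that is too fine wastes test cycles.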
  • Publication
    A Requirements-based Approach for the Evaluation of Emulated IoT Systems
    The Internet of Things (IoT) has become a major technological revolution. Evaluating any IoT advancement comprehensively is critical to understanding the conditions under which it can be most useful, as well as to assessing the robustness and efficiency of IoT systems in order to validate them before their deployment in real life. Nevertheless, the creation of an appropriate IoT test environment is a difficult, effort-intensive, and expensive task, typically requiring a significant amount of human effort and physical hardware. To tackle this problem, emulation tools to test IoT devices have been proposed. However, there is a lack of systematic approaches for evaluating IoT emulation environments. In this paper, we present a requirements-based framework to enable the systematic evaluation of the suitability of an emulated IoT environment to fulfil the requirements of an adequate test environment for IoT.