  • Publication
    Automatic Construction of Generalization Hierarchies for Publishing Anonymized Data
    Concept hierarchies are widely used in multiple fields to carry out data analysis. In data privacy, they are known as Value Generalization Hierarchies (VGHs) and are used by generalization algorithms to dictate how the data is anonymized. Thus, their proper specification is critical to obtaining anonymized data of good quality. The creation and evaluation of VGHs require expert knowledge and a significant amount of manual effort, making these tasks highly error-prone and time-consuming. In this paper we present AIKA, a knowledge-based framework to automatically construct and evaluate VGHs for the anonymization of categorical data. AIKA integrates ontologies to objectively create and evaluate VGHs. It also implements a multi-dimensional reward function to tailor the VGH evaluation to different use cases. Our experiments show that AIKA improved the creation of VGHs by generating VGHs of good quality in less time than manual construction. Results also showed that the reward function properly captures the desired VGH properties.
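    To illustrate the role a VGH plays in generalization (this is a generic sketch with hypothetical values, not AIKA's implementation), the following shows a small VGH for a categorical attribute, encoded as a child-to-parent map, and a generalization step driven by it:

```python
# Illustrative sketch: a Value Generalization Hierarchy (VGH) for a categorical
# attribute, and how a generalization step uses it. All values are hypothetical.

# A VGH maps each value to its more general parent, up to a root ("*").
VGH_JOB = {
    "nurse": "healthcare",
    "surgeon": "healthcare",
    "teacher": "education",
    "professor": "education",
    "healthcare": "professional",
    "education": "professional",
    "professional": "*",
}

def generalize(value: str, levels: int, vgh: dict) -> str:
    """Replace a value with its ancestor `levels` steps up the hierarchy."""
    for _ in range(levels):
        value = vgh.get(value, "*")  # unknown values fall back to the root
    return value

print(generalize("nurse", 1, VGH_JOB))   # healthcare
print(generalize("nurse", 2, VGH_JOB))   # professional
```

    How well such a hierarchy groups semantically related values (e.g., whether "nurse" and "surgeon" share a meaningful parent) is precisely what determines the utility of the anonymized output.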
  • Publication
    Improving the Testing of Clustered Systems Through the Effective Usage of Java Benchmarks
    Nowadays, cluster computing has become a cost-effective and powerful solution for enterprise-level applications. Nevertheless, this architecture model also increases the complexity of applications, complicating all activities related to performance optimisation. Thus, many research efforts have pursued advancements for improving the performance of clusters. Comprehensively evaluating such advancements is key to understanding the conditions under which they can be most useful. However, the creation of an appropriate test environment, that is, one which offers different application behaviours (so that the obtained conclusions can be better generalised), is typically an effort-intensive task. To help tackle this problem, this paper presents a tool that decreases the effort and expertise needed to build useful test environments for more robust cluster testing. This is achieved by enabling the effective usage of Java benchmarks to easily create clustered test environments, hence diversifying the application behaviours that can be evaluated. We also present the results of a practical validation of the proposed tool, in which it has been successfully applied to the evaluation of two cluster-related advancements.
  • Publication
    Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization
    In privacy-preserving data publishing, approaches using Value Generalization Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs play a key role in the utility of published datasets, as they dictate how the anonymization of the data occurs. For categorical attributes, it is imperative to preserve the semantics of the original data in order to achieve a higher utility. Despite this, semantics have not been formally considered in the specification of VGHs. Moreover, there are no methods that allow users to assess the quality of their VGHs. In this paper, we propose a measurement scheme, based on ontologies, to quantitatively evaluate the quality of VGHs, in terms of semantic consistency and taxonomic organization, with the aim of producing higher-quality anonymizations. We demonstrate, through a case study, how our evaluation scheme can be used to compare the quality of multiple VGHs and can help to identify faulty VGHs.
  • Publication
    Enhancing the utility of anonymized data in privacy-preserving data publishing
    (University College Dublin, School of Computer Science, 2017)
    The collection, publication, and mining of personal data have become key drivers of innovation and value creation. In this context, it is vital that organizations comply with the pertinent data protection laws to safeguard the privacy of individuals and prevent the uncontrolled disclosure of their information (especially of sensitive data). However, data anonymization is a time-consuming, error-prone, and complex process that requires a high level of expertise in data privacy and domain knowledge. Otherwise, the quality of the anonymized data and the robustness of its privacy protection would be compromised. This thesis contributes to the area of Privacy-Preserving Data Publishing by proposing a set of techniques that help users to make informed decisions on publishing safe and useful anonymized data, while reducing the expert knowledge and effort required to apply anonymization. In particular, the main contributions of this thesis are: (1) A novel method to evaluate, in an objective, quantifiable, and automatic way, the semantic quality of Value Generalization Hierarchies (VGHs) for categorical data. By improving the specification of the VGHs, the quality of the anonymized data is also improved. (2) A framework for the automatic construction and multi-dimensional evaluation of VGHs. The aim is to generate VGHs more efficiently and of better quality than when manually done. Moreover, the evaluation of VGHs is enhanced, as users can compare VGHs from various perspectives and select the ones that better fit their preferences to drive the anonymization of data. (3) A practical approach for the generation of realistic synthetic datasets which preserves the functional dependencies of the data. The aim is to strengthen the testing of anonymization techniques by broadening the number and diversity of the test scenarios. (4) A conceptual framework that describes a set of relevant elements that underlie the assessment and selection of anonymization algorithms, together with a systematic comparison and analysis of a set of anonymization algorithms to identify the factors that influence their performance, in order to guide users in the selection of a suitable algorithm.
  • Publication
    A Requirements-based Approach for the Evaluation of Emulated IoT Systems
    The Internet of Things (IoT) has become a major technological revolution. Evaluating any IoT advancement comprehensively is critical to understanding the conditions under which it can be most useful, as well as to assessing the robustness and efficiency of IoT systems in order to validate them before their deployment in real life. Nevertheless, the creation of an appropriate IoT test environment is a difficult, effort-intensive, and expensive task, typically requiring a significant amount of human effort and physical hardware. To tackle this problem, emulation tools to test IoT devices have been proposed. However, there is a lack of systematic approaches for evaluating IoT emulation environments. In this paper, we present a requirements-based framework to enable the systematic evaluation of whether an emulated IoT environment fulfils the requirements of an adequate test environment for IoT.
  • Publication
    Improving the Utility of Anonymized Datasets through Dynamic Evaluation of Generalization Hierarchies
    The dissemination of textual personal information has become a key driver of innovation and value creation. However, due to the possible content of sensitive information, this data must be anonymized, which can reduce its usefulness for secondary uses. One of the most widely used techniques to anonymize data is generalization. However, its effectiveness can be hampered by the Value Generalization Hierarchies (VGHs) used to dictate the anonymization of data, as poorly-specified VGHs can reduce the usefulness of the resulting data. To tackle this problem, we propose a metric for evaluating the quality of textual VGHs used in anonymization. Our evaluation approach considers the semantic properties of VGHs and exploits information from the input datasets to predict, with higher accuracy than existing approaches, the potential effectiveness of VGHs for anonymizing data. As a consequence, the utility of the resulting datasets is improved without sacrificing the privacy goal. We also introduce a novel rating scale to classify the quality of VGHs into categories, to facilitate the interpretation of our quality metric for practitioners.
  • Publication
    Towards an Efficient Log Data Protection in Software Systems through Data Minimization and Anonymization
    IT infrastructures of companies generate large amounts of log data every day. These logs are typically analyzed by software engineers to gain insights about activities occurring within a company (e.g., to debug issues exhibited by the production systems). To facilitate this process, log data management is often outsourced to cloud providers. However, logs may contain information that is sensitive by nature and considered personally identifiable under most of the new privacy protection laws, such as the European General Data Protection Regulation (GDPR). To ensure that companies do not violate regulatory compliance, they must adopt appropriate data protection measures in their software systems. Such privacy protection laws also promote the use of anonymization techniques as possible mechanisms to operationalize data protection. However, companies struggle to put anonymization into practice due to the lack of intuitive, easy-to-use tools that integrate effectively with their log management systems. In this paper, we propose an automatic approach (SafeLog) to filter out information and anonymize log streams, to safeguard the confidentiality of sensitive data and prevent its exposure to, and misuse by, third parties. Our results show that atomic anonymization operations can be effectively applied to log streams to preserve the confidentiality of information, while still allowing different types of analysis tasks, such as user behavior analysis and anomaly detection, to be conducted. Our approach also reduces the amount of data sent to cloud vendors, hence decreasing the financial costs and the risk of overexposing information.
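    The idea of applying atomic anonymization operations to a log stream can be sketched as follows. This is a generic illustration, not the SafeLog implementation; the regular expressions and the pseudonymization scheme are hypothetical examples of the kind of per-field operations described above:

```python
# Illustrative sketch: atomic anonymization operations applied to log lines
# before shipping them to a cloud provider. Patterns and names are hypothetical.
import hashlib
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pseudonymize(match: re.Match) -> str:
    """Replace an identifier with a stable pseudonym, so that per-user
    analyses (e.g., behavior or anomaly detection) remain possible."""
    return "user-" + hashlib.sha256(match.group().encode()).hexdigest()[:8]

def anonymize_line(line: str) -> str:
    line = EMAIL_RE.sub(pseudonymize, line)   # pseudonymize e-mail addresses
    line = IP_RE.sub("x.x.x.x", line)         # suppress IP addresses
    return line

print(anonymize_line("2021-03-01 login alice@example.com from 10.0.0.7 ok"))
```

    Because the pseudonym is a deterministic hash, the same user maps to the same token across the whole stream, which is what keeps behavioral and anomaly-detection analyses feasible after anonymization.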
  • Publication
    One Size Does Not Fit All: In-Test Workload Adaptation for Performance Testing of Enterprise Applications
    Carrying out proper performance testing is considerably challenging. In particular, the identification of performance issues, as well as their root causes, is a time-consuming and complex process which typically requires several iterations of tests (as this type of issue can depend on the input workloads), and heavily relies on human expert knowledge. To improve this process, this paper presents an automated approach (which extends some of our previous work) to dynamically adapt the workload (used by a performance testing tool) during the test runs. As a result, the performance issues of the tested application can be revealed more quickly; hence, they can be identified with less effort and expertise. Our experimental evaluation has assessed the accuracy of the proposed approach and the time savings that it brings to testers. The results have demonstrated the benefits of the approach by achieving a significant decrease in the time invested in performance testing (without compromising the accuracy of the test results), while introducing a low overhead in the testing environment.
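    The general shape of in-test workload adaptation (a feedback loop that adjusts the load while the test runs) can be sketched as below. This is a simplified illustration under assumed names, not the paper's algorithm; `run_step` is a hypothetical stand-in for the load driver, with a toy latency model:

```python
# Illustrative sketch: a feedback loop that adapts the workload during a test
# run, ramping concurrency until a response-time budget is exceeded.

def run_step(concurrent_users: int) -> float:
    """Hypothetical load-generation step; returns mean response time (ms).
    Toy model: latency grows non-linearly as users saturate the system."""
    return 50 + 0.002 * concurrent_users ** 2

def adapt_workload(budget_ms: float, start: int = 10, factor: float = 1.5) -> int:
    """Increase load each iteration until the response-time budget is
    violated; return the last workload level that stayed within budget."""
    users = last_ok = start
    while run_step(users) <= budget_ms:
        last_ok = users
        users = int(users * factor)
    return last_ok

print(adapt_workload(budget_ms=200.0))
```

    The point of adapting in-test rather than between tests is that the loop converges on the interesting workload region within a single run, instead of requiring a fresh test iteration per candidate workload.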
  • Publication
    DYNAMOJM: A JMeter Tool for Performance Testing Using Dynamic Workload Adaptation
    Performance testing is a critical task to ensure an optimal experience for users, especially under high loads of concurrent users. JMeter is one of the most widely used tools for load and stress testing. With JMeter, it is possible to test the performance of static and dynamic resources on the web. This paper presents DYNAMOJM, a novel tool built on top of JMeter that enables testers to create a dynamic workload for performance testing. This tool implements the DYNAMO approach, which has proven useful for finding performance issues more efficiently than static testing techniques.
  • Publication
    Enhancing the Utility of Anonymized Data by Improving the Quality of Generalization Hierarchies
    The dissemination of textual personal information has become an important driver of innovation. However, due to the possible content of sensitive information, this data must be anonymized. A commonly-used technique to anonymize data is generalization. Nevertheless, its effectiveness can be hampered by the Value Generalization Hierarchies (VGHs) used, as poorly-specified VGHs can decrease the usefulness of the resulting data. To tackle this problem, in our previous work we presented the Generalization Semantic Loss (GSL), a metric that captures the quality of categorical VGHs in terms of semantic consistency and taxonomic organization. We validated the accuracy of GSL using an intrinsic evaluation with respect to a gold standard ontology. In this paper, we extend our previous work by conducting an extrinsic evaluation of GSL with respect to the performance that VGHs have in anonymization (using data utility metrics). We show how GSL can be used to perform an a priori assessment of the VGHs' effectiveness for anonymization. In this manner, data publishers can quantitatively compare the quality of various VGHs and identify (before anonymization) those that better retain the semantics of the original data. Consequently, the utility of the anonymized datasets can be improved without sacrificing the privacy goal. Our results demonstrate the accuracy of GSL, as the quality of VGHs measured with GSL strongly correlates with the utility of the anonymized data. Results also show the benefits that an a priori VGH assessment strategy brings to the anonymization process in terms of time-savings and a reduction in the dependency on expert knowledge. Finally, GSL also proved to be lightweight in terms of computational resources.