  • Publication
    Protecting organizational data confidentiality in the cloud using a high-performance anonymization engine
    Data security remains a top concern for the adoption of cloud-based delivery models, especially in the case of Software as a Service (SaaS). This concern stems primarily from the lack of transparency on how customer data is managed. Clients depend on the security measures implemented by the service providers to keep their information protected. However, few practical solutions exist to protect data from malicious insiders working for the cloud providers, a factor that represents a high potential for data breaches. This paper presents the High-Performance Anonymization Engine (HPAE), an approach to allow companies to protect their sensitive information from SaaS providers in a public cloud. This approach uses data anonymization to prevent the exposure of sensitive data in its original form, thus reducing the risk of misuse of customer information. This work involved the implementation of a prototype and an experimental validation phase, which assessed the performance of the HPAE in the context of a cloud-based log management service. The results showed that the architecture of the HPAE is a practical solution and can efficiently handle large volumes of data.
  • Publication
    Enhancing the Utility of Anonymized Data by Improving the Quality of Generalization Hierarchies
    The dissemination of textual personal information has become an important driver of innovation. However, because it may contain sensitive information, this data must be anonymized. A commonly used technique to anonymize data is generalization. Nevertheless, its effectiveness can be hampered by the Value Generalization Hierarchies (VGHs) used, as poorly specified VGHs can decrease the usefulness of the resulting data. To tackle this problem, in our previous work we presented the Generalization Semantic Loss (GSL), a metric that captures the quality of categorical VGHs in terms of semantic consistency and taxonomic organization. We validated the accuracy of GSL using an intrinsic evaluation with respect to a gold standard ontology. In this paper, we extend our previous work by conducting an extrinsic evaluation of GSL with respect to the performance that VGHs have in anonymization (using data utility metrics). We show how GSL can be used to perform an a priori assessment of the VGHs' effectiveness for anonymization. In this manner, data publishers can quantitatively compare the quality of various VGHs and identify (before anonymization) those that better retain the semantics of the original data. Consequently, the utility of the anonymized datasets can be improved without sacrificing the privacy goal. Our results demonstrate the accuracy of GSL, as the quality of VGHs measured with GSL strongly correlates with the utility of the anonymized data. Results also show the benefits that an a priori VGH assessment strategy brings to the anonymization process in terms of time savings and a reduction in the dependency on expert knowledge. Finally, GSL also proved to be lightweight in terms of computational resources.
  • Publication
    A Systematic Comparison and Evaluation of k-Anonymization Algorithms for Practitioners
    The vast amount of data being collected about individuals has brought new challenges in protecting their privacy when this data is disseminated. As a result, Privacy-Preserving Data Publishing has become an active research area, in which multiple anonymization algorithms have been proposed. However, given the large number of algorithms available and limited information regarding their performance, it is difficult to identify and select the most appropriate algorithm given a particular publishing scenario, especially for practitioners. In this paper, we perform a systematic comparison of three well-known k-anonymization algorithms to measure their efficiency (in terms of resource usage) and their effectiveness (in terms of data utility). We extend the scope of their original evaluation by employing a more comprehensive set of scenarios: different parameters, metrics and datasets. Using publicly available implementations of those algorithms, we conduct a series of experiments and a comprehensive analysis to identify the factors that influence their performance, in order to guide practitioners in the selection of an algorithm. We demonstrate through experimental evaluation the conditions in which one algorithm outperforms the others for a particular metric, depending on the input dataset and privacy requirements. Our findings motivate the necessity of creating methodologies that provide recommendations about the best algorithm given a particular publishing scenario.
  • Publication
    Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization
    In privacy-preserving data publishing, approaches using Value Generalization Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs play a key role in the utility of published datasets as they dictate how the anonymization of the data occurs. For categorical attributes, it is imperative to preserve the semantics of the original data in order to achieve a higher utility. Despite this, semantics have not been formally considered in the specification of VGHs. Moreover, there are no methods that allow users to assess the quality of their VGHs. In this paper, we propose a measurement scheme, based on ontologies, to quantitatively evaluate the quality of VGHs, in terms of semantic consistency and taxonomic organization, with the aim of producing higher-quality anonymizations. We demonstrate, through a case study, how our evaluation scheme can be used to compare the quality of multiple VGHs and can help to identify faulty VGHs.
  • Publication
    Synthetic Data Generation using Benerator Tool
    (University College Dublin. School of Computer Science and Informatics, 2013-10-29)
    Datasets of different characteristics are needed by the research community for experimental purposes. However, real data may be difficult to obtain due to privacy concerns. Moreover, real data may not meet specific characteristics which are needed to verify new approaches under certain conditions. Given these limitations, the use of synthetic data is a viable alternative to complement the real data. In this report, we describe the process followed to generate synthetic data using Benerator, a publicly available tool. The results show that the synthetic data preserves a high level of accuracy compared to the original data. The generated datasets correspond to microdata containing records with social, economic and demographic data which mimics the distribution of aggregated statistics from the 2011 Irish Census data.
  • Publication
    Integration of QoS Metrics, Rules and Semantic Uplift for Advanced IPTV Monitoring
    Increasing and variable traffic demands due to triple play services pose significant Internet Protocol Television (IPTV) resource management challenges for service providers. Managing subscriber expectations via consolidated IPTV quality reporting will play a crucial role in guaranteeing return-on-investment for players in the increasingly competitive IPTV delivery ecosystem. We propose a fault diagnosis and problem isolation solution that addresses the IPTV monitoring challenge and recommends problem-specific remedial action. IPTV delivery-specific metrics are collected at various points in the delivery topology, from the residential gateway and the Digital Subscriber Line Access Multiplexer through to the video Head-End. They are then pre-processed using new metric rules. A semantic uplift engine takes these raw metric logs, transforms them into the World Wide Web Consortium's standard Resource Description Framework for knowledge representation, and annotates them with expert knowledge from the IPTV domain. This system is then integrated with a monitoring visualization framework that displays monitoring events and alarms and recommends solutions. A suite of IPTV fault scenarios is presented and used to evaluate the feasibility of the solution. We demonstrate that professional service providers can provide timely reports on the quality of IPTV service delivery using this system.
      Scopus© Citations 6
  • Publication
    Enabling IPTV service assurance using OpenFlow
    One difficulty facing Internet Protocol Television (IPTV) service providers is the issue of monitoring and managing their service delivery network. An in-depth monitoring regime is required, which performs measurements within different networking devices. When network conditions deteriorate to the point where they could disrupt IPTV services, Network Operators (NOs) can use the measurements as a basis to reconfigure the network with minimal delay. OpenFlow (OF) presents a potential solution to this problem as it provides vendor-neutral access to the packet forwarding interface of the different hardware device types. This work investigates how OF can leverage video packet inspection measurements taken from within the IPTV service delivery network and combine these with OF statistics to make routing decisions that assure service quality.
      Scopus© Citations 14
  • Publication
    Experience of developing an openflow SDN prototype for managing IPTV networks
    IPTV is a method of delivering TV content to end users that is growing in popularity. Poor video quality may ultimately lead to a loss of revenue for the provider. Hence, it is vital to provide service assurance in these networks. This paper describes our experience of building an IPTV Software Defined Network testbed that can be used to develop and validate new approaches for service assurance in IPTV networks. The testbed is modular and many of the concepts detailed in this tutorial may be applied to the management of other end-to-end services.
      Scopus© Citations 9