  • Publication
    Hierarchical Bloom Filter Trees for Approximate Matching
    (Journal of Digital Forensics, Security and Law, 2018-01)
    Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of "known-illegal" files (e.g. a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with cases of merged files, embedded files, partial files, or if a file has been changed in any way. Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This paper demonstrates the effectiveness of using a "Hierarchical Bloom Filter Tree" (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required, and demonstrates substantial speed gains, while maintaining effectiveness.
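    The idea of pruning pairwise comparisons with a tree of Bloom filters can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `BloomFilter`, `Node`, `chunks` and `search`, the filter size, the number of hash functions and the `min_hits` threshold are all assumptions, and fixed-size chunks stand in for the similarity features that MRSH-v2 actually extracts.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; size and hash count are illustrative choices."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def chunks(data, size=64):
    # Assumption: fixed-size, aligned chunks instead of MRSH-v2 features.
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


class Node:
    """Each node's filter covers every file in its subtree; leaves hold one file."""

    def __init__(self, files):  # files: non-empty list of (name, bytes)
        self.filter = BloomFilter()
        for _, data in files:
            for c in chunks(data):
                self.filter.add(c)
        if len(files) == 1:
            self.name, self.children = files[0][0], []
        else:
            mid = len(files) // 2
            self.name = None
            self.children = [Node(files[:mid]), Node(files[mid:])]


def search(node, query_chunks, min_hits=4):
    """Return names of leaf files sharing at least min_hits chunks with the query."""
    hits = sum(1 for c in query_chunks if c in node.filter)
    if hits < min_hits:
        return []  # prune: nothing below this node can match
    if not node.children:
        return [node.name]
    return [name for child in node.children
            for name in search(child, query_chunks, min_hits)]
```

    A query that misses at the root is rejected after one filter lookup per chunk, and a query that matches descends only into the subtrees whose filters contain enough of its chunks, so most leaf-level comparisons are never performed.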
  • Publication
    Separation of concerns in hybrid agent and component system
    Modularising requirements is a classic problem of software engineering; concerns often overlap, requiring multiple dimensions of decomposition to achieve separation. Whenever complete modularity is unachievable, it is important to provide principled approaches to the decoupling of concerns. To this end, this paper discusses the Socially Situated Agent Architecture (SoSAA) - a complete construction methodology, which leverages existing well-established research and associated methodologies and frameworks in both the Agent-oriented and Component-based Software Engineering domains. As a software framework, SoSAA is primarily intended to serve as a foundation on which to build agent-based applications by promoting separation of concerns in the development of open, heterogeneous, adaptive and distributed systems. While previous work has discussed the design rationale for SoSAA and illustrated its application to the construction of multiagent systems, this paper focuses on the separation of concerns issue. It highlights concerns typically addressed in the development of distributed systems, such as adaptation, concurrency, and fault-tolerance. It analyses how a hybrid agent/component integration approach can improve the separation of these concerns by leveraging modularity constructs already available in agent and component systems, and sets clear guidelines on where the different concerns must be addressed within the overall architecture. Finally, this paper provides a first evaluation of the application of our framework by applying well-known metrics to a distributed information retrieval case study, and by discussing how these initial results can be projected to a typical multiagent application developed with the same hybrid approach.
    Scopus© Citations 2
  • Publication
    EviPlant: An Efficient Digital Forensic Challenge Creation, Manipulation, and Distribution Solution
    (Elsevier, 2017-03-21)
    Education and training in digital forensics require a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on behalf of the educator to ensure their viability. Once created, the challenge image needs to be stored and distributed to a class for practical training. This storage and distribution step requires significant resources and time and may not even be possible in an online/distance learning scenario due to the data sizes involved. In this paper, we introduce a methodology and system that is more capable than current approaches. EviPlant is a system designed for the efficient creation, manipulation, storage and distribution of challenges for digital forensics education and training. The system relies on the initial distribution of base disk images, i.e., images containing solely bare operating systems. In order to create challenges for students, educators can boot the base system, emulate the desired activity and diff the resultant image against the base image. This diffing process extracts the modified artefacts and associated metadata and stores them in an evidence package. Evidence packages can be created for different personas, different wear-and-tear, different emulated crimes, etc., and multiple evidence packages can be distributed to students and integrated with the base images. A number of advantages and additional functionality over the current approaches are discussed that emerge as a result of using EviPlant.
    Scopus© Citations 13
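    The diff-and-inject workflow described in the EviPlant abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not EviPlant's actual format or code: a disk image is modelled as an in-memory dictionary mapping paths to file contents, and the function names `make_evidence_package` and `inject`, as well as the package layout, are hypothetical.

```python
import hashlib


def make_evidence_package(base, modified):
    """Collect only what differs between a base image and a modified image.

    Both images are modelled as dicts of path -> bytes (an assumption; a real
    system would work on disk images and also record file metadata).
    """
    package = {"write": {}, "delete": []}
    for path, content in modified.items():
        if base.get(path) != content:  # new file or changed content
            package["write"][path] = {
                "data": content,
                "sha256": hashlib.sha256(content).hexdigest(),
            }
    package["delete"] = sorted(p for p in base if p not in modified)
    return package


def inject(base, package):
    """Rebuild the challenge image from the shared base plus one package."""
    image = dict(base)  # leave the caller's base image untouched
    for path in package["delete"]:
        image.pop(path, None)
    for path, entry in package["write"].items():
        # Verify integrity before planting the artefact.
        if hashlib.sha256(entry["data"]).hexdigest() != entry["sha256"]:
            raise ValueError(f"checksum mismatch for {path}")
        image[path] = entry["data"]
    return image
```

    Because every student already holds the base image, only the package needs to be distributed, and its size depends on the emulated activity rather than on the size of the full image.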
  • Publication
    Improving the accuracy of automated facial age estimation to aid CSEM investigations
    The investigation of violent crimes against individuals, such as the investigation of child sexual exploitation material (CSEM), is one of the more commonly encountered criminal investigation types throughout the world. While hash lists of known CSEM content are commonly used to identify previously encountered material on suspects’ devices, previously unencountered material requires expert, manual analysis and categorisation. The discovery, analysis, and categorisation of these digital images and videos have the potential to be significantly expedited with the use of automated artificial intelligence (AI) based techniques. Intelligent, automated evidence processing and prioritisation has the potential to aid investigators in alleviating some of the digital evidence backlogs that have become commonplace worldwide. In order for AI-aided CSEM investigations to be beneficial, the fundamental question when analysing multimedia content becomes “how old is each subject encountered?”. Our work presents the evaluation of existing cloud-based and offline age estimation services, introduces our deep learning model, DS13K, which was created with a VGG-16 Deep Convolutional Neural Network (CNN) architecture, and develops an ensemble technique that improves the accuracy of underage facial age estimation. In addition to our model, a number of existing services including Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net, and Deep Expectation (DEX) were used to create an ensemble learning technique. It was found that for the borderline adulthood age range (i.e., 16–17 years old), our DS13K model substantially outperformed existing services, achieving a performance accuracy of 68%. A comparative examination of the obtained results allowed us to identify performance trends and issues inherent to each service/tool and develop ensemble techniques to improve the accuracy of automated adulthood determination.
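    One simple way to combine several age estimators is a weighted average of their outputs. The sketch below shows that generic technique only; the abstract does not specify how the authors' ensemble is built, so this should not be read as their method. The function names, the estimator labels, the weights and the threshold of 18 are all assumptions, and no real service API is called.

```python
def ensemble_age(estimates, weights=None):
    """Weighted mean of per-estimator age predictions.

    estimates: dict of estimator name -> predicted age in years.
    weights:   optional dict of estimator name -> non-negative weight;
               defaults to equal weighting (a plain average).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if weights is None:
        weights = {name: 1.0 for name in estimates}
    total = sum(weights[name] for name in estimates)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(age * weights[name] for name, age in estimates.items()) / total


def is_underage(estimates, weights=None, threshold=18.0):
    """Flag a subject whose combined estimate falls below the threshold."""
    return ensemble_age(estimates, weights) < threshold
```

    In such a scheme, weights could be chosen per age range from each estimator's measured error on a validation set, so that an estimator that performs well near the adulthood boundary contributes more to borderline decisions.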