  • Publication
    Current Challenges and Future Research Areas for Digital Forensic Investigation
    Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the growing number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs for law enforcement agencies throughout the world. It can be anticipated that the number of cases requiring digital forensic analysis will greatly increase in the future. It is also likely that each case will require the analysis of an increasing number of devices including computers, smartphones, tablets, cloud-based services, Internet of Things devices, wearables, etc. The variety of new digital evidence sources poses new and challenging problems for the digital investigator from an identification, acquisition, storage and analysis perspective. This paper explores the current challenges contributing to the backlog in digital forensics from a technical standpoint and outlines a number of future research topics that could greatly contribute to a more efficient digital forensic process.
  • Publication
    Hierarchical Bloom Filter Trees for Approximate Matching
    (Journal of Digital Forensics, Security and Law, 2018-01)
    Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of "known-illegal" files (e.g. a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with merged files, embedded files, partial files, or files that have been changed in any way. Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This paper demonstrates the effectiveness of using a "Hierarchical Bloom Filter Tree" (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required, and demonstrates substantial speed gains, while maintaining effectiveness.
  • Publication
    EviPlant: An Efficient Digital Forensic Challenge Creation, Manipulation, and Distribution Solution
    (Elsevier, 2017-03-21)
    Education and training in digital forensics requires a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on behalf of the educator to ensure their viability. Once created, the challenge image needs to be stored and distributed to a class for practical training. This storage and distribution step requires significant resources and time and may not even be possible in an online/distance learning scenario due to the data sizes involved. As part of this paper, we introduce a methodology and system that are more capable than current approaches. EviPlant is a system designed for the efficient creation, manipulation, storage and distribution of challenges for digital forensics education and training. The system relies on the initial distribution of base disk images, i.e., images containing solely bare operating systems. In order to create challenges for students, educators can boot the base system, emulate the desired activity and diff the resultant image against the base image. This diffing process extracts the modified artefacts and associated metadata and stores them in an evidence package. Evidence packages can be created for different personas, different wear-and-tear, different emulated crimes, etc., and multiple evidence packages can be distributed to students and integrated with the base images. A number of advantages and additional functionality over the current approaches are discussed that emerge as a result of using EviPlant.
      Scopus© Citations 14
  • Publication
    Assessing the Influencing Factors on the Accuracy of Underage Facial Age Estimation
    Swift response to the detection of endangered minors is an ongoing concern for law enforcement. Many child-focused investigations hinge on digital evidence discovery and analysis. Automated age estimation techniques are needed to aid in these investigations to expedite this evidence discovery process, and decrease investigator exposure to traumatic material. Automated techniques also show promise in decreasing the overflowing backlog of evidence obtained from increasing numbers of devices and online services. A lack of sufficient training data combined with natural human variance has long hindered accurate automated age estimation - especially for underage subjects. This paper presents a comprehensive evaluation of the performance of two cloud age estimation services (Amazon Web Service's Rekognition service and Microsoft Azure's Face API) against a dataset of over 21,800 underage subjects. The objective of this work is to evaluate the influence that certain human biometric factors, facial expressions, and image quality (i.e. blur, noise, exposure and resolution) have on the outcome of automated age estimation services. A thorough evaluation allows us to identify the most influential factors to be overcome in future age estimation systems.
      Scopus© Citations 4
  • Publication
    Improving Borderline Adulthood Facial Age Estimation through Ensemble Learning
    Achieving high performance for facial age estimation with subjects on the borderline between adulthood and non-adulthood has always been a challenge. Several studies have applied different approaches to subjects ranging from babies to elderly adults, and different datasets have been employed, with the reported mean absolute error (MAE) ranging from 1.47 to 8 years. The weakness of the algorithms specifically on this borderline has been a motivation for this paper. In our approach, we have developed an ensemble technique that improves the accuracy of underage estimation in conjunction with our deep learning model (DS13K) that has been fine-tuned on the Deep Expectation (DEX) model. We have achieved an accuracy of 68% for the age group 16 to 17 years old, which is 4 times better than the DEX accuracy for that age range. We also present an evaluation of existing cloud-based and offline facial age prediction services, such as Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net and DEX.
      Scopus© Citations 12
  • Publication
    Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees
    Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or files that have been changed in any way. In recent years, approximate matching algorithms have shown significant promise in the detection of files that have a high bytewise similarity. This paper focuses on MRSH-v2. A number of experiments were conducted using Hierarchical Bloom Filter Trees to dramatically reduce the quantity of pairwise comparisons that must be made between known-illegal files and files on the seized disk. The experiments demonstrate substantial speed gains over the original MRSH-v2, while maintaining effectiveness.
      Scopus© Citations 13
  • Publication
    Improving the accuracy of automated facial age estimation to aid CSEM investigations
    The investigation of violent crimes against individuals, such as the investigation of child sexual exploitation material (CSEM), is one of the more commonly encountered criminal investigation types throughout the world. While hash lists of known CSEM content are commonly used to identify previously encountered material on suspects’ devices, previously unencountered material requires expert, manual analysis and categorisation. The discovery, analysis, and categorisation of these digital images and videos have the potential to be significantly expedited with the use of automated artificial intelligence (AI) based techniques. Intelligent, automated evidence processing and prioritisation has the potential to aid investigators in alleviating some of the digital evidence backlogs that have become commonplace worldwide. In order for AI-aided CSEM investigations to be beneficial, the fundamental question when analysing multimedia content becomes “how old is each subject encountered?”. Our work presents the evaluation of existing cloud-based and offline age estimation services, introduces our deep learning model, DS13K, which was created with a VGG-16 Deep Convolutional Neural Network (CNN) architecture, and develops an ensemble technique that improves the accuracy of underage facial age estimation. In addition to our model, a number of existing services including Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net, and Deep Expectation (DEX) were used to create an ensemble learning technique. It was found that for the borderline adulthood age range (i.e., 16–17 years old), our DS13K model substantially outperformed existing services, achieving an accuracy of 68%. A comparative examination of the obtained results allowed us to identify performance trends and issues inherent to each service/tool and develop ensemble techniques to improve the accuracy of automated adulthood determination.
  • Publication
    On the Benefits of Information Retrieval and Information Extraction Techniques Applied to Digital Forensics
    (Springer, 2016-08-30)
    Many jurisdictions suffer from lengthy evidence processing backlogs in digital forensics investigations. This has negative consequences for the timely incorporation of digital evidence into criminal investigations, while also affecting the timelines required to bring a case to court. Modern technological advances, in particular the move towards cloud computing, have great potential in expediting the automated processing of digital evidence, thus reducing the manual workload for investigators. These advances also promise to provide a platform upon which more sophisticated automated techniques may be employed to improve the process further. This paper identifies some research strands from the areas of Information Retrieval and Information Extraction that have the potential to greatly help with the efficiency and effectiveness of digital forensics investigations.
      Scopus© Citations 4
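
The two Hierarchical Bloom Filter Tree entries in this list describe pruning pairwise comparisons by testing a file against Bloom filters arranged in a tree. The following is a minimal sketch of that general idea only, not the authors' implementation and not MRSH-v2. Everything in it is an assumption made for illustration: the `BloomFilter`, `Node`, `chunks`, `candidates` and `blob` names, the fixed 8-byte chunking (MRSH-v2 derives its chunks differently), the filter size, and the `min_hits` threshold.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int (illustrative parameters)."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0

    def _positions(self, item: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: bytes):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def chunks(data: bytes, size=8):
    # Assumption: fixed-size, non-overlapping chunks stand in for the
    # chunking a real approximate matching algorithm would use.
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


class Node:
    """Binary tree node whose filter covers every file in its subtree."""

    def __init__(self, files):
        self.names = sorted(files)
        self.filter = BloomFilter()
        for name in self.names:
            for c in chunks(files[name]):
                self.filter.add(c)
        self.children = []
        if len(self.names) > 1:
            mid = len(self.names) // 2
            self.children = [
                Node({n: files[n] for n in self.names[:mid]}),
                Node({n: files[n] for n in self.names[mid:]}),
            ]


def candidates(node, data, min_hits=2):
    """Return names of known files worth a full pairwise comparison."""
    hits = sum(1 for c in chunks(data) if c in node.filter)
    if hits < min_hits:
        return []  # prune this whole subtree
    if not node.children:
        return list(node.names)
    return [n for child in node.children for n in candidates(child, data, min_hits)]


def blob(seed: str) -> bytes:
    # Hypothetical helper: 256 bytes of deterministic pseudo-random test data.
    return b"".join(hashlib.sha256(f"{seed}-{i}".encode()).digest() for i in range(8))


known = {name: blob(name) for name in ("a", "b", "c", "d")}
root = Node(known)
exact_copy = candidates(root, known["b"])
partial_copy = candidates(root, known["b"][:128])
unrelated = candidates(root, blob("unrelated"))
```

A file that shares too few chunks with a subtree's filter never reaches the leaves below it, so only the surviving leaf files would be passed on to a full pairwise comparison. Because the chunks here are fixed-size and aligned, this sketch would miss content embedded at an arbitrary offset.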