Hierarchical Bloom Filter Trees for Approximate Matching

Files in This Item:
File Description SizeFormat 
jdfsl.pdf349.54 kBAdobe PDFDownload
Title: Hierarchical Bloom Filter Trees for Approximate Matching
Authors: Lillis, David
Breitinger, Frank
Scanlon, Mark
Permanent link: http://hdl.handle.net/10197/9694
Date: Jan-2018
Online since: 2019-03-26T12:48:51Z
Abstract: Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of "known-illegal" files (e.g. a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with cases of merged files, embedded files, partial files, or if a file has been changed in any way. Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This paper demonstrates the effectiveness of using a "Hierarchical Bloom Filter Tree" (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required, and demonstrates substantial speed gains, while maintaining effectiveness.
Type of material: Journal Article
Publisher: Journal of Digital Forensics, Security and Law
Journal: Journal of Association of Digital Forensics, Security and Law
Volume: 13
Issue: 1
Start page: 80
End page: 96
Copyright (published version): 2018 ADFSL
Keywords: Approximate matchingHierarchical bloom filter treesmrsh-v2
DOI: 10.15394/jdfsl.2018.1489
Language: en
Status of Item: Peer reviewed
Appears in Collections:Computer Science Research Collection

Show full item record

Google ScholarTM

Check

Altmetric


This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland. No item may be reproduced for commercial purposes. For other possible restrictions on use please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.