Options
Haughton, David
Preferred name
Haughton, David
Official Name
Haughton, David
Research Output
Now showing 1 - 8 of 8
No Thumbnail Available
Publication
Security of keyed DNA data embedding
2013-12, Haughton, David, Balado, Félix
The emergent field of DNA data embedding has, in
the past decade, seen the proposal of many methods to encode
data in the genome of living organisms. These algorithms have
primarily focused on incorporating error correction into encoded
data. However the security aspects of data encoded in DNA have
generally been overlooked. As with any data hiding method, information
encoded in DNA can be protected with a secret key known
only to authorized individuals. While there have been suggestions
to incorporate keys, no well defined functions for mapping keys to
parameters have been proposed yet. Furthermore there have been
no security analyses of keyed DNA data embedding methods, in
terms of quantifying how difficult it is for an attacker to find the
key. In this paper, modifications to incorporate secret keys into a
prominent embedding algorithm are proposed, and the security
is analysed under realistic attack scenarios. The algorithm to be
modified, called BioCode ncDNA has been proposed by us, and is
designed for encoding data in vivo (in the cells of living organism)
with high biocompatibility and near-optimum embedding rate.
No Thumbnail Available
Publication
A Modified Watermark Synchronisation Code for Robust Embedding of Data in DNA
2013-05-26, Haughton, David, Balado, Félix
DNA data embedding is a newly emerging field aspiring to encode
data in deoxyribonucleic acid (DNA). DNA is an inherently
digital and noisy medium, undergoing substitution, insertion and
deletion mutations. Hence, encoding information in DNA can be
seen as a particular case of digital communications in which biological
constraints must be observed. In this paper we propose a modification
of Davey and MacKay’s watermark synchronisation code
(unrelated to digital watermarking) to create an encoding procedure
more biocompatible with the host organism than previous methods.
In addition, when combined with a low density parity check (LDPC)
code, the method provides near-optimum error correction. We also
obtain the theoretical embedding capacity of DNA under substitution
mutations for the increased biocompatibility constraint. This result,
along with an existing bound on capacity for insertion and deletion
mutations, is compared to the proposed algorithm’s performance by
means of Monte Carlo simulations
No Thumbnail Available
Publication
Permutation Codes and Steganography
2013-05-26, Balado, Félix, Haughton, David
We show that Slepian’s Variant I permutation codes implement first-order perfect steganography (i.e., histogram-preserving steganography). We give theoretical expressions for the embedding distortion, embedding rate and embedding efficiency of permutation codes in steganography, which demonstrate that these codes conform to prior
analyses of the properties of capacity-achieving perfect stegosystems with a passive warden. We also propose a modification of adaptive arithmetic coding that near optimally implements permutation coding with a low complexity, confirming all our theoretical predictions. Finally we discuss how to control the embedding distortion. Permutation
coding turns out to be akin to Sallee’s model-based steganography, and to supersede both this method and LSB matching.
No Thumbnail Available
Publication
Gene Tagging and the Data Hiding Rate
2012-06-28, Balado, Félix, Haughton, David
We analyze the maximum number of ways in which one can intrinsically tag
a very particular kind of digital asset: a gene, which is just a DNA sequence that encodes
a protein. We consider gene tagging under the most relevant biological constraints:
protein encoding preservation with and without codon count preservation. We show
that our finite and deterministic combinatorial results are asymptotically—as the length
of the gene increases— particular cases of the stochastic Gel’fand and Pinsker capacity
formula for communications with side information at the encoder, which lies at the
foundations of data hiding theory. This is because gene tagging is a particular case of
DNA watermarking.
No Thumbnail Available
Publication
Repetition coding as an effective error correction code for embedding information in DNA
2011-10-24, Haughton, David, Balado, Félix
The goal of DNA data embedding is to enable robust encoding of non-genetic information in DNA. This field straddles the areas of bioinformatics and digital communications, since DNA mutations can be seen as akin to a noisy channel from the point of view of information encoding. In this paper we present two algorithms which, building on a variant of a method proposed by Yachie et al., rely on repetition coding to effectively counteract the impact that mutations have on an embedded message. The algorithms are designed for resynchronising multiple, originally identical, information encoded DNA sequences, embedded within non-coding DNA (ncDNA) sections of a host genome. They use both BLAST and MUSCLE algorithms to accomplish this. Bit error rates at the decoder are established for mutations rates accumulated over a number of generations of the host organism. The empirical results obtained are compared to a theoretical bound for optimal decoding.
No Thumbnail Available
Publication
Optimum Perfect Steganography of Memoryless Sources as a Rate-Distortion Problem
2013-11-20, Balado, Félix, Haughton, David
Slepian’s variant I permutation coding has been recently shown to be a fundamental steganographic tool, as it implements optimum perfect steganography of memoryless sources. Although real host signals are not memoryless, a decorrelating energy-preserving transform can always be applied before a method that assumes a memoryless source, as is usually done in the dual problem of source coding. A further constraint is needed in practice: the information-carrying signal must be close to the
host, according to some distance measure. Thus steganography of memoryless sources using permutation coding is a rate-distortion problem. Here we delve deeper in the study of the embedding distortion of permutation coding, and we show that the rate-distortion tradeoff for partitioned permutation coding is nearoptimum according to the Gel’fand and Pinsker capacity formula.
No Thumbnail Available
Publication
Performance of DNA data embedding algorithms under substitution mutations
2010-12, Haughton, David, Balado, Félix
DNA data embedding is a relatively recent area which aims at embedding arbitrary information in deoxyribonucleic acid (DNA) strands. One interesting application of DNA data embedding can be tracing pathways of genetic material in novel ways. This paper explores the decoding performance of several DNA data embedding algorithms proposed in the literature, which are also briefly reviewed. DNA may undergo random mutations, which can cause errors at the decoding stage of such algorithms. Although some proposed methods do account for such errors, decoding performance under mutations has not been previously studied in general. The empirical performance comparison that we provide here allows to fairly compare a number of DNA data embedding algorithms under mutations for the first time. The evaluation is undertaken by means of Monte Carlo simulations. Additionally, we propose two new DNA data embedding algorithms with good robustness properties.
No Thumbnail Available
Publication
Asymptotically Optimum Perfect Universal Steganography of Finite Memoryless Sources
2018-02, Balado, Félix, Haughton, David
A solution to the problem of asymptotically optimum perfect universal steganography of finite memoryless sources with a passive warden is provided, which is then extended to contemplate a distortion constraint. The solution rests on the fact that Slepian’s Variant I permutation coding implements firstorder perfect universal steganography of finite host signals with optimum embedding rate. The duality between perfect universal steganography with asymptotically optimum embedding rate and lossless universal source coding with asymptotically optimum compression rate is evinced in practice by showing that permutation coding can be implemented by means of adaptive arithmetic coding. Next, a distortion constraint between the host signal and the information-carrying signal is considered. Such a constraint is essential whenever real-world host signals with memory (e.g., images, audio, or video) are decorrelated to conform to the memoryless assumption. The constrained version of the problem requires trading off embedding rate and distortion. Partitioned permutation coding is shown to be a practical way to implement this trade-off, performing close to an unattainable upper bound on the rate-distortion function of the problem.