Options
Haughton, David
Preferred name
Haughton, David
Official Name
Haughton, David
Research Output
Now showing 1 - 3 of 3
- PublicationGene Tagging and the Data Hiding RateWe analyze the maximum number of ways in which one can intrinsically tag a very particular kind of digital asset: a gene, which is just a DNA sequence that encodes a protein. We consider gene tagging under the most relevant biological constraints: protein encoding preservation with and without codon count preservation. We show that our finite and deterministic combinatorial results are asymptotically—as the length of the gene increases— particular cases of the stochastic Gel’fand and Pinsker capacity formula for communications with side information at the encoder, which lies at the foundations of data hiding theory. This is because gene tagging is a particular case of DNA watermarking.
330 - PublicationPerformance of DNA data embedding algorithms under substitution mutations(2010-12)
; DNA data embedding is a relatively recent area which aims at embedding arbitrary information in deoxyribonucleic acid (DNA) strands. One interesting application of DNA data embedding can be tracing pathways of genetic material in novel ways. This paper explores the decoding performance of several DNA data embedding algorithms proposed in the literature, which are also briefly reviewed. DNA may undergo random mutations, which can cause errors at the decoding stage of such algorithms. Although some proposed methods do account for such errors, decoding performance under mutations has not been previously studied in general. The empirical performance comparison that we provide here allows to fairly compare a number of DNA data embedding algorithms under mutations for the first time. The evaluation is undertaken by means of Monte Carlo simulations. Additionally, we propose two new DNA data embedding algorithms with good robustness properties.631 - PublicationRepetition coding as an effective error correction code for embedding information in DNAThe goal of DNA data embedding is to enable robust encoding of non-genetic information in DNA. This field straddles the areas of bioinformatics and digital communications, since DNA mutations can be seen as akin to a noisy channel from the point of view of information encoding. In this paper we present two algorithms which, building on a variant of a method proposed by Yachie et al., rely on repetition coding to effectively counteract the impact that mutations have on an embedded message. The algorithms are designed for resynchronising multiple, originally identical, information encoded DNA sequences, embedded within non-coding DNA (ncDNA) sections of a host genome. They use both BLAST and MUSCLE algorithms to accomplish this. Bit error rates at the decoder are established for mutations rates accumulated over a number of generations of the host organism. The empirical results obtained are compared to a theoretical bound for optimal decoding.
282Scopus© Citations 11