Optimal Probe Length Varies for Targets with High Sequence Variation: Implications for Probe Library Design for Resequencing Highly Variable Genes
Date Issued
2008-06
Date Available
2012-12-03T15:01:36Z
Abstract
Sequencing by hybridisation (SBH) is an effective method for obtaining large amounts of
DNA sequence information at low cost. The efficiency of SBH depends on
the design of the probe library to provide the maximum information for
minimum cost. Long probes are more likely to match non-repeated
sequences but increase the number of probes required,
whereas short probes may not provide unique sequence information because of
repeated sequences. We have investigated the effect of probe length, use
of reference sequences, and thermal filtering on the design of probe
libraries for several highly variable target DNA sequences.
Results
We designed overlapping probe libraries for a range of highly variable
drug target genes based on known sequence information and developed a
formal terminology to describe probe library design. We find that for
some targets these libraries can provide good coverage of a previously
unseen target whereas for others the coverage is less than 30%. The
optimal probe length varies from as short as 12 nt to as long as 19 nt
and depends on the sequence, its variability, and the stringency of
thermal filtering. It cannot be determined from inspection of an example
gene sequence.
Conclusions
Optimal probe length and the optimal number of reference sequences used to
design a probe library are highly target specific for highly variable
sequencing targets. The optimum design cannot be determined simply by
inspection of input sequences or of alignments but only by detailed
analysis of each specific target. For highly variable sequences,
shorter probes can in some cases provide better information than longer
probes. Probe library design would benefit from a general-purpose tool
for analysing these issues. The formal terminology developed here and
the analysis approaches it is used to describe will contribute to the
development of such tools.
Other Sponsorship
Research Councils UK Basic Technology Programme
Type of Material
Journal Article
Publisher
PLOS
Journal
PLoS ONE
Volume
3
Issue
6
Start Page
e2500
Copyright (Published Version)
2008 Haslam et al
Subject – LCSH
Nucleotide sequence
Computational biology
Genomics--Methodology
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
Haslam_et_al._-_2008.pdf
Size
277.19 KB
Format
Adobe PDF
Checksum (MD5)
714e6432489e2e05889d704fe2bbf41e
Owning collection