SLiMSearch: a webserver for finding novel occurrences of short linear motifs in proteins, incorporating sequence context
Norman E. Davey1, Niall J. Haslam2, Denis C. Shields2 and Richard J. Edwards3
1 Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany, 2 School of Medicine and Medical Sciences, UCD Complex and Adaptive Systems Laboratory & UCD Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin, Ireland, 3 School of Biological Sciences, University of Southampton, Southampton, UK
{davey}@embl.de, {niall.haslam, denis.shields}@ucd.ie, {r.edwards}@southampton.ac.uk
Abstract. Short, linear motifs (SLiMs) play a critical role in many biological processes. The SLiMSearch (Short, Linear Motif Search) webserver is a flexible tool that enables researchers to identify novel occurrences of pre-defined SLiMs in sets of proteins. Numerous masking options give the user great control over the contextual information to be included in the analyses, including evolutionary filtering and protein structural disorder. User-friendly output and visualizations of motif context allow the user to quickly gain insight into the validity of a putatively functional motif occurrence. Users can search motifs against the human proteome, or submit their own datasets of UniProt proteins, in which case motif support within the dataset is statistically assessed for over- and under-representation, accounting for evolutionary relationships between input proteins. SLiMSearch is freely available as open source Python modules and all webserver results are available for download. The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch.html.
Keywords: short linear motif, motif discovery, minimotif, ELM
Introduction
The purpose of the SLiMSearch (Short, Linear Motif Search) webserver is to allow researchers to identify novel occurrences of pre-defined Short Linear Motifs (SLiMs) in a set of sequences. SLiMs, also referred to as linear motifs or minimotifs, are functional microdomains that play a central role in many diverse biological pathways [1]. SLiM-mediated biological processes include post-translational modification (including cleavage), subcellular localization, and ligand binding [2]. SLiMs are typically fewer than ten amino acids long and have fewer than five defined positions, many of which are degenerate and tolerate some flexibility in the amino acid at that position. Their short length and degeneracy give them an evolutionary plasticity that is unavailable to domains, meaning that they often evolve convergently, adding new functionality to proteins [1]. SLiMs hold great promise as future therapeutic targets, which makes their discovery of great interest [3, 4].
Once a SLiM has been defined, finding matches in a given set of protein sequences is a fairly trivial task. Finding biological motifs is a standard pattern recognition task in bioinformatics. Several web-based methods to discover novel instances of known SLiMs are available, including ELM [2], MnM [5], ScanProsite [6] and QuasiMotiFinder [7], which generally utilize databases of known motif patterns to search query protein sequences supplied by the user. Whilst finding matches is trivial, however, interpreting their biological significance is far from easy. The small, degenerate nature of SLiMs makes stochastic occurrences of motifs common; distinguishing real occurrences from the background of random motif hits remains the greatest challenge in a priori motif discovery. One approach is simply to filter out motifs that are likely to occur numerous times by chance. ScanProsite [6], for example, has an option to "Exclude motifs with a high probability of occurrence", while QuasiMotiFinder [7] uses the background occurrence of motifs in PfamA families [8] to assess the significance of hits. These strategies work well for longer, family descriptor motifs (such as are found in the Prosite database [9] used by both ScanProsite and QuasiMotiFinder) but are not so useful for SLiMs because of their tendency to occur by chance.
Instead, additional contextual information such as sequence conservation [5, 7, 10, 11] and structural context [5, 12] can be used to assess the likelihood of true functional significance for putatively functional sites.
Most motif search tools rely on pre-existing motif libraries, such as ELM [2], MnM [5] or Prosite [9]. Those that permit users to define their own motifs, such as ScanProsite [6], generally lack the contextual information required to aid functional inference. Recent developments in de novo motif discovery have given rise to a number of tools that are capable of predicting entirely novel SLiMs from sets of protein sequences (e.g. PRATT [13], MEME [14], Dilimot [15], SLiMDisc [16] and SLiMFinder [17]). Although SLiMFinder [17] estimates the statistical significance of returned motif predictions, correcting for biases introduced by evolutionary relationships within the data, assessing the biological significance of predicted SLiMs remains challenging. One approach is to compare candidate SLiMs to existing motif libraries to identify similarities to previously known motifs [18]. When a genuinely novel motif is predicted, however, knowledge of existing motifs is of limited use. Instead, it is useful to be able to establish the background distribution of occurrences of the novel motif, utilizing contextual information to help screen out the inevitable spurious chance matches.
We recently made our powerful de novo SLiM discovery tool, SLiMFinder [17], available as a webserver [19]. To aid interpretation of SLiMFinder results, we have made a new tool available, SLiMSearch, which allows users to search protein datasets with user-defined motifs, including motif prediction output from SLiMFinder. SLiMSearch utilizes the same sequence context assessment as SLiMFinder, enabling results to be masked or ranked based on the important biological indicators of sequence conservation and structural disorder [11, 20]. SLiMSearch also features the same SLiMChance algorithm for assessing statistical over-representation of SLiM occurrences, correcting for biases introduced by evolutionary relationships within the data. SLiMSearch is open source and freely available for download. For ease of use, the main SLiMSearch features have been made available as a webserver, which enables the user to search proteins for occurrences of user-specified motifs. Motifs can be searched against small custom datasets of proteins from UniProt [21]. Alternatively, searches can be performed against the whole human proteome, or defined subsets of it.
Underlying methods, results formats and visualizations are fully compatible with our existing SLiM analysis webservers, SLiMDisc [22], CompariMotif [18] and SLiMFinder [19], providing a suite of integrated tools for analyzing these biologically important sequence features.
The SLiMSearch Algorithm
SLiMSearch performs its motif finding in three phases: (1) Input sequences are read and masked; (2) Motifs are searched against masked sequences using standard regular expression searches; (3) Motif statistics are calculated for identified motif occurrences. If desired, input sequences, input motifs and motif occurrences can be filtered based on attributes such as length, number of positions, motif conservation etc. SLiMs have a tendency to occur in disordered regions of proteins [23], and IUPred [20] protein disorder predictions can be used for input masking or for ranking/filtering results, as described further below. Conservation scoring uses the Relative Local Conservation (RLC) score introduced by Davey et al. [11] as implemented in SLiMFinder [19]. Conservation scoring can use pre-generated alignments or construct alignments of predicted orthologues using GOPHER [22], which estimates evolutionary relationships using BLAST [24] to identify the closest-related orthologue in each species in the chosen search database. Each putative orthologue retained is (a) more closely related to the query than any other protein from the same species; (b) related to the query through a predicted speciation event, not a duplication event.
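The first two phases described above can be illustrated with a minimal Python sketch. It is not the SLiMSearch source code: the function names, the example sequence, the per-residue disorder scores and the 0.2 cutoff are all assumptions chosen for illustration. Residues scored below the cutoff are replaced with "X", and the motif is then matched as a regular expression, using a lookahead so that overlapping occurrences are reported.

```python
import re


def mask_sequence(sequence, disorder_scores, cutoff=0.2):
    """Replace residues predicted to be ordered with 'X'.

    disorder_scores is a hypothetical list of per-residue scores
    (for example from a disorder predictor); the cutoff is an assumption.
    """
    return "".join(
        aa if score >= cutoff else "X"
        for aa, score in zip(sequence, disorder_scores)
    )


def find_occurrences(motif, masked_sequence):
    """Return (1-based start, matched text) for each motif match.

    The lookahead wrapper lets overlapping matches be reported.
    Note that a wildcard '.' can still match a masked 'X' in this sketch.
    """
    pattern = re.compile("(?=(" + motif + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(masked_sequence)]


if __name__ == "__main__":
    seq = "MKRPSLPAAPTLPEE"              # made-up example sequence
    scores = [0.1] * 8 + [0.6] * 7       # made-up disorder scores
    masked = mask_sequence(seq, scores)
    print(masked)
    print(find_occurrences("P.LP", seq))
    print(find_occurrences("P.LP", masked))
```

In this toy case the unmasked sequence contains two matches to P.LP, and masking the ordered N-terminal region leaves only the match in the disordered region.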
SLiMChance Calculations of Significance
SLiMSearch utilizes a variation of the SLiMChance algorithm from SLiMFinder ADDIN EN.CITE Edwards2007209[17]20920917Edwards, R. J.Davey, N. E.Shields, D. C.University College Dublin Complex and Adaptive Systems Laboratory, University College Dublin, Dublin, Ireland. r.edwards@soton.ac.ukSLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteinsPLoS ONEPLoS ONEe9672102007/10/04AlgorithmsAmino Acid MotifsAmino Acids/*chemistryComputational Biology/*methodsDatabases, ProteinDimerizationEvolution, MolecularHumansModels, StatisticalPattern Recognition, AutomatedProbabilityProgramming LanguagesProteins/*chemistrySequence AlignmentSoftware20071932-6203 (Electronic)17912346http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17912346198913510.1371/journal.pone.0000967eng[17], which is based on the binomial statistics introduced by ASSET ADDIN EN.CITE Neuwald199473[25]737317Neuwald, A. F.Green, P.National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894.Detecting patterns in protein sequencesJ Mol BiolJ Mol Biol698-7122395Acetyltransferases/chemistryAlgorithms*Amino Acid Sequence*Conserved SequenceMolecular Sequence DataPattern Recognition, AutomatedSequence AlignmentStatistics as Topic1994Jun 248014990http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=8014990 [25] and calculates the a priori probability of observing each motif in each sequence using the (masked) amino acid frequencies of input sequences. Observed support is then compared to expectation at two levels: (1) the total number of occurrences in all sequences; (2) the number of individual sequences returning the motif. This enables different questions to be asked of different data types. 
SLiMChance has an important extension over the statistics used by ASSET, and homologous proteins are optionally weighted (as in SLiMDisc ADDIN EN.CITE Davey200623[16]232317Davey, N. E.Shields, D. C.Edwards, R. J.Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin 4, Ireland.SLiMDisc: short, linear motif discovery, correcting for common evolutionary descentNucleic Acids ResNucleic Acids Res3546-543412*Amino Acid MotifsEvolution, MolecularHumansProteins/chemistry/geneticsSequence Analysis, Protein/*methods*Software200616855291http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16855291 [16] and SLiMFinder ADDIN EN.CITE Edwards2007209[17]20920917Edwards, R. J.Davey, N. E.Shields, D. C.University College Dublin Complex and Adaptive Systems Laboratory, University College Dublin, Dublin, Ireland. r.edwards@soton.ac.ukSLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteinsPLoS ONEPLoS ONEe9672102007/10/04AlgorithmsAmino Acid MotifsAmino Acids/*chemistryComputational Biology/*methodsDatabases, ProteinDimerizationEvolution, MolecularHumansModels, StatisticalPattern Recognition, AutomatedProbabilityProgramming LanguagesProteins/*chemistrySequence AlignmentSoftware20071932-6203 (Electronic)17912346http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17912346198913510.1371/journal.pone.0000967eng[17]) to account for the dependencies introduced into the probabilistic framework by homologous proteins; in this case, SLiMSearch will also assess these weighted support values. Whereas SLiMFinder is explicitly using over-representation to identify, it is also of potential interest to see if a given motif has been avoided in a given dataset and is under-represented versus random expectation. 
The SLiMSearch implementation of SLiMChance therefore features an additional extension where the cumulative binomial probability is used to estimate the probability of seeing by chance the observed support or less in addition to the observed support or more.
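The two cumulative binomial tails described above can be illustrated with a minimal Python sketch. It is not the SLiMChance implementation: it assumes, for simplicity, that each of n sequences has the same a priori probability p of containing the motif, whereas SLiMChance derives a probability per sequence. The function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def over_representation_p(observed, n, p):
    """P(X >= observed): chance of seeing the observed support or more."""
    return sum(binomial_pmf(k, n, p) for k in range(observed, n + 1))


def under_representation_p(observed, n, p):
    """P(X <= observed): chance of seeing the observed support or less."""
    return sum(binomial_pmf(k, n, p) for k in range(0, observed + 1))


if __name__ == "__main__":
    # Illustrative numbers only (assumed): 10 sequences, p = 0.1, motif seen in 3.
    print(over_representation_p(3, 10, 0.1))
    print(under_representation_p(3, 10, 0.1))
```

Both tails include the observed value itself, so the two probabilities sum to more than one; a small value in either tail flags the corresponding direction of departure from expectation.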
The SLiMSearch Webserver
The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch.html. The purpose of the webserver is to allow researchers to identify novel occurrences of pre-defined Short Linear Motifs (SLiMs) in a set of sequences. Currently, searches are restricted to the human proteome or specific subsets thereof but future releases will expand these capabilities. (To search different datasets, users should use the downloadable version of SLiMSearch.) Sequences are first masked according to user specifications before motif occurrences are identified using standard regular expression searches. The SLiMChance algorithm then estimates statistical significance of over- or under-representation of each motif searched. In addition to summary results for each motif, interactive output permits easy exploration and visualization of individual motif occurrences. The context of each SLiM occurrence is then calculated in terms of protein disorder and evolutionary conservation to help the user gain insight into the validity of a putatively functional motif occurrence. The webserver is powered by the same code as the standalone version of SLiMSearch, which can be downloaded from the server. The main features of the webserver are described in more detail in the following sections.
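The mask-then-search workflow described above can be sketched in a few lines of Python. This is an illustration only, not the SLiMSearch code: the function names are hypothetical, the sequence is a made-up toy example, and replacing masked residues with "X" so that defined positions cannot match there is an assumption of this sketch.

```python
import re


def mask_regions(sequence, regions):
    """Replace residues in 1-based, inclusive (start, end) regions with 'X'."""
    residues = list(sequence)
    for start, end in regions:
        for i in range(start - 1, end):
            residues[i] = "X"
    return "".join(residues)


def find_occurrences(motif, sequence):
    """Return (1-based position, matched text) for every match.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + motif + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    toy = "MAAYPWMKKFDWMAA"  # made-up sequence, not a real protein
    print(find_occurrences("[FY][DEP]WM", toy))
    print(find_occurrences("[FY][DEP]WM", mask_regions(toy, [(10, 13)])))
```

In the toy sequence the unmasked search reports two occurrences, while masking residues 10 to 13 removes the second one.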
Input
As input, SLiMSearch needs a set of protein sequences and a set of motif definitions, which are selected by the user in turn (Fig. 1). Whereas the standalone SLiMSearch program allows searching of any protein sequences, the webserver restricts the user to using UniProt sequences [21]. This is because the server relies on pre-computed alignments to keep run times down. Using UniProt downloads also allows all the masking options to be utilized. The user is presented with a choice of two main input types (Fig. 1): (1) a chosen set of up to 100 UniProt entries can be downloaded for analysis; (2) the user can select from a series of predefined protein datasets. Currently, the human proteome from SwissProt [21] is available, along with three subsets defined by their subcellular localization annotation: cytoplasmic proteins, nuclear proteins and transmembrane proteins. Future server releases will expand this to other species. When searching these large proteome datasets, the evolutionary filtering [17] is switched off.
Once a dataset has been selected, the user must input a set of motifs to search (Fig. 1). The SLiMSearch server takes a list of motifs, typed or pasted directly into the text box. Motifs themselves are constructed from a number of regular expression elements, which are mostly standard but with a couple of additional elements to represent 3of5 motifs [26] (Table 1). SLiMSearch accepts the same input formats as CompariMotif [18], including a plain list of regular expressions; output from SLiMDisc [22] or SLiMFinder [19] can also be used. Because the focus of SLiMSearch is short linear motifs, the maximum number of consecutive wildcards allowed by the server is nine. Motifs must have at least two defined (i.e. non-wildcard) positions.
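The two motif restrictions stated above (no more than nine consecutive wildcards, at least two defined positions) could be checked along the following lines. This is a hedged sketch, not the server's validation code: the function name and tokenizer are hypothetical, and only fixed residues, ambiguity classes, wildcards and {n} or {m,n} repeats are handled (3of5 elements and (R|S) alternatives are not).

```python
import re

# One element (ambiguity class, fixed residue or wildcard) with an optional repeat.
TOKEN = re.compile(r"(\[\^?[A-Z]+\]|[A-Z.])(?:\{(\d+)(?:,(\d+))?\})?")


def check_motif(motif, max_wildcards=9, min_defined=2):
    """Return True if the motif respects both limits (assumed interpretation)."""
    core = motif.strip("^$")  # sequence-terminus markers are not positions
    defined = 0
    run = longest = 0
    pos = 0
    while pos < len(core):
        match = TOKEN.match(core, pos)
        if match is None:
            raise ValueError("Unsupported element at position %d" % pos)
        element, low, high = match.groups()
        if element in (".", "X"):
            # Count the longest possible stretch of consecutive wildcards.
            run += int(high or low or 1)
            longest = max(longest, run)
        else:
            # Count the minimum number of defined positions.
            defined += int(low or 1)
            run = 0
        pos = match.end()
    return defined >= min_defined and longest <= max_wildcards


if __name__ == "__main__":
    for motif in ("[FY][DEP]WM", "P.{0,9}P", "P.{10}P", "P..."):
        print(motif, check_motif(motif))
```

Counting the upper bound of a variable-length wildcard stretch is a conservative choice made for this sketch; the server may apply the limit differently.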
Masking Options
The standalone SLiMSearch program features all the input masking options of SLiMFinder [17]. For simplicity, these have been pared down for the webserver to three sets of masking options (Fig. 1): (1) restricting searches to cytoplasmic tails and loops of transmembrane proteins, (2) masking out structurally ordered regions (as predicted by IUPred [20] with a conservative threshold of 0.2) and/or relatively under-conserved residues [11], (3) masking out domains, transmembrane and/or extracellular regions as annotated by UniProt [21]. Any combination of these options is permitted; users could, for example, restrict searches to cytoplasmic tails and loops of transmembrane proteins and mask out regions of predicted order, under-conserved residues and regions annotated as domains in UniProt.
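As an illustration of the disorder masking option, the sketch below masks residues whose per-residue disorder score falls below the 0.2 threshold. It assumes the scores have already been computed (for example by IUPred) and are supplied as a list; the function name, the use of "X" as the masking character and the treatment of scores exactly equal to the threshold are assumptions of this sketch rather than documented SLiMSearch behavior.

```python
def mask_ordered(sequence, disorder_scores, threshold=0.2):
    """Mask residues predicted to be ordered (score below the threshold)."""
    if len(sequence) != len(disorder_scores):
        raise ValueError("One disorder score is required per residue")
    return "".join(
        residue if score >= threshold else "X"
        for residue, score in zip(sequence, disorder_scores)
    )


if __name__ == "__main__":
    # Made-up scores for a made-up four-residue peptide.
    print(mask_ordered("ACDE", [0.1, 0.2, 0.5, 0.05]))
```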
Submitting Jobs
Once options have been chosen, clicking Submit will enter the job in the run queue. Run times will vary according to input data size and complexity, masking options and the current load of the server; the server has a maximum run time of 4 hours, after which jobs will be terminated. (For larger searches, users are encouraged to download and install a local version of SLiMSearch.) Each job is allocated a unique, randomly determined identifier. Users can either wait for their jobs to run, or bookmark the page and return to it later. Previously run job IDs can also be entered into a box on the SLiMSearch homepage to retrieve the run status and/or results.
Output
Once a job has run, the SLiMSearch results pages will open (Fig. 2). The main results page consists of a table of motif occurrences for each motif along with statistics for each occurrence including conservation (RLC) and disorder (IUPred). All fields can be sorted by clicking column headings and direct links to UniProt entries for each sequence are provided. The second primary results page consists of a summary table, which provides summary statistics for each motif. These include numbers of occurrences and SLiMChance assessments of over- or under-representation versus random expectation. Explanations of each field can be found in the SLiMSearch manual, which is available from the website. All the raw results files can also be downloaded for further analysis. When a user-defined dataset has been searched, these raw data files include the UniProt download.
Individual motif occurrences can also be visualized for contextual information (Fig. 3). The multiple sequence alignment used for evolutionary conservation calculations is shown, with the relative conservation and IUPred disorder scores plotted below. Regions predicted to be ordered (below the disorder threshold of 0.2) are shaded, indicating areas that were (or would be) masked with disorder masking. In addition to these data, annotation from key SLiM and protein databases is added. Annotated and unannotated regular expression matches to SLiMs from the Eukaryotic Linear Motif (ELM) database [2] are displayed above the alignment; sequence features from UniProt [21], including annotated domains and known mutations, are displayed between the alignment and RLC/Disorder plots. Users can hover the mouse over these features for additional information.
Getting Help
The SLiMSearch webserver is supported by an extensive help section, including a quickstart guide and walkthrough with screenshots. Example input files are provided. Fully interactive example output (corresponding to running the example input with default parameters) is clearly linked from the help pages. (See example analysis.) Additional details of the algorithms and options can be found in the SLiMSearch manual, which is also clearly linked from the help pages.
Server limits
The server is currently limited to jobs with a run time of less than 4 hours. Motifs must have at least two non-wildcard positions defined, and individual motif occurrence data is restricted to motifs with no more than 2000 occurrences in the search dataset. Custom UniProt datasets can have no more than 100 proteins. For larger analyses, users must install a local copy of the SLiMSearch software.
Example Analysis: HOX Ligand Motif
Homeobox (HOX) genes encode a family of transcription factors controlling organization of segmental identity during embryo development [27] and are recognized by a 60-residue DNA-binding domain known as a Homeodomain [28]. HOX proteins recruit another Homeobox-containing transcription factor, PBX, via a conserved [FY][DEP]WM motif (LIG_HOMEOBOX [2]), which binds a hydrophobic pocket created upon association of PBX with DNA [29]. Alone, the Homeodomain binds the short DNA sequence TNAT with weak specificity and affinity; however, following the formation of a heterodimer complex with TGAT-binding Pbx, bipartite recognition increases specificity and allows HOX to specifically target developmental genes for expression.
A survey of the human proteome for [FY][DEP]WM Pbx binding motifs was completed to illustrate the effect of masking globular regions and under-conserved residues on the ability of a motif discovery tool to return functional motifs. Without any masking, SLiMSearch returned 53 motifs in 53 proteins, including the 16 annotated functional instances from the ELM database [2] (Supplementary Table 1). Of the 53 human occurrences, however, 30 were no longer returned following masking (IUPred masking cut-off 0.2, relative conservation filtering, domain masking and removal of extracellular and transmembrane regions). Of these 30, only 3 were known to be functional. The 23 remaining instances are all members of the Homeobox family; 13 of these contain a known annotated Pbx binding motif; given the homology of the remaining non-ELM containing proteins to the proteins containing functional motifs, it is likely that all 23 instances are functional. The HXA5 occurrence, for example, shows a clear conservation signal characteristic of a functional motif despite not being annotated in ELM (Fig. 3).
Future Work
In addition to evolutionary conservation and structural disorder, successful identification of novel functional motifs in proteins can benefit from keyword or GO term enrichment [30]. We are currently working on the incorporation of GO term enrichment into SLiMSearch analyses for future releases of the webserver. The current server is also limited to the human proteome only. In future we will expand this to include other organisms. Initially, these will be taken from the Ensembl database of eukaryotic genomes [31] and then expanded to other taxonomic groups [32]. We welcome suggestions from users, however, and will work with specific interest groups to add proteomes from appropriate species to the webserver where possible.
Conclusion
Discovering and annotating novel occurrences of Short Linear Motifs is an important ongoing task in biology, which often involves motif searches combined with additional evolutionary analyses (e.g. [30, 33]). The SLiMSearch webserver provides the biological community with an important advance in this arena, allowing evolutionary and structural context to be automatically incorporated into motif searches and visualized in user-friendly output. The flexibility of input, allowing known or novel motifs and user-defined protein datasets, combined with the statistical framework of SLiMChance for assessing motif abundance, makes SLiMSearch a powerful tool that should ease future discoveries of functional SLiM occurrences. In addition to the webserver implementation, SLiMSearch is available as standalone open source Python code under a GNU license, making it accessible to experimental biologists and bioinformatics specialists alike.
The SLiMSearch server is available at: http://bioware.ucd.ie/slimsearch.html.
Acknowledgments. This work was supported by Science Foundation Ireland, the University of Southampton and a European Molecular Biology Laboratory (EMBL) Interdisciplinary Postdoc (EIPOD) fellowship (to N.E.D.).
References
1. Diella, F., Haslam, N., Chica, C., Budd, A., Michael, S., Brown, N.P., Trave, G., Gibson, T.J.: Understanding eukaryotic linear motifs and their role in cell signaling and regulation. Front Biosci 13, 6580-6603 (2008)
2. Gould, C.M., Diella, F., Via, A., Puntervoll, P., Gemund, C., Chabanis-Davidson, S., Michael, S., Sayadi, A., Bryne, J.C., Chica, C., Seiler, M., Davey, N.E., Haslam, N., Weatheritt, R.J., Budd, A., Hughes, T., Pas, J., Rychlewski, L., Trave, G., Aasland, R., Helmer-Citterich, M., Linding, R., Gibson, T.J.: ELM: the status of the 2010 eukaryotic linear motif resource. Nucleic Acids Res 38, D167-180 (2010)
3. Kadaveru, K., Vyas, J., Schiller, M.R.: Viral infection and human disease--insights from minimotifs. Front Biosci 13, 6455-6471 (2008)
4. Neduva, V., Russell, R.B.: Peptides mediating interaction networks: new leads at last. Curr Opin Biotechnol 17, 465-471 (2006)
5. Rajasekaran, S., Balla, S., Gradie, P., Gryk, M.R., Kadaveru, K., Kundeti, V., Maciejewski, M.W., Mi, T., Rubino, N., Vyas, J., Schiller, M.R.: Minimotif miner 2nd release: a database and web system for motif search. Nucleic Acids Res 37, D185-190 (2009)
6. de Castro, E., Sigrist, C.J., Gattiker, A., Bulliard, V., Langendijk-Genevaux, P.S., Gasteiger, E., Bairoch, A., Hulo, N.: ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res 34, W362-365 (2006)
7. Gutman, R., Berezin, C., Wollman, R., Rosenberg, Y., Ben-Tal, N.: QuasiMotiFinder: protein annotation by searching for evolutionarily conserved motif-like patterns. Nucleic Acids Res 33, W255-261 (2005)
8. Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C., Eddy, S.R.: The Pfam protein families database. Nucleic Acids Res 32, D138-141 (2004)
9. Sigrist, C.J., Cerutti, L., de Castro, E., Langendijk-Genevaux, P.S., Bulliard, V., Bairoch, A., Hulo, N.: PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res 38, D161-166 (2010)
10. Chica, C., Labarga, A., Gould, C.M., Lopez, R., Gibson, T.J.: A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences. BMC Bioinformatics 9, 229 (2008)
11. Davey, N.E., Shields, D.C., Edwards, R.J.: Masking residues using context-specific evolutionary conservation significantly improves short linear motif discovery. Bioinformatics 25, 443-450 (2009)
12. Via, A., Gould, C.M., Gemund, C., Gibson, T.J., Helmer-Citterich, M.: A structure filter for the Eukaryotic Linear Motif Resource. BMC Bioinformatics 10, 351 (2009)
13. Jonassen, I., Collins, J.F., Higgins, D.G.: Finding flexible patterns in unaligned protein sequences. Protein Sci 4, 1587-1595 (1995)
14. Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., Noble, W.S.: MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37, W202-208 (2009)
15. Neduva, V., Russell, R.B.: DILIMOT: discovery of linear motifs in proteins. Nucleic Acids Res 34, W350-355 (2006)
16. Davey, N.E., Shields, D.C., Edwards, R.J.: SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent. Nucleic Acids Res 34, 3546-3554 (2006)
17. Edwards, R.J., Davey, N.E., Shields, D.C.: SLiMFinder: A Probabilistic Method for Identifying Over-Represented, Convergently Evolved, Short Linear Motifs in Proteins. PLoS ONE 2, e967 (2007)
18. Edwards, R.J., Davey, N.E., Shields, D.C.: CompariMotif: quick and easy comparisons of sequence motifs. Bioinformatics 24, 1307-1309 (2008)
19. Davey, N.E., Haslam, N.J., Shields, D.C., Edwards, R.J.: SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs. Nucleic Acids Res (2010)
20. Dosztanyi, Z., Csizmok, V., Tompa, P., Simon, I.: IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21, 3433-3434 (2005)
21. Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N., Yeh, L.S.: The Universal Protein Resource (UniProt). Nucleic Acids Res 33, D154-159 (2005)
22. Davey, N.E., Edwards, R.J., Shields, D.C.: The SLiMDisc server: short, linear motif discovery in proteins. Nucleic Acids Res 35, W455-459 (2007)
23. Russell, R.B., Gibson, T.J.: A careful disorderliness in the proteome: sites for interaction and targets for future therapies. FEBS Lett 582, 1271-1275 (2008)
24. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402 (1997)
25. Neuwald, A.F., Green, P.: Detecting patterns in protein sequences. J Mol Biol 239, 698-712 (1994)
26. Seiler, M., Mehrle, A., Poustka, A., Wiemann, S.: The 3of5 web application for complex and comprehensive pattern matching in protein sequences. BMC Bioinformatics 7, 144 (2006)
27. Wellik, D.M.: Hox genes and vertebrate axial pattern. Curr Top Dev Biol 88, 257-278 (2009)
28. Gehring, W.J., Affolter, M., Burglin, T.: Homeodomain proteins. Annu Rev Biochem 63, 487-526 (1994)
29. Sprules, T., Green, N., Featherstone, M., Gehring, K.: Lock and key binding of the HOX YPWM peptide to the PBX homeodomain. J Biol Chem 278, 1053-1058 (2003)
30. Michael, S., Trave, G., Ramu, C., Chica, C., Gibson, T.J.: Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation. Bioinformatics 24, 453-457 (2008)
31. Hubbard, T.J., Aken, B.L., Ayling, S., Ballester, B., Beal, K., Bragin, E., Brent, S., Chen, Y., Clapham, P., Clarke, L., Coates, G., Fairley, S., Fitzgerald, S., Fernandez-Banet, J., Gordon, L., Graf, S., Haider, S., Hammond, M., Holland, R., Howe, K., Jenkinson, A., Johnson, N., Kahari, A., Keefe, D., Keenan, S., Kinsella, R., Kokocinski, F., Kulesha, E., Lawson, D., Longden, I., Megy, K., Meidl, P., Overduin, B., Parker, A., Pritchard, B., Rios, D., Schuster, M., Slater, G., Smedley, D., Spooner, W., Spudich, G., Trevanion, S., Vilella, A., Vogel, J., White, S., Wilder, S., Zadissa, A., Birney, E., Cunningham, F., Curwen, V., Durbin, R., Fernandez-Suarez, X.M., Herrero, J., Kasprzyk, A., Proctor, G., Smith, J., Searle, S., Flicek, P.: Ensembl 2009. Nucleic Acids Res 37, D690-697 (2009)
32. Kersey, P.J., Lawson, D., Birney, E., Derwent, P.S., Haimel, M., Herrero, J., Keenan, S., Kerhornou, A., Koscielny, G., Kahari, A., Kinsella, R.J., Kulesha, E., Maheswari, U., Megy, K., Nuhn, M., Proctor, G., Staines, D., Valentin, F., Vilella, A.J., Yates, A.: Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res 38, D563-569 (2010)
33. Delpire, E., Gagnon, K.B.: Genome-wide analysis of SPAK/OSR1 binding motifs. Physiol Genomics 28, 223-231 (2007)
Figures
Fig. 1. SLiMSearch input options pages. Users must first either select a predefined human protein dataset, or enter a list of up to 100 UniProt IDs for a custom dataset. Clicking submit will then progress to Step 2, in which users enter a list of motifs for searching and set any masking options.
Fig. 2. SLiMSearch results pages. The main results page consists of a table of motif occurrences for each motif (top panel) along with statistics for each occurrence including conservation (RLC) and disorder (IUPred). All fields can be sorted by clicking column headings. Clicking sequence names will open the corresponding UniProt entry, while clicking View generates a visual representation of the motif. Clicking on different motifs in the smaller table on the left switches the motif being viewed. A summary table can also be viewed (bottom panel), which provides summary statistics for each motif. These statistics include SLiMChance assessments of over- or under-representation versus random expectation. Explanations of each field can be found in the SLiMSearch manual, which is available from the website. All the raw results files can also be accessed via the Raw Data link.
Fig. 3. Visualization of LIG_HOMEOBOX in HXA5. The view contains a multiple alignment of the orthologs of HXA5, drawn using the Clustal coloring scheme, surrounded by relevant annotation. The bottom section contains a graph of relative conservation (in red) and IUPred disorder (in blue), with regions below the disorder threshold of 0.2 shaded (in brown). Above this section UniProt features are plotted; for example, in the case of HXA5 the rightmost region contains a DNA-binding Homeodomain. Above the alignment, the motif row specifies regions containing a known functional motif (in white) and the RE row specifies regions matching the regular expression of a known motif (in green).
Table 1. Regular expression elements recognized by SLiMSearch.
Element	Description
A	Single fixed amino acid.
[AB]	Ambiguity, A or B. Any number of options may be given, e.g. [ABC] = A or B or C.
[element missing in source]	At least m of a stretch of n residues must match R, where R is one of the above regular expression elements (single or ambiguity).
[element missing in source]	Exactly m of a stretch of n residues must match R and the rest must match B, where R and B are each one of the above regular expression elements (single or ambiguity). E.g. [example missing in source] will match [DE]F or F[DE].
[^A]	Not A.
X or .	Wildcard positions (any amino acid).
.{m,n}	At least m and up to n wildcards.
R{n}	n repetitions of R, where R is any of the above regular expression elements.
^	Beginning of sequence.
$	End of sequence.
(R|S)	Match R or S, which are both themselves recognizable regular expressions. Motifs in this format are not currently supported by the SLiMChance statistics, so any such motif will first be split into variants, e.g. (R|S)PP would be split into RPP and SPP, and each variant searched separately.
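The variant splitting described for (R|S) motifs can be illustrated with a minimal Python sketch. This is not the SLiMSearch implementation: the function names (split_variants, to_python_regex, find_occurrences) are hypothetical, Python's standard re module stands in for the server's own matching code, and only the elements that map directly onto Python syntax are handled (fixed positions, ambiguity, [^A], X or . wildcards, .{m,n}, R{n}, ^, $ and non-nested (R|S) groups). The "m of n" stretch elements are not covered.

```python
import itertools
import re


def split_variants(motif):
    """Expand every non-nested (A|B|...) group into separate motif variants."""
    # re.split with one capture group alternates literal text (even indices)
    # with the contents of each parenthesised group (odd indices).
    parts = re.split(r"\(([^()]*)\)", motif)
    choices = [p.split("|") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def to_python_regex(variant):
    """Translate the X wildcard to '.', assuming X is only used as a wildcard."""
    return variant.replace("X", ".")


def find_occurrences(motif, sequence):
    """Return (variant, 1-based start, matched text) for every occurrence."""
    hits = []
    for variant in split_variants(motif):
        # A lookahead wrapper lets overlapping occurrences be reported too.
        pattern = re.compile("(?=(" + to_python_regex(variant) + "))")
        for match in pattern.finditer(sequence):
            hits.append((variant, match.start() + 1, match.group(1)))
    return hits


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real protein.
    for hit in find_occurrences("(R|S)PP", "MARPPSPPK"):
        print(hit)
```

Searching each variant separately mirrors the behaviour described in the table, where (R|S)PP is searched as RPP and SPP; a real analysis would additionally apply masking and the SLiMChance statistics, which this sketch does not attempt.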
Supplementary Table 1. Occurrences of the LIG_HOMEOBOX ELM in the human proteome, identified by SLiMSearch without masking. Occurrences are ordered according to whether they were masked and by conservation scores.
Sequence ID	Description	Pos	Mask	ELM	RLC
HXA5_HUMAN	Homeobox protein Hox-A5	177	-	-	1.56
HXC5_HUMAN	Homeobox protein Hox-C5	141	-	-	1.496
HXB2_HUMAN	Homeobox protein Hox-B2	95	-	Y	1.494
HXC6_HUMAN	Homeobox protein Hox-C6	123	-	-	1.49
HXB5_HUMAN	Homeobox protein Hox-B5	177	-	Y	1.458
HXB3_HUMAN	Homeobox protein Hox-B3	130	-	Y	1.396
HXA6_HUMAN	Homeobox protein Hox-A6	137	-	-	1.351
HXA1_HUMAN	Homeobox protein Hox-A1	205	-	Y	1.155
HXC4_HUMAN	Homeobox protein Hox-C4	136	-	-	1.152
HXD3_HUMAN	Homeobox protein Hox-D3	161	-	-	1.096
HXD4_HUMAN	Homeobox protein Hox-D4	134	-	Y	1.079
HXA2_HUMAN	Homeobox protein Hox-A2	95	-	-	1.078
HXB6_HUMAN	Homeobox protein Hox-B6	128	-	Y	1.019
PDX1_HUMAN	Pancreas/duodenum homeobox protein 1	119	-	Y	0.936
HXB7_HUMAN	Homeobox protein Hox-B7	127	-	Y	0.894
HXB1_HUMAN	Homeobox protein Hox-B1	180	-	Y	0.824
HXB4_HUMAN	Homeobox protein Hox-B4	142	-	Y	0.807
HXD1_HUMAN	Homeobox protein Hox-D1	205	-	-	0.747
TLX1_HUMAN	T-cell leukemia homeobox protein 1	174	-	Y	0.711
HXA7_HUMAN	Homeobox protein Hox-A7	120	-	-	0.668
CDX1_HUMAN	Homeobox protein CDX-1	133	-	Y	0.598
HXB8_HUMAN	Homeobox protein Hox-B8	135	-	Y	0.562
HXA3_HUMAN	Homeobox protein Hox-A3	156	-	-	0.537
HXA4_HUMAN	Homeobox protein Hox-A4	195	Y	-	1.145
HXD8_HUMAN	Homeobox protein Hox-D8	186	Y	-	0.978
MTMRD_HUMAN	Myotubularin-related protein 13	1594	Y	-	0.975
CBPO_HUMAN	Carboxypeptidase O	57	Y	-	0.902
HXC8_HUMAN	Homeobox protein Hox-C8	139	Y	Y	0.883
GRP3_HUMAN	Ras guanyl-releasing protein 3	126	Y	-	0.864
EXC6B_HUMAN	Exocyst complex component 6B	627	Y	-	0.816
SAST_HUMAN	S-acyl fatty acid synthase thioesterase, medium chain	32	Y	-	0.652
UBXN8_HUMAN	UBX domain-containing protein 8	218	Y	-	0.539
MLTK_HUMAN	Mitogen-activated protein kinase kinase kinase MLT	170	Y	-	0.469
PKDRE_HUMAN	PKD and REJ homolog	505	Y	-	0.389
ZSCA1_HUMAN	Zinc finger and SCAN domain-containing protein 1	389	Y	-	0.289
FOXO3_HUMAN	Forkhead box protein O3	184	Y	-	0.228
RBP17_HUMAN	Ran-binding protein 17	741	Y	-	0.203
XPP3_HUMAN	Probable Xaa-Pro aminopeptidase 3	204	Y	-	0.197
FOXO4_HUMAN	Forkhead box protein O4	128	Y	-	0.191
UBP24_HUMAN	Ubiquitin carboxyl-terminal hydrolase 24	1906	Y	-	0.183
FOXO6_HUMAN	Forkhead box protein O6	115	Y	-	0.182
D42E2_HUMAN	Putative short chain dehydrogenase/reductase family 42E member 2	286	Y	-	0
CNOT4_HUMAN	CCR4-NOT transcription complex subunit 4	498	Y	-	0
CLIP3_HUMAN	CAP-Gly domain-containing linker protein 3	537	Y	-	0
FOXO1_HUMAN	Forkhead box protein O1	187	Y	-	-0.027
HHAT_HUMAN	Protein-cysteine N-palmitoyltransferase HHAT	201	Y	-	-0.141
FHAD1_HUMAN	Forkhead-associated domain-containing protein 1	103	Y	-	-0.146
TLX2_HUMAN	T-cell leukemia homeobox protein 2	118	Y	Y	-0.505
SRBP2_HUMAN	Sterol regulatory element-binding protein 2	532	Y	-	-0.644
SFXN1_HUMAN	Sideroflexin-1	260	Y	-	-0.67
TLX3_HUMAN	T-cell leukemia homeobox protein 3	125	Y	Y	-0.764
FA13A_HUMAN	Protein FAM13A	179	Y	-	-0.927
GDE5_HUMAN	Putative glycerophosphodiester phosphodiesterase 5	612	Y	-	-0.932