File size:
3549 kb
Format:
application/pdf
Author:
Sandve, Geir Kjetil Ferkingstad (Norwegian University of Science and Technology, Department of Computer and Information Science)
Title:
Potentials and limitations of motif-based binding site prediction in DNA
Department:
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science
Publication type:
Doctoral thesis, comprehensive summary (Other scientific)
Language:
English
Publisher:
Fakultet for informasjonsteknologi, matematikk og elektroteknikk
Series:
Doktoravhandlinger ved NTNU, ISSN 1503-8181; 2008:239
Year of publ.:
2008
URI:
urn:nbn:no:ntnu:diva-2265
Permanent link:
http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-2265
ISBN:
978-82-471-1169-7
Abstract (English):

As the full genomic DNA sequence is now available for several organisms, a major next challenge is determining the function of DNA elements. This task is often referred to as functional genomics. An important part of functional genomics is gene regulation, and particularly the binding of specific proteins called Transcription Factors (TFs) to DNA. This TF binding regulates the production of mRNA, and thereby eventually proteins, from genes. As experimental determination of TF binding sites in DNA is a very laborious process, there is great interest in computational prediction methods.

The basic idea behind computational binding site prediction is to use motifs (sequence patterns) to capture the sequence similarity between separate binding sites for a given TF. Given a set of known binding site examples, this similarity can be exploited to predict additional binding sites for the same TF. Because motifs representing TF binding sites should occur more frequently than expected by chance alone in co-regulated DNA sequences, computational methods can even discover novel TF binding site motifs and associated binding sites using only un-annotated target DNA sequences as input.
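The abstract does not name the motif models it analyzes; one widely used representation is the position weight matrix (PWM). The sketch below is a minimal illustration, not code from the thesis: it builds a log-odds PWM from a few aligned example sites (hypothetical TGACTCA-like sequences), assuming a pseudocount of 1 and a uniform background, and then scans a sequence for windows scoring above an arbitrarily chosen threshold. The function names `build_pwm` and `scan` are hypothetical, and only the forward strand is scanned for brevity.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    if background is None:
        # Assumption: uniform nucleotide background.
        background = {b: 0.25 for b in ALPHABET}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        # Pseudocounts keep unseen bases from getting log(0) scores.
        counts = {b: pseudocount for b in ALPHABET}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b])
                    for b in ALPHABET})
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    known_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]  # hypothetical sites
    pwm = build_pwm(known_sites)
    hits = scan(pwm, "GGTGACTCAGGCCTTTT", threshold=5.0)
    print(hits)  # a single hit, starting at position 2
```

The threshold trades sensitivity against false positives; in practice it would be calibrated, for example against shuffled or background sequence, rather than fixed by hand.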

The focus of this thesis is on the computational prediction of TF binding sites, and specifically on understanding the current limitations of binding site prediction and its potential for improvement. Two of the papers in the thesis concern the assessment of computational predictions. The data sets used in a recent benchmark of prediction methods are analyzed in relation to three commonly used motif models, revealing some fundamental performance limitations that must be attributed either to the motif models or to the benchmark data sets themselves. The thesis also includes the first broad benchmark of methods that predict higher-order organization of TF binding sites. This benchmark showed some differences in prediction accuracy between methods and, more generally, that only a moderate level of prediction accuracy can be expected in the considered scenario.
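Benchmarks of this kind need a way to score predicted sites against annotated ones. One common choice is to compare them nucleotide by nucleotide and report sensitivity, positive predictive value (PPV) and a correlation coefficient. The sketch below illustrates that idea only; the function name `nucleotide_metrics` and the half-open `(start, end)` interval convention are assumptions, not the scoring code used in the thesis.

```python
import math


def nucleotide_metrics(length, true_sites, predicted_sites):
    """Nucleotide-level sensitivity, PPV and correlation coefficient.

    Sites are half-open (start, end) intervals on one sequence of the given length.
    """
    truth = [False] * length
    pred = [False] * length
    for start, end in true_sites:
        for i in range(start, end):
            truth[i] = True
    for start, end in predicted_sites:
        for i in range(start, end):
            pred[i] = True

    # Confusion counts over individual nucleotides.
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum(p and not t for t, p in zip(truth, pred))
    fn = sum(t and not p for t, p in zip(truth, pred))
    tn = length - tp - fp - fn

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "cc": cc}


if __name__ == "__main__":
    # Annotated site covers positions 5-9; prediction covers 7-11.
    print(nucleotide_metrics(20, [(5, 10)], [(7, 12)]))
```

Scoring per nucleotide rewards partial overlaps, which site-level scoring (hit or miss) would not.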

Two novel motif discovery methods are also presented in the thesis. Both methods address the problem of predicting higher-order organization of binding sites, given as input motifs that represent the binding of individual TFs. One method takes a Bayesian probabilistic approach to binding site modeling, while the other uses a discrete approach. Both methods use highly expressive models and show good quantitative performance compared with existing methods, and each introduces additional elements that may bring qualitative advantages. A third and final direction of research in this thesis concerns the broader process of motif discovery in DNA. Topics considered include how data is compiled before binding site prediction is performed, how prediction results can be interpreted in a multiple-testing scenario, and how prediction can be accelerated using parallel hardware.
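When many candidate sites or motifs are tested at once, uncorrected p-values overstate significance. The abstract does not say which correction the thesis adopts; as one standard option, the sketch below implements the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The function name `benjamini_hochberg` and the default `alpha=0.05` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    # The first two p-values fail their own rank thresholds, but the third passes,
    # so the step-up rule rejects all three.
    print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5]))
```

Controlling the false discovery rate rather than the family-wise error rate (as Bonferroni does) keeps more true sites when thousands of windows are scanned, at the cost of tolerating a known fraction of false positives.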

Public defence:
2008-09-12
Degree:
PhD in Information and Communications Technology
Available from:
2008-09-12
Created:
2008-09-12
FILE INFORMATION
Type:
fulltext