Waiting for Regulatory Sequences to Appear

Rick Durrett and Deena Schmidt

Abstract. One possible explanation for the substantial organismal differences between humans and chimpanzees is that gene regulation has changed. Given what is known about transcription factor binding sites, this motivates the following probability question: in a 1000-nucleotide region of our genome, how long does it take for a specified six- to nine-letter word to appear in that region in some individual? Stone and Wray (2001) computed 5950 years as the answer for six-letter words. Here we show that for words of length 6 the average waiting time is 100,000 years, while for words of length 8 the waiting time is roughly (5/16)exponential(375,000) + (11/16)exponential(650,000,000), where the numbers in parentheses give the means. In biological reality, the match to the target word does not have to be perfect for binding to occur. If we model this by saying that one mismatch is good enough, then almost all of the mass in the probability distribution shifts to the smaller mean.
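
To give a feel for the length-8 result, the following minimal sketch reads the expression in the abstract as a two-component mixture of exponential distributions and computes its overall mean and tail probability. The mixture reading, the assumption that the means are in years, and the function names are illustrative choices made here and are not taken from the paper.

```python
import math

# Weights and component means copied from the abstract.
# Assumption: the means are in years, matching the length-6 figure.
WEIGHTS = (5 / 16, 11 / 16)
MEANS = (375_000.0, 650_000_000.0)


def mixture_mean(weights=WEIGHTS, means=MEANS):
    """Mean of a mixture of exponentials: the weighted sum of component means."""
    return sum(w * m for w, m in zip(weights, means))


def mixture_survival(t, weights=WEIGHTS, means=MEANS):
    """P(T > t) for the mixture: weighted sum of exp(-t / mean) terms."""
    return sum(w * math.exp(-t / m) for w, m in zip(weights, means))


if __name__ == "__main__":
    print(f"overall mean waiting time: {mixture_mean():,.1f}")
    for t in (1e5, 1e6, 1e7, 1e8):
        print(f"P(T > {t:,.0f}) = {mixture_survival(t):.4f}")
```

Under this reading the overall mean is dominated by the slow component, even though 5/16 of the probability mass sits on the component whose mean is 375,000.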

Preprint (pdf file) of paper to appear in Annals of Applied Probability

