Power laws for family sizes in a duplication model

Rick Durrett and Jason Schweinsberg

Abstract. Qian, Luscombe, and Gerstein (2001) introduced a model of the diversification of protein folds in a genome that we may formulate as follows. Consider a multitype Yule process starting with one individual in which there are no deaths and each individual gives birth to a new individual at rate one. When a new individual is born, it has the same type as its parent with probability 1 - r and is a new type, different from all previously observed types, with probability r. We refer to individuals with the same type as families and provide an approximation to the joint distribution of family sizes when the population size reaches N. We also show that if 1 << S << N^{1-r}, then the number of families of size at least S is approximately C N S^{-1/(1-r)}, while if S >> N^{1-r} the distribution decays more rapidly than any power.
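
The model in the abstract can be simulated directly, and a minimal sketch may help make the definitions concrete. Because every individual gives birth at the same rate, the parent of each new individual is uniformly distributed over the current population, so only the sequence of births needs to be simulated, not the continuous times. The function names simulate_family_sizes and count_at_least, and the seed argument, are illustrative choices made here and do not come from the paper.

```python
import random


def simulate_family_sizes(n, r, seed=None):
    """Run the duplication model until the population reaches n.

    Returns a list in which entry k is the size of family k;
    family 0 is the family of the initial individual.
    """
    rng = random.Random(seed)
    family_of = [0]  # family_of[i] = family index of individual i
    sizes = [1]      # sizes[k] = current size of family k
    while len(family_of) < n:
        # All individuals give birth at rate one, so the next parent
        # is chosen uniformly at random from the population.
        parent_family = family_of[rng.randrange(len(family_of))]
        if rng.random() < r:
            # With probability r the child founds a new type.
            sizes.append(1)
            family_of.append(len(sizes) - 1)
        else:
            # With probability 1 - r the child joins its parent's family.
            sizes[parent_family] += 1
            family_of.append(parent_family)
    return sizes


def count_at_least(sizes, s):
    """Number of families of size at least s."""
    return sum(1 for x in sizes if x >= s)


if __name__ == "__main__":
    sizes = simulate_family_sizes(n=20000, r=0.018, seed=0)
    for s in (1, 10, 100, 1000):
        print(s, count_at_least(sizes, s))
```

Plotting log(s) against log(count_at_least(sizes, s)) for a range of s gives the kind of picture described in the figure captions on this page. The first entry of the returned list is the size of the first family.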

Paper as PDF file

Log of family size versus log of the number of families larger than that size, for one simulation of the model with C. elegans parameters, r = 0.018, N = 20,0000

Average of 10,000 simulations of the model with C. elegans parameters, showing the transition from power law to faster decay. The straight line is the prediction of our result.

Size of the first family in 100,000 simulations of the system with r = 0.1, N = 10,0000. If you recognize this distribution, let us know. It has the form Y Z^{r-1}, where Y and Z are exponential with mean 1 (and not independent).

