The Probability of a Run

Where an event has a probability p, what is the probability of a run of at least k consecutive occurrences in n trials? u(k,n)

This problem is to be found in DeMoivre's Doctrine of Chances (1718), however an exact solution has proven elusive to this very day. This page presents some new results developed in the newsgroup sci.math in a thread initiated by the editor on November 1, 2003.

It is possible to express successive values of u in the form of a recurrence relation

u(m + 1) = u(m) + (1 - u(m - n)) (1 - p) pn

where u(m) = Pr that a run of n or more has happened by the m'th trial.

The initial values are u(n) = pn and u(m - n) = 0 if m < 2n

Here is the logic:

What is the probability that an event shall occur at least n times in a row in m trials?

u(m) = Pr that a run of n or more has happened by the m'th trial

v(m + 1) = Pr that a run of n or more has been completed for the first time at the (m + 1)th trial

then in order for the run to be completed at the m + 1 trial either it has been completed by the m'th trial or it is completed for the first time at the m + 1 trial:

u(m + 1) = u(m) + v(m + 1)

In order that the (m + 1)th trial may complete the first run the following conditions must be satisfied:

1. There must not have been a run of n or more up to and including the (m - n)th trial.

Pr = 1 - u(m - n)

2. It must not happen at the (m - n + 1)th trial. Pr = 1 - p

3. It must happen at each trial from the (m - n + 2)th to the (m + 1)th. Pr = pn so that
u(m + 1) = u(m) + (1 - u(m - n)) (1 - p) pn

This expression resists solution as a difference equation because of its high degree, however it is possible, beginning with u(n) = pn to calculate successive values of u until k is reached. This amounts to solving the difference equation numerically.

The following VBA code (Visual Basic for Applications) was submitted to the newsgroup by Ian Smith:

Public Function Probnmp(n As Long, m As Long, p As Double)
Dim C() As Double, result As Double, pn As Double, q As Double
Dim i As Long, j As Long, next_j As Long
If n <= 0 Or p <= 0 Or p >= 1 Then
   Probnmp = [#VALUE!]
ElseIf n > m Then
   Probnmp = 0
Else
   ReDim C(n + 1)
   For i = 1 To n + 1
      C(i) = 0
   Next i
   pn = p ^ n
   q = 1 - p
   j = n
   For i = n To m
      If j > n Then
         next_j = 1
      Else
         next_j = j + 1
      End If
      result = pn * (1 + q * ((i - n) - C(next_j)))
      C(next_j) = C(j) + result
      j = next_j
   Next i
   Probnmp = result
End If
End Function

Who explains his logic as follows:

As far as I can see, if q is the probabilty of a loss and P(n,m) is the probability of at least 1 run of at least length n in m trials then

P(n,m) = 0 if m < n,
       = q^n.(1 + (1-q).( (m-n) - sum(i=n..m-n-1, P(n,i) ) ).

The logic is simple. Either the first n trials are losses or the first trial is a success and then we have n losses or we have no runs of n or more in i trials followed by a success and then n losses (for i = 1.. m-n-1). I think this enumerates all the possibile combinations without duplication.

Your formula

u(m + 1) = u(m) + (1 - u(m - n)) (1 - p) p^n

is deducible from mine, for m > n. Just calculate P(n,m+1)-P(n,m) in my notation, swap P(n,m) to u(m) and q to p and you'll get the same.

Here is some simple but inefficient code for a language that supports recursive function calls, based on the logic previously given:

function run(m, n, q);
/* Returns the probability of a run of at least n      */
/* consecutive failures in m independent trials where */
/* the probability of failure at any one trial is q    */
integer m, n; float q;
  if m < n then return 0;
    else if m = n then return q^n;
      else return run(m - 1, n, q) + (1 - run(m - n - 1, n, q)) * (1 -
q) * q^n;
  end if;
end run;

Problem

Hawkeye hits the bullseye 90% of the time. What is the probability that he will score at least five successive hits in ten shots?

Here the calculation is simplified by the fact that the number of trials is not greater than twice the length of the run. Either he can score five successive hits at the outset or else he can score a miss followed by five hits. This can happen starting with either shot one, two, three, or four, so that the probability is

(1 + 4(0.1))(0.9)5 = 0.8267

One notes also that in implementing a program to solve the above difference equation one can use

u(2n) = (1 + (n - 1) (1 - p)) pn

as a starting value.

Bibliography

Burnside, William
The Theory of Probability (1928), Chapter 3
Dover Publications, 1959

Uspensky, J. V.
Introduction to Mathematical Probability, Chapter V, Section 3
McGraw Hill, 1937

Villarino, M. B., The Probability of a Run (2006)