I have always been interested in logic, and the physicist E.T. Jaynes regarded probability theory as an extension of logic. Indeed his last book, published posthumously in 2003, was entitled Probability Theory: The Logic of Science. There is a Wikipedia article on him here, and Washington University in St Louis keeps up a splendid website devoted to his work, entitled Probability Theory as Extended Logic. A number of Jaynes' original papers may be consulted there, including the fascinating How does the Brain do Plausible Reasoning?.
Judging the likelihood of events is often an important problem in everyday life, and Laplace's "rule of succession" is meant to help in actually attaching numbers to probabilities. Laplace himself applied the rule to calculating the probability that the sun would rise again the next day. He may well have been joking in doing this, but it had the unfortunate result of bringing the rule into disrepute. Anyway, for this kind of example the rule says that if something has happened without fail n times in a row, then the probability of it happening once more is (n + 1)/(n + 2). Clearly the larger n becomes, the greater the probability of a further occurrence, which seems plausible enough. In the case of sunrises, Laplace supposed that there has been a sunrise every day since the creation of the world, an event which took place in 4004 BCE, according to Bishop James Ussher. So if we roughly calculate that 6000 years of 365 days each have passed since then, n is about 2,200,000, making the probability of a sunrise tomorrow somewhere around 2200001/2200002, as close to 1, or certainty, as you might wish!
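The rule is simple enough to sketch in a few lines of Python (the function name here is my own, not a standard one):

```python
# Laplace's rule of succession: after n consecutive successes,
# the probability of one more success is (n + 1) / (n + 2).
def rule_of_succession(n):
    return (n + 1) / (n + 2)

# The (tongue-in-cheek) sunrise calculation: roughly 6000 years
# of daily sunrises since Ussher's creation date.
n = 6000 * 365              # about 2,200,000 sunrises in all
p = rule_of_succession(n)
print(p)                    # very close to 1
```
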
Clearly what's wrong with that example is that we have far more information about sunrises than the mere fact that they have been happening without fail for a long time. We estimate that a sunrise is pretty certain again tomorrow because of our knowledge of the rotation of the earth and the maintenance of its orbit around the sun - neither is likely to be disrupted so suddenly. A better example concerns the probability of a child's survival, given that it has lived a certain number of years. Before the advent of modern medicine there would have been little specific knowledge to apply. When n = 0, that is, before the child has survived one year, (n + 1)/(n + 2) = 1/2. That might not be a bad estimate: with no information to go on, we can only suppose that the child might live or die with equal probability. This is an application of the Principle of Indifference - if we have no way of distinguishing one hypothesis from another, we assume they are all equally likely. If the child survives one year, however, n becomes 1, and (n + 1)/(n + 2) = 2/3. The probability of surviving another year has risen somewhat. Again that seems reasonable: if the child was healthy enough to survive one year, maybe it will get through another too. In fact as the years go by the chances of surviving another year become greater and greater. The probability that a child who has lived five years will carry on for another one is 6/7.
That all seems very plausible. The trouble is that there is no cut-off point - the chance that a 99 year old person will live another year is 100/101, excellent odds. But of course we know better about aged human beings. We know that their chances of survival grow less rather than more as time goes by. The rule of succession just gives us something to go on in the absence of any other information, as in the case of the survival of a child a couple of centuries ago.
Let's consider a better case of lack of information. Suppose you are in an unfamiliar city, standing in a shop doorway. Will the next person to pass by be wearing a uniform or not? Here your ignorance may be pretty complete. Maybe the only uniformed people around are in the police: in that case, there might not be many of them about, so it's not so likely that the next person will be uniformed. On the other hand, you may be standing near an army base, or a school which requires a uniform. The city is new to you, so who knows?
In the previous examples, of course, it was success or nothing - if the sun doesn't rise, or if someone doesn't survive, there's an end. But this time, we have people in uniform or not in uniform passing by with no final curtain. There is a more general version of Laplace's formula which applies here: if there have been s successes in n trials, and thus (n - s) failures, then the probability of one more success is (s + 1)/(n + 2). Obviously if there have been nothing but successes, then s = n and the formula reduces to the same one as before. In the latest example, suppose five uniformed people come past one after another. Then the probability of one more right after that would be 6/7. We feel pretty confident of another uniform, but in fact the next person is wearing plain clothes. So now six people have gone by, but only five have been in uniform. Given s = 5 and n = 6, then, the probability of another uniform right after is 6/8 or 3/4, a bit less than the 7/8 it would have been had the sixth person been in uniform too. So the rule makes sense: a procession of people in uniform makes the probability of seeing yet more of them go up, while an array of T-shirts and jeans makes it go down again.
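The general version is just as easy to put into Python (again, the function name is mine), and the uniform-spotting numbers above can be checked directly:

```python
# General form of Laplace's rule: with s successes in n trials,
# the probability of a success on the next trial is (s + 1) / (n + 2).
def prob_next_success(s, n):
    return (s + 1) / (n + 2)

# The uniform-spotting example from the text:
print(prob_next_success(5, 5))   # five uniforms, five trials: 6/7
print(prob_next_success(5, 6))   # then one plain-clothes passer-by: 6/8 = 3/4
```

Note that when s = n the function gives the same answer as the simpler all-successes formula, just as the text says.
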
But where does that particular rule come from? What justifies Laplace's rule of succession as opposed to any other? Googling the phrase brings up a vast number of sites, including of course those related to Jaynes. But every one that I have struggled with uses quite advanced integral calculus in justifying the rule. Of course when I use the term "advanced" here I mean "beyond my capabilities", and unfortunately my mathematical capabilities are rather limited.
But it is such a simple looking formula - surely there should be a much simpler notional explanation of it? Eventually I found one for myself, which convinces me at any rate. It can hardly be original, but I'm setting it down here anyway.
Think again about those passers-by in uniform or not in uniform. Suppose that after a while, over 22 trials, 11 people in uniform have gone by, and also 11 in civilian clothes. There seems to be a trend towards equal numbers of each, so the probability of seeing a uniform on the next, the 23rd, trial, should be just 0.5, surely, the same as that of seeing civilian clothes. But at the 23rd trial either the number of uniforms seen will increase by one, or the number of not-uniforms will increase by one. How can we ensure equality, to get half and half in the probability estimate? A simple way, surely, is to add one to both of the scores up to now, to make twelve and twelve. Dividing by the sum, 24, will then give each possibility a probability of 0.5.
Let's go straight on to the general case. There have been n trials so far, with s successes and (n-s) failures, and this time we suppose that s = (n-s), or n = 2s. As in the numerical example, we increase both of those scores by one to ensure equal probabilities of 0.5 at the (n+1)th trial. That means (s+1) and (n-s+1), both still clearly equal. They sum to (n+2), so the probability of another success is (s+1)/(n+2), and of another failure (n-s+1)/(n+2). Both fractions are equal to 0.5, as may readily be checked by replacing n by 2s.
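The add-one-to-each-count idea can be written out explicitly as a small Python sketch (the function name is my own invention):

```python
# The "add one to each score" reading of the rule: treat the s observed
# successes and (n - s) failures as (s + 1) and (n - s + 1) by adding one
# imaginary observation of each kind, then divide by the new total (n + 2).
def succession_probs(s, n):
    successes = s + 1
    failures = (n - s) + 1
    total = successes + failures         # equals n + 2
    return successes / total, failures / total

# With successes and failures tied (11 each in 22 trials),
# both probabilities come out at exactly one half:
p_succ, p_fail = succession_probs(11, 22)
print(p_succ, p_fail)                    # 0.5 0.5
```

Since the two fractions share the denominator (n + 2) and their numerators sum to (n + 2), they always add up to 1, whatever s and n are.
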
So we have a simple formula which gives a reasonable result whenever the numbers of successes and failures have ended up equal. The formula agrees with the conclusion of the Principle of Indifference right at the start as well, for before anything has happened we are at the zeroth trial, and there have so far been no successes and no failures. Putting s = n = 0 into (s+1)/(n+2) and (n-s+1)/(n+2) gives 1/2 for each fraction.
We are encouraged to see what happens when the number of successes is not the same as the number of failures - when (s+1) ≠ (n-s+1). Let's start with the extreme case where there are never any failures as the trials continue. That means there is a success at the first trial, so then s = 1 and n = 1. Putting those numbers into the formulae makes the probability of success on the second trial (1+1)/(1+2) = 2/3 and that of failure (1-1+1)/(1+2) = 1/3. That seems reasonable too: after one success we will be a bit more inclined to expect another rather than a failure on the second trial. With no failures ever, then n trials will result in n successes, so s = n and (s+1)/(n+2) = (n+1)/(n+2). The probability of another success at the next trial approaches 1 while that of a failure gets nearer to 0.
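This limiting behaviour is easy to watch numerically. A minimal sketch (function name my own) computing both probabilities after an unbroken run of n successes:

```python
# After an unbroken run of n successes (s = n), the two formulae become:
#   success: (s + 1)/(n + 2) = (n + 1)/(n + 2)
#   failure: (n - s + 1)/(n + 2) = 1/(n + 2)
def run_probs(n):
    return (n + 1) / (n + 2), 1 / (n + 2)

for n in (1, 10, 100, 1000):
    p_succ, p_fail = run_probs(n)
    # success probability climbs toward 1, failure toward 0
    print(n, p_succ, p_fail)
```
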
More generally, if the number of successes increases while that of failures does not, then of course (s+1) will grow larger, while (n-s+1) will stay the same, since n increases in step with s. Thus the probability of further successes increases, while that of failures goes down. We should note that the two probabilities do indeed sum to 1: (s+1)/(n+2) + (n-s+1)/(n+2) = (n+2)/(n+2). If instead the number of failures increases while the number of successes does not, (s+1) will stay the same but (n-s+1) will increase with increasing n, so this time it is the probability of failure that grows.
So we've got what we wanted: a simple formula derived from first principles which gives reasonable probability estimates for the occurrence or non-occurrence of events in circumstances where we have no information or hypotheses to guide us. What's more, it agrees with Laplace's Rule, and no calculus whatsoever was used in deriving it!