# The Predictive Power of Coins

## Mark Pottenger

An article about probability in the July/August 1995 Skeptical Inquirer used the phrase “You might do just as well to flip a coin!” in describing a diagnostic prediction that would be right about 50% of the time. This phrase was picked up and worked to death by an on-line pseudo-skeptic. Unfortunately, it is based on sloppy thinking.

Something that “everybody knows” is that flipping a coin will produce 50% heads and 50% tails. This is the theoretical behavior of an honest coin, with actual measurements expected to come closest to it with large samples. This makes comparing a 50% success rate to a coin toss seem reasonable to the casual reader.

However, since the issue being discussed is a prediction, to compare any other technique to a coin toss you must use the coin toss to make a prediction. Therefore, it is necessary to know how successfully tossing a coin can predict anything.

The success rate of predicting anything with a coin toss depends on the actual frequency of the predicted phenomenon in the total population. It works like this: assign heads to phenomenon present (positive), assign tails to phenomenon absent (negative), then flip the coin. A false positive is a head when the phenomenon is absent. A false negative is a tail when the phenomenon is present. (Whether true positives, false positives, true negatives or false negatives are most important depends on the nature of each study.) Below is a table based on two different success assumptions showing the predictive accuracy of coin tossing for phenomena present in various percentages of the population.
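The procedure above can be checked with a quick simulation. The sketch below is in Python, used here only for illustration. The function name `simulate_coin_predictions`, the trial count and the seed are hypothetical choices. It assumes a fair coin that is independent of the phenomenon, which corresponds to the "mean" (random) assumption used in the table. It reports each percentage the way the table does: positives as shares of heads, negatives as shares of tails.

```python
import random

def simulate_coin_predictions(freq, trials=100_000, seed=1):
    """Hypothetical helper: predict a phenomenon with a fair coin.

    freq is the fraction (0 to 1) of the population in which the phenomenon
    is present. Heads predicts "present", tails predicts "absent".
    Returns true/false positives as percentages of heads and
    true/false negatives as percentages of tails.
    """
    rng = random.Random(seed)
    tp = fp = tn = fn = 0
    for _ in range(trials):
        present = rng.random() < freq  # is the phenomenon present in this member?
        heads = rng.random() < 0.5     # fair coin, independent of the phenomenon
        if heads and present:
            tp += 1                    # true positive
        elif heads:
            fp += 1                    # false positive: heads, phenomenon absent
        elif present:
            fn += 1                    # false negative: tails, phenomenon present
        else:
            tn += 1                    # true negative
    heads_total = tp + fp
    tails_total = tn + fn
    return {
        "true_pos": 100 * tp / heads_total,
        "false_pos": 100 * fp / heads_total,
        "true_neg": 100 * tn / tails_total,
        "false_neg": 100 * fn / tails_total,
    }

for f in (0.01, 0.10, 0.50, 0.90):
    r = simulate_coin_predictions(f)
    print(f"{f:.0%}: " + ", ".join(f"{k} {v:.1f}%" for k, v in r.items()))
```

With 100,000 simulated members, the true-positive share should land close to the population frequency (about 1% for a 1% phenomenon), with small variation if the seed is changed.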

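Best possible success (non-random) versus mean success rate (random):

| Population frequency | Best: true positives | Best: false positives | Best: true negatives | Best: false negatives | Mean: true positives | Mean: false positives | Mean: true negatives | Mean: false negatives |
|---|---|---|---|---|---|---|---|---|
| 1% | 2% | 98% | 100% | 0% | 1% | 99% | 99% | 1% |
| 2% | 4% | 96% | 100% | 0% | 2% | 98% | 98% | 2% |
| 3% | 6% | 94% | 100% | 0% | 3% | 97% | 97% | 3% |
| 4% | 8% | 92% | 100% | 0% | 4% | 96% | 96% | 4% |
| 5% | 10% | 90% | 100% | 0% | 5% | 95% | 95% | 5% |
| 6% | 12% | 88% | 100% | 0% | 6% | 94% | 94% | 6% |
| 7% | 14% | 86% | 100% | 0% | 7% | 93% | 93% | 7% |
| 8% | 16% | 84% | 100% | 0% | 8% | 92% | 92% | 8% |
| 9% | 18% | 82% | 100% | 0% | 9% | 91% | 91% | 9% |
| 10% | 20% | 80% | 100% | 0% | 10% | 90% | 90% | 10% |
| 20% | 40% | 60% | 100% | 0% | 20% | 80% | 80% | 20% |
| 25% | 50% | 50% | 100% | 0% | 25% | 75% | 75% | 25% |
| 30% | 60% | 40% | 100% | 0% | 30% | 70% | 70% | 30% |
| 40% | 80% | 20% | 100% | 0% | 40% | 60% | 60% | 40% |
| 50% | 100% | 0% | 100% | 0% | 50% | 50% | 50% | 50% |
| 60% | 100% | 0% | 80% | 20% | 60% | 40% | 40% | 60% |
| 70% | 100% | 0% | 60% | 40% | 70% | 30% | 30% | 70% |
| 75% | 100% | 0% | 50% | 50% | 75% | 25% | 25% | 75% |
| 80% | 100% | 0% | 40% | 60% | 80% | 20% | 20% | 80% |
| 90% | 100% | 0% | 20% | 80% | 90% | 10% | 10% | 90% |
| 100% | 100% | 0% | 0% | 100% | 100% | 0% | 0% | 100% |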

Population Frequency is the frequency of the phenomenon to be predicted. True and false positives are percentages of tosses of heads where the phenomenon is present or absent. True and false negatives are percentages of tosses of tails where the phenomenon is absent or present. The sets of assumptions used to produce the two parts of the table are: “best” is the success rate possible if all possible instances of the phenomenon are matched to coin tosses that come up heads (limited by either population frequency or heads frequency), “mean” is a more normal (random number) success rate that you would expect if half of the instances of the phenomenon are matched to heads and half are matched to tails. These are the formulae used to produce the two sections of the table (with f(i) the frequency of the phenomenon for each line):

Best, up to 50% frequency: `PRINT f(i); f(i) * 2; (50 - f(i)) * 2; 100; 0`

Best, above 50% frequency: `PRINT f(i); 100; 0; (100 - f(i)) * 2; (f(i) - 50) * 2`

The first f(i) prints the frequency in the population. Each multiplication by two (`* 2`) in the formulae converts a share of the population into a percentage of heads or of tails (each of which makes up 50% of the population). Up to 50% frequency in the "best" section, every "phenomenon present" case is matched to a toss of heads, so 50 - f(i) gives the false positives (the heads left over after all cases of phenomenon present have been used up); all tosses of tails are true negatives and none are false negatives. Above 50% frequency in the "best" section, all heads are used up before all "phenomenon present" cases are, so heads are 100% true positives across the whole range. The leftover "phenomenon present" cases must then be matched to tails (false negatives), so 100 - f(i) gives the true negatives and f(i) - 50 gives the false negatives.
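A direct translation of the two "best" PRINT formulas into a Python function may make the case split easier to follow. The name `best_row` is hypothetical. Frequencies are given in percent, and only the four predictive columns are returned (the first PRINT item, f(i) itself, is left out).

```python
def best_row(f):
    """Hypothetical helper: best possible (non-random) success for frequency f (percent).

    Returns (true_pos, false_pos, true_neg, false_neg) as percentages,
    following the two "best" PRINT formulas.
    """
    if not 0 <= f <= 100:
        raise ValueError("frequency must be between 0 and 100 percent")
    if f <= 50:
        # Every "present" case gets heads; leftover heads are false positives,
        # and every tail is a true negative.
        return (f * 2, (50 - f) * 2, 100, 0)
    # Heads run out first: all heads are true positives, and leftover
    # "present" cases land on tails as false negatives.
    return (100, 0, (100 - f) * 2, (f - 50) * 2)

for f in (1, 25, 50, 75, 100):
    print(f, best_row(f))
```

At exactly 50% both branches give the same row, (100, 0, 100, 0), so the choice of `<=` at the boundary does not matter.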

Mean: `PRINT (f(i) / 2) * 2; (50 - f(i) / 2) * 2; (50 - f(i) / 2) * 2; (f(i) / 2) * 2`

or, simplified: `PRINT f(i); 100 - f(i); 100 - f(i); f(i)`

f(i) / 2 converts the population frequency into the mean share of the population in which the phenomenon is present and the coin shows heads (true positives), or present and the coin shows tails (false negatives); the division by two applies the assumption that half the cases get heads and half get tails. 50 - f(i) / 2 gives the corresponding share for the phenomenon absent: the false positives or true negatives under the mean success assumption. As before, multiplying by two converts each share into a percentage of heads or of tails.
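The mean formula can be sketched the same way. This is again a hypothetical Python helper (`mean_row`) that takes the frequency in percent. It keeps the unsimplified steps so each line mirrors the explanation above, then checks them against the simplified form.

```python
def mean_row(f):
    """Hypothetical helper: mean (random) success for frequency f (percent).

    Returns (true_pos, false_pos, true_neg, false_neg) as percentages.
    """
    present_half = f / 2       # share of population: present and heads (or present and tails)
    absent_half = 50 - f / 2   # share of population: absent and heads (or absent and tails)
    # Multiply by two to express each share as a percentage of heads or of tails.
    return (present_half * 2, absent_half * 2, absent_half * 2, present_half * 2)

# The unsimplified steps agree with the simplified form f, 100 - f, 100 - f, f.
for f in (1, 10, 25, 50, 90):
    assert mean_row(f) == (f, 100 - f, 100 - f, f)
    print(f, mean_row(f))
```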

As you can see from the table, the only time tossing a coin produces the 50/50 figure "everybody knows" is when the phenomenon being predicted occurs 50% of the time and you use the mean success rate assumption (which is the normal statistical assumption). For any phenomenon occurring less than 50% of the time, tossing a coin does not do as well: under the mean assumption, fewer than 50% of its positive predictions (heads) are correct.

This math applies whether you are speaking of a study of the whole population or a prediction for an individual member of the population. You cannot ignore population frequencies when you speak of an individual member of the population.

The Skeptical Inquirer article used the example of a diagnostic prediction that was correct 50% of the time about something present in 1% of the population. As the table shows, a coin's predictions of "present" (heads) would, on average, be right only 1% of the time. Thus, the phrase "You might do just as well to flip a coin!" is very wrong. The on-line examples followed the same pattern.
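To make the 1% example concrete, here is the arithmetic for a hypothetical population of 10,000 people, using expected counts rather than a simulation. The helper `expected_coin_counts` is illustrative only and assumes a fair coin that is independent of the phenomenon (the "mean" assumption).

```python
def expected_coin_counts(population, freq_percent):
    """Hypothetical helper: expected counts when a fair, independent coin
    is used to predict a phenomenon present in freq_percent of the population."""
    present = population * freq_percent / 100
    absent = population - present
    return {
        "true_pos": present / 2,   # present and heads
        "false_pos": absent / 2,   # absent and heads
        "true_neg": absent / 2,    # absent and tails
        "false_neg": present / 2,  # present and tails
    }

counts = expected_coin_counts(10_000, 1)
heads = counts["true_pos"] + counts["false_pos"]
print(f"{counts['true_pos']:.0f} of {heads:.0f} heads are correct "
      f"({100 * counts['true_pos'] / heads:.0f}%)")
# prints: 50 of 5000 heads are correct (1%)
```

Of the 5,000 expected heads, only 50 fall on people who actually have the phenomenon, so the coin's positive predictions are right about 1% of the time, far below the 50% quoted for the diagnostic prediction.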

To boil the whole thing down to its simplest form: a coin toss's positive predictions should succeed at a rate equal to the frequency of occurrence of the phenomenon in the population. This is the "chance" expectancy that is the baseline for most statistical comparisons. I apologize for such length on such a basic point, but the pseudo-skeptics didn't get the point in a briefer presentation.

This is a wonderful example of how careless thinking or wording and the innumeracy of much of the population can lead to very misleading conclusions.