MicrobiologyBytes: Maths & Computers for Biologists: Probability  Updated: January 28, 2007
Further information on this topic can be found in Chapter 9 of:
Maths from Scratch for Biologists
Numerical ability is an essential skill for everyone studying the biological sciences but many students are frightened by the 'perceived' difficulty of mathematics, and are nervous about applying mathematical skills in their chosen field of study. Maths from Scratch for Biologists is a highly instructive, informal text that explains step by step how and why you need to tackle maths within the biological sciences. (Amazon.co.UK)
OK, let's get the important stuff out of the way first:
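Prize | Probability | Odds
--- | --- | ---
Jackpot: match 6 numbers / 49 | P = (6/49) × (5/48) × (4/47) × (3/46) × (2/45) × (1/44) = 1/C(49, 6) | 1 in 13,983,816
Match 5 numbers / 49 (6th number not the bonus ball) | P = C(6, 5) × C(42, 1) / C(49, 6) = 252/13,983,816 | 1 in 55,492
Match 4 numbers / 49 | P = C(6, 4) × C(43, 2) / C(49, 6) = 13,545/13,983,816 | 1 in 1,033
Smallest prize (the minimum needed to win anything): match 3 numbers / 49 | P = C(6, 3) × C(43, 3) / C(49, 6) = 246,820/13,983,816 | 1 in 57

Here C(n, r) = n!/(r!(n - r)!) is the number of ways of choosing r numbers from n. For the lower prizes, the ticket's remaining numbers must come from the 43 numbers that were not drawn (42 for the match-5 prize, which excludes tickets that also match the bonus ball).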

With 2 draws each week, if you buy a ticket for each draw you will, on average, win the jackpot about once every 134,000 years (13,983,816 draws ÷ 104 draws per year).
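The odds in the table can be checked with a short Python sketch. It assumes a 6-from-49 draw with one bonus ball (as in the UK National Lottery), counts tickets with `math.comb` (Python 3.8+), rounds each "1 in N" figure up as the table appears to do, and assumes 104 draws a year for the waiting time. The helper name `odds_match` is illustrative.

```python
from math import ceil, comb

TOTAL = comb(49, 6)  # number of distinct tickets: 13,983,816

def odds_match(k: int, exclude_bonus: bool = False) -> float:
    """Return N in 'odds of 1 in N' for matching exactly k of the 6 main numbers.

    The other 6 - k ticket numbers must come from the 43 numbers not drawn as
    main numbers; with exclude_bonus=True they must also avoid the bonus ball (42).
    """
    others = 42 if exclude_bonus else 43
    winning_tickets = comb(6, k) * comb(others, 6 - k)
    return TOTAL / winning_tickets

for k, excl in [(6, False), (5, True), (4, False), (3, False)]:
    print(f"match {k}: 1 in {ceil(odds_match(k, excl)):,}")

# Average wait for the jackpot buying one ticket per draw, 2 draws a week
years = TOTAL / (2 * 52)
print(f"about {years:,.0f} years")
```

Dividing the number of possible tickets by the number of winning tickets gives the "1 in N" figure directly; the waiting time is the jackpot odds divided by the number of draws per year.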
So, with that out of the way, why do we have to know about probabilities? Because:
Statistical methods depend upon probability theory.
Probability, P = number of observations of a particular outcome / total number of observations or, to put it another way (when all outcomes are equally likely): P = number of specific outcomes / total number of possible outcomes

The simplest way to understand probabilities is through proportional frequency:
Example:
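In a group of mice there are 200 white mice and 50 brown mice (250 in total). If we pick one mouse at random, the probability that it is brown is 50/250 = 1/5 = 0.2, and the probability that it is white is 200/250 = 4/5 = 0.8.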

If we replace the first selection and make a second selection, then the probability of making a given selection is unaltered. Thus, in the above example the probability of picking a brown mouse is still 50/250 = 1/5 = 0.2
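The proportional-frequency idea can be checked by simulation. This is a minimal sketch, assuming random selection with replacement from the 250 mice: every draw picks from the full population, so the long-run fraction of brown mice should settle close to 50/250 = 0.2. The function name and the fixed random seed are illustrative choices.

```python
import random

def simulate_brown_fraction(n_draws: int, brown: int = 50, white: int = 200, seed: int = 1) -> float:
    """Estimate P(brown) from repeated random selections WITH replacement."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    population = ["B"] * brown + ["W"] * white
    # rng.choice never removes a mouse, so each draw sees the same 50:200 mix
    hits = sum(rng.choice(population) == "B" for _ in range(n_draws))
    return hits / n_draws

print(simulate_brown_fraction(100_000))  # close to 0.2
```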
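If we do not replace our first selection, the probability when making the second selection will change:
Example:
If the first mouse selected was brown and is not returned, 49 of the remaining 249 mice are brown, so the probability that the second mouse is also brown falls to 49/249 ≈ 0.197.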
Studying repeated samples (selections) from natural populations is easier if we assume replacement occurs. We can usually assume this if the population is large, because removing a few individuals barely changes the proportions that remain.
When the result of the first sample does not affect the probability of the result of subsequent samples, the samples are said to be independent (a requirement of many statistical tests).
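Why a large population lets us treat sampling without replacement as independent can be shown with exact fractions. A minimal sketch: it compares the 250-mouse example with a hypothetical population of 250,000 mice in the same 1:4 brown:white ratio (the larger population is an assumption for illustration only).

```python
from fractions import Fraction

def p_second_brown_given_first_brown(brown: int, total: int) -> Fraction:
    """P(second mouse is brown | first was brown and was NOT replaced)."""
    return Fraction(brown - 1, total - 1)

small = p_second_brown_given_first_brown(50, 250)          # the example above: 49/249
large = p_second_brown_given_first_brown(50_000, 250_000)  # hypothetical large population

print(small, float(small))  # noticeably below 1/5
print(float(large))         # very close to 1/5, i.e. almost as if we had replaced
```

In the large population, removing one brown mouse changes the probability by only a few millionths, so treating the selections as independent introduces very little error.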
The number of possible orderings (permutations) of n distinct events is given by the factorial of n (written as "n!"): the product of an integer and all the lower positive integers, e.g.:
For 3 events (A, B, C), number of possible orderings = 3! = 3 * 2 * 1 = 6

Ordering | Sequence
--- | ---
1 | ABC
2 | ACB
3 | BAC
4 | BCA
5 | CAB
6 | CBA
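The six orderings of A, B and C can be generated and counted in Python. This short sketch uses the standard-library `itertools.permutations` and `math.factorial` to confirm that the number of orderings equals 3!.

```python
from itertools import permutations
from math import factorial

events = ["A", "B", "C"]
orderings = ["".join(p) for p in permutations(events)]

print(orderings)                                # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(len(orderings), factorial(len(events)))   # 6 6
```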

Example: A population of 50 brown mice, 200 white mice, selections with replacement:
a) Probability of 3 brown mice in 3 selections = (50/250) * (50/250) * (50/250)
= (1/5) * (1/5) * (1/5) = 0.008
b) Probability of selecting, in order, brown, brown and then white = (50/250) * (50/250) * (200/250)
= (1/5) * (1/5) * (4/5) = 0.032
c) If, however, we are not interested in the order (i.e. brown, brown, white) but just the overall outcome (i.e. 2 brown, 1 white), the probability is different:
Possible outcomes of 3 selections with replacement:

Selection 1 | Selection 2 | Selection 3 | P(selection 1) | P(selection 2) | P(selection 3) | Calculation | Probability of outcome
--- | --- | --- | --- | --- | --- | --- | ---
B | B | B | 1/5 | 1/5 | 1/5 | (1/5) * (1/5) * (1/5) | 0.008
B | W | B | 1/5 | 4/5 | 1/5 | (1/5) * (4/5) * (1/5) | 0.032
B | B | W | 1/5 | 1/5 | 4/5 | (1/5) * (1/5) * (4/5) | 0.032
B | W | W | 1/5 | 4/5 | 4/5 | (1/5) * (4/5) * (4/5) | 0.128
W | B | B | 4/5 | 1/5 | 1/5 | (4/5) * (1/5) * (1/5) | 0.032
W | W | B | 4/5 | 4/5 | 1/5 | (4/5) * (4/5) * (1/5) | 0.128
W | B | W | 4/5 | 1/5 | 4/5 | (4/5) * (1/5) * (4/5) | 0.128
W | W | W | 4/5 | 4/5 | 4/5 | (4/5) * (4/5) * (4/5) | 0.512
TOTAL | | | | | | | 1.0
Thus, the sum of probabilities of a set of mutually exclusive, exhaustive outcomes is 1, but the probability of 2 brown mice and 1 white mouse, irrespective of the order of selection is:
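Selection 1 | Selection 2 | Selection 3 | P(selection 1) | P(selection 2) | P(selection 3) | Calculation | Probability of outcome
--- | --- | --- | --- | --- | --- | --- | ---
B | W | B | 1/5 | 4/5 | 1/5 | (1/5) * (4/5) * (1/5) | 0.032
B | B | W | 1/5 | 1/5 | 4/5 | (1/5) * (1/5) * (4/5) | 0.032
W | B | B | 4/5 | 1/5 | 1/5 | (4/5) * (1/5) * (1/5) | 0.032
TOTAL | | | | | | | 0.096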
Note the difference between the probability of an ordered selection (0.032) and the probability of the same selection irrespective of order (0.096), which is the sum of the probabilities of all the possible ordered selections.
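Both tables can be reproduced by enumerating every ordered sequence of 3 selections with replacement. This sketch uses exact fractions (P(brown) = 1/5, P(white) = 4/5) so the totals come out exactly: multiplying along each sequence gives its probability, and adding the sequences with 2 brown and 1 white gives the order-free probability.

```python
from fractions import Fraction
from itertools import product

P = {"B": Fraction(1, 5), "W": Fraction(4, 5)}  # brown 50/250, white 200/250

# Multiply the probabilities of 3 independent selections for each ordered sequence
outcomes = {}
for seq in product("BW", repeat=3):
    prob = Fraction(1)
    for mouse in seq:
        prob *= P[mouse]
    outcomes["".join(seq)] = prob

total = sum(outcomes.values())  # all 8 mutually exclusive sequences together
# Add up the orderings that contain exactly 2 brown mice and 1 white mouse
two_brown = sum(p for s, p in outcomes.items() if s.count("B") == 2)

print(float(outcomes["BBW"]), float(two_brown), total)  # 0.032 0.096 1
```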
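These examples illustrate the two rules of probability:
Multiplication rule: the probability that several independent events all occur is the product of their individual probabilities, e.g. P(brown, then brown, then white) = (1/5) * (1/5) * (4/5) = 0.032.
Addition rule: the probability that any one of several mutually exclusive outcomes occurs is the sum of their individual probabilities, e.g. P(2 brown and 1 white, in any order) = 0.032 + 0.032 + 0.032 = 0.096.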
The binomial probability distribution describes what will happen when each event has only two possible outcomes, e.g. a mouse is either brown or white, or an individual either does or does not carry a particular gene.
Such binary variables turn out to occur quite frequently in biology.
In its simplest form, the binomial expansion summarizes the possible outcomes for any number of samples when there are only two possible outcomes (e.g. brown and white mice).
For independent events, the binomial distribution is given by the terms of the expansion of:
(P + Q)^{n}
where:
P is the probability of one of the possible events
Q is the probability of the second event ( = 1 - P )
n is the number of trials in the series
For samples of 1 (n=1): (P + Q)^{1} = (P + Q)
For samples of 2 (n=2): (P + Q)^{2} = P^{2} + 2PQ + Q^{2}
For samples of 3 (n=3): (P + Q)^{3} = P^{3} + 3P^{2}Q + 3PQ^{2} + Q^{3}
etc.
Back to the mice! These expansions of the binomial equation describe all the possible outcomes from the experiment above:
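If P = the probability of a brown mouse and Q = the probability of a white mouse, then for 3 samples from the population (n = 3):
P^{3}: probability of 3 brown mice
3P^{2}Q: probability of 2 brown mice and 1 white mouse
3PQ^{2}: probability of 1 brown mouse and 2 white mice
Q^{3}: probability of 3 white mice
These are all the possible outcomes.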
In the population from which the samples were drawn, P = 50/250 = 0.2 and Q = 200/250 = 0.8, so we can calculate the expected distribution of outcomes from the binomial equation and compare the observed and expected distributions using the chi-squared (χ^{2}) test.
In this example we can calculate the probability of 2 brown mice and 1 white mouse being selected as:
3P^{2}Q = 3(0.2)^{2}(0.8) = 0.096
(note that this is the same as in the table above)
This is OK when there are a small number of samples and a small number of outcomes but gets progressively more difficult as the sample size increases. For example, try calculating how many different ways there are to select 7 brown mice and 6 white mice in 13 selections!
To perform such calculations, we can use the following equation:
Number of outcomes = n! / (r! (n - r)!)
where:
n = number of selections
r = number of one of the outcomes
(remember "!" = factorial)
Let's check this works:
For 2 brown mice and 1 white mouse (n = 3, r = 2):
3! / (2! 1!) = (3 * 2 * 1) / (2 * 1 * 1) = 3 (i.e. BBW, BWB, WBB).
So for 7 brown mice and 6 white mice (n = 13, r = 7):
13! / (7! 6!) = 1,716 different ways.
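The counting formula n!/(r!(n - r)!) is easy to check by computer. This sketch implements it with `math.factorial`, then confirms the 13-selection answer two other ways: with the standard-library `math.comb` and by brute force, listing every choice of which 7 of the 13 positions hold a brown mouse. The helper name `n_outcomes` is illustrative.

```python
from itertools import combinations
from math import comb, factorial

def n_outcomes(n: int, r: int) -> int:
    """Number of ways to get r of one outcome in n selections: n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(n_outcomes(3, 2))   # 3  (BBW, BWB, WBB)
print(n_outcomes(13, 7))  # 1716

# Brute force: count every set of 7 positions (out of 13) that could hold brown mice
brute_force = sum(1 for _ in combinations(range(13), 7))
print(brute_force, comb(13, 7))  # both agree with the formula
```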
If we know the probability of the outcome for a single selection (e.g. probability of brown or probability of white) we can calculate the total probability for the outcome using:
P = n! / (r! (n - r)!) * p^{r} * (1 - p)^{n - r}
where:
P is the total probability of the outcome (e.g. 2 brown mice and 1 white mouse)
p is the probability of the event that occurs r times
(1 - p) is the probability of the event that occurs n - r times
In our example of two brown mice and 1 white mouse (n = 3, r = 2, p = 0.2):
P = 3! / (2! 1!) * 0.2^{2} * 0.8^{1} = 3 * 0.04 * 0.8 = 0.096
which is as calculated above in the table or using the binomial equation 3P^{2}Q.
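The general formula P = n!/(r!(n - r)!) * p^{r} * (1 - p)^{n - r} translates directly into a small function. This is a sketch using `math.comb` for the n!/(r!(n - r)!) term; the name `binomial_probability` is illustrative. Evaluating it for r = 0, 1, 2, 3 brown mice gives the same four terms as the expansion of (P + Q)^{3}.

```python
from math import comb

def binomial_probability(r: int, n: int, p: float) -> float:
    """P(exactly r events of probability p in n independent trials)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

p_brown = 0.2
print(round(binomial_probability(2, 3, p_brown), 3))  # 0.096 (2 brown, 1 white)

# Whole distribution for n = 3, indexed by the number of brown mice r = 0..3
dist = [binomial_probability(r, 3, p_brown) for r in range(4)]
print([round(x, 3) for x in dist])  # [0.512, 0.384, 0.096, 0.008]
```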
In practice, we can also look up the probability of an event from a table of binomial probabilities.
Suppose that 1% of a population has a characteristic under study (e.g. an inherited defect in the mythical stat gene which restricts the ability of carriers to understand statistics).
There are no external signs that we can use to recognise carriers, so we must select individuals from the population at random and test them. If the sample size used is too small there is a risk of not finding any carriers; if it is too large, scarce testing resources will be wasted. What sample size is required to give a good likelihood of sampling affected individuals?
The binomial distribution can be used in a case such as this because the variable is binary and the outcomes are mutually exclusive, i.e. each individual either will or will not carry the defective gene.
If 1% of the population is affected then P = 0.01 (affected) and Q = 0.99 (not affected).
To find the probability of finding some (i.e. 1 or more) carriers, the easiest approach is to calculate the probability of finding none (i.e. P(0)) for a given sample size, e.g. 20, and subtract it from 1. We can do this using the binomial equation, setting the number of successes, r, to 0 and the number of trials, n, to 20. This gives the probability of taking a sample of 20 individuals and finding no carriers:
P(0) = 20! / (0! (20 - 0)!) * 0.01^{0} * (1 - 0.01)^{20 - 0}
= 1 * 1 * 0.99^{20}
= 0.82
N.B: 1! = 0! = 1
A number raised to the power 0 is 1 and a number raised to the power 1 is
itself, e.g. 20^{0} = 1 and 20^{1} = 20.
Thus, if 1% of the population is affected there is an 82% chance that a sample of 20 individuals will fail to find any carriers. Consequently, a sample size of 20 would appear to be too small to give a reasonable chance of finding at least one carrier.
If n = 50, P(0) = 0.99^{50} = 0.61, i.e. a 39% chance of finding at least one carrier.
If n = 100, P(0) = 0.99^{100} = 0.37, i.e. a 63% chance of finding at least one carrier.
As the percentage of affected individuals drops, the probability of missing carriers in a sample of 20 individuals increases, e.g. if only 0.1% of the population are carriers there is only a 2% chance of finding any in a sample of 20 people, i.e. P(0) = 0.999^{20} = 0.98.
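The P(0) figures above follow from setting r = 0 in the binomial formula, which leaves just (1 - p)^{n}. This sketch reproduces them; the helper name `prob_none` is illustrative.

```python
def prob_none(n: int, p: float) -> float:
    """P(0): probability that a random sample of n contains no carriers.

    With r = 0 the binomial formula n!/(0! n!) * p**0 * (1 - p)**n reduces to (1 - p)**n.
    """
    return (1 - p) ** n

for n in (20, 50, 100):
    p0 = prob_none(n, 0.01)
    print(f"n = {n}: P(0) = {p0:.2f}, P(at least one carrier) = {1 - p0:.2f}")

print(f"{prob_none(20, 0.001):.2f}")  # 0.98 when only 0.1% are carriers
```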
This type of calculation can be useful to determine the minimum sample size needed to give a chosen chance (e.g. 95%) of obtaining at least 1 positive result from a sample for any binary variable, e.g. to find at least one affected carrier in a random sample. All that is required is the probability of the event, e.g. if 1 in 1000 of the population carry a particular genetic polymorphism, then P = 0.001.
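The minimum sample size comes from solving 1 - (1 - p)^{n} ≥ target for n, which gives n ≥ log(1 - target) / log(1 - p). A minimal sketch, assuming a target of a 95% chance of at least one positive (the 95% figure is an illustrative choice, not from the original); `min_sample_size` is a hypothetical helper name.

```python
from math import ceil, log

def min_sample_size(p: float, target: float = 0.95) -> int:
    """Smallest n with P(at least one positive) = 1 - (1 - p)**n >= target.

    Rearranging (1 - p)**n <= 1 - target gives n >= log(1 - target) / log(1 - p);
    both logs are negative, so the ratio is positive.
    """
    return ceil(log(1 - target) / log(1 - p))

print(min_sample_size(0.01))   # 299  (1% carriers)
print(min_sample_size(0.001))  # 2995 (1 in 1000 carry the polymorphism)
```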
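Microsoft Excel has built-in binomial probability functions (use the online Help to find out more about these!):
BINOMDIST(number_s, trials, probability_s, cumulative)
where: number_s is the number of successes, trials is the number of independent trials, probability_s is the probability of success on each trial, and cumulative is TRUE to return the probability of at most number_s successes or FALSE to return the probability of exactly number_s successes. BINOMDIST is used in problems with a fixed number of tests or trials, when the outcomes of any trial are only success or failure, when trials are independent, and when the probability of success is constant throughout the experiment.
Example: =BINOMDIST(0, 20, 0.01, FALSE) returns the probability of finding no carriers in a sample of 20 when 1% of the population are carriers (about 0.82).
Also: CRITBINOM(trials, probability_s, alpha)
Returns the smallest value for which the cumulative binomial distribution is greater than or equal to a criterion value. Use this function for quality assurance applications, e.g. to determine the greatest number of defective parts that are allowed to come off an assembly line run without rejecting the entire lot.
where: trials and probability_s are as for BINOMDIST, and alpha is the criterion value.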
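For readers working outside Excel, the behaviour of BINOMDIST and CRITBINOM can be mimicked in a few lines of Python. This is a sketch written from the binomial formula, not Excel's own implementation; `binomdist` and `critbinom` are hypothetical names chosen to mirror the Excel functions.

```python
from math import comb

def binomdist(number_s: int, trials: int, probability_s: float, cumulative: bool) -> float:
    """Python analogue of Excel's BINOMDIST (illustrative only)."""
    def pmf(k: int) -> float:
        return comb(trials, k) * probability_s**k * (1 - probability_s) ** (trials - k)
    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))  # P(at most number_s successes)
    return pmf(number_s)                                 # P(exactly number_s successes)

def critbinom(trials: int, probability_s: float, alpha: float) -> int:
    """Smallest k whose cumulative binomial probability P(X <= k) is >= alpha."""
    running_total = 0.0
    for k in range(trials + 1):
        running_total += binomdist(k, trials, probability_s, False)
        if running_total >= alpha:
            return k
    return trials  # guards against floating-point shortfall when alpha is close to 1

print(round(binomdist(0, 20, 0.01, False), 2))  # 0.82: no carriers in a sample of 20
print(critbinom(3, 0.2, 0.9))                   # 2: P(X <= 1) = 0.896, P(X <= 2) = 0.992
```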
When working with numbers larger than 2 and 3, probability theory has some unexpected results. Many apparently unlikely coincidences are merely the result of large populations.
Why do "coincidences" matter?
Because when you are trying to determine whether an event is statistically significant or not, the "expected" answer can be very misleading: events which might seem very unlikely to occur by chance can do precisely that if enough cases are involved.
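A well-known illustration of this effect, not taken from the original page, is the "birthday problem": how many people must be gathered before it is more likely than not that two of them share a birthday? This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def p_shared_birthday(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for i in range(people):
        # the (i + 1)th person must avoid the i birthdays already taken
        p_all_different *= (days - i) / days
    return 1 - p_all_different

smallest = next(n for n in range(1, 366) if p_shared_birthday(n) >= 0.5)
print(smallest, round(p_shared_birthday(smallest), 3))  # 23 0.507
```

Only 23 people are needed, because the number of possible pairs (253 for 23 people) grows much faster than the number of people.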
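Odds ratios are widely used in the medical literature.
The odds are a way of representing probability: the odds of an event are P/(1 - P), the probability that it happens divided by the probability that it does not, so an event with P = 0.2 has odds of 0.2/0.8, i.e. 1 to 4.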
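Converting between probability and odds, and comparing two groups with an odds ratio, takes only a few lines. This sketch uses exact fractions and the mouse example (P(brown) = 1/5); the second population, in which 40% of mice are brown, is hypothetical and chosen only to illustrate the calculation. The function names are illustrative.

```python
from fractions import Fraction

def odds_from_probability(p: Fraction) -> Fraction:
    """Odds = P / (1 - P): chance the event happens relative to chance it does not."""
    return p / (1 - p)

def probability_from_odds(odds: Fraction) -> Fraction:
    """Inverse conversion: P = odds / (1 + odds)."""
    return odds / (1 + odds)

def odds_ratio(p_group1: Fraction, p_group2: Fraction) -> Fraction:
    """Odds in group 1 divided by odds in group 2 (1 means equal odds)."""
    return odds_from_probability(p_group1) / odds_from_probability(p_group2)

p_brown = Fraction(50, 250)                   # 1/5, from the mouse example
print(odds_from_probability(p_brown))         # 1/4, i.e. odds of 1 to 4
print(probability_from_odds(Fraction(1, 4)))  # 1/5
print(odds_ratio(Fraction(2, 5), p_brown))    # 8/3 (hypothetical 40%-brown population)
```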
Oh yes, I was going to tell you how to win the National Lottery jackpot:
Buy 14,000,000 tickets.
"The best way to get rich from probability theory is to find someone who knows less about it than you do"
© MicrobiologyBytes 2009.