MicrobiologyBytes: Maths & Computers for Biologists: Inferential Statistics (Updated: April 26, 2007)
Further information on this topic can be found in Chapter 10 of: Maths from Scratch for Biologists
Numerical ability is an essential skill for everyone studying the biological sciences but many students are frightened by the 'perceived' difficulty of mathematics, and are nervous about applying mathematical skills in their chosen field of study. Maths from Scratch for Biologists is a highly instructive, informal text that explains step by step how and why you need to tackle maths within the biological sciences. (Amazon.co.UK)
infer: to conclude from evidence
Statistical inference: allows conclusions to be drawn about almost any parameter of a larger population from a sample taken from it (i.e. are conclusions based on the sample valid for the whole population?), or about the differences between populations with regard to any given parameter.
There are two methods of reaching a statistical inference: estimation and hypothesis testing.
In estimation, a sample from a population is studied and an inference is made about the population based on the sample.
The key to estimation is the probability with which particular values will occur during sampling - this allows the inference about the population to be made.
The values which occur are inevitably based on the sampling distribution of the population. The key to making an accurate inference about a population is random sampling, where:
each possible sample of the same size has the same probability of being selected from the population.
In real life, it is often difficult to take truly random samples from a population. Shortcuts are frequently taken, e.g. every third item on a list, or simply the first n results to be obtained. A better method is to use a table of random numbers or the Microsoft Excel RAND function.
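The shortcut-free approach can be sketched in Python. This is a minimal illustration assuming the population is held in a list; `simple_random_sample` and `rand_sort_sample` are hypothetical helper names, and the second function mimics the spreadsheet trick of attaching a RAND() value to every item and sorting on it.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement, so every possible sample of size n
    has the same probability of being selected."""
    rng = random.Random(seed)  # seeded generator for reproducible examples
    return rng.sample(population, n)

def rand_sort_sample(population, n, seed=None):
    """Spreadsheet-style alternative: give each item a random number
    (as RAND() would), sort on that number and keep the first n items."""
    rng = random.Random(seed)
    shuffled = sorted(population, key=lambda _item: rng.random())
    return shuffled[:n]

# Hypothetical population: 100 numbered specimens
specimens = list(range(1, 101))
print(simple_random_sample(specimens, 5, seed=42))
print(rand_sort_sample(specimens, 5, seed=42))
```

Both functions avoid the "every third item" or "first n results" shortcuts, which can bias the sample if the list has any order or pattern.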
Estimation is a relatively crude method of making population inferences. A much better method, and the one which is normally used, is hypothesis testing.
To answer a statistical question, the question is translated into a hypothesis - a statement which can be subjected to test. Depending on the result of the test, the hypothesis is accepted or rejected.
The hypothesis tested is known as the null hypothesis (H0).
This must be a true/false statement.
For every null hypothesis, there is an alternative hypothesis (HA).
Constructing and testing hypotheses is an important skill, but the best way to construct a hypothesis is not necessarily obvious.
The outcome of a hypothesis test is "reject H0" or "do not reject H0". If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true, only that there is insufficient evidence against H0 in favour of HA. Rejecting the null hypothesis suggests that the alternative hypothesis may be true.
In order to decide whether to accept or reject the null hypothesis, the level of significance (α) of the result is used (usually α = 0.05 or α = 0.01). This allows us to state whether or not there is a "significant difference" (technical term!) between populations, i.e. whether any difference between populations is a matter of chance, e.g. due to experimental error, or so small as to be unimportant.
Procedure for hypothesis testing:
Note that:
z scores define the position of a score in relation to the mean using the standard deviation as a unit of measurement.
Standard scores are therefore useful for comparing datapoints in different distributions.
z = (score - mean) / standard deviation
The z-score is the number of standard deviations by which a score lies above (positive z) or below (negative z) the mean.
Since this technique standardizes distributions to a common scale (mean 0, standard deviation 1), z-scores can be used to compare data from different sets, e.g. a student's performance on two different exams (e.g. did Joe Bloggs's performance improve or decline between module 1 and module 2?).
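A minimal Python sketch of the z-score formula. `standardize` mirrors the arithmetic of Excel's STANDARDIZE(x, mean, standard_dev); the use of the population SD (`pstdev`) in `z_scores`, and all of Joe Bloggs's marks and class statistics, are illustrative assumptions rather than data from the text.

```python
from statistics import mean, pstdev

def standardize(score, dist_mean, dist_sd):
    """z = (score - mean) / standard deviation."""
    return (score - dist_mean) / dist_sd

def z_scores(data):
    """Standard score for every value in a dataset, using the dataset's
    own mean and population standard deviation."""
    m, sd = mean(data), pstdev(data)
    return [standardize(x, m, sd) for x in data]

# Hypothetical marks: Joe's raw mark rises from 68 to 72...
z_module1 = standardize(68, 60, 8)  # class mean 60, SD 8
z_module2 = standardize(72, 70, 4)  # class mean 70, SD 4
print(z_module1, z_module2)
```

Here the raw mark goes up, but the z-score falls from 1.0 to 0.5, i.e. relative to the rest of the class Joe's performance declined.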
Note that the z-score is only informative when it refers to a normal distribution - calculating a z-score from a skewed dataset may not produce a meaningful number. Comparing z-scores for different distributions is also meaningless unless:
There are two ways to obtain z-scores (standard scores) using MSExcel:
Warning: The MSExcel ZTEST function and Analysis ToolPak z-Test option do not calculate standard scores! ZTEST returns the one-tailed P-value (probability) of a z-test based on the normal distribution; a two-tailed value has to be calculated from it. This function can be used to test whether a set of observations is likely to have been drawn from a population with a particular mean. The ToolPak z-Test option (a two-sample test for means with known variances) works like a modified version of the t-test.
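To show what a z-test P-value is (as opposed to a standard score), here is a sketch assuming a known population standard deviation `sigma`. To the best of our knowledge the one-tailed function follows the calculation Microsoft documents for ZTEST, and the two-tailed conversion uses 2 × min(P, 1 - P); the function names and the readings are hypothetical.

```python
from statistics import NormalDist, mean

def z_test_one_tailed(data, mu0, sigma):
    """Probability that a sample mean would exceed the observed mean
    if the population mean were mu0 (sigma assumed known)."""
    z = (mean(data) - mu0) / (sigma / len(data) ** 0.5)
    return 1 - NormalDist().cdf(z)

def z_test_two_tailed(data, mu0, sigma):
    """Two-tailed P-value: departures in either direction count."""
    p = z_test_one_tailed(data, mu0, sigma)
    return 2 * min(p, 1 - p)

# Hypothetical readings tested against a population mean of 10, sigma = 2
readings = [11.5, 12.3, 11.9, 12.14]
print(z_test_one_tailed(readings, 10, 2), z_test_two_tailed(readings, 10, 2))
```

The sample mean here is 11.96, which sits 1.96 standard errors above 10, so the one-tailed P-value is about 0.025 and the two-tailed value about 0.05.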
Biological systems are complex, with many different interacting factors.
To compensate for this, the most common experimental design in biology involves comparing experimental results with those obtained under control conditions.
To interpret this type of experiment, we must be able to make objective decisions about the nature of any differences between the experimental and control results - is there a statistically significant difference or are the results due to experimental error or random chance (sampling error)?
A frequently used test of statistical significance is the t-test.
The Student's t-test (or simply t-test) was developed by William Gosset ("Student") in 1908. Gosset was a chemist at the Guinness brewery in Dublin and developed the t-test to ensure that each batch of Guinness was as similar as possible to every other batch! The t-test is used to compare two groups and comes in at least 3 flavours:
Assumptions: The t-test is a parametric test which assumes that the data analyzed:
If you use the t-test under other circumstances, the results will be meaningless! In other situations, non-parametric tests should be used to compare the groups, e.g. the Wilcoxon signed rank test for paired data and the Wilcoxon rank sum test for unpaired data (not covered on this course).

Paired t-test: The paired t-test is used to investigate the relationship between two groups where there is a meaningful one-to-one correspondence between the data points in one group and those in the other, e.g. a variable measured at the same time points under experimental and control conditions. It is NOT sufficient that the two groups simply have the same number of datapoints!
The advantage of the paired t-test is that the procedure involved is fairly simple:
$$t = \frac{d_{av}}{SD/\sqrt{N}}$$
where:
dav is the mean difference, i.e. the sum of the differences between all the pairs of datapoints (set 1 point 1 - set 2 point 1, set 1 point 2 - set 2 point 2, ...) divided by the number of pairs
SD is the standard deviation of the differences between all the pairs
N is the number of pairs.
N.B. The sign of t (+/-) does not matter; assume that t is positive.
df = N - 1 (number of pairs - 1)
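The paired t-test formula translates directly into Python. This is a sketch: `paired_t` is a hypothetical helper, SD is taken as the sample standard deviation of the differences (n - 1 divisor, like Excel's STDEV), and the readings are invented for illustration.

```python
from statistics import mean, stdev

def paired_t(set1, set2):
    """Paired t statistic: t = d_av / (SD / sqrt(N)); returns (t, df)."""
    if len(set1) != len(set2):
        raise ValueError("paired data must have the same number of points")
    diffs = [a - b for a, b in zip(set1, set2)]  # set 1 point i - set 2 point i
    n = len(diffs)
    d_av = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    t = abs(d_av) / (sd / n ** 0.5)  # sign ignored, as in the text
    return t, n - 1

# Hypothetical paired readings at the same time points
control = [10.2, 11.5, 9.8, 12.1, 10.9]
treated = [9.6, 11.0, 9.9, 11.2, 10.1]
t, df = paired_t(control, treated)
print(round(t, 3), df)
```

The resulting t is then compared with the tabulated critical value for df = N - 1 (here 4).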
Groan! Why do we need something as complicated as "degrees of freedom" ?
| Decision | H0 true | H0 false |
|---|---|---|
| Reject H0 | Type I error | Correct |
| Accept H0 | Correct | Type II error |
Falsely rejecting a true H0 is called a type I error (finding an innocent person guilty). The probability of committing a type I error is equal to α.
Failing to reject a false H0 is called a type II error (finding a guilty person innocent). The probability of committing a type II error (β) depends on α, the sample size and the size of the real difference between the populations.
The "power" of a statistical test is the probability of detecting a significant difference when one really exists, i.e. of correctly rejecting a false H0 (power = 1 - β).
As scientists are cautious, it is considered "worse" to make a type I error than a type II error, so we reduce the possibility of making a type I error by having a stringent rejection limit, e.g. 5% (α = 0.05). However, for a given sample size, as we reduce the possibility of making one type of error, we increase the possibility of making the other type.
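The statement that the probability of a type I error equals α can be checked by simulation. This sketch assumes normally distributed data with a known SD, so that a simple two-tailed z-test can be used; the sample size, number of trials and random seed are arbitrary choices, and `type_i_error_rate` is a hypothetical name.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=10, trials=10_000, seed=1):
    """Fraction of simulated experiments in which a TRUE H0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    sigma, mu0 = 1.0, 0.0
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the data really come from N(mu0, sigma)
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

for a in (0.05, 0.01):
    print(a, type_i_error_rate(alpha=a))
```

The rejection rate comes out close to α in each case: a more stringent α produces fewer false rejections.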
Whether you use a one- or two-tailed test depends on your testing hypothesis:
Note that HA states 'there is a difference ....'; it does not state why there is a difference or whether one group is greater or smaller than the other. If HA had specified the direction of the difference, this would have been a one-tailed hypothesis. Since HA does not specify the direction, either a reduction or an increase would count against H0, so this is a two-tailed hypothesis. For a variety of reasons, two-tailed hypotheses are safer than one-tailed ones. Statistical tables are sometimes tabulated only for one-tailed hypotheses; to use them for a two-tailed test, double α (e.g. the one-tailed column for α = 0.025 gives the two-tailed critical values for α = 0.05).
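The "double α" rule can be illustrated with critical values. Python's standard library has no t distribution, so this sketch uses the normal (z) distribution instead; the same relationship holds between the one- and two-tailed columns of a t table. `z_critical` is a hypothetical name.

```python
from statistics import NormalDist

def z_critical(alpha, tails):
    """Critical value of |z| that must be exceeded to reject H0."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between both tails
    return NormalDist().inv_cdf(1 - alpha / tails)

print(round(z_critical(0.05, 1), 3))   # one-tailed, alpha = 0.05
print(round(z_critical(0.05, 2), 3))   # two-tailed, alpha = 0.05
print(round(z_critical(0.025, 1), 3))  # one-tailed, alpha = 0.025
```

The last two lines print the same value (about 1.96), which is why a one-tailed table can be read as a two-tailed one by doubling α.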
Unpaired t-test: The unpaired t-test does not require that the two groups be paired in any way, or even of equal sizes. A typical example might be comparing a variable in two experimental groups of patients, one treated with drug A and one treated with drug B. Such situations are common in medicine where an accepted treatment already exists and it would not be ethical to withhold this from a control group. Here, we wish to know if the differences between the groups are "real" (statistically significant) or could have arisen by chance. The calculations involved in an unpaired t-test are slightly more complicated than for the paired test. Note that the unpaired t-test is equivalent to one-way ANOVA used to test for a difference in means between two groups. To perform an unpaired t-test:

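$$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{SE_A^2 + SE_B^2}}$$
where:
x̄A and x̄B = means of groups A and B, respectively, and SE_A and SE_B = the standard errors of the two means (SE = SD/√n for each group)
df = (nA + nB) - 2
where nA and nB = the number of values in the two groups being compared. Remember that the sign of t (+/-) does not matter; assume that t is positive.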
Consider the data from the following experiment. A total of 12 readings were taken, 6 under control and 6 under experimental conditions:
| | Experimental | Control |
|---|---|---|
| | 11.2 | 10.3 |
| | 13.1 | 12.6 |
| | 9.3 | 8.4 |
| | 10.2 | 9.3 |
| | 9.6 | 7.8 |
| | 9.8 | 8.9 |
| Mean | 10.53 | 9.55 |
| VARP | 1.68 | 2.46 |
| SD | 1.42 | 1.72 |
| SE (SD/√n) | 0.58 | 0.70 |
Yes, approximately:
$$t = \frac{10.53 - 9.55}{\sqrt{0.58^2 + 0.70^2}} \approx 1.08$$
df = (nA - 1) + (nB - 1) = 10
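| α (one tail) | 0.250 | 0.100 | 0.050 | 0.025 | 0.010 | 0.005 |
|---|---|---|---|---|---|---|
| α (two tails) | 0.500 | 0.200 | 0.100 | 0.050 | 0.020 | 0.010 |
| df = 10 | 0.700 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 |

Critical values of t for df = 10. For the difference to be significant, the calculated t must be greater than the tabulated value for the chosen α (e.g. 2.228 for a two-tailed test at α = 0.05).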
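The worked example can be reproduced in Python. This sketch uses the pooled-variance form of Student's t (the form that goes with df = nA + nB - 2); with equal group sizes, as here, it gives the same value as dividing by the combined standard errors of the two means. `unpaired_t` is a hypothetical name.

```python
from statistics import mean, variance

def unpaired_t(a, b):
    """Student's unpaired t using a pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se_diff = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    t = abs(mean(a) - mean(b)) / se_diff  # sign ignored, as in the text
    return t, df

experimental = [11.2, 13.1, 9.3, 10.2, 9.6, 9.8]
control = [10.3, 12.6, 8.4, 9.3, 7.8, 8.9]
t, df = unpaired_t(experimental, control)
print(round(t, 2), df)
```

Since t ≈ 1.08 is smaller than the two-tailed critical value of 2.228 for df = 10 at α = 0.05, these data give no evidence of a significant difference between the experimental and control conditions.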
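TTEST(array1,array2,tails,type) where tails is 1 (one-tailed) or 2 (two-tailed), and type is 1 (paired), 2 (two-sample, equal variance) or 3 (two-sample, unequal variance). NB: The TTEST function does not return a value for the t statistic. Instead it returns a P-value (ranging from 0 to 1): the probability of obtaining a difference between the means at least as large as the one observed if the null hypothesis (i.e. that there is no significant difference between the means of the two groups) were true. Possibly a better way of performing a t-test is to use the Excel Analysis ToolPak.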
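Without Excel, the two-tailed P-value for a given t and df can be approximated numerically. This is a sketch, not Excel's algorithm: it integrates the Student's t density with Simpson's rule, and `t_pdf` and `t_two_tailed_p` are hypothetical names. Because it uses math.gamma, it overflows for very large df (above roughly 340), where the normal distribution would be used instead.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value = 1 - 2 * (area under the pdf from 0 to |t|)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area

print(round(t_two_tailed_p(2.228, 10), 3))  # tabulated two-tailed 0.05 value
print(round(t_two_tailed_p(1.08, 10), 3))   # t from the unpaired example
```

A P-value below the chosen α (e.g. 0.05) corresponds to a calculated t larger than the tabulated critical value.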
Student's t-test can only be used for comparison of two groups. Although it is possible to perform many pairwise comparisons to analyze all the possible combinations involving more than two groups, this is undesirable because: a) it is tedious, and b) it increases the possibility of type I errors. However, ANOVA can compare two or more groups:
Assumptions of ANOVA: ANOVA is a parametric test which assumes that the data analyzed:
Sir Ronald Fisher (1890-1962) developed the technique of ANOVA, which comes in various flavours:
The F (Fisher) ratio compares the variance within sample groups ("inherent variance") with the variance between groups ("treatment effect") and is the basis for ANOVA:
F = variance between groups / variance within sample groups
FTEST(array1,array2) Use the online Help to find out more! The Excel FTEST function returns the result of an F-test: the two-tailed probability (P-value) for the null hypothesis that the variances in array1 and array2 are not significantly different. Use this function to determine whether two samples have different variances, e.g. parameters for bird populations from arable and mixed farms, to show whether they have different levels of diversity. The F-test is also one of the tools available in the Excel Analysis ToolPak.
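The statistic behind an F-test of two variances is simply their ratio. This sketch computes it for the experimental and control readings from the t-test example; `variance_ratio` is a hypothetical name, and because the standard library has no F distribution it does not produce the probability that FTEST returns. The ratio would instead be compared with a tabulated critical F for (nA - 1, nB - 1) degrees of freedom.

```python
from statistics import variance

def variance_ratio(a, b):
    """F = larger sample variance / smaller sample variance (so F >= 1)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

experimental = [11.2, 13.1, 9.3, 10.2, 9.6, 9.8]
control = [10.3, 12.6, 8.4, 9.3, 7.8, 8.9]
print(round(variance_ratio(experimental, control), 2))
```

Putting the larger variance on top keeps F at or above 1, which matches the usual layout of one-tailed F tables.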
You can perform various types of ANOVA analysis using the Excel Analysis ToolPak:
"Pain Score" for 3 Analgesics:

| Aspirin | Paracetamol (Acetaminophen) | Ibuprofen | Control (no drug) |
|---|---|---|---|
| 5 | 4 | 4 | 5 |
| 4 | 4 | 4 | 5 |
| 5 | 3 | 5 | 5 |
| 3 | 4 | 3 | 4 |
| 5 | 5 | 3 | 5 |
| 5 | 3 | 5 | 5 |
| 4 | 4 | 3 | 5 |
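A one-way ANOVA on the pain-score table (read column by column) can be sketched as follows, treating the scores as if they met ANOVA's assumptions. `one_way_anova` is a hypothetical name; it computes F as the variance (mean square) between groups divided by the variance within groups, as in the F ratio described earlier.

```python
def one_way_anova(groups):
    """One-way ANOVA: F = MS between groups / MS within groups."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

pain = {
    "Aspirin": [5, 4, 5, 3, 5, 5, 4],
    "Paracetamol": [4, 4, 3, 4, 5, 3, 4],
    "Ibuprofen": [4, 4, 5, 3, 3, 5, 3],
    "Control": [5, 5, 5, 4, 5, 5, 5],
}
f, df1, df2 = one_way_anova(list(pain.values()))
print(round(f, 2), df1, df2)
```

The calculated F is compared with the critical F for (3, 24) degrees of freedom at the chosen α.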
Apple codling moth (Cydia pomonella) caught in pheromone traps:

| | Bait 1 | Bait 2 |
|---|---|---|
| Orchard 1 | 19 | 20 |
| | 22 | 22 |
| | 19 | 18 |
| | 18 | 19 |
| | 20 | 19 |
| | 21 | 20 |
| Orchard 2 | 22 | 21 |
| | 19 | 19 |
| | 19 | 18 |
| | 18 | 18 |
| | 20 | 20 |
| | 21 | 22 |
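The layout of this table (orchard × bait, six traps per combination) matches a two-factor ANOVA with replication, one of the ANOVA types in the Analysis ToolPak. The following is a hedged sketch of that calculation, assuming equal replication in every cell; `two_way_anova_replicated` is a hypothetical name, and the critical F values still have to be looked up.

```python
def two_way_anova_replicated(cells):
    """Two-factor ANOVA with equal replication.

    cells[i][j] holds the replicate values for level i of factor A
    (e.g. orchard) and level j of factor B (e.g. bait).
    Returns (F, df) for A, B and the AxB interaction, plus the within-cell df.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    values = [x for row in cells for cell in row for x in cell]
    grand = sum(values) / len(values)
    cell_mean = [[sum(cell) / n for cell in row] for row in cells]
    mean_a = [sum(cell_mean[i]) / b for i in range(a)]
    mean_b = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]

    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_cells = n * sum((cell_mean[i][j] - grand) ** 2
                       for i in range(a) for j in range(b))
    ss_ab = ss_cells - ss_a - ss_b  # interaction sum of squares
    ss_within = sum((x - cell_mean[i][j]) ** 2
                    for i in range(a) for j in range(b) for x in cells[i][j])

    df_a, df_b = a - 1, b - 1
    df_ab, df_within = df_a * df_b, a * b * (n - 1)
    ms_within = ss_within / df_within
    return {
        "A": (ss_a / df_a / ms_within, df_a),
        "B": (ss_b / df_b / ms_within, df_b),
        "AxB": (ss_ab / df_ab / ms_within, df_ab),
        "df_within": df_within,
    }

cells = [
    [[19, 22, 19, 18, 20, 21], [20, 22, 18, 19, 19, 20]],  # Orchard 1: bait 1, bait 2
    [[22, 19, 19, 18, 20, 21], [21, 19, 18, 18, 20, 22]],  # Orchard 2: bait 1, bait 2
]
for source, result in two_way_anova_replicated(cells).items():
    print(source, result)
```

Both orchards caught exactly the same total number of moths (237), so the orchard F ratio is zero; the bait F ratio is very small (about 0.08).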
© MicrobiologyBytes 2007.