MicrobiologyBytes: Maths & Computers for Biologists: Inferential Statistics Updated: February 6, 2009

Inferential Statistics
- Comparing Groups I

Further information on this topic can be found in Chapter 10 of:

Maths from Scratch for Biologists

Numerical ability is an essential skill for everyone studying the biological sciences but many students are frightened by the 'perceived' difficulty of mathematics, and are nervous about applying mathematical skills in their chosen field of study. Maths from Scratch for Biologists is a highly instructive, informal text that explains step by step how and why you need to tackle maths within the biological sciences. (Amazon.co.UK)

infer:     to conclude from evidence

Statistical Inference:

allows the formation of conclusions about almost any parameter from a sample taken from a larger population

(i.e. are conclusions based on the sample valid for the whole population?)


allows the formation of conclusions about the difference between populations with regard to any given parameter.



BUT:   (San Francisco Chronicle, October 27th, 1993)

  • When Elvis Presley died in 1977, there were an estimated 37 Elvis impersonators in the world.
  • By 1993, there were 48,000 Elvis impersonators, an exponential increase.
  • Extrapolating from this, by 2010 there will be 2.5 billion Elvis impersonators.
  • The population of the world will be 7.5 billion by 2010.
  • Every 3rd person will be an Elvis impersonator by 2010.

    or will they?

There are two methods of reaching a statistical inference:

a) Estimation

In estimation, a sample from a population is studied and an inference is made about the population based on the sample.

The key to estimation is the probability with which particular values will occur during sampling - this allows the inference about the population to be made.

The values which occur are inevitably based on the sampling distribution of the population. The key to making an accurate inference about a population is random sampling, where:

each possible sample of the same size has the same probability of being selected from the population.

In real life, it is often difficult to take truly random samples from a population. Shortcuts are frequently taken, e.g. every third item on a list, or simply the first n results to be obtained. A better method is to use a table of random numbers or the Microsoft Excel RAND function.
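
As a rough illustration of the random sampling described above, the sketch below draws a simple random sample without replacement using Python's standard library rather than Excel's RAND. The population (the numbers 1 to 100), the sample size and the helper name simple_random_sample are assumptions made up for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement.

    random.sample gives every possible sample of size n the same
    probability of being selected, which is the definition of random
    sampling used in the text.
    """
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(population, n)

# Hypothetical population: items numbered 1..100 on a list
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Contrast this with "every third item on a list", where most possible samples can never be selected at all.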

Estimation is a relatively crude method of making population inferences. A much better method and the one which is normally used is:

b) Hypothesis Testing

To answer a statistical question, the question is translated into a hypothesis - a statement which can be subjected to test. Depending on the result of the test, the hypothesis is accepted or rejected.

The hypothesis tested is known as the null hypothesis (H0). This must be a true/false statement.
For every null hypothesis, there is an alternative hypothesis (HA).

Constructing and testing hypotheses is an important skill, but the best way to construct a hypothesis is not necessarily obvious:

The outcome of hypothesis testing is "reject H0" or "do not reject H0". If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true, only that there is insufficient evidence against H0 in favour of HA. Rejecting the null hypothesis suggests that the alternative hypothesis may be true.

In order to decide whether to accept or reject the null hypothesis, the level of significance (α) of the result is used (α = 0.05 or α = 0.01).
This allows us to state whether or not there is a "significant difference" (technical term!) between populations, i.e. whether any difference between populations is a matter of chance, e.g. due to experimental error, or so small as to be unimportant.

Procedure for hypothesis testing:

  1. Define H0 and HA, based on the guidelines given above.
  2. Choose a value for α. Note that this should be done before performing the test, not when looking at the result!
  3. Calculate the value of the test statistic.
  4. Compare the calculated value with a table of the critical values of the test statistic.
  5. If the absolute (calculated) value of the test statistic is LESS THAN the critical value from the table, do not reject ("accept") the null hypothesis (H0).
    If the absolute (calculated) value of the test statistic is GREATER THAN or EQUAL to the critical value from the table, reject the null hypothesis (H0) and accept the alternative hypothesis (HA).
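
Steps 4 and 5 can be sketched in code. This is a minimal sketch that assumes the test statistic follows a standard normal distribution (a z statistic), so that the critical value can be computed with Python's statistics.NormalDist instead of being read from a table. The function names critical_z and decide are invented for this illustration.

```python
from statistics import NormalDist

def critical_z(alpha, tails=2):
    """Critical value of a standard normal test statistic.

    For a two-tailed test alpha is split between both tails.
    """
    return NormalDist().inv_cdf(1 - alpha / tails)

def decide(statistic, critical):
    """Step 5: compare the absolute test statistic with the critical value.

    Using the absolute value follows the rule in the text; for a
    one-tailed test the direction of the difference would also matter.
    """
    if abs(statistic) >= critical:
        return "reject H0"
    return "do not reject H0"

alpha = 0.05                      # chosen BEFORE looking at the result
z_crit = critical_z(alpha)        # about 1.96 for a two-tailed test
print(round(z_crit, 2), decide(2.5, z_crit), decide(-1.2, z_crit))
```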

Note that:

Standard Scores (z Scores)

z scores define the position of a score in relation to the mean using the standard deviation as a unit of measurement.

Standard scores are therefore useful for comparing datapoints in different distributions.


z = (score - mean) / standard deviation


The z-score is the number of standard deviations by which a score departs from the mean of its distribution.

Since this technique normalizes distributions, z-scores can be used to compare data from different sets, e.g. a student's performance on two different exams (e.g. did Joe Bloggs' performance on module 1 and module 2 improve or decline?):

Note that the z-score is only informative when it refers to a normal distribution - calculating a z-score from a skewed dataset may not produce a meaningful number. Comparing z-scores for different distributions is also meaningless unless:



There are two ways to obtain z-scores (standard scores) using MSExcel:

  1. Perform the calculations using the spreadsheet: z = (score - mean) / standard deviation
  2. Use the STANDARDIZE function which performs this calculation for you, but assumes the mean and standard deviation are known
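
The same calculation can be sketched outside a spreadsheet. The example below assumes made-up exam marks and class statistics (they are not data from this course) and uses the population standard deviation; the function name z_score is also an assumption for illustration.

```python
from statistics import mean, pstdev

def z_score(score, mu, sigma):
    """z = (score - mean) / standard deviation."""
    return (score - mu) / sigma

# Hypothetical marks for one student on two modules, with hypothetical
# class means and standard deviations for each module.
z_module1 = z_score(65, 60, 10)   # half a standard deviation above the mean
z_module2 = z_score(70, 75, 5)    # one standard deviation below the mean

# z-score computed from a (hypothetical) dataset rather than known parameters
data = [2, 4, 4, 4, 5, 5, 7, 9]
z_of_9 = z_score(9, mean(data), pstdev(data))

print(z_module1, z_module2, z_of_9)
```

Although the raw mark went up from 65 to 70 in this invented example, the z-score fell, i.e. performance relative to the rest of the class declined.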

Warning: The MSExcel ZTEST function and Analysis ToolPak Z-test option do not calculate standard scores! ZTEST gives the one-tailed P-value (probability) for a normal distribution. This function can be used to test if a particular observation is drawn from a certain population. The ToolPak Z-test option is a modified version of the t-test.


Comparing Two Populations

Biological systems are complex, with many different interacting factors.

To compensate for this, the most common experimental design in biology involves comparing experimental results with those obtained under control conditions.

To interpret this type of experiment, we must be able to make objective decisions about the nature of any differences between the experimental and control results - is there a statistically significant difference or are the results due to experimental error or random chance (sampling error)?

A frequently used test of statistical significance is the:

Student's t-test (t-test)

The Student's t-test (or simply t-test) was developed by William Sealy Gosset ("Student") in 1908. Gosset was a chemist at the Guinness brewery in Dublin and developed the t-test to ensure that each batch of Guinness was as similar as possible to every other batch! The t-test is used to compare two groups and comes in at least 3 flavours:


The t-test is a parametric test which assumes that the data analyzed:

  1. Are continuous, interval data comprising a whole population or sampled randomly from a population.
  2. Have a normal distribution.
  3. Have similar variances in the two groups if n<30 (t-tests can be used to compare groups with different variances if n>30).
  4. Come from groups whose sample sizes do not differ hugely.

If you use the t-test under other circumstances, the results will be meaningless!

In other situations, non-parametric tests should be used to compare the groups, e.g. the Wilcoxon signed rank test for paired data and the Wilcoxon rank sum test for unpaired data (not covered on this course).
To compare three or more groups, other tests must be used, e.g. ANOVA (below).

Student's t distribution

Paired t-test: The paired t-test is used to investigate the relationship between two groups where there is a meaningful one-to-one correspondence between the data points in one group and those in the other, e.g. a variable measured at the same time points under experimental and control conditions. It is NOT sufficient that the two groups simply have the same number of datapoints!

The advantage of the paired t-test is that the procedure involved is fairly simple:

  1. Start with the hypothesis (H0) that the mean of each group is equal (HA: the means are not equal). How do we test this? By considering the variance (standard deviation) of each group.
  2. Set a value for α (significance level, e.g. 0.05).
  3. Calculate the difference for each pair (i.e. the variable measured at the same time point under experimental and control conditions).
  4. Plot a histogram of the differences between data pairs to confirm that they are normally distributed - if not, STOP!
  5. Calculate the mean of all the differences between pairs (dav) and the standard deviation of the differences (SD)
  6. The value of t can then be calculated from the following formula:

t = dav / (SD / √N)

dav is the mean difference, i.e. the sum of the differences of all the datapoints (set 1 point 1 - set 2 point 1, set 1 point 2 - set 2 point 2, ...) divided by the number of pairs
SD is the standard deviation of the differences between all the pairs
N is the number of pairs.

N.B. The sign of t (+/-) does not matter, assume that t is positive.

  7. The significance value attached to the resulting value of t can be looked up in a table of the t distribution (or obtained from appropriate software). To do this, you need to know the "degrees of freedom" (df) for the test. The degrees of freedom take account of the number of independent observations used in the calculation of the test statistic and are needed to find the correct value in a probability table. For a paired t-test:

df = N - 1   (number of pairs - 1)

Groan! Why do we need something as complicated as "degrees of freedom" ?

  8. To look up t, you also need to determine whether you are performing a one-tailed or two-tailed test. In any statistical test we can never be 100% sure that we have to reject (or accept) the null hypothesis. There is, therefore, the possibility of making an error:

                      H0 is true          H0 is false
Reject H0             Type I error        Correct decision
Do not reject H0      Correct decision    Type II error

Falsely rejecting a true H0 is called a type I error (finding an innocent person guilty). The probability of committing a type I error is always equal to α.
Failure to reject a false H0 is called a type II error (finding a guilty person innocent). The probability of committing a type II error (β) is the probability of retaining a false H0.
The "power" of a statistical test refers to the probability of claiming that there is a significant difference when this is true, i.e. 1 - β.
As scientists are cautious, it is considered "worse" to make a type I error than a type II error - we thus reduce the possibility of making a type I error by having a stringent rejection limit, e.g. 5%. However, as we reduce the possibility of making one type of error, we increase the possibility of making the other type.
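
The relationship between α, type I errors and power can be checked by simulation. The sketch below is only an illustration under stated assumptions: samples are drawn from a normal population with a known standard deviation of 1, H0 states that the population mean is 0, and a two-tailed z-test is used. The sample size, number of trials, true mean of 0.5 and the name rejection_rate are all invented for this example.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, n=25, trials=4000, alpha=0.05, seed=1):
    """Fraction of simulated experiments in which H0 (mean = 0) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1.0 / n ** 0.5)         # sigma = 1 assumed known
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a type I error, so the rate should be near alpha
type_i_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean 0.5): the rejection rate estimates the power (1 - beta)
power = rejection_rate(true_mean=0.5)

print(type_i_rate, power)
```

Re-running with a smaller alpha (e.g. 0.01) should lower the type I error rate and also lower the power, which is the trade-off described above.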
Whether you use a one- or two-tailed test depends on your testing hypothesis:

Critical values of t for Student's t distribution

  9. If the calculated value of t is greater than the critical value, H0 is rejected, i.e. there is evidence of a statistically significant difference between the groups.
    If the calculated value of t is less than the critical value, H0 is not rejected, i.e. there is no evidence of a statistically significant difference between the two groups.
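
The paired procedure can be sketched as follows. The control and experimental readings are hypothetical numbers invented for illustration, the function name paired_t is an assumption, and the normality check (plotting a histogram of the differences) is not shown. The critical value is the standard two-tailed table value for α = 0.05 and 5 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(group1, group2):
    """Return (|t|, df) for a paired t-test: t = dav / (SD / sqrt(N))."""
    if len(group1) != len(group2):
        raise ValueError("paired data need the same number of points")
    diffs = [b - a for a, b in zip(group1, group2)]  # one difference per pair
    n = len(diffs)
    d_av = mean(diffs)        # mean of the differences
    sd = stdev(diffs)         # sample standard deviation of the differences
    t = d_av / (sd / sqrt(n))
    return abs(t), n - 1      # the sign of t does not matter

# Hypothetical readings at the same six time points
control = [10, 12, 9, 11, 13, 10]
experimental = [12, 15, 10, 14, 15, 11]

t_value, df = paired_t(control, experimental)
t_critical = 2.571            # two-tailed, alpha = 0.05, df = 5 (from a t table)

verdict = "reject H0" if t_value >= t_critical else "do not reject H0"
print(round(t_value, 3), df, verdict)
```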


Unpaired t-test: The unpaired t-test does not require that the two groups be paired in any way, or even of equal sizes. A typical example might be comparing a variable in two experimental groups of patients, one treated with drug A and one treated with drug B. Such situations are common in medicine where an accepted treatment already exists and it would not be ethical to withhold this from a control group. Here, we wish to know if the differences between the groups are "real" (statistically significant) or could have arisen by chance. The calculations involved in an unpaired t-test are slightly more complicated than for the paired test. Note that the unpaired t-test is equivalent to one-way ANOVA used to test for a difference in means between two groups. To perform an unpaired t-test:

  1. Plot a histogram of the datasets to confirm that they are normally distributed - if not, STOP and use a non-parametric method!

  2. Start with the hypothesis (H0) "There is no difference between the populations of measurements from which samples have been drawn". (HA: There is a difference).
  3. Set a value for α (significance level, e.g. 0.05).
  4. Check the variance of each group. If the variances are very different, you can already reject the hypothesis that the samples are from a common population, even though their means might be similar. However, it is still possible to go on and perform a t-test.
  5. Calculate t using the formula:

t = (x̄A - x̄B) / SE

where x̄A and x̄B = the means of groups A and B, respectively, and
SE = the standard error of the difference between the means

  6. Look up the value of t for the correct number of degrees of freedom and for a one- or two-tailed test (above).
    For an unpaired t-test:

df = (nA + nB) - 2

where nA/B = the number of values in the two groups being compared. Remember that the sign of t (+/-) does not matter - assume that t is positive.

  7. If the calculated value of t is greater than the critical value, H0 is rejected, i.e. there is evidence of a statistically significant difference between the groups.
    If the calculated value of t is less than the critical value, H0 is not rejected, i.e. there is no evidence of a statistically significant difference between the groups.
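
The unpaired calculation can be sketched in the same way. This is an assumption-laden sketch: it uses the pooled-variance form of the standard error, which is the usual companion of df = (nA + nB) - 2, but the exact expression used on the original page is not recoverable here. The two groups of readings and the function name unpaired_t are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def unpaired_t(group_a, group_b):
    """Return (|t|, df) for an equal-variance (pooled) unpaired t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = sqrt(pooled * (1 / n_a + 1 / n_b))   # standard error of the difference
    t = (mean(group_a) - mean(group_b)) / se
    return abs(t), df                         # the sign of t does not matter

# Hypothetical readings (NOT the data from the worked example on this page)
group_a = [5, 7, 6, 8, 9, 7]
group_b = [9, 11, 10, 12, 13, 11]

t_value, df = unpaired_t(group_a, group_b)
t_critical = 2.228            # two-tailed, alpha = 0.05, df = 10 (from a t table)

verdict = "reject H0" if t_value >= t_critical else "do not reject H0"
print(round(t_value, 3), df, verdict)
```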



Consider the data from the following experiment. A total of 12 readings were taken, 6 under control and 6 under experimental conditions:


Group A:

Group B:

Yes, approximately:


df = (nA-1)+(nB-1) = 10

α: One Tail:
α: Two Tails:


Microsoft Excel offers two ways of performing a t-test:

  • TTEST worksheet function:

TTEST(array1,array2,tails,type)

array1,array2 are the datasets being compared
tails = 1 or 2 tailed test
type = 1 (paired);   2 (two-sample equal variance);   3 (two-sample unequal variance)

NB: The TTEST function does not return a value for the t statistic, but instead calculates a probability value (P, ranging from 0-1): the probability of obtaining a difference at least as large as that observed if the null hypothesis (i.e. that there is no difference between the means of the two groups) were true.

Possibly a better way of performing a t-test is to use the Excel Analysis ToolPak:

  • Unpaired t-test: Two-Sample Assuming Equal Variances
  • Unpaired t-test: Two-Sample Assuming Unequal Variances
  • Paired t-test: Paired Two Sample For Means
  • Use the online Help to find out more about these
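
For readers curious about what the "two-sample unequal variance" option (TTEST type 3) involves, the sketch below computes the t statistic and approximate degrees of freedom of Welch's t-test, the usual name for the unequal-variance form. The data and the function name welch_t are assumptions made up for this example, and no P-value is computed because the Python standard library has no t distribution. With SciPy installed, scipy.stats.ttest_ind(a, b, equal_var=False) should give both the statistic and a P-value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    df uses the Welch-Satterthwaite approximation and is usually
    not a whole number.
    """
    n_a, n_b = len(group_a), len(group_b)
    va = variance(group_a) / n_a      # squared standard error of mean A
    vb = variance(group_b) / n_b      # squared standard error of mean B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

# Hypothetical groups with different sizes and clearly different variances
group_a = [5, 7, 6, 8, 9, 7]
group_b = [10, 14, 8, 16, 12]

t_value, df = welch_t(group_a, group_b)
print(round(t_value, 3), round(df, 2))
```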


ANalysis Of VAriance: ANOVA

Student's t-test can only be used for comparison of two groups. Although it is possible to perform many pairwise comparisons to analyze all the possible combinations involving more than two groups, this is undesirable because: a) it is tedious, and b) it increases the possibility of type I errors. However, ANOVA can compare two or more groups:
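
Point b) can be made concrete with a little arithmetic. The sketch below counts the pairwise comparisons needed for a given number of groups and estimates the chance of at least one type I error across them. It assumes the comparisons are independent, which is only an approximation for pairwise t-tests on the same groups, and the function names are invented for illustration.

```python
from math import comb

def pairwise_comparisons(groups):
    """Number of distinct pairs among the groups."""
    return comb(groups, 2)

def familywise_error(alpha, comparisons):
    """Probability of at least one type I error in a set of tests,
    assuming (as an approximation) that the tests are independent."""
    return 1 - (1 - alpha) ** comparisons

k = pairwise_comparisons(4)            # 4 groups need 6 separate t-tests
risk = familywise_error(0.05, k)       # roughly 0.26 rather than 0.05
print(k, round(risk, 3))
```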

Assumptions of ANOVA:

ANOVA is a parametric test which assumes that the data analyzed:

  1. Are continuous, interval data comprising a whole population or sampled randomly from a population.
  2. Have a normal distribution. Moderate departure from the normal distribution does not unduly disturb the outcome of ANOVA, especially as sample sizes increase. Highly skewed datasets result in inaccurate conclusions.
  3. Come from groups that are independent of each other.
  4. Have similar variances in all the groups.
  5. For two-way ANOVA, come from groups of equal sample size (for one-way ANOVA, sample sizes need not be equal, but should not differ hugely between the groups).

Sir Ronald Fisher (1890-1962) developed the technique of ANOVA, which comes in various flavours:

The F ratio

The F (Fisher) ratio compares the variance within sample groups ("inherent variance") with the variance between groups ("treatment effect") and is the basis for ANOVA:

F = variance between groups / variance within sample groups
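
The F ratio for a one-way ANOVA can be sketched as below, using the standard definitions of the between-group and within-group mean squares. The three small groups are hypothetical numbers chosen to keep the arithmetic easy, and the function name f_ratio is an assumption for this example.

```python
from statistics import mean

def f_ratio(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)                    # number of groups
    n_total = len(all_values)

    # Between-group sum of squares ("treatment effect")
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares ("inherent variance")
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical measurements for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = f_ratio(groups)
print(round(f_value, 2), df_b, df_w)
```

A large F means the group means differ by more than the scatter within the groups would suggest; as with t, the calculated value is compared with a critical value for the two degrees of freedom.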



Use the online Help to find out more!

The Excel FTEST function returns the result of an F-test - the probability that the variances in array1 and array2 are not significantly different. Use this function to determine whether two samples have different variances, e.g. parameters for bird populations from arable and mixed farms to show whether they have different levels of diversity.

The F-test is also one of the tools available in the Excel Analysis ToolPak.


You can perform various types of ANOVA analysis using the Excel Analysis ToolPak:

One-way ("Single Factor") ANOVA:

"Pain Score" for 3 Analgesics:
Aspirin: Paracetamol (Acetaminophen): Ibuprofen: Control (no drug):
Excel Analysis ToolPak One-Way ANOVA



Two-way ANOVA ("Two-factor without replication"):

Apple codling moth (Cydia pomonella) caught in pheromone traps:
Bait 1:
Bait 2:
Orchard 1:
Orchard 2:

Excel Analysis ToolPak Two-Way ANOVA

© MicrobiologyBytes 2009.