MicrobiologyBytes: Maths & Computers for Biologists: Inferential Statistics | Updated: February 6, 2009
Further information on this topic can be found in Chapter 10 of: Maths from Scratch for Biologists
χ²-test (chi-squared test)
This is an example of a non-parametric test. Unlike Student's t-test, it makes no assumptions about the distribution of the data. χ² (pronounced "kye-squared") is used when the data consist of nominal or ordinal variables rather than quantitative variables, i.e. when we are interested in how many members fall into given descriptive categories (not in quantitative measurements such as weight). The χ² test of independence asks "Are two variables of interest independent (not related) or related (dependent)?" and deals with integers: the number of individuals which fall into different, mutually exclusive categories. The test investigates whether the proportions of certain categories are different in different groups. When the variables are independent, knowledge of one variable gives no information about the other variable.
The χ² test is by default one-tailed and can only be carried out on raw data (counts), not on percentages, proportions or other derived data.
As with Student's t frequency distribution, you don't need to know the formula for the χ² probability density function. Simply calculate the value of χ² and look it up in a statistical table.
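The basis of the χ² test is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency in each category, and: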
H0: observed frequency - expected frequency = 0
(there is no difference between the two groups)
HA: observed frequency - expected frequency does not equal 0
(there is a difference between the two groups)
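As a minimal sketch of the statistic behind these hypotheses, assuming the usual form in which each squared difference between an observed and an expected count is divided by the expected count and the results are summed (the function name `chi_squared` is purely illustrative):

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Identical observed and expected counts give a statistic of zero;
# larger values mean a bigger departure from expectation.
print(chi_squared([10, 20, 30], [10, 20, 30]))
```

The larger the statistic, the less plausible H0 becomes for a given number of degrees of freedom.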
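The χ² test has two main uses:
1. To test whether two categorical variables are independent of each other (a test of independence, using a contingency table).
2. To compare an observed distribution with a theoretically expected one.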
Assumptions: The χ² test is a non-parametric test which assumes that the data analyzed:
If you use the χ² test under other circumstances, the results will be meaningless! IMPORTANT:
Example:
Of 120 male and 100 female applicants to university, 90 male and 40
female had work experience.
Does the gender of an applicant to university correspond to whether or not they
have prior work experience?
| Gender of applicant | Work experience: Yes | Work experience: No | Total |
|---|---|---|---|
| Male | 90 | 30 | 120 |
| Female | 40 | 60 | 100 |
| Total | 130 | 90 | 220 |

| Gender of applicant | Work experience: Yes | Work experience: No | Total |
|---|---|---|---|
| Male | a | b | a+b |
| Female | c | d | c+d |
| Total | a+c | b+d | n |
df = (number of columns-1) * (number of rows-1)
For the above test, df = (2-1) * (2-1) = 1
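For a 2 × 2 table laid out with cells a, b, c and d as above, a widely used shortcut computes the statistic directly from the four counts: χ² = n(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)]. The sketch below assumes this is the formula intended here and applies it to the applicant data. The function name is illustrative and no continuity correction is applied.

```python
def chi_squared_2x2(a, b, c, d):
    """Shortcut chi-squared statistic for a 2 x 2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Applicant data: 90 and 30 males, 40 and 60 females (with / without work experience)
stat = chi_squared_2x2(90, 30, 40, 60)
print(round(stat, 2))  # 27.64
```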
| df | α = 0.995 | 0.99 | 0.975 | 0.95 | 0.9 | 0.1 | 0.05 | 0.025 | 0.01 | 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
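Looking up the table amounts to comparing the calculated statistic with the critical value for the chosen α. As a rough sketch, the tabulated value for df = 1 and α = 0.05 (3.841) can be hard-coded. For one degree of freedom the upper-tail probability can also be obtained from the standard library as erfc(√(χ²/2)). The helper names are illustrative. The example statistic is the value for the applicant data, about 27.6 when the expected counts are not rounded.

```python
import math

CRITICAL_DF1_ALPHA_005 = 3.841  # from the table above

def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

stat = 27.64
reject_h0 = stat > CRITICAL_DF1_ALPHA_005
print(reject_h0)           # True
print(p_value_df1(3.841))  # close to 0.05, as the table implies
```

Because the statistic is far above 3.841, H0 is rejected at the 5% level: for these applicants, gender and prior work experience are not independent.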
ALTERNATIVE METHOD: χ² calculation using observed and expected values:
| | Observed: Yes | Observed: No | Expected: Yes | Expected: No | O - E: Yes | O - E: No | (O - E)² / E: Yes | (O - E)² / E: No |
|---|---|---|---|---|---|---|---|---|
| Male | 90 | 30 | 71 | 49 | 19 | -19 | 5.1 | 7.4 |
| Female | 40 | 60 | 59 | 41 | -19 | 19 | 6.1 | 8.8 |
| Total | 130 | 90 | 130 | 90 | 0 | 0 | 11.2 | 16.2 |

Summing the four contributions gives χ² = 11.2 + 16.2 = 27.4.
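The expected counts in the table come from the marginal totals: each is (row total × column total) / grand total. A minimal sketch of the whole calculation for a table of any size follows. The function name is illustrative, and because the expected counts are kept unrounded the result differs slightly from the sum of the rounded values in the table.

```python
def chi_squared_independence(table):
    """Chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = chi_squared_independence([[90, 30], [40, 60]])
print(round(stat, 2), df)  # 27.64 1
```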
The advantage of this method is that it can be applied to problems where there are more than two groups.
Using the method of observed and expected values we can also use the χ² test to compare an observed distribution with a theoretically expected one.
Example:
| Colour | Observed | Expected from genetic theory |
|---|---|---|
| White | 380 | 51% |
| Brown | 330 | 40.8% |
| Black | 74 | 8.2% |
| Colour | Observed | Theoretical proportion | Expected | O - E | (O - E)² / E |
|---|---|---|---|---|---|
| White | 380 | 0.510 | 400 (0.510 × 784) | -20 | 1.0 |
| Brown | 330 | 0.408 | 320 (0.408 × 784) | 10 | 0.3125 |
| Black | 74 | 0.082 | 64 (0.082 × 784) | 10 | 1.5625 |
| Total | 784 | 1.0 | 784 | 0 | 2.8750 |
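The calculation in the table can be sketched as follows. The expected counts are rounded to whole numbers to match the table, which is an assumption made here only so that the result agrees with the worked figures. Variable names are illustrative.

```python
observed = {"White": 380, "Brown": 330, "Black": 74}
proportion = {"White": 0.510, "Brown": 0.408, "Black": 0.082}

n = sum(observed.values())  # 784 individuals in total
# Expected count = theoretical proportion x total, rounded as in the table
expected = {colour: round(p * n) for colour, p in proportion.items()}
stat = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1  # number of categories minus one

print(expected)  # {'White': 400, 'Brown': 320, 'Black': 64}
print(stat, df)  # 2.875 2
```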
| df | α = 0.995 | 0.99 | 0.975 | 0.95 | 0.9 | 0.1 | 0.05 | 0.025 | 0.01 | 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
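For two degrees of freedom the upper-tail probability has the simple closed form exp(-χ²/2), so the table lookup can be checked with the standard library alone. The sketch below compares the statistic from the colour example (2.875) with the tabulated critical value for α = 0.05. Names are illustrative.

```python
import math

CRITICAL_DF2_ALPHA_005 = 5.991  # from the table above

def p_value_df2(stat):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-stat / 2)

stat = 2.875
print(stat > CRITICAL_DF2_ALPHA_005)  # False
print(round(p_value_df2(stat), 3))    # 0.238
```

Because 2.875 is below 5.991, the observed colours are consistent with the proportions expected from genetic theory at the 5% level.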
MSExcel can still be useful in performing χ² tests as it saves much work in performing the calculations. However, it is necessary to construct the contingency table yourself.
N.B: Limitations of the χ² test: The χ² test is a non-parametric test which assumes that the data analyzed:
Sir Ronald Aylmer Fisher (1890-1962) "the father of modern statistics"
Sir Ron developed the concept of likelihood:
The likelihood of a parameter is proportional to the probability of the data and it gives a function which usually has a single maximum value, called the maximum likelihood.
He also contributed to the development of methods suitable for small samples and studied hypothesis testing.
Fisher's exact test is an alternative to χ² for testing the hypothesis that there is a statistically significant difference between two groups. It has the advantage that it does not make any approximations, and so is suitable for small sample sizes.
Assumptions of Fisher's exact test: Fisher's exact test is a non-parametric test which assumes that the data analyzed:
The formula for calculating p values from Fisher's exact test is complicated. As long as the test criteria are appropriate, you can perform Fisher's test using one of the many online calculators or other statistics software (there is no built-in function for Fisher's test in MSExcel).
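For a 2 × 2 table the calculation can nevertheless be sketched in a few lines. With the row and column totals fixed, each possible table has a hypergeometric probability, and the p value is the sum of the probabilities of all tables no more likely than the one observed. This is one common two-sided definition, assumed here for illustration. Statistical packages may use other conventions, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over tables no more probable than the observed one; the small
    # tolerance guards against floating-point ties being missed.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-9))

print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```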
© MicrobiologyBytes 2009.