MicrobiologyBytes: Maths & Computers for Biologists: Correlation  Updated: February 6, 2009
Further information on this topic can be found in Chapter 11 of:
Maths from Scratch for Biologists
Numerical ability is an essential skill for everyone studying the biological sciences, but many students are frightened by the 'perceived' difficulty of mathematics and are nervous about applying mathematical skills in their chosen field of study. Maths from Scratch for Biologists is a highly instructive, informal text that explains step by step how and why you need to tackle maths within the biological sciences. (Amazon.co.UK)
Correlation, the relationship between two variables, is closely related to prediction.
The greater the association between variables, the more accurately we can predict the outcome of events. There is rarely an exact correlation of observed results with a mathematical function: the points never fit exactly on the line. The question is therefore whether an association between two variables could have occurred by chance.
There are numerous methods for calculating correlation, e.g.:
Pearson correlation calculations are based on the assumption that both X and Y values are sampled from populations that follow a normal (Gaussian) distribution, at least approximately, although with large samples this assumption is not too important.
Alternatively, the nonparametric Spearman correlation is based on ranking the two variables, and so makes no assumption about the distribution of the values.
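A correlation analysis is performed in the same way as any other statistical test of significance: a correlation coefficient (r) is calculated from the data and compared with a critical value in deciding whether to accept or reject the null hypothesis.
There is no need to learn the equation for r; use software to perform the calculation, e.g. MSExcel (see below).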
Values of r range from +1 (perfect correlation), through 0 (no correlation), to -1 (perfect negative correlation).
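To make the two ends of this range concrete, here is a minimal Python sketch of the standard textbook definition of the Pearson coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² Σ(y - ȳ)²). The function name pearson_r and the sample values are made up for illustration and are not taken from the original page.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]          # deviations of x from its mean
    dy = [y - mean_y for y in ys]          # deviations of y from its mean
    covariation = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariation / spread            # undefined if either variable is constant

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])      # y rises in step with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])    # y falls in step with x
print(r_up, r_down)
```

The first pair of lists lies exactly on a rising straight line and the second on a falling one, so the two results sit at the opposite ends of the range.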
Warning:
Correlation tests are in some ways the most misused of all statistical procedures! They are able to show whether two variables could be connected. However, they are not able to show that the variables are not connected! If one variable depends on another, i.e. there is a causal relationship, then it is always possible to find some kind of correlation between the two variables. However, if both variables depend on a third, they can show a correlation without any causal dependency between them. Take care!
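The third-variable trap can be illustrated with a small sketch. All of the numbers below are invented: z stands for the hidden third variable, and a and b are each calculated from z alone, with a little fixed "noise" added. Neither a nor b is ever used to calculate the other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up values: z is the hidden third variable.
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
noise_b = [-0.1, 0.4, -0.3, 0.2, -0.4, 0.3, -0.2, 0.1]

a = [2 * zi + n for zi, n in zip(z, noise_a)]  # a is driven by z only
b = [3 * zi + n for zi, n in zip(z, noise_b)]  # b is driven by z only

r_ab = pearson_r(a, b)
print(round(r_ab, 3))  # strongly positive, although a never influences b
```

The coefficient comes out close to +1 even though the only link between a and b is their shared dependence on z.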
Example: There is a direct correlation between the number of mobile phone masts and the decline in the numbers of house sparrows, Passer domesticus. But do mobile phone masts harm sparrows, or are both effects caused by something else? Or are they both completely independent observations which just happen to correlate? We don't know, because correlation tests do not reveal this information; further investigation is necessary.
MSExcel has two functions which determine parametric correlation coefficients:
=CORREL(array1,array2)
=PEARSON(array1,array2)
where array1 and array2 are the two ranges of cells containing the paired values. For most purposes, these two functions are identical.
Correlation is also one of the tools available on the MSExcel Analysis ToolPak. 
Example:
In patients undergoing renal (kidney) dialysis, is there any association
between heart rate and blood pressure?


The relationship can be seen visually by plotting a scatter graph of the data and drawing a trendline through it.
Warning: You cannot accurately assess whether a significant correlation between variables exists by visual examination alone!

While desirable, it is not always possible to use a parametric test such as the Pearson method. Fortunately, there are also nonparametric correlation tests, the most frequently used of which is the Spearman test. Unfortunately, there is no built-in Spearman test in MSExcel, so you're going to have to do some work!
Calculation of the Spearman rank order correlation coefficient (r_{s}) is used when the data consists of ordinal variables (i.e. variables with an ordered series where numbers indicate rank order only). Although this is a nonparametric statistic, it may be a better indicator than the Pearson coefficient of a nonlinear relationship between two variables.
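A short sketch of the nonlinear case, using invented data in which y always increases with x but along a curve (y = x cubed). Here the Spearman coefficient is obtained by applying the Pearson calculation to the ranks, which is one common way of defining it. The helper names pearson_r and ranks are made up for this example, and ranks assumes there are no tied values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]                      # rises with x, but not in a straight line

r_pearson = pearson_r(x, y)                  # measures straight-line association
r_spearman = pearson_r(ranks(x), ranks(y))   # measures agreement of rank order
print(round(r_pearson, 3), round(r_spearman, 3))
```

Because the rank order of y matches that of x exactly, the Spearman coefficient is 1, while the Pearson coefficient falls short of 1 because the points do not lie on a straight line.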
To perform the Spearman test, the data must first be converted into rank order. When converting to rank order, the smallest value of X becomes a rank of 1, etc., e.g.:
x       y       Rank of x   Rank of y
7.3     4.2     2           1
5.15    7.6     1           2
8.99    9.12    3           4
9.01    8.7     4           3
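In MSExcel, ranks can be calculated with the RANK function, e.g.:
=RANK(number,ref,order)
where number is the value to be ranked, ref is the range of values it is ranked against, and order sets the direction: 0 (or omitted) ranks the largest value as 1, any other value ranks the smallest value as 1.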
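For readers working outside a spreadsheet, the same ranking can be sketched in Python. The function rank_ascending is a hypothetical helper written for this example, not part of any library; it ranks the smallest value as 1 and, like the spreadsheet function, gives tied values the same rank rather than the average rank that a full treatment of ties in the Spearman test would call for.

```python
def rank_ascending(value, values):
    """Rank of value within values, with the smallest value ranked 1."""
    # One more than the number of values that are strictly smaller.
    return 1 + sum(v < value for v in values)

# The x and y values from the ranking example above.
x = [7.3, 5.15, 8.99, 9.01]
y = [4.2, 7.6, 9.12, 8.7]

rank_x = [rank_ascending(v, x) for v in x]
rank_y = [rank_ascending(v, y) for v in y]
print(rank_x)  # [2, 1, 3, 4]
print(rank_y)  # [1, 2, 4, 3]
```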
The equation for the Spearman calculation is:
r_{s} = 1 - \frac{6 \sum D^{2}}{N(N^{2} - 1)}
where:
N is the number of pairs (X, Y)
D is the difference between the ranks in each pair (X - Y)
There is no built-in formula for the Spearman calculation in MSExcel, but you can easily perform a Spearman calculation as follows. Start by converting the data to rank order if this has not already been done, then calculate the differences between the pairs (D), the squares of the differences (D^{2}) and the sum of the squares (ΣD^{2}).
Convert the Spearman formula (above) into an MSExcel formula:
r_{s} = 1-(6*ΣD^2/(N*(N^2-1)))
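As a worked check of the formula, the sketch below applies it to the four pairs of ranks from the ranking example earlier on this page. The variable names are arbitrary, and the calculation assumes the ranks contain no ties.

```python
# Ranks taken from the ranking example (x ranks and y ranks of the four pairs).
rank_x = [2, 1, 3, 4]
rank_y = [1, 2, 4, 3]

n = len(rank_x)                                   # N, the number of pairs
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]   # D for each pair: [1, -1, -1, 1]
sum_d2 = sum(di ** 2 for di in d)                 # sum of the squared differences: 4

r_s = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))       # 1 - 24/60
print(round(r_s, 3))  # 0.6
```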
After calculating the value of r_{s}, this is compared with the critical value of r in deciding whether to accept or reject the null hypothesis. For a one-tailed test df = n-1 and for a two-tailed test (most usual) df = n-2.
Values of r range from +1 (perfect correlation), through 0 (no correlation), to -1 (perfect negative correlation). In general terms, correlation coefficients:
© MicrobiologyBytes 2009.