MicrobiologyBytes: Maths & Computers for Biologists: Linear Regression Updated: April 26, 2007

Linear Regression

Further information on this topic can be found in Chapter 11 of:

Maths from Scratch for Biologists

Numerical ability is an essential skill for everyone studying the biological sciences but many students are frightened by the 'perceived' difficulty of mathematics, and are nervous about applying mathematical skills in their chosen field of study. Maths from Scratch for Biologists is a highly instructive, informal text that explains step by step how and why you need to tackle maths within the biological sciences. (Amazon.co.UK)

Regression and correlation are related, but different, tests:

Regression or Correlation?

Linear regression and correlation are similar and easily confused. In some situations it makes sense to perform both calculations. Calculate correlation if:

  • You measured both X and Y in each subject and wish to quantify how well they are associated.
  • Calculate the Pearson (parametric) correlation coefficient if you can assume that both X and Y are sampled from normally-distributed populations.
  • Otherwise calculate the Spearman (nonparametric) correlation coefficient.
  • Don't calculate a correlation coefficient if you manipulated the X variable (e.g. in an experiment).
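The difference between the two coefficients can be seen in a short sketch. The Python below is only an illustration: the data are made up, and the helper names (pearson, ranks, spearman) are chosen here rather than taken from any package. It treats the Spearman coefficient as the Pearson coefficient calculated on the ranks of the data, with tied values given their average rank.

```python
from math import sqrt


def pearson(x, y):
    """Pearson (parametric) correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman (nonparametric) coefficient: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: Y always rises with X, but not in a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson r  = {r_pearson:.3f}")
print(f"Spearman r = {r_spearman:.3f}")
```

Because Y increases every time X increases, the ranks agree perfectly and the Spearman coefficient is 1, while the Pearson coefficient is slightly lower because the relationship is curved rather than linear.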

Calculate linear regressions only if:

  • One of the variables (X) is likely to precede or cause the other variable (Y).
  • Choose linear regression if you manipulated the X variable, e.g. in an experiment. It makes a difference which variable is called X and which is called Y, as linear regression calculations are not symmetrical with respect to X and Y. If you swap the two variables, you will obtain a different regression line.
  • In contrast, correlation calculations are symmetrical with respect to X and Y. If you swap the labels X and Y, you will still get the same correlation coefficient.
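The asymmetry of regression and the symmetry of correlation can be checked numerically. The sketch below uses made-up data and helper names chosen for this example (fit_line, pearson). It fits Y on X and then X on Y: if both fits described the same line, the second slope would equal the reciprocal of the first.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line predicting y from x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical measurements with some scatter.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 5, 3, 9, 7]

slope_yx, intercept_yx = fit_line(x, y)  # Y predicted from X
slope_xy, intercept_xy = fit_line(y, x)  # X predicted from Y (labels swapped)

print(f"Y on X: slope = {slope_yx:.3f}, intercept = {intercept_yx:.3f}")
print(f"X on Y: slope = {slope_xy:.3f}, intercept = {intercept_xy:.3f}")
print(f"1 / (slope of Y on X) = {1 / slope_yx:.3f}  (differs from the X on Y slope)")

# Correlation does not care which variable is called X.
print(f"r(X, Y) = {pearson(x, y):.3f}")
print(f"r(Y, X) = {pearson(y, x):.3f}")
```

The two regression lines only coincide when the points lie exactly on a straight line; the more scatter there is, the further apart they are.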

 

Linear regression works by minimizing the sum of the squares of the vertical distances of the points from the regression line, and is therefore known as the "least squares" method. The calculation effectively minimizes the total area of the squares drawn between the data points and the regression line:

Regression line

(You don't normally see the squares, or even the line unless you choose to - they are just drawn here for illustration)
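To see what "least squares" means in practice, the following sketch fits a line to some made-up points and then nudges the slope and intercept slightly in each direction. The data and the function names (fit_line, sum_of_squares) are assumptions made for this illustration. Every nudged line should give a larger sum of squared vertical distances than the fitted line.

```python
def fit_line(x, y):
    """Least-squares line predicting y from x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_of_squares(x, y, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


# Hypothetical data points.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 2.7, 4.5, 4.9, 6.8, 7.1]

best_slope, best_intercept = fit_line(x, y)
best = sum_of_squares(x, y, best_slope, best_intercept)
print(f"Fitted line: Y = {best_intercept:.3f} + {best_slope:.3f} X")
print(f"Sum of squares for the fitted line: {best:.4f}")

# Try lines that are slightly steeper/shallower and slightly higher/lower.
nudged = {}
for d_slope in (-0.1, 0.0, 0.1):
    for d_intercept in (-0.5, 0.0, 0.5):
        if d_slope == 0.0 and d_intercept == 0.0:
            continue  # this is the fitted line itself
        nudged[(d_slope, d_intercept)] = sum_of_squares(
            x, y, best_slope + d_slope, best_intercept + d_intercept
        )

for (d_slope, d_intercept), total in nudged.items():
    print(f"slope {d_slope:+.1f}, intercept {d_intercept:+.1f}: {total:.4f}")
```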

Performing a regression analysis is similar to performing a correlation test:

  1. Formulate the null hypothesis. Remember how to do this: a simpler hypothesis has priority over a more complex theory. The null hypothesis (H0) is therefore that "Y is independent of X, therefore the slope of the regression line is 0".

  2. Calculate the test statistics. A regression line is actually a running series of means of the expected value of Y for each value of X and is calculated from the following equation:

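    $$Y = a + bX, \qquad b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}$$

    Here b is the slope of the line, a is the intercept (the value of Y when X = 0), and \bar{X} and \bar{Y} are the means of the X and Y values.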

    However, don't learn these equations - we don't expect you to calculate regression lines by hand! Use the MSExcel Analysis ToolPak Regression option.

  3. Interpret the test statistics (N.B. these data are for the renal dialysis example):

Excel regression analysis

An MSExcel regression analysis calculates and displays the potentially confusing set of statistics shown above, so here's what they mean:
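As a rough guide to the headline figures, the sketch below reproduces several of the quantities that a spreadsheet regression report typically lists, using the standard formulas for a simple (one X variable) regression. The data are made up and are not the renal dialysis data, and the variable names are chosen here for readability. P-values are left out because they need the t and F distributions, which are not in the Python standard library.

```python
from math import sqrt

# Hypothetical data (NOT the renal dialysis example).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.5, 8.1, 8.4]

n = len(x)                      # "Observations"
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Coefficients of the fitted line Y = intercept + slope * X.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Split the total variation in Y into explained and unexplained parts.
ss_total = syy
ss_regression = slope * sxy
ss_residual = ss_total - ss_regression
df_residual = n - 2             # two parameters (slope, intercept) estimated

r_square = ss_regression / ss_total            # fraction of variation explained
multiple_r = sqrt(r_square)                    # |r| when there is one X variable
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
standard_error = sqrt(ss_residual / df_residual)  # typical scatter about the line

# Test of the null hypothesis "slope = 0".
slope_se = standard_error / sqrt(sxx)
t_stat = slope / slope_se
f_stat = ss_regression / (ss_residual / df_residual)

print(f"Observations       {n}")
print(f"Intercept          {intercept:.4f}")
print(f"Slope              {slope:.4f}")
print(f"Multiple R         {multiple_r:.4f}")
print(f"R Square           {r_square:.4f}")
print(f"Adjusted R Square  {adjusted_r_square:.4f}")
print(f"Standard Error     {standard_error:.4f}")
print(f"t Stat (slope)     {t_stat:.4f}")
print(f"F                  {f_stat:.4f}")
```

With a single X variable, F is simply the square of the t statistic for the slope, so the two tests of the null hypothesis always agree.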

Summary:

There are two ways to perform a regression analysis in MSExcel:

Remember that linear regression is a parametric statistic and may not give reliable results if applied to skewed datasets! This is not a limitation of MSExcel - it applies to all regression analysis. In spite of this limitation, MSExcel offers a quick means of performing a regression analysis.


© MicrobiologyBytes 2007.