The R-squared measure is between 0 and 1, where 0 means none of the variance is explained by the predictor variable and 1 means 100% of the variance is explained by the predictor variable. This is a very handy measure – it distills all the math behind regression into one number, and one that has a built-in scale (bigger is better, smaller is worse).

To calculate the R-squared measure, you first need to calculate the prediction for each of your known data points to get pred(x) – read as the prediction using x.

The first step is to get **SStot**, which is the **Total Sum of Squares**.

**SStot** = SUM( (avg(y) – y)^2 )

The second step is to get **SSres**, the **Residual Sum of Squares**. A residual, as defined in the next section, is the difference between your prediction (pred(x)) and the actual result (y).

**SSres** = SUM( (pred(x) – y)^2 )

Just taking a step back for a second, what do these two calculations tell us? **SStot** gives you the total “natural” variance of the y variable. **SSres** gives you the variance left over after the model’s predictions – the part of the natural variance the model failed to capture. If **SSres** is small relative to **SStot**, the model has captured most of that natural variance – thus the R-squared must be close to one. You want **SSres** to be as small as you can get – meaning you want the predictions to be as close as possible to the actual values of y.

With that in mind, the final calculation of R-squared is to divide SSres by SStot and subtract that quotient from 1.

R-squared = 1 – (SSres / SStot)
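The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation – the function and variable names are just for this example:

```python
def r_squared(y_actual, y_predicted):
    """Compute R-squared from actual y values and model predictions."""
    avg_y = sum(y_actual) / len(y_actual)
    # SStot: total "natural" variance of y around its mean
    ss_tot = sum((avg_y - y) ** 2 for y in y_actual)
    # SSres: variance left unexplained by the predictions
    ss_res = sum((pred - y) ** 2 for pred, y in zip(y_predicted, y_actual))
    # R-squared: share of the natural variance the model captured
    return 1 - ss_res / ss_tot
```

A perfect model (predictions equal to y everywhere) gives SSres = 0 and therefore R-squared = 1; a model no better than always predicting avg(y) gives R-squared = 0.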

## R-Squared Example

Here’s an example with an imaginary linear regression model f(x).

obs | y | avg(y) – y | (avg(y) – y)^2 | f(x) | f(x) – y | (f(x) – y)^2 |
---|---|---|---|---|---|---|
1 | 4 | 2 | 4 | 5 | 1 | 1 |
2 | 10 | -4 | 16 | 9 | -1 | 1 |
3 | 2 | 4 | 16 | 4 | 2 | 4 |
4 | 8 | -2 | 4 | 8 | 0 | 0 |
5 | 6 | 0 | 0 | 5 | -1 | 1 |
**SUM** | 30 | | 40 | | | 7 |
**AVG** | 6 | | | | | |

The table above shows the individual calculations, with **SStot** = 40 and **SSres** = 7. The next step is to take the quotient **SSres** / **SStot** = 7 / 40 = 0.175.

Now, we subtract that quotient from one: 1 – 0.175 = 0.825. We now know that the model explains 82.5% of the natural variance in the y variable.
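The arithmetic in the table can be checked with a few lines of Python, using the y and f(x) values from the example:

```python
y = [4, 10, 2, 8, 6]
f_x = [5, 9, 4, 8, 5]  # predictions from the imaginary model f(x)

avg_y = sum(y) / len(y)  # 30 / 5 = 6
ss_tot = sum((avg_y - yi) ** 2 for yi in y)              # 4 + 16 + 16 + 4 + 0 = 40
ss_res = sum((fi - yi) ** 2 for fi, yi in zip(f_x, y))   # 1 + 1 + 4 + 0 + 1 = 7
r_squared = 1 - ss_res / ss_tot  # 1 - 0.175 = 0.825
print(r_squared)
```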

R-squared is an excellent, simple tool to evaluate a regression model with one variable. However, once you add a second predictor (x) variable, you’ll need to move on to Adjusted R-squared, which takes into account the number of predictor variables in the model.
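As a preview, here is a sketch of the standard Adjusted R-squared formula, where n is the number of observations and p is the number of predictor variables (the function name is just for illustration):

```python
def adjusted_r_squared(r_squared, n, p):
    """Penalize R-squared for each additional predictor.

    n: number of observations, p: number of predictor variables.
    """
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
```

With a single predictor (p = 1) and plenty of observations, the adjustment is small; it grows as you add more predictors, so a model can’t inflate its score just by throwing in extra x variables.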