The R-squared measure ranges from 0 to 1, where 0 means none of the variance in the response variable is explained by the predictor variable and 1 means 100% of it is. This is a very handy measure – it distills all the math behind regression into one number, and one with a built-in scale (bigger is better, smaller is worse).

To calculate the R-squared measure, you first need to calculate the prediction for each of your known data points to get pred(x) – read as "the prediction using x."

The first step is to get **SStot**, the **Total Sum of Squares**.

**SStot** = SUM( (avg(y) – y)^2 )

The second step is to get **SSres**, the **Residual Sum of Squares**. A residual, as defined in the next section, is the difference between your prediction (pred(x)) and the actual result (y).

**SSres** = SUM( (pred(x) – y)^2 )

Just taking a step back for a second, what do these two calculations tell us? **SStot** gives you the total "natural" variance of the y variable. **SSres** measures the variance left over after the model's predictions – the part the model failed to capture. You want **SSres** to be as small as possible relative to **SStot**, meaning you want the predictions to be as close as possible to the actual values of y; the smaller that ratio, the closer R-squared is to one.

With that in mind, the final calculation of R-squared is to divide SSres by SStot and subtract that quotient from 1.

R-squared = 1 – (SSres / SStot)
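The three steps above can be sketched as a small Python function (the name `r_squared` is my own; this assumes `y` and `pred` are equal-length sequences of numbers):

```python
def r_squared(y, pred):
    """Compute R-squared from actual values y and model predictions pred."""
    mean_y = sum(y) / len(y)
    # SStot: total sum of squares around the mean of y
    ss_tot = sum((mean_y - yi) ** 2 for yi in y)
    # SSres: residual sum of squares between predictions and actuals
    ss_res = sum((pi - yi) ** 2 for pi, yi in zip(pred, y))
    return 1 - ss_res / ss_tot
```

For example, `r_squared([4, 10, 2, 8, 6], [5, 9, 4, 8, 5])` returns 0.825.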

## R-Squared Example

Here’s an example with an imaginary linear regression model f(x).

| obs | y | avg(y) – y | (avg(y) – y)^2 | f(x) | f(x) – y | (f(x) – y)^2 |
|---|---|---|---|---|---|---|
| 1 | 4 | 2 | 4 | 5 | 1 | 1 |
| 2 | 10 | -4 | 16 | 9 | -1 | 1 |
| 3 | 2 | 4 | 16 | 4 | 2 | 4 |
| 4 | 8 | -2 | 4 | 8 | 0 | 0 |
| 5 | 6 | 0 | 0 | 5 | -1 | 1 |
| **SUM** | **30** | | **40** | | | **7** |
| AVG | 6 | | | | | |

The table above shows the individual calculations, with **SStot** = 40 and **SSres** = 7. The next step is to take the quotient **SSres** / **SStot** = 7 / 40 = 0.175.

Now, we subtract that quotient from one: 1 – 0.175 = 0.825. We now know that the model explains 82.5% of the natural variance in the y variable.
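The worked example can be verified step by step in Python, using the y and f(x) columns from the table:

```python
y    = [4, 10, 2, 8, 6]   # actual values from the table
pred = [5, 9, 4, 8, 5]    # f(x) predictions from the table

mean_y = sum(y) / len(y)                                  # avg(y) = 6
ss_tot = sum((mean_y - yi) ** 2 for yi in y)              # SStot = 40
ss_res = sum((pi - yi) ** 2 for pi, yi in zip(pred, y))   # SSres = 7
r2 = 1 - ss_res / ss_tot                                  # 1 - 0.175 = 0.825
print(ss_tot, ss_res, r2)
```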

R-squared is an excellent, simple tool for evaluating a regression model with one variable. However, once you add a second predictor (x) variable, you'll need to move on to Adjusted R-squared, which takes the number of predictors into account.
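As a preview, the standard adjusted R-squared formula scales the unexplained variance by (n – 1) / (n – p – 1), where n is the number of observations and p the number of predictor variables (a sketch; the function name is my own):

```python
def adjusted_r_squared(r2, n, p):
    """Adjust R-squared for n observations and p predictor variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Unlike plain R-squared, this value goes down when you add a predictor that doesn't pull its weight, which is why it's the better comparison tool for multi-variable models.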