Bootstrapping

In data mining, bootstrapping is a resampling technique that lets you generate many sample datasets by repeatedly sampling from your existing data. Why use bootstrapping? Sometimes you just don’t have enough data! Statistical methods need large amounts of data and repeated samples to give confident results. There are two applications of bootstrapping as far as […]
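As a rough sketch of the idea (not from the original post), the Python snippet below resamples a small made-up dataset with replacement many times and records the mean of each resample; the data values and the choice of the mean as the statistic are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(42)
data = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3, 4.4])  # hypothetical sample

n_boot = 1000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # draw a resample of the same size as the original, with replacement
    resample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = resample.mean()

# the spread of the bootstrap means approximates the sampling variability of the mean
print(boot_means.mean(), boot_means.std(ddof=1))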


R-Squared

The R-squared measure is between 0 and 1, where 0 means none of the variance is explained by the predictor variable and 1 means 100% of the variance is explained by the predictor variable. This is a very handy measure: it distills all the math behind regression into one number, and one that […]
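For reference, the usual way this measure is written (the excerpt describes it in words) is

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

where \hat{y}_i are the model’s fitted values and \bar{y} is the mean of the observed values; the fraction is the share of the total variance the model leaves unexplained.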


Autocorrelation

Autocorrelation is a way of identifying whether a time series data set is correlated with a version of itself offset by a certain number of units. The equation of the sample autocorrelation function is given below. The top portion is essentially the covariance between the original data and the k-unit lagged data. The bottom is the sum […]
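The formula referred to above (shown as an image in the original post) is the standard sample autocorrelation at lag k, consistent with the description of its numerator and denominator:

r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}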


Central Limit Theorem

In a nutshell, the Central Limit Theorem tells us that the averages of repeated samples follow an approximately normal distribution centered on the true mean, and the larger the sample size, the more confident we can be in our estimate of that mean. If you were to take repeated samples from a population (e.g. send out surveys to a random set of your customers multiple times, asking the same questions), average […]
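The original post illustrates this with a histogram of the averages of repeated samples. As a rough illustration (not from the original post), the Python sketch below draws many repeated samples from a skewed population and shows that the sample means cluster more tightly around the true mean as the sample size grows; the exponential population and the sample sizes are assumptions for the example.

import numpy as np

rng = np.random.default_rng(0)

for n in (5, 30, 200):  # increasing sample sizes
    # 2,000 repeated samples from a skewed (exponential) population with true mean 1.0
    sample_means = rng.exponential(scale=1.0, size=(2000, n)).mean(axis=1)
    # the standard deviation of the sample means shrinks as n grows
    print(n, round(sample_means.mean(), 3), round(sample_means.std(ddof=1), 3))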


Average, Variance and Standard Deviation

Average (mean or arithmetic mean) is the sum of all values divided by the count of the values. Variance is the sum of the “squared differences between each observation and the average of the observations” divided by the count minus one. Because it is measured in squared units, variance isn’t easily interpreted. Standard Deviation is […]
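Written out, the quantities described above are

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad s = \sqrt{s^2}

where n is the count of observations.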