Yearly Archives: 2016


Free Data Mining and Data Science Books

I’m on a bit of a reading kick as of late so I wanted to compile a short list of some useful and free data mining / data science books.  Most are of a technical nature and come from academia.  Free Academic Texts on Data Mining: An Introduction to Statistical Learning with Applications in R: Covers […]


Density and CDF Plots from Iris Data Set

Book Review: Data Analysis with Open Source Tools

I’ve had this book on my (digital) shelf for a long time.  It’s an intimidating tome.  It’s big and broad.  However, it’s not exactly what I was expecting.  As the book title explains, the main focus is data analysis.  Not necessarily statistical or data mining analysis.  Instead, chapters one through eleven are focused on plotting, mathematical […]


Information Gain would select the Number of Images variable, while Gini Index would select the more compact Average Token Length.

Decision Tree Flavors: Gini Index and Information Gain

Summary: The Gini Index is calculated by subtracting the sum of the squared probabilities of each class from one.  It favors larger partitions.  Information Gain is built on entropy: multiply the probability of each class by the log (base 2) of that class probability, sum the results, and negate the total.  The gain is the drop in entropy from the parent node to the weighted average of its partitions.  Information Gain favors smaller partitions with many distinct values.  Ultimately, you have to experiment with your data […]
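To make the two measures in the summary concrete, here is a minimal Python sketch. The function names (`gini_index`, `entropy`, `information_gain`) and the toy labels are assumptions for illustration, not code from the original post, and the sketch assumes each list of class labels is non-empty.

```python
from collections import Counter
from math import log2


def gini_index(labels):
    """One minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Negated sum of p * log2(p) over the classes present in labels."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, partitions):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    n = len(parent)
    weighted = sum(len(part) / n * entropy(part) for part in partitions)
    return entropy(parent) - weighted


# Hypothetical toy labels: two classes, evenly mixed in the parent node.
parent = ["spam", "spam", "ham", "ham"]
perfect_split = [["spam", "spam"], ["ham", "ham"]]

print(gini_index(parent))                       # impurity of the mixed parent
print(entropy(parent))                          # entropy of the mixed parent
print(information_gain(parent, perfect_split))  # gain from a clean split
```

A split that separates the classes perfectly drives the entropy of each partition to zero, so the gain equals the parent's entropy. Scoring several candidate splits with both measures is one way to see where they disagree on your own data.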


Why Thinking Like a Computer Frees Your Thinking

Summary: Taking a big problem, question, or analysis and breaking it into chunks frees you from the burden of trying to keep all the plates spinning at once. Compartmentalizing and identifying the dependencies at each step helps you reason out what needs to be done and looked into further.   One of the first problems […]