Monthly Archives: February 2016


Decision Tree Flavors: Gini Index and Information Gain

Summary: The Gini Index is calculated by subtracting the sum of the squared probabilities of each class from one.  It favors larger partitions.  Information Gain is based on entropy, which multiplies the probability of each class by the log (base 2) of that class probability, sums these products, and negates the result; the gain is the drop in entropy produced by a split.  Information Gain favors smaller partitions with many distinct values.  Ultimately, you have to experiment with your data […]

Information Gain would select the Number of Images variable, while the Gini Index would select the more compact Average Token Length.
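The two splitting criteria in the summary above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the post's own code: the function names `gini`, `entropy`, and `split_gain` and the toy yes/no labels are hypothetical. It also spells out the step the summary compresses: a split is scored by the parent's impurity minus the size-weighted impurity of the child partitions, and with entropy as the impurity that score is the information gain.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index: one minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy: negated sum of p * log2(p) over the classes present."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the child partitions."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy labels (not data from the post): 10 rows split into two groups.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print(gini(parent))                           # 0.5
print(entropy(parent))                        # 1.0
print(split_gain(parent, children, entropy))  # information gain, about 0.278
print(split_gain(parent, children, gini))     # Gini decrease, about 0.18
```

A tree builder would compute one of these gains for every candidate variable and keep the highest; which criterion picks which variable on real data is exactly the kind of thing the post suggests checking by experiment.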

Why Thinking Like a Computer Frees Your Thinking

Summary: Taking a big problem, question, or analysis and breaking it into chunks frees you from the burden of trying to keep all the plates spinning at once. Compartmentalizing and identifying the dependencies at each step helps you reason out what needs to be done and what needs to be looked into further.   One of the first problems […]


Cheat Sheet Your Way to Brilliance

Summary: Every great analyst knows the core numbers like the back of their hand.  Build yourself a cheat sheet (whether it’s a list of stats or an interactive dashboard) and keep it up-to-date.  The more you can internalize the numbers, the smarter you’ll look (and be).   The #1 thing that differentiates a great analyst […]

Try to create reports using the top numbers

A Freaking Simple Guide to GitHub

GitHub can be extremely frustrating at times, especially when you spend more time tweaking your settings than writing and committing code.  I’m going to walk through a couple of situations that I’ve run into.  If you’re looking for a broader, simple tutorial, “git - the simple guide” fits that bill nicely as well. Connect an Existing Directory […]