Summary: R offers a handful of packages to automate building models. rpart, randomForest, MASS, and forecast packages help you search through a hypothesis space. The caret package helps crawl through the hyper parameter space.
Summary: The passcode riddle asks for three three whole positive numbers with each one being equal to or larger than the next. Turns out there are only a handful of numbers this could possibly work for. Browsing YouTube one morning, I came across the video from TED-Ed and I was intrigued! I’ll be honest, I […]
Summary: The tm and lsa packages provide you a way of manipulating your text data into a term-document matrix and create new, numeric features. The ngram package lets you find frequent word patterns (e.g. “The cow” is a bi-gram or 2-gram; “The cow said” is a tri-gram or 3-gram). Lastly, for a quick visualization (though […]
Summary: The Gini Index is calculated by subtracting the sum of the squared probabilities of each class from one. It favors larger partitions. Information Gain multiplies the probability of the class times the log (base=2) of that class probability. Information Gain favors smaller partitions with many distinct values. Ultimately, you have to experiment with your data […]