Author Archives: Will


Writing Quality Data Mining Code

Summary: Writing better-quality data mining code means writing code that is self-explanatory and that does one thing at a time, and does it well. On the analysis side, you should cross-validate your models and watch for relationships in the data that change slowly over time.
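
To make the two ideas in this summary concrete, here is a minimal sketch of k-fold cross-validation built from small functions that each do one thing. It is not taken from the original post: the function names, the toy "predict the mean" model, and the sample data are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def make_folds(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean_model(train_y):
    """Hypothetical stand-in for a real model: always predict the training mean."""
    return mean(train_y)


def mean_absolute_error(actual, prediction):
    """Average absolute gap between each actual value and a single prediction."""
    return mean(abs(a - prediction) for a in actual)


def cross_validate(y, k=5, seed=0):
    """Hold out each fold in turn, fit on the rest, and score on the held-out rows."""
    scores = []
    for held_out in make_folds(len(y), k, seed):
        held_out_set = set(held_out)
        train_y = [v for i, v in enumerate(y) if i not in held_out_set]
        test_y = [y[i] for i in held_out]
        model = fit_mean_model(train_y)
        scores.append(mean_absolute_error(test_y, model))
    return scores


# Illustrative data only: 20 evenly spaced values.
y = [float(v) for v in range(20)]
scores = cross_validate(y, k=5)
print(scores)
```

Because splitting, fitting, and scoring live in separate functions, any one of them can be swapped out (for example, replacing the toy model with a real one) without touching the others.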

Split Your Code Apart!

Keeping a Sharp Analytical Mind

Summary: To stay on top of your personal development, try learning something new, such as a programming language, an instrument, or a new field (e.g. biology or accounting). Exposure to new ideas helps you avoid confirmation bias and increases your willingness to explore your analysis further.

Optimal Toilet Paper Placement

Test accuracy from using rpart in parallel foreach

Overview of Parallel Processing in R

Summary: The foreach package provides parallel operations for many packages (including randomForest). Packages like gbm and caret have parallelization built into their functions. Other tools, like bigmemory and ff, address the problem of handling large datasets through memory management.