Monthly Archives: April 2016


ROC Curve Example Plot from ROCR package

Always Have a Baseline

Summary: Always check your numbers against smaller, simpler queries and figures. Use total sales as a reality check when reviewing the results of more detailed sales queries. When building models, compare their performance to that of a simpler model. Don’t assume complexity equals accuracy. Be prepared to compare against existing “gold standard” models.
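As a rough illustration of both checks, here is a minimal Python sketch. All of the numbers, labels, and names in it (sales_by_region, total_sales, model_pred, and so on) are made up for the example and do not come from the post; the majority-class predictor is just one simple choice of baseline.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for every one of n cases."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


# Reality check for queries: a detailed breakdown should add up to the
# simple total. Both values are hypothetical query results.
sales_by_region = {"north": 120.0, "south": 80.0, "west": 50.0}
total_sales = 250.0
breakdown_matches_total = abs(sum(sales_by_region.values()) - total_sales) < 1e-6
print("breakdown matches total:", breakdown_matches_total)

# Baseline check for models: hypothetical labels and predictions.
y_train = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_test = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]  # output of some complex model

baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
model_acc = accuracy(y_test, model_pred)
print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
```

In this made-up case the complex model scores the same accuracy as always predicting the majority class, which is exactly the kind of result a baseline is meant to expose.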


Dean Wampler at Data Science at Scale with Spark

Meet-Up Recap: Data Science at Scale with Spark

Summary: Dean Wampler from Lightbend presented at the Direct Supply MSOE offices on Tuesday, 4/5/2016. Dean gave a high-level overview of Spark and its benefits: the code stays focused on business logic, and it’s faster. Those wanting to learn more should pick up Learning Spark from O’Reilly.