Making Analysts More Productive: Tools and Ideas

Summary: Every organization should provide its analysts and data scientists with a few key tools: a Data Dictionary, a Metric Dictionary, a Research Repository, and a Code Repository.  All of these tools need to be searchable so that analysts can easily find and reuse previous work.
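To make "searchable" concrete, here is a minimal sketch of a keyword search over a data dictionary, written in Python. The entries, field names, and the search function are all hypothetical illustrations; a real data dictionary would more likely live in a wiki, catalog tool, or database.

```python
# Hypothetical data dictionary entries; names and tables are made up for illustration.
DATA_DICTIONARY = [
    {"name": "order_total", "table": "sales.orders",
     "description": "Order value in USD, including tax."},
    {"name": "customer_id", "table": "sales.orders",
     "description": "Unique identifier for the purchasing customer."},
    {"name": "signup_date", "table": "crm.customers",
     "description": "Date the customer created an account."},
]


def search(entries, keyword):
    """Return entries where any field contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        entry for entry in entries
        if any(needle in str(value).lower() for value in entry.values())
    ]


matches = search(DATA_DICTIONARY, "customer")
for entry in matches:
    print(entry["table"], entry["name"], "-", entry["description"])
```

The same idea would apply to a metric dictionary or research repository: store a short description with every item and make every field searchable.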

A possible process flow for a research repo, data dictionary, and metric dictionary

Variable Importance methods include Decision Trees, Regression coefficients, and Chi-Square tests

Finding Important Variables in Your Data

Summary: Advanced analyses can be simplified by calling out which variables are most important.  Decision Trees, Random Forests, Regression, and Chi-Square tests can quickly reveal which variables carry the most weight.
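As a rough illustration of the chi-square approach, the sketch below computes the Pearson chi-square statistic between each categorical variable and a target, then ranks the variables by that statistic. It uses only the Python standard library; the toy rows and column names (plan, region, churned) are hypothetical, and a real analysis would also look at p-values and sample sizes.

```python
from collections import Counter


def chi_square_statistic(xs, ys):
    """Pearson chi-square statistic for two categorical sequences of equal length."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    stat = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            # Expected cell count if the two variables were independent.
            expected = x_count * y_count / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: plan lines up with churn, region does not.
rows = [
    {"plan": "pro", "region": "east", "churned": "no"},
    {"plan": "pro", "region": "west", "churned": "no"},
    {"plan": "pro", "region": "east", "churned": "no"},
    {"plan": "pro", "region": "west", "churned": "no"},
    {"plan": "basic", "region": "east", "churned": "yes"},
    {"plan": "basic", "region": "west", "churned": "yes"},
    {"plan": "basic", "region": "east", "churned": "yes"},
    {"plan": "basic", "region": "west", "churned": "yes"},
]

target = [row["churned"] for row in rows]
scores = {
    column: chi_square_statistic([row[column] for row in rows], target)
    for column in ("plan", "region")
}

# Larger statistic = stronger association with the target.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for column, score in ranking:
    print(f"{column}: {score:.2f}")
```

A larger statistic suggests a stronger association with the target, so variables at the top of the ranking are reasonable candidates to examine first.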


Always Have a Baseline

Summary: Always check your numbers with smaller, simpler queries and figures.  Use total sales as a reality check against the results of more detailed sales queries.  When creating models, compare performance to a simpler model.  Don’t assume complexity equals accuracy.  Be prepared to compare against existing “gold standard” models.
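One simple way to put a baseline in place for a classification model is to compare it against always predicting the most common class. The sketch below assumes hypothetical labels and hypothetical model predictions; the function names are illustrative only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n_predictions):
    """Predict the most common training label for every case."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n_predictions


# Hypothetical labels and predictions, for illustration only.
y_train = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
y_test = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
model_pred = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # output of some more complex model

baseline_pred = majority_baseline(y_train, len(y_test))
baseline_acc = accuracy(y_test, baseline_pred)
model_acc = accuracy(y_test, model_pred)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
print(f"lift over baseline: {model_acc - baseline_acc:+.2f}")
```

In this made-up example the complex model scores no better than the majority-class baseline, which is exactly the kind of result a baseline is meant to surface before a model is trusted.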

Example ROC curve plot from the ROCR package