PySpark ALS and Recommendation Outputs

Lately, I’ve written a few iterations of PySpark code to develop a recommender system, building on some earlier practice creating recommenders in PySpark. I ran into a situation where I needed to generate recommendations on several different datasets, and I had to decipher some of the prediction documentation to do it. Because of my struggles, […]


Writing Quality Data Mining Code

Summary: Writing better-quality data mining code means writing code that is self-explanatory and does one thing at a time, and does it well. On the analysis side, you should cross-validate and watch for slowly changing relationships in the data.

Split Your Code Apart!

Kaggle Winners and Algorithm Associations

Winning a Kaggle Competition Analysis

Summary: XGBoost and ensembles take the Kaggle cake, but they’re mainly used for classification tasks. Tools like factorization machines and Vowpal Wabbit make occasional appearances.


Keeping a Sharp Analytical Mind

Summary: To stay on top of your personal development, try learning new things, like a programming language or an instrument, or exploring a new field (e.g. biology or accounting). Exposure to new ideas helps you avoid confirmation bias and increases your willingness to explore your analysis further.

Optimal Toilet Paper Placement