Data Science, Business Intelligence, Predictive Analytics: whatever you call it, there is a wealth of knowledge out there, much of it written decades before “Data Science” became a term! This list is far from comprehensive and will change as I find more books.
Enjoy your journey toward data mining mastery!
The Short List:
If you have the urge to binge on a small number of books, here are my top picks. Reading these few books will take you a long way from beginner toward pro.
- Sams Teach Yourself SQL in 10 Minutes – Forta
- Machine Learning with R – Lantz
- Intro to Statistical Learning with R – James, et al.
- Python for Data Analysis – McKinney
- Show Me the Numbers – Few
- Mining of Massive Datasets – Leskovec, et al.
- Learning Spark – Karau, et al.
Using Statistical Software
Core books for learning the software used in data manipulation and statistical analysis.
- [SQL] Sams Teach Yourself SQL in 10 Minutes – Forta
- [Python] Python for Data Analysis – McKinney
- [R] Machine Learning with R – Lantz
- [SAS] The Little SAS Book – Delwiche, et al.
Business Concepts and Data
Sometimes you don’t need all of the details on implementation or on using a particular tool. These books focus on the concepts and their business applications instead.
- Data Mining Techniques – Linoff, et al.
- Data Science for Business – Provost, et al.
- How to Measure Anything – Hubbard
Reporting and Visualization
This is the softer side of data mining, but it is definitely more important than finding the “perfect” algorithm for your application. Writing well and presenting your results are more than half the battle in analysis.
- Show Me the Numbers – Few
- On Writing Well – Zinsser
- Envisioning Information – Tufte
Detailed Data Mining References
If you want to know what each algorithm is really doing and how to get the most out of your model development, you’ll need a solid academic reference book.
- Data Mining: Concepts and Techniques – Han, et al.
- Intro to Statistical Learning with R – James, et al.
- The Elements of Statistical Learning – Hastie, et al.
- Mining of Massive Datasets – Leskovec, et al.
Implementing Algorithms
Getting your hands dirty by implementing algorithms yourself is a great way to learn the inner workings of machine learning models.
- [Python] Machine Learning in Action – Harrington
- [Python] Programming Collective Intelligence – Segaran
Big Data, Distributed Databases
The wave of the future is analyzing huge datasets. Spark and (to some extent) Hadoop are important tools to understand well.
- Hadoop: The Definitive Guide – White
- Learning Spark – Karau, et al.
Testing and Web Analytics
Having the best machine learning algorithms in the palm of your hand won’t do you any good if you’re not sure how to test their performance. Web analytics matter too, because they can be a great source of data for your models.
- Web Analytics 2.0 – Kaushik
- Always Be Testing – Eisenberg
- [Python] Bandit Algorithms for Website Optimization – White
Stories About Statistics
If you’re like me, you find it fun to read about analytics and problem solving with numbers. Take a break from learning your next language or algorithm and read something easy and fun!
- The Goal: A Process of Ongoing Improvement – Goldratt
- Moneyball – Lewis