Data Science Taxonomy: Who Cares About the Name?


Data Science is just applied data mining which is applied machine learning which is...

Every person involved with calculating models and summing up figures has a different name for what they do.  Data Science is the new buzzword, but some industry heavyweights have argued the term means something different from the rest, and others think “data scientist” isn’t a title for practitioners.  My belief is that it doesn’t matter what you call it – you’re using math, probability and historical data to predict something.

What’s Data Science?  It is nor data, nor outlier,
Nor aggregate, nor deviation, nor any other statistics
Belonging to a model.  O, be some other name!
What’s in a name?  That which we call a regression
By any other name would work just as well.

Juliet (if she were a statistician)

There are many valid reasons to branch out from one field to another.  Statistics in general is just a branch of mathematics.  Humans are just a branch of an evolutionary tree.  The specialization of skills or study advances the sub-group faster.  We know that it is more efficient to give one person a narrow task on an assembly line so that they become very good at welding one part of a car together.

Data Mining is not an assembly line job.  Every project or campaign model is a custom job.  You have to re-learn the data and find new predictive variables.  How can someone “specialize” in data science if the job is constantly changing?  It’s the same for every other spin-off field that uses data to predict the future or provide a look into the past.

Where’s the Taxonomy Already?

I’m sure you’ve seen the Data Science Venn Diagram.  It does a great job in showing the intersection of various fields.  Here is my humble attempt at creating a hierarchy:

Data Science / Big Data: Data science relies on making use of huge amounts of data and distributed database systems.  Its main goal is to produce a data product.  In all cases, you need to focus on science.

Data Mining / CRM: If you’re trying to improve marketing or operations, call yourself a data miner.  This is the term that “old school” greats like Gordon Linoff and Dean Abbott use for themselves.  Data Miners need the same model building skills but they apply their knowledge to existing business problems.

Machine Learning / Statistics: This is the foundation on which all data miners and scientists must stand.  Without the machine learning techniques or statistical significance tests we would still be guessing at what works.  If your main role is to build the model and explain the results to someone else, you’re a statistician or ML expert.  If you’re also making decisions and taking action based on the model, you’re probably a data miner.

Business Intelligence / Descriptive Statistics: Counts, sums, averages, percentages and grouping of categorical variables (like gender, state, or industry) are all that business intelligence or descriptive statistics does.  Technically you cannot do statistics or data mining or build a data product without averages and sums.  Organizations that just eyeball the descriptive statistics – “Looks like we got 10 more customers this week than last, we must have done something right last week” – may be incredibly successful but they could be doing much more.
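To make that bottom layer concrete, here is a minimal sketch of what “counts, sums, averages and percentages by group” looks like in plain Python.  The customer records and the summarize helper are made up for illustration; they are assumptions of mine, not taken from any real data set or library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer records, invented purely for illustration.
customers = [
    {"state": "OH", "industry": "retail", "spend": 120.0},
    {"state": "OH", "industry": "finance", "spend": 80.0},
    {"state": "NY", "industry": "retail", "spend": 200.0},
    {"state": "NY", "industry": "retail", "spend": 100.0},
    {"state": "CA", "industry": "finance", "spend": 50.0},
]


def summarize(records, group_key, value_key):
    """Return count, sum, average and share of records for each group."""
    groups = defaultdict(list)
    for row in records:
        groups[row[group_key]].append(row[value_key])

    total = len(records)
    return {
        name: {
            "count": len(values),
            "sum": sum(values),
            "average": mean(values),
            "percent": 100.0 * len(values) / total,
        }
        for name, values in groups.items()
    }


by_state = summarize(customers, "state", "spend")
for state, stats in sorted(by_state.items()):
    print(state, stats)
```

Swap "state" for "industry" and you have a different report from the same few lines.  Nothing here predicts anything, which is exactly the gap between this layer and the ones above it.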

You’re Closer to a Scientist Than a Philosopher

I’m all for standards.  Most people who work with data probably want to have some classification system for the world.  Mining versus Science is all just semantics really.  Companies looking to hire people with data-oriented skillsets are going to use “Science” since it’s sexier.  This comes with a diluting effect – we’ll see Business Intelligence jobs labeled as Data Science or Data Mining.

Your best bet is to just keep learning and be prepared for whatever modeling (or summary) task comes your way.

A modern data professional tends to learn a little bit of everything.  Sure, you can have a favorite (I’m in love with Neural Networks, for example) but no one technique is going to work on a majority of modeling / analysis tasks.  Even if there were one catch-all method, academics and practitioners tend to favor the simpler model.  Both groups are also constrained by the cost of gathering data – maybe credit score adds another 10% predictive power to your model but it costs $500 for each of your 100,000 prospects.
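The arithmetic behind that credit score example is worth spelling out.  The sketch below uses the $500 and 100,000 figures from above; the two function names and the extra-value-per-prospect input are hypothetical, just a rough way to frame the break-even question.

```python
def data_purchase_cost(cost_per_prospect, n_prospects):
    """Total cost of buying one extra variable for every prospect."""
    return cost_per_prospect * n_prospects


def worth_buying(extra_value_per_prospect, cost_per_prospect):
    """True only if the added value per prospect exceeds what the data costs.

    extra_value_per_prospect is a hypothetical estimate you would have to
    supply yourself (for example, expected extra profit from better targeting).
    """
    return extra_value_per_prospect > cost_per_prospect


total = data_purchase_cost(500, 100_000)
print(f"Total cost: ${total:,}")
```

That comes to $50,000,000 before the model has scored a single prospect, so the extra predictive power has to be worth more than $500 per prospect just to break even.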

With all that being said, even if you’re a classically trained statistician you’re not going to know every nuance of a statistical method.  Heck, you might not need to know the math behind your statistical methods (and that’s okay).  My point is, all data professionals are in a constant state of learning and, no matter the name, their functions overlap.

No matter the name your company has given you or the name you prefer, you’re using methods that are shared by all data professionals.  So what does it matter if LinkedIn is calling their people Data Scientists while Amazon might call them Machine Learning Specialists?

Bottom Line: If you’re in marketing and building models, you’re using statistics and machine learning algorithms but you’re probably not an expert in all of them.  Call yourself a data miner and keep on learning.  If it’s good enough for Dean Abbott and Gordon Linoff, it’s good enough for you.