SQL (Structured Query Language)


SQL (pronounced “sequel” or spelled out as “S-Q-L”), or Structured Query Language, is the domain-specific language used to define, retrieve and manipulate data stored in a relational database.

SQL is one of the most important tools for an analyst because relational databases are so widely used. It is so common that newer systems such as Cloudera’s Impala, Apache Hive and Google’s Tenzing all offer SQL or SQL-like query languages, either to pull data from Hadoop or to make writing MapReduce jobs simpler.

A SQL query has three major parts:

  1. Selecting what you want
  2. Pulling from the correct tables
  3. Filtering data

A Basic SQL Query

SELECT em.Name, em.StartDate, em.StillEmployed
FROM Employees em
WHERE em.StillEmployed=TRUE

The SELECT clause tells your RDBMS which columns should be displayed.

The FROM clause points to the correct table. Notice the “em” after the table name: you can give a table a nickname (an alias) to make it easier to reference, which is why the columns are written as em.Name, em.StartDate and so on.

The WHERE clause filters the rows, keeping only employees who are still employed.
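
To see the three parts working together, here is a minimal sketch that runs the query above against an in-memory SQLite database using Python's built-in sqlite3 module. The table layout and the three sample employees are assumptions made up for illustration; only the query itself comes from the example above. It also assumes a SQLite version that accepts the TRUE keyword (3.23 or newer); other databases may spell boolean values differently.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table layout, inferred from the columns the query uses.
conn.execute(
    "CREATE TABLE Employees (Name TEXT, StartDate TEXT, StillEmployed BOOLEAN)"
)

# Made-up sample rows: two current employees and one former employee.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        ("Ada", "2015-03-01", True),
        ("Grace", "2012-07-15", False),
        ("Linus", "2019-01-07", True),
    ],
)

# The basic query: SELECT picks the columns, FROM names the table
# (aliased as em), and WHERE keeps only rows where StillEmployed is true.
query = """
SELECT em.Name, em.StartDate, em.StillEmployed
FROM Employees em
WHERE em.StillEmployed = TRUE
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the rows for Ada and Linus should come back, because the WHERE clause drops the employee whose StillEmployed value is false. SQLite stores booleans as the integers 1 and 0, so that column prints as 1.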

Resources:

SQL Tutorial by W3Schools