5 Practice Interview Questions on SQL Joins
Doing an SQL interview as part of your job application?
Be ready to answer questions about JOIN clauses.
JOIN
clauses are ubiquitous within SQL. Used frequently in almost every enterprise-grade database application, they're the glue that holds a database together and makes data analysis possible.
Simply put, JOIN
clauses help link tables within a database together. They help transform a data store from a series of tables — which track information, but aren't particularly useful for analysis — into an interconnected network of data points that can be aggregated and summarized in many different ways.
If you're on this page, it's probably because you've got an SQL interview coming up — and JOIN
clauses will almost certainly make an appearance. Before the big day, be sure to carefully study the different types of JOIN
clauses and be prepared to answer JOIN
-centric questions from your recruiter.
Here are 5 practice interview questions to get you started.
New: Sign up for a free SQL mini-course
Make sure you're prepared for the interview that could change your career. Sign up to receive five free must-learn lessons in preparation for your SQL interview. Start preparing today!
1. What is a JOIN
clause?
Let's start with the basics: what is a JOIN
clause, anyway? If you're familiar with SQL, you've probably used JOIN
s extensively in the past. But using them is different from being able to explain what they do — so let's go over a high-level definition to start us off.
Simply put, a JOIN
combines records from two separate tables. Oftentimes, we come upon situations within SQL in which data on two separate tables is linked, but separated. JOIN
s help bring that data back together.
Here's an example: let's say we have two tables in a database that tracks sales from an e-commerce company. One table is called customers
, and contains data individual customers, like first_name
and email
. The second table is called orders
, and contains information on individual orders that have been placed — like order_date
and product
.
Each order
in our database is placed by a customer
, but we don't keep the customer's information within the orders
table. Why not? Because if the same customer placed multiple orders, and we kept track of customer information within the orders
table, we'd be duplicating data unnecessarily. By separating customer information into its own customers
table, we can reduce redundancy and make updates and changes much easier to handle.
So, we include a field called customer_id
within each record on the orders
table. this ID is linked to a customer_id
on the customers
table, which contains non-redundant data for each individual customer.
When we want to bring two tables together, we use a JOIN
statement to combine data as necessary.
Here's how these two tables might look in practice:
CREATE TABLE customers (
customer_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
);
CREATE TABLE orders (
order_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
order_placed_date DATE NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
2. What are the different types of SQL JOIN
clauses, and how are they used?
In SQL, a JOIN
clause is used to return a table that merges the contents of two or more other tables together. For example, if we had two tables — one containing information on Customers
and another containing information on the Orders
various customers have placed — we could use a JOIN
clause to bring them together to create a new table: a complete list of orders by customer, with all necessary information to make shipments.
There are multiple types of JOIN
clauses, and they all serve slightly different functions:
INNER JOIN
returns a list of rows for which there is a match in both tables specified. It's the default join type, so if you just typeJOIN
without specifying any other conditions, anINNER JOIN
will be used.LEFT JOIN
will return all results from the left table in your statement, matched against rows in the right table when possible. If a row in the left table does not contain a corresponding match in the right table, it will still be listed — withNULL
values in columns for the right table.RIGHT JOIN
will return all results from the right table in your statement, matched against rows in the left table when possible. If a row in the right table does not contain a corresponding match in the left table, it will still be listed — withNULL
values in columns for the left table.FULL JOIN
will return all results from both the left and the right tables in your statement. If there are instances in which rows from the left table do not match the right table or vice versa, all data will still be pulled in — but SQL will outputNULL
values in all columns that are not matched.CROSS JOIN
returns theCartesian product of two tables — in other words, each individual row of the left table matched with each individual row of the right table.
3. What is the difference between INNER JOIN
and LEFT JOIN
?
When constructing a SELECT
query that combines two or more tables, choosing the right JOIN
type is half the battle. So how do we know when to use INNER JOIN
, and when to use the more complex variants like RIGHT JOIN
and LEFT JOIN
?
Simply put, INNER JOIN
should be used when we want to exclude all records that do not match both of the tables we're joining.
Let's check out an example of this: let's say that we have two tables — one called students
, and one called advisors
. The students
table contains a field called advisor_id
that references an id within the advisors
table:
CREATE TABLE `advisors` (
`advisor_id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) NOT NULL,
`last_name` varchar(255) NOT NULL,
PRIMARY KEY (`advisor_id`)
)
CREATE TABLE `students` (
`student_id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) NOT NULL,
`last_name` varchar(255) NOT NULL,
`advisor_id` int(11) DEFAULT NULL,
PRIMARY KEY (`student_id`),
KEY `advisor_id` (`advisor_id`),
CONSTRAINT `students_ibfk_1` FOREIGN KEY (`advisor_id`) REFERENCES `advisors` (`advisor_id`)
);
Here's the catch: not all students have an advisor, and not all advisors are assigned to students:
SELECT * FROM students;
/*
+------------+------------+------------+------------+
| student_id | first_name | last_name | advisor_id |
+------------+------------+------------+------------+
| 1 | Tanisha | Blake | 2 |
| 2 | Jess | Goldsmith | NULL |
| 3 | Tracy | Wu | 3 |
| 4 | Alvin | Grand | 1 |
| 5 | Felix | Zimmermann | 2 |
+------------+------------+------------+------------+
5 rows in set (0.00 sec)
*/
SELECT * FROM advisors;
/*
+------------+------------+-----------+
| advisor_id | first_name | last_name |
+------------+------------+-----------+
| 1 | James | Francis |
| 2 | Amy | Cheng |
| 3 | Lamar | Alexander |
| 4 | Anita | Woods |
+------------+------------+-----------+
4 rows in set (0.00 sec)
*/
If we wanted to pull information on students and their advisors, excluding students who are not assigned to an advisor, we would use an INNER JOIN
:
SELECT s.first_name AS student_name, a.first_name AS advisor_name
FROM students AS s
INNER JOIN advisors AS a ON s.advisor_id = a.advisor_id;
/*
+--------------+--------------+
| student_name | advisor_name |
+--------------+--------------+
| Alvin | James |
| Tanisha | Amy |
| Felix | Amy |
| Tracy | Lamar |
+--------------+--------------+
4 rows in set (0.00 sec)
*/
Since we're using an INNER JOIN
the results of this query exclude students for whom the advisor_id
is set to NULL
. Let's take a look at what happens when we use a LEFT JOIN
instead:
SELECT s.first_name AS student_name, a.first_name AS advisor_name
FROM students AS s
LEFT JOIN advisors AS a ON s.advisor_id = a.advisor_id;
/*
+--------------+--------------+
| student_name | advisor_name |
+--------------+--------------+
| Alvin | James |
| Tanisha | Amy |
| Felix | Amy |
| Tracy | Lamar |
| Jess | NULL |
+--------------+--------------+
5 rows in set (0.01 sec)
*/
Notice that our data set now contains information on all students, even those for whom the advisor is set to NULL
!
4. What is the difference between LEFT JOIN
and RIGHT JOIN
?
LEFT JOIN
and RIGHT JOIN
actually both do very similar things: they display the results of a JOIN
query including all records on a given table. The only difference is that LEFT JOIN
displays all records on the left table of the query, and RIGHT JOIN
displays all records on the right table!
To make this a bit clearer, let's use our example of students and advisors above: a LEFT JOIN
of students
onto advisors
will show a list of all students, even those who do not have advisors — because students
is the LEFT
table:
SELECT s.first_name AS student_name, a.first_name AS advisor_name
FROM students AS s
LEFT JOIN advisors AS a ON s.advisor_id = a.advisor_id;
/*
+--------------+--------------+
| student_name | advisor_name |
+--------------+--------------+
| Alvin | James |
| Tanisha | Amy |
| Felix | Amy |
| Tracy | Lamar |
| Jess | NULL |
+--------------+--------------+
5 rows in set (0.01 sec)
*/
Notice that the advisor 'Anita'
is excluded in the table above, because she is not assigned to any students.
On the other hand, a RIGHT JOIN
of students
onto advisors
will show a list of all students who are assigned to advisors, plus a list of advisors not assigned to students — because advisors
is the RIGHT
table:
SELECT s.first_name AS student_name, a.first_name AS advisor_name
FROM students AS s
RIGHT JOIN advisors AS a ON s.advisor_id = a.advisor_id;
/*
+--------------+--------------+
| student_name | advisor_name |
+--------------+--------------+
| Alvin | James |
| Tanisha | Amy |
| Felix | Amy |
| Tracy | Lamar |
| NULL | Anita |
+--------------+--------------+
5 rows in set (0.00 sec)
*/
Notice that the student 'Jess'
is excluded in the table above, because she is not assigned to an advisor.
5. How should data be structured to facilitate JOIN
clauses in a one-to-many relationship? Whta about a many-to-many relationship?
This one is a bit trickier, and is an interesting database design question.
Generally, one-to-many relationships are structured using a single FOREIGN KEY
. Consider our example of customers
and orders
above:
CREATE TABLE customers (
customer_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
);
CREATE TABLE orders (
order_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
order_placed_date DATE NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
This is a one-to-many relationship, because one customer
can place multiple orders
, but one order
cannot be assigned to more than one customer
. As such, we've defined it with a simple foreign key in the orders
table pointing to a given customer_id
, and we can use JOIN
clauses in our SELECT
queries fairly easily.
Many-to-many relationships are a bit more complicated. For example, what if we had an orders
table and a products
table with a many-to-many relationship: any order
can contain multiple products
, and any product
can be assigned to multiple orders
. How would we structure our database?
The answer: we use an intermediary mapping table with two FOREIGN KEY
s. Consider the following:
CREATE TABLE orders (
order_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
order_placed_date DATE NOT NULL,
);
CREATE TABLE products (
product_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price INT NOT NULL
);
CREATE TABLE products_to_orders (
product_to_order_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Above, we've created a separate table called products_to_orders
that maps items on the products
table to items on the orders
table. Each row in our products_to_orders
table represents one product-order combination, so that multiple products
can be assigned to one order
— and a single product
can be assigned to multiple orders
.
In this example, we need to use two JOIN
statements to link all these tables together: one to link products_to_orders
to products
, and one to link products_to_orders
with orders
.
New: Sign up for a free SQL mini-course
Make sure you're prepared for the interview that could change your career. Sign up to receive five free must-learn lessons in preparation for your SQL interview. Start preparing today!
Free practice question
Want more SQL interview prep? Try your hand with this free practice question:
Cocoa Confections is a small bakery that sells brownies, cookies, pies, and other delicious treats to customers online. It keeps records of all of its online sales in an SQL database that is automatically populated as customers place orders on its site.
In its database, Cocoa Confections has a customers
table to keep track of customer contact information, and an orders
table to keep track of various orders that those customers have placed. The schema of these tables is as follows:
CREATE TABLE customers (
customer_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
address VARCHAR(255) DEFAULT NULL,
city VARCHAR(255) DEFAULT NULL,
state VARCHAR(2) DEFAULT NULL,
zip_code VARCHAR(5) DEFAULT NULL,
);
CREATE TABLE orders (
order_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
customer_id INT NOT NULL,
order_placed_date DATE NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
It's the end of 2016, and the owner of Cocoa Confections wants to write an SQL query that finds the COUNT
of orders placed in 2016 by customer e-mail address. She wants to ORDER
the results by the COUNT
of orders placed in 2016, descending, so that she can personally send thank-you e-mails to Cocoa Confection's top customers by order volume.
Can you write a query that will help the owner of Cocoa Confections find the COUNT
of all orders
placed in 2016, by customer e-mail address, sorted descending?
The owner of Cocoa Confections wants to find the COUNT
of orders placed by e-mail address. But the email
field is located within the customers
table, not the orders
table. What type of statement do we need to use to combine the two tables together?
INNER JOIN
To combine our two tables together, we'll use a JOIN
. Since the orders
table has a customer_id
field, we can match that against the customer_id
in the customers
table to link up the two separate data sources. We'll use an INNER JOIN
so that data is only included if the customer_id
listed matches both of our input tables. Try the following to get you started — it'll compile a list of all orders in our database with all requisite customer information appended:
SELECT
*
FROM
orders
INNER JOIN
customers ON orders.customer_id = customers.customer_id;
COUNT(*)
Now that we have a complete list of orders with e-mail addresses included, we can use the COUNT(*)
and GROUP BY
functions to get a list of orders by e-mail address. This will return a COUNT
of all orders in our database by e-mail address. We'll also restrict our SELECT
clause so that it only pulls the e-mail address and the COUNT
of orders from our database, excluding all other fields.
SELECT
email, COUNT(*)
FROM
orders
INNER JOIN
customers on orders.customer_id = customers.customer_id
GROUP BY
email;
/*
Results:
+------------------------------+----------+
| email | COUNT(*) |
+------------------------------+----------+
| adawson@yahoo.com | 6 |
| aferguson@gmail.com | 16 |
| amy.cummings@yahoo.com | 13 |
...100 rows in set (0.01 sec)
*/
WHERE
and ORDER BY
We're not done yet! We still need to restrict the dates of our pull to 2016 only. How do we filter for particular results in SQL? Using a WHERE
clause! We'll also need to use an ORDER BY
clause to ensure that our results are sorted descending by number of orders. These two clauses, combined, should look something like this:
WHERE
order_placed_date BETWEEN CAST('2016-01-01' AS DATE) AND CAST('2016-12-31' AS DATE)
ORDER BY
2016_num_orders DESC;
Note that we need to use the CAST
function to ensure that our numerical inputs are properly interpreted as dates by MySQL.
Your solution will need to combine the customers
table with the orders
table. Does it do so?
INNER JOIN
?To combine the customers
and orders
tables together, an INNER JOIN
query will be most effective. Does the solution you've created use an INNER JOIN
statement?
Don't forget that in order to find the total number of orders placed by e-mail address, you'll need to use SQL's COUNT(*)
or SUM
functions, which also necessitate a GROUP BY
clause in your query. Does your solution contain those features?
Don't forget that we're only pulling orders from the year 2016
, so we'll need to restrict our pull to that year only. We'll also need to arrange our results in descending order by total COUNT
. Have you got those features in your query?
Great! Sounds like you're on the right track, so let's move on to the solution.
Here's our full solution to the problem:
SELECT
customers.email, COUNT(*) AS 2016_num_orders
FROM
orders
INNER JOIN
customers on orders.customer_id = customers.customer_id
WHERE
orders.order_placed_date BETWEEN CAST('2016-01-01' AS DATE) AND CAST('2016-12-31' AS DATE)
GROUP BY
customers.email
ORDER BY
2016_num_orders DESC;
/*
Results:
+------------------------------+-----------------+
| email | 2016_num_orders |
+------------------------------+-----------------+
| doreen.glover@yahoo.com | 14 |
| rpayne@hotmail.com | 10 |
| fobrien@gmail.com | 10 |
...99 rows in set (0.01 sec)
*/
Check out what's happening above:
First, we use a SELECT
statement to pull specific fields from our database. We'll pull the email
field, which is located on the customers
table, and the aggregated COUNT(*)
of records, using the AS
clause to refer to this column as 2016_num_orders
.
We'll use a FROM orders
clause to note that we're pulling this data from our orders
table. But since the email
field isn't located on the orders
table, we'll also need to perform an INNER JOIN
linking the orders.customer_id
field to the customers.customer_id
field.
We'll use a WHERE
clause to restrict our data range to 2016 only using two criteria: >= '2016-01-01'
and <= '2016-12-31'
.
Since we're using a COUNT(*)
function to sum records, we'll need to use a GROUP BY
clause to tell SQL that we want to group our results by email
address.
Finally, we'll tell SQL that we want our results arranged by number of orders, from greatest to least, using ORDER BY
.
Our query is complete!