SQL is extremely powerful and has a lot of functionality. When it comes to data science interviews, however, there are really only a handful of core concepts that most companies test. These 10 concepts show up the most often because they have the most application in real-life settings.
In this article, I’m going to go over what I think are the 10 most important SQL concepts that you should focus the majority of your time on when prepping for interviews.
With that said, here we go!
1. CASE WHEN
You’ll most likely see many questions that require the use of CASE WHEN statements, and that’s simply because it’s such a versatile concept.
It allows you to write complex conditional statements if you want to allocate a certain value or class depending on other variables.
Less commonly known, it also allows you to pivot data. For example, if you have a month column, and you want to create an individual column for each month, you can use CASE WHEN statements to pivot the data.
Example Question: Write an SQL query to reformat the table so that there is a revenue column for each month.
Initial table:
+------+---------+-------+
| id   | revenue | month |
+------+---------+-------+
| 1    | 8000    | Jan   |
| 2    | 9000    | Jan   |
| 3    | 10000   | Feb   |
| 1    | 7000    | Feb   |
| 1    | 6000    | Mar   |
+------+---------+-------+
Result table:
+------+-------------+-------------+-------------+-----+-----------+
| id   | Jan_Revenue | Feb_Revenue | Mar_Revenue | ... | Dec_Revenue |
+------+-------------+-------------+-------------+-----+-----------+
| 1    | 8000        | 7000        | 6000        | ... | null        |
| 2    | 9000        | null        | null        | ... | null        |
| 3    | null        | 10000       | null        | ... | null        |
+------+-------------+-------------+-------------+-----+-----------+
2. SELECT DISTINCT
SELECT DISTINCT is something that you should always have at the back of your head. It’s extremely common to use SELECT DISTINCT statements with aggregate functions (which is #3).
For example, if you have a table that shows customer orders, you may be asked to calculate the average number of orders made per customer. In this case, you would want to count the total number of orders over the count of the total number of customers. It may look something like this:
SELECT COUNT(order_id) / COUNT(DISTINCT customer_id) as orders_per_cust FROM customer_orders3. Aggregate Functions
Related to point #2, you should have a strong understanding of aggregate functions like min, max, sum, count, etc… This also means that you should have a strong understanding of the GROUP BY and HAVING clause. I highly advise that you take the time to go through practice problems because there are some creative ways that aggregate functions can be used.Example Question: Write a SQL query to find all duplicate emails in a table namedPerson.+----+---------+ | Id | Email | +----+---------+ | 1 | a@b.com | | 2 | c@d.com | | 3 | a@b.com | +----+---------+ANSWER: SELECT Email FROM Person GROUP BY Email HAVING count(Email) > 14. Left Joins vs Inner Joins
For those who are relatively new to SQL or have not used it in a while, it can be easy to mix up left joins and inner joins. Make sure you clearly understand how each join derives different results. In many interview questions, you’ll be required to do some sort of join, and in some cases, choosing one versus the other is the difference between a right and wrong answer.5. Self-Joins
Now we’re getting to the more interesting stuff! A SQL self-join joins a table with itself. You might think that that serves no purpose, but you’d be surprised at how common this is. In many real-life settings, data is stored in one large table rather than many smaller tables. In such cases, self-joins may be required to solve unique problems.Let’s look at an example.Example Question: Given theEmployeetable below, write a SQL query that finds out employees who earn more than their managers. For the above table, Joe is the only employee who earns more than his manager.+----+-------+--------+-----------+ | Id | Name | Salary | ManagerId | +----+-------+--------+-----------+ | 1 | Joe | 70000 | 3 | | 2 | Henry | 80000 | 4 | | 3 | Sam | 60000 | NULL | | 4 | Max | 90000 | NULL | +----+-------+--------+-----------+Answer: SELECT a.Name as Employee FROM Employee as a JOIN Employee as b on a.ManagerID = b.Id WHERE a.Salary > b.Salary6. Subqueries
A subquery, also known as an inner query or a nested query, is a query within a query and is embedded in the WHERE clause. This is a great way to solve unique problems that require multiple queries in sequence in order to produce a given outcome. Subqueries and WITH AS statements are both extremely using when querying, so you should absolutely make sure that you know how to use them.Example Question: Suppose that a website contains two tables, theCustomerstable and theOrderstable. Write a SQL query to find all customers who never order anything.Table: Customers.+----+-------+ | Id | Name | +----+-------+ | 1 | Joe | | 2 | Henry | | 3 | Sam | | 4 | Max | +----+-------+Table: Orders. +----+------------+ | Id | CustomerId | +----+------------+ | 1 | 3 | | 2 | 1 | +----+------------+Answer: SELECT Name as Customers FROM Customers WHERE Id NOT IN ( SELECT CustomerId FROM Orders )7. String Formatting
String functions are important especially when working with data that isn’t clean. Thus, companies may test you on string formatting and manipulation to make sure that you know how to manipulate data.String formatting includes things like:
- LEFT, RIGHT
- TRIM
- POSITION
- SUBSTR
- CONCAT
- UPPER, LOWER
- COALESCEIf you are unsure of any of these, check out Mode’s tutorial on string functions for cleaning data.
8. Date-time Manipulation
You should definitely expect some sort of SQL questions that involves date-time data. For example, you may be required to group data by months or convert a variable format from DD-MM-YYYY to simply the month.Some functions you should know are:
- EXTRACT
- DATEDIFFExample Question: Given a
Weathertable, write a SQL query to find all dates' Ids with higher temperature compared to its previous (yesterday's) dates.+---------+------------------+------------------+ | Id(INT) | RecordDate(DATE) | Temperature(INT) | +---------+------------------+------------------+ | 1 | 2015-01-01 | 10 | | 2 | 2015-01-02 | 25 | | 3 | 2015-01-03 | 20 | | 4 | 2015-01-04 | 30 | +---------+------------------+------------------+Answer: SELECT a.Id FROM Weather a, Weather b WHERE a.Temperature > b.Temperature AND DATEDIFF(a.RecordDate, b.RecordDate) = 19. Window Functions
Window Functions allow you to perform an aggregate value on all rows, instead of return only one row (which is what a GROUP BY statement does). It’s extremely useful if you want to rank rows, calculate cumulative sums, and more.Example Question: Write a query to get theempnowith the highest salary. Make sure your solution can handle ties!depname | empno | salary | -----------+-------+--------+ develop | 11 | 5200 | develop | 7 | 4200 | develop | 9 | 4500 | develop | 8 | 6000 | develop | 10 | 5200 | personnel | 5 | 3500 | personnel | 2 | 3900 | sales | 3 | 4800 | sales | 1 | 5000 | sales | 4 | 4800 |Answer: WITH sal_rank AS (SELECT empno, RANK() OVER(ORDER BY salary DESC) rnk FROM salaries) SELECT empno FROM sal_rank WHERE rnk = 1;10. UNION
As a bonus, #10 is UNION! While it doesn’t come up often, you’ll be asked about this the odd time and it’s good to know in general. If you have two tables with the same columns and you want to combine them, this is when you’d use UNION.Again, if you’re not 100% sure how it works, I would do some quick Googling to learn about it. :)Thanks for Reading!
And that’s all! I hope that this helps you in your interview prep and I wish you the best of luck in your future endeavors. I’m sure that if you know these 10 concepts inside-out, you’ll do great when it comes to most SQL problems out there.Refinances : towardsdatascience.com
0 Comments
Post a Comment