Welcome to “SQL Interview Questions (Basic – Advanced),” your go-to resource for preparing your next SQL interviews. This guide is packed with the most important and frequently asked questions in SQL interviews, making it a valuable tool for anyone looking to enter the field of database management or sharpen their existing skills. Whether you’re a complete beginner or an experienced database professional, the questions we’ve compiled here are chosen to reflect what you’re most likely to face in an interview.
However, with this guide, you’ll find yourself gradually building a solid foundation. We start with the very basics, explaining key concepts and simple queries in a way that’s easy to understand. This approach ensures that even if you’re new to SQL, you’ll find the guidance you need to start on the right foot.
As you progress through the guide, the questions become more challenging, covering advanced topics that experienced professionals are expected to know. This structure helps to ensure that by the time you reach the end, you’re well-prepared to tackle questions that range from the basic operations to complex query optimizations and database design principles.
Every question in this guide is accompanied by a clear and concise explanation, often with examples, to help you grasp the concept and learn how to articulate your answers effectively. This method of preparation is designed to boost your confidence, making you feel more prepared and at ease during your actual interview.
This collection of SQL interview questions is an indispensable resource for anyone aiming to succeed in their next SQL interview. It’s a comprehensive tool that supports both freshers looking to make a strong entry into the field and experienced professionals aiming to refresh their knowledge and stand out in a competitive job market. With this guide, you’ll have everything you need to impress your interviewers and take the next step in your career.
Lets start preparing from here
Basic SQL Interview Questions:
1. What is SQL?
SQL, which stands for Structured Query Language, is a standard programming language specifically designed for managing and manipulating relational databases. It allows users to perform various operations on data stored in a database, including creating, querying, updating, and deleting data.
SQL is widely used in database management and administration, data integration processes, and for making data accessible and manageable for business intelligence, data analysis, and reporting purposes.
2. Explain the difference between SQL and NoSQL databases.
Feature | SQL Databases | NoSQL Databases |
---|---|---|
Data Model | Structured, relational tables | Flexible: document, key-value, wide-column, graph |
Schema Flexibility | Requires predefined schema | Schema-less, allowing for dynamic schema |
Scalability | Vertical scaling (scale-up by adding more powerful server) | Horizontal scaling (scale-out by adding more servers) |
Consistency | ACID (Atomicity, Consistency, Isolation, Durability) | BASE (Basically Available, Soft state, Eventually consistent) |
Use Cases | Complex queries, relational data, transactions | Rapid development, large or unstructured data, scalability |
Query Language | SQL (Structured Query Language) with rich querying capabilities | Varied, including simpler query languages and APIs |
Transactions | Strong support for complex transactions | Varies, with some supporting transactions but often in a more limited or different way |
Data Integrity | High, due to strict schema and relationships enforcement | Flexible, depending on the specific NoSQL database used |
Typical Examples | MySQL, PostgreSQL, Oracle, SQL Server | MongoDB, Cassandra, CouchDB, DynamoD |
3. What are the different types of SQL commands?
SQL (Structured Query Language) commands are used to interact with and manipulate relational databases. There are several types of SQL commands, and they can be broadly categorized into the following groups:
- Data Query Language (DQL) Commands:
SELECT
: Used to retrieve data from one or more tables. It allows you to specify the columns you want to retrieve and apply various conditions to filter the data.
- Data Definition Language (DDL) Commands:
CREATE
: Used to create database objects like tables, indexes, or views.ALTER
: Used to modify the structure of existing database objects.DROP
: Used to delete database objects like tables, indexes, or views.TRUNCATE
: Removes all records from a table, but retains the table structure for future use.
- Data Manipulation Language (DML) Commands:
INSERT
: Used to add new records into a table.UPDATE
: Used to modify existing records in a table.DELETE
: Used to remove records from a table.
- Data Control Language (DCL) Commands:
GRANT
: Gives specific privileges or permissions to users or roles.REVOKE
: Removes specific privileges or permissions from users or roles.
- Transaction Control Commands:
COMMIT
: Saves all the changes made during the current transaction.ROLLBACK
: Undoes all the changes made during the current transaction.SAVEPOINT
: Sets a point within a transaction to which you can later roll back.
- Data Query and Manipulation Commands:
MERGE
(also known asUPSERT
): CombinesINSERT
andUPDATE
operations into a single statement, allowing you to insert a new row or update an existing row based on certain conditions.UPSERT
: Not a standard SQL command, but some database systems support it as an alternative toMERGE
.
4. What is a database schema?
A database schema is a logical container or blueprint that defines the structure and organization of a database. It outlines the way data is stored, how tables are related to each other, the constraints that apply to the data, and various other attributes that help in maintaining and querying the data within the database.
5. Tell Me about Primary key and Foreign Key.
Primary Key
- Definition: A primary key is a column in a database table that uniquely identifies each row in the table. No two rows can have the same primary key value, ensuring data integrity and uniqueness.
- Characteristics:
- It must contain unique values.
- It cannot contain NULL values.
- A table can have only one primary key, which can consist of single or multiple columns (composite key).
- It is often used in database operations such as JOINs, and it is the default index for the table.
- Purpose: The primary key’s main purpose is to enforce entity integrity by uniquely identifying each record in a table. It’s essential for establishing relationships with other tables.
Foreign Key
- Definition: A foreign key is a column in one table that references the primary key of another table. The purpose of the foreign key is to enforce referential integrity between two related tables.
- Characteristics:
- It is a field (or fields) in one table that uniquely identifies a row of another table or the same table (in case of self-reference).
- It can contain duplicate values, and unlike primary keys, a foreign key can contain NULL values (unless specifically restricted not to).
- A table can have multiple foreign keys, and each foreign key is linked to a primary key in another table.
- It ensures actions on data in one table can reflect in another table, maintaining the integrity and consistency of the dataset.
- Purpose: The foreign key’s main role is to maintain referential integrity within the database by ensuring that the relationship between two tables remains consistent. That is, it prevents actions that would leave orphaned records in the database. For example, a foreign key can prevent deleting a record in one table if it’s referenced in another table.
6. Difference between WHERE and HAVING clauses?
The WHERE
and HAVING
clauses in SQL are used to filter rows in the result set of a query.
WHERE Clause
The WHERE
clause is used with the SELECT
, UPDATE
, DELETE
, and MERGE
statements to filter rows from the base tables before they are included in the result set.
SELECT column1, column2
FROM table
WHERE column3 > 10;
HAVING Clause
The HAVING
clause is used with the SELECT
statement in combination with GROUP BY
. It filters the results of an aggregation operation, such as COUNT
, SUM
, AVG
, etc., on grouped rows.
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
7. GROUP BY clause with example.
The GROUP BY
clause in SQL is used to arrange identical data into groups. This clause is often used with aggregate functions (like COUNT, MAX, MIN, SUM, AVG) to perform an operation on each group of data. Essentially, GROUP BY
allows you to collapse a set of rows into a smaller set of rows by aggregating values based on one or more columns.
Example Scenario:
Let’s use the Orders
table example from before, but let’s add a new column, Category
, to categorize products. Suppose we want to find the total quantity of products ordered in each category.
Orders Table:
OrderID | OrderDate | CustomerID | ProductName | Quantity | Price | Category |
---|---|---|---|---|---|---|
101 | 2024-03-27 | 1 | Laptop | 1 | 1200 | Electronics |
102 | 2024-03-28 | 1 | Mouse | 2 | 50 | Electronics |
103 | 2024-03-29 | 2 | Notebook | 4 | 20 | Stationery |
104 | 2024-03-30 | 2 | Pen | 10 | 5 | Stationery |
SQL Query:
To find the total quantity of products ordered in each category, you can use the following SQL query:
SELECT Category, SUM(Quantity) AS TotalQuantity
FROM Orders
GROUP BY Category;
Result:
This query would produce a result set that looks like this:
Category | TotalQuantity |
---|---|
Electronics | 3 |
Stationery | 14 |
This table shows that, across all orders, there were a total of 3 items ordered in the Electronics
category and 14 items in the Stationery
category.
8. How do you retrieve all the columns from a table?
To retrieve all columns from a table in an SQL database, you use the SELECT
statement followed by an asterisk (*
) symbol, and then specify the table from which you want to retrieve the data. The asterisk symbol represents all columns in the table.
Here is the basic syntax:
SELECT * FROM TableName;
Example:
Suppose you have a table named Employees
and you want to retrieve all columns and all rows from this table.
Your SQL query would look like this:
SELECT * FROM Employees;
9. What is Cursor?
Cursor is a Temporary Memory or Temporary Work Station which is Allocated by Database Server at the Time of Performing DML(Data Manipulation Language) operations. Cursors are used to store Database Tables.
Cursors are of 2 types:
1). Implicit Cursors
2). Explicit Cursors
10. What is Trigger?
In SQL, a trigger is a database object that defines a set of actions to be automatically executed when a specific event occurs in a database. These events can include data changes (inserts, updates, deletes) in a table or other database actions.
Example
CREATE TRIGGER log_changes
AFTER INSERT OR UPDATE OR DELETE ON employees
FOR EACH ROW
BEGIN
INSERT INTO employee_audit (action, employee_id, change_date)
VALUES (
CASE
WHEN INSERTING THEN 'INSERT'
WHEN UPDATING THEN 'UPDATE'
WHEN DELETING THEN 'DELETE'
END,
:NEW.employee_id,
SYSDATE
);
END;
In this example, the trigger log_changes
is defined to execute after an INSERT
, UPDATE
, or DELETE
operation on the employees
table. It logs the action (insert, update, or delete), the employee ID, and the change date in an employee_audit
table.
11. What is View?
In the context of a relational database, a view is a virtual table or a saved query result. It does not store data itself but rather presents data from one or more underlying tables in a structured and organized format.
CREATE VIEW employee_info AS
SELECT employee_id, first_name, last_name, department
FROM employees
WHERE status = 'Active';
In this example, the “employee_info” view is defined to retrieve data from the “employees” table but only includes active employees and shows a subset of columns. Users can then query the “employee_info” view as if it were a regular table.
Views are a valuable tool in SQL for data abstraction, simplifying queries, and ensuring data security and consistency.
12. Difference Between Drop, Delete and Truncate.
DROP
- Operation: Removes a table (or an entire database) from the database schema along with all its data, structure, and associated constraints, indexes, triggers, permissions, etc.
- Reversibility: The action is irreversible; once a table or database is dropped, you lose its structure and data permanently (unless you have a backup).
- Use Case: Use
DROP
when you need to permanently remove a table or database and do not need its data or structure anymore. - Syntax:
DROP TABLE TableName;
DELETE
- Operation: Deletes rows from a table based on the condition specified in the
WHERE
clause. If no condition is specified, it removes all rows from the table. However, the table structure, constraints, indexes, triggers, etc., remain intact. - Reversibility: It’s reversible in the sense that you can re-insert the deleted data if needed, but undoing requires manual re-insertion or a rollback if within a transaction.
- Use Case: Use
DELETE
when you want to remove specific rows from a table or all rows but want to retain the table structure. - Syntax:
DELETE FROM TableName WHERE condition;
- If no
WHERE
clause is specified, all rows in the table will be deleted.
- If no
TRUNCATE
- Operation: Removes all rows from a table, but the table structure and its columns, constraints, indexes, etc., remain intact. It is a faster way to delete all records in the table because it does not log individual row deletions.
- Reversibility: Like
DELETE
, it’s reversible in terms of re-inserting data, but not through a simple undo action. TRUNCATE operations cannot be rolled back in some databases, or they behave differently in terms of transactions. - Use Case: Use
TRUNCATE
when you need to quickly remove all rows from a table without affecting the table’s structure. It’s often used for purging data. - Syntax:
TRUNCATE TABLE TableName;
Key Differences:
- Performance:
TRUNCATE
is generally faster thanDELETE
because it does not generate individual row delete logs. - Logging:
DELETE
logs each row deletion, allowing for fine-grained transactional control.TRUNCATE
andDROP
are less granular, often logging at a higher level or not at all for the specific rows/data removed. - Scope:
DROP
is the most drastic, removing the table structure itself.TRUNCATE
andDELETE
affect only data, not the table’s schema.
13. Explain Join and it Types.
1. INNER JOIN
- Description: Returns rows when there is at least one match in both tables. If there is no match, the rows are not returned.
- Usage: Use when you only want to retrieve records that have matching values in both tables.
- Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.common_field = table2.common_field;
2. LEFT JOIN (or LEFT OUTER JOIN)
- Description: Returns all rows from the left table, and the matched rows from the right table. If there is no match, the result is NULL on the side of the right table.
- Usage: Use when you want all records from the left table and the matched records from the right table.
- Syntax:
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field;
3. RIGHT JOIN (or RIGHT OUTER JOIN)
- Description: Returns all rows from the right table, and the matched rows from the left table. If there is no match, the result is NULL on the side of the left table.
- Usage: Use when you want all records from the right table and the matched records from the left table.
- Syntax:
SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.common_field = table2.common_field;
4. FULL JOIN (or FULL OUTER JOIN)
- Description: Returns rows when there is a match in one of the tables. Essentially, it combines the results of both LEFT JOIN and RIGHT JOIN.
- Usage: Use when you want all records when there is a match in either left or right table.
- Syntax:
SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.common_field = table2.common_field;
5. CROSS JOIN
- Description: Returns a Cartesian product of the two tables, i.e., it joins every row of the first table with every row of the second table.
- Usage: Use when you need to combine each row of one table with every row of another table.
- Syntax:
SELECT columns
FROM table1
CROSS JOIN table2;
6. SELF JOIN
- Description: A self join is a case when a table is joined with itself. It’s useful for comparing rows within the same table.
- Usage: Use when you need to join a table to itself, for example, when comparing rows in the same table.
- Syntax:
SELECT a.columns, b.columns
FROM table1 a, table1 b
WHERE condition;
Special Types:
- NATURAL JOIN: Automatically joins tables based on columns with the same names and compatible types between the tables.
- EQUI JOIN: A type of join that combines rows with equivalent values in specified columns, essentially an INNER JOIN using an equality comparison.
14. Difference between a primary key and a unique key?
Feature | Primary Key | Unique Key |
---|---|---|
Uniqueness | Must be unique. | Must be unique. |
Null Values | Cannot accept null values. | Can accept a single null value (per unique constraint, depending on the DBMS). |
Quantity per Table | Only one primary key allowed per table. | Multiple unique keys can be defined per table. |
Purpose | Uniquely identifies each row in a table. | Ensures data in the column is unique. |
Index | Automatically creates a unique index. | Automatically creates a unique index. |
Role in Foreign Key Constraints | Often used as a reference for foreign keys. | Can be referenced by foreign keys, but less common than primary keys. |
Enforcement | Enforces entity integrity. | Enforces additional business rules regarding uniqueness. |
Modification and Deletion | Altering or dropping a primary key can be more restrictive due to its role in relationships. | Usually, fewer restrictions on altering or dropping unique constraints. |
15. Explain ORDER BY clause?
The ORDER BY
clause in SQL is used to sort the results of a query in ascending (ASC) or descending (DESC) order based on one or more columns. By default, ORDER BY
sorts the data in ascending order if you don’t specify ASC
or DESC
.
Example Scenario:
Suppose you have a Students
table that stores information about students, including their names and scores in a test. You want to retrieve a list of students along with their scores, sorted by their scores in descending order (highest score first).
Students Table:
StudentID | Name | Score |
---|---|---|
1 | Alice | 88 |
2 | Bob | 92 |
3 | Charlie | 75 |
4 | Diana | 85 |
SQL Query:
SELECT Name, Score
FROM Students
ORDER BY Score DESC;
Result
The query will return the list of students ordered by their scores from highest to lowest:
Name | Score |
---|---|
Bob | 92 |
Alice | 88 |
Diana | 85 |
Charlie | 75 |
16. What is Index?
In a database, an index is a data structure that improves the speed of data retrieval operations on a database table. Indexes work by providing a fast way to look up rows in a table based on the values of specific columns.
Instead of scanning the entire table, the database system can use an index to quickly locate the rows that match certain criteria. Indexes are crucial for optimizing query performance in large databases.
17. Tell me about the ROLLBACK and COMMIT.
COMMIT:
COMMIT
is a SQL command that is used to permanently save the changes made during a transaction to the database.- When a
COMMIT
statement is executed, all the changes (inserts, updates, deletes) made within the transaction are saved to the database, making them permanent and visible to other users and transactions. - Once a transaction is committed, it cannot be rolled back, and the changes become a permanent part of the database.
BEGIN TRANSACTION;
-- SQL statements that modify data
COMMIT;
ROLLBACK:
ROLLBACK
is a SQL command used to undo or discard the changes made during a transaction.- When a
ROLLBACK
statement is executed, it reverts all the changes made within the transaction, restoring the database to its state before the transaction started. - It is typically used in situations where an error occurs during a transaction or when the transaction’s changes need to be canceled for some reason.
BEGIN TRANSACTION;
-- SQL statements that modify data
ROLLBACK;
18. What is BETWEEN operator?
The BETWEEN
operator is a SQL operator used to specify a range of values within which a column’s value must fall for a row to be selected in a query.
It is commonly used in the WHERE
clause of a SQL query to filter rows based on a specified range of values. The BETWEEN
operator is inclusive, meaning it includes the boundary values in the range.
SELECT product_name, price
FROM products
WHERE price BETWEEN 10 AND 20;
19. What is a subquery?
A subquery, also known as an inner query or nested query, is an SQL query that is embedded within the clause of another SQL query. Essentially, it’s a query within a query. Subqueries can be used in various parts of a SQL statement, including SELECT
, FROM
, WHERE
, and HAVING
clauses.
They are powerful tools for performing complex data retrieval operations, allowing you to break down complicated queries into more manageable, logical components.
Example
Consider a database with two tables: Employees
(containing EmployeeID
, Name
, DepartmentID
) and Departments
(containing DepartmentID
, DepartmentName
). Suppose you want to find the names of employees who work in the “IT” department.
SELECT Name
FROM Employees
WHERE DepartmentID = (
SELECT DepartmentID
FROM Departments
WHERE DepartmentName = 'IT'
);
Explanation
- Inner Query (Subquery):
(SELECT DepartmentID FROM Departments WHERE DepartmentName = 'IT')
finds theDepartmentID
of the “IT” department. - Outer Query: Uses the result of the inner query to retrieve
Name
from theEmployees
table where theDepartmentID
matches the ID found by the subquery.
20. What is a self-join?
A self-join is a SQL technique used to join a table to itself. This approach is particularly useful when you want to compare rows within the same table or extract relational data stored in a single table.
To perform a self-join, you typically use table aliases to differentiate the “two instances” of the same table involved in the join. An alias is a temporary name assigned to a table, or a column in a table, for the duration of a SQL query.
Example Scenario
Consider a simple Employees
table that includes columns for EmployeeID
, Name
, and ManagerID
, where ManagerID
is a reference to the EmployeeID
of the employee’s manager (also an entry in the same table).
Employees Table
EmployeeID | Name | ManagerID |
---|---|---|
1 | John | NULL |
2 | Jane | 1 |
3 | Doe | 1 |
4 | Smith | 2 |
Objective
Find all employees along with their managers’ names.
SELECT e1.Name AS EmployeeName, e2.Name AS ManagerName
FROM Employees e1
LEFT JOIN Employees e2 ON e1.ManagerID = e2.EmployeeID;
Result
EmployeeName | ManagerName |
---|---|
John | NULL |
Jane | John |
Doe | John |
Smith | Jane |
Explanation of Results:
- John does not have a manager listed (his
ManagerID
isNULL
), which reflects in theManagerName
beingNULL
for him. - Jane and Doe are both managed by John, as indicated by their
ManagerID
pointing to John’sEmployeeID
. - Smith is managed by Jane, as shown by Smith’s
ManagerID
matching Jane’sEmployeeID
.
21. Explain the difference between UNION and UNION ALL.
UNION
and UNION ALL
are SQL set operators used to combine the result sets of two or more SELECT statements.
UNION
UNION
is used to combine the result sets of two or more SELECT statements and return a distinct set of rows. Duplicate rows are automatically removed from the result set.UNION
can be slower thanUNION ALL
because it has to perform the additional step of removing duplicates.
SELECT column1 FROM table1
UNION
SELECT column1 FROM table2;
UNION ALL
UNION ALL
is used to combine the result sets of two or more SELECT statements and returns all rows, including duplicates. It does not remove duplicate rows.UNION ALL
is typically faster thanUNION
because it does not have the overhead of eliminating duplicate rows.
SELECT column1 FROM table1
UNION ALL
SELECT column1 FROM table2;
Intermediate SQL Interview Questions:
1. Difference between an INNER JOIN and a CROSS JOIN.
Feature | INNER JOIN | CROSS JOIN |
---|---|---|
Definition | Combines rows from two tables based on a condition that matches columns from both tables. | Produces a Cartesian product of the two tables, combining each row of the first table with each row of the second table. |
Condition | Requires a condition (ON clause) to specify how to match rows from the joined tables. | Does not require a condition; every possible combination of rows is produced. |
Result Set | Includes only rows where the join condition is true. | Includes all possible combinations of rows from the joined tables, leading to a result set with a number of rows equal to the product of the row counts of the two tables. |
Use Case | Used when you want to retrieve related data from multiple tables based on a logical relationship between columns. | Used when you need to generate every possible combination of rows between two or more tables. Rarely used in practical applications compared to other join types. |
Example
Suppose we have two small tables, A
and B
, and we want to illustrate the difference between using an INNER JOIN and a CROSS JOIN with them.
Table A:
id | name |
---|---|
1 | Alice |
2 | Bob |
Table B:
id | color |
---|---|
1 | Red |
2 | Blue |
INNER JOIN Example:
Let’s join A
and B
on their id
columns.
SELECT A.name, B.color
FROM A
INNER JOIN B ON A.id = B.id;
Result
name | color |
---|---|
Alice | Red |
Bob | Blue |
CROSS JOIN Example:
Now, let’s do a CROSS JOIN of A
and B
.
SELECT A.name, B.color
FROM A
CROSS JOIN B;
Result
name | color |
---|---|
Alice | Red |
Alice | Blue |
Bob | Red |
Bob | Blue |
2. GROUP_CONCAT function in MySQL.
The GROUP_CONCAT
function in MySQL is a powerful aggregation function that concatenates values from multiple rows into a single string. It’s particularly useful when you want to aggregate data from a column and display it as a comma-separated list (or using any other delimiter of your choice).
This function can be very handy for generating compact representations of multiple-row data that is related by some criteria, often used within a GROUP BY
query.
Syntax
GROUP_CONCAT([DISTINCT] expr [,expr ...]
[ORDER BY {unsigned_integer | col_name | expr}
[ASC | DESC] [,col_name ...]]
[SEPARATOR str_val])
Example
Consider a Students
table where you want to list all course names that each student is enrolled in, grouped by student.
Students Table:
StudentID | Name | Course |
---|---|---|
1 | Alice | Math |
1 | Alice | Science |
2 | Bob | History |
2 | Bob | Science |
3 | Charlie | Math |
Query
SELECT Name, GROUP_CONCAT(Course ORDER BY Course SEPARATOR ', ') AS Courses
FROM Students
GROUP BY Name;
Result
Name | Courses |
---|---|
Alice | Math, Science |
Bob | History, Science |
Charlie | Math |
3. Difference between a candidate key and a composite key.
Candidate key is a unique key that can be used as a primary key. but not necessarily used as one.
Composite key is a key of two or more attributes that uniquely identifies the row.
Feature | Candidate Key | Composite Key |
---|---|---|
Definition | A candidate key is a set of one or more fields/columns that can uniquely identify every row in a table. | A composite key is a type of candidate key that consists of two or more columns to uniquely identify rows in a table. |
Uniqueness | Must be unique across all rows in a table. | Must be unique across all rows in a table when the columns are combined. |
Nullability | Cannot contain NULL values in the key columns. | Cannot contain NULL values in the combination of key columns. |
Number in a Table | A table can have multiple candidate keys. | A table can have multiple composite keys, but a composite key is considered as one among possible candidate keys. |
Simplicity | Can be a single column or a combination of multiple columns. | Always consists of two or more columns. |
Primary Key Selection | One of the candidate keys is chosen to be the primary key. | If chosen as a primary key, it ensures that the uniqueness constraint is maintained across the combination of columns. |
Purpose | To ensure entity integrity by uniquely identifying each row in a table. | To uniquely identify each row in a table when no single column can serve as a unique identifier. |
4. Explain the concept of a pivot table.
In SQL, a pivot table involves transforming or “pivoting” data from rows into columns to more effectively analyze and summarize data in a relational database.
To pivot data in SQL, we generally use a combination of aggregate functions (like SUM()
, COUNT()
, AVG()
, etc.), CASE
or IF
statements, and GROUP BY
clauses.
Example
Consider a Sales
table with the following columns: Year
, Product
, and TotalSales
.
Objective:
Create a pivot table that shows the total sales for each product per year, with each year as a separate column.
Original Sales
Table:
Year | Product | TotalSales |
---|---|---|
2020 | ProductA | 100 |
2020 | ProductB | 150 |
2021 | ProductA | 200 |
2021 | ProductB | 250 |
SQL Query to Create a Pivot Table:
SELECT Product,
SUM(CASE WHEN Year = 2020 THEN TotalSales ELSE 0 END) AS Sales_2020,
SUM(CASE WHEN Year = 2021 THEN TotalSales ELSE 0 END) AS Sales_2021
FROM Sales
GROUP BY Product;
Resulting Pivot Table
Product | Sales_2020 | Sales_2021 |
---|---|---|
ProductA | 100 | 200 |
ProductB | 150 | 250 |
5. Tell Me the purpose of the COALESCE function?
The COALESCE
function in SQL serves the purpose of returning the first non-null value in a list of arguments. It’s essentially used for handling NULL
values within SQL queries, allowing for more flexible and cleaner data presentation and decision-making in database operations.
Example
Orders Table:
OrderID | CustomerName | TotalAmount | PromoCode |
---|---|---|---|
1 | Alice | 100.00 | PROMO10 |
2 | Bob | 200.00 | NULL |
3 | Charlie | 150.00 | PROMO15 |
4 | Diana | 120.00 | NULL |
Write an SQL query that selects all orders, but instead of showing NULL
for PromoCode
, it should display ‘No Promo Applied’.
SQL Query Using COALESCE
SELECT OrderID,
CustomerName,
TotalAmount,
COALESCE(PromoCode, 'No Promo Applied') AS PromoCodeApplied
FROM Orders;
Result
OrderID | CustomerName | TotalAmount | PromoCodeApplied |
---|---|---|---|
1 | Alice | 100.00 | PROMO10 |
2 | Bob | 200.00 | No Promo Applied |
3 | Charlie | 150.00 | PROMO15 |
4 | Diana | 120.00 | No Promo Applied |
Explanation:
- The
COALESCE
function checks eachPromoCode
value in theOrders
table. - If
PromoCode
is notNULL
,COALESCE
returns thePromoCode
value, indicating the promotional code applied to the order. - If
PromoCode
isNULL
,COALESCE
returns ‘No Promo Applied’, indicating that no promotional code was used for that order.
6. Difference between a UNION and a JOIN?
Feature | UNION | JOIN |
---|---|---|
Basic Function | Combines the result sets of two or more queries into a single result set. | Combines columns from one or more tables based on a related column between them. |
Operation Type | Set operation. | Join operation. |
Result Structure | Creates a result set that stacks rows from the combined queries. Each column in the result set corresponds to the columns in the queries being united. | Creates a result set that includes columns from the joined tables. The result set can contain more columns than the individual tables if columns from both tables are selected. |
Criteria for Combination | The queries being combined must have the same number of columns, with compatible data types, but the data can come from different columns or tables. | Combines data based on a logical relationship, such as matching key values in the tables being joined. |
Duplication of Rows | By default, UNION removes duplicate rows. Use UNION ALL to include duplicates. | Does not inherently remove duplicates. Duplicate rows depend on the table data and the join condition. |
Use Cases | Useful for combining similar data from different tables, views, or queries into a single list. | Useful for retrieving related data from multiple tables, where the tables are related by common columns. |
7. Difference between a subquery and a JOIN.
Subqueries and JOINs are both powerful SQL concepts used to combine data from multiple tables. They serve similar purposes but in different ways.
Subquery
- Definition: A subquery is a query nested inside another query. It can be used in various parts of a main query, including the SELECT, FROM, WHERE, and HAVING clauses.
- Data Combination: Subqueries can return single values, multiple values, or even full result sets depending on where and how they are used.
- Performance: Can be slower than JOINs for large datasets because the subquery may be executed multiple times. Performance varies significantly based on the database system, the complexity of the subquery, and how well the database optimizes queries.
- Readability: Can make queries easier to read by breaking down complex operations into logical steps. However, deeply nested subqueries can become difficult to decipher and maintain.
SELECT Name
FROM Employees
WHERE DepartmentID = (SELECT DepartmentID FROM Departments WHERE Name = 'Sales');
JOIN
- Definition: A JOIN clause is used to combine rows from two or more tables based on a related column between them.
- Data Combination: JOINs explicitly create a relationship between tables, combining columns from each table into a new result set. The resulting table includes data that matches the JOIN condition, and the type of JOIN determines how unmatched rows are handled.
- Performance: Generally faster than subqueries for joining tables, particularly for INNER and OUTER JOINs, as databases are optimized to perform JOIN operations efficiently. However, performance can depend on indexing, the size of the dataset, and the specific operations being performed.
- Readability: JOINs can make the relational aspect of the query more explicit and easier to understand at a glance, especially for those familiar with relational database concepts. However, complex JOINs involving multiple tables can become challenging to follow.
SELECT Employees.Name
FROM Employees
JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID
WHERE Departments.Name = 'Sales';
8. How do you perform pagination in SQL queries?
Performing pagination in SQL queries involves dividing the results of a query into discrete “pages” and retrieving a specific subset of records from a larger query result set.
This is particularly useful for improving the performance of database queries and enhancing user experience in applications that display large datasets. The approach to pagination can vary depending on the SQL database being used (e.g., MySQL, PostgreSQL, SQL Server, etc.).
SQL Server
SQL Server uses the FETCH NEXT
and OFFSET
clauses along with an ORDER BY
clause to paginate results.
Example
Similar to the first example, but for SQL Server, to get the first 10 records:
SELECT * FROM TableName
ORDER BY ColumnName
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY;
For the second page:
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
9. Difference between a view and a table?
Feature | View | Table |
---|---|---|
Physical Storage | Does not store data physically. It’s a virtual table. | Stores data physically in the database. |
Data Retrieval | Data is dynamically generated on demand from an SQL query. | Data is directly retrieved from the physical storage. |
Modification | Direct data modification (insert, update, delete) is possible but often restricted and depends on the database system and the view’s complexity. | Directly supports insert, update, delete, and select operations without restrictions. |
Schema | The schema is defined by the SQL query used to create the view. Can hide certain columns or complexity from the user. | Has a fixed schema defined by columns and data types. Users see the schema as it is. |
Persistence | Provides a persistent logical layer to access or transform data but does not persist data itself. | Data is persistently stored and managed by the database system. |
Performance | Can be slower for data retrieval because the data is generated dynamically from the underlying tables each time the view is accessed. | Generally offers faster data access, especially for direct queries, since the data is stored physically. |
Use Cases | Useful for simplifying complex queries, security (restricting access to specific data), and presenting data in a different perspective or aggregated form. | Used for the physical storage of data, establishing the structure and relationships of the database. |
10. How can you create an auto-incrementing column in SQL?
To create an auto-incrementing column in SQL, which automatically generates a unique number for each row, you use different syntax depending on the SQL database management system you’re working with. Auto-incrementing columns are commonly used for primary keys.
In MySQL, you define an auto-incrementing column using the AUTO_INCREMENT
attribute:
CREATE TABLE MyTable (
ID INT NOT NULL AUTO_INCREMENT,
Name VARCHAR(100),
PRIMARY KEY (ID)
);
11. How do you remove a table from a database?
To remove a table from a database, you use the DROP TABLE
command in SQL. This command deletes the table and all of its data permanently.
Syntax
DROP TABLE table_name;
12. What are window functions, and provide examples?
Window functions are a category of SQL functions that perform calculations across a set of rows that are related to the current row in a way similar to aggregate functions. However, unlike aggregate functions, window functions do not cause rows to become grouped into a single output row.
The rows retain their separate identities. Behind the scenes, window functions operate over a “window” of rows defined by the OVER()
clause to provide calculations for each row in relation to the window of rows.
Common Types of Window Functions
- Aggregation Window Functions:
SUM()
,AVG()
,COUNT()
, etc., when used with theOVER()
clause. - Ranking Functions:
ROW_NUMBER()
,RANK()
,DENSE_RANK()
, etc. - Analytic Functions:
LEAD()
,LAG()
,FIRST_VALUE()
,LAST_VALUE()
, etc.
13. Difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN.
In SQL, joins are used to combine rows from two or more tables based on a related column between them, often used to merge related data from different tables into a single result set. The primary types of joins are INNER JOIN, LEFT JOIN, and RIGHT JOIN, each serving different purposes based on how you want to combine the data.
INNER JOIN
- Description: An INNER JOIN returns rows when there is at least one match in both tables. If there is no match, the rows are not returned.
- Use Case: Use INNER JOIN when you only want the result set to include rows that have matching values in both tables.
- Example: If you want to get a list of users and their orders, but only include users who have placed orders and orders that have an associated user.
SELECT Users.Name, Orders.OrderID
FROM Users
INNER JOIN Orders ON Users.UserID = Orders.UserID;
LEFT JOIN (or LEFT OUTER JOIN)
- Description: A LEFT JOIN returns all rows from the left table, and the matched rows from the right table. The result is NULL from the right side if there is no match.
- Use Case: Use LEFT JOIN when you want to include all rows from the left table regardless of whether they have matches in the right table, supplementing with data from the right table when matches are found.
- Example: If you want to list all users and their orders, including users who have not placed any orders.
SELECT Users.Name, Orders.OrderID
FROM Users
LEFT JOIN Orders ON Users.UserID = Orders.UserID;
RIGHT JOIN (or RIGHT OUTER JOIN)
- Description: A RIGHT JOIN returns all rows from the right table, and the matched rows from the left table. The result is NULL from the left side if there is no match.
- Use Case: Use RIGHT JOIN when you want to include all rows from the right table regardless of whether they have matches in the left table, supplementing with data from the left table when matches are found.
- Example: If you want to list all orders and their associated users, including orders that have not been associated with a user.
SELECT Users.Name, Orders.OrderID
FROM Users
RIGHT JOIN Orders ON Users.UserID = Orders.UserID;
Comparison
- INNER JOIN gives you the intersection of two tables.
- LEFT JOIN gives you all records from the left table and matched records from the right table, filling with NULLs if there are no matches.
- RIGHT JOIN gives you all records from the right table and matched records from the left table, filling with NULLs if there are no matches.
14. What is a SQL injection and how can it be prevented?
SQL injection is a security vulnerability that occurs when an attacker is able to insert or “inject” a malicious SQL query via the input data from the client to the application. When the application fails to properly validate this input before adding it to an SQL query, the injected SQL code can be executed by the database, potentially leading to unauthorized access, data leakage, data deletion, or corruption.
How SQL Injection Works?
Consider a simple example where a web application uses input from a form to construct an SQL query:
String query = "SELECT * FROM users WHERE username = '" + username + "' AND password = '" + password + "';";
An attacker could submit a username input as something like admin' --
and no password. This input would effectively turn the query into:
SELECT * FROM users WHERE username = 'admin' --' AND password = '';
15. How do you calculate the average, sum, and count in SQL?
These functions perform calculations on a set of values and return a single value. They are often used in conjunction with the GROUP BY
clause to aggregate values across distinct categories.
AVG()
The AVG()
function calculates the average value of a numeric column.
Example
To find the average salary of employees:
SELECT AVG(Salary) AS AverageSalary
FROM Employees;
SUM()
The SUM()
function returns the total sum of a numeric column.
Example
To calculate the total sales from a Sales
table:
SELECT SUM(Amount) AS TotalSales
FROM Sales;
COUNT()
The COUNT()
function returns the number of rows that match a specified criterion, including NULL
values if a specific column is not specified.
Examples
To count all rows in the Employees
table:
SELECT COUNT(*) AS NumberOfEmployees
FROM Employees;
16. Difference between a clustered and non-clustered index?
Clustered and non-clustered indexes are types of indexes used in databases to speed up the retrieval of data from a table.
Feature | Clustered Index | Non-Clustered Index |
---|---|---|
Definition | The type of index where row data is stored in order based on the index keys. There can be only one clustered index per table because it dictates the physical storage order of the data. | An index where the data rows are not stored in key sequence. The index contains pointers to the data in the table. A table can have multiple non-clustered indexes. |
Number Per Table | One | Multiple |
Storage of Data | Data is physically stored in the order of the index. | Does not alter the physical storage of the data. Instead, it maintains a separate index structure to point to the physical data rows. |
Key Structure | Index keys are the actual data in the table. | Index structure contains index keys along with pointers to the actual data rows. |
Performance | Generally faster for operations that retrieve a large number of contiguous rows. | Provides flexibility for indexing on various columns, which can improve performance for queries that do not access large ranges of data. |
Impact on Data Manipulation | Insertions, updates, and deletions can affect the physical order of data, potentially impacting performance due to page splits or data reorganization. | Because the data itself is not reordered, insertions and deletions are generally less disruptive. However, the index itself may need to be updated. |
Ideal Use Cases | Best for columns that are often accessed sequentially, such as a primary key. Useful for operations that benefit from the ordered storage, like range queries. | Suitable for columns used in search conditions, joins, and where multiple indexes are beneficial. Ideal for optimizing access to rows based on criteria other than the main ordering of the table. |
17. How do you perform a case-insensitive search in SQL?
Performing a case-insensitive search in SQL depends on the specific SQL database you’re using, as different databases handle case sensitivity in different ways. Here are methods for some of the most commonly used SQL databases:
SQL Server
In SQL Server, the case sensitivity of a search depends on the collation of the column being searched. To perform a case-insensitive search, you can either use a column with a case-insensitive collation or force a case-insensitive comparison using the COLLATE
keyword with a case-insensitive collation.
SELECT * FROM Users WHERE Username COLLATE Latin1_General_CI_AS = 'interviewexpert';
The above SQL query is to select records from a table named Users
where the Username
matches ‘interviewexpert’, regardless of case (uppercase or lowercase letters). This case-insensitive comparison is explicitly enforced using the COLLATE
keyword followed by a collation setting.
18. Explain the concept of a self-referencing table.
A self-referencing table, also known as a recursive table, is a table in a relational database that includes a foreign key that references its own primary key. This concept is used to represent hierarchical or tree-like data structures within a flat relational table, enabling the modeling of relationships where an entity might be related to another entity of the same type.
Example
Consider an Employees
table that tracks employee-manager relationships within an organization. Each employee may have a manager, and that manager is also an employee in the same table.
Employees Table
- EmployeeID (Primary Key)
- Name
- ManagerID (Foreign Key referencing EmployeeID)
In this table, EmployeeID
is the primary key, and ManagerID
is a foreign key that references EmployeeID
. This setup allows each row to indicate who the manager of each employee is, with the manager’s EmployeeID
being placed in the ManagerID
column of their subordinates’ rows.
EmployeeID | Name | ManagerID |
---|---|---|
1 | Alice | NULL |
2 | Bob | 1 |
3 | Charlie | 2 |
19. How can you find the second highest (or nth highest) value in a column?
Example 1:
SELECT DISTINCT Salary FROM Employees
ORDER BY Salary DESC
LIMIT 1 OFFSET 1;
Example 2:
SELECT MAX(Salary) AS SecondHighestSalary FROM Employees
WHERE Salary < (SELECT MAX(Salary) FROM Employees);
20. How can you find the nth highest salary from an Employee table?
SELECT DISTINCT Salary FROM Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET n;
Replace n with the actual offset you need, e.g., 1 for the second highest
Advanced SQL Interview Questions:
1. Explain the ACID properties of a transaction.
The ACID properties are a set of four key properties—Atomicity, Consistency, Isolation, and Durability—that ensure database transactions are processed reliably and guarantee data integrity, even in the event of errors, power failures, or other problems.
- Atomicity: Guarantees that all operations within a transaction are treated as a single unit. Either all operations succeed, or none do, ensuring partial transactions aren’t left in the database.
- Consistency: Ensures that a transaction brings the database from one valid state to another, maintaining all predefined rules and constraints. The integrity of the database remains intact after each transaction.
- Isolation: Provides transaction isolation, meaning the ability to hide concurrent transaction operations from each other. It ensures that transactions executed simultaneously cannot interfere with each other, maintaining data accuracy.
- Durability: Once a transaction has been committed, it remains so, even in the event of a system failure. This property guarantees that committed transactions are permanently recorded in the database.
2. What is a recursive CTE (Common Table Expression)?
A recursive Common Table Expression (CTE) is a special type of CTE that allows you to execute recursive queries. In SQL, a recursive query is one that refers to itself, directly or indirectly, to compute hierarchical or tree-structured data. Recursive CTEs are powerful tools for dealing with complex data patterns like organizational charts, categories and subcategories, and graph-based data.
3. How you can optimize SQL queries for performance?
Optimizing SQL queries is crucial for improving the performance of database operations, especially as data volume grows. Here are several strategies to optimize SQL queries for better performance:
1. Use Specific Columns in SELECT Statements
- Why: Selecting only the columns you need, rather than using
SELECT *
, reduces the amount of data that needs to be processed and transferred.
2. Indexing
- Why: Proper indexing can dramatically speed up data retrieval operations.
- How: Create indexes on columns that are frequently used in
WHERE
clauses,JOIN
conditions, or as part of anORDER BY
.
3. Efficient Joins
- Why: Joins can be resource-intensive; using them efficiently can reduce query execution time.
- How: Ensure that you’re joining tables on indexed columns. Also, consider the size of the tables being joined and whether certain conditions can be applied before the join to reduce the dataset.
4. WHERE Clause Optimization
- Why: Optimizing conditions in the
WHERE
clause can reduce the number of rows the database needs to scan. - How: Use precise conditions and apply
WHERE
clauses beforeJOIN
operations to filter out unnecessary data early.
5. Use EXISTS Instead of IN for Subqueries
- Why: When checking for existence,
EXISTS
can be faster thanIN
because it stops processing as soon as it finds a match. - How: Replace
IN
withEXISTS
where applicable, especially if the subquery returns multiple rows.
6. Limit the Use of Wildcards
- Why: Wildcards (
%
) inLIKE
clauses, especially when used at the beginning of a pattern, can prevent the use of indexes. - How: Use wildcards judiciously and avoid leading wildcards if possible.
7. Avoid Functions on Indexed Columns in WHERE Clause
- Why: Using functions on indexed columns in the
WHERE
clause can lead to full table scans. - How: If you must use a function, consider whether you can apply it to the input value instead of the column, or use computed columns.
8. Analyze Query Execution Plans
- Why: Execution plans show how the database engine executes queries, revealing inefficiencies.
- How: Use tools provided by your database management system (DBMS) to analyze execution plans and identify bottlenecks.
9. Reduce Subqueries and Correlated Subqueries
- Why: Subqueries, especially correlated ones, can be inefficient because they may be executed repeatedly for each row.
- How: Where possible, replace subqueries with joins or temporary tables.
10. Optimizing GROUP BY and ORDER BY
- Why: These clauses can cause the database to do a lot of extra work sorting data.
- How: Only use
GROUP BY
andORDER BY
when necessary, and consider indexing on the columns used.
11. Data Normalization and Denormalization
- Why: While normalization reduces data redundancy, denormalization can sometimes improve query performance by reducing the number of joins.
- How: Analyze your use case; sometimes a denormalized table for specific queries can improve performance.
12. Use Batch Operations for Bulk Data Manipulation
- Why: Inserting, updating, or deleting large numbers of rows one at a time can be slow.
- How: Use batch operations to manipulate many rows at once, which is often more efficient.
4. What are the benefits of using stored procedures?
Stored procedures are sets of SQL statements that are stored and executed on the database server. They offer several benefits that can enhance application performance, security, and development efficiency.
Performance Improvement: Stored procedures are compiled once and stored in executable form, which means the database engine saves time by not having to parse and compile the SQL code with each execution.
Enhanced Security: Permissions can be set on stored procedures, allowing users to execute complex operations without granting direct access to the underlying data tables. This helps in implementing a more granular security model.
Maintainability and Modularity: Business logic can be centralized within the database, making it easier to maintain and update without affecting client applications. Changes made to the stored procedure are immediately available to all applications that use it.
5. Explain the difference between a database and a schema.
Feature | Database | Schema |
---|---|---|
Definition | A database is a collection of data that is organized to facilitate easy access, management, and updating. It serves as the main container for data and can contain multiple schemas. | A schema is a logical grouping of database objects such as tables, views, and stored procedures. It defines the structure and organization of data within a database. |
Scope | A database is the outermost container and can host multiple schemas within it. It represents the entire set of data and objects on a database server for a particular database application. | A schema exists within a database and serves to categorize and isolate groups of database objects under a single name. |
Ownership | Owned by the database administrator and is associated with a particular database instance. | Owned by a database user and can be used to manage access permissions to different objects within the schema. |
Usage | Used to manage and organize data at a high level. A single server can have multiple databases, each serving different applications or purposes. | Used to organize and secure data within a database, especially when multiple users or applications interact with the same database. |
Objects Contained | Contains all the data and objects related to a specific database application, including schemas, tables, views, stored procedures, and more. | Contains specific database objects like tables, views, stored procedures, and other constructs that define data structures and access methods. |
Physical vs Logical | Represents the physical aspect of data storage, although the physical storage details are managed by the DBMS. | Represents a logical grouping of objects and does not dictate how data is stored physically in the database. |
6. Tell the purpose of the EXCEPT and INTERSECT operators.
The EXCEPT
and INTERSECT
operators are used in SQL to compare the results of two queries and return distinct values. They help in filtering the data based on specific conditions. Here’s a breakdown of each:
EXCEPT Operator
Purpose: The EXCEPT
operator returns distinct rows from the left query that aren’t found in the right query. It effectively subtracts the results of one query from another. This operator is useful when you want to find differences between two datasets.
How It Works: Consider two queries, Query A and Query B. When you use Query A EXCEPT Query B
, the operator compares the results of both queries. It then returns the rows from Query A that do not have corresponding rows in Query B, based on matching values in each column.
Use Case Example: If you have two tables, Employees
and FormerEmployees
, and you want to find out which employees in the Employees
table are not in the FormerEmployees
table, you could use the EXCEPT
operator to get this information.
INTERSECT Operator
Purpose: The INTERSECT
operator returns distinct rows that are found in both the left and right queries. It is used to find the common data between two datasets.
How It Works: Similar to the EXCEPT
operator, if you have Query A and Query B, using Query A INTERSECT Query B
compares the results from both queries. It returns rows that are present in both Query A and Query B, effectively finding the intersection of the two sets.
Use Case Example: Using the same Employees
and FormerEmployees
tables example, if you wanted to find out which employees are listed in both tables, you could use the INTERSECT
operator to identify employees who are or were employed.
7. Difference between a materialized view and a regular view?
Regular View
- Definition: A regular view is essentially a stored SQL query. When you create a view, you define a SQL statement that pulls data from the underlying tables. The view itself does not store any data.
- Data Freshness: Since a regular view does not store data but merely saves a query, it always displays the most current data from the underlying tables whenever you query the view.
- Performance: Querying a view can be slower than querying a materialized view, especially for complex queries or large datasets, because the view’s SQL query is executed every time the view is accessed.
- Use Case: Regular views are useful for abstracting the complexity of the database schema, providing a simpler interface to users, or restricting access to specific data within the tables.
Materialized View
- Definition: A materialized view is like a regular view but with a key difference: it stores the result of the SQL query in a physical table. This means that the materialized view actually contains the data returned by the query at the time it was last refreshed.
- Data Freshness: The data in a materialized view can become stale, as it does not automatically update when the underlying data changes. It needs to be refreshed manually or on a schedule, depending on the database system.
- Performance: Querying a materialized view is generally faster than querying a regular view for complex queries or large datasets, as the data is already precomputed and stored in the view. However, there is a trade-off in terms of storage space and the time/overhead required to refresh the materialized view.
- Use Case: Materialized views are beneficial when you need to improve query performance for complex calculations, aggregations, or when working with very large datasets. They are also useful in data warehousing scenarios where the data does not change frequently.
8. How will you handle missing values (NULLs) in SQL?
Handling missing values (NULLs) in SQL is a common challenge in database management and analysis, as NULLs can affect the outcome of your queries and reports. The approach to managing NULLs can vary depending on the context and the specific requirements of your database or application. Here are several strategies for handling missing values in SQL:
1. Using IS NULL
and IS NOT NULL
- Purpose: To filter rows based on whether a column contains NULL values or not.
- Example: Select all rows where the
email
column is NULL.
SELECT * FROM users WHERE email IS NULL;
2. COALESCE
Function
- Purpose: To return the first non-NULL value in a list of arguments.
- Example: If you want to display a default value when a column value is NULL.
SELECT COALESCE(column_name, 'default_value') FROM table_name;
3. IFNULL
Function (or ISNULL
in some SQL dialects)
- Purpose: Similar to
COALESCE
, but specifically designed to replace NULL with a specified value. - Example: Replace NULL in the
address
column with ‘Not Provided’.
SELECT IFNULL(address, 'Not Provided') FROM contacts;
9. How do you implement row-level security in SQL databases?
Row-level security (RLS) is a mechanism that controls access to rows in a database table based on the characteristics of the user executing a query. It’s an effective way to ensure that users can only access data that’s relevant to them, enhancing data security and privacy. Implementing RLS typically involves defining security policies that specify which rows users can access.
10. How can you improve the performance of a slow-performing SQL query?
Improving the performance of a slow-performing SQL query involves analyzing and optimizing various aspects of the query and the database environment.
Here are several strategies:
1. Optimize Indexes
- Add Necessary Indexes: Ensure that all columns used in
WHERE
,JOIN
,ORDER BY
, andGROUP BY
clauses have indexes to speed up searches. - Avoid Over-indexing: Having too many indexes can slow down write operations (
INSERT
,UPDATE
,DELETE
), so keep the indexing to necessary columns. - Use Index Hints: Some databases allow you to suggest which index to use for a query, although this should be used with caution as it overrides the query optimizer.
2. Refine Query Conditions
- Use Specific Conditions: Narrow down the result set as much as possible in your
WHERE
clause to minimize the amount of data processed. - Avoid Functions on Indexed Columns in WHERE: Applying functions to columns in the
WHERE
clause can prevent the use of indexes. For example, avoid conditions likeWHERE YEAR(date_column) = 2023
.
3. Optimize Joins
- Reduce the Number of Joins: Each join can add complexity and slow down a query. Make sure you only join tables that are necessary for your result set.
- Use Appropriate Join Types: Understand the differences between
INNER JOIN
,LEFT JOIN
,RIGHT JOIN
, etc., and choose the most efficient one for your needs.
4. Limit the Result Set
- Use
LIMIT
: If you only need a subset of the data, useLIMIT
to reduce the amount of data that needs to be processed and returned. - Selective Column Selection: Only select the columns you need instead of using
SELECT *
.
5. Query and Database Structure Optimization
- Normalize Your Data: Ensure your database is properly normalized to eliminate redundant data and improve overall efficiency.
- Denormalize if Necessary: In some cases, denormalization can improve performance by reducing the number of joins needed.
- Partition Tables: For very large tables, consider partitioning them to improve query performance by limiting the number of rows to scan.
6. Use Caching
- Application-Level Caching: Cache frequent queries at the application level to avoid hitting the database for repeated queries.
- Database Caching Options: Explore and configure caching options provided by your database system.
11. Difference between a database index and a database view.
Feature | Database Index | Database View |
---|---|---|
Definition | A data structure that improves the speed of data retrieval operations by creating a pointer to the data in a table. | A virtual table based on the result-set of an SQL statement that does not store data itself but presents data stored in other tables. |
Purpose | To speed up the retrieval of records from a table by minimizing the number of disk accesses required. | To simplify complex queries, present data in a specific format, and provide a level of abstraction and security by hiding the complexity of the underlying database schema. |
Storage | Requires additional disk space for storing the index structure. Index maintenance may impact write operations. | Does not require additional storage for data as it is a virtual layer; however, it depends on the underlying tables for data. |
Operation | Used to quickly locate and access the data without searching every row in a table every time it is accessed. | When queried, it runs the underlying SQL statement to return a result set as if it were a regular table. |
Maintenance | Managed automatically by the database system but can be manually optimized by creating or removing indexes. | Defined by SQL queries that encapsulate selection, joins, and formatting from one or more tables. Needs to be updated if underlying data structures change significantly. |
Performance | Speeds up query performance by reducing the search space but requires additional time and space for index creation and maintenance. | Can consume more computational resources at query time, depending on the complexity of the SQL statement it encapsulates. |
Visibility | Typically invisible to users and managed by database administrators. | Can be used by users and applications to interact with data in a simplified or specific format without knowing the complexities of the database schema. |
12. Difference between optimistic and pessimistic locking.
Feature | Optimistic Locking | Pessimistic Locking |
---|---|---|
Basic Concept | Assumes that multiple transactions can complete without interfering with each other and checks for conflicts before committing. | Assumes that conflicts are likely and locks resources to prevent other transactions from accessing them simultaneously. |
Concurrency Control | High, as it allows multiple transactions to proceed concurrently until the commit point. | Lower, as it restricts concurrent access to resources by locking them. |
Locking Mechanism | No locks are used during the transaction. Conflicts are detected at commit time using versioning or timestamps. | Resources (rows, tables) are locked explicitly for the duration of a transaction to prevent other transactions from access. |
Use Case | Suitable for environments with low to moderate contention, where conflicts are rare. | Preferred in high-contention environments where the likelihood of conflicts is high. |
Performance | Generally better performance in low-contention scenarios due to reduced overhead from lock management. | Can lead to decreased performance in scenarios with frequent access conflicts due to locking overhead and wait times. |
Risk of Deadlocks | Low, as resources are not explicitly locked. | Higher, due to the explicit locking of resources, which can lead to deadlocks if not carefully managed. |
System Overhead | Lower, because it does not require the overhead of lock management. | Higher, due to the overhead of managing locks and possibly dealing with lock contention and deadlocks. |
Complexity | Higher complexity in conflict resolution at commit time, requiring mechanisms to retry or abort transactions. | Higher complexity in lock management but straightforward handling of transactions once locks are acquired. |
Data Freshness | May work with stale data and only checks for conflicts at commit time, which could lead to rollbacks. | Always works with the most current data, as locks ensure serialized access to resources. |
Failure Recovery | Easier, as fewer transactions are affected by rollbacks due to the late detection of conflicts. | More complex, as locked resources must be released properly on transaction failure to prevent hanging locks. |
13. How can you find duplicate records in a table?
Finding duplicate records in a SQL table involves using queries that can identify rows where one or more columns have the same values. There are several ways to achieve this, depending on the specific requirements and the SQL dialect you’re using.
Here’s a general approach using standard SQL:
Using GROUP BY
and HAVING
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;
Using Window Functions
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY column_name ORDER BY column_name) AS rn
FROM table_name
) tmp
WHERE rn > 1;
14. Write a SQL query to retrieve distinct values from a column.
To retrieve distinct values from a column in a SQL database, you can use the DISTINCT
keyword in your SELECT
statement.
SELECT DISTINCT column_name
FROM table_name;
Example:
If you have a table named employees
with a column named department
, and you want to list all unique departments, your query would look like this:
SELECT DISTINCT department
FROM employees;
15. Write a query to calculate the average of a numeric column.
To calculate the average of a numeric column in a SQL table, you can use the AVG()
aggregate function. This function computes the average value of a specified numeric column.
Here’s a basic example of how to use the AVG()
function:
SELECT AVG(numeric_column_name) AS average_value
FROM table_name;
Example
If you have a table named sales
with a column named amount
that records the sale amount for each transaction, and you want to find the average sale amount, your query would look like this:
SELECT AVG(amount) AS average_sale_amount
FROM sales;
16. Write a query to find the top N records in a table.
To find the top N records in a table based on a specific column, you can use the ORDER BY
clause to sort the data and then limit the number of records returned with the LIMIT
clause (in databases like MySQL, PostgreSQL, SQLite) or the TOP
clause (in SQL Server).
Syntax:
SELECT *
FROM table_name
ORDER BY column_name DESC
LIMIT N;
Example:
SELECT *
FROM employees
ORDER BY salary DESC
LIMIT 5;
17. How can you update existing records in a table?
Updating existing records in a SQL table is performed with the UPDATE
statement. This command allows you to modify values in one or more columns of existing rows based on specific conditions.
Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Example
Suppose you have a table named employees
with columns employee_id
, name
, salary
, and department
. If you want to increase the salary of an employee with employee_id
of 101 by 10%, you could use the following query:
UPDATE employees
SET salary = salary * 1.1
WHERE employee_id = 101;
18. What is a stored procedure?
A stored procedure is a prepared SQL code that you can save and use it over and over again. Suppose there are a SQL query that you are writing again and again. So save it as a stored procedure, and then just call it to execute it. You can also pass the parameter to store procedure.
- BEGIN: This is the executable part.
- END: Up to this, the code will get executed.
Stored Procedure Syntax
CREATE PROCEDURE procedure_name
(parameter1 data_type, parameter2 data_type, …)
AS
BEGIN
— SQL statements to be executed
END
Execute a Stored Procedure
EXEC procedure_name;
Example
-- Example of a stored procedure
CREATE PROCEDURE sp_GetEmployeeName
@EmployeeID INT
AS
BEGIN
SELECT EmployeeName
FROM Employee
WHERE EmployeeID = @EmployeeID;
END;
-- Calling the stored procedure
EXEC sp_GetEmployeeName @EmployeeID = 3001;
19. How do you handle transactions in SQL?
Handling transactions in SQL involves managing a sequence of one or more SQL operations as a single, atomic unit of work. A transaction must conform to the ACID properties (Atomicity, Consistency, Isolation, Durability) to ensure data integrity and consistency, even in the event of errors or system failures.
Here’s how you can handle transactions in SQL:
1. Begin Transaction
A transaction is started with the BEGIN TRANSACTION
statement (or just BEGIN
in some RDBMS like PostgreSQL). This marks the starting point of a transaction.
BEGIN TRANSACTION;
2. Perform SQL Operations
After starting a transaction, you can execute one or more SQL statements (such as INSERT
, UPDATE
, DELETE
) as part of this transaction. These operations will be treated as a single unit of work.
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;
3. Commit Transaction
If all operations within the transaction are successful, you can commit the transaction using the COMMIT
statement. This makes all changes made during the transaction permanent and visible to other users.
COMMIT;
4. Rollback Transaction
If any operation within the transaction fails or if you detect an error in your application logic, you can roll back the transaction using the ROLLBACK
statement. This undoes all changes made during the transaction, returning the database to its state at the beginning of the transaction.
ROLLBACK;
Handling Transactions in Different RDBMS
- SQL Server: Uses
BEGIN TRANSACTION
,COMMIT
, andROLLBACK
. - MySQL: Uses
START TRANSACTION
orBEGIN
,COMMIT
, andROLLBACK
. - PostgreSQL: Uses
BEGIN
,COMMIT
, andROLLBACK
. - Oracle: Uses
BEGIN
(implicitly starts with any DML statement if not already in a transaction),COMMIT
, andROLLBACK
. Also, Oracle includes aSAVEPOINT
feature that allows you to set savepoints within a transaction to which you can roll back.
20. Use of LAG and LEAD functions in window functions?
The LAG
and LEAD
functions are powerful tools in SQL that belong to the family of window functions. They are used to access data from a previous row (LAG
) or a following row (LEAD
) in the same result set, without the need for a self-join.
LAG Function
The LAG
function allows you to access data from a previous row in the same result set. It’s useful for comparing current row data with data from a preceding row.
LAG(column_name, offset, default_value) OVER (PARTITION BY partition_column ORDER BY order_column)
LEAD Function
The LEAD
function allows you to access data from a subsequent row in the same result set. It’s used for comparing current row data with data from a following row.
LEAD(column_name, offset, default_value) OVER (PARTITION BY partition_column ORDER BY order_column)
Database Design Interview Questions
1. What is database normalization, and why is it important?
Database normalization is a process applied during the design of a database to minimize redundancy and dependency by organizing fields and tables of a database. It involves dividing large tables into smaller, related tables and defining relationships between them to achieve a more efficient database structure. The primary goals of normalization include improving data integrity, reducing data redundancy, and enhancing database performance.
Importance of Database Normalization:
- Data Integrity: Normalization helps in maintaining data integrity by ensuring that each piece of data is stored only once. This prevents inconsistencies and errors that can occur when data is duplicated across the database.
- Reduced Data Redundancy: By eliminating duplicate data, normalization reduces the amount of data storage required, which can improve database performance.
- Improved Query Performance: A well-normalized database structure can lead to more efficient queries. Smaller, more focused tables reduce the amount of data that needs to be scanned for most queries.
- Easier Database Maintenance: Updates, inserts, and deletes are simpler and less error-prone in a normalized database since changes are made in one place. This also makes the database more scalable.
- Flexibility for Future Modifications: A normalized database provides a clear structure that can be easily modified to accommodate future data requirements without a major overhaul of the database structure.
2. What is denormalization, and When is it appropriate?
Denormalization is the process of intentionally adding redundancy to a database to improve performance in certain read-heavy operations. It involves reversing the principles of normalization by combining tables and allowing duplicate data within the database structure. While normalization aims to minimize redundancy and ensure data integrity by dividing a database into well-structured tables, denormalization focuses on optimizing read performance, sometimes at the cost of increased storage and potential maintenance challenges.
When is Denormalization Appropriate?
- Improved Query Performance: Denormalization can reduce the number of joins needed in queries, which may improve performance for read-heavy applications. This is particularly beneficial in situations where complex joins across multiple tables significantly degrade query execution times.
- Simplified Queries: By consolidating data into fewer tables, denormalization can simplify the complexity of SQL queries, making them easier to write and understand. This can be advantageous for applications where data retrieval patterns are predictable and highly specific.
- Reporting and Analytics: In data warehousing and analytical processing, denormalization is often used to structure data in a way that aligns with reporting and analysis needs, facilitating faster data retrieval for analytical queries.
- Handling High-Load Read Operations: For databases that serve high volumes of read requests, denormalization can help distribute the load more efficiently, especially when coupled with caching mechanisms.
- Scalability: In some cases, denormalization can aid in scaling the database to handle more reads by reducing the computational overhead associated with table joins.
3. What is a surrogate key, and why might you use one?
A surrogate key is a type of primary key that is artificially generated within a database. It does not derive from the data itself but is usually an incrementing number (such as a sequence number or an auto-incremented ID) assigned to each record uniquely. Surrogate keys are commonly used in relational database systems.
Example
Consider an Employees
table in a company’s database designed to store information about each employee. Instead of using a piece of employee information (like Social Security Number) as a primary key, you decide to use a surrogate key.
MySQL
CREATE TABLE Employees (
EmployeeID INT AUTO_INCREMENT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100),
SocialSecurityNumber VARCHAR(11)
);
In these examples, the EmployeeID
column serves as the surrogate key. It is automatically generated by the database system for each new employee record added to the table, starting with 1 and incrementing by 1 for each new record (the increment value can be adjusted if needed). This EmployeeID
is used to uniquely identify each employee within the database, regardless of their personal details.
4. What is an ER diagram, and how is it used in database design?
An Entity-Relationship (ER) diagram is a graphical representation of entities and their relationships to each other, typically used in database design. An ER diagram helps in structuring and organizing data, illustrating how different entities (like products, employees, or customers) relate to each other within a system.
Components of an ER Diagram
- Entities: Represented as rectangles, entities are objects or concepts about which data is stored. Each entity corresponds to a table in the database.
- Attributes: Represented as ovals connected to their entity, attributes are the properties or details of an entity, corresponding to columns in a table.
- Relationships: Represented as diamonds or lines connecting entities, relationships show how entities are related to each other. They can be one-to-one, one-to-many, or many-to-many, indicating how many instances of one entity can be associated with instances of another entity.
- Primary Keys and Foreign Keys: Often indicated in ER diagrams, these keys enforce the relationships between entities and ensure data integrity.
5. What are cardinality and ordinality in a database relationship?
Cardinality and ordinality are concepts used in database design to describe the nature of relationships between entities in an Entity-Relationship (ER) diagram or database schema.
Cardinality
Cardinality refers to the numerical relationship between two entities, indicating how many instances of one entity relate to one instance of another entity. It is essential for defining the business rules that govern the relationship between data items. The main types of cardinality are:
- One-to-One (1:1): Each instance of Entity A is related to one and only one instance of Entity B, and vice versa. For example, each person has one birth certificate, and each birth certificate is associated with one person only.
- One-to-Many (1:N): An instance of Entity A can be associated with zero, one, or multiple instances of Entity B, but each instance of Entity B is related to only one instance of Entity A. For example, a mother (Entity A) can have multiple children (Entity B), but each child has only one mother.
- Many-to-One (N:1): This is the inverse of 1:N, where many instances of Entity A are associated with one instance of Entity B. For example, many books (Entity A) can be published by one publisher (Entity B).
- Many-to-Many (M:N): Instances of Entity A can be associated with multiple instances of Entity B, and instances of Entity B can be associated with multiple instances of Entity A. For example, students (Entity A) can enroll in multiple courses (Entity B), and each course can have multiple students enrolled in it.
Ordinality
Ordinality, sometimes referred to as participation, describes the level of participation of an entity in a relationship and indicates whether the presence of an entity in a relationship is mandatory or optional. The two types of ordinality are:
- Mandatory Participation: In this relationship, an entity must participate in the relationship. For example, in a relationship between employees and departments, if every employee must belong to a department, the participation of the employee in the relationship is mandatory.
- Optional Participation: In this relationship, an entity does not need to participate in the relationship. For example, in a relationship between employees and projects, if some employees might not be assigned to any project, the participation of the employee in the relationship is optional.
6. How do you model a many-to-many relationship in a database?
Modeling a many-to-many (M:N) relationship in a relational database involves creating an intermediate table, often called a junction table, association table, or linking table. This table serves to break down the many-to-many relationship into two one-to-many (1:N) relationships, thereby maintaining normalization standards and ensuring the integrity and scalability of the database design.
Steps to Model a Many-to-Many Relationship:
- Identify the Entities: Determine the two entities that have a many-to-many relationship.
- Create a Junction Table: Create a new table specifically to represent the relationship between these two entities. This table should have its own primary key (either a composite key made up of the foreign keys or a new surrogate key).
- Add Foreign Keys: Include in the junction table foreign key columns that reference the primary keys of the two entities involved in the many-to-many relationship.
- Define Additional Attributes (if necessary): Besides the foreign keys, the junction table can also contain additional columns for attributes that are specific to the relationship.
Example:
Consider a scenario where you have two entities, Students
and Courses
, with a many-to-many relationship: a student can enroll in multiple courses, and a course can include multiple students.
Step 1: Identify the Entities
Students
Table:StudentID
(Primary Key)StudentName
Courses
Table:CourseID
(Primary Key)CourseName
Step 2 & 3: Create a Junction Table with Foreign Keys
StudentCourses
Table (the junction table):StudentID
(Foreign Key referencingStudents
)CourseID
(Foreign Key referencingCourses
)- In this case, the combination of
StudentID
andCourseID
can serve as a composite primary key for theStudentCourses
table, uniquely identifying each row.
Step 4: Define Additional Attributes (Optional)
- If there are attributes specific to the enrollment of a student in a course (like
EnrollmentDate
), these can be added to theStudentCourses
table as well.
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100)
);
CREATE TABLE Courses (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100)
);
CREATE TABLE StudentCourses (
StudentID INT,
CourseID INT,
EnrollmentDate DATE,
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
7. Concept of referential integrity.
Referential integrity is a key concept in the relational database model, ensuring the consistency and accuracy of data within relationships. It is based on the idea that every foreign key in one table should have a corresponding primary key in another table, maintaining the validity of the links between tables.
8. What is a composite key, and when should it be used?
A composite key is a type of candidate key that is composed of two or more columns used together to uniquely identify a row in a table. Each column in a composite key may not be unique by itself within the database, but the combination of columns is guaranteed to be unique. This is particularly useful in scenarios where a single column cannot serve as a unique identifier for the records.
Example
Consider a database for a school where a StudentCourses
table is used to track which courses students are enrolled in. The table might have the following columns: StudentID
, CourseID
, and EnrollmentDate
. No single column is unique by itself; a student can enroll in multiple courses, and a course can have multiple students. However, the combination of StudentID
and CourseID
can uniquely identify each row in the StudentCourses
table, making it a composite key.
CREATE TABLE StudentCourses (
StudentID INT,
CourseID INT,
EnrollmentDate DATE,
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
9. What is the purpose of indexing in database design?
Key Purposes
- Improved Query Performance: Indexing can drastically reduce the time it takes to retrieve data from a database, especially for SELECT queries and WHERE clauses. By using an index, the database engine can find the data without scanning every row in a table, effectively reducing the amount of data it needs to examine.
- Efficient Data Access: Indexes provide a structured and quick way to access rows in a table directly, which is particularly beneficial for operations that need to look up or sort data based on specific columns.
- Optimization of Join Operations: In databases where tables are often joined on certain columns, indexing those columns can make join operations much more efficient.
- Enhanced Sorting Capabilities: Indexes can improve the speed of sorting operations by maintaining an ordered list of column values, which can be particularly useful for ORDER BY and GROUP BY clauses in SQL queries.
- Facilitation of Uniqueness: Unique indexes enforce uniqueness for the indexed columns, ensuring that no two rows have the same values in those columns. This is crucial for maintaining the integrity of primary keys and unique constraints.
- Support for Text Searches: Some specialized indexes, like full-text indexes in certain NoSQL databases or extensions in SQL databases, optimize text search queries, making it faster to search through large volumes of text-based data.
10. Concept of data redundancy in database design.
Data redundancy in database design refers to the unnecessary duplication of data within a database. It occurs when the same piece of data is stored in multiple places, either within the same table or across different tables in the database. While sometimes intentional for specific purposes, such as improving query performance or ensuring data availability, unintended redundancy can lead to several issues and inefficiencies.
Causes of Data Redundancy
- Poor Database Design: A database schema that has not been normalized properly can lead to data redundancy. Without normalization, tables may be designed in a way that replicates data unnecessarily.
- Denormalization: As a deliberate design choice, denormalization introduces redundancy to optimize read operations, especially in data warehousing and reporting databases where query performance is critical.
- Lack of Data Management Policies: Inconsistent data entry and lack of strict data management policies can lead to duplicate records and inconsistent data across tables.
11. How do you ensure data consistency and integrity in a database design?
Ensuring data consistency and integrity in database design is crucial for maintaining accurate, reliable, and meaningful data across a database system. Data integrity involves safeguarding the accuracy and consistency of data over its entire lifecycle, while data consistency ensures that data is the same and remains intact across all copies and after any operation such as transfer, storage, or retrieval.
Several strategies to achieve data consistency and integrity:
1. Use of Constraints
- Primary Key Constraints: Ensure that each row in a table has a unique identifier, preventing duplicate records.
- Foreign Key Constraints: Enforce referential integrity between tables, ensuring that relationships between them remain consistent.
- Unique Constraints: Guarantee that all values in a column are unique, preventing duplicate values in fields that require uniqueness.
- Check Constraints: Specify rules that the data in a column must follow, ensuring data validity according to business logic (e.g., age > 18).
2. Data Types and Column Properties
- Carefully select data types that match the nature of the data being stored (e.g.,
DATE
for dates,INT
for integers). This prevents incompatible data from being stored in a column. - Use column properties like
NOT NULL
to ensure that critical fields do not remain empty.
3. Normalization
- Apply normalization principles to eliminate redundant data, reduce complexity, and ensure logical data storage. Normalization typically involves dividing a database into two or more tables and defining relationships between the tables to minimize redundancy.
4. Transactions
- Implement transactions with ACID (Atomicity, Consistency, Isolation, Durability) properties to ensure that all database operations are performed accurately and consistently, even in the case of errors or failures. Transactions ensure that operations on the database take it from one consistent state to another.