Writing JOINs vs Nested Select Queries
I have seen a lot of is queries written without using the JOIN syntax. Instead the developer chose to write the query using an IN clause in the WHERE clause with a SELECT statement; a Nested Query. While many people blame nested queries on developers that don't understand joins, they can actually be beneficial. Let's take a query out of the AdventureWorks DB.
An example would be:
SELECT Distinct C.AccountNumber
FROM Sales.Customer C
INNER JOIN Sales.SalesOrderHeader O ON O.CustomerID = C.CustomerID
INNER JOIN Sales.SalesPerson SP ON SP.SalesPersonID = O.SalesPersonID
WHERE SP.CommissionPct >= .015
this can also written as
SELECT AccountNumber
FROM Sales.Customer
WHERE CustomerID IN (SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IN (SELECT SalesPersonID
FROM Sales.SalesPerson
WHERE CommissionPct >= .015))
The second query is very messy to read and I personally hate looking at queries like this, but it is faster. To note I have an index on the SalesOrderHeader table with columns SalesPersonID, CustomerID. A distinct has to be used because multiple orders exist for a SalesPerson and Multiple Orders belong to multiple customers.
The following are the two execution plans.


Basically, the point is to not throw out a query because it looks messy. Every query must be analyzed for performance and different approaches should be looked at.
An example would be:
SELECT Distinct C.AccountNumber
FROM Sales.Customer C
INNER JOIN Sales.SalesOrderHeader O ON O.CustomerID = C.CustomerID
INNER JOIN Sales.SalesPerson SP ON SP.SalesPersonID = O.SalesPersonID
WHERE SP.CommissionPct >= .015
this can also written as
SELECT AccountNumber
FROM Sales.Customer
WHERE CustomerID IN (SELECT CustomerID
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IN (SELECT SalesPersonID
FROM Sales.SalesPerson
WHERE CommissionPct >= .015))
The second query is very messy to read and I personally hate looking at queries like this, but it is faster. To note I have an index on the SalesOrderHeader table with columns SalesPersonID, CustomerID. A distinct has to be used because multiple orders exist for a SalesPerson and Multiple Orders belong to multiple customers.
The following are the two execution plans.


Basically, the point is to not throw out a query because it looks messy. Every query must be analyzed for performance and different approaches should be looked at.


Comments