ProblemI was a running a routine query using an equal operator on the only column of the primary key for a table and I noticed that the performance was terrible. These queries should have been flying because all I was doing was retrieving one row of data which should have been doing an index seek. When I looked at the query plan it was doing a scan instead. This tip shows you what I found and how to resolve the problem.
Solution
The problem I was facing can be seen by doing a similar query in the SQL Server 2005 AdventureWorks database on the HumanResources.Employee table. To help us understand what SQL Server is doing when we run these queries, let's ask for IO statistics and also use the SSMS menu command Query\Include Actual Execution Plan.
To use the AdventureWorks database and turn on IO statistics, start with this query:
use AdventureWorks
go
SET STATISTICS IO ON
go
|
Here's a query on the Employee table which is similar to the ones that caused me so much difficulty:
SELECT EmployeeID, NationalIDNumber, LoginID
FROM HumanResources.Employee
WHERE NationalIDNumber = 112457891
go
|
It seems pretty innocuous. The HumanResources.Employee table has an index that begins with NationalIDNumber so executing the query should just be a matter of seeking to the location of 112457891 and then doing a lookup to get the table row. But the statistics and the query plan show otherwise. Here are the messages:
EmployeeID NationalIDNumber LoginID
----------- ---------------- ----------------------
4 112457891 adventure-works\rob0
(1 row(s) affected)
Table 'Employee'. Scan count 1, logical reads 6,
physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0,
lob read-ahead reads 0.
(1 row(s) affected) |
The statistics show 1 scan and that's the problem. Adventureworks.HumanResources.Employee only has 291 rows, so this probably ran really fast and doesn't seem like a problem. The table that I was working on had millions of rows and the table scan was a killer taking several seconds for each query..
Since the NationalIDNumber column is at the start of an index and the only column for this index, why was there a scan and not a seek? The query plan below tells us why. Here's the overall plan where you can see the index scan:
The tool tip for the index scan gives the details that make all the difference, as shown below.
The red arrow points to the problem. The function call CONVERT_IMPLICIT(int, [AdventureWorks].[HumanResources].[Employee].[NationalIDNumber, 0) is modifying the NationalIDNumber column before it is compared to the integer constant 1124579811 which we are passing into this query. If you look at the table definition you'll see that NationalIDNumber is declared to be nvarchar(15). Once there is a function, even an implicit one as we see here, used on the column SQL Server can't use the index on NationalDNumber and it falls back on a scan.
Changing the query to compare to a string constant and the problem goes away:
SELECT EmployeeID, NationalIDNumber, LoginID
FROM HumanResources.Employee
WHERE NationalIDNumber = '112457891'
go
|
The messages now shows that there are zero scans, which is what we want. In this case the difference in logical reads is only 2, that's because the Employee table is so small. Working with a million row table, the difference in logical reads grows into the thousands.
EmployeeID NationalIDNumber LoginID
----------- ---------------- ----------------------
4 112457891 adventure-works\rob0
(1 row(s) affected)
Table 'Employee'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected) |
The query plan shows the seek and the key lookup which is what we would have expected to occur.
This problem is common where a character string has been used to store a numeric key. SQL Server will dutifully perform the implicit conversion and return the correct result, but at the cost of poor performance by doing a scan instead of a seek and using the index correctly.
No comments:
Post a Comment