menu

Friday, May 27, 2011

Improving SQL Server performance when using table variables

Problem
Batches or store procedures that execute join operations on table variables may experience performance problems if the table variable contains a large number of rows.
Solution
Table variable were introduced in SQL Server with the intention to reduce recompiles, however if they are used in batches or store procedures they may cause a performance issue.
Let’s see how this can occur with an example. 
First let's create a permanent table and populate it with 100,000 rows.
SET STATISTICS profile OFF GO USE tempdb
GO 
IF OBJECT_ID ('TestPermTable'IS NOT NULL DROP TABLE TestPermTable
GO 
CREATE TABLE TestPermTable (a2 VARCHAR(10),c2 INTGO CREATE INDEX index_TestPermTable ON TestPermTable(c2GO --insert 100,000 rows into the perm table SET NOCOUNT ON
BEGIN TRAN
DECLARE 
@i INT
SET 
@i WHILE @i 100000 BEGIN
INSERT INTO 
TestPermTable VALUES ('asdfghjkli',@iSET @i @i END
COMMIT TRAN 
GO --update stats UPDATE STATISTICS TestPermTable WITH fullscan
GO
Let's join the permanent table TestPermTable with a variable table containing 100,000 rows.
SET NOCOUNT ON
DECLARE 
@VariableTable TABLE (c1 INTBEGIN TRAN
DECLARE 
@i INT
SET 
@i WHILE @i 100000 BEGIN
INSERT INTO 
@VariableTable VALUES (@iSET @i @i END
COMMIT TRAN

SET STATISTICS 
profile ON
SET STATISTICS 
time ON
SELECT 
@i=TestPermTable.c2 FROM @VariableTable INNER JOIN TestPermTable ON c1=c2
GO
SET STATISTICS time OFF
SET STATISTICS 
profile OFFGO
As we can see below, when we started the batch the variable table was empty, so the optimizer estimated a low number of rows. The EstimateRows value for @VariableTable is 1 and therefore the optizmer selected a Nested Loopsjoin.
An under estimation of rows caused the optimizer to pick an inefficient plan, as we can see a Nested Loops join was used with the table variable as the outer table. The Nested Loops join uses one join input as the outer input and one as the inner input. The outer loop consumes the outer input table row by row. The inner loop, executed for each outer row, searches for matching rows in the inner input table. A Nested Loops join is particularly effective if the outer input is small and the inner input is preindexed and large. In many small transactions, such as those affecting only a small set of rows, index Nested Loops joins are superior to both Merge joins and Hash joins. In large queries however, Nested Loops joins are often not the optimal choice.
The solution to our problem, is to add the recompile option to the statement that involves the table variable joining with the tables.   By doing this, SQL Server will be able to detect the number of rows at recompile, because the rows have already been populated.  This option is only available in SQL Server 2005 and later.
SET NOCOUNT ON
DECLARE 
@VariableTable TABLE (c1 INTBEGIN TRAN
DECLARE 
@i INT
SET 
@i WHILE @i 100000 BEGIN
INSERT INTO 
@VariableTable VALUES (@iSET @i @i END
COMMIT TRAN

SET STATISTICS 
profile ON
SET STATISTICS 
time ON
SELECT 
@i=TestPermTable.c2 FROM @VariableTable INNER JOIN TestPermTable ON c1=c2 OPTION (recompile)GOSET STATISTICS time OFF
SET STATISTICS 
profile OFFGO
As you can see, the optimizer knows that our variable table contains 100,000 rows and consequently picks a Hash join which for this example is much faster as the statistics time shows below.

No comments:

Post a Comment