menu

Wednesday, June 1, 2011

Stairway to SQL Server Indexes: Step 2, Deeper into Nonclustered Indexes

Indexes are fundamental to database design, and tell the developer using the database a great deal about the intentions of the designer. Unfortunately indexes are too often added as an afterthought when performance issues appear. Here at last is a simple series of articles that should bring any database professional rapidly 'up to speed' with them
Step 1 in the SQL Server Indexes Stairway introduced SQL Server indexes, in general, and nonclustered indexes specifically.  As our first case study, we demonstrated the potential benefit of an index when retrieving a single row from a table.  In this step, we continue our investigation of nonclustered indexes; examining their contribution to good query performance in cases that go beyond retrieving a single row from a table.
As will be the case in most of these our steps, we introduce a small amount of theory, examine some index internals to help explain the theory, and then execute some queries.  These queries are executed with and without indexes, and with performance reporting statistics turned on, so that we can view the impact of the indexes.
We will be using the subset of tables from the AdventureWorks database that we used in step 1, concentrating on the Contact table throughout this step. We will use just one index, the FullName index that we used in step 1, to illustrate our points.  To ensure that we control the indexing on the Contact table, we’ll make two copies of the table in the dbo schema, and only build the FullName index on one of them. This will give us our controlled environment: two copies of the table: one with a single nonclustered index and one without any indexes.
Note:
All TSQL code shown in this Stairway step can be downloaded at the bottom of the article.
The code in Listing 1 makes the copies of the Person.Contact table, and we can rerun this batch anytime we wish to start with a ‘clean slate’.
IF EXISTS (

    SELECT * 

        FROM sys.tables 

        WHERE OBJECT_ID = OBJECT_ID('dbo.Contacts_index'))
DROP TABLE dbo.Contacts_index;
GO
IF EXISTS (

    SELECT * 

        FROM sys.tables 

        WHERE OBJECT_ID = OBJECT_ID('dbo.Contacts_noindex'))

    DROP TABLE dbo.Contacts_noindex;
GO
SELECT * INTO dbo.Contacts_index 

    FROM Person.Contact;
SELECT * INTO dbo.Contacts_noindex 

    FROM Person.Contact;
Listing 2.1: Make copies of the Person.Contact table
A snippet of the Contacts table is shown here:
ContactID   FirstName  MiddleName LastName  EmailAddress
                                    .
                                    .

1288        Laura      F          Norman    laura1@adventure-works.com
651         Michael               Patten    michael20@adventure-works.com
1652        Isabella   R          James     isabella6@adventure-works.com
1015        David      R          Campbell  david8@adventure-works.com
1379        Balagane              Swaminath balaganesan0@adventure-works.c
742         Steve                 Schmidt   steve3@adventure-works.com
1743        Shannon    C          Guo       shannon16@adventure-works.com
1106        John       Y          Chen      john2@adventure-works.com
1470        Blaine                Dockter   blaine1@adventure-works.com
833         Clarence   R.         Tatman    clarence0@adventure-works.com
1834        Heather    M          Wu        heather6@adventure-works.com
1197        Denise     H          Smith     denise0@adventure-works.com
560         Jennifer   J.         Maxham    jennifer1@adventure-works.com
1561        Ido                   Ben-Sacha ido1@adventure-works.com
924         Becky      R.         Waters    becky0@adventure-works.com

.

The Nonclustered Index Entry

The following statement creates our FullName nonclustered index on the Contacts_index table.
CREATE INDEX FullName

            ON Contacts_index

    ( LastName, FirstName );
Listing 2.2 - Creating a nonclustered index
Remember that a nonclustered index stores the index keys in order, along with a bookmark that is used to access the actual data in the table itself. You can think of the bookmark as a kind of pointer. Future steps will describe the bookmark, its form and its use, in more detail. 
A snippet of the FullName index is shown here, consisting of the LastName and FirstName as Key columns, plus the bookmark:
:---      Search Key Columns   :         Bookmark
                                     .
Russell          Zachary                  => 
Ruth             Andy                     => 
Ruth             Andy                     => 
Ryan             David                    => 
Ryan             Justin                   => 
Sabella          Deanna                   => 
Sackstede        Lane                     => 
Sackstede        Lane                     => 
Saddow           Peter                    => 
Sai              Cindy                    => 
Sai              Kaitlin                  => 
Sai              Manuel                   => 
Salah            Tamer                    => 
Salanki          Ajay                     => 
Salavaria        Sharon                   =>  
Each entry contains the index key columns and the bookmark value.  In addition, a SQL Server nonclustered index entry has some internal-use-only header information and may contain some optional data values.  Both of these will be covered in later steps; neither is important at this time for a basic understanding of nonclustered indexes.
For now, all we need to know is that the key value enables SQL Server to find the appropriate index entry(s); and that the entry’s bookmark value enables SQL Server to access the corresponding data row in the table.

The Benefit of Index Entries being in Sequence

An index’s entries are sequenced by index key value(s), so SQL Server can rapidly traverse the entries sequentially in either direction.  This scanning of sequenced entries can commence from the start of the index, the end of the index, or from any entry within the index. 
Thus, if a request asks for all the contacts whose last name begins with the letter “S” (WHERE LastName LIKE 'S%'), SQL Server can quickly navigate to the first “S” entry (“Sabella, Deanna”), and then traverse the index, using the bookmarks to access the rows, until it reaches the first “T” entry; at which point it knows that it has retrieved all the “S” entries.
The above request would execute even faster if all the selected columns were in the index.  Thus, if we issued:
SELECT FirstName, LastName 

    FROM Contact 

    WHERE LastName LIKE 'S%';
SQL Server can quickly navigate to the first “S” entry and then traverse through the index entries, ignoring the bookmarks and retrieving the data values directly from the index entries, until it reaches the first “T” entry.  In relational database terminology, the index has “covered” the query.
Any SQL operator that benefits from sequenced data can benefit from an index.  This includes ORDER BY, GROUP BY, DISTINCT, UNION (not UNION ALL), and JOIN…ON.
For instance, if a request asks for a count of contacts by last name, SQL Server can start counting at the first entry, and proceed down the index.  Every time the value of the last name changes, SQL Server outputs the current count and starts a new count.  As with the previous request, this is a covered query; SQL Server accesses just the index, ignoring the table completely.
Note the importance of the left-to-right sequence of the key columns.  Our index is very helpful if a request asks for everyone whose last name is “Ashton”, but it is of little or no help if the request is for everyone whose first name is “Ashton”.

Testing Some Sample Queries

If you want to execute the test queries that follow, make sure you run the script to create both versions of the new Contact tables,dbo.Contacts_index and dbo.Contacts_noindex; and also run the script to create the LastName, FirstName index on dbo.Contacts_index.
To validate the assertions in the previous section, we turn on the same performance statistics that we used in step 1 and run some queries; with and without indexes. 
SET STATISTICS io ON

SET STATISTICS time ON
Because the Contacts table from the AdventureWorks database has only 19972 rows in it, it will be difficult to get meaningful values for statistics time. Most of our queries will show a CPU time value of 0, so we are not showing the output from statistics time; only from statistics IO, which reflects the possible number of pages that will have to be read.  These values will allow us to compare queries in a relative sense, to determine which queries with which indexes perform better than others.  If you want a bigger table for more realistic timing tests, a script to build a million row version of the Contact table is available with this article. All of the discussion that follows will assume you’re using the standard 19972-row table.

Testing a Covered Query

Our first query is a query that will be covered by the index; one that retrieves a limited set of columns for all contacts whose last name begins with “S”. Query execution information is given in Table 2.1.
SQLSELECT FirstName, LastName
FROM dbo.Contacts  -- execute with both Contacts_noindex and
-- Contacts_index
WHERE LastName LIKE 'S%'
Without Index(2130 row(s) affected)
Table 'Contacts_noindex'. Scan count 1, logical reads 568.
With Index(2130 row(s) affected)
Table 'Contacts_index'. Scan count 1, logical reads 14.
Index ImpactIO reduced from 568 reads to 14 reads.
CommentsAn index that covers the query is a good thing to have. Without an index, the entire table is scanned to find the rows.
The “2130 rows” statistic indicates that “S” is a popular initial letter for last names, occurring in ten percent of all contacts.
Table 2.1: Execution results when running a covered query

Testing a Non-Covered Query

Next, we modify our query to request the same rows as before, but include columns that are not in the index. Query execution information is given in Table 2.2. 
SQLSELECT *
FROM dbo.Contacts  -- execute with both Contacts_noindex and
-- Contacts_index
WHERE LastName LIKE 'S%'
Without IndexSame as previous query. (Because it is a table scan).
With Index(2130 row(s) affected)
Table 'Contact_index'. Scan count 1, logical reads 568.
Index ImpactNo impact at all.
CommentsThe index was never used during the execution of the query!
SQL Server decided that jumping from an index entry to the corresponding row in the table 2130  times (once for each row) was more work than scanning the entire table of one million rows to find the 2130  rows that it needed.
Table 2.2: Execution results when running a non-covered query

Testing a Non-Covered Query but Being More Selective

This time, we make our query more selective; that is, we narrow down the number of rows being requested.  This increases the probability that the index will be beneficial to that query.  Query execution information is given in Table 2.3.
SQLSELECT *
FROM dbo.Contacts  -- execute with both Contacts_noindex and
-- Contacts_index
WHERE LastName LIKE 'Ste%'
Without IndexSame as previous query. (Because it is a table scan).
With Index(107 row(s) affected)
Table 'Contact_index'. Scan count 1, logical reads 111.
Index ImpactIO reduced from 568 reads to 111 reads..
Comments
SQL Server accessed the 107 “Ste%” entries, all of which are located consecutively within the index.  Each entry’s bookmark was then used to retrieve to corresponding row.  The rows are not located consecutively within the table.
The index benefitted this query; but not as much as it benefitted the first query, the “covered” query; especially in terms of number of IOs required to retrieve each row. 
You might expect that reading 107 index entries plus 107 rows would require 107 + 107 reads.  The reason why only 111 reads were required will be covered at a higher step.  For now, we will say that very few of the reads were used to access the index entries; most were used to access the rows.
Since the previous query, which requested 2130 rows, did not benefit from the index; and this query, which requested 107 rows, did benefit from the index - you might also wonder “where does the tipping point lie?”  The calculations behind SQL Server’s decision also will be covered in a future step.
Table 2.3: Execution results when running a more selective non-covered query

Testing a Covered Aggregate Query

Our last sample query will be an aggregate query; that is a query that involves counting, totaling, averaging, et cetera.  In this case, it is a query that tells us the extent of name duplication within the Contact table.  
The results, in part, look like this:
Steel         Merrill             1
Steele        Joan                1
Steele        Laura               2
Steelman      Shanay              1
Steen         Heidi               2
Stefani       Stefano             1
Steiner       Alan                1


Query execution information can be seen in Table 2.4. 
SQLSELECT LastName, FirstName, COUNT(*) as 'Contacts'
FROM dbo.Contacts  -- execute with both Contacts_noindex and
-- Contacts_index
WHERE LastName LIKE 'Ste%'
GROUP BY LastName, FirstName
Without IndexSame as previous query. (Because it is a table scan).
With Index(104 row(s) affected)
Table 'Contacts_index'. Scan count 1, logical reads 4.
Index ImpactIO reduced from 568 reads to 4 reads.
Comments
All the information needed by the query is in the index; and it is in the index in the ideal sequence for calculating the counts.  All the “last name begins with ‘Ste’” entries are consecutive within the index; and within that group, all the entries for a single FirstName / LastName value are grouped together.
No accessing of the table was required; nor was any sorting of intermediate results needed.  Again, an index that covers the query is a good thing to have.
Table 2.4: Execution results when running a covered aggregate query

Testing a Non-Covered Aggregate Query

If we change the query to include columns that are not in the index, we get the performance results that we see in Table 2.5.
SQLSELECT LastName, FirstName, MiddleName, COUNT(*) as 'Contacts'
FROM dbo.Contacts  -- execute with both Contacts_noindex and
-- Contacts_index
WHERE LastName LIKE 'Ste%'
GROUP BY LastName, FirstName, MiddleName
Without IndexSame as previous query. (Because it is a table scan).
With Index(105 row(s) affected)
Table 'ContactLarge'. Scan count 1, logical reads 111.
Index ImpactIO reduced from 568 reads to 111 reads; same as the previous non-covered query
CommentsIntermediate work done while processing the query does not always appear in the statistics. Techniques that use memory or tempdb to sort and merge data are examples of this.  In reality, the benefit of an index may be greater than that shown by the statistics.
Table 2.5: Execution results when running a noncovered aggregation query

Conclusion

We know now that a nonclustered index has the following features. A nonclustered index:
  • Is a sorted set of entries.
  • Has one entry per row of the underlying table.
  • Contains an Index Key and a Bookmark.
  • Is created by you.
  • Is Maintained by SQL Server.
  • Is used by SQL Server to minimize the effort required to satisfy a client request.
And we have seen examples in which SQL Server could satisfy the request from the index alone; and some in which it ignored the index completely; and still others the used a combination of both the index and the table.  For this reason, we close step 2 by updating a statement that we made at the start of Step 1.
When a request arrives at your database, SQL Server has only three possible ways to access the data requested by the statement: 
  • Access just the nonclustered index and avoid accessing the table.  This in only possible if the index contains all the data, for this table, being requested by the query
  • Use the index key(s) to access the nonclustered index, and then use the selected bookmark(s) to access individual rows of the table.
  • Ignore the non-clustered indexes and scan the table for the requested rows.
In general, the first is ideal; and the second is preferable to the third.  In upcoming steps we show what you can do to increase the probability that your indexes will cover your popular queries, and how you can determine if your non-covered queries are sufficiently selective to benefit from your indexes.  But that will require more detailed information on the internal structure of indexes than we have yet presented.
Before we can reach that point, we need to introduce the other type of SQL Server index; the clustered index. And that is the subject of step 3.

Downloadable Code

No comments:

Post a Comment