menu

Wednesday, June 1, 2011

Index Management

Index management has always been an integral part of database management and performance tuning strategy and we do have a lot of articles and scripts dedicated to it. Well, here's my take on Index Management.
An index management strategy should be comprehensive enough to touch all aspects of indexes. A quick google gave me 1000's of link for index management; however none of them were complete enough to cater to all needs of index management.
After a bit of brainstorming I penned down 4 essential requirements for an Index Management strategy:
1. Rebuilding used indexes.
2. Filtering out unused indexes.
3. Creating missing indexes.
4. Minimum User Intervention.
If we look into the above steps, the steps are recursive, in a way that the missing indexes created in step 4 would undergo test for used or unused indexes in step 1 and step 2.
Here are some more benefits:
  • Only highly used indexes are rebuild, which decreases the overhead of rebuilding each and every indexes.
  • All unused indexes are filtered out and are saved for future analysis, where one can decide to keep them or do away with them.
  • At each step the result goes in to a history table, which helps in tracking the index information and will also help in avoiding duplicate indexes.
  • As most of index tuning revolves around DMV's, this strategy provides a controlled environment where in the index information exists even if the server is restarted.
I will now detail each step.

Step 1: Rebuilding Used Indexes

The key thing in this step is to determine the used indexes. I used sys.dm_db_index_usage_stats dmv to insert the relevant stats into a table tblIndexUsageInfo. The table schema is given below.
Create Table tblIndexUsageInfo
(
Sno int identity(1,1), 
Dbname Varchar(100),
TableName varchar(100),
IndexName Varchar(300),
Index_id int,
ConstraintType varchar(25),
Type_desc varchar(100),
IndexKeyColumn Varchar(1000),
IncludedColumn Varchar(1000),
user_seeks int,
user_scans int,
user_lookups int,
user_update int,
IndexUsage int,
IndexSizeKB int,
IndexUSageToSizeRatio decimal(10,2)
)
Besides regular information such as database, table and index name, the tblindexusage includes few important columns:
  1. ConstraintType: It tells whether the index in concern is an index or a primary key constraint.
  2. IndexKeyColumn: It Includes the Index key columns.I have used the function Uf_GetIndexCol to get the values. The code for which is given below.
  3. IncludedColumn: It Includes the Index Included columns.I have used the function Uf_GetIndexCol to get values. The code for which is given below.
  4. IndexUsage: It is the sum of user_seeks,user_scans,user_lookups and user_updates column.
  5. IndexSizeKB: It is the size of the index in question. I have used the function Uf_GetIndexSize to get the size. The code for which is given below.
  6. IndexUSageToSizeRatio: It is the index usage to index size ratio. (IndexUsage/IndexSizeKB)
The function Uf_GetIndexCol returns index key columns when @isincluded is 0 and included columns when @isincluded is 1. The code is given below.
CREATE function Uf_GetIndexCol
(@index_id int,@tableid int,@isincluded bit)
returns varchar(3000)
As
BEGIN
return 
(
stuff( 
(select ',' + sc.name from sys.columns sc,
sys.index_columns sic,sys.indexes si
where sc.column_id=sic.column_id 
and si.index_id=sic.index_id 
and sc.object_id=sic.object_id 
and si.object_id=sic.object_id 
and sic.is_included_column=@isincluded 
and si.object_id=@tableid
and si.index_id=@index_id
for xml path('')),1,1,'')
)
END
The function Uf_GetIndexSize returns the index size based on the number of reserved pages. For a heap or clusetered index the reserved is the count of pages allocated for index and table data and for a non-clustered index, it is the count of datapages used for the index. The fucntion will return NULL in case an index or a table is partitioned. The code is given below.
Create Function [dbo].[Uf_GetIndexSize]
(@index_id int,@tableid int)
Returns float
AS
BEGIN
return (select sum(cast(reserved as float))*8192/(1024) 
from sysindexes where indid=@index_id and id=@tableid)
End
I used the following code to populate the table tblIndexUsageInfo.
Create Proc proc_FilltblIndexUsageInfo
AS
Begin
Truncate table tblIndexUsageInfo
insert into tblIndexUsageInfo 
select distinct db_name(db_id()) DbName,
so.name as 'TableName',
ISNULL(si.name,'No Index') as IndexName,
si.index_id,
Case When is_primary_key=1 then 'Primary Key Constraint' 
Else 'Index' End ConstraintType, 
si.type_desc,
dbo.Uf_GetIndexCol(si.index_id,so.object_id,0) As IndexKeyColumn,
dbo.Uf_GetIndexCol(si.index_id,so.object_id,1) As IncludedCols,
spi.user_seeks,spi.user_scans,spi.user_lookups,spi.user_updates,
(user_seeks+user_scans+user_lookups+user_updates) as 'IndexUsage ',
dbo.Uf_GetindexSize(si.index_id,so.object_id) 'IndexSizeKB',
Cast(
(user_seeks+user_scans+user_lookups+user_updates)/
dbo.Uf_GetindexSize(si.index_id,so.object_id) As decimal(10,2)) 
As IndexUsagetoSizeRatio
from sys.objects so inner join sys.indexes si 
on so.object_id=si.Object_id
inner join sys.dm_db_index_usage_stats spi 
on spi.Object_id=so.Object_id
inner join sys.index_columns sic 
on sic.object_id=si.object_id 
and sic.index_id=si.index_id
inner join sys.columns sc 
on sc.Column_id=sic.column_id and sc.object_id=sic.object_id
inner join INFORMATION_SCHEMA.TABLE_CONSTRAINTS c 
on so.name=c.TABLE_NAME where so.type='u'
END
The IndexUsage and IndexUsagetoSizeRatio will be used to filter out the used and unused indexes. The higher the value of Indexusage column the higher is the significance of the index, similarly the higher the value of IndexUsagetoSizeRatio the greater is the size of an index than its usage and the less is the significance of the index.
The lower limit to an index usage or IndexUsagetoSizeRatio will depend from system to system and can only be decided by monitoring the index usage. Once the used indexes are filtered, they will be saved in a table tblMostUsedIndexes and will then be reindexed using a stored procedure. Thus there are two procedures, one to insert data into tblMostUsedIndexes based on the Index Usage criteria and another one to rebuild them.
The code for the table tblMostUsedIndexes is:
Create table tblMostUsedIndexes
(
Sno int identity(1,1),
TableName varchar(100),
IndexName varchar(1000),
Index_id int,
SchemaName Varchar(100),
TableId int,
IndexUsage int,
IndexUSageToSizeRatio decimal(10,2)
)
The procedure proc_InsertMostUsedIndexes selects the most used indexes based on the Indexusage and IndexUSageToSizeRatio.
Create Proc proc_InsertMostUsedIndexes
(@IndexUSageToSizeRatio decimal(10,2),@indexusage int)
As
Begin
insert into tblMostUsedIndexes
select b.TableName,b.IndexName,
(select index_id from sys.indexes where name=b.IndexName) As Index_id,
ss.name As Schemaname,object_id(tablename),IndexUsage,IndexUSageToSizeRatio
from tblIndexUsageInfo b,sys.tables st,sys.schemas ss 
where (b.indexusage>=@indexUsage Or IndexUSageToSizeRatio>=@IndexUSageToSizeRatio)
and st.name=tablename and st.schema_id=ss.schema_id
and b.indexname not in (select indexname from tblMostUsedIndexes)
group by b.indexname,b.tablename,ss.name,b.IndexUSageToSizeRatio,b.indexusage
End
For example, If the Indexusage and IndexUsageToSizeRatio threshold are 100 and 10.00 respectively, then
Execute proc_InsertMostUsedIndexes 10.00,100
will filter out all indexes with Indexusage greater then or equal to 100 or IndexUsageToSizeRatio greater than or equal to 10.00.
Once we get all used indexes, It's the time to rebuild them. The procedure proc_RebuildSelectedIndexes rebuilds/reorganises all indexes from the table tblMostUsedIndexes, based on the average fragmentation value. Indexes with average fragmentation less than or equal to 20 are reorganised and the rest are rebuild. I have used sys.dm_db_index_physical_stats to get the average fragmentation of an index.
Create Proc proc_RebuildSelectedIndexes
As
BEGIN
SET NOCOUNT ON
/* Code to Rebuild or Reorganise index */
Declare @Schema varchar(200),@Tablename varchar(200)
Declare @indexName varchar(400),@Qry varchar(1000),@RecCount int
Declare @avg_frag decimal,@dbid int,@ObjectId int
Declare @IndexCount int,@TotalRec int,@Index_type varchar(50)
Declare @IndexRebuildCount int,@IndexReorgCount int,@IxOpr varchar(10)Declare @index_id int
Set @IndexRebuildCount = 0
Set @IndexReorgCount = 0
set @IxOpr=''
set @dbid=db_id()
select @RecCount=sno from tblMostUsedIndexes
set @TotalRec=@RecCount
While(@RecCount>0)
BEGIN
select @Schema=schemaname,@TableName=TableName,@ObjectId=tableid,
@index_id=index_id from tblMostUsedIndexes 
where sno=@RecCount

SELECT IDENTITY(int,1,1) as Sno,a.[name] IndexName,avg_fragmentation_in_percent as avg_frag,
type_desc,a.index_id into #temp_2
FROM sys.dm_db_index_physical_stats(@dbid, @objectid, @index_id, NULL , 'Limited') as b 
join sys.indexes as a on a.object_id = b.object_id 
AND a.index_id = b.index_id and a.index_id>0 
select @IndexCount=sno from #temp_2
While(@IndexCount>0)
BEGIN
select @avg_frag=avg_frag,@IndexName=indexname,@Index_Type=type_desc
from #temp_2 where sno=@IndexCount
IF(@avg_frag<=20) 
BEGIN 
set @Qry='Alter index ' + @IndexName + ' ON ' + @Schema + '.' + @TableName + ' REORGANIZE;'
Set @IndexReorgCount=@IndexReorgCount + 1
set @IxOpr='REORGANIZE'
END
IF(@avg_frag>20) 
BEGIN 
set @Qry='Alter index ' + @IndexName + ' ON ' + @Schema + '.' + @TableName + ' REBUILD;'
Set @IndexRebuildCount = @IndexRebuildCount + 1 
set @IxOpr='REBUILD'
END 
print @qry
EXECUTE(@qry)
set @IndexCount=@IndexCount-1
END
drop table #temp_2
set @RecCount=@RecCount - 1 
END
SET NOCOUNT OFF
END
This concludes the first step where in the used indexes are filtered based on their usage and are rebuild or reorganised based on their fragmentation level.

Step2: Eliminating Unused Index

The unused indexes are filtered based on two criteria:
  1. Indexes that does not exist in the sys.dm_db_index_usage_stats dmv.
  2. Indexes with Indexusage Or IndexUsageToSizeRatio less than the threshold value
The table tblUnusedIndexes is used to save the unused index information. The table definition is given below.
Create table tblUnusedIndexes
(
UnusedIndid int identity(1,1),
Schemaname varchar(100),
tablename varchar(100),
IndexName varchar(500),
IndexUsage int,
IndexUsageToSizeRatio decimal(10,2),
IndexKey varchar(1000),
IncludedCol varchar(1000),
ConstraintType varchar(1000),
IndexSizeKB int,
DropQry varchar(4000),
IndexStatus varchar(20) default 'Active'
)
The drop query for an index or a primary key constraint is generated on the fly and is stored in DropQry column. The indexstatus column tells whether the index/primary key is dropped or not. Active means that the index/primary key is active and Drop signifies that it has been dropped. The default value is Active.
The indexkey and IncludedCol contain index key and included columns respectively, which can be used to create the index in case one feel the need for the index in future.
The procedure that will be used to populate the above table is proc_filltblUnusedIndex. The unused indexes are filtered out as explained above.
Create Procedure proc_FilltblUnusedIndexes
(@IndexUsageToSizeRatio decimal(10,2),@indexusage int)
As
Begin
insert into tblUnusedIndexes
(
Schemaname,
tablename,
IndexName,
IndexUsage,
IndexUsageToSizeRatio,
IndexKey,
IncludedCol,
ConstraintType,
IndexSizeKB,
DropQry
)
-- Indexes that does not exist in sys.dm_db_index_usage_stats
select ss.name SchemaName,so.name as TableName,
ISNULL(si.name,'NoIndex') as IndexName,0 IndexUsage,0 IndexUsageToSizeRatio,
dbo.Uf_GetIndexCol(si.index_id,so.object_id,0) As IndexKey,
dbo.Uf_GetIndexCol(si.index_id,so.object_id,1) As IncludedCol,
Case When is_primary_key=1 then 'Primary Key Constraint' 
Else 'Index'End ConstraintType,
dbo.Uf_GetIndexSize(si.index_id,so.object_id) As IndexSizeInKB,
Case When (is_primary_key=1) 
then ('alter table ' + so.name + ' drop constraint ' + si.name)
Else ('Drop Index ' + ss.name + '.' + so.name + '.' + si.name) 
End As DropQry
from sys.objects so inner join sys.indexes si on so.object_id=si.Object_id
inner join sys.schemas ss on ss.schema_id=so.schema_id
where not exists 
(select * from sys.dm_db_index_usage_stats spi where 
si.object_id=spi.object_id and si.index_id=spi.index_id)
and so.type='U' and ss.schema_id<>4 and si.index_id>0
and si.name not in (select indexname from tblUnusedIndexes)
union 
-- Indexes that doesn't satisfy the Indexusage criteria.
select ss.name,b.TableName,b.IndexName,
b.IndexUsage ,b.IndexUsageToSizeRatio,
dbo.Uf_GetIndexCol(b.index_id,object_id(b.tablename),0) As IndexKey,
dbo.Uf_GetIndexCol(b.index_id,object_id(b.tablename),1) As IncludedCol,
b.ConstraintType,
dbo.Uf_GetIndexSize(b.index_id,object_id(b.tablename)) As IndexSizeInKB,
Case b.ConstraintType When 'Index' 
then ('Drop Index ' + ss.name + '.' + b.TableName + '.' + b.IndexName)
Else ('alter table ' + b.TableName + ' drop constraint ' + b.IndexName)
End DropQry
from tblIndexUsageInfo b,sys.tables st,sys.schemas ss
where(b.indexusage<=@indexUsage Or IndexUsageToSizeRatio<=@IndexUsageToSizeRatio)
and st.name=tablename and st.schema_id=ss.schema_id
and b.indexname not in (select indexname from tblUnusedIndexes)
group by b.indexname,b.tablename,ss.name,ss.schema_id,
b.ConstraintType,b.index_id,b.indexusage,b.IndexUsageToSizeRatio
END
Once we have a list of all unused indexes, we can analyse them and can even choose to drop them. The procedure proc_DropUnusedIndex drops an index based on the UnusedIndID from the table tblUnusedIndexes. The index is dropped and the IndexStatus column is set to Dropped.
Create Proc proc_DropUnusedIndex
@UnusedIndID int
As
Begin
Declare @SqlStr Varchar(4000)
select @SqlStr=DropQry from tblunusedindexes where UnusedIndid=@UnusedIndID
BEGIN TRAN
BEGIN TRY
Execute(@SqlStr)
Update tblunusedindexes Set IndexStatus='Dropped' where UnusedIndID=@UnusedIndID
END TRY
BEGIN CATCH
select ERROR_MESSAGE() as ErrorMessage
IF @@TRANCOUNT > 0
 ROLLBACK TRANSACTION;
END CATCH
IF @@TRANCOUNT > 0
 COMMIT TRANSACTION
print 'Index dropped Successfully'
END
This concludes the Step 2 where in the unused indexes are filtered out into table tblUnusedIndexes and can be dropped if required.

Step 3: Creating Missing Indexes

Missing indexes can be easily queried from the DMVs sys.dm_db_missing_index_details, sys.dm_db_missing_index_group_stats and sys.dm_db_missing_index_groups. I will use them in the below given query to figure out all missing indexes. The missing index details are saved in table tblMissingIndexes. The table definition is given below.
Create table tblMissingIndexes 
(
Sno int identity(1,1),
DatabaseName varchar(100),
tablename varchar(200),
Significance decimal(10,0),
CreateIndexStatement varchar(8000),
Status varchar(20) default ('NotCreated')
)
The procedure proc_FindMisisngIndexes is used to populate the above table.
Create procedure proc_FindMisisngIndexes
As
BEGIN
insert into tblMissingIndexes(DatabaseName,tablename,
Significance,Createindexstatement)
select db_name(sid.database_id),sid.statement,
(avg_total_user_cost * avg_user_impact) * (user_scans + user_seeks) 
As Significance,
dbo.fn_CreateIndexStmt
(
sid.statement,
sid.equality_columns,
sid.inequality_columns,
sid.included_columns)
from sys.dm_db_missing_index_details sid,
sys.dm_db_missing_index_group_stats sigs,
sys.dm_db_missing_index_groups sig 
where sig.index_group_handle=sigs.group_handle and sid.index_handle=sig.index_handle
order by significance desc
END
The above procedure uses the DMV's sys.dm_db_missing_index_details , sys.dm_db_missing_index_group_stats and sys.dm_db_missing_index_groups to find the missing indexes and their significance. The Significance of an index is calculated as (avg_total_user_cost * avg_user_impact) * (user_scans + user_seeks) which indicates the importance of an index for the system. The greater the value the greater is the importance.
The query for missing indexes is generated on the fly by function fn_CreateIndexStmt. It takes statement, equality_columns, inequality_columns and included_columns as input and returns the create index index query for the indexes. The code for the function is given below.
Create function fn_CreateIndexStmt
(
@statement varchar(1000),
@Equalitycols varchar(1000),
@InEqualitycols varchar(1000),
@Includedcols varchar(1000)
)
Returns varchar(5000)
AS
Begin
Declare @str varchar(5000),@tablename varchar(100)
set @tablename=substring(substring(@statement,charindex('.',@statement)+1,len(@statement)),
charindex('.',substring(@statement,charindex('.',@statement)+1,len(@statement)))+1,
len(substring(@statement,charindex('.',@statement)+1,len(@statement))))
set @Includedcols=Replace(Replace(@Includedcols,']',''),'[','')
set @Equalitycols=Replace(Replace(Replace(@Equalitycols,']',''),', ','_'),'[','')
set @InEqualitycols=Replace(Replace(Replace(@InEqualitycols,']',''),', ','_'),'[','')
set @str='Create Index Ix_' + replace(replace(@tablename,']',''),'[','') 
set @str=Case WHEN @Equalitycols is NULL THEN @str ELSE (@str + '_' + ISNULL(@Equalitycols,'')) END
set @str=Case WHEN @InEqualitycols is NULL THEN @str ELSE (@str + '_' + ISNULL(@InEqualitycols,'')) END 
set @str=@str + ' ON ' + @statement + '(' + 
CASE WHEN @Equalitycols IS NULL THEN '' ELSE replace(ISNULL(@Equalitycols,''),'_',',') END
+ 
CASE WHEN @InEqualitycols IS NULL THEN '' ELSE ',' + replace(ISNULL(@InEqualitycols,''),'_',',') END
+')'
set @str=Case WHEN @Includedcols is NULL THEN @str ELSE @str + 'Include (' + ISNULL(@Includedcols,'') + ')' END
return @str
END
It is not neccessary to create all the missing indexes. The choice depends on the significance of the index; indexes with significance greater than 80000 are expected to improve the performance.The below given procedure proc_CreateMissingIndexes executes the create index statements from the table tblMissingIndexes for indexes with significance value greater than @significance. The procedure also sets the status column to "Created" which indicates that the index has been created.The default value for status is "NotCreated".
Create Procedure proc_CreateMissingIndexes
@significance decimal(10,0)
AS
Begin
Declare @Count int,@SqlStr varchar(8000)
set @SqlStr=''
Select Identity(int,1,1) AS Sno,CreateIndexStatement 
into #temp
from tblmissingindexes where significance>@significance
select @count=count(*) from #temp
While(@count>=0)
Begin
select @SqlStr=CreateIndexStatement from #temp where sno=@count
update tblmissingindexes set Status='Created' where sno=@count
exec(@sqlStr)
set @count=@Count - 1
END
END
This finishes the Step 3 where in I showed how we can find all missing indexes and can create them to increase the database performance. I would like to mention that the missing index created will be tested for their effectiveness in step 1 and 2. Thus one need not to worry about the effectiveness of the missing index created as if it doesn't improves the database performance, it will be filtered out as unused index under step 2.

Step 4 : Minimum User Intervention

The steps 1-3 discussed above can be easily grouped together in a job, which can be scheduled to run at weekends. As all the results are recorded in a table, the results can be analysed later. The job steps are listed below.
Step 1: proc_filltblindexusageinfo
Step 2: proc_InsertMostUsedIndexes 10.00,1000
Step 3: proc_RebuildSelectedIndexes
Step 4: proc_filltblUnusedIndex 1,0
Step 5: proc_findMissingIndexes
Step 6: proc_CreateMissingIndexes 80000

Conclusion:

The article demonstrated an index management strategy to automate the day to day index management tasks.The strategy filters out the used index based on their usage and rebuilds them based on their level of fragmentation. It helps in recognising the unused index and also facilitates the detection and creation of missing indexes. The strategy also helps in recognising the duplicate indexes.New index can be compared with the existing one to check for the duplicacy.

No comments:

Post a Comment