Friday, August 5, 2011

Best Practices for Creating Indexes

Creating indexes is a vital for system performance and future maintenance (we will discuss about it later in this article) of your SQL server database. Choosing appropriate indexes can improve your application performance significantly, often by order of magnitude. This is not as simple task as it might sound. There are several points you must consider when choosing indexes. Each index you define can improve performance of a SELECT query, but on the other hand can decrease performance of INSERT and UPDATE queries. The reason for this is that SQL Server automatically maintains index keys. Therefore, each time you issue a data modification statement, not only data modification SQL Server provides, but also index it updates each index defined on affected table. The performance degradation is noticeable in case of large tables with many indexes., or few indexes with long keys. In some cases it is suitable to drop index before updating or inserting new records and than recreate index. This is in case of large tables where you need to update or insert a large amount of data. You will find out that insert into table without indexes will be much more faster than into table with indexes.
There are a several tips you should keep in mind when implementing indexing.
  • Keep indexes lean. Try to build indexes on one or few columns at most. Wide indexes take longer to scan than narrow indexes.
  • Create the clustered index on every table. However, choose the column(s) for the clustered index judiciously. Try to create the clustered index on the column which is used most frequently for retrieving data.
  • Try to create the clustered index on the column with high selectivity; that is the column that does not have many duplicate values.
  • Try to create the clustered index on the column that will never be updated or will be updated infrequently. Every time the clustered index key is updated, SQL Server must maintain not just the clustered index but also the non-clustered indexes since non-clustered indexes contain a pointer to the clustered index. This is yet another reason why you shouldn't create the clustered index on multiple columns.
  • By default, SQL Server creates the clustered index on the PRIMARY KEY column(s) unless the clustered index is created prior to creating the primary key. It is often beneficial to have the clustered index on the primary key, but sometimes you're better off saving the clustered index for other column(s). Feel free to override the default behavior if your testing shows that clustered index on a non-key column will help your queries perform better.
  • SQL Server has to perform some maintenance when index keys are updated, so each additional index could impose a slight performance penalty on the performance of INSERT, UPDATE and DELETE statements. Nevertheless the cost of index maintenance could be far less than the performance improvement you'll reap for your SELECT statements. Therefore you can be fairly liberal with the number of non-clustered indexes.
  • Be sure to eliminate duplicate indexes, that is, multiple indexes created on the same set of columns. Such indexes provide no additional benefit but use up extra disk space and might hinder performance of INSERT, UPDATE and DELETE statements.
  • Check the default fill factor level configured at the server level. If you don't specify the FILLFACTOR option with CREATE INDEX statement, every new index will be created using the default fill factor. This may or may not be what you intend.
  • Non-clustered indexes can be created in different file groups which can reside on separate disk drives to improve the data access i.o. I/O operations.
In a couple of articles I have found that the identity column is the best choice for clustered index, but you must keep in mind that this way shouldn’t be the best way because it force users to enter new data into the last data page of the table. This condition is sometimes referred to as a “hotspot” since there may be multiple users competing for the last available spot on a page and therefore making INSERT statements slow.

In some cases you can find tables that are never queried based on one column. In this case some developers prefer to create clustered index on set of columns that are most frequently used for data retrieving and uniquely identifies each record. This types of indexes are called composite clustered indexes. Maybe you consider this as a good idea because identity column has no business meaning in most cases. However, from a performance view you should avoid composite clustered indexes. Generally speaking, the leaner index, the faster SQL Server can scan or seek through it. For small tables (or data sets) composite indexes perform relatively well, but as number of records grows, performance decreases.

Creating indexes has some limitations. Not every columns can be added on index. Especially columns that are of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max)data types cannot be specified as index key columns. However, varchar(max), nvarchar(max),varbinary(max), and xml data types can participate in a nonclustered index as nonkey index columns.

No comments:

Post a Comment