Showing posts with label Index.

Monday, August 22, 2011

New articles on CodeProject.com: Database performance optimization (part 1 and part 2)

A couple of weeks ago I posted the first part of a series of articles about database performance optimization on CodeProject.com. The first part covers indexing strategies and how to choose appropriate indexes. The second part discusses index maintenance. You can find more about indexes here:

Friday, August 5, 2011

Best Practices for Creating Indexes

Creating indexes is vital for the performance and future maintenance (we will discuss this later in the article) of your SQL Server database. Choosing appropriate indexes can improve your application's performance significantly, often by an order of magnitude. This is not as simple a task as it might sound, and there are several points you must consider when choosing indexes. Each index you define can improve the performance of a SELECT query, but on the other hand it can decrease the performance of INSERT and UPDATE queries. The reason is that SQL Server automatically maintains index keys: each time you issue a data modification statement, SQL Server not only modifies the data but also updates every index defined on the affected table. The performance degradation is noticeable on large tables with many indexes, or with a few indexes that have long keys. In some cases it is worthwhile to drop the indexes before updating or inserting new records and then recreate them afterwards, typically on large tables where you need to insert or update a large amount of data. You will find that inserting into a table without indexes is much faster than inserting into a table with indexes.
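As a rough sketch of that drop-and-recreate pattern (all table and index names here are hypothetical, purely for illustration), a large load might be wrapped like this:

    -- Hypothetical table and index names, for illustration only.
    -- Drop the non-clustered index before a large bulk insert...
    DROP INDEX IX_Sales_CustomerId ON dbo.Sales;

    -- ...load the data while only the base table has to be maintained...
    INSERT INTO dbo.Sales (CustomerId, SaleDate, Amount)
    SELECT CustomerId, SaleDate, Amount
    FROM dbo.SalesStaging;

    -- ...and recreate the index once the load is finished.
    CREATE NONCLUSTERED INDEX IX_Sales_CustomerId
        ON dbo.Sales (CustomerId);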
There are several tips you should keep in mind when implementing indexing.
  • Keep indexes lean. Try to build indexes on one or a few columns at most. Wide indexes take longer to scan than narrow indexes.
  • Create the clustered index on every table. However, choose the column(s) for the clustered index judiciously. Try to create the clustered index on the column which is used most frequently for retrieving data.
  • Try to create the clustered index on a column with high selectivity; that is, a column that does not have many duplicate values.
  • Try to create the clustered index on the column that will never be updated or will be updated infrequently. Every time the clustered index key is updated, SQL Server must maintain not just the clustered index but also the non-clustered indexes since non-clustered indexes contain a pointer to the clustered index. This is yet another reason why you shouldn't create the clustered index on multiple columns.
  • By default, SQL Server creates the clustered index on the PRIMARY KEY column(s) unless the clustered index is created prior to creating the primary key. It is often beneficial to have the clustered index on the primary key, but sometimes you're better off saving the clustered index for other column(s). Feel free to override the default behavior if your testing shows that clustered index on a non-key column will help your queries perform better.
  • SQL Server has to perform some maintenance when index keys are updated, so each additional index can impose a slight penalty on the performance of INSERT, UPDATE and DELETE statements. Nevertheless, the cost of index maintenance can be far less than the performance improvement you'll reap for your SELECT statements. Therefore you can be fairly liberal with the number of non-clustered indexes.
  • Be sure to eliminate duplicate indexes, that is, multiple indexes created on the same set of columns. Such indexes provide no additional benefit but use up extra disk space and might hinder performance of INSERT, UPDATE and DELETE statements.
  • Check the default fill factor configured at the server level. If you don't specify the FILLFACTOR option with the CREATE INDEX statement, every new index will be created using the default fill factor. This may or may not be what you intend (a brief sketch follows this list).
  • Non-clustered indexes can be created on different filegroups, which can reside on separate disk drives to improve data access (I/O) performance.
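As a brief sketch of the last two points (the table, index and filegroup names are made up for this example), an explicit fill factor and a separate filegroup can be specified directly in the CREATE INDEX statement:

    -- Hypothetical names, for illustration only. FILLFACTOR = 80 overrides
    -- the server-level default; ON [INDEXES_FG] places the index on a
    -- separate filegroup, which must already exist in the database.
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
        ON dbo.Orders (OrderDate)
        WITH (FILLFACTOR = 80)
        ON [INDEXES_FG];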
In a couple of articles I have found the claim that the identity column is the best choice for the clustered index, but keep in mind that this is not necessarily the best approach, because it forces all new rows onto the last data page of the table. This condition is sometimes referred to as a “hotspot”, since multiple users may compete for the last available spot on a page, which can make INSERT statements slow.

In some cases you will find tables that are never queried on a single column. Here, some developers prefer to create the clustered index on the set of columns that is most frequently used for retrieving data and uniquely identifies each record. These indexes are called composite clustered indexes. This may seem like a good idea, because an identity column has no business meaning in most cases. However, from a performance point of view you should avoid composite clustered indexes. Generally speaking, the leaner the index, the faster SQL Server can scan or seek through it. For small tables (or data sets) composite indexes perform relatively well, but as the number of records grows, performance decreases.
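To illustrate the difference (a minimal sketch, with made-up table and index names), compare a composite clustered key with a lean identity key plus a supporting non-clustered index:

    -- Hypothetical tables, for illustration only.
    -- Composite clustered index: a wide key that every
    -- non-clustered index on the table will also carry.
    CREATE TABLE dbo.OrdersComposite
    (
        CustomerId  INT           NOT NULL,
        OrderDate   DATETIME2(0)  NOT NULL,
        OrderNumber INT           NOT NULL,
        Amount      DECIMAL(10,2) NOT NULL,
        CONSTRAINT PK_OrdersComposite
            PRIMARY KEY CLUSTERED (CustomerId, OrderDate, OrderNumber)
    );

    -- Lean alternative: a narrow identity column as the clustered key,
    -- with a separate non-clustered index to support the common search.
    CREATE TABLE dbo.OrdersLean
    (
        OrderId     INT IDENTITY(1,1) NOT NULL,
        CustomerId  INT           NOT NULL,
        OrderDate   DATETIME2(0)  NOT NULL,
        OrderNumber INT           NOT NULL,
        Amount      DECIMAL(10,2) NOT NULL,
        CONSTRAINT PK_OrdersLean PRIMARY KEY CLUSTERED (OrderId)
    );

    CREATE NONCLUSTERED INDEX IX_OrdersLean_Customer_Date
        ON dbo.OrdersLean (CustomerId, OrderDate);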

Creating indexes has some limitations. Not every column can be added to an index. In particular, columns of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. However, varchar(max), nvarchar(max), varbinary(max), and xml data types can participate in a nonclustered index as nonkey (included) columns.
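For example (a minimal sketch with a hypothetical table), an nvarchar(max) column can still be carried by a non-clustered index as an included, nonkey column:

    -- Hypothetical dbo.Documents table, for illustration only.
    CREATE TABLE dbo.Documents
    (
        DocumentId INT IDENTITY(1,1) PRIMARY KEY,
        Title      NVARCHAR(200) NOT NULL,
        Body       NVARCHAR(MAX) NULL
    );

    -- Body cannot be an index key column, but it can be included as a
    -- nonkey column, so queries that filter on Title and return Body
    -- can be answered from the index alone.
    CREATE NONCLUSTERED INDEX IX_Documents_Title
        ON dbo.Documents (Title)
        INCLUDE (Body);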

Friday, October 29, 2010

SQL Server - Table (Index) Seeks and Scans

Generally, scans and seeks are the iterators that SQL Server uses to read data from tables and indexes. These iterators are among the most fundamental ones that SQL Server supports, and they appear in nearly every query plan.

Scan

A scan returns the entire table (or index). That means a scan touches every row in the table whether or not it qualifies. For this reason, the cost of a scan is proportional to the total number of rows in the table. A scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate.

An index scan is nothing but reading the data pages from the first page to the last page. If there is an index on the table but the query touches a large amount of data, say more than 50 to 90 percent of the rows, the optimizer will simply scan all the data pages to retrieve the rows. If there is no index at all, you will see a Table Scan in the execution plan instead of an Index Scan.

If the table is large and most of the rows do not qualify, a scan touches many more pages and rows and performs many more I/Os than necessary.
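As a small illustration (the table here is hypothetical), a query that filters on a column with no index on it forces SQL Server to read every row:

    -- Hypothetical dbo.Customers table, for illustration only.
    -- The only index is the clustered primary key on CustomerId.
    CREATE TABLE dbo.Customers
    (
        CustomerId INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
        LastName   NVARCHAR(100) NOT NULL,
        City       NVARCHAR(100) NOT NULL
    );

    -- There is no index on City, so every row must be examined:
    -- the execution plan shows a Clustered Index Scan.
    SELECT CustomerId, LastName
    FROM dbo.Customers
    WHERE City = N'Zagreb';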

Seek

Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.

Index seeks are generally preferred for highly selective queries. That means the query requests only a few rows, typically retrieving no more than about 10 percent (some documents say 15 percent) of the rows of the table.

In general, the query optimizer tries to use an Index Seek, which means the optimizer has found a useful index to retrieve the record set. If it is not able to do so, either because there is no index or no useful index on the table, then SQL Server has to scan all the records to find those that satisfy the query condition.
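Continuing the hypothetical dbo.Customers sketch from the Scan section, adding an index on City gives the optimizer something useful, and the same selective query can then be answered with a seek:

    -- Hypothetical index, for illustration only.
    CREATE NONCLUSTERED INDEX IX_Customers_City
        ON dbo.Customers (City)
        INCLUDE (LastName);

    -- With a useful index in place, the execution plan shows an
    -- Index Seek on IX_Customers_City instead of a scan.
    SELECT CustomerId, LastName
    FROM dbo.Customers
    WHERE City = N'Zagreb';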