Index Fragmentation (rowstore)

Rowstore Index fragmentation can be divided into two groups: internal fragmentation which is measured in page density and external fragmentation that relates to the logical order the pages themselves are stored on disk.

The observant reader will note that internal fragmentation, which can be caused by rows being updated in a manner that causes them to be moved in the index, sounds very similar to what happens when you set fill factor. Indeed, internal fragmentation is measured in average page fullness, or page density, which is directly affected by fill factor. By the way, a page split by definition sets the page density to about 50% for that page as it takes about half the page content and moves it to a new page leaving about 50% empty space on both pages. Good to know. As far as I know, internal fragmentation can only be remedied by an index reorganization or an index rebuild, which also resets the “fill of your pages” in accordance with the fill factor setting.

External fragmentation (referred to as logical fragmentation for rowstore indexes and extent fragmentation for heaps by Microsoft) happens when your leaf pages are not ordered consecutively inside an extent. Another way of saying this is that the next page allocated to the index is not the page in the next page pointer on the current page, which is the definition Microsoft uses for logical fragmentation. An extent is the smallest block of disk space SQL server will allocate and is 64 KB in size, which is to say 8 pages large. Each page has a pointer to the next page in the order of the sorting key(s). External fragmentation happens when these pages are not in sequence in the extent, which happens if one of the pages in the extent undergoes a page split causing SQL server to jump in and out of the extent to read the pages in logical sequence.

Does all this matter? Well, it depends. It depends on your usage patterns, hardware, data model and so on. Generally, I would say that internal fragmentation should be managed due to the resulting empty space and its effect on memory and I/O. I would care about this on tables experiencing page density reduction over time and at least reorganize these on a monitored basis. I have not managed to find any recommended internal fragmentation thresholds from Microsoft.

When it comes to external fragmentation the old school recommendations from Microsoft was (and still is) to reorganize the index between 5% and 30% external fragmentation and rebuild the index above 30% fragmentation. On todays systems, with NVMe flash drives and fairly affordable RAM available I see this to be too aggressive. Also, avg_fragmentation_in_percent only measures logical fragmentation, while you might want to pay more attention to the internal fragmentation (avg_page_space_used_in_percent in DETAILED mode) to optimize buffer pool usage and I/O. For more information on this see Erik Darling’s blog.

All applications and use cases have different needs, but for an unknown system I would start with the following rule of thumb:

  • Reorganize and update statistics fairly often. For instance every night or three times a week, whatever works for your use case. Use the WITH FULLSCAN option if you can for the statistics update. The reorg will help optimize buffer cache and I/O performance and pack the pages properly resolving internal fragmentation. It is also possible to monitor statistics on row changes, but I redirect the avid reader to Ellen Stellatos EightKB session Demystifying Database Statistics in SQL Server for more information on statistics.
  • Rebuild indexes on user tables on a set schedule if you can, but use a percentage threshold to avoid unneeded work. Maybe use a schedule of once a week, once a fortnite or once a month (again depending on your usecase) to check the fragmentation. You can experiment with the percentage you want to use as a threshold, but I would start high, say 50% and adjust/experiment from there. I would do this as I do not fully trust the storage performance given to SQL server inside the often virtual environment and this way I know I am managing it within reason. If you want to use 30%, go ahead. In any case, make sure you are not updating statistics right after rebuilding your indexes.

I realize many smart and competent people in the SQL community are saying to largely ignore index fragmentation these days, at least external fragmentation. Currently, it is my opinion that we still need to manage this somewhat, but I reserve the right to change my opinion if so persuaded. Also, since I tend to go with the default fill factor, there will be page splits. So cleaning them up once in a while is just good manners. Fragmentation information of an index can be found in sys.dm_db_index_physical_stats.