Heaps 101

In my post about last page contention I proposed heaps as a possible solution for that problem. But before you start using heaps, there are a few consequences that must be considered.

It is my observation that heaps have a bad reputation in the community when it comes to OLTP. Reasons given vary from page deallocation issues, the assumption that forwarding records are a problem and maintenance issues.

Lets start with forwarding records. A forwarding record is created when a row is updated in a manner that causes it to no longer fit on the page where it resides. For clustered indexes, this is when a page split occurs. For heaps, there are no page splits. Instead, the one affected row will be moved to a new page leaving behind a pointer to the new location. This pointer is the forwarding record. The performance hit occurs when SQL server reads the page since it will follow the pointer as soon as it encounters it (and it must read the entire new page into memory to read that one row) and then jump back to continue reading the rest of the page it was on. If there are a lot of these forwarding records, that can lead to a lot of jumping around and extra page reads.

Regarding deallocation of empty pages, heaps behave very differently than rowstore indexes. In rowstore indexes pages are deallocated immediately, while a heap does not behave this way. As long as you are not using snapshot isolation level you can get deallocation by applying a table lock (or get a lock escalation to table lock). However, the adoption of RCSI is progressing rapidly and on this isolation level the only ways (as far as I know) to deallocate empty space in a heap is to rebuild it or to place a clustered index on it (in which case it is no longer a heap unless you drop the clustered index again). Paul Randal has a couple of short and easily digested posts on this with links for furter reading that will guide you into an interesting rabbit hole, one from 2018 and one from 2010. Granted, the latter is now fairly old, but these fundamentals have not changed (as of time of writing).

Moving on to heap maintenance issues, lets start picking up the trail from the previous paragraph; empty pages are not deallocated automatically when using RCSI. This means that you have to manage this through active maintenance. Your three options are: 1) leave the empty pages in the heap (they will be reused for future inserts), 2) rebuild your heap (which will deallocate the empty pages) or 3) apply a clustered index on the heap (the pages will be deallocated but the clustered index must be dropped again to maintain a heap).

All of these choices have drawbacks to them. What you need to know is that option 2) will rebuild your now invalid non-clustered indexes and option 3) will force two rebuilds of all non-clustered indexes on the table, once for applying the clustered index and once for removing it. That is a very costly operation if you only want to deallocate some pages. But it gets worse, the only way to get rid of forwarding records are options 2) and 3) as well.

Another factor that needs addressing when deciding between a clustered index or a heap is how the data will be accesses/read and how well these queries can be supported by narrow non-clustered indexes on your heap. For single row queries with a suitable supporting narrow non-clustered index, you will probably not see issues on a heap. But if you wind up creating large non-clustered indexes to support sorting and grouping, you might be better off making the clustered index one of those supporting indexes. By introducing wide supporting non-clustered indexes on a high frequency insert heap you will see contention anyway when inserting into the wide non-clustered indexes, so the high frequency insert gain with using a heap is null and void.

So, my thoughts on the matter is that heaps are suited for tables with high insert activity, but very little update and delete activity causing forwarding records and empty pages respectively. However, as long as empty pages and forwarding records are not causing you issues, a heap should be considered a candidate for tables having high insert activity. If you need to do frequent rebuilds of your heap, maybe it is not the right choice for your use case. My default rule of thumb for OLTP is to make my tables clustered indexes and consider a heap if the table is suffering from last page contention.