Last week’s Quizletter featured a quiz on keys vs included columns in SQL Server.
I got a great question from a reader:
What do you consider a good reference and/or description of indexes for SQL Server? For example where would you have documentation that explains how the answer to #1 is right?
Instead of finding links, I thought: what a great subject for a blog series! I’m going to step through the questions in the quiz and use some undocumented commands to “prove” each answer, and we’ll learn about the physical structure of disk-based rowstore indexes as we go.
Where are keys physically stored in an index?
Our quiz is about a simple table named dbo.UserDatabaseTable, which has a clustered index on a column named RowID. You can recreate the table using the script here to play along.
The table has a nonclustered index:
CREATE NONCLUSTERED INDEX ix_UserDatabaseTable_FirstName_RowID_INCLUDES on dbo.UserDatabaseTable (FirstName, RowID) INCLUDE (CharCol) WITH (FILLFACTOR = 90); GO
The structure for the nonclustered index is summarized as the output from querying the index from sys.dm_db_index_physical stats:
The question is: which levels of the index contain the RowID column? (Remember: RowID is a key column in the index).
What do key columns do?
In a disk-based, rowstore index, key columns define how the index is structured.
This is useful, because you can seek on those columns.
Our index has two key columns, in this order: FirstName, RowID.
So our index is sorted primarily by FirstName, then after that, it is sorted by RowID.
Imagine that our table contains only two first names, Mister and Stormy
Let’s say that our table has 3 ‘Mister’s and 5 ‘Stormies’, with unique RowIDs. We’d expect the key columns to be ordered like this:
This data would all fit easily on a single 8KB page in SQL Server. But for the sake of illustration, imagine that it didn’t. What if SQL Server had to put this data on three pages?
In that case, the key columns of the index would be distributed across multiple pages like this, still sorted first by FirstName, and then by RowID:
A few notes about the diagram:
- This index has two levels. Those levels are level 0 (the “leaf” of the index), and level 1 (the “root page”). If a bunch of data were added that added enough pages to the leaf such that there wasn’t room on the root page to track them all, another level would be added to the index, and the root page would become level 2.
- The FirstName and RowID columns exist at Level 1 (the root page), but not all rows are present there. Only enough information is stored so that we can seek to lower pages to find the more detailed information we need.
- I’ve completely ignored the included column, CharCol from the index definition in this diagram, because I’m just focusing on the keys. It will show up in the more detailed examination below. (Spoiler: it’s only at the leaf level of the index.)
- I’ve not included some metadata like the FileId where the page is stored, just for simplicity.
Lets prove it with the real table and DBCC PAGE
We’re going to use two undocumented tools in SQL Server to look at sample pages at each level of dbo.UserDatabaseTable.
First, I’m going to get a list of one page per level from the undocumented dynamic management view sys.dm_db_database_page_allocations, with this query:
SELECT page_level, allocated_page_file_id, min(allocated_page_page_id) as min_page_id FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('UserDatabaseTable'), 2 ,NULL, 'detailed') WHERE page_type=2 and is_allocated=1 GROUP BY page_level, allocated_page_file_id ORDER BY page_level DESC; GO
This returns my list of sample pages, all of which are in file #1:
OK! Now we are going to dig in and look at which columns are really on those pages using the undocumented (but well known) command DBCC PAGE. My first step is to turn on trace flag 3604 for my session to divert the output of DBCC page to my session window (instead of the SQL Server error log):
DBCC TRACEON(3604); GO
Level 5: the root page
When you run DBCC PAGE against an index page with the “dump style” set to 3, you get a nice list of which columns are on the page.
Here’s what the root page of our nonclustered index looks like:
At this level of the index, we have both key columns of the index. The included column, CharCol, is not present.
If you’re wondering why a first name is ‘LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL…’, that is a real value in the table. Much of the data is auto-generated junk across the different letters of the alphabet, so that it’s easy for you to create the data with the script above without having to download and restore anything.
And for my expert friends out there, my screenshots today don’t include the KeyHashValue and Row Size columns returned by DBCC page, just to get a narrow view of the keys and save a little space.
Level 4 to level 1: intermediate levels
Now we’re going to peek at pages at the intermediate levels of the index. Remember, what we’re looking at here is the names of the columns in the table, which show whether or not that column is present at that level.
Sure enough, both FirstName and RowID are present at every intermediate level in the index.
Level 0: the leaf
Now, let’s peek at a page in the leaf of the index:
Just like in the root page and the intermediate pages, the FirstName and RowID columns are present.
Also in the leaf: CharCol, our included column appears! It was not in any of the other levels we inspected, because included columns only exist in the leaf of a nonclustered index.
You might look at this and wonder: our query against sys.dm_db_database_page_allocations selected the minimum page number for each level of the index. We know from checking previous pages at other levels that there is data for ‘Arnold’ in the index, and ‘Arnold’ is alphabetically before ‘Ingred’… so why are we seeing data for ‘Ingred’ on the lowest page number in the leaf?
The answer is that the script that generates this table fragments the index. The lowest page number in a level is not necessarily the first page “logically” at that level! Our pages are all out of order.
Grab the script above and play around with defragmenting the index to see how the page numbers change at each level.
Conclusions: revisiting the question
The quiz question that we set out to prove is this: which levels of the index contain the RowID column (which is a key column in the index)?
We proved that all key columns in the index are at all levels of the index.
There’s even more questions on the quiz. If you haven’t taken it yet, try your hand at it now! I’ll diagram out the answers to more questions in future blog posts right here.