This question came up in a webcast recently: if a filtered index is causing data modifications to fail, does disabling the filtered index fix the problem?
I wasn’t 100% sure — I couldn’t remember the last time I’d tried, if I ever had. So….
Let’s test it!
First, let’s reproduce the problem: we’ll create a filtered index on a table, then set up a session who can’t read from it due to an ANSI_SETTINGS conflict. (A list of required SET OPTIONS for filtered indexes is documented here.)
CREATE INDEX ix_SalesOrders_filtered_IsUndersupplyBackordered_OrderDate_INCLUDES ON
Sales.Orders ( OrderDate )
INCLUDE ( OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
WHERE (IsUndersupplyBackordered = 0);
Now, let’s make our session incompatible with the index
All I have to do to make an insert fail is violate one of those required ANSI settings in my session.
Here we go!
SET ANSI_NULLS OFF;
INSERT Sales.Orders ( [CustomerID], [SalespersonPersonID], [PickedByPersonID], [ContactPersonID], [BackorderOrderID], [OrderDate], [ExpectedDeliveryDate], [CustomerPurchaseOrderNumber], [IsUndersupplyBackordered], [Comments], [DeliveryInstructions], [InternalComments], [PickingCompletedWhen], [LastEditedBy], [LastEditedWhen])
SELECT TOP (1) [CustomerID], [SalespersonPersonID], [PickedByPersonID], [ContactPersonID], [BackorderOrderID], [OrderDate], [ExpectedDeliveryDate], [CustomerPurchaseOrderNumber], [IsUndersupplyBackordered], [Comments], [DeliveryInstructions], [InternalComments], [PickingCompletedWhen], [LastEditedBy], [LastEditedWhen]
This results in the Error 1934
Msg 1934, Level 16, State 1, Line 25
INSERT failed because the following SET options have incorrect settings: 'ANSI_NULLS'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.
OK! Now to disable the filtered index
I suck all the pages out of the filtered index, leaving only the metadata behind:
ALTER INDEX ix_SalesOrders_filtered_IsUndersupplyBackordered_OrderDate_INCLUDES ON
And now, when I retry my insert….
(1 row affected)
It worked! As long as this filtered index is disabled, the insert works again for the session with the incompatible ANSI_NULLS setting.
Personally, I’d rather drop the filtered index
Rebuilding the filtered index will cause this problem to happen again — until all sessions modifying the table have the required se ssion settings in place.
So to prevent actual rebuilds, I’d rather drop the index until everything is in order for the index to be recreated.
The ORDER BY clause is used only to determine the rows that are returned by the TOP or OFFSET clause in the view definition. The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself.
The question wasn’t about views, though, it was about queries.
Let’s prove it! Bring on the demo
The question came with some demo code. For fun, I adapted the code to use 1 million results in the demo table, which introduces the extra factor of parallelism on my test instance. (I have the Cost Threshold for Parallelism setting at the default of 5, which is super low– I have it that way just for demo purposes.)
If you’d like to play along, the sample code for this is in a Gist, which I’ve also included at the bottom of this post.
Our demo table has a simple schema: there are three columns. The column named Id is a clustered primary key.
If we do a simple unordered SELECT statement from the table, the results look like SQL Server will default to returning the rows to us in an order defined by the clustered primary key.
However, that just happened to be what we got this time. We can’t rely on that in other queries, as we’re going to see.
Query 1 – Order by Extra, DESC
Notice in our sample data that the “Extra” column is an integer which always has the value 555. That is true for every row in the table.
What happens if we order by “Extra”? All the rows have the same value. Will SQL Server “fall back” on ordering by the Clustered Index– which is the only index it has?
Nope! Id = 1 is nowhere in sight.
And to make this really fun, run the exact same query a second time. Here’s what I got on my second run:
The ordering is different. None of the data has changed in the table, I just happened to get the rows back in a different order.
Let’s look into the execution plan
Here’s the plan I got for Query 1:
This query got parallelism on my test instance. It did:
An unordered scan on the clustered index, separating the work cross multiple threads
A sort on the Extra column (no other columns to sort by)
A re-gathering of the threads
SQL Server is trying to get the results back to me as quickly as possible. It is only sorting things by the Extra column, and since that value is the same for all rows in this case, I can get rows back in whatever order happens first on that run.
Query 2 – What if we add in TOP 2?
For query 2, you’ll get back two rows. But because the only column specified in the ORDER BY is Extra (and all rows for Extra have the same value), if you run the query repeatedly you will likely see that you don’t always get the same two rows back (and they aren’t the rows for Id = 1 and Id = 2).
The TOP does change the execution plan, in this case.
Why do I have both a Top N Sort and then another Top?
Well, I have 4 cores on this instance, and the query is going parallel. What it’s doing is scanning the clustered index with 4 cores. For each of those cores, it gets the top 2 rows (based on Extra), then it pulls those 8 rows together and does a final TOP on them.
Since I’m only ordering by the column Extra, SQL Server doesn’t consider anything else again.
Query 3 – OFFSET and FETCH
Query gets rid of TOP, and instead tries out OFFSET and FETCH.
Again, we get two rows back (because of the FETCH limitation), but they can be any two rows in the table since our ORDER BY column is essentially meaningless (all rows have the same value).
I get the exact same plan as I got for the TOP query in this case– same costs, same estimates, everything.
Query 4 – OFFSET and FETCH with local variables
Query 4 gets a little fancier — instead of using literal values for offset and fetch, it introduces local variables. Again, we get two rows back, but they can be any two rows in the table.
But the execution plan is different:
We don’t have an estimate of 8 rows anymore– we have an estimate of 100.
That’s because we used a local variable, @PageSize. Local variables anonymize their contents, unless you use a RECOMPILE hint. When SQL Server optimized this query, it wasn’t sure how many rows it needed to fetch, so it just guessed. If you set @PageSize = 200000, you’ll get the exact same estimates and plan for this query.
Is it only parallelism that can cause this?
Nope, there are other things, such as allocation order scans, which might change up the order, too.
tldr: When it comes to ordering of results, SQL Server does what you ask in an ORDER BY– and no more!
Except in the case of ORDER BY inside a view, when it actually does less. (Details back at the top of the post.)
Around 26 minutes in, I’m talking about hash match operators and I mention tempdb. Hugo Korneilis pointed out on Twitter that hash matches don’t always spill to tempdb. That’s totally correct– I should have said that this happens on a memory scratch pad that may spill to a place called tempdb.
What do you consider a good reference and/or description of indexes for SQL Server? For example where would you have documentation that explains how the answer to #1 is right?
Instead of finding links, I thought: what a great subject for a blog series! I’m going to step through the questions in the quiz and use some undocumented commands to “prove” each answer, and we’ll learn about the physical structure of disk-based rowstore indexes as we go.
Where are keys physically stored in an index?
Our quiz is about a simple table named dbo.UserDatabaseTable, which has a clustered index on a column named RowID. You can recreate the table using the script here to play along.
The table has a nonclustered index:
CREATE NONCLUSTERED INDEX ix_UserDatabaseTable_FirstName_RowID_INCLUDES on
dbo.UserDatabaseTable (FirstName, RowID) INCLUDE (CharCol)
WITH (FILLFACTOR = 90);
The structure for the nonclustered index is summarized as the output from querying the index from sys.dm_db_index_physical stats:
The question is: which levels of the index contain the RowID column? (Remember: RowID is a key column in the index).
What do key columns do?
In a disk-based, rowstore index, key columns define how the index is structured.
This is useful, because you can seek on those columns.
Our index has two key columns, in this order: FirstName, RowID.
So our index is sorted primarily by FirstName, then after that, it is sorted by RowID.
Imagine that our table contains only two first names, Mister and Stormy
Let’s say that our table has 3 ‘Mister’s and 5 ‘Stormies’, with unique RowIDs. We’d expect the key columns to be ordered like this:
This data would all fit easily on a single 8KB page in SQL Server. But for the sake of illustration, imagine that it didn’t. What if SQL Server had to put this data on three pages?
In that case, the key columns of the index would be distributed across multiple pages like this, still sorted first by FirstName, and then by RowID:
A few notes about the diagram:
This index has two levels. Those levels are level 0 (the “leaf” of the index), and level 1 (the “root page”). If a bunch of data were added that added enough pages to the leaf such that there wasn’t room on the root page to track them all, another level would be added to the index, and the root page would become level 2.
The FirstName and RowID columns exist at Level 1 (the root page), but not all rows are present there. Only enough information is stored so that we can seek to lower pages to find the more detailed information we need.
I’ve completely ignored the included column, CharCol from the index definition in this diagram, because I’m just focusing on the keys. It will show up in the more detailed examination below. (Spoiler: it’s only at the leaf level of the index.)
I’ve not included some metadata like the FileId where the page is stored, just for simplicity.
Lets prove it with the real table and DBCC PAGE
We’re going to use two undocumented tools in SQL Server to look at sample pages at each level of dbo.UserDatabaseTable.
First, I’m going to get a list of one page per level from the undocumented dynamic management view sys.dm_db_database_page_allocations, with this query:
min(allocated_page_page_id) as min_page_id
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('UserDatabaseTable'), 2 ,NULL, 'detailed')
GROUP BY page_level, allocated_page_file_id
ORDER BY page_level DESC;
This returns my list of sample pages, all of which are in file #1:
OK! Now we are going to dig in and look at which columns are really on those pages using the undocumented (but well known) command DBCC PAGE. My first step is to turn on trace flag 3604 for my session to divert the output of DBCC page to my session window (instead of the SQL Server error log):
Level 5: the root page
When you run DBCC PAGE against an index page with the “dump style” set to 3, you get a nice list of which columns are on the page.
Here’s what the root page of our nonclustered index looks like:
At this level of the index, we have both key columns of the index. The included column, CharCol, is not present.
If you’re wondering why a first name is ‘LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL…’, that is a real value in the table. Much of the data is auto-generated junk across the different letters of the alphabet, so that it’s easy for you to create the data with the script above without having to download and restore anything.
And for my expert friends out there, my screenshots today don’t include the KeyHashValue and Row Size columns returned by DBCC page, just to get a narrow view of the keys and save a little space.
Level 4 to level 1: intermediate levels
Now we’re going to peek at pages at the intermediate levels of the index. Remember, what we’re looking at here is the names of the columns in the table, which show whether or not that column is present at that level.
Sure enough, both FirstName and RowID are present at every intermediate level in the index.
Level 0: the leaf
Now, let’s peek at a page in the leaf of the index:
Just like in the root page and the intermediate pages, the FirstName and RowID columns are present.
Also in the leaf: CharCol, our included column appears! It was not in any of the other levels we inspected, because included columns only exist in the leaf of a nonclustered index.
You might look at this and wonder: our query against sys.dm_db_database_page_allocations selected the minimum page number for each level of the index. We know from checking previous pages at other levels that there is data for ‘Arnold’ in the index, and ‘Arnold’ is alphabetically before ‘Ingred’… so why are we seeing data for ‘Ingred’ on the lowest page number in the leaf?
The answer is that the script that generates this table fragments the index. The lowest page number in a level is not necessarily the first page “logically” at that level! Our pages are all out of order.
Grab the script above and play around with defragmenting the index to see how the page numbers change at each level.
Conclusions: revisiting the question
The quiz question that we set out to prove is this: which levels of the index contain the RowID column (which is a key column in the index)?
We proved that all key columns in the index are at all levels of the index.
There’s even more questions on the quiz. If you haven’t taken it yet, try your hand at it now! I’ll diagram out the answers to more questions in future blog posts right here.
SQL Server knows this is going to be a lot of rows. It decides to go parallel.
Having checked the data, I know that this seek operation is going to read the entire index on BirthYear. But, technically, this is still a seek operation.
What types of read operations does this do?
Looking at an actual execution plan, I dig into the index seek operator and it shows me information about the physical IO. Almost all of the requests were read-ahead reads.
Read-ahead is a mechanism that SQL Server can use when it’s pulling a lot of information from disk. Instead of pulling 8K pages onesy-twosy-threesy, SQL Server can suck up big chunks of pages from disk with a vacuum cleaner.
If you’re running developer or enterprise edition, you may get a larger vacuum cleaner.
What does that look like to Windows?
There’s a couple of ways to look at the read size. One method is to fire up the SysInternals tool Process Monitor and watch ReadFile operations from sqlservr.exe on a test system.
Here’s a sample of the reads it saw when I was doing this seek (and not running anything else on the instance). The highlighted row has a length of 524,288 – that’s a 512KB read! I’m running Developer Edition, so I’ve got the big read-ahead vacuum cleaner.
Most of the reads here are 64K chunks, but there’s quite a few cases when it can grab more…
What does this mean?
This means that there’s no simple shortcut to establishing and testing IO patterns for an application.
You shouldn’t, for instance, look at a performance counter that says there’s a lot of index seeks and interpret that to mean that there’s a lot of tiny 8K random read operations going to disk. Index seeks just aren’t that simple, and that’s a good thing for performance! Even if seeks do need to go to disk, they are able to leverage read-ahead reads — and they may be ‘seeking’ a lot of data.
When you modify the indexes on a table, SQL Server needs to reconsider how it executes queries that reference that table. But the way this appears when you’re looking at your execution plan cache is far from obvious:
Impacted query plans are NOT removed from cache at the time you change an index. The next time the query is executed, it will be recompiled, and the new plan and related execution information will be placed in cache.
This is true whether or not the index is used by the query.
Even if the index is on columns in the table which are not referenced by the query, the query will recompile on the next run.
Creating an index doesn’t remove a plan from cache
We run our Demo Query five times. Querying sys.dm_exec_query_stats and related DMVs, we see five executions, plan_generation_num = one.
Then we create an index on a table referenced by the query.
Querying the DMVs we still see five executions, plan_generation_num = one, and the query plan in cache. It hasn’t been removed!
Running the query again causes a recompile
We run our Demo Query again, this time four times.
Note: It’s important to highlight the query exactly, and not highlight spaces before the query on one run, and highlight differently the next run. Leading spaces are considered part of the query and inconsistent use of spaces impacts plan reuse.
Querying the DMVs, we now see four executions, and a new time for the plan creation_time. Plan_generation_num = two.
If the index is relevant to the query and SQL Server decides to use it, we’ll see a new execution plan.
An index rebuild doesn’t remove the plan from cache, either
After running ALTER INDEX REBUILD, we still see four executions in the cache, and plan_generation_num=two.
But if we rerun Demo Query three times, the first run triggers a recompile. We get a fresh creation_time for the plan, and now plan_generation_num has incremented to three.
The same thing holds for dropping an index
This plays by the exact same rules. When you drop an index, it doesn’t remove the plan from the cache immediately.
Put another way, you can have query plans in cache which refer to indexes that have been dropped, and do not currently exist. The query plan will be recompiled the next time the query is executed — if it’s still in cache. (Maybe it could be removed in the meanwhile due to memory pressure, a restart, an administrator’s command, etc.)
This may seem weird, but I think it’s a good thing
I like this behavior in SQL Server. This means that if you run nightly index maintenance jobs which hit a lot of indexes with rebuilds, your execution plan cache isn’t largely cleared out. If you’re looking at sys.dm_exec_query_stats the next morning, you’re more likely to see relevant information.
I host live technical webcasts and SQLChallenges for subscribers. Register here!
✅ Thu, Jun 21, 9AM Pacific – SQLChallenge: Create an Extended Events Trace