Posted on

Code: Fixing a Filtered Index Which is Causing Data Modifications to Fail

This question came up in a webcast recently: if a filtered index is causing data modifications to fail, does disabling the filtered index fix the problem?

I wasn’t 100% sure — I couldn’t remember the last time I’d tried, if I ever had. So….

Let’s test it!

First, let’s reproduce the problem: we’ll create a filtered index on a table, then set up a session who can’t read from it due to an ANSI_SETTINGS conflict. (A list of required SET OPTIONS for filtered indexes is documented here.)

Here’s my filtered index

I created this in the WideWorldImporters sample database.

CREATE INDEX ix_SalesOrders_filtered_IsUndersupplyBackordered_OrderDate_INCLUDES ON
  Sales.Orders ( OrderDate )
  INCLUDE ( OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
  WHERE (IsUndersupplyBackordered = 0);
GO

Now, let’s make our session incompatible with the index

All I have to do to make an insert fail is violate one of those required ANSI settings in my session.

Here we go!

SET ANSI_NULLS OFF;
GO

BEGIN TRAN
  INSERT Sales.Orders ( [CustomerID], [SalespersonPersonID], [PickedByPersonID], [ContactPersonID], [BackorderOrderID], [OrderDate], [ExpectedDeliveryDate], [CustomerPurchaseOrderNumber], [IsUndersupplyBackordered], [Comments], [DeliveryInstructions], [InternalComments], [PickingCompletedWhen], [LastEditedBy], [LastEditedWhen])
  SELECT TOP (1) [CustomerID], [SalespersonPersonID], [PickedByPersonID], [ContactPersonID], [BackorderOrderID], [OrderDate], [ExpectedDeliveryDate], [CustomerPurchaseOrderNumber], [IsUndersupplyBackordered], [Comments], [DeliveryInstructions], [InternalComments], [PickingCompletedWhen], [LastEditedBy], [LastEditedWhen]
  FROM Sales.Orders;

This results in the Error 1934

Msg 1934, Level 16, State 1, Line 25
INSERT failed because the following SET options have incorrect settings: 'ANSI_NULLS'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.

OK! Now to disable the filtered index

I suck all the pages out of the filtered index, leaving only the metadata behind:

ALTER INDEX ix_SalesOrders_filtered_IsUndersupplyBackordered_OrderDate_INCLUDES ON
  Sales.Orders DISABLE;
GO

And now, when I retry my insert….

(1 row affected)

It worked! As long as this filtered index is disabled, the insert works again for the session with the incompatible ANSI_NULLS setting.

Personally, I’d rather drop the filtered index

Rebuilding the filtered index will cause this problem to happen again — until all sessions modifying the table have the required se ssion settings in place.

So to prevent actual rebuilds, I’d rather drop the index until everything is in order for the index to be recreated.

 

Posted on

Does a Clustered Index Give a Default Ordering?

Index-Librarian

Index-LibrarianI recently got a great question by email. It asks something that many of us assume, but never think to question.

The question is: if I order by a column where all rows in that column have the same value, will SQL Server then order the results by the clustered index key?

The short answer: SQL Server only guarantees that results are ordered per the columns you specify in an ORDER BY clause

There is no “default” ordering that a query will fall back on outside of an ORDER BY clause.

  • Results may come back in the order of the clustered index, or they may not
  • Even if results come back in the order of the clustered index on one run of the query, they may not come back in the same order if you run it again

If you need results to come back in a specific order, you must be explicit about it in the ORDER BY clause of the query.

Bonus details: ORDER BY is special inside view definitions. As Books Online explains:

The ORDER BY clause is used only to determine the rows that are returned by the TOP or OFFSET clause in the view definition. The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself.

The question wasn’t about views, though, it was about queries.

Let’s prove it! Bring on the demo

The question came with some demo code. For fun, I adapted the code to use 1 million results in the demo table, which introduces the extra factor of parallelism on my test instance. (I have the Cost Threshold for Parallelism setting at the default of 5, which is super low– I have it that way just for demo purposes.)

If you’d like to play along, the sample code for this is in a Gist, which I’ve also included at the bottom of this post.

Our table

Our demo table has a simple schema: there are three columns. The column named Id is a clustered primary key.

If we do a simple unordered SELECT statement from the table, the results look like SQL Server will default to returning the rows to us in an order defined by the clustered primary key.

However, that just happened to be what we got this time. We can’t rely on that in other queries, as we’re going to see.

Example data from dbo.Test
Example data from dbo.Test

Query 1 – Order by Extra, DESC

Notice in our sample data that the “Extra” column is an integer which always has the value 555. That is true for every row in the table.

What happens if we order by “Extra”? All the rows have the same value. Will SQL Server “fall back” on ordering by the Clustered Index– which is the only index it has?

Query 1 results- sample 1

Nope! Id = 1 is nowhere in sight.

And to make this really fun, run the exact same query a second time. Here’s what I got on my second run:

Query 1 results – sample 2

The ordering is different. None of the data has changed in the table, I just happened to get the rows back in a different order.

Let’s look into the execution plan

Here’s the plan I got for Query 1:

This query got parallelism on my test instance. It did:

  • An unordered scan on the clustered index, separating the work cross multiple threads
  • A sort on the Extra column (no other columns to sort by)
  • A re-gathering of the threads

SQL Server is trying to get the results back to me as quickly as possible. It is only sorting things by the Extra column, and since that value is the same for all rows in this case, I can get rows back in whatever order happens first on that run.

Query 2 – What if we add in TOP 2?

For query 2, you’ll get back two rows. But because the only column specified in the ORDER BY is Extra (and all rows for Extra have the same value), if you run the query repeatedly you will likely see that you don’t always get the same two rows back (and they aren’t the rows for Id = 1 and Id = 2).

The TOP does change the execution plan, in this case.

Why do I have both a Top N Sort and then another Top?

Well, I have 4 cores on this instance, and the query is going parallel. What it’s doing is scanning the clustered index with 4 cores. For each of those cores, it gets the top 2 rows (based on Extra), then it pulls those 8 rows together and does a final TOP on them.

Since I’m only ordering by the column Extra, SQL Server doesn’t consider anything else again.

Query 3 – OFFSET and FETCH

Query gets rid of TOP, and instead tries out OFFSET and FETCH.

Again, we get two rows back (because of the FETCH limitation), but they can be any two rows in the table since our ORDER BY column is essentially meaningless (all rows have the same value).

I get the exact same plan as I got for the TOP query in this case– same costs, same estimates, everything.

Query 4 – OFFSET and FETCH with local variables

Query 4 gets a little fancier — instead of using literal values for offset and fetch, it introduces local variables. Again, we get two rows back, but they can be any two rows in the table.

But the execution plan is different:

We don’t have an estimate of 8 rows anymore– we have an estimate of 100.

That’s because we used a local variable, @PageSize. Local variables anonymize their contents, unless you use a RECOMPILE hint. When SQL Server optimized this query, it wasn’t sure how many rows it needed to fetch, so it just guessed. If you set @PageSize = 200000, you’ll get the exact same estimates and plan for this query.

Is it only parallelism that can cause this?

Nope, there are other things, such as allocation order scans, which might change up the order, too.

tldr: When it comes to ordering of results, SQL Server does what you ask in an ORDER BY– and no more!

Except in the case of ORDER BY inside a view, when it actually does less. (Details back at the top of the post.)

If you’re not convinced after all this, or just need a laugh, read Michael Swart’s impressions of bloggers answering a similar question.

Do  you have a question?

I’d love to hear it. Ask here.

And here’s that demo code…

Posted on

Watch my live session: When Partitioning Indexes Hurts Performance (and How to Fix It)

I’m very honored that my session on table partitioning from the 2017 SQL PASS Summit is being featured as one of the “Best of Summit” videos.

I had a terrific time presenting this session, thanks to everyone in the audience: you were awesome!

You can watch the video here, and follow the link above to see even more great videos from the conference for free.

Want the scripts?

Grab ’em here.

Technical notes

Around 26 minutes in, I’m talking about hash match operators and I mention tempdb. Hugo Korneilis pointed out on Twitter that hash matches don’t always spill to tempdb. That’s totally correct– I should have said that this happens on a memory scratch pad that may spill to a place called tempdb.

If you’re curious about spills, here is a good article by Remus Rusanu on the topic.

Posted on

Where are key columns stored in a nonclustered index in SQL Server?

Last week’s Quizletter featured a quiz on keys vs included columns in SQL Server.

I got a great question from a reader:

What do you consider a good reference and/or description of indexes for SQL Server? For example where would you have documentation that explains how the answer to #1 is right?

Instead of finding links, I thought: what a great subject for a blog series! I’m going to step through the questions in the quiz and use some undocumented commands to “prove” each answer, and we’ll learn about the physical structure of disk-based rowstore indexes as we go.

Where are keys physically stored in an index?

Our quiz is about a simple table named dbo.UserDatabaseTable, which has a clustered index on a column named RowID. You can recreate the table using the script here to play along.

The table has a nonclustered index:

CREATE NONCLUSTERED INDEX ix_UserDatabaseTable_FirstName_RowID_INCLUDES on 
  dbo.UserDatabaseTable (FirstName, RowID) INCLUDE (CharCol)
  WITH (FILLFACTOR = 90);
GO

The structure for the nonclustered index is summarized as the output from querying the index from sys.dm_db_index_physical stats:

The page_count here isn’t completely identical as in the quiz. Such is life.

The question is: which levels of the index contain the RowID column? (Remember: RowID is a key column in the index).

What do key columns do?

In a disk-based, rowstore index, key columns define how the index is structured.

This is useful, because you can seek on those columns.

Our index has two key columns, in this order: FirstName, RowID.

So our index is sorted primarily by FirstName, then after that, it is sorted by RowID.

Imagine that our table contains only two first names, Mister and Stormy

Let’s say that our table has 3 ‘Mister’s and 5 ‘Stormies’, with unique RowIDs. We’d expect the key columns to be ordered like this:

This data would all fit easily on a single 8KB page in SQL Server. But for the sake of illustration, imagine that it didn’t. What if SQL Server had to put this data on three pages?

In that case, the key columns of the index would be distributed across multiple pages like this, still sorted first by FirstName, and then by RowID:

A few notes about the diagram:

  1. This index has two levels. Those levels are level 0 (the “leaf” of the index), and level 1 (the “root page”). If a bunch of data were added that added enough pages to the leaf such that there wasn’t room on the root page to track them all, another level would be added to the index, and the root page would become level 2.
  2. The FirstName and RowID columns exist at Level 1 (the root page), but not all rows are present there. Only enough information is stored so that we can seek to lower pages to find the more detailed information we need.
  3. I’ve completely ignored the included column, CharCol from the index definition in this diagram, because I’m just focusing on the keys. It will show up in the more detailed examination below. (Spoiler: it’s only at the leaf level of the index.)
  4. I’ve not included some metadata like the FileId where the page is stored, just for simplicity.

Lets prove it with the real table and DBCC PAGE

We’re going to use two undocumented tools in SQL Server to look at sample pages at each level of dbo.UserDatabaseTable.

First, I’m going to get a list of one page per level from the undocumented dynamic management view sys.dm_db_database_page_allocations, with this query:

SELECT 
    page_level,
    allocated_page_file_id,
    min(allocated_page_page_id) as min_page_id
FROM sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('UserDatabaseTable'), 2 ,NULL, 'detailed')
WHERE 
    page_type=2 
    and is_allocated=1
GROUP BY page_level, allocated_page_file_id
ORDER BY page_level DESC;
GO

This returns my list of sample pages, all of which are in file #1:

OK! Now we are going to dig in and look at which columns are really on those pages using the undocumented (but well known) command DBCC PAGE. My first step is to turn on trace flag 3604 for my session to divert the output of DBCC page to my session window (instead of the SQL Server error log):

DBCC TRACEON(3604);
GO

Level 5: the root page

When you run DBCC PAGE against an index page with the “dump style” set to 3, you get a nice list of which columns are on the page.

Here’s what the root page of our nonclustered index looks like:

At this level of the index, we have both key columns of the index. The included column, CharCol, is not present.

If you’re wondering why a first name is ‘LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL…’, that is a real value in the table. Much of the data is auto-generated junk across the different letters of the alphabet, so that it’s easy for you to create the data with the script above without having to download and restore anything.

And for my expert friends out there, my screenshots today don’t include the KeyHashValue and Row Size columns returned by DBCC page, just to get a narrow view of the keys and save a little space.

Level 4 to level 1: intermediate levels

Now we’re going to peek at pages at the intermediate levels of the index. Remember, what we’re looking at here is the names of the columns in the table, which show whether or not that column is present at that level.

Sure enough, both FirstName and RowID are present at every intermediate level in the index.

Index Level 4 – sample page

Index Level 3 – sample page
Index Level 2 – sample page
Index Level 1 – sample page

Level 0: the leaf

Now, let’s peek at a page in the leaf of the index:

Just like in the root page and the intermediate pages, the FirstName and RowID columns are present.

Also in the leaf: CharCol, our included column appears! It was not in any of the other levels we inspected, because included columns only exist in the leaf of a nonclustered index.

Extra credit

You might look at this and wonder: our query against sys.dm_db_database_page_allocations selected the minimum page number for each level of the index. We know from checking previous pages at other levels that there is data for ‘Arnold’ in the index, and ‘Arnold’ is alphabetically before ‘Ingred’… so why are we seeing data for ‘Ingred’ on the lowest page number in the leaf?

The answer is that the script that generates this table fragments the index. The lowest page number in a level is not necessarily the first page “logically” at that level! Our pages are all out of order.

Grab the script above and play around with defragmenting the index to see how the page numbers change at each level.

Conclusions: revisiting the question

The quiz question that we set out to prove is this: which levels of the index contain the RowID column (which is a key column in the index)?

We proved that all key columns in the index are at all levels of the index.

There’s even more questions on the quiz. If you haven’t taken it yet, try your hand at it now! I’ll diagram out the answers to more questions in future blog posts right here.

Posted on

IO Patterns for Index Seeks: There May be a Lot of Read Ahead Reads

An index seek at work

I received a question recently asking about disk access patterns for index seeks in SQL Server. The question suggested that index seeks would have a random read pattern.

Is this necessarily the case?

Index seeks aren’t necessarily a small read

We tend to think of an index seek as a small, efficient retrieval of a few pages. It “seeks” straight there.

But this isn’t necessarily the case. Seeks can read quite a large amount of data — even all the data in a table if it meets the criteria for the key that I’m seeking on.

Seeks may also be more complex than they sound: they may seek on one key of the index, and then use non-seekable predicates to check every row that comes back to apply more filters.

Let’s look at an example

I have a table named dbo.FirstNameByBirthDate, which has a row for each baby name reported in the United States from 1880 to 2015.

There is a nonclustered index on a column named BirthYear.

I run the following query:

SELECT COUNT(*)
FROM dbo.FirstNameByBirthDate
WHERE BirthYear > cast(1700 as int);
GO

It gets a seek!

Here’s the plan for the query. This gets a seek operator…

Digging into the properties of that operator, the seek predicate here is:

Seek Keys[1]: Start: [BabbyNames201711].[dbo].[FirstNameByBirthDate].BirthYear > Scalar Operator([@1])

SQL Server knows this is going to be a lot of rows. It decides to go parallel.

Having checked the data, I know that this seek operation is going to read the entire index on BirthYear. But, technically, this is still a seek operation.

What types of read operations does this do?

Looking at an actual execution plan, I dig into the index seek operator and it shows me information about the physical IO. Almost all of the requests were read-ahead reads.

Read-ahead is a mechanism that SQL Server can use when it’s pulling a lot of information from disk. Instead of pulling 8K pages onesy-twosy-threesy, SQL Server can suck up big chunks of pages from disk with a vacuum cleaner.

If you’re running developer or enterprise edition, you may get a larger vacuum cleaner.

mmmm, read-aheads

What does that look like to Windows?

There’s a couple of ways to look at the read size. One method is to fire up the SysInternals tool Process Monitor and watch ReadFile operations from sqlservr.exe on a test system.

Here’s a sample of the reads it saw when I was doing this seek (and not running anything else on the instance). The highlighted row has a length of 524,288 – that’s a 512KB read! I’m running Developer Edition, so I’ve got the big read-ahead vacuum cleaner.

Most of the reads here are 64K chunks, but there’s quite a few cases when it can grab more…

A small excerpt of the read operations seen by Process Monitor

What does this mean?

This means that there’s no simple shortcut to establishing and testing IO patterns for an application.

You shouldn’t, for instance, look at a performance counter that says there’s a lot of index seeks and interpret that to mean that there’s a lot of tiny 8K random read operations going to disk. Index seeks just aren’t that simple, and that’s a good thing for performance! Even if seeks do need to go to disk, they are able to leverage read-ahead reads — and they may be ‘seeking’ a lot of data.

If you want to get into the nitty gritty and collect data to help you characterize your workload, check out the Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server whitepaper.

And I guess I should admit: read-ahead doesn’t really use a vacuum cleaner.

Posted on

Why Did My Clever Index Change Backfire? (free video)

I had a great time giving a session recently for the 24 Hours of PASS – Summit Preview Edition.

My session at the SQL PASS Summit this year is on great index ideas that accidentally backfire… and how you can avoid disaster! In this preview version, I talk about filtered indexes, indexed views, and indexed computed columns. You can watch the session here, and download the scripts from the PASS Site.

Sorry for the potato quality, this was recorded live

This was recorded from a live webcast, so the audio isn’t perfect. There’s also no presenter video. Just imagine me waving my hands and making faces, like in the post thumbnail 😀

Posted on

Do Index Changes Remove Execution Plans from Cache?

When you modify the indexes on a table, SQL Server needs to reconsider how it executes queries that reference that table. But the way this appears when you’re looking at your execution plan cache is far from obvious:

  • Impacted query plans are NOT removed from cache at the time you change an index. The next time the query is executed, it will be recompiled, and the new plan and related execution information will be placed in cache.
  • This is true whether or not the index is used by the query.
  • Even if the index is on columns in the table which are not referenced by the query, the query will recompile on the next run.

To see this in action, you can play along with this sample script.

A drama of a query plan and changing indexes

Here’s how our story goes…

Creating an index doesn’t remove a plan from cache

We run our Demo Query five times. Querying sys.dm_exec_query_stats and related DMVs, we see five executions, plan_generation_num = one.

Then we create an index on a table referenced by the query.

Querying the DMVs we still see five executions, plan_generation_num = one, and the query plan in cache. It hasn’t been removed!

Running the query again causes a recompile

We run our Demo Query again, this time four times.

Note: It’s important to highlight the query exactly, and not highlight spaces before the query on one run, and highlight differently the next run. Leading spaces are considered part of the query and inconsistent use of spaces impacts plan reuse.

Querying the DMVs, we now see four executions, and a new time for the plan creation_time. Plan_generation_num = two.

If the index is relevant to the query and SQL Server decides to use it, we’ll see a new execution plan.

An index rebuild doesn’t remove the plan from cache, either

After running ALTER INDEX REBUILD, we still see four executions in the cache, and plan_generation_num=two.

But if we rerun Demo Query three times, the first run triggers a recompile. We get a fresh creation_time for the plan, and now plan_generation_num has incremented to three.

The same thing holds for dropping an index

This plays by the exact same rules. When you drop an index, it doesn’t remove the plan from the cache immediately.

Put another way, you can have query plans in cache which refer to indexes that have been dropped, and do not currently exist. The query plan will be recompiled the next time the query is executed — if it’s still in cache. (Maybe it could be removed in the meanwhile due to memory pressure, a restart, an administrator’s command, etc.)

This may seem weird, but I think it’s a good thing

I like this behavior in SQL Server. This means that if you run nightly index maintenance jobs which hit a lot of indexes with rebuilds, your execution plan cache isn’t largely cleared out. If you’re looking at sys.dm_exec_query_stats the next morning, you’re more likely to see relevant information.