
How to cause a simple spill to tempdb

Sometimes it’s useful to know how to cause a problem.

Maybe you’ve never encountered the problem, and want to get hands-on experience. Maybe you’re testing a monitoring tool, and want to see if a condition flags an alert. Maybe you’re testing out a new client tool, and want to see how it displays it.

I recently was going through some demos in SQL Operations Studio, and I found that a spill on a sort operator wasn’t causing a warning to visibly show in the graphic execution plan.

I wanted to file an issue on this to let the Ops Studio team know that a warning would be helpful – but my demo code was somewhat complex and required restoring a rather large database. So I set up a quick code sample to cause a spill that can be run in any database.

What’s the easiest way to cause a spill on a sort operator?

Populate a non-indexed table variable with a bunch of rows, then query the table variable and order the output. Ordering the output will require a sort, and since table variables don’t support column statistics, SQL Server won’t allocate enough memory for the sort.

Voila, super quick sample code…

-- Table variable with no index and no column statistics
declare @foo table (i int);

-- Ten is a ten-row CTE; cross joining it six times generates 1,000,000 rows
WITH Ten(N) AS 
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
insert @foo
SELECT T1000000.N
FROM Ten T10
CROSS JOIN Ten T100
CROSS JOIN Ten T1000
CROSS JOIN Ten T10000
CROSS JOIN Ten T100000
CROSS JOIN Ten T1000000;

-- The ORDER BY requires a sort; the one-row estimate for @foo means a tiny
-- memory grant, so the sort spills to tempdb
select 0
FROM @foo
ORDER BY i DESC;

Credit: This code is a simple adaptation of Paul White’s answer on this StackExchange question.
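
If you want extra confirmation that the sort really is hitting tempdb (beyond the warning in the actual plan), one quick check that isn’t part of the original demo is to look at the session’s tempdb allocations right after the query finishes:

-- Run in the same session right after the ORDER BY query completes
SELECT session_id,
    user_objects_alloc_page_count,      -- the table variable itself
    internal_objects_alloc_page_count   -- work tables / work files, including sort spills
FROM sys.dm_db_session_space_usage
WHERE session_id = @@SPID;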

And here’s what the spill looks like in action

Here’s a quick view of what the spills look like in SQL Server Management Studio, and then in Operations Studio.

In Operations Studio, you can currently only see one actual plan at a time (I’ve got an issue filed for that here), and there’s no little warning on the sort operator, although you can see the details of the spill in the tooltip (I commented on this issue to potentially reactivate it).

A quick screencap of the issue turned into an animated gif

Fake News about Auto Parameterization/Simple Parameterization

I saw a question on Twitter today that took me down a little rabbit hole. And when I go down rabbit holes, I like to blog about it! There’s a TLDR at the bottom of this post if you’re short on time.

Here’s the question:

Simple parameterization leads to insanity

The thing about simple parameterization is that it’s not simple.

This is also known as auto-parameterization, and sometimes it’s not automatic — in other words, sometimes it looks like it happens, but it doesn’t actually happen.

In Klaus Aschenbrenner’s blog post, “The Pain of Simple Parameterization“, he wrote:

In general SQL Server only auto parameterizes your SQL statements if you deal with a so-called Safe Execution Plan: regardless of the provided input parameter values, the query must always lead to the same execution plan.

There are times when SQL Server just doesn’t think it’s safe to simply parameterize your query, and that can be SUPER confusing when you’re looking at queries.

Here’s what simple parameterization looks like when it works

I run these two statements against the BabbyNames database…

SELECT FirstName FROM ref.FirstName where FirstNameId=76682;
GO
SELECT FirstName FROM ref.FirstName where FirstNameId=86055;
GO

In the actual execution plan, I see that the literal FirstNameId values have been replaced with @1.

Further, in the properties of the leftmost operator on the plan, I see that StatementParameterizationType = 2.


In Query Store and the plan cache, the text for my query gets recorded as:

(@1 int)SELECT [FirstName] FROM [ref].[FirstName] WHERE [FirstNameId]=@1

That single plan is shown as having two executions. In other words, it’s re-used.
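
If you want to verify the reuse yourself, one way is to peek at the plan cache. Something along these lines works (my sketch, not the exact query from the screenshots):

SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%FirstNameId%'
    AND st.text NOT LIKE N'%dm_exec_cached_plans%';

The parameterized version shows up with objtype = ‘Prepared’, and its usecounts climbs with each execution (you’ll also see ‘Adhoc’ shell plans for the literal texts).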

Simple parameterization sometimes doesn’t happen… but looks kinda like it did in your execution plan

Let’s change our query a little:

SELECT FirstNameId FROM ref.FirstName where FirstName='Grant';
GO
SELECT FirstNameId FROM ref.FirstName where FirstName='Kendra';
GO

When I run these new queries, here’s what my actual execution plan looks like:

Hmmmm… this is different! I’ve got the @1 again in my query text, and it’s even in the parameter list on the left.

But notice that this time, StatementParameterizationType is 0.

Last time, that was set to 2!

This didn’t really get parameterized

If I look in Query Store and my query plan cache, I find two queries and two plans.

The queries are formatted differently; they look like this:

SELECT FirstNameId FROM ref.FirstName where FirstName='Grant'
SELECT FirstNameId FROM ref.FirstName where FirstName='Kendra'

The parameter didn’t make it in there at all.
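
Query Store even records whether parameterization happened. If you’d like to check it yourself, a sketch like this shows the stored text and the parameterization type side by side:

SELECT qt.query_sql_text,
    q.query_id,
    q.query_parameterization_type_desc
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON qt.query_text_id = q.query_text_id
WHERE qt.query_sql_text LIKE N'%FirstName%';

The ‘Grant’ and ‘Kendra’ queries should show up as separate rows with a parameterization type of None, while the FirstNameId query from earlier shows Simple.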

TLDR: Sometimes, it looks like something has been simple parameterized in an actual execution plan, but it hasn’t!

Just because I see an @1 being substituted in for a literal value in that text hint at the top of a plan doesn’t mean that simple parameterization has actually happened.

And for the record, I’m not sure that this is the exact scenario Grant was asking about, because my second example here isn’t technically a ‘trivial’ plan. But I do think that this is one of many reasons to use explicit parameterization practices, and not to rely on simple or auto-parameterization, because it just doesn’t work in many cases.

If you’d like to play around with these examples, the code is in this gist.

And thank you Grant for posting this question, I needed it to get me out of a blogging dry spell!


A Case of Inconsistent Wait Stats and Query Duration

You know how they say “don’t sweat the small stuff”?

That’s good advice for lots of things, but when it comes to query tuning – I WANT TO KNOW MORE when something is inconsistent. I just can’t let it go. I go total X-Files: The Truth Is Out There.

The bright side of this habit is that it makes finding blog topics fairly easy.

I was measuring query performance at different degrees of parallelism

I set up a script to measure duration and wait statistics for a few queries. It followed this flow:

  • Set a variable with the “starter” degree of parallelism (DOP)
  • Begin a loop
    • Clean up anything from the last round
    • Record start time and snapshot of waits for the session from sys.dm_exec_session_wait_stats
    • Run the test query at given DOP (using Dynamic SQL)
    • Record end time and snapshot of waits for the session from sys.dm_exec_session_wait_stats
    • Decrement the DOP

I was gathering the data to graph and show query durations along with the amount of CXPACKET and CXCONSUMER waits at different DOPs.
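
If you’d like to build something similar, here’s a rough sketch of that kind of loop. The test table and query are placeholders, and the wait-snapshot handling is trimmed down to comments:

DECLARE @dop int = 8;  -- "starter" degree of parallelism
DECLARE @sql nvarchar(max);
DECLARE @start datetime2;

CREATE TABLE #results (dop int, start_time datetime2, end_time datetime2);

WHILE @dop >= 1
BEGIN
    /* clean up anything left over from the previous round here */

    /* snapshot sys.dm_exec_session_wait_stats for @@SPID into a "before" table here */

    SET @start = SYSDATETIME();

    SET @sql = N'SELECT COUNT(*) FROM dbo.TestTable OPTION (MAXDOP '
        + CAST(@dop AS nvarchar(5)) + N');';
    EXEC sp_executesql @sql;

    /* snapshot the waits again, diff against the "before" snapshot, and log it */

    INSERT #results (dop, start_time, end_time)
    VALUES (@dop, @start, SYSDATETIME());

    SET @dop -= 1;
END;

SELECT * FROM #results ORDER BY dop;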

The first weird thing: inconsistent wait stats

Whenever I can, I run a test more than once. I was graphing my wait statistics, and I noticed that on different executions of the procedure, my parallelism waits varied.

Varied more than a small amount.

Luckily for me, I read the release notes for Cumulative Updates for SQL Server most of the time, and after noticing the variation I remembered “FIX: CXPACKET and CXCONSUMER wait types show inconsistent results for some parallel query plans in SQL Server 2017”.

I checked, and sure enough, my little cloud VM was on SQL Server 2017 CU3, and that fix is in CU4.

I patched to CU5 (latest and greatest for my little test instance, ya know), redid all my testing twice, and admired my now-more-consistent charts.

The second weird thing: inconsistent query duration

I was getting consistent results for query duration when I ran the query in the script above, which started at a high degree of parallelism, and then went to low.

But I found that for one of my test queries, when I took the query and ran it in my own session to look at Query Time Stats in the execution plan, it took much longer.

For example, at MAXDOP 13:

  • Duration was consistently 55-65 seconds in the looping script
  • Running it in a session by itself, I was seeing durations of 2 minutes to 2 minutes 30 seconds

More than twice as long.

Why was it so much slower when I ran it in another session?

It wasn’t about cache/buffer pool reuse

I had plenty of memory on this instance, and I’d made sure to set ‘max server memory (mb)’ plenty high, but could something be causing me to do physical reads when I ran the query in my own session?

This made me realize that my test script wasn’t quite fair to the DOP that ran first — it might have to do physical reads where following queries made use of data in memory. This is a private test instance, so I changed the script to run everything with cold cache, by dropping clean buffers between runs.
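
In case you haven’t used it before, dropping clean buffers between runs looks like this (the CHECKPOINT first writes out dirty pages so they can be dropped). Only do this on an instance nobody else cares about:

CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO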

I wasn’t getting a different query execution plan

I was getting the same plan with the same cost both in my “slow session” and in the script.

It wasn’t ASYNC_NETWORK_IO waits

I was running everything from an SSMS instance on my laptop against a SQL Server on a VM in the Azure Cloud. Could I be getting some weird kind of latency on one session?

Nope. And my query wasn’t even returning a result set to my SSMS window — it was putting the results into a table in the cloud.

Was it the Dynamic SQL?

Could something about the Dynamic SQL be making it faster? Or had I made some sort of typo and the Dynamic SQL query was a little bit different in a way that made it faster?

Nope, it really was the same query, and the same plan. When I copied the longer script and adjusted the variables to only run for DOP 13 in my “slow” session, it was also slow.

Was it using STATISTICS TIME or STATISTICS IO?

I sometimes use these to measure queries, but in this case I wasn’t using them in either session.

Was it having Actual Execution Plans on?

Yes, it was.

This query was a reminder of the observer effect: watching something can change its behavior. In this case, “watching” the query by enabling actual execution plans made it take twice the duration.

I had actual plans enabled in my “slow” session, and I didn’t have them on in the session where I was measuring duration and wait statistics in the loop. (If you’ve ever accidentally left actual plans enabled when you’re running TSQL in a loop, you know how much SSMS doesn’t love that.)

I tested this a few different ways, including in another instance of SSMS local to the VM itself, and sure enough – actual plan drags this puppy down, although it’s a relatively simple query.

Was it only graphical plans?

I disabled graphical execution plans in Management Studio, then tested a few varieties of outputting actual plan info:

  • SET STATISTICS XML ON – Actual plan information output as XML in a column named “Microsoft SQL Server 2005 XML Showplan”
  • SET STATISTICS PROFILE ON – Actual plan information output as text across a variety of columns (screenshot below)

I tested these independently. Both of these had the same impact on the query as using the graphic execution plan – the query took twice as long.

I did feel kind of fancy looking at actual plan info in text format, though. We all know that hackers ALWAYS use text interfaces.

Kinda cool how execution count pops out in this view

What about ‘Live Query Statistics’?

I expected this to take a full three minutes, but it only took a bit longer than the ‘Actual Plan’ run in this case: 2 minutes 5 seconds.

Does this happen against different instances? At different DOPs?

I ran the same query against the same data on a local test VM on my MacBook Pro, which has 4 cores.

The duration at MAXDOP 4 on the MacBook Pro:

  • No actual execution plans = 2:41
  • Actual execution plans enabled = 2:50

Hmm, the discrepancy is much smaller on the small instance.

So I retested at MAXDOP 4 on the cloud VM. Note that this gets a different plan – similar shapes, but different costs (much more memory on that instance).

  • No actual execution plans =  2:20
  • Actual execution plans enabled = 3:07

To recap what I mentioned above, MAXDOP 13 on the cloud VM:

  • No actual execution plans =  0:55
  • Actual execution plans enabled = 2:00

Just for fun, MAXDOP 16 on the cloud VM (just in case you thought the issue was 13 being unlucky)

  • No actual execution plans =  1:00
  • Actual execution plans enabled = 2:14

Did I mention I’m compulsive about investigating this? MAXDOP 20 on the cloud VM (all its cores):

  • No actual execution plans =  1:12
  • Actual execution plans enabled = 2:10

Innnnnteresting. The impact of having actual plans enabled for this query is more pronounced on the cloud VM. The skew is greater at a higher DOP, although it’s not linear.

Actual plans are awesome, but make sure they don’t confuse your tuning process by secretly skewing your execution times!

There’s always overhead to using an actual plan.

Frequently it’s minimal, but sometimes it’s really significant – and the amount it varies may differ on different hardware, and at different degrees of parallelism.


Auto-Tuning: Automatic Plan Correction is Cleared on Restart

The most delicious plans are mauve, but teal is a close second

The new Enterprise Automatic Tuning feature in SQL Server 2017 may sound intimidating at first — one question I get a lot lately is whether or not there’s a future for DBAs. Will Auto-Tune mean we don’t need any more human tuning?

Well, not anytime super soon.

I’m a big fan of the Automatic Plan Correction feature in SQL Server 2017, but it reminds me a bit of the missing index DMVs we got in SQL Server 2005: the suggestions are a fantastic indication of where you need to look and use your powerful human brain.

Automatic plan correction is temporary — and that’s a good thing

One of the things I love about this feature is that it wisely includes some caution.

If you’ve enabled automatic tuning and it finds what looks like a plan regression, it won’t just force a plan and forget about it.
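
In case you want to follow along at home, enabling automatic plan correction for a database is a one-liner. Here’s a minimal example:

ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
GO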

Let’s say we have a query where the average CPU time with Plan A is 2 seconds, and the average CPU time for Plan B is 10 seconds. If Automatic Plan Correction kicks in, we go into a state where Plan A is being verified.

Trust, but verify

Is Plan A really always faster? It may be that after Plan A has been ever so gently forced (ok, that’s not gentle), the query is run with different parameters that make Plan A skew horribly wrong, and our average CPU time goes to 12 seconds. SQL Server is looking for that.

Un-force when things change

Automatic Plan Correction is also pretty skittish when it comes to change.

Change indexes on the table? Whoops, we better un-force everything in case there’s something that could be faster! Update statistics on the table? Same thing!

And also…

Automatic Plan Corrections don’t persist over restarts

I did a little demo of this, just to prove it to myself.

I have an Automatic Plan Correction in verification state. Here’s a view of it in sys.dm_db_tuning_recommendations:
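
If you’d like to pull up the same information on your own instance, a query along these lines works; the JSON paths come from the documentation examples:

SELECT reason,
    score,
    JSON_VALUE(state, '$.currentValue') AS current_state,
    JSON_VALUE(details, '$.implementationDetails.script') AS force_plan_script
FROM sys.dm_db_tuning_recommendations;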

But then, I restart my SQL Server instance:

Don’t tease me about instance stacking on my test machine! I keep the Standard Edition instance off most of the time 😀

Once the instance comes back up, Automatic Plan Correction is no longer in place.

Here’s a different view, after restart. Looking at Query Id 25 in the Queries with High Variation report, there are NO check boxes in those bubbles to the right. Nothing is being forced.

Everything is also gone from sys.dm_db_tuning_recommendations. That data isn’t persisted across restarts.

The good news: we don’t completely start from scratch

My Query Store still has data about past performance of the plan.

After restart, if my query happens to compile with the “slow plan”, the fast plan can be identified from the history captured in Query Store before the restart.

In other words, Automatic Plan Correction doesn’t only consider information since the last restart when making its suggestions. I tested, and just by repeatedly running the “slow plan” for Query Id 25, a recommendation popped up. Since I have automatic tuning enabled for this database, it was automatically applied.

Check boxes indicate that plan forcing is currently active

Think of Automatic Tuning as an assistant DBA

And it’s better than just a random assistant! It’s an assistant that can help point you in the direction of problems like bad parameter sniffing, which are really hard to track down.

If your environment is so sensitive that you don’t trust your assistant to just go tuning things, that’s fine! You don’t have to enable Automatic Tuning, and you can still look at the suggestions manually (as long as you’ve got Enterprise Edition).

But at this point, and for the foreseeable future, your assistant still needs you to dig into the code, indexes, and architecture to find a better long term solution than just freezing the plan. Because if just freezing the plan was enough, this feature would be a whole lot simpler.


Why Make a SQLChallenge on Tuning Functions? Why Not Say, ‘Stop Using Functions’?

I’ve published a new SQLChallenge for subscribers: Fix My Function.

This 200-Level challenge features a terrible, awful, very-bad function, which you are dared to tune.

The rules of the challenge include that you can rewrite the function and rewrite the query, but you still need to keep the function logic in a function.

What’s wrong with user defined functions?

I’m going to quote a few times in this post from a recent Microsoft paper of note: Froid: Optimization of Imperative Programs in a Relational Database. The paper is by Karthik Ramachandra, Kwanghyun Park, K. Venkatesh Emani, Alan Halverson, Cesar Galindo-Legaria, and Conor Cunningham.

The introduction to the paper explains why user defined functions are both loved and hated.

UDFs are loved because they have loads of advantages

The paper lists these advantages, which are pretty persuasive. Functions can be very powerful because…

(a) They are an elegant way to achieve modularity and code reuse across SQL queries, (b) some computations (such as complex business rules and ML algorithms) are easier to express in imperative form, (c) they allow users to express intent using a mix of simple SQL and imperative code, as opposed to complex SQL queries, thereby improving readability and maintainability.

Expressing and coding business rules clearly, in a way that can be easily reused and maintained: that’s not a small benefit! That’s why functions are very common in database code, even though….

UDFs are hated because they often have big performance problems

I’m not going to be able to say this better than the paper. It explains:

It is a known fact amongst practitioners that UDFs are “evil” when it comes to performance considerations.

This is an academic paper, so there are citations regarding the evil of functions! That makes me laugh so hard.

If you’re wondering who was cited on the evil of functions, the articles are  by Simon Sabin and Arvind Shyamsundar.

Functions look like they’re going to shed some of their ‘evil’

The paper on Froid goes on to describe very cool new optimization techniques to help keep all those benefits of functions without killing performance. These optimizations solve a lot of the pains.

Pains like: user-defined scalar functions (of the non-CLR variety) prevent parallelism in the queries that call them.

The features described in the paper transform the logic in the function and bring it into the query plan itself, allowing parallelism and other optimizations. If you don’t want to read the whole paper, the abstract is pretty clear about this.
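
To make that concrete, here’s a made-up example of the pattern the paper is describing: a tiny scalar UDF and the kind of query that calls it. The table and columns are hypothetical.

CREATE FUNCTION dbo.FullName (@FirstName nvarchar(50), @LastName nvarchar(50))
RETURNS nvarchar(101)
AS
BEGIN
    -- imperative logic that the optimizer treats as a black box
    RETURN LTRIM(RTRIM(@FirstName)) + N' ' + LTRIM(RTRIM(@LastName));
END;
GO

-- Before Froid-style inlining, this query can't get a parallel plan,
-- and the function is invoked row by row
SELECT dbo.FullName(FirstName, LastName) AS FullName
FROM dbo.People;
GO

Froid rewrites the function body into relational expressions and folds them into the calling query’s plan, which is what removes the serial-plan and row-by-row penalties.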

It’s still well worth knowing how to tune functions

There are reasons to still learn your way around tuning a function in SQL Server, and not just wait for v-Next (the version after SQL Server 2017).

  1. We don’t know exactly which functions can be optimized by these new function Froid-ing features
  2. We don’t know what Froid will cost / which editions will get it
  3. When there’s a simpler way to write or tune something, it’s often better!

Those reasons are why I wrote Fix My Function (besides it being a lot of fun).

This SQLChallenge is a 200 level problem, but the videos documenting sample solutions get into the 300-level area by showing how to use estimated and actual plans to understand what functions are doing, and how to use SQL Server’s DMVs to measure function performance in different ways.

If you’ve already subscribed, you already have access to this course: dive right in.

If you’re ready to join, sign up here!


The Case of MAX() Requiring an Index Scan, While TOP(1)/ORDER BY DESC Does Not

This is told as a story about query tuning. If you just want the answer to why the MAX() query was slow, scroll to the end for a recap.

Most of the time in SQL Server, the MAX() function and a TOP(1) ORDER BY DESC will behave very similarly.

If you give them a rowstore index leading on the column in question, they’re generally smart enough to go to the correct end of the index, and — BOOP! — just pluck out the data you need without doing a big scan.

I got an email recently about a case when SQL Server was not smart enough to do this with MAX() — but it was doing just fine with a TOP(1) ORDER BY DESC combo.

The question was: what’s the problem with this MAX?

It took me a while to figure it out, but I finally got to the bottom of the case of the slow MAX.

I got the problem in the form of a database backup

Dear readers, it is not always a good idea to restore database backups from strangers on the internet. People can put nasty things in there, just like any old thing you zip up and attach to an email.

But this backup was from a fellow Microsoft MVP, I was curious about the problem, and I have a nice, isolated test instance in a VM just waiting to be tormented. So I went forth and restored!

Sure enough, I could reproduce the issue.

Here’s what it looked like — but in a fake table that I’ve created from scratch, which reproduces the issue.

Slow query vs fast query

Meet slow query

Here is the actual plan for the problem query (viewed in Sentry One’s free Plan Explorer)

That nonclustered index leads on CharColumn. SQL Server correctly estimated that there are 1 million rows in the index and it fed them all faithfully into a Stream Aggregate operator to do the MAX.

Meet fast query

Fast query used the very same index. It figured out that it could go to one end of the index and do a quick backward scan, feeding that TOP 1 row into a TOP operator and then stopping.

And it was correct. One row read, not one million!

A few metrics

SQL Server estimated that this MAX query was going to be more expensive than the TOP, and it was right:

These metrics are for the queries in a simple, narrow table with a single column nonclustered index on CharColumn. Add more rows and more complexity, and the performance difference will just get uglier.

Initial observations

The first thing I noticed is that we’re doing MAX against a CHAR column. CHAR is a fixed length, non-unicode data type.

Actually, that’s not the first thing I noticed.

The first thing I noticed was that this is a Dynamics database

The kindly fellow who raised this question let me know right away that his question is regarding a Dynamics database. That was one of the initial things that forewarned me that things might get wacky.

Having done some consulting for a while, I know the look that SQL Server specialists get when confronted with any database that’s a member of the Dynamics family. It looks like this:

Dynamics is coming for me!

It’s not that Dynamics is bad, it’s just … weird. The kind of weird that can lead you to think that your database is haunted, because it gets unusual behaviors related to its strange habits. (Heaps, odd settings, strange creaks in the night…)

Strange things like this issue.

But we can figure out EVEN A DYNAMICS query tuning problem, right? YES, WE CAN!

You can use MAX() against a character column, and this doesn’t always happen

The very first thing I tested was whether or not this “scan all the rows and push them into a stream aggregate for MAX” issue happened against other CHAR columns.

I selected CHAR data from another table in another database into a new table in my test database, indexed it, and started querying it.

It worked fine – the MAX() plan estimated one row into a top operator, then fed that single row into a stream aggregate.

I made the CHAR data into a CHAR(13) column, same as the test table.

It worked fine, too.

I spun my wheels for a while

I spent about half an hour testing random things that didn’t end up helping. Things like….

  • Looking at the boot page of the database to see if there was any weird history of logs being rebuilt that might cause this table to be funky. Nope.
  • Creating a clustered index on the test table. (It was a heap.) No change in behavior.
  • Changing the sort order of the nonclustered index on the test table. Nada, no change in behavior.
  • Performing various rituals with random trace flags, then admitting this ritual was ridiculous.
  • Asking my dog what he thought, and he thought it was time for dinner.

Sorry, dog, I have one last thing to test.

I altered the column, and that fixed it

The problem CHAR column didn’t allow NULLs. The documentation for MAX mentions that it skips NULL values (which shouldn’t matter for a column that can’t contain any), but I thought, hey, maybe flip that setting on my test column and see what happens?

So I ran this command:

ALTER TABLE dbo.UserDatabaseTable ALTER COLUMN CharColumn CHAR(13) NULL;
GO

I didn’t change the column length, I just made it allow NULLs.

After this, when I re-ran the MAX query, the plan looked like this:

This is what we wanted our MAX() query to do all along — feed a row into a TOP operator. Yes, we still have a stream aggregate there, but it’s a lot faster when it’s only getting one row.

This query took 0 seconds of CPU time and did 3 logical reads. It now matched the performance for the TOP(1) / ORDER BY query.

But wait, it’s not allowing NULLs that fixes it

I was making progress, but this was still really fishy. After all, MAX does not count NULL values, so if the column didn’t allow NULLs, that seems like it would make MAX’s job easier, not harder.

So I wondered if it was just the act of altering the column that made the difference, not changing the NULL-ability.

I re-restored the database, and I ran:

ALTER TABLE dbo.UserDatabaseTable ALTER COLUMN CharColumn CHAR(13) NOT NULL;
GO

Yep, I altered the column to be JUST LIKE it already was.

This also fixed the issue.

It’s not the NULLs, it’s some side-effect of altering the column. I scratched my head, looked at the dog, and called it a night. Relaxation and sleep can be good for this kind of thing.

Sure enough, it all made sense in the morning

In the morning, I played around with the table a bit more:

  • Selecting data from the column into a new table in tempdb still had the scan problem / was slow
  • Scripting out the table, creating it in tempdb, and inserting the data was fast every time – TOP operator was just fine!

This convinced me that it was nothing about that specific table or that specific database.

This problem is a column problem.

I started looking at the column properties

I suspected I was close to the answer at this point, and I got all excited and happy. I could barely type sensible TSQL, but I managed to query sys.columns for the table when it had the problem, and after I fixed it with an ALTER:

SELECT OBJECT_NAME(object_id) as table_name,
    * 
FROM sys.columns
WHERE object_id = OBJECT_ID('UserDatabaseTable') 
    and name='CharColumn';
GO

And I found the answer.

ANSI PADDING, you little devil!

When this CHAR(13) column has is_ansi_padded set to 0 (false), the MAX() operation generates an index scan that feeds the data into a stream aggregate operator.

The TOP(1) / ORDER BY combo doesn’t care.

My ALTER TABLE / ALTER COLUMN statements ran in a session with ANSI_PADDING ON (the default for SSMS and most modern drivers), so they had the side effect of setting is_ansi_padded to 1 (true), which is the setting SQL Server generally prefers. With ANSI_PADDING on for the column, SQL Server says, “oh, I don’t need to read every row!” and just goes to the end of the index.

Note: Running ‘SET ANSI_PADDING OFF;’ in my session to match my session settings to the column setting doesn’t change anything. This is not a problem of session settings mis-matching the column settings, this is a problem with the ANSI_PADDING setting on the column itself.
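
If you want to see how that column-level setting gets baked in, here’s a tiny throwaway example (the table name is made up). A column keeps whatever ANSI_PADDING setting was in effect in the session when it was created or last altered:

SET ANSI_PADDING OFF;
GO
CREATE TABLE dbo.PaddingDemo (CharColumn CHAR(13) NOT NULL);
GO
-- is_ansi_padded = 0 here, because the session setting was OFF at create time
SELECT name, is_ansi_padded FROM sys.columns WHERE object_id = OBJECT_ID('dbo.PaddingDemo');
GO
SET ANSI_PADDING ON;
GO
ALTER TABLE dbo.PaddingDemo ALTER COLUMN CharColumn CHAR(13) NOT NULL;
GO
-- now is_ansi_padded = 1, because the ALTER ran with the session setting ON
SELECT name, is_ansi_padded FROM sys.columns WHERE object_id = OBJECT_ID('dbo.PaddingDemo');
GO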

What in the world is ANSI PADDING?

These ANSI settings are weird and esoteric. Here is a summary of ANSI PADDING from “INF: How SQL Server Compares Strings with Trailing Spaces“:

SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>, General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons.

The article has more helpful info, so if you’re curious, go ahead and click through and read it.

This makes sense! And it’s even documented (mostly)

We’ve solved the mystery, as far as the MAX() query goes:

  • The ANSI spec SQL Server follows requires padding for CHAR columns when comparing them
  • MAX() is comparing values, so it seems fair that it would have to scan every row in the index, pad them (since ANSI_PADDING is OFF for the column), then do the MAX

But… what about the TOP (1) / ORDER BY query?

Isn’t ORDER BY also comparing values?

Shouldn’t that query also have to look at every row, for the ORDER BY operation? If there’s a bug here, it seems kinda like the TOP (1) / ORDER BY query is the one who is misbehaving.

I found a tiny piece of documentation for this (after I figured it out, of course – isn’t that always the way) in the Books Online article for SET ANSI_PADDING:

 When ANSI_PADDING set to OFF, queries that involve MIN, MAX, or TOP on character columns might be slower than in SQL Server 2000.

It does seem from this that TOP/ORDER BY should also be padding all the strings before comparing them, but I’m pretty reluctant to file a feature request saying I think something should be slower (and for a deprecated setting).

For most folks, the message is: before you create a table, it’s good to make sure you’re using the ANSI settings that SQL Server recommends. Here is a list in the Microsoft documentation. One of them is: SET ANSI_PADDING ON.

In fact, in the case of this Dynamics database, our questioner started looking into this and found that setting ANSI_PADDING to off may no longer be required. WOOT!

If you want to know if any of your existing tables has this problem, you can query sys.columns to find out (like in the query above).
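
A sketch of that check, widened out to every character and binary column in the database, might look something like this:

SELECT OBJECT_NAME(c.object_id) AS table_name,
    c.name AS column_name,
    t.name AS data_type,
    c.is_ansi_padded
FROM sys.columns AS c
JOIN sys.types AS t ON c.user_type_id = t.user_type_id
WHERE c.is_ansi_padded = 0
    AND t.name IN ('char', 'varchar', 'binary', 'varbinary')
    AND OBJECTPROPERTY(c.object_id, 'IsUserTable') = 1;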

Mystery solved!

If you’d like to play around with code that reproduces the problem in this mystery, you can grab it here.


What is a “Morally Equivalent Execution Plan,” and Why Is It Good?

I recently wrote a fairly complicated post, called “Forced Plan Confusion: Is_Forced vs Use Plan = True.”

You do NOT need to go read that post to understand this one. I think I’ve found a simpler way to explain the most important part of that post – and why it’s A Very Good Thing.

Plan forcing in Query Store is clever

The gist of this post is that plan forcing in Query Store is not the same as just putting duct-tape on an execution plan.

Query Store is a little more flexible than that. It has an understanding that if you force something too tightly, it may break.

It leaves a little wiggle room. If a plan has been forced, but the optimizer looks at the forced plan and the “compile for” values and sees that the plan is the wrong size pants for that query, SQL Server may compile a “Morally Equivalent Plan” instead.

Step 1: I put a plan in Query Store

To get this party started, I restore the WideWorldImporters sample database and enable Query Store. I run a query and get its plan into cache.

The plan looks like this — notice that @packagetypeid is parameterized…

This plan has an estimated cost of .0067516. It thinks one row is going to flow through the whole query.

Step 2: I force that plan

I run a query against the Query Store DMVs to figure out the query_id and plan_id for this query.
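
That lookup is something along these lines (my own sketch; the text filter will depend on your query):

SELECT q.query_id,
    p.plan_id,
    qt.query_sql_text
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p ON q.query_id = p.query_id
WHERE qt.query_sql_text LIKE N'%packagetypeid%';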

Then I force it by running:

exec sp_query_store_force_plan @query_id=1, @plan_id=1;
GO

That one-row plan you see above is now forced!

Step 3: I cause a recompile to happen

The forced plan is currently in my execution plan cache. I want to cause a recompile to happen, so I clear out the cache for this database on my test instance:

ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
GO

The next time my query runs, it’s going to get a fresh compile. Yes, it has a forced plan, but let’s see what happens.

Step 4: I run the same query, but with a different value for @packagetypeid

When SQL Server compiles my query, it doesn’t blindly give me the exact forced plan with its one row estimates.

Instead, it sniffs the value of my parameters. It uses that sniffed value to look at the statistics. And it sees that I’m pretty likely to have more than one row flowing through this plan. Based on the statistics, it looks more like 221,241 rows!

So, clever SQL Server gives me a “morally equivalent plan”.

Meet the “morally equivalent plan”

Notice that this plan has the same shape as the plan above. That’s the “moral equivalency.” We’ve got our Seek -> Nested Loops -> Key Lookups. Compute Scalar feeds into Stream Aggregate, etc.

But there are some differences!

  • We have thicker row estimate bars, based on the sniffed parameter value and the statistics
  • My plan’s estimated subtree cost is 51.0062 (the forced plan’s is .0067516)
  • My plan has a green hint that points out my indexing on OrderLines isn’t so great — and my costs are different on my operators!
    • The Key Lookup is now estimated at 97% of the work, whereas before it was 50%
    • SQL Server isn’t thrilled about doing that lookup 221,241 times. It was no big deal to just do it once, so there was no missing index request when this was optimized for a different parameter.

The “morally equivalent plan” has its own plan_id and query_plan_hash, and shows up as separate in Query Store

Both the forced plan and the “morally equivalent” plan are visible in Query Store. It can be a little confusing to identify them if you haven’t seen them before.

  • The forced plan has is_forced_plan = 1 in sys.query_store_plan and will have a check mark in the Query Store reports.
  • The “morally equivalent plan” will have “Use Plan=true” in the properties of the left-most operator in the plan.

The good news: if you open the “Queries with forced plans” report, both of them show up there.

Same shape, sized to fit the parameters it compiled with

I think this “wiggle room” is a terrific feature for forced plans.

Forcing plans is a somewhat crude thing to do — just because a forced plan works better today doesn’t mean it’s going to be great tomorrow, if a bunch of my data changes. Or if a different customer is running the procedure, one who has quite a different data profile.

This feature is kind of like a stretchy waistband for the sweatpants of an execution plan. Maybe you need a bigger memory grant than the one forced with the original plan? Here’s a little wiggle room.

Plus, if I happen to check up on my forced queries in Query Store, I can notice that often when this runs it’s saying it could use a little indexing help. And just the fact that I’m sometimes getting a “morally equivalent plan” is an indication to me that I haven’t forced something that’s perfect, I should look deeper.

I love that!

I didn’t come up with the term “morally equivalent plan”

I heard this term from Erland Sommarskog, who said that he heard it from Conor Cunningham. I love that it’s unique, not an acronym (yet), and it seems like a good description to me – so let’s popularize it among those of us who talk about plans 🙂

And if you’d like to play with the code for this, it is based on a sample from Jovan Popovic published under the MIT license: grab it here.