
AVG() in TSQL: Watch Your Datatypes

This week’s Quizletter featured a quiz on using the AVG() function in SQL Server.

I was inspired to write this quiz because I’ve been teaching TSQL School each week. When we covered aggregate functions, I remembered just how tricksy and non-average AVG() can be.

Want to take the quiz before you read the answers?

Head over here and take the quiz first.

Or read this post and then take the quiz, if you prefer. I’m not gonna call it cheating, because it’s learning either way.


Question 1: AVG of 1 and 1

In question 1, we average two rows: 1 and 1.

How did folks do on this one after a week?

  • Correct: 91% – These folks guessed the answer would be: 1
  • Incorrect: 9% – These folks almost all guessed the answer would be: 1.0

This seems like a subtle difference, but the key to the answer is understanding what data type SQL Server will implicitly convert these numbers to. And data types matter a lot when it comes to the results you can get out of AVG. In this case, 1 will be implicitly converted to an INT data type.
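The quiz code isn’t reproduced here, but a minimal sketch of the behavior looks like this (the derived table is my own stand-in for the quiz’s setup):

SELECT AVG(x) AS average_result
FROM (VALUES (1), (1)) AS v(x);
-- Both literals are treated as INT, so AVG returns an INT: 1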

Question 2: AVG of 1 and 2

Here’s where the wacky fun begins: if I average two ints, one and two, I get back… one!

  • Correct: 46% – YAY TEAM! Great work! This is bizarro, so good on you folks.
  • Incorrect: 54% – I feel your pain. Most of these folks guessed 1.5; a few guessed 1.500000

What does this mean? Well, let’s look at an example with negative numbers to help get a bigger picture:

When we average the integers -1 and -2, we get -1.
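Here’s a sketch of both cases (again, a derived table standing in for the quiz’s setup):

SELECT AVG(x) AS positive_average
FROM (VALUES (1), (2)) AS v(x);
-- Returns 1, not 1.5

SELECT AVG(x) AS negative_average
FROM (VALUES (-1), (-2)) AS v(x);
-- Returns -1, not -1.5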

How to describe this behavior

There are two different ways you can think about this behavior. Pick the one that is easier for you to remember.

Description 1: SQL Server rounds integers “toward zero”. (When the average was positive, we got the ‘floor’. When the average was negative, we got the ‘ceiling’.)

Description 2: It’s truncating the decimal bits.

Extra credit: Guess which one of these descriptions came from me, and which one came from my computer scientist partner.

Question 3: Mixing in a CAST()

Aha! Now we’re getting fancy. We have a CAST function outside of the AVG function.

The keys to this answer are that CAST will be run after the AVG function completes, and the AVG function is still returning 1, because inside the AVG function, it sees the INT data type.
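Sketched out, assuming the quiz cast to numeric(3,1) (the exact target type in the quiz may differ, but anything with a scale of one gives the same shape of answer):

SELECT CAST(AVG(x) AS NUMERIC(3,1)) AS cast_outside_avg
FROM (VALUES (1), (2)) AS v(x);
-- AVG sees INTs and returns 1; CAST then dresses that 1 up as 1.0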

  • Correct: 36% – These cunning quizzers worked out that AVG would truncate the decimal bits and return 1, and that this would then be cast to 1.0
  • Incorrect: 64% – Most folks guessed 1.5

Question 4: Move that CAST() inside the AVG()

Here we finally have an example where I am changing the data type inside the AVG, and we are casting to a numeric type with precision of three, scale of one. (One number past the decimally bit!)

And, yeah, look at that, we get back a result with a scale of SIX.

In fact, if you go to the trouble to see what type we end up with here (like by selecting the results into a temp table and inspecting the data types), you’ll find this is numeric(38,6). 38 is the highest precision you can get with numeric.
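Here’s one way to do that inspection (a sketch; the temp table name is my own):

SELECT AVG(CAST(x AS NUMERIC(3,1))) AS average_result
INTO #avg_results
FROM (VALUES (1), (2)) AS v(x);

-- Ask tempdb what type AVG actually handed back
SELECT t.name AS type_name, c.precision, c.scale
FROM tempdb.sys.columns AS c
JOIN tempdb.sys.types AS t ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('tempdb..#avg_results');
-- Returns: numeric, 38, 6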

  • Correct: 37% – Clever rabbits!
  • Incorrect: 63% – Most of these folks guessed 1.5, and I do not blame you at all

What’s up with this? Well, there’s a bit of info inside the documentation on AVG, but it’s about the decimal data type. Numeric and decimal are twins (but you want to use one or the other consistently, because of things like foreign keys). In the section on return types, it explains that the return type for decimal category (p, s) will be:

decimal(38, s) divided by decimal(10, 0)

Plugging that formula into the numbers we’re working with in this question: we cast to numeric(3,1), so s = 1, and the return type is numeric(38,1) divided by numeric(10,0). The documented rules for decimal division say the result scale is max(6, s1 + p2 + 1) = max(6, 1 + 10 + 1) = 12, and the result precision is p1 - s1 + s2 + 12 = 38 - 1 + 0 + 12 = 49. Precision maxes out at 38, so SQL Server caps it there and reduces the scale, which never drops below 6 in this situation: we end up with numeric(38,6).

Yep, it checks out.

Takeaways

Weird things like this are why some folks say not to do calculations in the SQL language.

For times when you do need to do calculations in TSQL, remember to think about your data types – both inside your functions, and outside.


Free Session! DBA vs Deadlock: How to Out-Index a Deadly Blocking Scenario

I’ll be doing a free online webcast with the fine folks at Quest Software this June. Join us to practice solving and preventing deadlocks in SQL Server.

DBA vs Deadlock: How to Out-Index a Deadly Blocking Scenario

June 14, 2018
8:30 AM – 9:30 AM Pacific

Deadlocks strike fear into the hearts of even seasoned DBAs — but they don’t have to!

In this session, you’ll get the code to cause a sample deadlock in SQL Server. You’ll see how to interpret the deadlock graph to find out where the conflict lies, and how to design an index to make the deadlock disappear.

You’ll leave the session with the steps you need to confidently tackle future deadlocks.

Register for this free online session here.


What is Deferred Name Resolution, and How Can It Go Wrong?

I got a question recently about why a particular bit of code was failing when run.

My first thought: oh, that’s probably related to… whatsitcalled.

You know, whatsitcalled, one of those features you notice just often enough that its name doesn’t come to mind immediately.

It’s deferred.

In this case, it was actually named “Deferred Name Resolution.”

Deferred name resolution lets you create a procedure referencing something that doesn’t exist

In this case, I’m creating a temporary stored procedure (out of laziness: it means I don’t have to clean up after a quick demo) –

CREATE OR ALTER PROCEDURE #test
AS
IF 1=0
       EXECUTE dbdoesnotexist.dbo.someproc;
GO

The database dbdoesnotexist does NOT exist, but I’m still allowed to create the procedure.

When I do so, I get an informational message:

The module '#test' depends on the missing object 'dbdoesnotexist.dbo.someproc'. The module will still be created; however, it cannot run successfully until the object exists.

This can be useful in some cases where you’ll be querying a table or procedure that may not exist all the time, but which will exist when a certain code block is run.

You can’t always count on deferred name resolution

What if our code refers to something that may exist, but isn’t accessible?

Here’s a slightly different code sample:

CREATE DATABASE offlinedb;
GO
ALTER DATABASE offlinedb SET OFFLINE;
GO

CREATE OR ALTER PROCEDURE #test
AS
IF 1=0
       EXECUTE offlinedb.dbo.someproc;
GO

This results in error 942

Creating the procedure fails in this case. The error given is:

Msg 942, Level 14, State 4, Procedure #test, Line 5 [Batch Start Line 17]
Database 'offlinedb' cannot be opened because it is offline.

If I set the empty database “offlinedb” to be online, then deferred name resolution works and I can create #test. If I drop “offlinedb”, same thing – no problems.

But while offlinedb exists in an offline state, I get error 942 at the time I attempt to create the procedure.

Takeaways

Little quirks like this are a good argument to configure test and pre-production/staging environments in ways that mirror production as much as possible.

And if you think you might run into this situation, it might be worth using a bit of Dynamic SQL to avoid it!
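For example, here’s a sketch of the procedure from above rewritten with dynamic SQL. Because the reference to offlinedb lives inside a string, nothing has to be resolved when the procedure is created:

CREATE OR ALTER PROCEDURE #test
AS
IF 1=0
       EXEC (N'EXECUTE offlinedb.dbo.someproc;');
GO

The string isn’t parsed until it actually runs, so this CREATE succeeds even while offlinedb is offline.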


Register for My Free Session on Execution Plan Forcing

I am excited to be giving a free online session as part of the Idera Live Virtual Conference, 2018!

Here are the details on my session:

To Force Plans, or Not to Force Plans, That Is the Question

Wed, May 16, 2018 – 9AM Pacific / Noon Eastern

We have new ways to force execution plans in SQL Server: in SQL Server 2017, you can even let SQL Server temporarily force query plans for you and test if it works well!

In this session, you’ll learn:

  • How forcing plans with Query Store compares with the older technology of Plan Guides
  • How to tell when a plan has been forced
  • What “morally equivalent plans” are (and why they’re a good thing)
  • How to see if a query has a high variation in performance
  • How to decide, “Should I force this plan?” and “Should I let SQL Server force plans for me?”

Register for this free session here.

But wait, there’s more

I’m not the only one giving cool sessions that day — check out more free sessions at the Idera Live Virtual Conference.

Hope to see you there!


Sys.dm_db_tuning_recommendations Makes Suggestions if Automatic Tuning isn’t Enabled

Dinosaur gives a hint: I really wouldn't put the stove THERE
Just a suggestion

I naturally think about the new sys.dm_db_tuning_recommendations DMV when I’m working with the new Automatic Tuning feature in SQL Server 2017.

But I came across a post by Grant Fritchey recently, in which he remarked in the conclusion:

 Even if you don’t use [Automatic Tuning], you can take advantage of the information in sys.dm_db_tuning_recommendations as a starting point for adjusting your own plans.

WHAAAAAAT?

I had assumed that recommendations would only show up in sys.dm_db_tuning_recommendations if I’d enabled automatic tuning for the database. I hadn’t even thought to test looking at the DMV if Query Store was set up but Automatic Tuning was disabled.

What does the recommendation look like if tuning is disabled?

I had to know, so off to my test database! I did a fresh restore of my database, configured and enabled Query Store, and then made sure that automatic tuning was off:

ALTER DATABASE current
SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = OFF);
GO

I verified that at this point, sys.dm_db_tuning_recommendations didn’t have any rows.

Then I ran a workload where a query gets a “faster” plan that executes 102 times, suffers a recompile (whoops!) and then runs again with a “slow” plan that executes another 102 times.

Sure enough, I got a recommendation

Even though automatic tuning wasn’t enabled, SQL Server picked up on the performance changes. I got a recommendation in sys.dm_db_tuning_recommendations.

  • reason: Average query CPU time changed from 2127.84ms to 66291.9ms
  • state: {"currentValue":"Active","reason":"AutomaticTuningOptionNotEnabled"}

The details also include the query id in question, and the plan_id of the “fast plan”.
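The details column is JSON, so if you’d like to pull those pieces out with a query, OPENJSON can crack it open. Here’s a sketch based on the example pattern in the Microsoft documentation for the DMV:

SELECT
    dtr.reason,
    dtr.score,
    JSON_VALUE(dtr.state, '$.currentValue') AS state_value,
    JSON_VALUE(dtr.state, '$.reason') AS state_reason,
    pfd.query_id,
    pfd.regressed_plan_id,
    pfd.recommended_plan_id
FROM sys.dm_db_tuning_recommendations AS dtr
CROSS APPLY OPENJSON(dtr.details, '$.planForceDetails')
    WITH (
        query_id INT '$.queryId',
        regressed_plan_id INT '$.regressedPlanId',
        recommended_plan_id INT '$.recommendedPlanId'
    ) AS pfd;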

Do the suggestions show up in Standard Edition?

Automatic Tuning is an Enterprise Edition feature in SQL Server.

If you’re like me, your next question was whether suggestions might appear in Standard Edition, even if you can’t formally enable Automatic Tuning.

On my Standard Edition test instance, I can query sys.dm_db_tuning_recommendations, but no suggestions appear when I run my sample workload.

Here’s the workaround for Standard Edition

Don’t worry: Query Store still has your back. You just have to find your own suggestions.

  • Open the new ‘Queries with High Variation’ built-in report
  • Set the graph to Metric: CPU Time (ms), Based on Std Dev

My problem query pops right to the top of the list with its fast and slow plans right in the graph.

This does not suck!
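If you’d rather hunt for high-variation queries with TSQL than with the report, here’s a rough sketch against the Query Store catalog views. This is my own approximation of what the report does, not its exact logic:

SELECT TOP (10)
    qsq.query_id,
    SUM(rs.count_executions) AS total_executions,
    MAX(rs.stdev_cpu_time) / 1000. AS max_stdev_cpu_ms  -- microseconds to ms
FROM sys.query_store_query AS qsq
JOIN sys.query_store_plan AS qsp ON qsq.query_id = qsp.query_id
JOIN sys.query_store_runtime_stats AS rs ON qsp.plan_id = rs.plan_id
GROUP BY qsq.query_id
ORDER BY MAX(rs.stdev_cpu_time) DESC;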

It turns out I really should have already known this…

Erin Stellato mentioned this in her excellent session on Query Store and Automatic Tuning with Dejan Krakovic at the SQL PASS Summit last year. It just didn’t sink in the first time I heard it!

I love a good suggestion!

I agree with Grant: I think the automatic tuning suggestions are a great place to start.

Even if you like having Automatic Tuning enabled, the fact that it’s adjusting plans for you should be a trigger to look at those queries and find a way to keep their plans from being so volatile that they need adjustment.

One note: suggestions disappear when the instance is restarted or the database goes offline — so if you’d like to keep ’em, you need to query them out of sys.dm_db_tuning_recommendations and persist them somewhere yourself.
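A minimal sketch of that (the snapshot table name is hypothetical):

-- Run this periodically, and before planned restarts
IF OBJECT_ID('dbo.tuning_recommendation_history') IS NULL
    SELECT SYSDATETIME() AS collected_at, *
    INTO dbo.tuning_recommendation_history
    FROM sys.dm_db_tuning_recommendations;
ELSE
    INSERT dbo.tuning_recommendation_history
    SELECT SYSDATETIME(), *
    FROM sys.dm_db_tuning_recommendations;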



Partitioned Tables: Rolling/Rotating/Round-Robining Partitions

Rotating log pattern
If a log rotates in a forest alone, does anyone hear it?

I recently received a terrific question about table partitioning:

I’m writing up a proposal for my company to start partitioning a 2.5TB table. The idea/need is to control the size of the table while saving the old data. The old data will be moved to an archive database on a different server where the BI guys work with it.

In none of the videos or articles I’ve seen is there an explanation of how the rolling partition works on a long-term daily basis.

  1. Are the partitions reused, like in a ROUND ROBIN fashion?
  2. Or, do you add new partitions each day with new filegroups, drop the oldest partition off – this would be FIFO?

Lots of folks assume the answer here is always #2, simply because there’s a bunch of sample code out there for it.

But option #1 can be simpler to manage when it fits your data retention technique!

Reusing partitions – the “rotating log” model

Reusing partitions can absolutely work if you have a fixed number of partitions.

By reusing partitions, you don’t need to manage the code or the locking required with splitting or merging your partition functions.

You can still leverage switching, however! As long as you know that data in a given partition doesn’t need to be read, you can switch the data out and switch new data into it. You can also truncate individual partitions as of SQL Server 2016. (That will, of course, require high-level locks.)
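For example (the table name and partition number here are hypothetical):

-- Empty a partition so it can be reused (SQL Server 2016+)
TRUNCATE TABLE dbo.SalesRolling
WITH (PARTITIONS (3));

-- Or switch a partition out to a staging table on the same filegroup
ALTER TABLE dbo.SalesRolling
SWITCH PARTITION 3 TO dbo.SalesStaging;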

This also means that you don’t have to constantly drop and recreate staging tables for switching data in, if you don’t want to.

Thomas Kejser writes more about the rotating log pattern here: http://kejser.org/table-pattern-rotating-log-ring-buffer/

What if I don’t have a fixed number of partitions?

In Thomas’ model, the data which is switched out will be used to update an aggregate table of older data.

If you need to keep detail of older data, you could switch the data from the oldest partition into another table which is prepared as an archive table. You can even use a view with UNION ALL (a partitioned view) to access data from both the “active” rolling table and the archive table together, as sketched below.
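A minimal sketch of that view (all names and columns here are mine):

-- With trusted CHECK constraints on sale_date in each table,
-- the optimizer can skip whichever table a query doesn't need
CREATE VIEW dbo.SalesAll
AS
SELECT sale_date, customer_id, amount FROM dbo.SalesRolling
UNION ALL
SELECT sale_date, customer_id, amount FROM dbo.SalesArchive;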

Remember that switching is only allowed when the source and target are on the same filegroup, so you’d want to plan your filegroup layout carefully if you go this route.

In the case of our questioner, older data will be moved out to another database, so it could simply be BCP’d out.

One note about filegroups

This question somewhat implies that each partition will be on its own filegroup.

Sometimes folks use way more filegroups than they need with the sliding window model, and it adds extra complexity to their projects.

Filegroups can be very useful for:

  • Individual restore sequences
  • Running CHECKDB on part of the partitioned table with DBCC CHECKFILEGROUP (there is no partition-level CHECKDB)
  • Putting some data on slower storage and some data on faster storage

However, if you are planning a very granular partition scheme, having one partition per filegroup can lead to a lot of wasted space if you ever rebuild data in each partition. It’s worth thinking about whether it makes sense to group multiple partitions on the same filegroups.

Thanks for the great question!

If you have a question, ask it here.


New Poster & Wallpaper on User Defined Functions in SQL Server (UDFs)

I added a new wallpaper and poster this week…

I had so much fun with the new SQLChallenge on functions that I made a poster and desktop wallpaper on user defined functions (UDFs).

It’s UFO themed. Because acronyms.

Grab this poster for free in three different wallpaper resolutions, plus a poster formatted in US letter, right here.