Posted on Leave a comment

Fix My Functions: Speeding Up Scalar and Table Valued UDFs (video)

Last week, I was lucky enough to present on the topic of TSQL User Defined Functions (UDFs) in SQL Server at the PASS Summit.

My session was live-streamed, and the video, scripts, and slides are below.

Want to watch more Summit sessions for free? Check out PASS TV here.

Video (1 hour 10 minutes)

Note: if you’re just here to see the SQL Server 2019 (CTP2.1+) scalar UDF inlining, that starts at 54 minutes into the video. 

Scripts

Fix My Functions demo script (zip)

Slides

Below the slides is an outline of the content, generated by SlideShare.

1. Speeding Up Scalar and Table Valued UDFs Kendra Little, Redgate Fix My Functions

2. Kendra Little Evangelist at Redgate Founder, SQL Workbooks Microsoft MVP, MCM @Kendra_Little

3. This talk Level: 200 Audience: DBAs & Devs Goals: Speed up functions

4. “It is a known fact amongst practitioners that UDFs are “evil” when it comes to performance considerations [35, 28].”

5. Agenda

6. Why we use functions Modular meerkat

7. Why functions? Modularity Code reuse Simplify queries

8. Function types Scalar: returns a single value Multi-statement TVF: returns table Inline TVF: returns table

9. Scalar function syntax
CREATE FUNCTION [schema].[function_name] (@parameter_name AS INT)
RETURNS INT
--WITH SCHEMABINDING, …
AS
BEGIN
    RETURN
END
GO

10. Multi-statement TVF syntax
CREATE FUNCTION [schema].[function_name] (@parameter_name AS INT)
RETURNS @return_variable TABLE (/* table type definition */)
--WITH SCHEMABINDING, …
AS
BEGIN
    RETURN
END
GO

11. Inline TVF syntax
CREATE FUNCTION [schema].[function_name] (@parameter_name AS INT)
RETURNS TABLE
--WITH SCHEMABINDING, …
AS
RETURN ( )
GO

12. CROSS/OUTER APPLY and TVFs
SELECT TOP (10)
    qp.dbid,
    qp.query_plan,
    cp.size_in_bytes / 1024. / 1024. AS size_in_mb
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY size_in_mb DESC;

13. Demo: why so slow?

14. sp_WhoIsActive – free procedure from Adam Machanic: WhoIsActive.com

15. sp_WhoIsActive – free procedure from Adam Machanic: WhoIsActive.com

16. Lightweight Statistics Profiling SQL Server 2014 SP2 through 2017 • Trace Flag 7412 • Install KB 4078596 (2016 & 2017 only) SQL Server 2016 SP1+ MUCH lower overhead SQL Server 2019 no trace flag needed https://blogs.msdn.microsoft.com/sql_server_team/query-progress-anytime-anywhere

17. sp_WhoIsActive Free procedure Written by @AdamMachanic WhoIsActive.com

18. Find what functions hide Curious corgi

19. Get tuning data for functions Estimated & actual execution plans Finding function calls in operators UdfCpuTime and UdfElapsedTime in actual execution plans

20. Demo: spy on function internals

21. Estimated plans help! Scalar functions and multi-statement TVFs: estimated plan shows the function logic • Does not appear in an actual execution plan • Plans for the calling query and the function are stored in sys.dm_exec_query_stats, but you must find them individually

22. UDFs and parallelism TSQL scalar UDFs – serial plan Multi-statement TVFs – serial zone Computed column with TSQL UDF – parallelism eradicator, BEWARE

23. MSTVFs and row estimates SQL Server 2005 – 2012: 1 row SQL Server 2014 – 2016: 100 rows SQL Server 2017+: ?

24. Interleaved execution Part of adaptive query processing, all Editions Introduced in SQL Server 2017… • MSTVFs only • Read only queries • Cannot be on the inside of an APPLY • Compatibility level 140+

25. Interleaved execution (continued) Diagram by Joe Sack @JoeSackMSFT https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/04/19/introducing-interleaved-execution-for-multi-statement-table-valued-functions/

26. Scalar UDF Inlining Clever crow

27. https://aka.ms/iqp

28. Why are scalar UDFs slow? Executed row by agonizing row Scalar operators not ‘costed’ No cross-statement optimization No parallelism https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining

29. Automatic inlining Rewrite scalar UDF Substitute rewrite into calling query Then optimize https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining

30. Demo: wizardry

31. Controlling behavior Database compatibility level 150 CREATE FUNCTION … WITH INLINE = OFF USE HINT ('DISABLE_TSQL_SCALAR_UDF_INLINING') https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining

32. No scalar UDF inlining if it… Uses GETDATE() Uses table variables or TVPs Is in computed column Is in a check constraint https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining

33. Takeaways

34. Tips for tuning functions Scalar UDFs and Multi-Statement TVFs inhibit parallelism Use SCHEMABINDING if your function doesn’t do data access Use inline TVFs (single statement) or persist data when possible

35. The future of scalar UDFs 2019 inlining is VERY compelling Edition has not been announced Releasing with “high coverage”

36. References & links Lightweight query profiling reference – Pedro Lopes https://blogs.msdn.microsoft.com/sql_server_team/query-progress-anytime-anywhere/ SQL Server Functions, the basics – Jeremiah Peschka https://www.red-gate.com/simple-talk/sql/t-sql-programming/sql-server-functions-the-basics/ Froid: Optimization of Imperative Programs in a Relational Database – Karthik Ramachandra et al http://www.vldb.org/pvldb/vol11/p432-ramachandra.pdf

37. References & links continued Interleaved execution for multi-statement TVFs – Joe Sack https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/04/19/introducing-interleaved-execution-for-multi-statement-table-valued-functions/ Parallelism inhibitors – Paul White http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx

38. Thank You Where to find me… @Kendra_Little


Find the Scalar Function Call: What it Means If It’s Hiding in Probe Residual

User defined functions are fairly simple to create in SQL Server, but figuring out exactly how they are being used can take a little time – even in a simple execution plan. This is because the functions can be tucked away into lots of different operators in the plan, even join operators.

The good news: once you learn what to look for, it’s not terribly complicated. There are some fancy terms involved, but the definitions are pretty simple.

Here’s a quick look at a function call hidden in a hash match operator, which is being used to implement an inner join.

The estimated query execution plan

This query uses the Microsoft WideWorldImporters sample database. The query calls Website.CalculateCustomerPrice, a scalar user defined function that ships with the sample database:

SELECT
    il.InvoiceLineID
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
    ON i.InvoiceID=il.InvoiceID
WHERE 
    [Website].[CalculateCustomerPrice] (i.CustomerID, il.StockItemID, i.InvoiceDate)
        <>
     il.UnitPrice;
GO

Here’s the query execution plan:

There aren’t a lot of places for the function call to hide in this particular plan, right? No Compute Scalars to look into at all.

If we look at the properties of the Hash Match join, we can find the function call:

How does the hash join work?

The description at the top of this tool tip is quite helpful when it comes to remembering how hash matches work: SQL Server is building a table in memory from the Sales.Invoices table. It’s then “probing” into it with items from the Sales.InvoiceLines table.

The “Hash Keys Probe” part of the tool tip tells us that it is looking for matches based on the InvoiceID column (our join column).

What about the ‘Probe Residual’?

After we find matches based on the invoice ID, we have more work “left over”: that’s the “residual” bit.

For every row that matches, SQL Server is plugging values into the Website.CalculateCustomerPrice() function and comparing the result to the UnitPrice column, just like we asked for in the WHERE clause.

In other words, this is happening for every row in Sales.InvoiceLines that has a matching row in Sales.Invoices.

Which is every single invoice & invoice line, as it turns out.
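
If the query has already run and its plan is still in the plan cache, you can also hunt for this pattern without opening each plan by hand. The query below is a rough sketch, not part of the original demo: it assumes the showplan XML records the function call as a UserDefinedFunction element underneath the hash operator’s ProbeResidual element. Shredding plan XML can be expensive on a server with a large plan cache, so try it on a test instance first.

```sql
-- Sketch: list cached plans where a hash join's probe residual calls a given function.
-- Assumption: the function name appears in the FunctionName attribute of a
-- UserDefinedFunction element nested under Hash/ProbeResidual in the plan XML.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
    cp.usecounts,
    st.text AS query_text,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist(
    '//Hash/ProbeResidual//UserDefinedFunction[contains(@FunctionName, "CalculateCustomerPrice")]'
) = 1
ORDER BY cp.usecounts DESC;
```

Dropping the `Hash/ProbeResidual` part of the path widens the search to function calls in any operator.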

Bonus

If you’re ever looking for an example of a query where collecting an actual execution plan dramatically slows down execution (the observer effect), this is a good one. On instances I’ve tested, this query takes 14-18 seconds to run when no plan is collected, and about a minute longer when you enable actual plan collection.
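
One way to reproduce that comparison is to time the query from inside the batch and run it twice in SSMS, once with “Include Actual Execution Plan” off and once with it on. This is only a sketch of how such a test might be set up, not the original test harness; the @start and @sink variables are made up for illustration, and assigning the result to a variable may change the plan slightly compared with returning rows to the client.

```sql
-- Sketch: time the demo query without returning rows to the client.
DECLARE @start DATETIME2(3) = SYSDATETIME();
DECLARE @sink INT;  -- hypothetical variable that swallows the output

SELECT @sink = il.InvoiceLineID
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
    ON i.InvoiceID = il.InvoiceID
WHERE [Website].[CalculateCustomerPrice](i.CustomerID, il.StockItemID, i.InvoiceDate)
      <> il.UnitPrice;

-- Elapsed wall-clock time for the statement above, in milliseconds
SELECT DATEDIFF(MILLISECOND, @start, SYSDATETIME()) AS elapsed_ms;
```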

Takeaways

When you’re looking at execution plans, it can be very tricky to spot user defined functions in the plan:

  • You need to not only check into Compute Scalar operators if they are around, but also look carefully at the properties of other operators, including joins
  • If you spot the function call as part of a “probe residual” on a hash join, it is being applied to every row that meets the initial join criteria

Adaptive Queries in Standard Edition: Interleaved Exec for Multi-Statement TVFs

It’s tough to keep track of which features work in each version of SQL Server, and which Editions support them.

My memory told me that the new Adaptive Joins feature in SQL Server 2017 was Enterprise Edition only… and that’s correct, but I didn’t realize that the fancy new feature to make Multi-Statement TVFs smarter has much wider licensing.

Interleaved Execution for Multi-Statement Table Valued Functions works in Enterprise, Standard, Web, and Express Editions

I discovered this by accident when I was doing some testing on my Standard Edition instance. I was looking at an execution plan, and I checked the properties of a TVF and …

Function Beaver couldn’t be more excited about this

First, I double-checked and made sure I really was connected to my Standard Edition instance. YEP.

Then I checked the SQL Server 2017 Editions and Components chart to see if this was an accident or not.

Sure enough, ‘Batch Mode Adaptive Joins’ and ‘Batch Mode Memory Grant Feedback’ are Enterprise Edition features.

But ‘Interleaved Execution for Multi-Statement Table Valued Functions’ is available in all the editions.

What does ‘Interleaved Execution for Multi-Statement TVFs’ do for you?

Multi-statement table valued functions are difficult to optimize as part of a query: they run multiple statements, so how the heck do you estimate how many rows are going to come out of that?

This new feature lets the optimizer stop and take a peek!

When the optimizer finds a multi-statement TVF that qualifies*, it pauses the optimization process (badass!), and then goes briefly into the execution phase.

(This is the interleaving part.) It can run through that little part of the plan to get a good estimate, instead of making a blind guess.

Good estimates mean a much better chance that you’ll get the right joins, memory grant, and qualify for parallelism when needed.

That usually adds up to faster query execution.

*Which Multi-Statement TVFs qualify?

Not all multi-statement TVFs are created equal. If your TVF is part of a data modification, or if your TVF is inside a CROSS APPLY, it doesn’t qualify for this magic, at least not in SQL Server 2017. That may change in the future; this is just the V1.
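
To make the difference concrete, here is a small sketch against WideWorldImporters. The function dbo.InvoicesForCustomer is hypothetical, written only for this illustration, and the customer ID is an arbitrary value. The first query is the kind of statement that should be a candidate on SQL Server 2017 at compat level 140; the second calls the same function from inside a CROSS APPLY, so per the rules above it should not be.

```sql
-- Hypothetical multi-statement TVF (illustration only, not from the original post)
CREATE OR ALTER FUNCTION dbo.InvoicesForCustomer (@CustomerID INT)
RETURNS @result TABLE (InvoiceID INT NOT NULL, InvoiceDate DATE NOT NULL)
AS
BEGIN
    INSERT @result (InvoiceID, InvoiceDate)
    SELECT i.InvoiceID, i.InvoiceDate
    FROM Sales.Invoices AS i
    WHERE i.CustomerID = @CustomerID;

    RETURN;
END;
GO

-- Candidate: read-only SELECT, function referenced directly in FROM with a constant
SELECT il.InvoiceLineID, f.InvoiceDate
FROM dbo.InvoicesForCustomer(832) AS f
JOIN Sales.InvoiceLines AS il
    ON il.InvoiceID = f.InvoiceID;
GO

-- Not a candidate in SQL Server 2017: the function sits inside a CROSS APPLY
SELECT c.CustomerName, f.InvoiceID
FROM Sales.Customers AS c
CROSS APPLY dbo.InvoicesForCustomer(c.CustomerID) AS f;
GO
```

Comparing the estimated row counts on the Table Valued Function operator in the two actual plans is a quick way to see whether the optimizer took its peek.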

How do I turn this on?

If you’re on SQL Server 2017, your database compatibility level controls whether or not this feature is available.

If your compat mode is 140, your multi-statement table valued functions are very likely to speed up.

I say ‘very likely’ because there can be some edge cases where better estimates lead to worse performance.
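
A minimal sketch of checking and changing the setting, plus an escape hatch for a query that hits one of those edge cases. dbo.SomeMultiStatementTvf is a placeholder for one of your own functions, not a real object. Keep in mind that raising the compatibility level changes optimizer behavior for the whole database, so test before doing this in production.

```sql
-- Check the compatibility level of the current database
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Raise it to 140 (SQL Server 2017) so interleaved execution is available
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;

-- If a single query regresses, opt only that query out with a hint.
-- dbo.SomeMultiStatementTvf is a placeholder name; substitute your own function.
SELECT f.*
FROM dbo.SomeMultiStatementTvf(1) AS f
OPTION (USE HINT ('DISABLE_INTERLEAVED_EXECUTION_TVF'));
```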

Want to learn more?

If you agree with Function Beaver that this is the best thing since CREATE FUNCTION…


Why Is My Function Missing From sys.dm_exec_query_stats and Query Store?

It’s time to write a blog post, so I can save time when I go searching for this in a couple years!

Functions can be tricksy in SQL Server

It can be difficult to see functions in execution plans, and to figure out when they’re eating a lot of CPU time on your SQL Server.

One way that’s handy to see if functions are being executed frequently is to check out the sys.dm_exec_query_stats Dynamic Management View and look at the execution count, CPU time, and logical reads of the ‘CREATE FUNCTION’ statement for a given function, to estimate its impact.
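
A query along these lines is one way to do that check. It is a sketch rather than the exact query from my demo code, and it assumes you run it in the database that owns the functions. Each row is one statement inside a function body.

```sql
-- Sketch: per-statement stats for scalar (FN) and multi-statement table valued (TF) functions
SELECT TOP (20)
    OBJECT_NAME(st.objectid, st.dbid) AS function_name,
    qs.execution_count,
    qs.total_worker_time / 1000. AS total_cpu_ms,
    qs.total_logical_reads,
    SUBSTRING(
        st.text,
        (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1
    ) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
JOIN sys.objects AS o
    ON o.object_id = st.objectid
WHERE st.dbid = DB_ID()
  AND o.type IN ('FN', 'TF')
ORDER BY qs.total_worker_time DESC;
```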

But recently, I was working on a project and…

I couldn’t find my functions in sys.dm_exec_query_stats! I also couldn’t find them in Query Store.

Or, rather, I couldn’t find SOME of my functions

Some of my functions in the demo code were showing up just fine. I was really puzzled by that. I thought …

Maybe this is a bug with ‘CREATE OR ALTER’? A sign of some weird memory pressure? Something introduced in SQL Server 2017? A buggy side effect of implicit conversions in some of the functions? A problem with the queries I was using? A weird setting on the database? (Also: about 100 other things that didn’t turn out to be the case.)

I finally wrote up some simple demo code, tested it against a SQL Server 2008 R2 instance (omitting the Query Store components), compared it with SQL Server 2017, and found it to be consistent.

Some functions just don’t accrue information in sys.dm_exec_query_stats, and that’s been true for quite a while.

I tested six types of functions:

  • Scalar functions that do, and do not do data access
  • Multi-statement table valued functions that do, and do not do data access
  • Single statement table valued functions that do, and do not do data access
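
To show what “does data access” means here, this is a hypothetical pair of scalar functions, one of each kind. They are not the functions from my test; the names are made up, the second one assumes the WideWorldImporters sample database, and CREATE OR ALTER needs SQL Server 2016 SP1 or later.

```sql
-- Scalar function with NO data access: it only computes on its parameters
CREATE OR ALTER FUNCTION dbo.AddTax (@amount DECIMAL(18,2), @rate DECIMAL(18,3))
RETURNS DECIMAL(18,2)
AS
BEGIN
    RETURN @amount * (1 + @rate / 100);
END;
GO

-- Scalar function WITH data access: it reads from a table
CREATE OR ALTER FUNCTION dbo.InvoiceLineCount (@InvoiceID INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT COUNT(*)
            FROM Sales.InvoiceLines AS il
            WHERE il.InvoiceID = @InvoiceID);
END;
GO
```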

Which of these functions appear in sys.dm_exec_query_stats and sys.query_store_query?

Philosoraptor, reinvented

To be clear, the queries that call the functions all appear in sys.dm_exec_query_stats and sys.query_store_query. There are no missing calling queries.

However, the calling query only shows things like how many times it was executed. It doesn’t show how many times the function was executed (the function may be executed many times per query). Sometimes that’s handy to see when you’re troubleshooting.

CREATE FUNCTION entries with per-function-execution data only appeared for my:

  • Scalar function that does data access
  • Multi-statement table valued function that does data access
  • Multi-statement table valued function that does NOT do data access

It makes sense to me that my single statement table valued functions are missing from the list, because those functions can be “inlined” into their outer queries. They don’t really run on their own.

However, it is a bit confusing that the scalar function that does not do data access is missing from the list. (I don’t think it’s a huge tragedy, but it makes it look like you’re missing something — hence me spending a while on this!)
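
The Query Store side of the same check might look like the sketch below. It assumes Query Store is enabled on the current database, and it totals CPU by multiplying each interval’s average by its execution count.

```sql
-- Sketch: function statements recorded by Query Store in the current database
SELECT
    OBJECT_NAME(q.object_id) AS function_name,
    q.query_id,
    qt.query_sql_text,
    SUM(rs.count_executions) AS executions,
    SUM(rs.avg_cpu_time * rs.count_executions) / 1000. AS total_cpu_ms
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
JOIN sys.objects AS o
    ON o.object_id = q.object_id
WHERE o.type IN ('FN', 'TF')  -- scalar and multi-statement table valued functions
GROUP BY q.object_id, q.query_id, qt.query_sql_text
ORDER BY total_cpu_ms DESC;
```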

Which of these functions appear in sys.dm_exec_function_stats?

We got sys.dm_exec_function_stats in SQL Server 2016, so you can’t use this on Ye Older SQL Instances. This view recorded information for my:

  • Scalar function that does data access
  • Scalar function that does NOT do data access

Why no table valued functions? This one is documented! The DMV simply doesn’t have information about Table Valued Functions.

What does this all mean?

If you have a lot of scalar functions that don’t do data access, and you’re on SQL Server 2016 and higher, check out sys.dm_exec_function_stats for aggregate information on how they are running.
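
A starting point for that, as a sketch you will probably want to adjust (the TOP count and sort order are arbitrary choices):

```sql
-- Sketch: aggregate stats for scalar functions in the current database (SQL Server 2016+)
SELECT TOP (20)
    OBJECT_SCHEMA_NAME(fs.object_id, fs.database_id) + N'.'
        + OBJECT_NAME(fs.object_id, fs.database_id) AS function_name,
    fs.type_desc,
    fs.execution_count,
    fs.total_worker_time / 1000. AS total_cpu_ms,
    fs.total_elapsed_time / 1000. AS total_elapsed_ms,
    fs.total_logical_reads,
    fs.last_execution_time
FROM sys.dm_exec_function_stats AS fs
WHERE fs.database_id = DB_ID()
ORDER BY fs.total_worker_time DESC;
```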

And if you’re just confused about why not all your functions are showing up in sys.dm_exec_query_stats or Query Store, hopefully this post saves you some time!
