Table Value Constructors in TSQL

Last week’s quiz was on Table Value Constructors in TSQL.

Essentially, table value constructors let you create a dataset on the fly. These can occasionally be useful in writing queries, but I think playing with them has another benefit: they provide a simple, lightweight framework to let you develop your ability to think in sets.

I wrote this quiz building up to question #4, which I think is a very fun and interesting pattern.

Let’s take a look at how folks did on the quiz, and see what table valued constructors can do.

Quiz Answers and Explanations

Question 1: A single Table Valued Constructor! How many rows will this query return?

SELECT *
FROM (VALUES 
   ('Spider Plant', 'Chlorophytum comosum', 'airplane plant'),
   ('Aloe Vera', 'Aloe vulgari', 'Burn Plant')
)as v(name1, name2, name3);
  • Answer: 2
  • Correct: 252 (86%) / Incorrect: 40 (14%)

The folks who got this incorrect mostly went for 3 — if you haven’t used these before, it’s unclear how those commas work!

Table value constructors allow you to specify multiple rows, each surrounded by (round parens). Look for those round parens to identify the rows.

Screenshot of the results from question 1

Question 2:  I’ve cross applied my TVC with another TVC. How many rows will this query return?

SELECT * 
FROM ( VALUES 
   ('Heart Leaf Philodendron'),
   ('Philodendron cordatum')
) as v(name1)
CROSS APPLY ( VALUES 
   ('green'),
   ('leafy')
) as v2(name1);
  • Answer: 4
  • Correct: 239 (82%) / Incorrect: 53 (18%)

We have two table value constructors, and each of them has one column and two rows.

If you look at the execution plan for this query, it takes the datasets and joins them with an INNER JOIN with no join predicate – in this case it’s as if we did a cross join, and got the cartesian product. Each dataset has two rows, so 2 x 2 = 4.

Screenshot of the results of question 2

Question 3:  Something changed in that APPLY. How many rows will this query return?

SELECT * 
FROM (VALUES 
   ('Succulents'), 
   ('Cacti') 
) as v(thing) 
CROSS APPLY (VALUES 
   (v.thing + ' are friends')
) as v2(thing);
  • Answer: 2
  • Correct: 220 (75%) / Incorrect: 72 (25%)

Whoo, the second table value constructor (v2) is doing something interesting this time: it’s referring to the dataset produced by the first table value constructor (v1), and concatenating on a literal value to the ‘thing’ column.

The results here have two rows: the second TV creates an additional column.

Note: you could get the exact same results using a SELECT to define v2 instead of a VALUES clause.

Screenshot of the results of question 3

Question 4:  Hmmmm… the CROSS APPLY got even more interesting. How many rows will this return?

SELECT v.plant, v2.faves 
FROM (VALUES 
   ('Ferns', 'being spritzed', 'shade', 'humidity'), 
   ('Succulents', 'well drained soil', 'to dry out a bit', 'brighter light')
) as v(plant, fav1, fav2, fav3)
CROSS APPLY ( VALUES 
   (fav1),
   (fav2),
   (fav3)
) as v2(faves);
  • Answer: 6
  • Correct: 202 (69%) / Incorrect: 90 (31%)

I find this to be a very interesting pattern: we are using a table value constructor to do an unpivot operation!

The first table value constructor (v), is a two row table with four columns: plant, fav1, fav2, and fav3.

The second table value constructor (v2), does something that seems almost impossible:

  • It refers to the fav1, fav2, and fav3 columns from (v)
  • It places all THREE of these columns into a single column named (faves)

This is allowed, and it does work.

The number of rows we get back is the number of values in fav1, fav2, and fav3 – there are two rows and three columns, so 2 x 3 = 6 rows.

Screenshot of the results from question 4

 

CROSS APPLY in T-SQL: Why to APPLY

Why use APPLY?

Michelle asked a great question on the quiz page:

In your own words, why would one want to use a cross apply operator rather than a join operator? I’m old school, and I’m just not getting why a cross apply would be so much better to use than a join.

Here’s my top 3 favorite uses for CROSS APPLY and OUTER APPLY:

  1. APPLY is fantastic for calling table valued functions. I didn’t include questions about those in the quiz, simply for the purposes of keeping the code simple, and because I wanted the quiz to be about thinking through how apply works — but it’s still the #1 use.
  2. Another thing I might use it for is when a query needs a correlated subquery — somewhat like an inline function.
  3. And I also like it for queries that have a calculation that needs to be done and which is referenced in multiple columns in the query, or perhaps also in a predicate and the select. You can perform the computation once in the apply and then reference it multiple times. That way if you have to change the formula later on, you only have to change it in once place, plus I find it’s easier to read in some cases.

But it can do even more…

It can help you (un) pivot data. Check out Kenneth Fisher’s example here.

Brad Schulz’s post here has code samples for those, plus more things like showing how it can be useful for shredding XML: http://bradsruminations.blogspot.com/2011/04/t-sql-tuesday-017-it-slices-it-dices-it.html

Brad gets a bit jokey at the end of the post, but there’s a lot of valid uses along the way.

Want to take the quiz on APPLY?

Take the quiz here, then head back here to see how you compare to other folks who took the quiz– reported below.


Quiz Results/Answers/Explanations: CROSS APPLY

The questions in this quiz are based on two tables, created and populated with the following commands:

CREATE TABLE dbo.t1 (t1c1 int IDENTITY PRIMARY KEY); 
CREATE TABLE dbo.t2 (t2c1 int IDENTITY PRIMARY KEY, t1c1 int NOT NULL DEFAULT 'foo'); 
GO

INSERT dbo.t1 DEFAULT VALUES
GO 5
INSERT dbo.t2 (t1c1) SELECT TOP (2) t1c1 FROM dbo.t1;
GO

This produces two tables which look like this:

Question 1: If you run this code in a SQL Server Management Studio session, what will appear in the Messages window?

SELECT COUNT(*) AS ct 
FROM dbo.t1
CROSS APPLY (SELECT 1 / 0 AS x 
     FROM dbo.t2 
     WHERE t1.t1c1 = t2.t1c1) AS t2;

Answer: The results of COUNT(*) based on an inner join between the two tables

  • Correct: 106 (42%)
  • Incorrect: 146 (58%)

I know, it’s weird, there’s a 1/0 in there which seems like it would produce a divide by zero error! But that bit of code never gets evaluated in this case, and the query runs successfully.

Sometimes a query using APPLY can be “transformed” into a join operation, and this is one of those cases. You can read more about this in Paul White’s article here.

Most of the time, if you want a join, you should use a join. However, it’s definitely useful to know that APPLY can be used this way for when you are reading and editing code written by others.

Question 2: Will these queries return the same result?

SELECT COUNT(*) AS ct
FROM dbo.t1
   JOIN dbo.t2
    ON t1.t1c1 = t2.t1c1;

SELECT COUNT(*) AS ct
FROM dbo.t1
CROSS APPLY (SELECT 1 / 0 AS x 
   FROM dbo.t2 
   WHERE t1.t1c1 = t2.t1c1) AS t2;

Answer: yes

  • Correct: 140 (56%)
  • Incorrect: 112 (44%)

In this case, the second query will be rewritten to use an inner join, just like the first query. One method you can use to verify this is to look at the execution plan for both queries (and you can obviously run them and compare the results).

Both queries return a count of 2.

Here are the estimated plans for both queries:

Question 3: Will these queries return the same result?

SELECT COUNT(*) AS ct
FROM dbo.t1
   JOIN dbo.t2
    ON t1.t1c1 = t2.t1c1;

SELECT COUNT(*) AS ct
FROM dbo.t1
OUTER APPLY (SELECT 1 / 0 AS x 
   FROM dbo.t2 
   WHERE t1.t1c1 = t2.t1c1) AS t2;

Answer: no

  • Correct: 198 (79%)
  • Incorrect: 54 (21%)

The first query returns the count of 2. The second query returns a count of 5.

This is because the first query is implemented as an inner join, and the second query is implemented as an outer join.

Question 4: Will these queries return the same result?

SELECT t1c1 * 2 AS d
FROM dbo.t2;

SELECT x.d
FROM dbo.t2
CROSS APPLY (SELECT t1c1 * 2 AS d) AS x;

Answer: yes

  • Correct: 161 (64%)
  • Incorrect: 91 (36%)

This is an example of using a CROSS APPLY to compute a calculation.

It’s pretty trivial in this case, but I find this can be very useful in the case of complex queries where you refer to the result of a calculation more than once — doing this can allow you to only do the computation in once place and then refer to it many times. Not only can this make the query more readable sometimes, it can limit the errors if you need to change the calculations (because you’re changing it in fewer places.)

 

The BREAK Keyword in Transact SQL

BREAK is a useful piece of control-of-flow language in SQL Server, but it only works in specific situations.

Want to take the quiz first?

Take the quiz on BREAK here, then head back here to see how you compare to other folks who took the quiz.


Break Questions, Answers, and Explanations

1) If you run this code in a SQL Server Management Studio session, what will appear in the Messages window?

BREAK;

PRINT 2;

Answer:

Msg 135, Level 15, State 1, Line 6
Cannot use a BREAK statement outside the scope of a WHILE statement.

How’d people do?

  • Correct: 237 (77%)
  • Incorrect: 70 (23%)

I know, it seems like this should work! But, in fact, BREAK only breaks you out of a WHILE loop.

For other purposes, you may want to use RETURN.

2) If you run this code in a SQL Server Management Studio session, what will appear in the Messages window?

WHILE 1 = 1
    BREAK;
    PRINT 2

Answer: 2

This one was a little trickier…

  • Correct: 202 (66%)
  • Incorrect: 105 (34%)

In this case BREAK will cause us to exit that WHILE loop (so we won’t be stuck in it forever). BREAK doesn’t cause the execution of the whole batch to terminate, though, so we go on to the next statement, and PRINT 2 is executed.

3) If you run this code in a SQL Server Management Studio session, what will appear in the Messages window?

WHILE 1 = 1
IF 1 = 1
    PRINT 2
    BREAK;

Answer:

Msg 135, Level 15, State 1, Line 6
Cannot use a BREAK statement outside the scope of a WHILE statement.

This one foiled even more folks:

  • Correct: 135 (44%)
  • Incorrect: 172 (56%)

The key to this one is that we haven’t defined a statement block. That means that the WHILE statement applies to the next statement only. The IF statement also applies to the next statement only. The BREAK is left out in the cold, and it will throw an error if it is not part of a WHILE loop.

But it’s a tricky question, because it might seem like SQL Server would get to the PRINT and be stuck in an infinite loop, never getting to the point that it throws the error. That doesn’t happen – as soon as we execute, the parse process sees the mistake and we get the error right away.

4) If you run this code in a SQL Server Management Studio session, what will appear in the Messages window?

WHILE 1 = 1
BEGIN
    BREAK;
    PRINT 2;
END

Answer: Commands completed successfully.

  • Correct: 239 (78%)
  • Incorrect: 68 (22%)

In this case, the BREAK and the PRINT are within a defined BEGIN / END statement block together, inside a WHILE loop. When we hit the BREAK it exits the loop, never getting to the PRINT statement.

 

Using RETURN in Transact SQL: Quiz Results

RETURN is a “control of flow” language keyword in Transact-SQL. It’s very useful for managing when code does — and does NOT — get executed in procedural code, and it can also be used to communicate the status from procedural code.

But not everyone knows how to use RETURN, and learning about it can be awkward because of some quirks of SQL Server Management Studio. It is well worth taking the time to get used to it, though!

Want to test your knowledge?

Take this quiz on RETURN, then head back here to see how you compare to other folks who took the quiz.

Note: I’m going to implement a change soon so that only the four most recent quizzes are open to everyone. I am still going to post questions and answers in follow up posts, but the fun interactive bit won’t be as open (after a month for each quiz).

So now is a good time to go through my quiz archive if you haven’t already!

Quiz results and Discussion of RETURN

I changed the wording on question #4 on July 4 after Nick posted a message about the wording — and he was totally right, there was more than one defensible correct answer.

Changing the wording did not significantly change the average score, it turns out. The average percentage dropped from 65% to 64% once that question was clarified. (That’s not a bad percentage at all — I think a lot of folks aren’t used to using RETURN, and hopefully the quiz made them curious on how it can be useful.)

RETURN
GO

PRINT 2;
GO
  • Correct: 292 (78%)
  • Incorrect: 84 (22%)

Answer: 2

The key to this question is that RETURN controls flow within a given batch, and the default batch separator is the word “GO”.

If you want to stop the execution of a script you’re running in SSMS, the RETURN keyword won’t stop everything if you have more than one batch.

RETURN

PRINT 2;
GO
  • Correct: 249 (66%)
  • Incorrect: 127 (34%)

Answer: Commands completed successfully.

In this question, RETURN is shown inside a single batch. It will stop the flow of execution before the PRINT statement, and SSMS will simply report, “Commands completed successfully.”

IF 1 = 1
BEGIN
    RETURN;
END

PRINT 2;
  • Correct: 256 (68%)
  • Incorrect: 120 (32%)

Answer: Commands completed successfully.

This time we have everything in one batch statement again, but we have the RETURN inside of a BEGIN / END block.

The gist of this question is: does RETURN cause it to exit only the BEGIN/END block, or that entire block of code?

The answer is that it causes it to exit from the entire block of code, it will never reach the PRINT statement.

CREATE OR ALTER PROC dbo.test
AS
    RETURN 1;
GO

EXEC dbo.test;
GO
  • Correct: 29 (58%)
  • Incorrect: 21 (42%)

Answer: Commands completed successfully.

The wording in this post is the “corrected” wording. The original wording was less clear, so I’ve reported the correct/incorrect ratios for July 5 – 9, after the wording was changed.

This one may seem silly, but I think it’s very important to know that SSMS will not report anything in the Messages tab in the case. It doesn’t mean that RETURN didn’t work, it simply means that the SSMS interface doesn’t show it!

If you want to capture the value that RETURN sends back and do something with it, you need to assign it to something, like this:

CREATE OR ALTER PROC dbo.test
AS
    RETURN 1;
GO

DECLARE @RESULT TINYINT
EXEC @RESULT = dbo.test;

PRINT @RESULT
GO

Used in this way, RETURN can be incredibly useful when it comes to returning status from procedural code.

ORDER BY, OFFSET, and Fetch in TSQL

My most recent Quizletter featured a quiz on ORDER BY in TSQL, complete with questions on the OFFSET and FETCH clauses which we got in SQL Server 2012.

One great thing about teaching TSQL School is that it reminds me of these cool options that can be easy to forget.

Want to take the quiz before you read the answers?

Take the quiz here first, if you’d like.  If you get 100%, you probably don’t need to read this post 😉


Overall Quiz Stats

Folks did pretty well on this quiz, but lots of people did fall into one mean little trap that I left in one question. (Sorry! I can be tricksy.) Here are the stats at the time of writing this post:

Entries: 436 Average Score (Out of 4): 2.64 Average Percentage: 66%

Now for the fun stuff, let’s talk about answers!


Question 1

1) How will the results of this query be ordered (and why)?
SELECT PersonID
FROM Application.People;

Correct: 76%
Answer: No ordering is guaranteed in this case

This is very important to know: although the answers may come back ordered by PersonID in this case, you cannot count on that always being the case, and it could change at any point. Ordering of results is never guaranteed unless you explicitly use an ORDER BY in your query.

Question 2

2) What will this ORDER BY do?*
SELECT PersonID,
FullName
FROM Application.People
ORDER BY 1 ASC;

Correct: 90%
Answer: Order the results by the first column listed, PersonID

Numbers used like this in the ORDER BY clause represent the column position. I was surprised that so many people knew the answer to this one, to be honest. It is a little bit troubling, because Microsoft calls out using numbers as positional representations in order by as an anti-pattern that should be avoided.

But, I confess, I personally do use this myself when I’m writing quick and dirty code (and sometimes forget to fix it, too).

Question 3

3) What will this OFFSET / FETCH clause do?*
SELECT FullName
FROM Application.People
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;

Correct: 26%
Answer: This query will throw a syntax error

Why? You can only use OFFSET with an ORDER BY. (This ain’t no TOP operator which doesn’t require ORDER BY, this is a part of ORDER BY.)

I asked this question to try to get a sense of how many people are currently using the OFFSET clause already in code and would pick out the goof based on familiarity. Seems like not many! That’s OK, but it’s good to know how this works for those times when it may come in handy.

Question 4

4) What will this OFFSET clause do?*
SELECT FullName
FROM Application.People
ORDER BY PersonID
OFFSET 10 ROWS;

Correct: 72%
Answer: Return all the rows except for the 10 with the lowest PersonIDs

You are allowed to specify OFFSET without a FETCH NEXT. In this case, we are ordering by PersonID. We didn’t specify ascending or descending, so the default ascending sort order is used for the order. We are OFFSET-ing the first 10 rows ASC, so that’s the lowest 10 PersonIDs. We’ll get all the other rows, because we haven’t limited how many will be fetched.

And now, for a terrible joke

Unlike cursors, order by has finally made fetch happen. (Sorry.)

Nice work, folks!

When Should I Use Explicit Transactions for Single Statements?

I got a great question about transactions by email recently.

Here it is:

When should you use transactions?

From what I have read online the answer is whenever you can, but I haven’t found a good why answer.

I understand that you should be using them when you want a set of events to succeed or fail. But what if you just have a single statement that doesn’t depend on other statements – why should you use transactions?

We have experienced some errors when we used them and so I was wondering what merits does it have that make it worth the code maintenance, especially since SQL Server implicitly creates transactions.

Part of why this is extra confusing is that there are three types of transactions. First up, let’s clarify a few terms.

Explicit vs. Implicit vs. Autocommit Transactions

1. Explicit transactions

This is the type of transaction where you explicitly start the transaction with BEGIN TRAN or BEGIN TRANSACTION, and later complete the work with either a COMMIT statement or undo it with a ROLLBACK statement.

As our questioner writes, if you have multiple statements that should either succeed or fail as an individual unit, BEGIN TRAN allows you to bundle them together.

2. Implicit transactions

Implicit transactions are a bit weird, and I typically only run into them when applications have been written for a different relational database and then ported to SQL Server.

To use implicit transactions, you run the statement: SET IMPLICIT_TRANSACTIONS ON;

This puts you in a mode where running a statement which accesses a table implies that you ran BEGIN TRAN. You must explicitly COMMIT or ROLL BACK when you’re done with a unit of work, whether it’s one or more statements.

I find implicit transactions confusing, and I don’t use it.

3. Autocommit transactions

If you do not enable implicit transactions, and you don’t start an explicit transaction, you are in the default “autocommit” mode.

This mode means that individual statements are automatically committed or rolled back as whole units. You can’t end up in a place where only half your statement is committed.

Our question is really about whether there are unseen problems with this default mode of autocommit for single-statement units of work.

So, to rephrase the question…

Squirrel with nuts
mmmmmmm

Are there times when we should use an explicit transaction for single statements, instead of relying on autocommit?

Generally, you’re fine staying in the default autocommit mode for singleton statements that are entirely their own unit of work.

I’ve fun into a few instances where creating a larger unit of work for modifications helped performance, however! Example: are you doing a lot of tiny modifications?

Sometimes you get better performance bundling multiple statements into a transaction

Way back when I was a Junior DBA, there was a project to modify a lot of data in our OLTP database. We had a big change coming up which required a lot of new data to be present, and some data to be modified for customers.

We couldn’t have much downtime for the change, so a plan was hatched to get the data all set up in production while the system was live, then to cut over to the new feature as the actual change. This was tested out in staging, and everything was great, but slow.

The staging environment was generally slow, so that didn’t mean much by itself.

But by the time we got to deploy the change to production, it was slow as well. Luckily, we had a plan to make it faster.

  • Indexes had been deployed to help make the modifications as fast as possible
  • The scripts to make the data changes were set up to be able to be safely stopped at any time and restarted, in case of blocking problems or any concern about performance
  • The scripts had a @BatchSize parameter, which allowed the DBAs to determine how many modification statements were run inside a single transaction
  • The scripts had a @WaitDuration parameter, which allowed the DBAs to determine how many milliseconds to wait between batches

Each modification statement was a single unit of work, but the script ran faster when multiple modification statements were committed at once!  @BatchSize = 1 was not awesome, it was very slow.

This is because lots of tiny little commits can put a lot of pressure on your transaction log, causing a bottleneck

It’s like wanting to read someone a long story, but making a phone call for every single word.

If the person is very busy and can’t talk to only you for a whole hour, to the exclusion of everyone else, then you want to figure out how long they can spare for a call, and how many words you can fit into a single call.

Amit Banerjee does a demo and breaks everything down in a great way in this classic post.

Explicit and autocommit transactions show up differently in some performance counters

I don’t think this is a reason to use explicit transactions, it’s just something to be aware of.

The basics are this:

  • Modifications in autocommit mode and select statements in explicit transactions both show up under Transactions/sec and Batch Requests/sec
  • Select statements in autocommit mode do not show up under the Transactions/sec counter, but they DO show up under Batch Requests/sec

You can watch a demo of it on my post here.

Generally speaking, I think this is a reason to use the Batch requests/sec counter over Transactions/sec (not a reason to go adding transactions to all your singleton select statements).

What about marked transactions?

The idea behind marked transactions sounds pretty cool: you can mark a transaction as a recovery point.

Let’s say I’m releasing a big change that touches multiple databases. I could use marked transactions to allow me to restore those databases all to the mark, so that I could perhaps restore them to right before the change was deployed, or right after it was complete (depending on which I used).

The reason I’m not crazy about using these “just in case,” or for very frequent use, is that there some overhead:

  • They use space in your transaction logs in each database they touch
  • They use space in msdb, and if you use a ton of these, you may need to do some maintenance there
  • There is some complexity around using this with multiple instances (details here)

I’m not against marked transactions if they meet a real need and the team has the time to monitor their overhead. For large releases that touch multiple databases, SAN snapshots are often more convenient (although obviously they are not tied to a transaction).

Do you have a question?

Send it to me here!