Perks for My SQLPASS Summit Precon: Prevent ‘Fake News’ in Your Data! Isolation Levels Demystified

I’m proud, giddy, and just plain overly excited to be giving a pre-conference session at the SQL PASS Summit in Seattle, Washington on November 5, 2018.

This is going to be a fun, demo-packed day which will teach you skills you can use throughout your career as a developer, database administrator, or data professional.

Why attend?

While features and tricks change from year to year (or even month to month), understanding isolation levels is critical for your long-term future.

The Session: Prevent ‘Fake News’ in Your Data! Isolation Levels Demystified

Are your users seeing bad data? If you’re using the default isolation level of read committed in SQL Server, chances are that sometimes your users get incorrect results. If NOLOCK hints lurk in your code, the odds are even higher that sometimes your customers see information that just isn’t right.

In this day-long, demo-packed session, you’ll learn why a single statement may read rows twice, miss rows entirely, or return combinations of data that never existed in the database — and why that’s not a bug. You’ll learn what “read phenomena” are, which isolation levels are vulnerable to them, and the performance trade-offs which come from raising your isolation level to protect your users from bad data.

You’ll see how isolation levels work with newer technologies such as columnstore indexes, In-Memory OLTP, and Always On Availability Groups. You’ll discover why version-based isolation levels can be awesome, and what you need to look out for with these isolation levels to avoid race conditions that produce– you guessed it– incorrect results.

At the end of the day, we’ll pull together all this information into a guide. You’ll leave the seminar with the tools and knowledge to choose the right isolation levels for new and existing applications based on business and performance requirements.

What Does it Cost?

A single pre-conference session is $499. You can also bundle pre-conference sessions with registration at the full Summit.

The Perks

Attendees at my pre-conference session will not only get all the demos we show live and a download of the PDF of slides, but also….

Perk: Watch It Again Later

Access to a recorded online session of my 8-hour seminar, “Conquer Blocking & Isolation Levels,” for a full year.

Want to revisit a topic and get the nitty gritty details? You’ll be able to watch anytime, online.

Note: the live pre-conference sessions at PASS aren’t recorded for purchase anymore– however, you can buy recordings of “regular” sessions during the week from PASS in different formats.

Perk: A Month of SQLChallenges!

For four weeks following the conference, I’ll be hosting SQLChallenges on blocking and isolation level problems.

  • Attendees will get access to the problems and a chance to apply the skills they’ve learned.
  • I’ll hold a live, online session going over sample solutions for the problem. Attend, share your solution in Slack if you’d like, and ask questions.
  • Can’t make the SQLChallenge live? The recording will be posted and you’ll get online access for a year. All SQLChallenges support Q&A in the course pages.

Perk: Private Follow-Up Quizzes

Want even more learning? Following the SQLChallenges, attendees will get access to a series of four weekly quizzes about blocking and isolation levels to confirm your knowledge.

Combining this with the month of SQLChallenges will help cement your new knowledge, getting you the most value for your dollar.

Register here

I hope to see you at the PASS Summit! Sign up here to join my pre-conference session.

Postscript: SQL Symposium subscribers get access to all these perks as well. As sustaining members during the first year of SQLWorkbooks, SQL Symposium members get it all 🙂 

Row Width Impact on Version Store Usage under Snapshot Isolation

A question came up in my webcast today for subscribers on the topic of Snapshot and Read Committed Snapshot Isolation – what impact will enabling these have on my SQL Server instance?

I talked a bit about how, for tempdb, this can vary widely depending on the number of updates and deletes running against the instance. (This is a simplification, there is more overhead, but this is often the biggest component.)

But after the webcast, I realized that the impact on the version store isn’t driven simply by the number of updates and deletes run. It will also vary based on the width of your rows: how many columns are in the table, what data types the columns have, and what data you have in the table.

Even if you change one very small column, the entire row is versioned – sort of

My favorite thing about SQL Server is that it includes rich tools to let us learn how the engine works by setting up simple demos. It’s truly awesome.

So I went to work to demonstrate row width impact on the version store — when only a tiny bit column is changed in the row.

Here’s how I did the test:

  • I created two tables, dbo.Narrow and dbo.Wide. They each have a bit column named bitsy, along with some other columns.
  • I inserted one row in each table, but I put a lot more data into the row in dbo.Wide.
  • I allowed snapshot isolation on the database
  • I began a transaction in another session under snapshot isolation and left the transaction open (so version store cleanup wouldn’t kick in while I looked around)
  • I updated the bit column named bitsy for the single row in each table, thereby generating a row-version in tempdb for each table

The code I ran to test this is here, if you’d like to play around with it.
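
If you don’t have the download handy, here’s a minimal sketch that approximates the setup. The column definitions beyond bitsy, j, and k are my guesses, so the byte counts you see will differ somewhat from the numbers below:

    /* Two tables: one narrow, one wide. Column definitions are approximate. */
    CREATE TABLE dbo.Narrow
    (
        NarrowId INT IDENTITY PRIMARY KEY,
        bitsy BIT NOT NULL
    );

    CREATE TABLE dbo.Wide
    (
        WideId INT IDENTITY PRIMARY KEY,
        bitsy BIT NOT NULL,
        j CHAR(8000) NOT NULL,      /* fixed width, so the in-row data is wide */
        k NVARCHAR(MAX) NOT NULL    /* large values get pushed to LOB pages */
    );

    /* One row in each table, with a lot more data in dbo.Wide */
    INSERT dbo.Narrow (bitsy) VALUES (0);
    INSERT dbo.Wide (bitsy, j, k) VALUES (0, 'j', REPLICATE(N'k', 4000));

    /* Allow snapshot isolation on the database */
    ALTER DATABASE CURRENT SET ALLOW_SNAPSHOT_ISOLATION ON;

    /* In a second session, start a snapshot transaction and leave it open,
       so version store cleanup won't remove the versions while we poke around:

       SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
       BEGIN TRAN;
       SELECT COUNT(*) FROM dbo.Narrow;
    */

    /* Back in the first session: update only the bit column in each table */
    UPDATE dbo.Narrow SET bitsy = 1;
    UPDATE dbo.Wide SET bitsy = 1;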

The row in dbo.Narrow is 38 bytes of in-row data

I measured the width of the largest row in dbo.Narrow using sys.dm_db_index_physical_stats. It only has one row, so the max width IS our row width. It’s a lil bitty 38 bytes.
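
If you want to check it yourself, here’s the kind of query I mean. DETAILED mode reads the pages, so be thoughtful about running it against big production tables, and swap in dbo.Wide to measure the other table:

    /* Largest record size per allocation unit for dbo.Narrow */
    SELECT ps.alloc_unit_type_desc,
           ps.record_count,
           ps.max_record_size_in_bytes
    FROM sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID('dbo.Narrow'), NULL, NULL, 'DETAILED') AS ps;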

The row in dbo.Wide is 8,054 bytes of in-row data, and 8,012 bytes off row

On the other side of the scale is dbo.Wide. This takes up 8,054 bytes of in-row data (this happened because the ‘j’ column is a char column, which is fixed width). It also is using 8,012 bytes of LOB_DATA storage, because I used the REPLICATE function to put a lot of garbage into the k column in dbo.Wide.

Let’s look at the version store!

We can query the version store through sys.dm_tran_version_store. Querying this in detail on a production system may not be great if there are a lot of versions, but this is an idle test instance and I’ve only updated two rows.
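
Here’s the shape of the query, as a sketch. A version’s record can be stored in two parts, so the total size is the sum of the two length columns:

    /* One row per versioned record in the version store for this database */
    SELECT vs.transaction_sequence_num,
           vs.record_length_first_part_in_bytes,
           vs.record_length_second_part_in_bytes,
           vs.record_length_first_part_in_bytes
             + vs.record_length_second_part_in_bytes AS total_version_bytes
    FROM sys.dm_tran_version_store AS vs
    WHERE vs.database_id = DB_ID();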

There are two rows in my version store. The Narrow table has 24 bytes of versioning data, the Wide table has 8,040 bytes of versioning data.

Observation 1: my whole row was versioned except for the LOB data

Large object data stored on LOB pages, like that in my NVARCHAR(MAX) column, can be versioned. Snapshot Isolation and Read Committed Snapshot Isolation work with LOB data too, and it’s easy to verify that it can be versioned by updating the column named ‘k’ in this demo.

But in this case, I don’t see them in the version store.

That’s awesome– because I didn’t update the column that’s stored on LOB pages! I only updated a bit type column on that row. SQL Server is being clever about this.

Observation 2: my versioned rows are 14 bytes smaller

Comparing the size of my row-versions with the rows in the table:

  • dbo.Narrow: 38 bytes in the table row, 24 bytes in the row version = 14 bytes difference
  • dbo.Wide: 8,054 bytes of in-row data in the table row, 8,040 bytes in the version store = 14 bytes difference

That 14 bytes is versioning information (a transaction timestamp and pointer) stored in the row of the data table itself. It’s an additional part of the overhead of row versioning, but it doesn’t have to be copied into tempdb as well.

Summing up: Wider rows will generally mean more tempdb usage for versioning

Even if you’re just updating one tiny column in your table, the entire row will be versioned — with one important exception.

My testing shows that data stored on LOB pages isn’t appearing in the version store unless I modify that data itself (not simply modifying another column in the row). Note: I haven’t found this explicitly documented while searching, and I only tested this against SQL Server 2017, so there may well be times when this is not the case– but it does make sense that SQL Server could do this quite elegantly for LOB pages.

What should I do if I am considering enabling Snapshot Isolation?

If you have the ability to load test or run a ‘replay’ workload outside of your production environment, that’s absolutely the best bet.

If that isn’t realistic in your world, you can baseline performance on your current production workload, then test allowing snapshot isolation against the database before anything uses it at all. Versioning will still take place, and you can compare performance with your established baseline.
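
If you go that route, allowing snapshot isolation is an online change, and you can keep an eye on how much version store space tempdb is using while you compare against your baseline. A rough sketch (YourDatabaseName is a placeholder):

    /* Versioning starts for modifications in this database,
       even before any session actually uses snapshot isolation */
    ALTER DATABASE YourDatabaseName SET ALLOW_SNAPSHOT_ISOLATION ON;

    /* One way to track version store usage in tempdb (8KB pages -> MB) */
    SELECT SUM(version_store_reserved_page_count) * 8 / 1024. AS version_store_reserved_MB
    FROM tempdb.sys.dm_db_file_space_usage;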

Want to learn more about transaction isolation?

Join me for my upcoming online SQL Seminar on isolation levels and blocking (a few seats are currently on sale) – or come to my in-person pre-conference session at the SQLPASS Summit on preventing fake news in your data.

Watch the first lesson in my new “Repeatable Read and Serializable Isolation Levels” course! (Video with transcript and captions)

Woo hoo! I’ve just added a new course: Repeatable Read and Serializable Isolation Levels in SQL Server.

Here’s the scoop on what the course covers…

Serializable and Repeatable Read isolation levels offer protections so your users won’t see weird or incorrect data — but there are tradeoffs for those protections. Learn how to tell if your existing applications are using these types of isolation levels, when you might want to raise your isolation level in SQL Server, and the tradeoffs you make if you choose serializable or repeatable read with disk-based tables.

This course includes 47 minutes of videos, plus a quiz:

  • Why do we have repeatable read and serializable? (10 minutes)
  • Course downloads: scripts and slides (PDF)
  • Demo: Give me correct data or give me death (15 minutes)
  • Animated play diagram of the index intersection problem (6 minutes)
  • How you can get serializable isolation, even if you didn’t ask for it (4 minutes)
  • How to tell if apps are using repeatable read or serializable (5 minutes)
  • What’s the best way to get correct data? (7 minutes)
  • Course quiz (10 questions)
  • Course survey (optional)

Register for the course here — it’s free at the time of this writing! Whoops, this giveaway has ended.

Not sure if this course is for you? Watch the first lesson right now, below — it’s 10 minutes long, and it covers important problems you need to know about with the default read committed isolation level in SQL Server.

Why do we have repeatable read and serializable? (10 minutes)

Transcript

Welcome to Repeatable Read and Serializable Isolation Levels in SQL Server. I’m Kendra Little from SQLWorkbooks.com.

Today, we will be answering five questions:

  1. We’ll be talking about why we need repeatable read and serializable isolation levels; we’ll be doing demos of how we can get incorrect data in the read committed isolation level
  2. We’ll see how, if we’re using repeatable read or serializable, we’re essentially saying: we have to have correct data, or we’re just not going to finish our query
  3. We will talk about ways that the serializable isolation level may sneak in, even if you aren’t using it on purpose! Even if you haven’t changed your isolation level to serializable or used a hint, there are some scenarios where SQL Server will escalate your isolation level behind the scenes.
  4. And we’ll talk about how to tell if your apps are even using these — as well as a very important question:
  5. What’s the way to make sure that you get correct data in SQL Server?

First up: why do we need repeatable read and serializable?

In fact there can be clowns and ghosts — or phantoms– that are sneaking into your data!

The default isolation level in SQL Server is called read committed. This is the default isolation level in what’s called the “boxed product” of SQL Server. If you install SQL Server yourself in a virtual machine or on a server– I’m not talking about Azure SQL Database– but if you create a database on a SQL Server you install yourself, the default isolation for your queries is called read committed. In Azure SQL Database, it’s different, because read committed has some issues.

“Read committed” sounds really good, right?

Okay, I’m going to read committed data. That sounds like it would be great, but read committed is lock based.

When we read, say from an index in read committed, we run through that index getting locks as we go and releasing them right away. That also sounds good, but when you look into the details, what this means is: since we let go of the locks as soon as we read them, and our query may still be running, data can move around on us.

We may read some rows twice: if we read a row and passed it, and then it jumped ahead of us, we may see it again.

We may miss a row entirely: if I’m running through the data and a row is updated and moves behind me, I may never see it as I traverse that index.

And, we even can return combinations of data that never existed. Just plain part “before data”, part “after data.” And we’ll dig into this last one. It’s the weirdest one, we will dig into a demo in detail and see how that can happen in read committed.

Isolation levels are defined by “phenomena”

When we talk about isolation levels, not just in SQL Server, but when it comes to relational databases altogether, isolation levels are actually defined by what phenomena can happen in that isolation level.

These phenomena are different types of weirdness that can happen– different “issues” if you will that can happen under that isolation level.

Dirty reads

Under the default isolation level of read committed, the phenomenon of dirty reads (represented here by the poop emoji) is not possible. So read committed at least has the virtue of not allowing the dirty reads phenomenon.

But there are other phenomena that can happen in read committed: we can have non-repeatable reads, which we will see as a clown. And we can have phantom reads.

Non-repeatable reads

The name didn’t make any sense to me at first. But it really means what it says.

It means: I read this data and if in my transaction I were to read it again, I would not get the same value. This can be a huge problem, and the data will just look wrong to users.

Phantom reads are a little different

If I read the same set of rows again, I won’t get the same set. Maybe there is a row inserted, maybe there were three rows when I read the data first, maybe there were four rows when I read the data again. Nothing was updated, but a phantom row appeared.

Simple examples of non-repeatable reads and phantom reads

This is not that hard to imagine.

For a non-repeatable read, imagine that I’m running a report

I’m running this in read committed, and I do begin tran. My first SQL statement selects the SUM of my revenue from the table. I’ve summed up my revenue. After that first statement completes, another transaction updates a row in that table, and it changes some of the data. My report has the summary data at the top, and it has a detail section below it. So I then run a statement that’s getting all the detail data. In read committed we are going to get the updated information in that second query. So, when we selected the sum of revenue, we have a non-repeatable read in there. We read some data that changed after that. We read it again and it was different.

Phantom rows are similar. We’re running our same report.

We select our summary data, and then after we select that summary data, but before we do anything else, another transaction comes along. They insert a new row into the same data we’re selecting. In this case we don’t even have a where clause, we don’t even have any predicates. They just insert a row into that table.

Now, when we run the detail data, now we’ve got an extra row in there that isn’t accounted for in our sum.
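
A minimal sketch of that report pattern, with a hypothetical dbo.Sales table standing in for the real data:

    /* Session 1: the report, under the default read committed isolation level */
    BEGIN TRAN;

        /* Summary section */
        SELECT SUM(Revenue) AS TotalRevenue
        FROM dbo.Sales;

        /* Meanwhile, Session 2 commits a change before the detail query runs:

           UPDATE dbo.Sales SET Revenue = Revenue + 100 WHERE SaleId = 1;  -- non-repeatable read
           -- or
           INSERT dbo.Sales (SaleId, Revenue) VALUES (4, 50);              -- phantom row
        */

        /* Detail section: sees the changed or extra row, so it won't match the summary */
        SELECT SaleId, Revenue
        FROM dbo.Sales;

    COMMIT;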

This data isn’t right

In both of these cases, in the case of the non-repeatable read and in the case of the phantom read, if your CEO is reading this report, she’s going to look at this and be like, “This data isn’t right. These numbers don’t match.”

So, we either look like a clown, or we’re seeing a ghost in our records, neither of which are good.

Higher isolation can protect you from clowns and ghosts

The isolation levels repeatable read and serializable were introduced to make sure that if we don’t want to have these phenomena, if we don’t want these issues, we don’t have to have them. We can prevent them. If we aren’t worried about phantoms and we only want to prevent non-repeatable reads, we can set our isolation level to repeatable read. You can also use a query hint, but setting the isolation level says: for my entire session, for everything, I want to use repeatable read unless I hint otherwise. We start our report, we select the SUM of our revenue.

The difference is, now that I’m in repeatable read, after I select this data I’m going to hold on to some locks on that data. That protects it and says: if anyone comes in and tries to update a row, I have a transaction that needs this data to stay the same if I read it again. I need to protect it. So while my transaction is ongoing, somebody else comes in and tries to update that row. They are going to be blocked, and they can’t complete that update until I’m done, until I commit my transaction or roll it back.

So now, when I select my detail data, my reads are repeatable, they are all the same.

My CEO sees data that matches. I do not look like a clown.

This is good, right?

But there was an impact. There was a tradeoff.

That transaction that came in to update a row, it was blocked, and it had to wait.

What happens if in the rest of this transaction I’m doing stuff that takes a whole lot of time? And, what if the update is really important and there’s an executive from one of my clients waiting for that update to complete on a screen?

So, there are tradeoffs here, if we’re using this method to make sure our data is correct.

Because if we are using a disk-based table, repeatable read is going to protect this with locks. And with locks can come blocking.

What if I’m worried about inserts? Well, the repeatable read isolation level is holding locks on what I read: it isn’t doing anything to protect against rows being inserted. If I’m worried about those phantom rows, I need to set my transaction isolation level to serializable.

Serializable says: you won’t have any non-repeatable reads, and you won’t have any phantoms. So it includes the protections of repeatable read, and then adds additional protections. In this case, I run my first query and I acquire key range locks that are held to the end of my transaction. I’m not just protecting exactly what I’ve read, but I’m protecting a range that says– new stuff cannot be inserted into this range. If an insert happens, the insert will be blocked. Same thing for an update, if an update happens it will be blocked. So when I run my last statement, I can get data consistent with the first statement.
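
A sketch of what raising the isolation level for the report session looks like, using the same hypothetical dbo.Sales table as above:

    /* Applies to everything in this session unless a query hints otherwise */
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    -- or: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRAN;

        /* Shared locks (key-range locks under serializable) are now held
           until the transaction commits or rolls back */
        SELECT SUM(Revenue) AS TotalRevenue
        FROM dbo.Sales;

        /* Concurrent updates to the rows read (and, under serializable,
           inserts into the locked ranges) are blocked until we finish */
        SELECT SaleId, Revenue
        FROM dbo.Sales;

    COMMIT;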

But again, what we’re talking about with repeatable read and serializable are pessimistic isolation levels. We’re collecting key range locks, and our tradeoff is blocking.

But we get correct data!

And that is a good thing!

Register for the course here. Whoops, this giveaway has ended.

When does a Snapshot Transaction Really Begin?

They say, “never read the comments” on the internet, but I’m lucky to get lots of interesting points and questions in my comments.

Recently, Jim mentioned that he was doing some testing in a database that allows snapshot isolation level, and he saw something like the following sequence of events. (These are fake timestamps, just for the purpose of illustration; a rough repro sketch follows the list.)

  • 00.000 – Session A sets its isolation level to snapshot
  • 00.001 – Session A explicitly begins a transaction with BEGIN TRAN
  • 00.002 – Session A starts a WAITFOR command for 15 seconds
  • 10.000 – Before the WAITFOR completes, Session B inserts rows into dbo.Table
  • 15.001 – Session A starts a SELECT from dbo.Table, which returns the rows that Session B inserted
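
Here’s a rough sketch of the two sessions, with dbo.Table standing in for whatever table you’re testing with (the database must allow snapshot isolation):

    /* Session A */
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRAN;

    WAITFOR DELAY '00:00:15';

    /* While Session A is waiting, Session B runs and commits:

       INSERT dbo.Table (Col1) VALUES ('inserted mid-wait');
    */

    /* First data access in the transaction: the rows Session B inserted show up */
    SELECT *
    FROM dbo.Table;

    COMMIT;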

This seems wrong, because many of us commonly say things like, “in Snapshot Isolation level, all statements see data consistent with the beginning of the transaction.”

But in this case, Session B inserted the rows after Session A began its transaction using Snapshot Isolation level. So why did Session A see those rows?

Snapshot transactions don’t start with BEGIN TRAN

I hadn’t really thought much about this before Jim’s comment. But this behavior is documented deep within the whitepaper, SQL Server 2005 Row Versioning-Based Transaction Isolation. There’s a paragraph titled ‘Understanding the “Beginning” of a Transaction’ which explains:

…the version that a transaction will use is based on the first statement that accesses data, and not the BEGIN TRAN that creates the transaction.

To make this even clearer, the version the transaction will use is based on the first statement that accesses data using SNAPSHOT ISOLATION. If you hint the first statement in the transaction to lower the isolation level (with a NOLOCK hint, for example), it doesn’t read versioned data or ‘set’ the version for the transaction.
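
Here’s a hedged sketch of how you might see that for yourself, again with a stand-in dbo.Table: hint the first read down to read uncommitted, and the transaction’s snapshot still isn’t established until the first statement that actually reads versioned data.

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRAN;

    /* Hinted down to read uncommitted: this doesn't read versioned data,
       so it doesn't 'set' the version for the transaction */
    SELECT COUNT(*)
    FROM dbo.Table WITH (NOLOCK);

    /* Rows committed by other sessions after the statement above (but before
       this one) still show up here, because this is the first statement that
       accesses data under snapshot isolation */
    SELECT *
    FROM dbo.Table;

    COMMIT;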

So perhaps I should change my language a bit, and start saying something more like, “Under snapshot isolation, all statements see data consistent with the first time data is accessed within the transaction.”

For most users, this won’t make a difference, as usually we perform data access immediately after starting a transaction.

But oftentimes folks do need to get a bit creative, so making this explicit is worthwhile.