The Case of the Rowgroup Deadlock in a Columnstore Index

I came across a fun deadlock when writing demos for my session on the Read Committed isolation level this week. (It’s OK to call it “fun” when it’s not production code, right?)

I was playing around with a nonclustered columnstore index on a disk-based table. Here’s what I was doing:

Session 1: this session repeatedly changed the value for a single row, back and forth. This puts it into the delta store.

Session 2: this session repeatedly counted the number of rows in the table, using the columnstore index.

With my sample data in this scenario, I found I frequently generated deadlocks.

Let’s look at the deadlock graph

I started up an Extended Events trace for the xml_deadlock_report event, and here’s what this deadlock looks like (with some annotations)…

Click for a larger view

Breaking it down

The circle on the left is Session 2 – it was running a SELECT COUNT that used the nonclustered columnstore index. It was chosen as the deadlock victim.

Reading the arrows, the SELECT COUNT query:

  • Had a lock on the compressed rowgroup in the columnstore index
  • Wanted a shared lock on the b-tree delta store for the columnstore index to count the rows for data that had been changed

The circle on the right is Session 1 – it was running an update that changed the value for one column in one row. Reading the arrows, the UPDATE query:

  • Had an intent exclusive lock on the b-tree delta store
  • Wanted an intent exclusive lock on the compressed rowgroup (presumably to do the work to make sure it was clear the related row is in the delta store)

Voila– deadlock!

How do I fix it?

If I really want to churn changes into my columnstore index at the same time that I rapid-fire query the data, I may want to change my isolation level for the query counting the data.

Read committed snapshot isolation or snapshot isolation for the SELECT COUNT query  can change this blocking scenario and sneak us right past those deadlocks — as long as enabling those is the right thing for my application and my database server.

The more clever we get, the more ways things can backfire

I’m not writing this post to bash columnstore indexes — far from it. Churning tons of updates into a columnstore index isn’t necessarily what you want to do.

Changing to indexes or data can cause deadlocks. It can also make them go away! That’s a fact of life, and as we add more indexing tools to our toolkits, we still gotta live with it.

Want to learn more about deadlocks?

Check out my course, Troubleshooting Blocking & Deadlocks for Beginners. It gives you example code to create a deadlock, and walks you through tracing and decoding the deadlock graph, too.

3 thoughts on “The Case of the Rowgroup Deadlock in a Columnstore Index

  1. Consider yourself fortunate to not hit one of the stack dumping bugs that exists with CCIs (one is fixed in 2016 SP1 CU5).

    1. Except that would be great for demo’ing the SSMS add in for stack dump analysis! Does it correctly identify them? 🙂

      1. I couldn’t tell you unfortunately. None of the machines with access to the server have access to the dumps, and we can’t just blindly let them go out to a random web service due to PII (and other compliance) concerns.

        If you want to create a stack dump for testing, the following query (fixed in SP1 CU5) will give you one most of the time:

        SELECT contention_factor FROM sys.dm_os_memory_objects;

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.