I recently received a terrific question about table partitioning:
I’m writing up a proposal for my company to start partitioning a 2.5TB table. The idea/need is to control the size of the table while saving the old data. The old data will be moved to an archive database on a different server where the BI guys work with it.
In none of the videos articles I’ve seen is the explanation of how the rolling partition works on a long term daily basis.
- Are the partitions reused, like in a ROUND ROBIN fashion?
- Or, do you add new partitions each day with new filegroups, drop the oldest partition off – this would be FIFO?
Lots of folks assume the answer here is always #2, simply because there’s a bunch of sample code out there for it.
But option #1 can be simpler to manage when it fits your data retention technique!
Reusing partitions – the “rotating log” model
Reusing partitions can absolutely work if you have a fixed number of partitions.
By reusing partitions, you don’t need to manage the code or the locking required with splitting or merging your partition functions.
You can still leverage switching, however! As long as you know that data in a given partition doesn’t need to be read, you can switch the data out and switch new data into it. You can also truncate individual partitions as of SQL Server 2016. (That is, of course, going to require high level locks.)
This also means that you don’t have to constantly drop and recreate staging tables for switching data in, if you don’t want to.
Thomas Kejser writes more about the rotating log pattern here: http://kejser.org/table-
What if I don’t have a fixed number of partitions?
In Thomas’ model, the data which is switched out will be used to update an aggregate table of older data.
If you need to keep detail of older data, you could switch the data from the oldest partition into another table which is prepared as an archive table. You can even used indexed views to access data from both the “active” rolling table and the archive table together.
Remember that switching is only allowed when partitions are on the same filegroup, so you would want to plan this carefully if that was the case.
In the case of our questioner, older data will be moved out to another database, so it could simply be BCP’d out.
One note about filegroups
This question somewhat implies that each partition will be on its own filegroup.
Sometimes folks use way more file groups than they need with the sliding window model, and it added extra complexity to their project.
Filegroups can be very useful for:
- Individual restore sequences
- Running CHECKDB on part of the partitioned table with DBCC CHECKFILEGROUP (there is no partition level checkdb)
- Putting some data on slower storage and some data on faster storage
However, if you are planning a very granular partition scheme, having one partition per filegroup can lead to a lot of wasted space if you ever rebuild data in each partition. It’s worth thinking about if it makes sense to group multiple partitions on the same filegroups.
Thanks for the great question!
If you have a question, ask it here.