Donnerstag, 08. Januar 2009

Cleaning versions in the umbraco db for given nodes via SQL

In this thread I want to explain how to clean the umbraco database from unnecessary versions to reduce the database size. Again I got some mails pointing me to help them to clean the database from millions of versions some nodes have. In one of the last mails Jennifer mentioned that she has 4 nodes which have each more than 900.000. This results in a size for the cmsPropertyData table of more than 1.7 GB.

Symbol-StopBefore we go on I have to mention again that this post is a "Works on my machine" post. Please be sure what you do and please, please make a backup of your database first. Also please read my post Understanding the umbraco database first to get a glue of what you are doing.

If you only have a few nodes which have millions of versions you can clean them easily via SQL, far from it it is the easiest way. In the next paragraphs I will show how you can do it by yourself.

Identifying the nodes

First of all identify the nodes which have the various versions and their ids. As in my first post described it can be done with the following SQL:

SELECT nodeId, COUNT(*) Number
FROM cmsDocument
GROUP BY nodeId
HAVING COUNT(*) > 2
ORDER BY Number DESC

Just mark the ids down.

Get the newest and the published version

Each time umbraco saves or publishes a node a new version is created in the database. But for each published version there is always a newer version in the database. But why? Umbraco is by concept working with a new version (a copy of the published version, or a copy of the last saved version) and not the one which is published. So it prevents the user to break the published version. So if you publish a node the actual version you are working on is saved as the published one, the previously published version is marked as not published, a new version is created and send back to the umbraco user to let him work with this. Each published node has a published and a newest row in cmsDocument:

umbracocmsdocument

For each id (node) you now have to identify which version is the newest and which version is the published one. I added some variables via DECLARE for typing horror:

DECLARE @id int
SET @id = 3243

DECLARE @newest UNIQUEIDENTIFIER
SET @newest = (SELECT versionId FROM cmsDocument WHERE nodeId = @id AND newest = 1)

DECLARE @published UNIQUEIDENTIFIER
SET @published = (SELECT versionId FROM cmsDocument WHERE nodeId = @id AND published = 1)

Optional: Delete the indexes

The cmsPropertyData table has four indexes for better reading performance. For deleting it slows down and let the SQL Server log file grow heavily. So if you have the possibility you can delete the indexes first like this:

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[cmsPropertyData]') AND name = N'IX_cmsPropertyData')
DROP INDEX [IX_cmsPropertyData] ON [dbo].[cmsPropertyData] WITH ( ONLINE = OFF )

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[cmsPropertyData]') AND name = N'IX_cmsPropertyData_1')
DROP INDEX [IX_cmsPropertyData_1] ON [dbo].[cmsPropertyData] WITH ( ONLINE = OFF )

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[cmsPropertyData]') AND name = N'IX_cmsPropertyData_2')
DROP INDEX [IX_cmsPropertyData_2] ON [dbo].[cmsPropertyData] WITH ( ONLINE = OFF )

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[cmsPropertyData]') AND name = N'IX_cmsPropertyData_3')
DROP INDEX [IX_cmsPropertyData_3] ON [dbo].[cmsPropertyData] WITH ( ONLINE = OFF )

You can create these entry's by a rightclick on the index in SQL Server Management Studio.

Delete the unnecessary versions

Now you can delete the not needed versions via

DELETE FROM cmsPropertyData WHERE contentNodeId = @id AND versionId NOT IN (@newest, @published)
DELETE FROM cmsContentVersion WHERE ContentId = @id AND VersionId NOT IN (@newest, @published)
DELETE FROM cmsDocument WHERE nodeId = @id AND versionId NOT IN (@newest, @published)

Be aware that these scripts can last a very long time thus you deleted the indexes (without deleting the indexes it will last hours ;-))

Recreate indexes

After deleting all unneeded versions you can now recreate the indexes:

CREATE NONCLUSTERED INDEX [IX_cmsPropertyData] ON [dbo].[cmsPropertyData]
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_cmsPropertyData_1] ON [dbo].[cmsPropertyData]
(
    [contentNodeId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_cmsPropertyData_2] ON [dbo].[cmsPropertyData]
(
    [versionId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_cmsPropertyData_3] ON [dbo].[cmsPropertyData]
(
    [propertytypeid] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

After completing the cleaning you now can reorganize or shrink your database files.

Conclusion

The complete SQL script can be downloaded here.

I hope I could show how various versions easily can be deleted via SQL if you have a growing database like in this umbraco forum post described. As mentioned before please be careful with your database and make a backup first. If you have any suggestions just write a comment or get in contact with me via mail (th |at| thoehler |dot| com).

back to overview Comments: 4

Comments

  1. James wrote at Mon, 25 May 2009 10:56:40 GMTInterestingly I ran this on an out of control Umbraco DB and got the below: Msg 512, Level 16, State 1, Line 11 Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression. (0 row(s) affected) (0 row(s) affected) (0 row(s) affected) Any ideas?
  2. Christian Palm wrote at Tue, 07 Jul 2009 13:10:58 GMTThanks, work out perfect. Just need to do some tweaks. In umbraco 3.x there is more than 4 indexes on [cmsPropertyData] The problem that James describe was also a problem for me. Changed it to: SET @newest = (SELECT TOP 1 versionId FROM cmsDocument WHERE nodeId = @id AND newest = 1 ORDER BY newest DESC, updateDate DESC) I was going from 687.009 versions down to 188.039 by clearing versions from 3 nodes. There must be a bug in the website
  3. Thomas Höhler wrote at Wed, 08 Jul 2009 14:15:02 GMTGlad that it helped you. The problem with the growing db was only encountered in v3. I never heared about it in v4. It was a bug in < v3.0.7 I guess. I did that script for an 8 GB db which I shrinked down to 230 MB. Thanks for the update of the script.
  4. Michael Wheelaghan wrote at Tue, 27 Oct 2009 01:25:40 GMTGreat info here! I read on the Umbraco forums you had some success in incorporating your own webservice to handle the creation of media types? Any info you can share would be greatly appreciated on that So far creating documents seems fine thru asmx, but for media the only way I can see is to insert directly into the umbracoNode table as Media.MakeNew accepts only a User type which requires the web context, however Document types permit soap authentication...any light u can shed on this greatly appreciated. cheers !

Add a Comment







Archive


Subscribe Feed

Me on Twitter

07.29.10 18:04:  So, auf in die Pfalz... 5 Tage raus aus dem Alltag...

07.28.10 20:19:  Erst mal den R# installieren, ohne ist schon doof wenn man's mal gewohnt ist...

07.28.10 06:03:  @thorstenh Ein Michey Mouse NB: so ziemlich das billigste Sony: http://tinyurl.com/2wv5yfs aber läuft 1a und ist erst mal ausreichend

07.27.10 20:50:  Installationsorgie...

07.27.10 20:23:  Ahhh, neues Notebook in action...

About

It's me
This blog is written by Thomas Höhler. Living next to Frankfurt, Germany, I am trying to share my experiences about Umbraco [the most flexible CMS I know], my ClientTools for Umbraco, and some other more or less usefull stuff

XING

Latest Posts

Design & Masterpage by 69° media solutions