Freitag, 27. Februar 2009

Cleaning all unnecessary versions via SQL

Symbol-StopAfter my first small paternity leave from about two weeks I got a direct tweed asking for a sql script which is cleaning all unnecessary versions from all nodes at once. Unlike to the SQL Script posted before which deletes all versions from a given nodeId (usefull if you have a massive growing database with thousand of versions for one node) this script will delete all unnecessary versions from all nodes at once. As usual this is a works on my machine post. Please make a backup from your database and report misbehaviour.

How it works

At first the script will create a temporary table and inserts all nodeId's from nodes which are documents (remember, not only documents have versions, also members and medias can have versions) and which are published and not in the recycle bin.

-- Create a temporary table for all documents which are published and not in the recycle bin
CREATE TABLE #Nodes (id int)
-- Delete all rows if the table exists before
TRUNCATE TABLE #Nodes

-- Insert all nodeIds from all documents which are published and not in the recycle bin
INSERT INTO #Nodes
    SELECT N.id
    FROM umbracoNode N
        INNER JOIN cmsDocument D ON N.ID = D.NodeId
    WHERE nodeObjectType = 'C66BA18E-EAF3-4CFF-8A22-41B16D66A972'
        AND [path] NOT LIKE '%-20%'
        AND D.Published = 1

After that it creates another temporary table and inserts all versions from the nodeIds in the first temporary table which not are published an which not are the newest ones.

-- Create a temporary table for all versionId's to delete
CREATE TABLE #Versions (id UniqueIdentifier)
-- Delete all rows if it exists before
TRUNCATE TABLE #Versions

-- Insert all versionId's from all nodeIds in the #Nodes table
-- and where published is set to false and newest is set to false
INSERT INTO #Versions
    SELECT versionId
    FROM cmsDocument
    WHERE nodeId IN (SELECT id FROM #Nodes)
        AND published = 0 AND newest = 0

Now it will delete all rows from the cmsPropertyData, cmsContentVersion and cmsDocument where the versionId's are in the second temporary table.

-- DELETE all versions from cmsPropertyData, cmsContentVersion, cmsDocument
-- from the nodes which are published and which are not in the recycle bin
-- and which are not published and which are not the newest
DELETE FROM cmsPropertyData WHERE VersionId IN (SELECT id FROM #Versions)
DELETE FROM cmsContentVersion WHERE VersionId IN (SELECT id FROM #Versions)
DELETE FROM cmsDocument WHERE VersionId IN (SELECT id FROM #Versions)

Unlike the script in the post Cleaning versions in the umbraco db for given nodes via SQL this script doesn't delete the indexes. If the script runs to long you are free to add these step to this script. On the other hand this script can be used as a regularly cleanup.

Download the complete script here and give me your feedback.

back to overview Comments: 5

Comments

  1. Comicsken wrote at Thu, 05 Mar 2009 18:14:58 GMTThanks a lot! i applied the script on a backup i could get my hands on and it went from 5,7gb to 3,8 gb One question, i ran the SELECT nodeId, COUNT(*) Number FROM cmsDocument GROUP BY nodeId HAVING COUNT(*) > 2 ORDER BY Number DESC and it gives me the same numbers before and after, but shouldn't it be different? As said before, really thanks a lot for the help! Really appreciate it :)
  2. Thomas Höhler wrote at Thu, 05 Mar 2009 21:09:24 GMT3.8 gb seems to be absolutely too much. In my company we have 6 sites running on one umbraco and have a db size from 50 mb. Did you run the identifying script from http://blog.thoehler.com/2008/12/10/understanding-the-umbraco-database. Regarding the select statement: do you have the same numbers of Count(*) or the same numbers of rows?
  3. Comicsken wrote at Fri, 06 Mar 2009 00:00:23 GMTHi Thomas! Thanks for the reply! 3.8gb is idd too much, i agree. Yes, i ran the script, both numbers where the same: the count(*) and the numbers of rows returned. Hence my question, it looked weird.
  4. James wrote at Mon, 25 May 2009 11:03:44 GMTI ran this one too trying to clean up all the multiple nodes. It ran once through ok, and got a lot of them, but there is still one node with 20 odd thousand versions. I tried the other one that just does individual nodes, but it gives the error I mentioned on that page. Trying to re-run this one I get: Msg 2714, Level 16, State 6, Line 2 There is already an object named '#Nodes' in the database. I tried dropping the #nodes table but that doesn't work, neither does running just a truncate...
  5. Gerty wrote at Thu, 22 Apr 2010 14:36:06 GMTTo limit it in time (f.e. delete only versions older than 2 months) change INSERT INTO #Versions SELECT versionId FROM cmsDocument WHERE nodeId IN (SELECT id FROM #Nodes) AND published = 0 AND newest = 0 to INSERT INTO #Versions SELECT versionId FROM cmsDocument WHERE nodeId IN (SELECT id FROM #Nodes) AND published = 0 AND newest = 0 and updateDate < dateadd(month,-2,getdate())

Add a Comment







Archive


Subscribe Feed

Me on Twitter

09.07.10 20:02:  Aaaaadler ;-)

09.07.10 20:01:  Einmal Bier holen = Tor verpasst => Der Weg in den Keller ist eindeutig zu lang...

09.04.10 06:04:  @Shazwazza Done ;-) #ucomponents #Umbraco

09.03.10 20:43:  @netaddicts LOL

09.03.10 20:43:  @asp_net wie war das bei der 2010 Roadshow, Zitat: 'google das mal mit Bing'

About

It's me
This blog is written by Thomas Höhler. Living next to Frankfurt, Germany, I am trying to share my experiences about Umbraco [the most flexible CMS I know], my ClientTools for Umbraco, and some other more or less usefull stuff

XING

Latest Posts

Design & Masterpage by 69° media solutions