Blogs tagged with 'Umbraco Database'

Freitag, 27. Februar 2009

Cleaning all unnecessary versions via SQL

Symbol-StopAfter my first small paternity leave from about two weeks I got a direct tweed asking for a sql script which is cleaning all unnecessary versions from all nodes at once. Unlike to the SQL Script posted before which deletes all versions from a given nodeId (usefull if you have a massive growing database with thousand of versions for one node) this script will delete all unnecessary versions from all nodes at once. As usual this is a works on my machine post. Please make a backup from your database and report misbehaviour.

How it works

At first the script will create a temporary table and inserts all nodeId's from nodes which are documents (remember, not only documents have versions, also members and medias can have versions) and which are published and not in the recycle bin.

-- Create a temporary table for all documents which are published and not in the recycle bin
CREATE TABLE #Nodes (id int)
-- Delete all rows if the table exists before
TRUNCATE TABLE #Nodes

-- Insert all nodeIds from all documents which are published and not in the recycle bin
INSERT INTO #Nodes
    SELECT N.id
    FROM umbracoNode N
        INNER JOIN cmsDocument D ON N.ID = D.NodeId
    WHERE nodeObjectType = 'C66BA18E-EAF3-4CFF-8A22-41B16D66A972'
        AND [path] NOT LIKE '%-20%'
        AND D.Published = 1

After that it creates another temporary table and inserts all versions from the nodeIds in the first temporary table which not are published an which not are the newest ones.

-- Create a temporary table for all versionId's to delete
CREATE TABLE #Versions (id UniqueIdentifier)
-- Delete all rows if it exists before
TRUNCATE TABLE #Versions

-- Insert all versionId's from all nodeIds in the #Nodes table
-- and where published is set to false and newest is set to false
INSERT INTO #Versions
    SELECT versionId
    FROM cmsDocument
    WHERE nodeId IN (SELECT id FROM #Nodes)
        AND published = 0 AND newest = 0

Now it will delete all rows from the cmsPropertyData, cmsContentVersion and cmsDocument where the versionId's are in the second temporary table.

-- DELETE all versions from cmsPropertyData, cmsContentVersion, cmsDocument
-- from the nodes which are published and which are not in the recycle bin
-- and which are not published and which are not the newest
DELETE FROM cmsPropertyData WHERE VersionId IN (SELECT id FROM #Versions)
DELETE FROM cmsContentVersion WHERE VersionId IN (SELECT id FROM #Versions)
DELETE FROM cmsDocument WHERE VersionId IN (SELECT id FROM #Versions)

Unlike the script in the post Cleaning versions in the umbraco db for given nodes via SQL this script doesn't delete the indexes. If the script runs to long you are free to add these step to this script. On the other hand this script can be used as a regularly cleanup.

Download the complete script here and give me your feedback.

Comments: 5  Add Comment Filed under: Umbraco | Umbraco Database |
Donnerstag, 08. Januar 2009

Cleaning versions in the umbraco db for given nodes via SQL

In this thread I want to explain how to clean the umbraco database from unnecessary versions to reduce the database size. Again I got some mails pointing me to help them to clean the database from millions of versions some nodes have. In one of the last mails Jennifer mentioned that she has 4 nodes which have each more than 900.000. This results in a size for the cmsPropertyData table of more than 1.7 GB.

Symbol-StopBefore we go on I have to mention again that this post is a "Works on my machine" post. Please be sure what you do and please, please make a backup of your database first. Also please read my post Understanding the umbraco database first to get a glue of what you are doing.

If you only have a few nodes which have millions of versions you can clean them easily via SQL, far from it it is the easiest way. In the next paragraphs I will show how you can do it by yourself.

Identifying the nodes

First of all identify the nodes which have the various versions and their ids. As in my first post described it can be done with the following SQL:

SELECT nodeId, COUNT(*) Number
FROM cmsDocument
GROUP BY nodeId
HAVING COUNT(*) > 2
ORDER BY Number DESC

Just mark the ids down.

Get the newest and the published version

Each time umbraco saves or publishes a node a new version is created in the database. But for each published version there is always a newer version in the database. But why? Umbraco is by concept working with a new version (a copy of the published version, or a copy of the last saved version) and not the one which is published. So it prevents the user to break the published version. So if you publish a node the actual version you are working on is saved as the published one, the previously published version is marked as not published, a new version is created and send back to the umbraco user to let him work with this. Each published node has a published and a newest row in cmsDocument:

umbracocmsdocument

For each id (node) you now have to identify which version is the newest and which version is the published one. I added some variables via DECLARE for typing horror:

DECLARE @id int
SET @id = 3243

DECLARE @newest UNIQUEIDENTIFIER
SET @newest = (SELECT versionId FROM cmsDocument WHERE nodeId = @id AND newest = 1)

DECLARE @published UNIQUEIDENTIFIER
SET @published = (SELECT versionId FROM cmsDocument WHERE nodeId = @id AND published = 1)

Optional: Delete the indexes

The cmsPropertyData table has four indexes for better reading performance. For deleting it slows down and let the SQL Server log file grow heavily. So if you have the possibility you can delete the indexes first like this:

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[cmsPropertyData]') AND name = N'IX_cmsPropertyData')
DROP INDEX [IX_cmsPropertyData] ON [dbo].[cmsPropertyData] WITH ( ONLINE = OFF )

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[cmsPropertyData]') AND name = N'IX_cmsPropertyData_1')
DROP INDEX [IX_cmsPropertyData_1] ON [dbo].[cmsPropertyData] WITH ( ONLINE = OFF )

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[cmsPropertyData]') AND name = N'IX_cmsPropertyData_2')
DROP INDEX [IX_cmsPropertyData_2] ON [dbo].[cmsPropertyData] WITH ( ONLINE = OFF )

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[cmsPropertyData]') AND name = N'IX_cmsPropertyData_3')
DROP INDEX [IX_cmsPropertyData_3] ON [dbo].[cmsPropertyData] WITH ( ONLINE = OFF )

You can create these entry's by a rightclick on the index in SQL Server Management Studio.

Delete the unnecessary versions

Now you can delete the not needed versions via

DELETE FROM cmsPropertyData WHERE contentNodeId = @id AND versionId NOT IN (@newest, @published)
DELETE FROM cmsContentVersion WHERE ContentId = @id AND VersionId NOT IN (@newest, @published)
DELETE FROM cmsDocument WHERE nodeId = @id AND versionId NOT IN (@newest, @published)

Be aware that these scripts can last a very long time thus you deleted the indexes (without deleting the indexes it will last hours ;-))

Recreate indexes

After deleting all unneeded versions you can now recreate the indexes:

CREATE NONCLUSTERED INDEX [IX_cmsPropertyData] ON [dbo].[cmsPropertyData]
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_cmsPropertyData_1] ON [dbo].[cmsPropertyData]
(
    [contentNodeId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_cmsPropertyData_2] ON [dbo].[cmsPropertyData]
(
    [versionId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [IX_cmsPropertyData_3] ON [dbo].[cmsPropertyData]
(
    [propertytypeid] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

After completing the cleaning you now can reorganize or shrink your database files.

Conclusion

The complete SQL script can be downloaded here.

I hope I could show how various versions easily can be deleted via SQL if you have a growing database like in this umbraco forum post described. As mentioned before please be careful with your database and make a backup first. If you have any suggestions just write a comment or get in contact with me via mail (th |at| thoehler |dot| com).

Comments: 4  Add Comment Filed under: Umbraco | Umbraco Database |
Mittwoch, 10. Dezember 2008

Understanding the umbraco database

In the last weeks I got many mails asking for help regarding a growing umbraco database. Two months ago I helped a friend cleaning his database which was grown up to 8 GB. But be aware: working in the database is not best practice. We don't know exactly the internal relations how umbraco handles the entries in the database. We don't know what effects a direct change in the database has.

Symbol-StopRegarding this I have to say: this post is a "Works on my machine" post. Please be sure what you do and please, please make a backup of your database first.

umbracoNode

Ok, one of the main things you have to know regarding the umbraco database is the umbracoNode table. This table is the main table where all things in umbraco, regardless of which type or meaning, are stored. This means that there are content nodes, members, document types, property types, and all other things in this table. How does umbraco now identify what kind of type this row is? Is it a content node or a member or a document type? The answer is stored in the column nodeObjectType. Each type has its own GUID. The GUID is defined in the code of Umbraco. For example: a document has the GUID "c66ba18e-eaf3-4cff-8a22-41b16d66a972". It is defined in the umbraco.cms.businesslogic.web.Document class:

namespace umbraco.cms.businesslogic.web
{
    /// <summary>
    /// Document represents a webpage,
    /// type (umbraco.cms.businesslogic.web.DocumentType)
    ///
    /// Pubished Documents are exposed to the runtime/the public website in a cached xml document.
    /// </summary>
    public class Document : Content
    {
        public static Guid _objectType = new Guid("c66ba18e-eaf3-4cff-8a22-41b16d66a972");
        ...

Documents and versions

After knowing what the umbracoNode table is we can try to identify which tables are related to documents and versions. We have to look at the following tables:

  • cmsContent:
    The cmsContent table contains only the nodeids from all content documents (this means all ids from the umbracoNode where the nodeobjectType is the document guid).
  • cmsContentVersion
    The cmsContentVersion contains all definitions from all versions a node (also members etc.) has. Each version is stored with the create date and an own guid for identifying.
  • cmsContentXml
    The cmsContentXml stores the xml representation of each content node from the published version.
  • cmsDocument
    The cmsDocument stores some information for each node per version. here is defined if this version is published or if it is the newest.
  • cmsPropertyData
    The cmsPropertyData now stores for each node per version and per created "Generic Property".

You can see all these tables and their database relations here:

umbracodatabase

Identifying with which nodes you have problems

Coming back to the growing databases. You can now identify which nodes have massive publishing issues by the number of versions the nodes have. For example you can run the following SQL:

SELECT nodeId, COUNT(*) Number 
FROM cmsDocument
GROUP BY nodeId
HAVING COUNT(*) > 2
ORDER BY Number DESC 

This SQL gives you all nodeIds which have more than two versions (a published node has at minimum two versions). Check the nodes with massive numbers, unpublish it or delete it manually and the issue should stop.

I hope you got a glue on how umbraco acts in the database. If you have massive number of versions you now can identify the related nodes. In the next post I will explain how to delete the old versions manually or by my ClientTools.

Thomas

Comments: 8  Add Comment Filed under: Umbraco | Umbraco Database |

Archive


Subscribe Feed

Me on Twitter

09.07.10 20:02:  Aaaaadler ;-)

09.07.10 20:01:  Einmal Bier holen = Tor verpasst => Der Weg in den Keller ist eindeutig zu lang...

09.04.10 06:04:  @Shazwazza Done ;-) #ucomponents #Umbraco

09.03.10 20:43:  @netaddicts LOL

09.03.10 20:43:  @asp_net wie war das bei der 2010 Roadshow, Zitat: 'google das mal mit Bing'

About

It's me
This blog is written by Thomas Höhler. Living next to Frankfurt, Germany, I am trying to share my experiences about Umbraco [the most flexible CMS I know], my ClientTools for Umbraco, and some other more or less usefull stuff

XING

Latest Posts

Design & Masterpage by 69° media solutions