Mittwoch, 10. Dezember 2008

Understanding the umbraco database

In the last weeks I got many mails asking for help regarding a growing umbraco database. Two months ago I helped a friend cleaning his database which was grown up to 8 GB. But be aware: working in the database is not best practice. We don't know exactly the internal relations how umbraco handles the entries in the database. We don't know what effects a direct change in the database has.

Symbol-StopRegarding this I have to say: this post is a "Works on my machine" post. Please be sure what you do and please, please make a backup of your database first.

umbracoNode

Ok, one of the main things you have to know regarding the umbraco database is the umbracoNode table. This table is the main table where all things in umbraco, regardless of which type or meaning, are stored. This means that there are content nodes, members, document types, property types, and all other things in this table. How does umbraco now identify what kind of type this row is? Is it a content node or a member or a document type? The answer is stored in the column nodeObjectType. Each type has its own GUID. The GUID is defined in the code of Umbraco. For example: a document has the GUID "c66ba18e-eaf3-4cff-8a22-41b16d66a972". It is defined in the umbraco.cms.businesslogic.web.Document class:

namespace umbraco.cms.businesslogic.web
{
    /// <summary>
    /// Document represents a webpage,
    /// type (umbraco.cms.businesslogic.web.DocumentType)
    ///
    /// Pubished Documents are exposed to the runtime/the public website in a cached xml document.
    /// </summary>
    public class Document : Content
    {
        public static Guid _objectType = new Guid("c66ba18e-eaf3-4cff-8a22-41b16d66a972");
        ...

Documents and versions

After knowing what the umbracoNode table is we can try to identify which tables are related to documents and versions. We have to look at the following tables:

  • cmsContent:
    The cmsContent table contains only the nodeids from all content documents (this means all ids from the umbracoNode where the nodeobjectType is the document guid).
  • cmsContentVersion
    The cmsContentVersion contains all definitions from all versions a node (also members etc.) has. Each version is stored with the create date and an own guid for identifying.
  • cmsContentXml
    The cmsContentXml stores the xml representation of each content node from the published version.
  • cmsDocument
    The cmsDocument stores some information for each node per version. here is defined if this version is published or if it is the newest.
  • cmsPropertyData
    The cmsPropertyData now stores for each node per version and per created "Generic Property".

You can see all these tables and their database relations here:

umbracodatabase

Identifying with which nodes you have problems

Coming back to the growing databases. You can now identify which nodes have massive publishing issues by the number of versions the nodes have. For example you can run the following SQL:

SELECT nodeId, COUNT(*) Number 
FROM cmsDocument
GROUP BY nodeId
HAVING COUNT(*) > 2
ORDER BY Number DESC 

This SQL gives you all nodeIds which have more than two versions (a published node has at minimum two versions). Check the nodes with massive numbers, unpublish it or delete it manually and the issue should stop.

I hope you got a glue on how umbraco acts in the database. If you have massive number of versions you now can identify the related nodes. In the next post I will explain how to delete the old versions manually or by my ClientTools.

Thomas

back to overview Comments: 8

Comments

  1. Daniel wrote at Thu, 11 Dec 2008 12:48:50 GMTGreat post! Nice introduction to the Umbraco database design. Looking forward to the next post.
  2. Ruben Verborgh wrote at Thu, 11 Dec 2008 18:25:08 GMTHi Thomas, I'm glad you finally have a blog. Just started mine about a month ago... Can I subscribe to an RSS feed? Ruben
  3. Thomas Höhler wrote at Fri, 12 Dec 2008 09:58:42 GMTHi Ruben, the RSS isn't implmented yet, but it will come asap. Thomas
  4. Thomas Höhler wrote at Wed, 17 Dec 2008 16:33:48 GMTRSS is now integrated. I have only some sorting problems where I will look into it the next week. Thomas
  5. Ruben Verborgh wrote at Wed, 17 Dec 2008 16:35:26 GMTSubscribed! I really like your RSS icon ;)
  6. Thomas Höhler wrote at Wed, 17 Dec 2008 16:41:47 GMTThe icon is also included in the masterpage design from 69°: (http://www.69grad.de/171/en/community/masterpages.html). But you are right: it is really neat. I did choose this design only because of this icon ;-)
  7. Jessica Hart wrote at Tue, 06 Jan 2009 01:46:03 GMTGreat article. When will your post appear on how to delete the nodes manually? I currently have a database which has four nodes all with more than 900,000 versions!!!
  8. Hendy Racher wrote at Sun, 18 Jan 2009 19:53:22 GMTHi Thomas, Just wanted to say thanks for this article - it prompted me to take a closer look at the db to see if I could figure out how it works too. Look forward to the next version of your client tools. if you had a moment I don't suppose you could have a look at the constraints mentioned in this post (http://blog.hendyracher.co.uk/umbraco-database-part-1/) to see if they tally up with your view of the data ? Thanks, Hendy

Add a Comment







Archive


Subscribe Feed

Me on Twitter

02.04.12 11:50:  wpi feed wurde noch nicht aktualisiert

02.04.12 10:31:  Yihaa, ich sehe Schnee

02.04.12 09:52:  @Kilroy576 winke winke zurück und danke

02.03.12 20:51:  +1 RT @attack_monkey: @umbraco stalker.Umbraco.com

02.03.12 20:51:  Das passiert wenn man Kinder mal unbeaufsichtigt lässt: http://t.co/p2HrAPm0 #monster

About

It's me
This blog is written by Thomas Höhler. Living next to Frankfurt, Germany, I am trying to share my experiences about Umbraco [the most flexible CMS I know], my ClientTools for Umbraco, and some other more or less usefull stuff

XING

Latest Posts

Design & Masterpage by 69° media solutions