Larry Steinle

November 3, 2013

Using the Command Pattern to Store Versionable Data


In the article, Create a Persistent Data Structure, we learned the fundamental theories to provide versionable data management within a database. Today’s article moves out of the abstract theory and into a working code sample demonstrating how to create a versionable database using the command pattern.

Background

Data ages. Each new change to a database changes the meaning of the data. In some cases it is enlightening to be able to view the data as it existed at a previous point-in-time. That’s the purpose of a persistent data structure. To provide the ability to access the data as it currently exists, as it existed a week ago or even years ago.

There are numerous articles on the internet describing complex relational strategies to accomplish the goals of a relational data structure. However, in all the articles I have read over the years not one single article considered using a software code pattern. In the article, Create a Persistent Data Structure, I reviewed the standard database design strategies and then suggested a very simple solution…Use the command pattern.

The command pattern offers the ability to store historical data more efficiently, provides a fully auditable system while introducing the capability to undo/redo any given action without any extra effort!

Strategy

The command pattern is simple. Instead of storing the data we store the action. In one sense we are creating our own transaction log. Using the custom stored transaction log we can rebuild the database structure at any desired interval. It’s that simple.

This article will introduce a simple demonstration of how to use the command pattern to create a persistent data structure. The article will demonstrate the basics. Additional work will be required to make this suitable to any project. For example, in this article I will use a single table to store all the actions while your own system may require this ability to be split up among multiple tables. Perhaps you will need a relational table version that stores the current data set with the transactional commands stored in another table to provide the ability to access historical information. This way the system remains fast and responsive for 90% of the activities and when the historical information is needed the slower transactions can be reconstructed to get access to the data as it existed in the past.

The sole point of this article is to demonstrate a very simple solution to the persistent data structure problem. It is my sincere hope that this article will provide fresh perspective into a decades-old strategy of coding. Namely, don’t look at a database just as a set of related data. Don’t look at data just in terms of storage. Instead, open your eyes and look at data as a living, breathing element in the database. Information that flows thru a process. A process that itself can be stored in the system bringing data to life.

Let’s begin!

The Challenge

The example data set that we will work with is a binary tree. The tree that will be stored in the system is based on the animal class relationship.

A binary tree structure provides an additional challenge to a relational data model. It represents one of the most difficult to version because any folder change affects all child folders. The command pattern comes to the rescue making it extremely simple to create a versionable data tree.

Storing Actions; Not Data

There are three actions that we will track for our example today:

  • Adding a node to the tree,
  • Removing a node from the tree, and
  • Adding a version label to quickly and easily access the tree as it existed at a given point-in-time.

These three actions are enough to demonstrate the power of the command pattern. Of course in your own solution you will need track update changes and perhaps other customer actions. But for today, simply being able to add and delete is enough.

We begin by create a Commands table that will host the supported actions that we will track in our persistent database.

SQL Server Code Sample

IF OBJECT_ID('dbo.Commands', 'U') IS NOT NULL DROP TABLE dbo.Commands;
CREATE TABLE dbo.Commands (
  CommandID   INTEGER     NOT NULL,
  CommandName VARCHAR(50) NOT NULL
);
INSERT INTO dbo.Commands (CommandID, CommandName)
VALUES (1, 'CreateNode'),
       (2, 'DeleteNode'),
	   (3, 'VersionNode');

Oracle 10g Code Sample

BEGIN
  EXECUTE IMMEDIATE 'DROP TABLE Commands PURGE';
EXCEPTION WHEN OTHERS THEN NULL;
END;
/
CREATE TABLE Commands (
  CommandID   INT         NOT NULL,
  CommandName VARCHAR(50) NOT NULL
);
INSERT INTO Commands (CommandID, CommandName) VALUES (1, 'CreateNode');
INSERT INTO Commands (CommandID, CommandName) VALUES (2, 'DeleteNode');
INSERT INTO Commands (CommandID, CommandName) VALUES (3, 'Version');

Now that the supported commands are defined we need a table to track the individual actions. This command log is a type of transaction log of all changes made to the database structure we want to persist.

SQL Server Code Sample

IF OBJECT_ID('dbo.CommandLog', 'U') IS NOT NULL DROP TABLE dbo.CommandLog;
CREATE TABLE dbo.CommandLog (
  CommandLogID  INTEGER          IDENTITY(1,1) PRIMARY KEY,
  EffectiveDate DATETIME         NOT NULL DEFAULT GETDATE(),
  CommandID     INTEGER          NOT NULL,
  ParentID      UNIQUEIDENTIFIER,
  NodeID        UNIQUEIDENTIFIER,
  OriginalID    UNIQUEIDENTIFIER,
  Value         VARCHAR(MAX)
);
GO

Oracle 10g Code Sample

BEGIN
  EXECUTE IMMEDIATE 'DROP TABLE CommandLog PURGE';
EXCEPTION WHEN OTHERS THEN NULL;
END;
/
CREATE TABLE CommandLog (
  CommandLogID  RAW(16)  DEFAULT SYS_GUID() NOT NULL,
  EffectiveDate DATE     DEFAULT SYSDATE NOT NULL,
  CommandID     INTEGER  NOT NULL,
  ParentID      RAW(16),
  NodeID        RAW(16),
  OriginalID    RAW(16),
  Value         VARCHAR2(4000)
);

If we wanted an to audit all actions in our system we could simply add a field to the CommandLog to track which user performed the action. If we wanted undo/redo capability we could add an active bit field that would disable the last command, or re-enable the last command. Any disabled actions then would be excluded when constructing the data structure.

The CommandLog provides a centralized table to manage the actions the user wants to take against the data structure.

Helper Functions

For each supported command we will create a separate function. These functions will ensure that the data is stored in our system consistently and correctly.

  • The CreateNode function will be responsible for adding nodes to the tree. It will support adding the node at the root level or under any branch node in the list.
  • The DeleteNode function will be responsible for removing a node and its child nodes from the tree.
  • Finally, the AddVersion function will provide a stamp making it easier to remember important points in the construction of the tree. Versions provide the ability to retrieve the data as it exists at the point-in-time the version was attached to the tree.

SQL Server Code Sample

IF OBJECT_ID('dbo.CreateNode', 'P') IS NOT NULL DROP PROCEDURE dbo.CreateNode;
GO
CREATE PROCEDURE dbo.CreateNode
  @nodeName VARCHAR(50),
  @parentId UNIQUEIDENTIFIER = NULL,
  @nodeId   UNIQUEIDENTIFIER OUTPUT
AS BEGIN
  SET @nodeId = NEWID();
  INSERT INTO dbo.CommandLog (CommandID, ParentID, NodeID, Value)
  VALUES (1, @parentId, @nodeId, @nodeName);
END;
GO
IF OBJECT_ID('dbo.DeleteNode', 'P') IS NOT NULL DROP PROCEDURE dbo.DeleteNode;
GO
CREATE PROCEDURE dbo.DeleteNode
  @nodeId UNIQUEIDENTIFIER
AS BEGIN
  INSERT INTO dbo.CommandLog (CommandID, NodeID)
  VALUES (2, @nodeId);
END;
GO
IF OBJECT_ID('dbo.AddVersion', 'P') IS NOT NULL DROP PROCEDURE dbo.AddVersion;
GO
CREATE PROCEDURE AddVersion
  @labelName VARCHAR(50)
AS BEGIN
	INSERT INTO CommandLog (CommandID, Value)
	VALUES (3, @labelName);
END;
GO

Oracle 10g Code Sample

BEGIN
  EXECUTE IMMEDIATE 'DROP PACKAGE BODY Persister';
EXCEPTION WHEN OTHERS THEN NULL;
END;
/
BEGIN
  EXECUTE IMMEDIATE 'DROP PACKAGE Persister';
EXCEPTION WHEN OTHERS THEN NULL;
END;
/
CREATE OR REPLACE PACKAGE Persister IS

  PROCEDURE CreateNode(
    nodeName IN  VARCHAR2,
    parentId IN  RAW DEFAULT NULL,
    nodeId   OUT RAW
  );

  PROCEDURE DeleteNode(nodeId IN RAW);
  PROCEDURE AddVersion(labelName IN VARCHAR2);
END Persister;
/
CREATE OR REPLACE PACKAGE BODY Persister IS

  PROCEDURE CreateNode(
    nodeName IN  VARCHAR2,
    parentId IN  RAW DEFAULT NULL,
    nodeId   OUT RAW
  ) IS BEGIN
    nodeId := SYS_GUID();
    INSERT INTO CommandLog (CommandID, ParentID, NodeID, Value)
    VALUES (1, parentId, nodeId, nodeName);
  END CreateNode;

  PROCEDURE DeleteNode(nodeId IN RAW)
  IS BEGIN
    INSERT INTO CommandLog (CommandID, NodeID)
    VALUES (2, nodeId);
  END;

  PROCEDURE AddVersion(labelName IN VARCHAR2)
  IS BEGIN
    INSERT INTO CommandLog (CommandID, Value)
    VALUES (3, labelName);
  END;
END Persister;

Use a Facade

It is a standard security measure to wrap the management of our data inside procedures. Procedures provide additional safety against SQL Code Injection attacks as the data to be passed into the database engine is cast to a specific data type. By using arguments and parameters it is much more difficult for a hacker to inject his own code into our system.

This wrapper around the data manipulation statements is a facade. The procedures hide how the system will store the data establishing a common interface for how all external systems can persist or retrieve data from the database.

With our animal-relationship example, we should have separate procedures for adding and removing kingdom, phylum, class, order, family, genus and specific ethet data types to and from our database. These separate procedures would be facades around the helper procedures just created. When adding a new kingdom animal group to our system we would call CreateKingdom passing in the name of the kingdom. The CreateKingdom procedure would then call the CreateNode routine. This facade permits external systems to connect to our persistent data structure like any normal relational data structure.

Again, this example is meant to be simple. In our example I will not be creating facade routines. I merely mention it as in a real-world environment this would be a wise addition to hide the complexities we are introducing with our persistent data structure.

Adding Sample Data

Now to use the data from the animal class relationship diagram to create our data tree structure.

In our example data we will enter an incorrect structure. Then we will remove the errors and enter the correct structure. We will version the changes at appropriate points to simplify accessing the data as it exists at that point-in-time.

SQL Server Code Sample

DECLARE @Parent  UNIQUEIDENTIFIER;
DECLARE @Child   UNIQUEIDENTIFIER;
DECLARE @Felidae UNIQUEIDENTIFIER;
DECLARE @Canidae UNIQUEIDENTIFIER;
DECLARE @Ursidae UNIQUEIDENTIFIER;

EXECUTE dbo.CreateNode @nodeName = 'Animalia', @nodeId = @Child OUTPUT;
EXECUTE dbo.AddVersion @labelName = 'Kingdom';
SET @Parent = @Child;
SET @Child = NULL;

EXECUTE dbo.CreateNode @nodeName = 'Chordata', @parentId = @Parent, @nodeId = @Child OUTPUT;
EXECUTE dbo.AddVersion @labelName = 'Phylum';
SET @Parent = @Child;
SET @Child = NULL;

EXECUTE dbo.CreateNode @nodeName = 'Mammalia', @parentId = @Parent, @nodeId = @Child OUTPUT;
EXECUTE dbo.AddVersion @labelName = 'Class';
SET @Parent = @Child;
SET @Child = NULL;

EXECUTE dbo.CreateNode @nodeName = 'Carnivora', @parentId = @Parent, @nodeId = @Child OUTPUT;
EXECUTE dbo.AddVersion @labelName = 'Order';
SET @Parent = @Child;
SET @Child = NULL;

EXECUTE dbo.CreateNode @nodeName = 'Felidae', @parentId = @Parent, @nodeId = @Felidae OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Canidae', @parentId = @Parent, @nodeId = @Canidae OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Ursidae', @parentId = @Parent, @nodeId = @Ursidae OUTPUT;
EXECUTE dbo.AddVersion @labelName = 'Family';
  
EXECUTE dbo.CreateNode @nodeName = 'Catus', @parentId = @Felidae, @nodeId = @Child OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Pardalis', @parentId = @Felidae, @nodeId = @Child OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Perdus', @parentId = @Felidae, @nodeId = @Child OUTPUT;

EXECUTE dbo.CreateNode @nodeName = 'Familiaris', @parentId = @Canidae, @nodeId = @Child OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Lupus', @parentId = @Canidae, @nodeId = @Child OUTPUT;

EXECUTE dbo.CreateNode @nodeName = 'Arctos', @parentId = @Ursidae, @nodeId = @Child OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Horribilus', @parentId = @Ursidae, @nodeId = @Child OUTPUT;
EXECUTE dbo.AddVersion @labelName = 'Genus';

EXECUTE dbo.DeleteNode @nodeId = @Felidae;
EXECUTE dbo.DeleteNode @nodeId = @Canidae;
EXECUTE dbo.DeleteNode @nodeId = @Ursidae;
EXECUTE dbo.AddVersion @labelName = 'Remove Incorrect Associations';

EXECUTE dbo.CreateNode @nodeName = 'Felidae', @parentId = @Parent, @nodeId = @Felidae OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Canidae', @parentId = @Parent, @nodeId = @Canidae OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Ursidae', @parentId = @Parent, @nodeId = @Ursidae OUTPUT;

EXECUTE dbo.CreateNode @nodeName = 'Felis', @parentId = @Felidae, @nodeId = @Child OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Panthera', @parentId = @Felidae, @nodeId = @Child OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Canis', @parentId = @Canidae, @nodeId = @Child OUTPUT;
EXECUTE dbo.CreateNode @nodeName = 'Ursus', @parentId = @Ursidae, @nodeId = @Child OUTPUT;

EXECUTE dbo.AddVersion @labelName = 'Genus Correction';

Oracle 10g Code Sample

DECLARE 
  vParent  RAW(16);
  vChild   RAW(16);
  vFelidae RAW(16);
  vCanidae RAW(16);
  vUrsidae RAW(16);
BEGIN
  Persister.CreateNode(nodeName => 'Animalia', nodeId => vChild);
  Persister.AddVersion('Kingdom');
  vParent := vChild;
  vChild := NULL;

  Persister.CreateNode(nodeName => 'Chordata', parentId => vParent, nodeId => vChild);
  Persister.AddVersion('Phylum');
  vParent := vChild;
  vChild := NULL;

  Persister.CreateNode(nodeName => 'Mammalia', parentId => vParent, nodeId => vChild);
  Persister.AddVersion('Class');
  vParent := vChild;
  vChild := NULL;

  Persister.CreateNode(nodeName => 'Carnivora', parentId => vParent, nodeId => vChild);
  Persister.AddVersion('Order');
  vParent := vChild;
  vChild := NULL;

  Persister.CreateNode(nodeName => 'Felidae', parentId => vParent, nodeId => vFelidae);
  Persister.CreateNode(nodeName => 'Canidae', parentId => vParent, nodeId => vCanidae);
  Persister.CreateNode(nodeName => 'Ursidae', parentId => vParent, nodeId => vUrsidae);
  Persister.AddVersion('Family');
  
  Persister.CreateNode(nodeName => 'Catus', parentId => vFelidae, nodeId => vChild);
  Persister.CreateNode(nodeName => 'Pardalis', parentId => vFelidae, nodeId => vChild);
  Persister.CreateNode(nodeName => 'Perdus', parentId => vFelidae, nodeId => vChild);

  Persister.CreateNode(nodeName => 'Familiaris', parentId => vCanidae, nodeId => vChild);
  Persister.CreateNode(nodeName => 'Lupus', parentId => vCanidae, nodeId => vChild);

  Persister.CreateNode(nodeName => 'Arctos', parentId => vUrsidae, nodeId => vChild);
  Persister.CreateNode(nodeName => 'Horribilus', parentId => vUrsidae, nodeId => vChild);
  Persister.AddVersion('Genus');

  Persister.DeleteNode(nodeId => vFelidae);
  Persister.DeleteNode(nodeId => vCanidae);
  Persister.DeleteNode(nodeId => vUrsidae);
  Persister.AddVersion('Remove Incorrect Associations');

  Persister.CreateNode(nodeName => 'Felidae', parentId => vParent, nodeId => vFelidae);
  Persister.CreateNode(nodeName => 'Canidae', parentId => vParent, nodeId => vCanidae);
  Persister.CreateNode(nodeName => 'Ursidae', parentId => vParent, nodeId => vUrsidae);

  Persister.CreateNode(nodeName => 'Felis', parentId => vFelidae, nodeId => vChild);
  Persister.CreateNode(nodeName => 'Panthera', parentId => vFelidae, nodeId => vChild);
  Persister.CreateNode(nodeName => 'Canis', parentId => vCanidae, nodeId => vChild);
  Persister.CreateNode(nodeName => 'Ursus', parentId => vUrsidae, nodeId => vChild);

  Persister.AddVersion('Genus Correction');
END;

Retrieving the Data Tree

Now that the activities against the database are stored in the system we need to execute each command to recreate the animal class binary tree. Both SQL Server and Oracle provide effective commands to help with recursive patterns like those found in binary trees. We will use these commands to generate the animal class binary tree.

SQL Server Code Sample

DECLARE @VersionId INTEGER;
SELECT @VersionId = CommandLog.CommandLogID
  FROM CommandLog
 WHERE CommandLog.CommandID = 3
   AND CommandLog.Value = 'Genus';

WITH CodeBinarySearchTree (EffectiveDate, RootID, NodeID, ParentID, NodeName, FullName, Level) AS (
  SELECT CommandLog.EffectiveDate,
         CommandLog.NodeID AS RootID,
         CommandLog.NodeID,
         CommandLog.ParentID,
         CommandLog.Value AS NodeName,
         CommandLog.Value AS FullName,
         0 AS Level
    FROM dbo.CommandLog
   WHERE CommandLog.CommandID = 1
     AND CommandLog.ParentID IS NULL
	 AND CommandLog.CommandLogID <= @VersionId
     AND NOT EXISTS(SELECT 1 FROM dbo.CommandLog AS DeletedLog WHERE DeletedLog.CommandID = 2 AND DeletedLog.NodeID = CommandLog.NodeID AND DeletedLog.CommandLogID < @VersionId)
   UNION ALL
  SELECT CommandLog.EffectiveDate,
         CBST.RootID AS RootID,
         CommandLog.NodeID,
         CommandLog.ParentID,
         CommandLog.Value AS NodeName,
         CBST.FullName + '.' + CommandLog.Value AS FullName,
         Level + 1 AS Level
    FROM dbo.CommandLog
    JOIN CodeBinarySearchTree CBST ON CBST.NodeID = CommandLog.ParentID
   WHERE CommandLog.CommandID = 1
	 AND CommandLog.CommandLogID <= @VersionId
     AND NOT EXISTS(SELECT 1 FROM dbo.CommandLog AS DeletedLog WHERE DeletedLog.CommandID = 2 AND DeletedLog.NodeID = CommandLog.NodeID AND DeletedLog.CommandLogID < @VersionId)
)

SELECT *
  FROM CodeBinarySearchTree
 ORDER BY FullName;

Oracle 10g Code Sample

SELECT FilteredLog.EffectiveDate,
       CONNECT_BY_ROOT FilteredLog.NodeId AS RootId,
       FilteredLog.NodeId,
       FilteredLog.ParentId,
       FilteredLog.Value AS NodeName,
       SUBSTR(SYS_CONNECT_BY_PATH(FilteredLog.Value, '.'), 2) AS FullName,
       LEVEL
  FROM (
    SELECT CommandLog.*
      FROM CommandLog
     WHERE RowNum < (
            SELECT Row_Id
              FROM (SELECT ROWNUM AS Row_Id, NodeId, CommandId, Value FROM CommandLog) IndexedLog
             WHERE IndexedLog.CommandId = 3
               AND IndexedLog.Value = 'Genus Correction'
           )
  ) FilteredLog
  START WITH ParentId IS NULL AND CommandId = 1
  CONNECT BY PRIOR NodeId = ParentId
 AND NOT EXISTS(
     SELECT 1
       FROM (
        SELECT CommandLog.*
          FROM CommandLog
         WHERE RowNum < (
                SELECT Row_Id
                  FROM (SELECT ROWNUM AS Row_Id, NodeId, CommandId, Value FROM CommandLog) IndexedLog
                 WHERE IndexedLog.CommandId = 3
                   AND IndexedLog.Value = 'Genus Correction'
               )
      ) DeletedLog
      WHERE DeletedLog.CommandId = 2
        AND DeletedLog.NodeId = FilteredLog.NodeId
   );

Now we have access to versionable data in a binary tree! If we run the queries using the “Genus” label we will get the following results:

Used the 'Genus' Version

Animal Family Before Data Correction

And if we run the query with the “Genus Correction” label we will get the following results:

Used the 'Genus Correction' Version

Animal Family After Data Correction

Or substitute the query with another label to see how the data looked at that point-in-time! The point is that we have an audit of all actions taken on the system in the CommandLog table and we are able to access the data structure at any desired interval.

Summary

Since we are storing actions instead of data associations the relationships are all logical. Normal indexes and keys do not apply with this strategy. Every solution has it’s benefits and costs, it’s capabilities and limitations. As stated earlier, you may want to consider using a traditional data-entity structure along with this persistent structure to get the benefits of both worlds. In light of these limitation the command pattern does provide a much easier to manage persistent data structure storage mechanism.

In closing, the next time your designing a database, in addition to looking at 3rd Form Normalization and Entity-Relationships, take a moment to consider patterns. In addition to the data-structure take a moment to consider the data-flow.

Happy Coding!

Advertisement

1 Comment »

  1. […] article, Using the Command Pattern to Store Versionable Data, provides a simple example demonstrating how to implement a persistent data storage system with the […]

    Pingback by Create a Persistent Data Structure | Larry Steinle — November 11, 2013 @ 8:47 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: