Larry Steinle

October 19, 2013

Create a Persistent Data Structure


Occasionally I encounter a business need to track every change so that a user can view the data either as it is currently stored or as it was stored at some point in the past. A system that lets you view its structure and data from a previous point of reference is called a historical database or a persistent data structure. In this article we will review various strategies for creating a persistent data structure. In the next article I will demonstrate how to build one.

Approaches to Accessing Historical Data

The primary objective of a persistent data structure is to provide the ability to access the data structure at a specified interval.

An interval provides access to a specific historical set of data based on one of the following methods:

  • Time-based: Allows the user to view the structure and data as it appeared at a specified date/time in the past.
  • Version-based: At a given interval the structure and data are assigned a version or a label that allows the user to more easily locate a specific instance of the data.
  • Snapshot-based: Provides a means to store the data as it currently exists for future reference.

An auditable database tracks what happened at a specified moment in time, but it is not a persistent data structure. It merely reports that a specific change was made to the system at a specific moment. An audit does not provide the ability to get the data as it looked at a specified interval.

The Windows event log is an example of an audit. The event log informs you of what happened at a given time but you can’t restore the system to that point in time. The Windows recovery point is an example of a repository system because you can reset your computer back to the state it was in at a previously stored moment. Windows recovery is a snapshot interval-based repository database.

Repository Types

A partially persistent system provides read access to older versions of the system and write access to the current version. A fully persistent system provides read/write access to all versions of the system.

Strategies

In various articles the following strategies have been documented for a persistent data structure:

  • Path-copying Strategy: Copies the entire path to reproduce the structure at a specified interval.
  • (Fat) Node-copying Strategy: Copies a single node at a time while retaining old information.
  • Combination of path-copying and node-copying strategy: Here the architect attempts to combine the best features of both path-copying and node-copying to create a more responsive, easier to manage system.

The snapshot interval system is a type of path-copying repository. A version interval system is a type of node-copying repository. The time interval may be either a path-copying or node-copying repository or a combination of the two.

For detailed information about these three strategies refer to the following articles:

Path-copying Method

In the path-copying method a node is never changed in place. Want to change a single value in a node? Well, that node and every node on the path leading to it from the root must be copied, versioned and updated, while the untouched parts of the structure are shared with the previous version. This consumes a large amount of memory over time but does make it easier to get back to a specified interval.
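The following is a minimal sketch of path copying as it is usually described for tree structures, not a database implementation. The binary search tree, the Node class and the insert and lookup helpers are assumptions made for illustration. Each write returns a new root, copies only the nodes between the root and the changed node, and reuses every other node from the previous version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node (hypothetical structure for this example)."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Return the root of a new version; the old version stays readable."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        # Copy this node, rebuild the left side, share the right subtree.
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        # Copy this node, rebuild the right side, share the left subtree.
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    # Same key: copy the node with the new value, share both subtrees.
    return Node(key, value, root.left, root.right)


def lookup(root: Optional[Node], key: int) -> Optional[str]:
    """Find a value in whichever version `root` belongs to."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


# Version 1 holds three records.
v1 = None
for k, v in [(50, "a"), (30, "b"), (70, "c")]:
    v1 = insert(v1, k, v)

# Version 2 changes one value; version 1 is untouched.
v2 = insert(v1, 30, "B")
```

After these calls lookup(v1, 30) still returns the old value, lookup(v2, 30) returns the new one, and v2.right is the very same object as v1.right because that subtree was not on the changed path.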

Node-copying Method

In the node-copying method each change is tracked within the node itself so that any earlier state can be reconstructed at a moment’s notice. Over time the node grows very large as more and more changes are tracked, hence the leading word “fat” in the name used by Wikipedia’s article.
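As a rough sketch of the idea, the node below keeps every value it has ever held together with the version number at which the value was written. The FatNode class and its write and read methods are hypothetical names chosen for this example, and the sketch assumes version numbers only ever increase, which makes it partially persistent.

```python
import bisect
from typing import Any, List, Optional


class FatNode:
    """A single node that remembers every value written to it."""

    def __init__(self) -> None:
        self._versions: List[int] = []  # ascending version numbers
        self._values: List[Any] = []    # value written at each version

    def write(self, version: int, value: Any) -> None:
        # Assumption: writes always target a newer version than the last one.
        if self._versions and version <= self._versions[-1]:
            raise ValueError("versions must increase")
        self._versions.append(version)
        self._values.append(value)

    def read(self, version: int) -> Optional[Any]:
        """Return the value as it looked at `version`, or None if unset."""
        # Index of the first stored version that is greater than `version`.
        i = bisect.bisect_right(self._versions, version)
        return self._values[i - 1] if i else None


node = FatNode()
node.write(1, "draft")
node.write(4, "final")
```

Reading at version 3 returns the value written at version 1, since nothing changed in between. The two lists never shrink, which is exactly the growth the name “fat” refers to.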

Why Are Databases Different Than Programming Languages?

What I fail to understand and find quite alarming is how a database architect lacks many of the tools of a programmer. For some reason the database architect forgets that even SQL is a type of programming language. SQL architects are often forgiven for violating basic programming tenets as long as they adhere to normalization rules, foreign-key constraints and server management practices.

SQL is a very powerful language. Once database architects stop looking at their databases as simply data-structure storage and begin to see SQL as a language for storing data, a world of opportunity becomes available. The same principles, best practices, guidelines, methodologies and patterns that apply to Java, VB.Net, C#.Net, JavaScript or any other programming language equally apply in the world of databases.

Thus I am going to introduce yet another option for creating a persistent data structure, one that originates from the Gang of Four’s design patterns.

The Command Pattern

In the programming world the command pattern introduces a way to structure an application so that new functionality can easily be added making a system extensible. When this pattern is implemented correctly it inherently provides undo/redo functionality.

The command pattern operates on the principle that all “command” classes must implement a specific function usually called execute. When undo functionality is needed another function called rollback is added.

Each command has the logic to analyze the data, updating it or acting on it as needed to produce the desired output or effect. Each command supports one and only one capability. The combination of a set of commands then provides one or more features needed to support the system. Whenever a new behavior is required of the system, simply create a new command (or set of commands) and add it to the system.
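To make the shape of the pattern concrete, here is a small sketch. The execute and rollback names come from the description above; the Command base class, the SetField command and the History helper are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Record = Dict[str, str]


class Command(ABC):
    """Every command exposes the same two operations."""

    @abstractmethod
    def execute(self, data: Record) -> None: ...

    @abstractmethod
    def rollback(self, data: Record) -> None: ...


class SetField(Command):
    """One capability only: set a single field on a record."""

    def __init__(self, field: str, value: str) -> None:
        self.field = field
        self.value = value
        self._had_previous = False
        self._previous: Optional[str] = None

    def execute(self, data: Record) -> None:
        # Remember the old state so the change can be undone later.
        self._had_previous = self.field in data
        self._previous = data.get(self.field)
        data[self.field] = self.value

    def rollback(self, data: Record) -> None:
        if self._had_previous:
            data[self.field] = self._previous
        else:
            data.pop(self.field, None)


class History:
    """Runs commands and keeps them so they can be undone in reverse order."""

    def __init__(self) -> None:
        self.done: List[Command] = []

    def run(self, command: Command, data: Record) -> None:
        command.execute(data)
        self.done.append(command)

    def undo(self, data: Record) -> None:
        if self.done:
            self.done.pop().rollback(data)


record: Record = {}
history = History()
history.run(SetField("name", "Ann"), record)
history.run(SetField("name", "Anne"), record)
history.undo(record)  # record is back to {"name": "Ann"}
```

Adding a new behavior means writing another Command subclass; History does not need to change.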

Command-based Persistent Data Structure

The command pattern is inherently perfect for a persistent data structure. Instead of thinking of data as simply a structure or object, think of it as a set of actions. All database designers are familiar with the CRUD principle. Data must be Created, Read, Updated and Deleted. These are commands. The CRUD principle is a type of command pattern.

Instead of creating routines to perform the changes, store the action to take against the data. For example, you want to create a new record? Instead of actually creating the record store the action to create a record. Want to update a record? Instead of actually changing the record store the request to change the record.
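One way this could look is sketched below. The Action and ActionStore names, the in-memory list and the change numbering are all assumptions made for this example rather than the design used in the follow-up article. Nothing is ever modified in place: each request is appended to a log, and the data at any change interval is produced by replaying the log up to that point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, str]


@dataclass(frozen=True)
class Action:
    """A stored request to change the data, not the change itself."""
    kind: str                    # "create", "update" or "delete"
    record_id: int
    fields: Optional[Row] = None


class ActionStore:
    def __init__(self) -> None:
        self._log: List[Action] = []

    def record(self, action: Action) -> int:
        """Append the action and return its change number (1-based)."""
        self._log.append(action)
        return len(self._log)

    def state_at(self, change: Optional[int] = None) -> Dict[int, Row]:
        """Rebuild the table as of `change`; None means the latest state."""
        upto = len(self._log) if change is None else change
        table: Dict[int, Row] = {}
        for action in self._log[:upto]:
            if action.kind == "create":
                table[action.record_id] = dict(action.fields or {})
            elif action.kind == "update":
                table[action.record_id].update(action.fields or {})
            elif action.kind == "delete":
                del table[action.record_id]
        return table


store = ActionStore()
c1 = store.record(Action("create", 1, {"name": "Ann", "city": "Omaha"}))
c2 = store.record(Action("update", 1, {"city": "Lincoln"}))
c3 = store.record(Action("delete", 1))
```

Here store.state_at(c1) shows the record as first created, store.state_at(c2) shows it after the update, and store.state_at() shows an empty table because the latest action deleted the record. The cost of this simplicity is that reads replay the log, so a real system would likely cache or checkpoint the current state.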

Benefits

The command-based repository offers the following advantages over path-copying and node-copying repositories:

  • The command-based repository supports both full and partial data persistence.
  • Automatic logging. As the command pattern stores the actions the user is taking instead of changing the data we automatically inherit an audit of our changes that can be referenced at any time.
  • Reduced memory footprint. Storing the action consumes much less memory than copying a path or appending changes to a node.
  • Detailed. With the command pattern we can access the structure at any change interval. The change interval is much more detailed than even the time-interval as many changes may be recorded at the same moment in time.
  • Supports all interval types.
    • At any moment we can version the data. In the command-based repository a snapshot is simply a label added to the data. In fact, both versioning and snapshots become commands acting against our data!
    • Each action can be recorded with a time-stamp for time-interval data access.
  • Inherit the benefits of the proven command pattern.
    • Take advantage of a clear, structured strategy for adding new behaviors.
    • Want to roll back to a specified point in time? Simply delete the list of actions recorded from that interval forward!
  • Best of all, the command pattern is very simple to implement with a small code footprint.
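The interval and rollback points in the list above can be sketched in a few lines. The Entry and ActionLog names and the sample dates are hypothetical, and the sketch assumes entries are recorded in time order. A version or snapshot is just another entry in the log, a time-stamp on each entry gives time-based access, and a rollback is a truncation of the log.

```python
from datetime import datetime
from typing import List, NamedTuple


class Entry(NamedTuple):
    at: datetime   # time-stamp for time-interval access
    action: str    # e.g. "create", "update", or "label" for a version marker
    detail: str


class ActionLog:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def record(self, at: datetime, action: str, detail: str) -> None:
        self.entries.append(Entry(at, action, detail))

    def label(self, at: datetime, name: str) -> None:
        """Versioning is itself a command recorded against the data."""
        self.record(at, "label", name)

    def index_of_label(self, name: str) -> int:
        """Number of entries up to and including the named label."""
        for i, entry in enumerate(self.entries):
            if entry.action == "label" and entry.detail == name:
                return i + 1
        raise KeyError(name)

    def index_at_time(self, when: datetime) -> int:
        """Number of entries recorded at or before `when`."""
        return sum(1 for entry in self.entries if entry.at <= when)

    def rollback_to(self, count: int) -> None:
        """Delete every action recorded after the first `count` entries."""
        del self.entries[count:]


log = ActionLog()
log.record(datetime(2013, 10, 1, 9, 0), "create", "customer 1")
log.record(datetime(2013, 10, 1, 9, 0), "update", "customer 1 city")  # same moment
log.label(datetime(2013, 10, 2, 12, 0), "v1.0")
log.record(datetime(2013, 10, 3, 8, 0), "delete", "customer 1")

at_label = log.index_of_label("v1.0")
at_time = log.index_at_time(datetime(2013, 10, 1, 9, 0))
log.rollback_to(at_label)  # drops the delete recorded after the label
```

The first two entries share a time-stamp yet remain separate change intervals, which is the extra level of detail the command-based approach offers over a purely time-based one.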

Summary

Today’s article reviewed the methodologies for storing detailed historical data in a persistent data structure. We learned various methods for accessing the historical data at a specified interval. We covered path-copying and node-copying strategies for building a persistent data structure. I introduced the idea of applying the command pattern to more easily create a persistent data structure that inherits all of the benefits associated with the pattern.

The article, Using the Command Pattern to Store Versionable Data, provides a simple example demonstrating how to implement a persistent data storage system with the command pattern.

Happy coding!
