CDM Versioning

The CdmLibrary is supposed to support versioning of CDM data. The simplest way to create a new version would be to copy every current object in the CDM and leave the old ones untouched. Because nearly all objects in the CDM are related to each other in some way, this would effectively mean a copy of the entire database for every single version, which clearly does not scale. We therefore have to address versioning at the level of the atomic domain objects. The following graphic shows the complexity and dependencies of a single change to a Person in the CDM:

Version In Time - Views

For a single point in time, a consistent complex object can be assembled from the multiple versions stored. We refer to this as a view into the historic versions.
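
As a minimal sketch of how such a view could be resolved, assume that every stored version records the time from which it is valid. That `validFrom` timestamp is an assumption and not part of the CDM as described here, and the class names `NameVersion` and `VersionHistory` are hypothetical. The view for a point in time is then the latest version that started at or before that time:

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical immutable snapshot of a name at one version. */
final class NameVersion {
    final String fullName;
    final Instant validFrom; // assumption: each version carries a timestamp

    NameVersion(String fullName, Instant validFrom) {
        this.fullName = fullName;
        this.validFrom = validFrom;
    }
}

/** Hypothetical store of all versions of one name, ordered by validFrom. */
final class VersionHistory {
    private final TreeMap<Instant, NameVersion> versions = new TreeMap<>();

    void add(NameVersion version) {
        versions.put(version.validFrom, version);
    }

    /** Returns the version valid at the given instant, or null if none existed yet. */
    NameVersion viewAt(Instant pointInTime) {
        Map.Entry<Instant, NameVersion> entry = versions.floorEntry(pointInTime);
        return entry == null ? null : entry.getValue();
    }
}
```

A complex object view would apply the same lookup to every atomic object it is assembled from.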

The following example shows a change to a name object which is part of a taxon and itself has a reference and author object attached. Note that the same author object is used by the name and by the reference object (the name in fact caches the author object which in a fully normalised system should only be attached to the nomenclatural reference):

The system can therefore extract two views of the complex taxon object, one with the old name and one that includes the new version of it:

Unidirectional Relations

If we change the name N alone but leave all other objects untouched, the new name version will not be part of the taxon T as there is a unidirectional relationship from taxon to name. So we have to update the name property of taxon T too.

In case the name N was shared by 2 taxa T1 and T2, both taxa would have to be updated for the change of the name N to be propagated everywhere. A new version of an object should not automatically be applied to all other previously related objects. A modified name N' will therefore not be applied to all taxa using that name without manual interaction, i.e. agreement from the user that owns the taxon T2.

Versioning and this non-automatic propagation of updated objects therefore lead to effects very similar to denormalisation, with the exception that there is a clear link between the versions, so an automatic update could be done at any time.
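
The following sketch illustrates this behaviour under simplifying assumptions. The classes `NameSketch` and `TaxonSketch` and the methods `isOutdatedBy` and `adopt` are hypothetical and not part of the CdmLibrary. A new name version links back to the version it replaces, and each taxon keeps its old name until the newer version is adopted explicitly:

```java
/** Hypothetical name version; each new version links back to the one it replaces. */
final class NameSketch {
    final String text;
    final NameSketch previousVersion; // null for the first version

    NameSketch(String text, NameSketch previousVersion) {
        this.text = text;
        this.previousVersion = previousVersion;
    }
}

/** Hypothetical taxon with a unidirectional reference to its name. */
final class TaxonSketch {
    NameSketch name;

    TaxonSketch(NameSketch name) {
        this.name = name;
    }

    /** True if the candidate is a later version of the name this taxon currently uses. */
    boolean isOutdatedBy(NameSketch candidate) {
        for (NameSketch v = candidate.previousVersion; v != null; v = v.previousVersion) {
            if (v == name) {
                return true;
            }
        }
        return false;
    }

    /** Explicit adoption of a newer name version, e.g. after the taxon owner agreed. */
    void adopt(NameSketch newer) {
        if (!isOutdatedBy(newer)) {
            throw new IllegalArgumentException("not a later version of the current name");
        }
        name = newer;
    }
}
```

If T1 adopts N' and T2 does not, T2 still points to N, but `isOutdatedBy` can detect at any time that an update is available.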

Bidirectional Relations

Bidirectional relations are a worse scenario. Because a change on one side would trigger a change on the other side, effectively all connected objects would have to be versioned.
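
To make the extent of this cascade concrete, the following sketch computes which objects would be affected, assuming that a new version of an object forces a new version of every object linked to it bidirectionally. `LinkedEntity` and `VersionCascade` are hypothetical names. The result is simply everything reachable from the changed object:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical domain object with bidirectional links to related objects. */
final class LinkedEntity {
    final String label;
    final List<LinkedEntity> related = new ArrayList<>();

    LinkedEntity(String label) {
        this.label = label;
    }

    /** Sets both ends of the relation, as a bidirectional mapping would. */
    static void link(LinkedEntity a, LinkedEntity b) {
        a.related.add(b);
        b.related.add(a);
    }
}

final class VersionCascade {
    /** Collects every object that would need a new version if 'changed' is versioned. */
    static Set<LinkedEntity> affectedBy(LinkedEntity changed) {
        Set<LinkedEntity> seen = new LinkedHashSet<>();
        Deque<LinkedEntity> queue = new ArrayDeque<>();
        seen.add(changed);
        queue.add(changed);
        while (!queue.isEmpty()) {
            for (LinkedEntity next : queue.poll().related) {
                if (seen.add(next)) { // add() is false for objects already visited
                    queue.add(next);
                }
            }
        }
        return seen;
    }
}
```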

Accessing Versions: DAO Implementations

Access to CDM objects in the database is done by Data Access Objects (DAOs). It would be elegant to implement versioning entirely outside of the domain model. That would allow us to create pure model POJO classes without the need to worry about versioning. Only the DAO methods would need to know which view they should return, with the latest version being the default if no view is supplied.
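
A rough sketch of such a DAO signature follows. It assumes that a view can be identified by a plain version number, which is an assumption made only for this illustration. `TaxonPojo`, `TaxonViewDao` and `InMemoryTaxonViewDao` are hypothetical names, and the in-memory map merely stands in for the database:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** Plain domain POJO: it does not know which view it belongs to. */
final class TaxonPojo {
    final UUID uuid;
    final String nameText;
    final boolean historic; // true if this object shows an older version

    TaxonPojo(UUID uuid, String nameText, boolean historic) {
        this.uuid = uuid;
        this.nameText = nameText;
        this.historic = historic;
    }
}

/** Only the DAO knows about views. */
interface TaxonViewDao {
    /** Returns the taxon as it was in the given view, or null if it did not exist. */
    TaxonPojo find(UUID uuid, int viewVersion);

    /** Without a view, the latest version is returned. */
    TaxonPojo find(UUID uuid);
}

/** In-memory stand-in for a database-backed DAO. */
final class InMemoryTaxonViewDao implements TaxonViewDao {
    private final Map<UUID, TreeMap<Integer, String>> namesByVersion = new HashMap<>();

    void store(UUID uuid, int version, String nameText) {
        namesByVersion.computeIfAbsent(uuid, k -> new TreeMap<>()).put(version, nameText);
    }

    @Override
    public TaxonPojo find(UUID uuid, int viewVersion) {
        TreeMap<Integer, String> versions = namesByVersion.get(uuid);
        if (versions == null) {
            return null;
        }
        // Latest stored version that is not newer than the requested view.
        Map.Entry<Integer, String> entry = versions.floorEntry(viewVersion);
        if (entry == null) {
            return null;
        }
        boolean historic = !entry.getKey().equals(versions.lastKey());
        return new TaxonPojo(uuid, entry.getValue(), historic);
    }

    @Override
    public TaxonPojo find(UUID uuid) {
        return find(uuid, Integer.MAX_VALUE);
    }
}
```

The domain class carries no view parameter in any of its methods; the choice of view is made once, in the DAO call.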

Domain class methods like getTaxonName() should not need to know (i.e. take as a parameter) which view they are working on. But if a taxon has different name versions over time, how does the taxon object know which one to return? The method would have to accept a view parameter, or the view would have to be stored in the taxon object. As the identical, persistent taxon object is used for several views, the view cannot be stored in it.

It therefore seems best to create transient copies of historic versions and to allow only the current view to be persistent, i.e. updatable. Read-only historic versions are no problem; in fact, they should be read-only. Such transient deep copies would have to be assembled in the DAO layer. As the copies are transient, the complex object boundaries have to be defined as part of the DAO method too, because no further lazy loading will be possible. Domain method calls to related objects that were not loaded immediately will return null.
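
The effect of such object boundaries can be sketched as follows. The classes `RefNode`, `NameNode`, `TaxonNode` and `TransientCopier` are hypothetical, and expressing the boundary as a simple depth is an assumption made for brevity. Related objects beyond the boundary are not copied and therefore come back as null:

```java
/** Hypothetical reference attached to a name. */
final class RefNode {
    final String title;

    RefNode(String title) {
        this.title = title;
    }
}

/** Hypothetical name with an optional nomenclatural reference. */
final class NameNode {
    final String text;
    final RefNode nomenclaturalReference;

    NameNode(String text, RefNode nomenclaturalReference) {
        this.text = text;
        this.nomenclaturalReference = nomenclaturalReference;
    }
}

/** Hypothetical taxon with an optional name. */
final class TaxonNode {
    final String label;
    final NameNode name;

    TaxonNode(String label, NameNode name) {
        this.label = label;
        this.name = name;
    }
}

final class TransientCopier {
    /** Copies a taxon and up to 'depth' levels of related objects; anything beyond is null. */
    static TaxonNode copy(TaxonNode taxon, int depth) {
        NameNode nameCopy = depth >= 1 ? copy(taxon.name, depth - 1) : null;
        return new TaxonNode(taxon.label, nameCopy);
    }

    static NameNode copy(NameNode name, int depth) {
        if (name == null) {
            return null;
        }
        RefNode refCopy = (depth >= 1 && name.nomenclaturalReference != null)
                ? new RefNode(name.nomenclaturalReference.title)
                : null;
        return new NameNode(name.text, refCopy);
    }
}
```

With a depth of 1 the copy contains the name but not its reference, so a caller navigating to the reference has to expect null.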

Changes To Datamodel

There are different ways to store versions. Older versions are of the same class as the current ones, so they will be stored in the same database tables by Hibernate. But the way old versions are related to each other and to the current one offers several possibilities:

Many-Many Relations

This solution modifies the referencing side of each domain model class and keeps a list of all versions for a certain property. That means converting all properties to lists or adding an additional list property for each existing property. For example, consider the name property of the Cdm:taxon:TaxonBase class:

import java.util.HashSet;
import java.util.Set;
class TaxonBase extends VersionableEntity {
    TaxonNameBase name;  // pointer to the current name
    Set<TaxonNameBase> nameVersions = new HashSet<TaxonNameBase>();  // all name versions valid for this taxon
}

The name property is the pointer to the current name, while the nameVersions property is a set pointing to all versions of the name object that are valid for this taxon. This construction allows for relatively fast database access to a complex object view, as it can be loaded in basically one SQL statement. It means quite a lot of extra coding, as all properties have to be duplicated and the regular setter/getter methods probably have to be adapted as well.
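
What an adapted setter might look like is sketched below. `NameEntity` and `VersionedTaxon` are hypothetical stand-ins for the real CDM classes, and recording every assigned name in the version set is only one possible policy:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a name class. */
final class NameEntity {
    final String text;

    NameEntity(String text) {
        this.text = text;
    }
}

/** Hypothetical taxon that tracks all name versions next to the current name. */
class VersionedTaxon {
    private NameEntity name;
    private final Set<NameEntity> nameVersions = new HashSet<>();

    NameEntity getName() {
        return name;
    }

    /** Adapted setter: the new current name is also recorded in the version set. */
    void setName(NameEntity newName) {
        this.name = newName;
        if (newName != null) {
            nameVersions.add(newName);
        }
    }

    /** Read-only access, so versions can only be added through setName. */
    Set<NameEntity> getNameVersions() {
        return Collections.unmodifiableSet(nameVersions);
    }
}
```

The same pattern would have to be repeated for every versioned property, which is the extra coding effort mentioned above.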

Linked Lists

See the previous and later properties of Cdm:common:VersionableEntity. This is essentially a linked list pointing to the next and previous version of this object. Generic classes allow these properties to be implemented once and for all in the VersionableEntity base class while keeping type safety.

class VersionableEntity<T extends VersionableEntity<T>> {
    T nextVersion;      // later version, null for the latest one
    T previousVersion;  // earlier version, null for the first one
}
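
A minimal sketch of how such a linked list could be maintained and traversed is given below. The class names `LinkedVersion` and `LinkedName` and the helper methods `self`, `linkNext`, `latest` and `history` are hypothetical and only illustrate the idea:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class holding the version links once for all subclasses. */
abstract class LinkedVersion<T extends LinkedVersion<T>> {
    T nextVersion;
    T previousVersion;

    /** Returns this object typed as T; subclasses simply return 'this'. */
    protected abstract T self();

    /** Links 'newer' as the direct successor of this version. */
    void linkNext(T newer) {
        this.nextVersion = newer;
        newer.previousVersion = self();
    }

    /** Walks forward to the most recent version. */
    T latest() {
        T v = self();
        while (v.nextVersion != null) {
            v = v.nextVersion;
        }
        return v;
    }

    /** Returns all versions of the chain, oldest first. */
    List<T> history() {
        T v = self();
        while (v.previousVersion != null) {
            v = v.previousVersion;
        }
        List<T> all = new ArrayList<>();
        for (; v != null; v = v.nextVersion) {
            all.add(v);
        }
        return all;
    }
}

/** Hypothetical versioned name. */
final class LinkedName extends LinkedVersion<LinkedName> {
    final String text;

    LinkedName(String text) {
        this.text = text;
    }

    @Override
    protected LinkedName self() {
        return this;
    }
}
```

Finding the latest version requires walking the chain, so the cost grows with the number of versions.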

Version Array

Cdm:common:VersionableEntity could have a version array property in addition to a pointer to the latest version. As with linked lists, this is implemented once in VersionableEntity.

import java.util.ArrayList;
import java.util.List;
class VersionableEntity<T extends VersionableEntity<T>> {
    T currentVersion;                          // pointer to the latest version
    List<T> allVersions = new ArrayList<T>();  // every version of this object
}
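
One possible way to keep these two properties consistent is sketched below, assuming that all versions of an object share a single list instance. `ArrayVersioned`, `ArrayName`, `self` and `addVersion` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class keeping a shared list of all versions. */
abstract class ArrayVersioned<T extends ArrayVersioned<T>> {
    T currentVersion;
    List<T> allVersions = new ArrayList<>();

    /** Returns this object typed as T; subclasses simply return 'this'. */
    protected abstract T self();

    /** Registers 'newer' as the latest version and updates every current pointer. */
    void addVersion(T newer) {
        if (allVersions.isEmpty()) {
            allVersions.add(self()); // first call: register the original version
        }
        allVersions.add(newer);
        newer.allVersions = allVersions; // all versions share one list
        for (T v : allVersions) {
            v.currentVersion = newer;
        }
    }
}

/** Hypothetical versioned name. */
final class ArrayName extends ArrayVersioned<ArrayName> {
    final String text;

    ArrayName(String text) {
        this.text = text;
    }

    @Override
    protected ArrayName self() {
        return this;
    }
}
```

Compared with the linked list, the latest version is reachable in one step from any version, at the price of updating every version whenever a new one is added.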

Bidirectional CDM Relations

Versioning bidirectional relations could cause problems, as a change to either side will change the objects on both sides and thereby propagate the change.

Creating New Versions

Does a new version become a new object (in the Hibernate sense) or does the old version become a new object?

Creating new current versions is bad for desktop applications like the TaxonomicEditor, as they need to destroy all their objects, use the new ones and reregister listeners and other GUI components.

Creating a new old version means updating a lot of unchanged, existing objects, as they now point to a different object than before.
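
The second option can be sketched as follows, under the assumption that objects belonging to an older view can be told apart from current ones by a flag. `LiveName`, `NameHolder` and `NewOldVersionStrategy` are hypothetical names. The current object is updated in place, its previous state moves into a new object, and historic referrers have to be re-pointed:

```java
import java.util.List;

/** Hypothetical name whose object identity stays with the current version. */
final class LiveName {
    String text;

    LiveName(String text) {
        this.text = text;
    }
}

/** Hypothetical object referencing a name, e.g. a taxon in some view. */
final class NameHolder {
    LiveName name;
    final boolean historic; // assumption: true if the holder belongs to an older view

    NameHolder(LiveName name, boolean historic) {
        this.name = name;
        this.historic = historic;
    }
}

final class NewOldVersionStrategy {
    /**
     * Updates 'current' in place and moves its previous state into a new object.
     * Historic holders are re-pointed to that archived copy, which is returned.
     */
    static LiveName update(LiveName current, String newText, List<NameHolder> holders) {
        LiveName archived = new LiveName(current.text);
        current.text = newText;
        for (NameHolder holder : holders) {
            if (holder.historic && holder.name == current) {
                holder.name = archived;
            }
        }
        return archived;
    }
}
```

References held by an editor stay valid because the current object keeps its identity, but every historic referrer is touched even though its own data did not change.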

Unversioned CDM Classes

Parts of the CommonDataModel would not need to be versioned. This probably applies to Cdm:common:DefinedTermBase and to all relationship classes, which currently do not inherit from a common superclass (but probably should).

version_complex_objects.png (289 KB), Markus Döring, 02/27/2008 06:03 PM

name_change.png (7.27 KB), Markus Döring, 02/28/2008 01:35 PM

taxon_versions.png (8.97 KB), Markus Döring, 02/28/2008 01:36 PM

taxon2_versions.png (11.7 KB), Markus Döring, 02/28/2008 01:36 PM

version_bidirect.png (10.2 KB), Markus Döring, 02/28/2008 01:36 PM

version_unidirect.png (8.69 KB), Markus Döring, 02/28/2008 01:36 PM
