CdmVersioning » History » Version 17

Markus Döring, 02/28/2008 03:06 PM

{{>toc}}
# CDM Versioning
The [[CdmLibrary]] is supposed to support versioning of CDM data. The simplest approach to creating a new version would be to copy every _current_ object in the CDM and leave the old ones untouched. Because nearly all objects in the CDM are related to each other in some way, this would effectively mean copying the entire database for every single version. Obviously this is not a scalable solution. We will therefore have to address versioning at the level of the atomic domain objects. The following graphic shows the complexity and dependencies of a single change to a Person in the CDM:

![](version_complex_objects.png)
## Version In Time. A View
For a single point in time, a consistent complex object can be assembled from the multiple versions stored. We refer to it as a _view_ into historic versions.

The following example shows a change to a name object which is part of a taxon and which itself has a reference and an author object attached. Note that the same author object is used by the name and by the reference object (the name in fact _caches_ the author object, which in a fully normalised system should only be attached to the nomenclatural reference):

![](name_change.png)

The system can therefore extract two views of the complex taxon object, one with the old name and one that includes the new version of the name:

![](taxon_versions.png)
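A minimal Java sketch of how such a view could be resolved, assuming that every stored version records the time from which it is valid. The classes `NameVersion` and `VersionedTaxon`, the field `validFrom` and the method `viewAt` are hypothetical names for illustration and are not part of the CdmLibrary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one stored version of a name, valid from a given timestamp.
class NameVersion {
    final String label;
    final long validFrom;

    NameVersion(String label, long validFrom) {
        this.label = label;
        this.validFrom = validFrom;
    }
}

// Hypothetical: a taxon that keeps every version of its name.
class VersionedTaxon {
    private final List<NameVersion> nameVersions = new ArrayList<NameVersion>();

    void addNameVersion(NameVersion version) {
        nameVersions.add(version);
    }

    // Returns the name version that was valid at the given time,
    // or null if no version existed yet.
    NameVersion viewAt(long time) {
        NameVersion best = null;
        for (NameVersion v : nameVersions) {
            if (v.validFrom <= time && (best == null || v.validFrom > best.validFrom)) {
                best = v;
            }
        }
        return best;
    }
}
```

With an old name valid from time 100 and a new name valid from time 200, `viewAt(150)` yields the old name and `viewAt(250)` the new one, which corresponds to the two views of the taxon.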
### Unidirectional Relations
If we change the name N alone but leave all other objects untouched, the new name version will not be part of the taxon. So we have to update the name property of taxon T too.

![](version_unidirect.png)

If the name N is shared by two taxa T1 and T2, both taxa have to be updated for the change of N to be propagated everywhere. A new version of an object should not automatically be applied to all other previously related objects. A modified name N' will therefore not be applied to all taxa using that name without manual interaction, i.e. agreement from the user who owns the taxon T2.

![](taxon2_versions.png)

Versioning with this non-automatic propagation of updated objects therefore has effects very similar to denormalisation, with the exception that there is a clear link between the versions, so an automatic update _could_ be done at any time.
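The non-automatic propagation can be sketched in Java as follows. This is only an illustration under the assumption that each name version links to its successor; the classes `Name` and `Taxon` and the methods `createNewVersion`, `isNameOutdated` and `adoptLatestName` are hypothetical and do not exist in the CDM.

```java
// Hypothetical name class: each version links to the one that replaced it.
class Name {
    final String label;
    Name nextVersion; // stays null while this is the latest version

    Name(String label) {
        this.label = label;
    }

    // Creates a new version N' and links it to this version.
    Name createNewVersion(String newLabel) {
        Name next = new Name(newLabel);
        this.nextVersion = next;
        return next;
    }

    // Follows the version links to the latest version.
    Name latest() {
        Name current = this;
        while (current.nextVersion != null) {
            current = current.nextVersion;
        }
        return current;
    }
}

// Hypothetical taxon class with a unidirectional reference to its name.
class Taxon {
    Name name;

    Taxon(Name name) {
        this.name = name;
    }

    // True if a newer name version exists that this taxon has not adopted.
    boolean isNameOutdated() {
        return name.nextVersion != null;
    }

    // Explicit update to the latest name version, triggered by the taxon owner.
    void adoptLatestName() {
        name = name.latest();
    }
}
```

If T1 and T2 share N and only T1 calls `adoptLatestName()` after N' was created, T2 keeps pointing to N. Because the link between N and N' is kept, T2 can still detect the newer version and catch up later.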
### Bidirectional Relations
Bidirectional relations are a worse scenario. Because a change on one side would trigger a change on the other side, effectively all connected objects would have to be versioned.

![](version_bidirect.png)
## Accessing versions. DAO Implementations
It would be elegant to implement versioning entirely outside of the domain model, only in the Data Access Objects (DAOs).

Access to the latest version should be the default and as fast as possible.

Historic versions can and maybe even should be read-only.

Domain class methods like _getTaxonName()_ should not need to know (i.e. have a parameter for) which view they are working on. If a taxon has different name versions over time, how does the taxon object know which one to return? The method would have to accept a view parameter, or the view would have to be stored in the taxon object. But as the identical, persistent taxon object is used for several views, the view cannot be stored in the persistent taxon object.

It therefore seems best to create transient copies of historic versions and only allow the current view to be persistent, i.e. updatable. The assembly of such _transient deep copies_ would have to be done in the DAO layer. As those copies are transient, the boundaries of the complex object have to be defined as part of the DAO method too, because no further lazy loading will be possible. Domain method calls to related objects that were not loaded up front will return null!
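The following Java sketch illustrates the idea of a DAO method that assembles a transient copy for a historic view. It is an assumption-laden simplification: the classes `SimpleName`, `SimpleTaxon` and `TaxonDao`, the method `loadHistoricView` and the in-memory list that replaces the database are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes, reduced to what the sketch needs.
class SimpleName {
    final String label;
    final long validFrom;
    final String reference; // related object, outside the copy boundary below

    SimpleName(String label, long validFrom, String reference) {
        this.label = label;
        this.validFrom = validFrom;
        this.reference = reference;
    }
}

class SimpleTaxon {
    SimpleName name;

    // The domain method needs no view parameter.
    SimpleName getTaxonName() {
        return name;
    }
}

// Hypothetical DAO that holds all name versions of one taxon in memory.
class TaxonDao {
    private final List<SimpleName> storedNameVersions = new ArrayList<SimpleName>();

    void saveNameVersion(SimpleName name) {
        storedNameVersions.add(name);
    }

    // Assembles a transient copy of the taxon as it looked at the given time.
    // The copy boundary is fixed here: the name is copied, its reference is not.
    SimpleTaxon loadHistoricView(long time) {
        SimpleName match = null;
        for (SimpleName n : storedNameVersions) {
            if (n.validFrom <= time && (match == null || n.validFrom > match.validFrom)) {
                match = n;
            }
        }
        SimpleTaxon copy = new SimpleTaxon();
        if (match != null) {
            copy.name = new SimpleName(match.label, match.validFrom, null);
        }
        return copy;
    }
}
```

The returned taxon is never handed back to the persistence layer, so it behaves as a read-only snapshot. The `reference` of the copied name is null because it lies outside the boundary chosen in `loadHistoricView`, which is the effect described above for objects that were not loaded.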
## Changes To Datamodel
### Many-Many Relations
This solution modifies the referencing side of a domain model and keeps a list of all versions for a certain property. That means either converting all properties to lists or adding an additional list property for each existing property. For example, consider the name property of the Cdm:taxon:TaxonBase class:

~~~
import java.util.HashSet;
import java.util.Set;

class TaxonBase extends VersionableEntity {
    TaxonNameBase name; // the name in the current view
    Set<TaxonNameBase> nameVersions = new HashSet<TaxonNameBase>(); // all name versions
}
~~~
This allows relatively fast database access to a complex object view.

The drawback is a lot of extra coding, as all properties have to be duplicated and the regular setter/getter methods probably have to be adapted.
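A setter adapted in this way might look like the following Java sketch. The classes `NameStub` and `TaxonWithNameVersions` are hypothetical stand-ins for TaxonNameBase and TaxonBase, and the behaviour of recording every assigned name is an assumption about how the version set would be maintained.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for TaxonNameBase.
class NameStub {
    final String label;

    NameStub(String label) {
        this.label = label;
    }
}

// Hypothetical taxon whose setter records every name that was assigned.
class TaxonWithNameVersions {
    private NameStub name;
    private final Set<NameStub> nameVersions = new HashSet<NameStub>();

    NameStub getName() {
        return name;
    }

    // Sets the current name and remembers it as one of the versions.
    void setName(NameStub newName) {
        this.name = newName;
        if (newName != null) {
            nameVersions.add(newName);
        }
    }

    Set<NameStub> getNameVersions() {
        return nameVersions;
    }
}
```

Every versioned property would need the same pair of a field and a collection, which shows where the extra coding effort comes from.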
### Linked Lists
See the properties for the previous and the later version in Cdm:common:VersionableEntity. This approach needs generic classes.

~~~
class VersionableEntity<T extends VersionableEntity<T>> {
    T nextVersion;     // link to the later version
    T previousVersion; // link to the earlier version
}
~~~
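How such a doubly linked version chain could be built and walked is sketched below in Java. The class names `LinkedVersion` and `NameRevision` and the methods `append` and `countPrevious` are hypothetical; only the two link properties are taken from the snippet above.

```java
// Hypothetical reduced form of a versionable entity with two links.
class LinkedVersion<T extends LinkedVersion<T>> {
    T nextVersion;
    T previousVersion;
}

// Hypothetical concrete entity that uses the links.
class NameRevision extends LinkedVersion<NameRevision> {
    final String label;

    NameRevision(String label) {
        this.label = label;
    }

    // Appends a new version after this one and wires both links.
    NameRevision append(String newLabel) {
        NameRevision next = new NameRevision(newLabel);
        next.previousVersion = this;
        this.nextVersion = next;
        return next;
    }

    // Counts how many versions precede this one.
    int countPrevious() {
        int count = 0;
        NameRevision current = previousVersion;
        while (current != null) {
            count++;
            current = current.previousVersion;
        }
        return count;
    }
}
```

Reaching a specific historic version requires walking the chain, so access time grows with the number of versions between the starting point and the target.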
### Version Array
Cdm:common:VersionableEntity could have a version array property in addition to a pointer to the latest version. This approach needs generic classes.

~~~
import java.util.ArrayList;
import java.util.List;

class VersionableEntity<T extends VersionableEntity<T>> {
    T currentVersion;                         // pointer to the latest version
    List<T> allVersions = new ArrayList<T>(); // every version of this entity
}
~~~
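A small Java sketch of how the array and the pointer could be kept in sync. For brevity it uses a separate holder class instead of the entity itself, which is a simplification of the snippet above; the class `VersionArray` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder that keeps all versions plus a pointer to the latest one.
class VersionArray<T> {
    private T currentVersion;
    private final List<T> allVersions = new ArrayList<T>();

    // Registers a new version and makes it the current one.
    void addVersion(T version) {
        allVersions.add(version);
        currentVersion = version;
    }

    T getCurrentVersion() {
        return currentVersion;
    }

    // Direct access to a historic version by its position in the array.
    T getVersion(int index) {
        return allVersions.get(index);
    }

    int versionCount() {
        return allVersions.size();
    }
}
```

In contrast to the linked list, any version can be reached by index without walking through its neighbours.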
## Bidirectional CDM Relations
Versioning bidirectional relations could cause problems, as a change to either side changes the objects on both sides and thereby propagates the change.
## Creating New Versions
Does a new version become a new object (in the Hibernate sense), or does the old version become a new object?

Creating a new object for the current version is bad for desktop applications like the [[TaxonomicEditor]], as they need to discard all their objects, use the new ones, and re-register listeners and other GUI components.

Creating a new object for the old version means updating a lot of _unchanged_, existing objects, as they now point to a different object than before.
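The two options can be contrasted in a short Java sketch. It is purely illustrative and ignores Hibernate entirely; the class `VersionedName` and the methods `newVersionAsNewObject` and `newVersionInPlace` are hypothetical.

```java
// Hypothetical entity that supports both ways of creating a version.
class VersionedName {
    String label;
    VersionedName previousVersion;

    VersionedName(String label) {
        this.label = label;
    }

    // Option A: the new version is a new object and this object
    // becomes the old version. Callers must switch to the returned object.
    VersionedName newVersionAsNewObject(String newLabel) {
        VersionedName next = new VersionedName(newLabel);
        next.previousVersion = this;
        return next;
    }

    // Option B: the old state is copied into a new object and this object
    // stays the current version. Existing references to it remain valid.
    void newVersionInPlace(String newLabel) {
        VersionedName archived = new VersionedName(this.label);
        archived.previousVersion = this.previousVersion;
        this.previousVersion = archived;
        this.label = newLabel;
    }
}
```

With option A, an editor that holds the original object keeps seeing the old label until it replaces its reference. With option B the editor sees the new label immediately, but anything that was meant to keep pointing at the old state has to be redirected to the archived copy.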
## Unversioned CDM Classes
Parts of the [[CommonDataModel]] would not need to be versioned. This probably applies to Cdm:common:DefinedTermBase and all relationship classes that currently do not inherit from a common superclass (but probably should).