CdmVersionTwoDiscussion » History » Version 50
Helene Fradin, 03/19/2009 11:08 AM
1 | 2 | Andreas Müller | {{>toc}} |
---|---|---|---|
2 | |||
3 | |||
4 | |||
5 | |||
6 | # CDM v2.0 Discussion |
||
7 | |||
8 | |||
9 | |||
10 | |||
11 | ---- |
||
12 | |||
13 | 4 | Andreas Müller | _This is a site to discuss possible changes to the [CDM v1.4](http://wp5.e-taxonomy.eu/cdm/v14/) to go into CDM v2.0_ |
14 | 2 | Andreas Müller | |
15 | 40 | Helene Fradin | _See also Component C5.80 - Review of CDM v.1 and model for descriptive data in CDM v.2_ |
16 | 2 | Andreas Müller | |
17 | 40 | Helene Fradin | |
18 | 2 | Andreas Müller | ---- |
19 | |||
20 | |||
21 | |||
22 | 10 | Helene Fradin | |
23 | 8 | Helene Fradin | ## DESCRIPTIVE DATA - PROPOSED REVISIONS |
24 | 3 | Andreas Müller | |
25 | 1 | Andreas Müller | |
26 | 10 | Helene Fradin | |
27 | ---- |
||
28 | |||
29 | |||
30 | 37 | Helene Fradin | |
31 | 10 | Helene Fradin | ## 1. MAJOR - Character/Descriptor/Feature concept |
32 | |||
33 | |||
34 | 30 | Helene Fradin | **Impacted objects: Feature** |
35 | 6 | Helene Fradin | |
36 | 1 | Andreas Müller | |
37 | The Feature class is described in the class comments by: "The class for individual properties (also designed as character, type or category) of observed phenomena able to be described or measured." |
||
38 | 7 | Helene Fradin | |
39 | |||
40 | 30 | Helene Fradin | **a. Issues** |
41 | 8 | Helene Fradin | |
42 | 31 | Helene Fradin | |
43 | 15 | Helene Fradin | It is notable that the Feature object is not typed, unlike Characters in SDD (Categorical, Quantitative, etc.) and in many other models. However, when one needs to know what kind of data a given Feature supports, it is not clearly stated how the different attributes should be understood and used. Moreover, there are a dozen categories of Features (Additional Publication, Image, Cultivation, Description, ...) that are rich but difficult to interpret during import. |
44 | |||
45 | |||
46 | As a reminder, below is the list of the Feature class attributes: |
||
47 | |||
48 | 16 | Helene Fradin | - supportsTextData -> feature can be described with TextData objects |
49 | 15 | Helene Fradin | |
50 | 16 | Helene Fradin | - supportsQuantitativeData -> feature can be described with QuantitativeData objects |
51 | 15 | Helene Fradin | |
52 | 16 | Helene Fradin | - supportsDistribution -> feature can be described with Distribution objects (geographical) |
53 | 15 | Helene Fradin | |
54 | 16 | Helene Fradin | - supportsIndividualAssociation -> feature can be described with IndividualsAssociation objects (between the described specimen and a second one - for instance a host, only for SpecimenDescription) |
55 | 15 | Helene Fradin | |
56 | 16 | Helene Fradin | - supportsTaxonInteraction -> feature can be described with TaxonInteraction objects (between the described Taxon and a second one - for instance a parasite, a prey or a hybrid parent, only for TaxonDescription) |
57 | 15 | Helene Fradin | |
58 | 16 | Helene Fradin | - supportsCommonTaxonName -> feature can be described with CommonTaxonName objects |
59 | 15 | Helene Fradin | |
60 | 16 | Helene Fradin | - recommendedModifierEnumeration -> set of TermVocabulary containing the Modifier objects recommended to be used for DescriptionElementBase elements |
61 | 15 | Helene Fradin | |
62 | 16 | Helene Fradin | - recommendedStatisticalMeasures -> set of StatisticalMeasure recommended to be used in case of QuantitativeData |
63 | 15 | Helene Fradin | |
64 | 16 | Helene Fradin | - supportedCategoricalEnumerations -> set of TermVocabulary containing the list of possible State to be used in CategoricalData |
65 | 12 | Helene Fradin | |
66 | 17 | Helene Fradin | |
67 | The flexibility of the Feature class is not a problem for the import of SDD descriptive data: for each character, a new DESCRIPTION Feature instance is created: |
||
68 | |||
69 | - for SDD CategoricalCharacter, supportedCategoricalEnumerations is set with the states defined in SDD in the elements StateDefinition |
||
70 | |||
71 | - for SDD QuantitativeCharacter, supportsQuantitativeData is set to true. |
||
72 | |||
73 | - for SDD TextCharacter, supportsTextData is set to true. |
||
74 | |||
75 | 19 | Helene Fradin | - SDD SequenceCharacter: so far, these data are not imported, and I don't have an SDD example of this element being used. I guess it should be imported into a Sequence object? |
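The import mapping listed above can be sketched roughly as follows. This is a simplified illustration, not the actual CDM import code: `SketchFeature`, `SddCharacterKind` and `SddImportSketch.toFeature` are hypothetical names, and a vocabulary is modelled as a plain list of state labels. Only the attribute names (supportsTextData, supportsQuantitativeData, supportedCategoricalEnumerations) are taken from the list of Feature attributes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the three SDD character kinds handled by the import.
enum SddCharacterKind { CATEGORICAL, QUANTITATIVE, TEXT }

// Simplified stand-in for the CDM Feature class (not the real class).
class SketchFeature {
    final String label;
    boolean supportsTextData;
    boolean supportsQuantitativeData;
    // Stand-in for Set<TermVocabulary>: each vocabulary is a list of state labels.
    final Set<List<String>> supportedCategoricalEnumerations = new LinkedHashSet<>();

    SketchFeature(String label) {
        this.label = label;
    }
}

class SddImportSketch {
    // Creates one Feature per SDD character, following the mapping described above.
    static SketchFeature toFeature(String label, SddCharacterKind kind, List<String> stateDefinitions) {
        SketchFeature feature = new SketchFeature(label);
        switch (kind) {
            case CATEGORICAL:
                // States come from the StateDefinition elements of the character;
                // if there are none, the set stays empty.
                if (!stateDefinitions.isEmpty()) {
                    feature.supportedCategoricalEnumerations.add(new ArrayList<>(stateDefinitions));
                }
                break;
            case QUANTITATIVE:
                feature.supportsQuantitativeData = true;
                break;
            case TEXT:
                feature.supportsTextData = true;
                break;
        }
        return feature;
    }
}
```

Note that in this sketch a categorical character without StateDefinition elements yields a Feature with no flag set and no enumeration, which is the situation behind the third export problem discussed in this section.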
76 | 18 | Helene Fradin | |
77 | |||
78 | However, exporting SDD data raises questions about the object Feature. I can see 3 different problems: |
||
79 | |||
80 | 1. There is no safeguard to ensure that DescriptionElementBase objects used for a description tally with the way the corresponding Feature has been described (for example, a DescriptionElementBase associated with a Feature that has only information on supportedCategoricalEnumerations, could be of the type QuantitativeData). |
||
81 | |||
82 | 1. The SDD standard and most descriptive models require the definition of a descriptive system (list of characters, potential states, potential measures) before structured descriptions are expressed through this descriptive system. It is difficult to export this descriptive system properly to SDD: I can either export all the Feature instances (but most of them will be irrelevant to the exported descriptions), or I can build the descriptive system by scanning all descriptions to extract only the characters that are actually used in the descriptions concerned (loss of efficiency). |
||
83 | |||
84 | 22 | Helene Fradin | 1. In SDD, categorical states do not have to be defined at the Character level, they can be defined at a more general level and shared. Therefore, the supportedCategoricalEnumerations could well be empty: how do we know then that it supports StateData? |
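A safeguard of the kind missing in problem 1 might look like the sketch below. It is only an illustration under simplifying assumptions: `DataKind`, `TypedFeature` and `DescriptionElementGuard` are hypothetical names that do not exist in the CDM, and the supported data kinds are modelled as an enum set rather than as separate boolean attributes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enumeration of the kinds of description elements.
enum DataKind { TEXT, QUANTITATIVE, CATEGORICAL, DISTRIBUTION }

// Simplified stand-in for a Feature that declares which kinds of data it supports.
class TypedFeature {
    final String label;
    final Set<DataKind> supported;

    TypedFeature(String label, DataKind first, DataKind... rest) {
        this.label = label;
        this.supported = EnumSet.of(first, rest);
    }
}

class DescriptionElementGuard {
    // Rejects a description element whose kind the feature does not declare as supported.
    static void check(TypedFeature feature, DataKind elementKind) {
        if (!feature.supported.contains(elementKind)) {
            throw new IllegalArgumentException(
                "Feature '" + feature.label + "' does not support " + elementKind + " data");
        }
    }
}
```

With such a check, attaching a categorical element to a feature declared as quantitative only would fail at creation time instead of going unnoticed until export.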
85 | 17 | Helene Fradin | |
86 | 12 | Helene Fradin | |
87 | 30 | Helene Fradin | **b. Example** |
88 | 31 | Helene Fradin | |
89 | 19 | Helene Fradin | |
90 | Consider the feature (character/descriptor in other models) "Leaf length". Below are examples corresponding to each problem described above: |
||
91 | |||
92 | 1 | Andreas Müller | 1. A new Feature instance named "Leaf length" is created with the attribute supportsQuantitativeData set to true and supportedCategoricalEnumerations set to null. It is still possible to create a DescriptionElementBase of type CategoricalData with the attribute feature set to the "Leaf length" feature and, for example, the attribute states set to a list of StateData containing one item {"small"}. -> A feature described as a quantitative feature is used as a categorical feature. |
93 | 23 | Helene Fradin | |
94 | 19 | Helene Fradin | |
95 | 21 | Helene Fradin | 1. Exporting 2 descriptions from the CDM, each containing only 1 DescriptionElementBase, such as: |
96 | 19 | Helene Fradin | |
97 | 1 | Andreas Müller | Viola hederacea -> Leaf Length (mm) -> {Min = 2.3, Mean = 5.1, Max = 7.9, SD = 1.3, N = 20} |
98 | 19 | Helene Fradin | |
99 | 21 | Helene Fradin | |
100 | 1 | Andreas Müller | Viola betonicifolia -> Leaf Length (mm) -> {Min = 2.9, Mean = 5.3, Max = 7.4, SD = 1.3, N = 21} |
101 | 19 | Helene Fradin | |
102 | 21 | Helene Fradin | |
103 | 19 | Helene Fradin | There might be other Feature instances stored in the CDM ("Leaf complexity", "Body shape", "Flattening of body", ...) related or not to the descriptions of such plants. |
104 | 1 | Andreas Müller | |
105 | 22 | Helene Fradin | Therefore, when exporting the descriptive system, either a majority of unused features will be exported (if all features are exported), or the descriptions will have to be scanned one by one to detect only the features actually used. The latter solution is fine for this simple example, but with potentially hundreds of descriptions and hundreds of characters, the complexity increases quickly. |
106 | 1 | Andreas Müller | |
107 | 22 | Helene Fradin | 1. The states "small", "medium", "large" could be defined as DescriptiveConcept elements in SDD, and the CategoricalCharacter "Leaf length" could contain no StateDefinition elements, using the states defined more generally in CodedDescriptions. In this case, when the character "Leaf length" is imported, a Feature with no supportedCategoricalEnumerations is created. The type of this Feature is then undefined even though it supports CategoricalData. |
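The scanning approach from the second example can be sketched as below. The classes `ElementSketch`, `DescriptionSketch` and `UsedFeatureScan` are hypothetical simplifications (features are reduced to their labels), not CDM classes. The point of the sketch is that every element of every description has to be visited once to build the descriptive system.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for a DescriptionElementBase: only the feature label is kept.
class ElementSketch {
    final String featureLabel;

    ElementSketch(String featureLabel) {
        this.featureLabel = featureLabel;
    }
}

// Simplified stand-in for a DescriptionBase attached to a taxon name.
class DescriptionSketch {
    final String taxonName;
    final List<ElementSketch> elements;

    DescriptionSketch(String taxonName, List<ElementSketch> elements) {
        this.taxonName = taxonName;
        this.elements = elements;
    }
}

class UsedFeatureScan {
    // Collects the labels of all features actually used in the given descriptions.
    // Every element of every description is visited exactly once.
    static Set<String> usedFeatures(List<DescriptionSketch> descriptions) {
        Set<String> used = new LinkedHashSet<>();
        for (DescriptionSketch description : descriptions) {
            for (ElementSketch element : description.elements) {
                used.add(element.featureLabel);
            }
        }
        return used;
    }
}
```

For the two Viola descriptions above, the scan yields a single feature ("Leaf length"), while unrelated features such as "Body shape" are left out of the exported descriptive system.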
108 | 1 | Andreas Müller | |
109 | 22 | Helene Fradin | |
110 | 30 | Helene Fradin | **c. Current solution** |
111 | 1 | Andreas Müller | |
112 | 22 | Helene Fradin | |
113 | 26 | Helene Fradin | For now, all Feature instances are exported. |
114 | 1 | Andreas Müller | |
115 | |||
116 | 30 | Helene Fradin | **d. Proposed change (NOT IMPLEMENTED)** |
117 | 21 | Helene Fradin | |
118 | |||
119 | 26 | Helene Fradin | I think there should be a distinction within Feature attributes, between the type of data supported by the Feature (supportsTextData, supportsQuantitativeData, etc.) and the domain of possible values or frame of reference (recommendedStatisticalMeasures, supportedCategoricalEnumerations). |
120 | 1 | Andreas Müller | |
121 | 26 | Helene Fradin | In practical terms: |
122 | |||
123 | 41 | Helene Fradin | - I would add a boolean attribute, 'supportsCategoricalData' *(IMPLEMENTED)*, |
124 | 26 | Helene Fradin | |
125 | - I would remove the domain of possible values (recommendedModifierEnumeration, recommendedStatisticalMeasures, supportedCategoricalEnumerations) from Feature and create a new class, called for example PossibleValues or RecommendedValues, from which RecommendedModifiers, RecommendedStates, and RecommendedStatisticalMeasures would inherit. |
||
126 | |||
127 | - I would add an attribute (e.g. PossibleValuesDomains) that would be a Set<RecommendedValues>. |
||
128 | |||
129 | |||
130 | It doesn't prevent problem 1 from happening but at least it clarifies the typing of Feature objects: it is set only through the boolean attributes 'supports...'. |
||
131 | |||
132 | 27 | Helene Fradin | It doesn't resolve problem 2. I would suggest attaching a DescriptiveSystem object to a DescriptionBase object (see item 6). |
133 | 1 | Andreas Müller | |
134 | 42 | Helene Fradin | It resolves problem 3. The typing of the Feature will only depend on the boolean attributes. |
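The proposed separation between typing and value domains could be sketched as follows. This is a minimal illustration of the proposal, not implemented CDM code: `ProposedFeature` is a hypothetical stand-in for Feature, the class names RecommendedValues, RecommendedModifiers, RecommendedStates and RecommendedStatisticalMeasures are the ones suggested in the list above, and plain string labels stand in for CDM term objects.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Base class for a domain of possible values (name suggested in the proposal).
abstract class RecommendedValues {
    // Labels stand in for the CDM term objects (modifiers, states, measures).
    final Set<String> values = new LinkedHashSet<>();
}

class RecommendedModifiers extends RecommendedValues { }

class RecommendedStates extends RecommendedValues { }

class RecommendedStatisticalMeasures extends RecommendedValues { }

// Hypothetical stand-in for the revised Feature class.
class ProposedFeature {
    final String label;

    // Typing: set only through the boolean 'supports...' attributes.
    boolean supportsTextData;
    boolean supportsQuantitativeData;
    boolean supportsCategoricalData;

    // Frame of reference: optional, and independent of the typing.
    final Set<RecommendedValues> possibleValuesDomains = new LinkedHashSet<>();

    ProposedFeature(String label) {
        this.label = label;
    }

    boolean isCategorical() {
        return supportsCategoricalData;
    }
}
```

In this sketch a "Leaf length" feature imported from an SDD CategoricalCharacter without StateDefinition elements is still typed as categorical, because the typing no longer depends on whether any state vocabulary is attached.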
135 | 1 | Andreas Müller | |
136 | 42 | Helene Fradin | |
137 | 48 | Helene Fradin | [Gregor Hagedorn - 27/02/2009] One comment on PossibleStatisticalMeasures: at this point both SDD and CDM take the position that all statistical measures known to the system are in principle valid data and thus allowed. At the same time, the designer of a matrix has a valid interest in making a choice of preferred measures. This is the reason why we speak of "recommendedStatisticalMeasures". Example: Leaf Length, Kurtosis = 2.3 is just as valid a statement (although highly unlikely) as Leaf Length, mean = 12.3. However: Flower color = Long is simply wrong. Thus the strict enforcement of possible states. |
138 | |||
139 | The base class seems reasonable, I would, however, recommend renaming it from PossibleStates to AvailableStates. |
||
140 | |||
141 | |||
142 | [Andreas Müller - 27/02/2009] The PossibleValues class seems reasonable to me, but instead of having subclasses that all share the same structure, we could use Java generics: |
||
143 | |||
144 | |||
145 | 50 | Helene Fradin | class PossibleValues<T extends IPossibleValue> { |
146 | |||
147 | Set<T> supportedValues; |
||
148 | |||
149 | } |
||
150 | 48 | Helene Fradin | |
151 | |||
152 | and/or something similar for the Vocabulary based supported values and IPossibleValue implemented by all relevant classes like MeasurementUnit and StatisticalMeasure |
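A compilable version of the generics idea might look like the sketch below. In Java, a type parameter bound is written with `extends` even when the bound is an interface. The method `getLabel()` and the classes `StatisticalMeasureSketch` and `MeasurementUnitSketch` are assumptions made for illustration; they are not the CDM classes StatisticalMeasure and MeasurementUnit.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Marker-style interface for anything that can be part of a value domain.
// getLabel() is an assumption added so the sketch has something to show.
interface IPossibleValue {
    String getLabel();
}

// Hypothetical stand-in for the CDM StatisticalMeasure class.
class StatisticalMeasureSketch implements IPossibleValue {
    private final String label;

    StatisticalMeasureSketch(String label) {
        this.label = label;
    }

    public String getLabel() {
        return label;
    }
}

// Hypothetical stand-in for the CDM MeasurementUnit class.
class MeasurementUnitSketch implements IPossibleValue {
    private final String label;

    MeasurementUnitSketch(String label) {
        this.label = label;
    }

    public String getLabel() {
        return label;
    }
}

// One generic class replaces the structurally identical subclasses.
class PossibleValues<T extends IPossibleValue> {
    private final Set<T> supportedValues = new LinkedHashSet<>();

    void add(T value) {
        supportedValues.add(value);
    }

    boolean supports(T value) {
        return supportedValues.contains(value);
    }

    Set<T> getSupportedValues() {
        return Collections.unmodifiableSet(supportedValues);
    }
}
```

A `PossibleValues<StatisticalMeasureSketch>` then only accepts statistical measures, so mixing measures and units in one domain is rejected by the compiler rather than at run time.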
||
153 | |||
154 | |||
155 | 44 | Helene Fradin | ![](Feature.png) |
156 | 12 | Helene Fradin | |
157 | |||
158 | ---- |
||
159 | |||
160 | |||
161 | |||
162 | 32 | Helene Fradin | ## 2. HIGHLY CRITICAL - Mixed properties associated with mixed objects |
163 | 12 | Helene Fradin | |
164 | 1 | Andreas Müller | |
165 | 33 | Helene Fradin | **Impacted objects: all objects inheriting from VersionableEntity** |
166 | 1 | Andreas Müller | |
167 | 32 | Helene Fradin | |
168 | 33 | Helene Fradin | **a. Issue** |
169 | 32 | Helene Fradin | |
170 | 1 | Andreas Müller | |
171 | 40 | Helene Fradin | [Helene] Some very useful properties are available only for a restricted number of objects. I found that extremely hard when importing SDD data into the CDM because I sometimes needed a property that I knew existed for other objects but was not available for the object under consideration. [Gregor] I find your observation about the limitation that "essential general properties (title, description, media and original sources) are available only for a restricted number of objects" very interesting. I had some discussions with Markus, trying to get him to err on the side of sometimes allowing a property that is only necessary under very special use cases, rather than custom-tailoring properties to the currently perceived needs. I can understand that Markus wanted to have a clean model, but since in SDD we started doing this, and in the end found that more and more things are shared, we at some point decided to move quite a bit (I am not claiming the fully correct bit) into the abstract base classes. |
172 | 1 | Andreas Müller | |
173 | 40 | Helene Fradin | The "precision" aimed at is also, in my view, responsible for the deep class hierarchy, which hinders a ready understanding of the model. From the UML it is difficult to derive which properties some derived classes have, because all inheritance layers contribute. |
174 | 1 | Andreas Müller | |
175 | 40 | Helene Fradin | |
176 | 34 | Helene Fradin | **d. Proposed change (NOT IMPLEMENTED)** |
177 | |||
178 | |||
179 | I think these properties should be made generic, therefore available at a higher level. |
||
180 | |||
181 | The specific attributes I am thinking of are: **representations** (Set<Representation>), **media** (Set<Media>), **sources** (Set<OriginalSource>). |
||
182 | |||
183 | To implement this, I can see 2 solutions: a drastic one and a less drastic one. |
||
184 | |||
185 | |||
186 | - drastic (directly inspired by the use of the SDD Representation element): the problem is that it would impact the CDM at a high level, so I am probably overlooking important issues raised by this. |
||
187 | |||
188 | It consists of having these attributes at the level of the VersionableEntity object. However, as the Representation, Media and OriginalSource classes all inherit from VersionableEntity, they would have to be removed from this hierarchy of objects and defined independently. |
||
189 | |||
190 | The new VersionableEntity attribute would be: |
||
191 | |||
192 | representations: Set<Representation> |
||
193 | |||
194 | and the Representation object, defined independently, would contain media and sources as attributes. |
||
195 | |||
196 | In parallel, redundant attributes in lower classes could be removed. |
||
197 | |||
198 | Therefore, any CDM object inheriting from VersionableEntity could be represented in the same way: a title and a description (possibly available in several languages), one or several images attached to the object, and one or several sources. |
||
199 | |||
200 | |||
201 | - less drastic: to make these properties widely available, they could be moved further up in the hierarchy. |
||
202 | |||
203 | I would suggest: |
||
204 | |||
205 | > adding to TermBase: sources + media |
||
206 | |||
207 | > adding to Media: representations |
||
208 | |||
209 | > adding to ReferencedEntityBase: media |
||
210 | |||
211 | > adding to IdentifiableEntity: representations + media |
||
212 | |||
213 | > adding to FeatureNode: representations + media + sources |
||
214 | |||
215 | > removing media from DefinedTermBase |
||
216 | |||
217 | > removing media from DescriptionElementBase |
||
218 | |||
219 | > removing media from IdentifiableMediaEntity |
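The drastic option described above could look roughly like the following sketch. All class names ending in `Sketch` are hypothetical simplifications of the CDM classes they are named after. The key assumption, taken from the text, is that Representation is defined outside the VersionableEntity hierarchy and carries media and sources itself.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Stand-ins for Media and OriginalSource, defined outside the entity hierarchy.
class MediaSketch {
    final String uri;

    MediaSketch(String uri) {
        this.uri = uri;
    }
}

class SourceSketch {
    final String citation;

    SourceSketch(String citation) {
        this.citation = citation;
    }
}

// Standalone representation: title and description in one language,
// plus the media and sources attached to the represented object.
class RepresentationSketch {
    final String language;
    final String title;
    final String description;
    final Set<MediaSketch> media = new LinkedHashSet<>();
    final Set<SourceSketch> sources = new LinkedHashSet<>();

    RepresentationSketch(String language, String title, String description) {
        this.language = language;
        this.title = title;
        this.description = description;
    }
}

// Every entity in the hierarchy gets the same single attribute.
abstract class VersionableEntitySketch {
    final Set<RepresentationSketch> representations = new LinkedHashSet<>();
}

// Any subclass inherits representations without declaring anything itself.
class FeatureSketch extends VersionableEntitySketch { }
```

Because `RepresentationSketch` does not extend `VersionableEntitySketch`, a representation cannot itself carry representations, which avoids the circularity mentioned in the description of this option.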
||
220 | |||
221 | 32 | Helene Fradin | |
222 | 45 | Helene Fradin | [Ben Clark] suggested that we could make TermBase an IdentifiableEntity - IdentifiableEntities do have a collection of OriginalSources, and space for the IdInSource. |
223 | |||
224 | |||
225 | 47 | Helene Fradin | ![](TermBase.PNG) |
226 | 45 | Helene Fradin | |
227 | 32 | Helene Fradin | |
228 | 12 | Helene Fradin | ---- |
229 | |||
230 | |||
231 | |||
232 | 13 | Helene Fradin | ## 3. MAJOR - Creation of a defined set of descriptions |
233 | 12 | Helene Fradin | |
234 | |||
235 | 35 | Helene Fradin | **Impacted objects: new object** |
236 | |||
237 | |||
238 | **a. Issue** |
||
239 | |||
240 | |||
241 | Cf. mail exchanges between Gregor Hagedorn, Ben Clark and Helene Fradin in December 2009 "Keys and descriptions in the CDM". |
||
242 | |||
243 | There is no equivalent way of representing an SDD Dataset, or multi-access keys, in the CDM. |
||
244 | |||
245 | |||
246 | **d. Proposed change (NOT IMPLEMENTED)** |
||
247 | |||
248 | |||
249 | 36 | Helene Fradin | The solution proposed by Ben was a delimited set of taxa and their description. It would certainly be helpful for the import/export between SDD and CDM. |
250 | 35 | Helene Fradin | |
251 | 36 | Helene Fradin | [Gregor] Perhaps to generalize this, a working set of taxa and a default character tree (to optionally create a subset of all taxa) could be provided? Such a working set could then carry a flag that it is suitably revised to serve as a multi-access key. |
252 | 35 | Helene Fradin | |
253 | |||
254 | public class WorkingSet { |
||
255 | 1 | Andreas Müller | |
256 | 36 | Helene Fradin | private Map<Taxon,DescriptionBase> matrix; |
257 | 35 | Helene Fradin | |
258 | 37 | Helene Fradin | private DescriptiveSystem descriptiveSystem; |
259 | |||
260 | 36 | Helene Fradin | private boolean multiAccessKey; |
261 | |||
262 | 46 | Helene Fradin | private Language defaultLanguage; |
263 | |||
264 | 36 | Helene Fradin | ... |
265 | 35 | Helene Fradin | |
266 | } |
||
267 | |||
268 | |||
269 | 12 | Helene Fradin | |
270 | ---- |
||
271 | |||
272 | |||
273 | |||
274 | 13 | Helene Fradin | ## 4. MAJOR - Mapping use and referential objects |
275 | 12 | Helene Fradin | |
276 | |||
277 | |||
278 | ---- |
||
279 | |||
280 | 1 | Andreas Müller | |
281 | |||
282 | 40 | Helene Fradin | ## 5. MAJOR - Problem how CDM handles the link between description and scientific taxonomic name |
283 | |||
284 | |||
285 | **a. Issue** |
||
286 | |||
287 | |||
288 | The fact that structured descriptions (DescriptionBase objects) cannot always be linked with a scientific taxonomic name raises problems for grouping related descriptions. If the only way to group descriptions is through their association with an existing taxonomic hierarchy, the possibility of extracting sets of descriptions from the CDM is limited. In addition, when importing data into the CDM, the information on potential non-taxonomic connections between descriptions is lost if it is not structured identically (e.g. use of the Scope class). A model such as SDD uses a Dataset object which contains a set of descriptions that can be tagged with a name, a description and media objects. |
||
289 | 12 | Helene Fradin | |
290 | |||
291 | |||
292 | ---- |
||
293 | |||
294 | |||
295 | |||
296 | 13 | Helene Fradin | ## 6. MINOR - Descriptive system |
297 | 37 | Helene Fradin | |
298 | |||
299 | **Impacted objects: DescriptionBase** |
||
300 | |||
301 | |||
302 | **a. Issue** |
||
303 | |||
304 | |||
305 | There is no possibility of associating a set of features/characters/descriptors to a description, or a set of descriptions. |
||
306 | |||
307 | |||
308 | 38 | Helene Fradin | **d. Proposed change (IMPLEMENTED as an attribute of DescriptionBase)** |
309 | 37 | Helene Fradin | |
310 | |||
311 | Create a new object called DescriptiveSystem which contains at least a set of Feature objects, possibly associated with a domain of values. |
||
312 | |||
313 | |||
314 | public class DescriptiveSystem { |
||
315 | |||
316 | private Set<Feature> features; |
||
317 | |||
318 | // OR private Map<Feature, Set<PossibleValues>> features; |
||
319 | |||
320 | 1 | Andreas Müller | } |
321 | 38 | Helene Fradin | |
322 | |||
323 | 39 | Helene Fradin | CURRENT INTERMEDIARY IMPLEMENTATION: http://dev.e-taxonomy.eu/trac/attachment/wiki/CdmVersionTwoDiscussion/DescriptionBase.gatcl.PNG |
324 | 12 | Helene Fradin | |
325 | |||
326 | |||
327 | ---- |
||
328 | |||
329 | |||
330 | 1 | Andreas Müller | |
331 | 13 | Helene Fradin | ## 7. MINOR - How to express uncertainty or inapplicability ? |
332 | 12 | Helene Fradin | |
333 | |||
334 | |||
335 | ---- |
||
336 | |||
337 | 1 | Andreas Müller | |
338 | 12 | Helene Fradin | |
339 | 13 | Helene Fradin | ## 8. MINOR - Handling of multiple languages |
340 | 12 | Helene Fradin | |
341 | 1 | Andreas Müller | |
342 | |||
343 | 12 | Helene Fradin | ---- |
344 | |||
345 | |||
346 | |||
347 | 13 | Helene Fradin | ## 9. MINOR - Media properties and associations |
348 | 12 | Helene Fradin | |
349 | |||
350 | 13 | Helene Fradin | IMPLEMENTED |
351 | 12 | Helene Fradin | |
352 | 13 | Helene Fradin | |
353 | |||
354 | 12 | Helene Fradin | ---- |
355 | |||
356 | |||
357 | |||
358 | 13 | Helene Fradin | ## 10. MINOR - A default measurement unit for Feature |
359 | 12 | Helene Fradin | |
360 | |||
361 | |||
362 | ---- |
||
363 | |||
364 | |||
365 | |||
366 | ## 11. MINOR - Ordering of TermVocabulary for supportedCategoricalEnumerations in Feature |
||
367 | 24 | Helene Fradin | |
368 | |||
369 | |||
370 | ---- |
||
371 | |||
372 | |||
373 | |||
374 | ## 12. MINOR - Why is the setParent function not public in FeatureNode ? |
||
375 | 25 | Helene Fradin | |
376 | |||
377 | |||
378 | ---- |
||
379 | |||
380 | |||
381 | |||
382 | 1 | Andreas Müller | ## 13. MINOR - How to distinguish between characters and groups as they are both Feature objects ? |
383 | 32 | Helene Fradin | |
384 | |||
385 | Should the 'partOf' attribute be used? |
||
386 | 27 | Helene Fradin | |
387 | |||
388 | |||
389 | ---- |
||
390 | |||
391 | |||
392 | |||
393 | ## 14. MINOR - How to export and reimport multi-type characters/features/descriptors between CDM and SDD?