Controlled Vocabularies » History » Version 4

Andreas Müller, 04/07/2022 04:42 PM

# Controlled Vocabularies

*Controlled vocabularies in the Common Data Model*

----

{{toc}}

{{child_pages(depth=1)}}

### Introduction to Problem

Lists of _"predefined, authorised terms"_ (see [Wikipedia: Controlled vocabulary](https://en.wikipedia.org/wiki/Controlled_vocabulary)) are used throughout the taxonomic domain. I think controlled vocabularies share common properties that we should discuss and tackle in a general form. Perhaps this is an old discussion that I just haven't found yet.

I take a rather wide definition of CVs here: any list of two or more terms (string or numeric) that represents the list of exclusive values for a defined attribute/element/property. Note that this includes value/null or yes/no values, if these are not covered by a boolean data type.

To make my points, here are two examples from the LSID Ontology and one example drawing on the ABCD.RecordBasis type restriction:

* Class: Taxon Rank Term http://wiki.tdwg.org/twiki/bin/view/TAG/TaxonRankLsidVoc
* Class: Nomenclatural Code Term http://rs.tdwg.org/ontology/voc/TaxonName
* ABCD.RecordBasis type restriction: http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/HTML/ABCD_2.06.html

I assume that in designing the model we will strive to make it as simple as possible while remaining as open as possible to future extensions, including unforeseen ones.

With controlled vocabularies, extension can simply mean adding terms, so the model must cover that possibility.

However, the model should also make it possible to add, as part of the CV itself, further information that programs can use or act on. This may include:

* further restrictions, e.g. for type specimens, the RecordBasis must be “PreservedSpecimen” or “DrawingOrPhotograph”
* functional attributes that are exclusive to one of the terms in the list, e.g. the default value
* attributes that classify the terms (e.g. ranks not recommended by the code of nomenclature, deprecated terms)
* alternative labels, e.g. for language representations, abbreviated / not abbreviated
* language representations of descriptions that can be used as help text
* language representations of short descriptions that can be used as prompts in forms etc.
* references to other controlled vocabularies that define various subsets of the term list itself (e.g. rank term used only according to Zoo

I am not saying that we need to implement any of this at the outset, only that our modelling method should allow the model to be extended in this way. For example, I don’t think that XML schema restrictions can cover any of the above directly.

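To make the extension points listed above a little more concrete, here is a minimal Python sketch of a vocabulary structure that could carry them. It is only an illustration, not a proposal for the Common Data Model itself. The class names `Term` and `Vocabulary`, all field and method names, the subset name `type_specimen`, the German label and the third term `HumanObservation` are assumptions made for this example. A subset is modelled here as a plain set of term keys, which is a simplification of a reference to another vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Term:
    """One entry of a controlled vocabulary (hypothetical structure)."""
    key: str                                                   # stable identifier of the term
    labels: Dict[str, str] = field(default_factory=dict)       # language code -> label
    help_texts: Dict[str, str] = field(default_factory=dict)   # language code -> description
    flags: Set[str] = field(default_factory=set)               # classifying attributes, e.g. "deprecated"


@dataclass
class Vocabulary:
    """A term list plus the extra information programs may act on."""
    name: str
    terms: Dict[str, Term] = field(default_factory=dict)
    default_key: Optional[str] = None                          # functional attribute held by one term only
    subsets: Dict[str, Set[str]] = field(default_factory=dict) # subset name -> allowed term keys

    def add_term(self, term: Term) -> None:
        # Extension in the simplest sense: a new term is added to the list.
        if term.key in self.terms:
            raise ValueError(f"duplicate term: {term.key}")
        self.terms[term.key] = term

    def define_subset(self, name: str, keys: Iterable[str]) -> None:
        # A further restriction: only some of the terms are valid in a given context.
        keys = set(keys)
        unknown = keys - set(self.terms)
        if unknown:
            raise KeyError(f"unknown terms: {sorted(unknown)}")
        self.subsets[name] = keys

    def is_allowed(self, key: str, subset: Optional[str] = None) -> bool:
        if key not in self.terms:
            return False
        return subset is None or key in self.subsets[subset]

    def label(self, key: str, lang: str, fallback: str = "en") -> str:
        # Fall back to the fallback language, then to the bare key.
        labels = self.terms[key].labels
        return labels.get(lang, labels.get(fallback, key))


# Usage: the RecordBasis restriction for type specimens mentioned in the list.
record_basis = Vocabulary(name="RecordBasis")
for key in ("PreservedSpecimen", "DrawingOrPhotograph", "HumanObservation"):
    record_basis.add_term(Term(key=key))  # third term is included for illustration only

# Illustrative labels, not taken from any standard.
record_basis.terms["PreservedSpecimen"].labels.update(
    {"en": "Preserved specimen", "de": "Konserviertes Exemplar"}
)
record_basis.default_key = "PreservedSpecimen"
record_basis.define_subset("type_specimen", {"PreservedSpecimen", "DrawingOrPhotograph"})

print(record_basis.is_allowed("HumanObservation", subset="type_specimen"))
print(record_basis.label("PreservedSpecimen", "de"))
```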
Another area to be discussed is versioning of the CVs.

----

### Recommended Solutions