Discussion on how to handle defined terms in the CDM Library

Discussion

Ticket with e-mails between Ben and Andreas: #598

Existing Terms

Class, count, user-defined, ordered, multiple, needed in model, needed where
1 Language48520 DEFAULT is needed
2 Continent9100
3 Rank621101 NonViralNameParserImpl.parseFullName
4 TypeDesignationStatus16110
5 NomenclaturalStatusType2411? needed in getStatusByAbbrev etc. -> needed by Formatter/CacheStrategy
6 SynonymRelationshipType30101
7 HybridRelationshipType4110?
8 NameRelationshipType101-211 equals and addBasionym();
9 TaxonRelationshipType271101 Taxon.getTaxonomicChildren, etc.
10 MarkerType4200
11 AnnotationType21000
12 NamedAreaType210? only in TdwgArea.addTdwgArea(NamedArea)
13 NamedAreaLevel910? only in TdwgArea.addTdwgArea(NamedArea)
14 NomenclaturalCode5001 getNomenclaturalCode() in TaxonNameBase-derived classes
15 Feature26301 in constructor of some DescriptionElementBase classes
16 TdwgArea10401110
17 NamedArea03121
18 WaterbodyOrCountry250200
19 PresenceTerm18210
20 AbsenceTerm1210
21 Sex2110
22 DerivationEventType8200
23 PreservationMethod0200
24 DeterminationModifier0300
25 StatisticalMeasure81000
26 RightsTerm3?00
27 BibtexEntryType0?000
28 ExtensionType0?00
29 InstitutionType0?00
30 MeasurementUnit02000
32 ReferenceSystem02000
33 TextFormat0200
34 Keyword0310
35 Modifier0310
36 State0310
40 Scope0210
41 Stage0210
42 NameTypeDesignationStatus91000

count: Number of existing terms in csv files
user-defined: need for user-defined instances (0: never; 1: very rarely; 2: sometimes; 3: often)
who: who may add a new term (list may be incomplete)
ordered: The vocabulary is ordered
multiple: whether multiple vocabularies should be allowed (e.g. different NamedArea vocabularies): 0: the same vocabulary in all applications; 1: one vocabulary per application; 2: multiple vocabularies per application
needed in model: 1 if the 'static' methods are used in other methods in the model, 0 otherwise
needed where: description of how the 'static' methods are used

Solutions

1. Mixed Model

  • Implement terms that need frequent updates and are not used in the model as ordinary classes (saved via @Cascade). These classes provide no static methods; instances are retrieved via the service layer.
  • Implement terms that never or only very rarely need updating and that are used in the model as enums. They can only be extended by changing the code.
    • List of enums:
      • NomenclaturalCode
      • SynonymRelationshipType
      • HybridRelationshipType
      • TaxonRelationshipType
    • List of unclear classes
      • Rank (tendency: make it an enum and think later about possibilities to extend it)
      • NomenclaturalStatusType ()
      • NameRelationshipType (tendency: make it an enum as all other relationships are also enum)
    • List of ordinary classes
      • all other classes
  • Optional: keep some terms as classes that have to be initialized explicitly (see Existing Implementation). These classes are not cascaded, e.g. to keep their objects unique.

  • Define an interface IDefinedTerm that both the enums and the ordinary classes implement

  • Problems to be solved:
    • Representations for the enums
    • Vocabularies for enums, including different vocabularies for different nomenclatural codes
    • t.b.c.
  • Discussion on single classes
    • Ranks:
      • Pro enum:
        • Domain logic is based on the ranks, e.g. parsers and formatters need to know whether a rank is suprageneric, infrageneric, etc., and which abbreviation is generally used for it
        • Additions to the vocabulary are expected to be very rare
      • Con enum:
        • The order of ranks may differ slightly between applications. This applies to old ranks or infra-ranks such as tax.infrasp. or tax.infragen.
          Possible solution: use different vocabularies that store the order separately, like the feature trees. Different vocabularies for the different codes are needed anyway.
        • Static methods do not have to be available before either connecting to the database or using the model in a non-persistent way (e.g. a web service for parsing names)
    • NomenclaturalStatusType:
      • Pro enum:
        • Domain logic is based on them. E.g. parsers and formatters need to know their abbreviated representation
      • Con enum:
        • Additions to the vocabulary are expected to occur very rarely, but users do want them (experience from the Berlin Model)
        • Static methods do not have to be available before either connecting to the database or using the model in a non-persistent way (e.g. a web service for parsing names)
        • Order is not so important (so an implementation as a class is easier)
    • NameRelationshipType:
      • Pro enum:
        • Domain logic is based on them. E.g. TaxonNameBase and TaxonBase have methods like addBasionym or nameRelation.equals(NameRelationshipType.BASIONYM())
        • Additions to the vocabulary are expected to be very rare and not urgent
        • Other relationship types are also implemented as enums
      • Con enum:
        • Additions to the vocabulary are expected to occur, although very rarely
        • Static methods do not have to be available before either connecting to the database or using the model in a non-persistent way (e.g. a web service for parsing names)
    • Feature:
      • Pro enum:
        • More or less needed in the constructors of some subclasses of DescriptionElementBase (e.g. Distribution, CommonName)
      • Con enum:
        • Many terms may be added, so a pure enum is impossible.
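A minimal Java sketch of the mixed model, under stated assumptions: IDefinedTerm is the interface proposed above, but its methods (getLabel, getAbbreviation) are assumed; Rank and MarkerType are simplified stand-ins, not the CDM Library classes; RankVocabulary is a hypothetical class that keeps the rank order outside the enum, one vocabulary per nomenclatural code, along the lines of the possible solution for ranks. The enum carries domain logic (isSupraGeneric) that works without any database, while the class-based term can be instantiated at runtime.

```java
import java.util.List;

// IDefinedTerm is the interface proposed above; its methods are assumptions.
interface IDefinedTerm {
    String getLabel();
    String getAbbreviation();
}

// Enum-based term (simplified stand-in for Rank): domain logic such as
// isSupraGeneric() is available without any database connection.
enum Rank implements IDefinedTerm {
    FAMILY("Family", "fam.", true),
    GENUS("Genus", "gen.", false),
    SUBGENUS("Subgenus", "subg.", false),
    SPECIES("Species", "sp.", false),
    SUBSPECIES("Subspecies", "subsp.", false);

    private final String label;
    private final String abbreviation;
    private final boolean supraGeneric;

    Rank(String label, String abbreviation, boolean supraGeneric) {
        this.label = label;
        this.abbreviation = abbreviation;
        this.supraGeneric = supraGeneric;
    }

    @Override public String getLabel() { return label; }
    @Override public String getAbbreviation() { return abbreviation; }
    public boolean isSupraGeneric() { return supraGeneric; }
}

// Hypothetical vocabulary that stores the rank order separately from the enum,
// one instance per nomenclatural code (similar to a feature tree).
final class RankVocabulary {
    private final String code;
    private final List<Rank> order;

    RankVocabulary(String code, List<Rank> order) {
        this.code = code;
        this.order = List.copyOf(order);
    }

    // True if rank a comes before (is higher than) rank b in this vocabulary.
    boolean isHigher(Rank a, Rank b) {
        int ia = order.indexOf(a);
        int ib = order.indexOf(b);
        if (ia < 0 || ib < 0) {
            throw new IllegalArgumentException("Rank not in vocabulary for code " + code);
        }
        return ia < ib;
    }
}

// Class-based term (simplified stand-in for MarkerType): users may create new
// instances at runtime; in the CDM these would be persisted like other entities.
final class MarkerType implements IDefinedTerm {
    private final String label;
    private final String abbreviation;

    MarkerType(String label, String abbreviation) {
        this.label = label;
        this.abbreviation = abbreviation;
    }

    @Override public String getLabel() { return label; }
    @Override public String getAbbreviation() { return abbreviation; }
}

class MixedModelDemo {
    public static void main(String[] args) {
        // "ICBN" is only an example code name for this vocabulary.
        RankVocabulary botany = new RankVocabulary("ICBN",
                List.of(Rank.FAMILY, Rank.GENUS, Rank.SUBGENUS, Rank.SPECIES, Rank.SUBSPECIES));
        // Enum constants and class instances are handled uniformly via the interface.
        List<IDefinedTerm> terms = List.of(Rank.GENUS, new MarkerType("To be checked", "chk"));
        for (IDefinedTerm term : terms) {
            System.out.println(term.getLabel() + " (" + term.getAbbreviation() + ")");
        }
        System.out.println(botany.isHigher(Rank.GENUS, Rank.SPECIES)); // prints true
    }
}
```

The order lives in the vocabulary rather than in the enum's declaration order, so two codes could order the same constants differently without changing the enum itself.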

2. Existing Implementation

  • Keep the defined terms as classes that are not cascaded via @Cascade.
  • Make the application developer responsible for using and initializing defined terms in the right way (e.g. adding new ranks only when definitely no other application is using the library)
  • Write documentation on how to use defined terms in the right way
  • Implement robust exception handling with meaningful warnings
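The developer's responsibility for initialization, together with the requirement for meaningful warnings, could look roughly like the following sketch. DefinedTermStore is a hypothetical class, not part of the CDM Library; it assumes terms are identified by UUID and loaded once (e.g. after connecting to the database), and it fails with descriptive exceptions when used before initialization, when initialized twice, or when a term is missing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical registry for class-based defined terms (not a CDM Library class).
// The application must call initialize() exactly once before any lookup.
final class DefinedTermStore<T> {
    private final String vocabularyName;
    private final Map<UUID, T> terms = new LinkedHashMap<>();
    private boolean initialized = false;

    DefinedTermStore(String vocabularyName) {
        this.vocabularyName = vocabularyName;
    }

    // Called once, e.g. after connecting to the database, with the loaded terms.
    synchronized void initialize(Map<UUID, T> loaded) {
        if (initialized) {
            throw new IllegalStateException("Vocabulary '" + vocabularyName
                    + "' is already initialized; initializing it again could create duplicate term objects.");
        }
        terms.putAll(loaded);
        initialized = true;
    }

    // Returns the term or explains precisely why it cannot be returned.
    synchronized T get(UUID uuid) {
        if (!initialized) {
            throw new IllegalStateException("Vocabulary '" + vocabularyName
                    + "' was used before initialize() was called.");
        }
        T term = terms.get(uuid);
        if (term == null) {
            throw new IllegalArgumentException("No term with UUID " + uuid
                    + " in vocabulary '" + vocabularyName + "'.");
        }
        return term;
    }
}

class TermStoreDemo {
    public static void main(String[] args) {
        DefinedTermStore<String> ranks = new DefinedTermStore<>("Rank");
        UUID genus = UUID.randomUUID();
        try {
            ranks.get(genus);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // reports the missing initialization
        }
        ranks.initialize(Map.of(genus, "Genus"));
        System.out.println(ranks.get(genus)); // prints Genus
    }
}
```

Failing loudly on a second initialize() call is one way to protect the uniqueness of term objects that the existing implementation relies on.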

Problems that occur when using the existing implementation

Not all of them are unsolvable.

  • A "Transient object xxx" exception is thrown when two CdmApplicationControllers are created in the same JVM (one after the other, not simultaneously), because the constructor of Distribution adds Feature.DISTRIBUTION. The second time, Hibernate throws the error.
  • An uninitialized collection exception is thrown when saving taxon names, because getFullTitleCache is called, which needs the nomenclatural status abbreviation.
  • t.b.c.
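The first problem can be illustrated with a toy simulation, assuming its cause is that the Feature instance used by the Distribution constructor belongs to the first controller's session and is unknown to the second one. ToySession, Feature and Distribution are hypothetical simplifications, and ToySession only loosely mimics Hibernate's check for references to unmanaged instances; it is not Hibernate's API. One possible remedy, sketched here as an assumption, is to pass the feature into the constructor so that each controller supplies its own managed instance.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Simplified, hypothetical stand-ins for the CDM classes.
final class Feature {
    private final String label;
    Feature(String label) { this.label = label; }
    String getLabel() { return label; }
}

final class Distribution {
    private final Feature feature;
    // The feature is passed in instead of being read from a static constant,
    // so each controller can supply the instance its own session manages.
    Distribution(Feature feature) { this.feature = feature; }
    Feature getFeature() { return feature; }
}

// Toy session: it accepts only references to objects it manages itself.
// This loosely mimics Hibernate's transient-reference check; it is not Hibernate's API.
final class ToySession {
    private final Set<Object> managed = Collections.newSetFromMap(new IdentityHashMap<>());

    // Simulates loading a term from this session's database.
    Feature loadFeature(String label) {
        Feature feature = new Feature(label);
        managed.add(feature);
        return feature;
    }

    void save(Distribution distribution) {
        Feature feature = distribution.getFeature();
        if (!managed.contains(feature)) {
            throw new IllegalStateException("Distribution references a Feature ('"
                    + feature.getLabel() + "') that this session does not manage");
        }
        managed.add(distribution);
    }
}

class TransientTermDemo {
    public static void main(String[] args) {
        ToySession first = new ToySession();
        Feature cached = first.loadFeature("Distribution"); // kept like a static constant
        first.save(new Distribution(cached));               // works in the first session

        ToySession second = new ToySession();
        try {
            second.save(new Distribution(cached));          // reuses the first session's instance
        } catch (IllegalStateException e) {
            System.out.println("Second controller: " + e.getMessage());
        }
        // Works: the term is looked up through the second session.
        second.save(new Distribution(second.loadFeature("Distribution")));
    }
}
```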