Classification (Taxonomic View)

This page explains the idea of a taxonomic view in the CDM.


See also the discussion for changes in CDM v2

Situation In CDM v1.0

In version 1.0 of the CDM, taxonomic concepts were related to each other by Taxon- and SynonymRelationships. The classification of taxa could be expressed by TaxonRelationships of type "is taxonomically included in". A TaxonRelationship of this type combined two taxa of type Taxon, one being the parent and the other the child. Additional information such as a reference (who says it is a child) could also be stored.

Handling taxonomic classifications this way caused some problems. For example, it was impossible to define the whole classification tree, and especially the root taxon (or rather the root taxa, see the explanation below) of this tree (forest). If each taxon had at most one parent, it was possible in theory to traverse the whole graph of taxa and determine the root elements, but for performance reasons this approach is discouraged, and it also has the limitations mentioned.

As a work-around, several methods have been implemented to retrieve the roots of a taxonomic tree/classification by using queries like "Give me all taxa that do not have any parent". If you wanted to restrict your search to a particular taxonomic view, you could additionally pass the sec reference of the root taxon to the method, assuming that the sec reference more or less represents the taxonomic view.
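The work-around query can be pictured roughly as follows. This is only a minimal sketch: `SimpleTaxon`, `RootFinder` and `findRoots` are hypothetical names, the sec reference is reduced to a plain string, and the real CDM Library methods are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, stripped-down stand-in for a CDM v1.0 taxon.
class SimpleTaxon {
    final String name;
    final String sec;                                   // secundum reference, reduced to a string
    final List<SimpleTaxon> parents = new ArrayList<>();

    SimpleTaxon(String name, String sec) {
        this.name = name;
        this.sec = sec;
    }
}

class RootFinder {
    // "Give me all taxa that do not have any parent", optionally restricted
    // to one sec reference (pass null to search all taxa).
    static List<SimpleTaxon> findRoots(List<SimpleTaxon> all, String sec) {
        List<SimpleTaxon> roots = new ArrayList<>();
        for (SimpleTaxon taxon : all) {
            boolean secMatches = (sec == null) || Objects.equals(sec, taxon.sec);
            if (taxon.parents.isEmpty() && secMatches) {
                roots.add(taxon);
            }
        }
        return roots;
    }
}
```

The sketch also shows where the approach breaks down: as soon as a view reuses taxa with other sec references, filtering by sec no longer identifies the roots of that view.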

This method implies that a taxonomic classification mostly uses taxa that share the same reference. This may be the preferred way of handling taxa in the pure sense (a taxon name mentioned in a taxonomic classification is something slightly different from the same taxon name mentioned somewhere else, and therefore a new taxon should be created). However, users often want to go another way and, for example, "reuse" taxa. This leads to a situation where the same taxon can be used in different taxonomic views. Therefore the taxa of a view may have very different sec references, and a taxon may have multiple parents.

On the programmatic side there was also a serious limitation for this work-around: for performance reasons the CDM Library stored the parent of a taxon in a cache field, which does not work with multiple parents.

Solution for CDM v2.0

To overcome the problems mentioned above the following solution that separates classification from concept data is proposed for CDM v2.0.

  1. Creating a new class TaxonomicView that represents one classification / taxonomic view. A taxonomic view may consist of several distinct trees, as not all parts of the classification may be known yet. Therefore there may be multiple root nodes in a view; generally speaking, a taxonomic view represents a forest rather than a tree, despite the fact that the abstract idea of a taxonomic view is to represent exactly one tree that contains all included taxa and their relationships.

  2. Creating a new class TaxonNode that represents a taxon within its classification and also knows about its unique parent (the parent within the respective classification)

  3. Deleting the taxon relationship type "is taxonomically included in" from the list of TaxonRelationshipTypes.
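The first two points can be sketched as below. Only the class names TaxonomicView and TaxonNode come from the proposal; the fields, the method names and the use of a plain string in place of a Taxon object are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of one classification; it may hold several root nodes (a forest).
class TaxonomicView {
    private final String title;
    private final List<TaxonNode> rootNodes = new ArrayList<>();

    TaxonomicView(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }

    // Adds a new root node for the given taxon to this view.
    TaxonNode addRoot(String taxon) {
        TaxonNode node = new TaxonNode(taxon, this, null);
        rootNodes.add(node);
        return node;
    }

    List<TaxonNode> getRootNodes() {
        return Collections.unmodifiableList(rootNodes);
    }
}

// Sketch of a taxon placed in exactly one view, with at most one parent there.
class TaxonNode {
    private final String taxon;        // stand-in for a reference to a Taxon object
    private final TaxonomicView view;
    private final TaxonNode parent;    // null for root nodes
    private final List<TaxonNode> children = new ArrayList<>();

    TaxonNode(String taxon, TaxonomicView view, TaxonNode parent) {
        this.taxon = taxon;
        this.view = view;
        this.parent = parent;
    }

    // Creates a child node in the same view as this node.
    TaxonNode addChild(String childTaxon) {
        TaxonNode child = new TaxonNode(childTaxon, view, this);
        children.add(child);
        return child;
    }

    String getTaxon() { return taxon; }
    TaxonomicView getView() { return view; }
    TaxonNode getParent() { return parent; }
    List<TaxonNode> getChildren() { return Collections.unmodifiableList(children); }
}
```

Because the parent lives on the node and not on the taxon, the same taxon can appear in two views under different parents without either view knowing about the other.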

Additional changes for the CdmLibrary are

  1. Separating business logic that handles classification logic in the Taxon class and moving it mainly to the TaxonNode class.

  2. Creating new service layer methods to retrieve taxonomic views and to retrieve roots

The resulting model (except for operations) may look like the following

!taxonomicView.png!

Open Issues

Naming

  • Should we call the class that represents the classification "TaxonomicView", "TaxonomicClassification" or just "Classification"?

  • What should the reference attribute be called that is part of the TaxonNode and stands for the reference describing the parent-child relationship? Or, more generally: do we need such a reference at all, or must this always be the reference of the taxonomic view itself?

Subclassing

Merging of taxonomic views

Sometimes it may be necessary to merge two taxonomic views. This may be the case when:

  • It turns out that the two taxonomic views belong to the same general view but represent different and distinct parts of it. This case is easy to implement by changing the taxonomic view information of all taxon nodes of the first view to the second and updating the list of nodes in the second view accordingly. The first view will not be deleted (as it may be identifiable) but gets an attribute TaxonomicView mergedInto that points to the second view.

  • It turns out that the two views are not distinct but differ in a subtree.

E.g. there may be the Euro+Med view that includes botanical taxa and there may be some special view from Taraxacum experts ("Taraxacum view") that takes over the E+M classification but may differ in the classification for Taraxacum.

The +greedy solution+ for this use case may be that we create a new taxonomic view ("Taraxacum view"), copy all nodes from Euro+Med, and delete all subnodes of the node representing Taraxacum.

A more +sophisticated solution+ may be to create a new "Taraxacum view" that includes only Taraxacum and its subnodes and to add to Taraxacum (1) the information what the parent node from the "Euro+Med" view is and (2) that it replaces the Taraxacum node of "Euro+Med" within the alternative "Taraxacum view". (3) In the Taraxacum view we only have to store the information that the Taraxacum node is the node that connects this view to the more general "Euro+Med" view.

The latter can easily be achieved by adding one attribute to the TaxonNode class that stands for the node this node replaces in some other view, and by adding an attribute to the view that points to this connection node. All other information is implicit and can easily be retrieved by calling some methods on these objects. For example, the higher (more general) view of a view can be found by taking the connection node and getting the node it replaces; the view of this replaced node is then the higher view.
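A rough sketch of this lookup is given below. The names `ViewSketch`, `NodeSketch`, `connectionNode`, `replacedNode` and `getEffectiveParent` are invented for illustration; the page itself only says that two such attributes would be needed.

```java
// Hypothetical sketch; all class, field and method names are assumptions.
class ViewSketch {
    final String name;
    NodeSketch connectionNode;    // node linking this view to a more general view, or null

    ViewSketch(String name) {
        this.name = name;
    }

    // The higher view is the view of the node that the connection node replaces.
    ViewSketch getHigherView() {
        if (connectionNode == null || connectionNode.replacedNode == null) {
            return null;
        }
        return connectionNode.replacedNode.view;
    }
}

class NodeSketch {
    final String taxon;
    final ViewSketch view;
    final NodeSketch parent;      // parent within the same view, or null
    NodeSketch replacedNode;      // node in another view that this node replaces, or null

    NodeSketch(String taxon, ViewSketch view, NodeSketch parent) {
        this.taxon = taxon;
        this.view = view;
        this.parent = parent;
    }

    // Falls back to the replaced node's parent when this node has no parent in its own view.
    NodeSketch getEffectiveParent() {
        if (parent != null) {
            return parent;
        }
        return replacedNode == null ? null : replacedNode.parent;
    }
}
```

With this shape nothing is copied from the general view: the special view stores only its own subtree plus the two links.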

Use of synonyms

Synonyms in general are handled on the concept level, not on the classification level. Nevertheless, some users expressed their need to use a concept defined elsewhere but under a synonym name within the taxonomic view they handle.

There are different possibilities to handle this situation:

  • The pragmatic way: A TaxonNode gets an additional attribute called "nameToBeUsed" (type: TaxonNameBase) that represents the synonym name that you want to use in your classification. This is not a proper use of the concept idea, because a concept by definition is a pair of (TaxonName, SecundumReference), so saying that a node links to a Taxon and then using another name sounds contradictory.

  • The pragmatic way 2: An alternative to the pragmatic way may be to store the Synonym you want to refer to instead of the TaxonNameBase. This more or less restricts the selection of names you can choose from to those really used in the secundum reference.

  • The real synonym way: You allow synonyms to be stored as the concepts of a taxon node (the taxon attribute is of type TaxonBase, including taxa and synonyms, instead of only type Taxon). To retrieve the accepted taxon you may call synonym.getAcceptedTaxon().

  • The pure way: You create a new Taxon that consists of the taxonomic name (synonym name) you want to use and that has a concept relationship of type "is congruent to" to the concept you want to refer to. The secundum reference of the new Taxon must then be your view (or better the reference that represents your view). To be discussed: "What is the best reference to use here?".

  • The explicit congruency way: This is an alternative to the pure way, where you create a new Taxon as described in the pure way but explicitly define the concept that is meant to be the congruent one by storing it as such in the TaxonNode, whereas the pure way stores it only as a concept relationship between two taxa. The drawback of the latter is that a taxon can have multiple congruence relationships to other taxa, so storing only the taxon information in the TaxonNode may lead to a situation where you do not know exactly which congruent taxon the newly created taxon really wants to relate to. Maybe you can say that this type of relationship expresses more than congruence, namely equality. In this context equality means that the description of the taxon is stored in one and the same place, because the reference of the new taxon just links to the old taxon and does not add any additional information to the taxon (except for the parent-child relationships). The idea of equality in general also holds for synonyms that have the same sec reference as an accepted taxon.

On the other hand, by explicitly storing the congruent taxon we start to store concept relation information in two different places (TaxonRelationship and TaxonNode), which leads to the known problems you get with redundant information.

  • The easy way: You leave it to the user how to handle such a use case. (This way you may force the user to think about the sense or non-sense of using taxon concepts in this way :-) )
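To make the first option concrete, here is a minimal sketch of the pragmatic way. Only the attribute name "nameToBeUsed" is taken from the list above; `ConceptSketch`, `NamedNodeSketch` and `getDisplayName` are hypothetical, and names are reduced to strings rather than TaxonNameBase objects.

```java
// Stand-in for a taxon concept: a (name, secundum reference) pair.
class ConceptSketch {
    final String name;
    final String sec;

    ConceptSketch(String name, String sec) {
        this.name = name;
        this.sec = sec;
    }
}

// Node that may present its concept under a different (synonym) name.
class NamedNodeSketch {
    final ConceptSketch taxon;
    final String nameToBeUsed;    // optional synonym name; null means "use the concept's own name"

    NamedNodeSketch(ConceptSketch taxon, String nameToBeUsed) {
        this.taxon = taxon;
        this.nameToBeUsed = nameToBeUsed;
    }

    // Name shown in the classification: the override if present, else the concept's name.
    String getDisplayName() {
        return nameToBeUsed != null ? nameToBeUsed : taxon.name;
    }
}
```

The sketch makes the criticism visible: the node points at one (name, sec) pair but displays a name that is not part of that pair.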

Hybrid taxa

A taxonomic classification is not always a pure tree but may include hybrids that have 2 "parents". A solution for modelling this special case might be to have a separate class that inherits from TaxonNode called e.g. HybridTaxonNode. It knows how to handle a pair of "parents" being not really parents as they do not necessarily have a higher rank but rather express some phylogenetic information. As relations between hybrids and their "parents" are substantially different to a normal parent child relationship the TaxonNode may be extended by a list of hybrid children (which is empty in most cases).
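One possible shape for this is sketched below. Apart from the suggested class name HybridTaxonNode, everything is assumed: `PlainNode` stands in for TaxonNode, and the factory method and field names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for TaxonNode, extended by a list of hybrid children.
class PlainNode {
    final String taxon;
    final PlainNode parent;                                          // taxonomic parent, may be null
    final List<HybridTaxonNode> hybridChildren = new ArrayList<>();  // empty in most cases

    PlainNode(String taxon, PlainNode parent) {
        this.taxon = taxon;
        this.parent = parent;
    }
}

// A node with one ordinary taxonomic parent plus a pair of hybrid "parents".
class HybridTaxonNode extends PlainNode {
    final PlainNode firstHybridParent;
    final PlainNode secondHybridParent;

    private HybridTaxonNode(String taxon, PlainNode parent, PlainNode first, PlainNode second) {
        super(taxon, parent);
        this.firstHybridParent = first;
        this.secondHybridParent = second;
    }

    // Creates the hybrid node and registers it with both hybrid "parents".
    static HybridTaxonNode create(String taxon, PlainNode parent, PlainNode first, PlainNode second) {
        HybridTaxonNode hybrid = new HybridTaxonNode(taxon, parent, first, second);
        first.hybridChildren.add(hybrid);
        second.hybridChildren.add(hybrid);
        return hybrid;
    }
}
```

Keeping the hybrid links separate from `parent` means that ordinary tree traversal is unaffected by hybrids.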

Inheritance

From which classes should the new classes TaxonomicView (Classification) and TaxonNode inherit?

Rationale Behind the Model Changes

I don't think that the data model should decouple classification, or at least inclusion of child taxa from the description of a taxon: the two are logically linked

So the main conceptual issue about taxonomic trees for me is to clearly separate the concept part of a taxon (the descriptions and the synonymy) from the classification part (the question how this taxon is related to other accepted taxa). In many use-cases this is not an issue because you may consider a Taxon as something described in its sec reference which may (but need not) include the classification. But if you have a narrower view of a taxon as being mainly the taxon sec's description and not its proposed classification (if available at all), it may become relevant.

I agree that it is possible to include a taxon concept in multiple classifications. So it is important that a taxon concept can have multiple parents (but only one parent in any one taxonomic classification). However, I don't agree that the definition of a taxon concept is something that is separated from the child taxa that it includes.

I'm not convinced by the idea that you can separate the synonymy and description of a taxon from the subordinate taxa which are members of that taxon. I think that the position of a taxon within a classification is logically determined by its description (at least implicitly). When you look at the taxonomic process, in which a taxonomist examines specimens and constructs taxonomic concepts by grouping specimens based on their characteristics, the hierarchy within the classification is clearly related to the characteristics of the taxa (if not directly computable from the diagnosis). I also came across a blog post from Roger Hyam which talks about the same thing (see http://www.hyam.net/blog/archives/707). I don't think that there are many meaningful changes to the description of a taxon that would not change its circumscription.

You could equally take the "specimens which it includes" route and say that a taxon is circumscribed by the specimens which it includes (again this will mainly be implicit, because taxonomists are thinking of more individuals than the types and the specimens in the "materials examined"). Removing or adding a child taxon from a parent changes the circumscription of the parent.

!classificaition1.png!

From my understanding of the problem, (i) within a well-formed classification, and at any given level (rank), accepted taxon concepts are disjoint (they don't overlap at all); (ii) a taxon concept is (often implicitly) defined by the set of individuals which it includes, or by the area of trait-space it occupies; (iii) the circumscription of a well-formed parent taxon (in terms of individuals, area of trait space, or other parameters) is the union of its members (denoted membership).

So if you compare classification 1 (in which genus X has child concepts A, B, and C) and classification 2 (in which genus X contains child concepts A and B), although A and B can be the same taxon concepts in both classifications, the parent taxon X cannot be the same since its circumscription is different. X is only the same concept if either A or B (or A and B) change their circumscription to exactly compensate for the lack of C (see classification 3).
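The argument can be played through with sets. The sketch below follows one reading of points (ii) and (iii), treating a circumscription as a set of specimen identifiers; the class `CircumscribedConcept` and the identifiers used in the usage note are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A concept whose circumscription is its own specimens plus those of its children.
class CircumscribedConcept {
    final String name;
    final Set<String> specimens = new HashSet<>();                  // directly included specimens
    final List<CircumscribedConcept> children = new ArrayList<>();

    CircumscribedConcept(String name, String... specimens) {
        this.name = name;
        this.specimens.addAll(Arrays.asList(specimens));
    }

    // Adds child concepts and returns this concept for chaining.
    CircumscribedConcept include(CircumscribedConcept... kids) {
        children.addAll(Arrays.asList(kids));
        return this;
    }

    // Union of the directly included specimens and all child circumscriptions.
    Set<String> circumscription() {
        Set<String> all = new HashSet<>(specimens);
        for (CircumscribedConcept child : children) {
            all.addAll(child.circumscription());
        }
        return all;
    }
}
```

If A holds specimens s1 and s2, B holds s3 and C holds s4, then X with children A, B, C has a different circumscription from X with children A, B. It only matches again when a widened A also takes in s4, which is the situation of classification 3.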

Also, taxonomically, the situation in classification 3 is a bit odd because the author of classification 3 doesn't even know about taxon C. If they did know about C then they would make it a synonym of A, but since A is the same object in classifications 1, 2 and 3, and synonymy is part of the taxon concept, adding C as a synonym of A would mean that A in classification 3 is a different taxon concept. So you couldn't start with classification 1, synonymise C with A and end up with classification 3, because you would have to exchange the taxon concept A for a new concept with the same name, as its synonymy is different from that of A in classification 1.

It is safer to propagate changes to the circumscription of a child taxon to its parent

If we would like to be able to compare two classifications, the current structure would model the Cichorieae use case thus:

In this instance Crepidinae Dumort. sec. "E+M" is not only the same in a conceptual sense, it is exactly the same object in the semantic sense. Of course, it is possible that the circumscription of Taraxacum F.H.Wigg. sec. "E+M" is identical to the circumscription of Taraxacum F.H.Wigg. sec. "WP6", but if either or both classifications are going to be managed going forward, it is unlikely that the situation will remain like this. In this scenario either information is lost (because you can't distinguish between the concept Crepidinae Dumort. sec. "E+M" as used in different classifications) or the sec reference becomes somewhat redundant, and you need to invent a third dimension - the taxonomic classification a concept is used in - to distinguish taxon concepts.

A safer way to do this is to assume that the circumscription of the parent taxa has changed as well and then allow them to be related using concept relations if so desired. In this case, no information is lost, and the higher taxa can be considered to be equivalent, but are distinct (and available for inference, reasoning, or validation).

Fundamentally, there is a difference between having two objects that you can assert are the same, and only having one object.

It is not clear how the CDM Taxon / TaxonNode maps on to TDWG TCS-RDF Taxon

I think that modelling a taxonomic classification using TaxonNode and TaxonBase is conceptually different from, for example, TCS and the TDWG ontology.

If a Taxon object has a GUID, but is a member of two classifications, how does this map to the TCS-RDF from TDWG? Are TaxonNodes analogous to TDWG taxon concepts? If that is the case, what does the Taxon GUID map to?

Use Cases

1. Users want to be able to present alternate views

This is a very difficult requirement to model because "taxonomic view" probably has as many meanings as there are taxonomists. Also "view" has a meaning in the domain of databases.

1.1 The Cichorieae WP6 exemplar has imported the Euro+Med checklist and has made changes to the genus Taraxacum. They would like to present both the Euro+Med classification and their own "WP6 view", which is essentially identical to the Euro+Med checklist apart from the genus Taraxacum, which is different.

If the Euro+Med version of Taraxacum is updated, will this be changed in the Cichorieae portal? Or do they intend to change only the "WP6 view" and leave the "Euro+Med" view as it is?

I can think of slightly different use cases:

1.2 A database of the Sphingidae has stored the full hierarchy of taxonomic concepts including tribes, subtribes and so on. In their website navigation they don't want to display the whole classification because most users don't know which tribe a genus is in, so they would like to present a flatter view with only family, genus, and species.

1.3 A global checklist would also like to present regional checklists. The regional checklist includes the same taxa, organised hierarchically, but excludes taxa not found in the region.

There is not enough detail in the use cases to know what the best technical solution is. For example, "taxonomic view" could mean:

  • That the taxon concepts are different between the classifications, but the names are shared to some extent

  • That the classifications are the same but differ temporally (i.e. classification 1 is the first edition and classification 2 is the second edition)

  • One classification is the superset of the other, and the second classification presents the subset in the same way (e.g. geographical subset)

  • That the taxonomic classifications share child nodes, but at some point the higher taxa differ

2. Users want to "reuse" concepts

As expressed by the users in Paris.

Again, it's not clear what "reuse" really means. It could mean:

  • "take a copy of".

  • "Allow multiple taxon concepts with the same name and sec reference (but not the same object) within the same database".

  • Allow child subtrees to be included within different parent concepts

There is also a more general requirement:

3. Storage of multiple taxonomic hierarchies in a single CDM database.

It must be possible to store and retrieve multiple taxonomic hierarchies in the same CDM database (regardless of whether they contain the same taxon concepts or not).

4. Using the CDM to manage a single checklist

Users wish to produce a well formed taxonomic checklist.

Users want to manage a single taxonomic classification. They should be prompted to follow the rules of nomenclature and prevented from making simple mistakes such as homonymy, incorrect formation of the name and so on.

Relative priority of use cases

In my humble opinion:

  1. Use Case 4 is the most important - there will always be more people managing one classification than multiple classifications (within any one database).

  2. Use case 3 is the next most important - we want to be able to store, for example, the Cichorieae and the Arecaceae in the same database.

  3. Use case 1 is the next most important. Some users will want to be able to present alternate views but will not want to maintain more than one.

  4. Use case 2 is the least important. It's not clear what "reuse" means.

Proposed amendment to the TaxonNode / TaxonomicTree solution

Based on the above I think we could improve the way that taxonomic parent-child relationships are modelled by using the ideas in TaxonNode while preventing the decoupling of the circumscription of a taxon concept from the circumscription of its children, which doesn't make logical sense.

  1. Taxon concepts to have multiple parents, but only one parent in any given classification

  2. Taxon concepts to be circumscribed by their description, synonymy, specimen circumscription and by the child concepts that they include. Logically all of these factors should agree with each other. There may never be a way to validate this in software.

  3. A CDM Store to have many "TaxonomicTree" objects representing different classifications.

  4. If the user wishes to store "Multiple taxonomic views" where one view is a subset of the other and does not necessarily represent a complete taxonomic classification (e.g. in the examples where users want to present a geographical subset, or to remove some levels of the hierarchy for display purposes), the TaxonNode idea is very appropriate because the hierarchy does not necessarily reflect the definition of the objects in it.

Changes:

  1. Deprecate the TaxonRelationshipType "Taxonomically Includes" - relationsToThisTaxon and relationsFromThisTaxon only refer to concept relationships

  2. Add a bidirectional many-to-many relationship between Taxon.parents and Taxon.children where the parent taxon is the "owner" of the relationship - the semantics being that a child does not know about or define its parent but a parent knows about and is defined by its children. For consistency with other relationship types, it might be best to extend RelationshipBase and have a ParentChildRelationship, or to rename TaxonmicRelationship to ConceptRelationship or something like that.

  3. Add a property TaxonomicTree classification which defines which classification the relationship is valid in - in the case of there being only one parent of a concept, assume that the relationship is valid.

  4. Amend TaxonomicTree to contain a set of valid root Taxon objects, and a set of e.g. misapplied names / incertae sedis etc.

e.g.

public class Taxon extends TaxonBase<IIdentifiableEntityCacheStrategy<Taxon>> implements Iterable<Taxon>, IRelated<RelationshipBase>{

  @OneToMany(mappedBy="relatedTo")
  private Set<SynonymRelationship> synonymRelations = new HashSet<SynonymRelationship>();

  @OneToMany(mappedBy="relatedFrom")
  private Set<ConceptRelationship> relationsFromThisTaxon = new HashSet<ConceptRelationship>();

  @OneToMany(mappedBy="relatedTo")
  private Set<ConceptRelationship> relationsToThisTaxon = new HashSet<ConceptRelationship>();

  @OneToMany(mappedBy="relatedFrom")
  private Set<ParentChildRelationship> children = new HashSet<ParentChildRelationship>();

  @OneToMany(mappedBy="relatedTo")
  private Set<ParentChildRelationship> parents = new HashSet<ParentChildRelationship>();

  // Always returns the same thing, regardless of taxonomic classification
  public Set<Taxon> getTaxonomicChildren()

  // Always returns the same thing regardless of taxonomic classification
  public int getTaxonomicChildrenCount()

  // Always returns the same answer regardless of taxonomic classification
  public boolean hasTaxonomicChildren()

  // Answer depends upon which taxonomic tree is supplied, but will not return null unless
  // this taxon is the root of a classification
  public Taxon getTaxonomicParent(TaxonomicTree classification)
}
public class TaxonomicTree extends IdentifiableEntity implements IReferencedEntity {
    private Set<Taxon> rootTaxa = new HashSet<Taxon>();
}

One remaining problem will occur in some edge cases where users attempt to make changes to the circumscription of concepts that are shared with other classifications. For example, users managing the "WP6" Cichorieae classification want to recombine Syncalathium chrysocephalum (C. Shih) S. W. Liu sec. "E+M" into Taraxacum F.H.Wigg. sec. "WP6". This results in a change to the circumscription of both Taraxacum and Syncalathium, so we would need to change the circumscription of Crepidinae Dumort. sec. "WP6" and replace Syncalathium Lipsch. sec. "E+M" with Syncalathium Lipsch. sec. "WP6" (which is essentially identical but does not contain chrysocephalum).

We would need to explore these kinds of use cases fully to understand what it is that the users want to happen. It might be worth adding a relationship between Taxon and TaxonomicTree to record which classification is the "owner".

Answers

We have discussed the above in the Berlin team and came up with the following remarks:

  1. Concepts are only sometimes (a) but not always (b) defined by their included children. This holds at least for the descriptive approach to defining taxa. Walter even tends to deny that a taxon is ever described by its children.

  2. Concepts in general depend on their parent concept by inheriting descriptions from it (e.g. Taraxacum inherits from Plantae to be autotroph (?)). Or a child may inherit a specified gene sequence which makes up the parent's description.

Let me elaborate 1) and 2) a bit more:

IMO Roger has a substantial error in his argument against hypothesis 2b, saying that families can't be reused if species are moved. He argues that the taxon description of a higher taxon is the sum of the descriptions of the lower taxa. So he borrows an argument that may hold for the denotation approach (2a) but does not hold for the descriptive approach (2b), because characters are multidimensional.

An example: You may have an (artificial) family dogs (D), which is defined by its characters to have four legs and to bark and which belongs to the realm of animals. Furthermore you have three genera: white (W), yellow (Y) and black (B) dogs. These genera have the species white dog (W w), yellow dog (Y y), fully black dog (B fb) and black dog with white dots (B wd).

Kingdom Animals (heterotroph)

Family D   (four legged, barking)

    Genus W  (white skin)

        W w 

    Genus Y (yellow skin)

        Y y

    Genus B (black skin)

        B fb

        B wd

If taxonomists now find a new red specimen that has 4 legs, is an animal and barks, they will create a new genus (R) and a new species (R r) for it. They will add R r to the genus R and they will add R to the family D because it perfectly fits the description of D as a 4 legged barking animal. The description of D is not changed this way and therefore stays stable.

If some other taxonomists say that a white dotted dog must belong to the genus W, they may move it to genus W. This may change the description of genus W -> W', but it will probably not change the description of genus B. And it will for sure not change the description of family D or kingdom Animals.

This is because the taxa use different characters for their description. By using just a subset of all available characters they define a set of living beings that may potentially include many more subtaxa than the ones that already exist. We call this the potential taxon.
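The dogs example can be written down as a small check of required character states. This is only an illustration of the descriptive approach as argued here; `CharacterDescription`, `require` and `fits` are invented names, and the character keys in the usage note are made up.

```java
import java.util.HashMap;
import java.util.Map;

// A description that constrains only a subset of all available characters.
class CharacterDescription {
    private final Map<String, String> required = new HashMap<>();

    // Adds one required character state and returns this description for chaining.
    CharacterDescription require(String character, String state) {
        required.put(character, state);
        return this;
    }

    // An organism fits if it shows every required state; other characters are ignored.
    boolean fits(Map<String, String> organism) {
        for (Map.Entry<String, String> entry : required.entrySet()) {
            if (!entry.getValue().equals(organism.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    int requiredCharacterCount() {
        return required.size();
    }
}
```

A family D description that requires legs = 4 and sound = bark accepts a red dog without being edited, while a genus W description that additionally requires colour = white rejects it. That is the sense in which D stays stable when genus R is added.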

Of course there are cases where subtaxa use the same characters for circumscription as their parent taxon. This may lead to an exceptional case where the parent taxon is fully described by its children, and adding a new child without changing the existing taxa will necessarily lead to a new parent taxon. Example: The higher taxon has a character height which allows values between 1 m and 2 m, and it has two child taxa, one with height 1 m to 1.5 m and the other with height > 1.5 m to 2 m. Here the child taxa by definition fully cover the space of the parent taxon. But even then, if you divide the first child into two children, one with height 1 m to 1.3 m and the second with height > 1.3 m to 1.5 m, you change the set of subtaxa without changing the parent taxon.
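The height example can be checked mechanically. The sketch below is an assumption-laden illustration: `HeightRange` and `RangeCoverage` are invented, upper bounds are always treated as inclusive, and children that share the same lower bound are not given special handling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Height interval in metres; the upper bound is always inclusive in this sketch.
class HeightRange {
    final double low;
    final boolean lowInclusive;
    final double high;

    HeightRange(double low, boolean lowInclusive, double high) {
        this.low = low;
        this.lowInclusive = lowInclusive;
        this.high = high;
    }
}

class RangeCoverage {
    // True if the child ranges together cover the parent range without a gap.
    static boolean covers(HeightRange parent, List<HeightRange> children) {
        if (children.isEmpty()) {
            return false;
        }
        List<HeightRange> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingDouble((HeightRange r) -> r.low));
        HeightRange first = sorted.get(0);
        boolean startsLate = first.low > parent.low
                || (first.low == parent.low && parent.lowInclusive && !first.lowInclusive);
        if (startsLate) {
            return false;
        }
        double reached = first.high;
        for (HeightRange next : sorted.subList(1, sorted.size())) {
            if (next.low > reached) {
                return false;                    // gap between two children
            }
            reached = Math.max(reached, next.high);
        }
        return reached >= parent.high;
    }
}
```

Both the original pair of children (1 to 1.5 and > 1.5 to 2) and the version with the first child split at 1.3 cover the parent range 1 to 2, so the parent's own description is the same in both cases.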

The other way round: if, in the dogs example, you change the description of a dog from a 4 legged barking animal into a 3-4 legged barking animal because you found a species with 3 legs and want to include it in the family, then this change will either change the child taxa too, or you will need to move the 4 legged description down to the child taxa.

Actually this is exactly the way how we work when we use an object oriented programming language like Java. By creating a new class CdmBase we certainly did not change the concept of java.lang.Object. But by changing CdmBase we certainly change the characteristics of each class inheriting from CdmBase.

  3. Information attached to a taxon within e.g. a checklist is usually not complete. Therefore the taxon in general and by definition is not defined by the information attached to it (though one should avoid adding information that contradicts the taxon's description) but by its secundum reference.

  4. The secundum reference may provide all the information needed to define the taxon. There are different opinions about what a secundum reference is.

a. A secundum reference references the opinion a taxonomist has at a certain time. Therefore it exists completely only in his head, not in any book or database. (In my opinion this leads to a situation where concepts can't seriously be used in a scientific environment, but it nevertheless seems to be helpful when talking about concepts.)

b. The information defining the secundum reference can be found either in the secundum (which is some kind of publication) or in the literature cited by the secundum. The latter is the general case for checklists, where the information available for each taxon is limited.

  5. Having 4) and 3) in mind we should differentiate those taxa that are

a. more or less defined within our database (the database is the secundum or at least is trying to reflect the secundum's information)

and those that are

b. defined somewhere else (e.g. in a printed reference about high level taxonomy)

Use Cases

With the assumption (contrary to what Roger Hyam says) that it is possible and quite usual to reuse higher taxa, we may come to the following use cases (which I didn't check for completeness against all use cases Ben suggested).

I. Single checklist in one CDM store

This is the easy-to-handle default use case.

II. Multiple Classifications in One CDM Store

There may be different reasons for storing multiple classifications within one CDM store instead of using one CDM store per classification

II.a. Reuse non-taxon objects

You may want to reuse some non-taxon objects like authors, references or even names. In terms of concepts the classifications are completely separate. This is a rare use case (I don't know any example). Pure sharing of non-taxon concepts is better done via global services (GNA, IPNI, BHL, ...).

II.b. Sharing concepts via concept relationships

Like a) but you have taxon concept relationships. A rare but existing use case. E.g. the BfN (German Federal Agency for Nature Conservation) is working this way with a Berlin Model based database.

II.c. Sharing concepts among classifications

Like a) or b) but some concepts are shared among classifications. There are several realistic use cases:

II.c.1. Sharing high level classifications/concepts

Many classifications want to share a common high level classification. This may be realised by either attaching one or multiple (E+M) classifications to a taxon of a high level classification or by even merging classifications on higher levels (PESI). The first is a typical use case: most taxonomists implicitly do this in their mind but for some reason don't do it in their database. But for reusing checklists in a different context it may be essential to also do this in practice.

II.c.2. Sharing low level classifications

Taxonomists may want to either reuse or alternatively show certain subtrees from other classifications within their classification.

As an example, you want to work on Cichorieae but do not have time to handle the two horrific genera Hieracium and Taraxacum on your own. Therefore you add Hieracium sec. E+M to Hieraciinae sec. CichWP6 as a child and do the same with Taraxacum sec. E+M and its parent.

Additionally, for Hieracium you may know that there is a more modern approach to handle this taxon and its child taxa, so you add an alternative taxon Hieracium sec. CichWP6 which only overlaps but is not identical with Hieracium sec. E+M and therefore can't be used within the same classification.

There are two different use cases for sharing classifications, a static and a dynamic one, plus a mixture of both:

  • The static one uses a classification as it is at a certain point in time, neglecting newer changes.

  • The dynamic one reflects and accepts all changes in the original data.

  • A mixture of the static and the dynamic use case: generally you choose the dynamic use case, but in some cases you want to reject changes.
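
The difference between the static and the dynamic variant can be illustrated with a minimal Java sketch. The class SharedNode and its snapshot() method are hypothetical and not part of the CDM; the sketch only assumes that a subtree is a node with a list of children. Dynamic reuse attaches the original node itself, static reuse attaches a deep copy taken at one point in time. The mixed variant is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node for illustration; not a CDM class.
class SharedNode {
    final String name;
    final List<SharedNode> children = new ArrayList<>();

    SharedNode(String name) {
        this.name = name;
    }

    // Static reuse: a deep copy of this subtree as it is right now.
    // Later changes to the original are not visible in the copy.
    SharedNode snapshot() {
        SharedNode copy = new SharedNode(name);
        for (SharedNode child : children) {
            copy.children.add(child.snapshot());
        }
        return copy;
    }
}
```

With this sketch, host.children.add(source) gives the dynamic behaviour (the host sees every later change to source), while host.children.add(source.snapshot()) gives the static behaviour.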

II.c.3. Sharing concepts on all classification levels

A taxonomist may have a concept and want to use a CDM store to see how this concept was used in different classifications. This is the "Paris" use case, where a collection manager knows that a specimen has been defined as taxon Aus bus sec. Cl. and wants to know how this taxon has been used in different classifications over time.

III. Views as a subset of the information available in the CDM

There are two different variants of this use case.

a. You may want to restrict access rights for certain users by allowing them only to view or edit certain objects.

b. You may want to publish only a certain part of your data

IV. Scoped classifications

As described by Ben's use cases 1.2 and 1.3, users may want to show only a certain part of the data for a publication. This may lead to a situation where you need new objects, like parent-child relationships, which are not available in the original data. E.g. if you want to leave out subgenera you may have to create new relations between genera and species.
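
As a rough illustration of the subgenus example, the following Java sketch builds a scoped copy of a tree that omits all nodes of one rank and attaches their children to the nearest retained ancestor. The types ScopedNode and ScopedClassification and the method withoutRank are hypothetical names chosen for this sketch, ranks are plain strings, and the root node is assumed to be always kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal node type for illustration; not the CDM TaxonNode.
class ScopedNode {
    final String name;
    final String rank;
    final List<ScopedNode> children = new ArrayList<>();

    ScopedNode(String name, String rank) {
        this.name = name;
        this.rank = rank;
    }

    ScopedNode add(ScopedNode child) {
        children.add(child);
        return this;
    }
}

class ScopedClassification {
    // Returns a copy of the subtree under 'node' without nodes of 'excludedRank'.
    // The node passed in is always kept; the original tree is not modified.
    static ScopedNode withoutRank(ScopedNode node, String excludedRank) {
        ScopedNode copy = new ScopedNode(node.name, node.rank);
        for (ScopedNode child : node.children) {
            attach(copy, child, excludedRank);
        }
        return copy;
    }

    private static void attach(ScopedNode newParent, ScopedNode original, String excludedRank) {
        if (original.rank.equals(excludedRank)) {
            // Omit this node: its children get a new parent-child relation
            // to the nearest retained ancestor.
            for (ScopedNode child : original.children) {
                attach(newParent, child, excludedRank);
            }
        } else {
            newParent.children.add(withoutRank(original, excludedRank));
        }
    }
}
```

The genus-to-species relations in the returned tree are exactly the "new objects" mentioned above: they exist only in the scoped copy, not in the original data.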

Solutions

Regarding the above use cases, and having in mind that a taxon to some extent needs knowledge about its parent and maybe also its children (1a, 2), I agree that there is a need to remodel the existing model.

So, how can we do this? Ben mentioned above a 3rd dimension "classification", which is a possibility but maybe not the best in terms of keeping it all simple. I think a better way is to clearly bind the secundum to a classification, either by adding a new attribute classification to ReferenceBase or by saying that the reference of a classification must be the secundum. But I think the best way is to let TaxonomicTree inherit from ReferenceBase instead of only from IdentifiableEntity, and to rename it Classification when doing so.

Taxonomic Tree (= Classification) inherits from ReferenceBase

This way it is easy to retrieve a taxon's parent and its children by the following algorithm:

getChildren(){
    give me all taxon nodes
    choose the one where node.classification == taxon.secundum
    give me all child nodes
    return all taxa of these childNodes 
}

Retrieving the parent is even easier:

getParent(){
    give me all taxon nodes
    choose the one where node.classification == taxon.secundum
    give me the parent node
    return the node's taxon
}
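
The two lookups above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the CDM implementation: the classes are simplified stand-ins, the field names (nodes, children, parent) are assumptions, and "node.classification == taxon.secundum" is taken literally as object identity, which only works because Classification is modelled as a subclass of ReferenceBase.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the CDM classes; not the actual CDM API.
class ReferenceBase {
    final String title;
    ReferenceBase(String title) { this.title = title; }
}

// The proposal: a classification is itself a reference.
class Classification extends ReferenceBase {
    Classification(String title) { super(title); }
}

class Taxon {
    final String name;
    final ReferenceBase secundum;
    // One node for each classification this taxon is used in.
    final List<TaxonNode> nodes = new ArrayList<>();

    Taxon(String name, ReferenceBase secundum) {
        this.name = name;
        this.secundum = secundum;
    }

    // The node whose classification is this taxon's secundum, or null.
    private TaxonNode ownNode() {
        for (TaxonNode node : nodes) {
            if (node.classification == secundum) {
                return node;
            }
        }
        return null;
    }

    List<Taxon> getChildren() {
        List<Taxon> result = new ArrayList<>();
        TaxonNode own = ownNode();
        if (own != null) {
            for (TaxonNode child : own.children) {
                result.add(child.taxon);
            }
        }
        return result;
    }

    Taxon getParent() {
        TaxonNode own = ownNode();
        return (own == null || own.parent == null) ? null : own.parent.taxon;
    }
}

class TaxonNode {
    final Taxon taxon;
    final Classification classification;
    final TaxonNode parent; // null for a root node
    final List<TaxonNode> children = new ArrayList<>();

    TaxonNode(Taxon taxon, Classification classification, TaxonNode parent) {
        this.taxon = taxon;
        this.classification = classification;
        this.parent = parent;
        taxon.nodes.add(this);
        if (parent != null) {
            parent.children.add(this);
        }
    }
}
```

A taxon that is reused in a second classification simply gets a second node; getParent() and getChildren() still answer for the classification that is the taxon's own secundum.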

This solution avoids the redundancy which you get by having parent-child relationships attached to the taxon and separately in a classification.

Of course it would also be possible to use the above solution of having an M:N relation between taxa which includes a classification attribute (but leaving out the constraint that the child doesn't know about the parent; for reasons see above).

Actually the existing TaxonNode solution and the M:N solution are quite similar. A taxon node is more or less an M:N relationship, as it holds the information about the child taxon and about the classification, but instead of referring to the higher taxon it refers to the relationship the higher taxon has to its parent, and therefore it makes it easier to

a. traverse through the classification tree

b. maintain the constraint that a taxon should belong to each classification only once (though it cannot by itself prevent you from violating it).

Because TaxonNode and M:N are so similar, it is easy to transform a TaxonNode into an M:N relationship and also into the isTaxonomicallyIncluded relationship required for TCS or the TDWG ontology. For the latter, the reverse direction may not be unambiguous, because TCS allows isIncludedIn relationships without requiring an attached classification. So it is up to the user/developer to define how best to map TCS data into the CDM for a given dataset.
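
The TaxonNode to M:N direction can be sketched in a few lines of Java. All types here (SimpleTaxonNode, InclusionRelationship, NodeConverter) are hypothetical and use plain strings in place of Taxon and Classification objects; the sketch only assumes that each node knows its taxon, its classification and its parent node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified node; strings stand in for Taxon and Classification.
class SimpleTaxonNode {
    final String taxon;
    final String classification;
    final SimpleTaxonNode parent; // null for a root node

    SimpleTaxonNode(String taxon, String classification, SimpleTaxonNode parent) {
        this.taxon = taxon;
        this.classification = classification;
        this.parent = parent;
    }
}

// One row of the M:N variant: child, parent and the classification asserting it.
class InclusionRelationship {
    final String child;
    final String parent;
    final String classification;

    InclusionRelationship(String child, String parent, String classification) {
        this.child = child;
        this.parent = parent;
        this.classification = classification;
    }
}

class NodeConverter {
    // Each non-root node yields exactly one "is included in" relationship.
    static List<InclusionRelationship> toRelationships(List<SimpleTaxonNode> nodes) {
        List<InclusionRelationship> result = new ArrayList<>();
        for (SimpleTaxonNode node : nodes) {
            if (node.parent != null) {
                result.add(new InclusionRelationship(
                        node.taxon, node.parent.taxon, node.classification));
            }
        }
        return result;
    }
}
```

The reverse mapping would need a classification for every relationship, which is the piece of information a TCS isIncludedIn relationship may lack.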

Finally I think we should remove citation and microCitation from the Classification (TaxonomicTree) class. They are no longer needed once a Classification is a reference.

Add a set of citations a Classification is based on

At the same time we should add a set of references a Classification is based on, which reflects the cited-literature aspect of 4b). The latter may also help solve the often discussed problem of which literature to show on a reference list page. This page may include all the mentioned references and a set of other references clearly connected to the classification, such as references used as secundum references, description element source references, ...

So we finally come to a simple classification class

public class Classification extends ReferenceBase implements ITreeNode {
    Set<ReferenceBase> usedReferences;
    Set<TaxonNode> rootNodes;
}

We could also add some more attributes like a set of misapplied names, a set of incertae sedis, or a set of nomina excludenda. The latter I would add to ITreeNode, whereas the misapplied name set I do not fully understand. Can you elaborate on this a bit more, Ben?

REMARKS:

  • The above solution does not address use case III), the general need for CDM subsets. This use case needs a more complex solution.

  • It also does not give a complete answer for all use cases defined under II).

For me it is still an open issue whether we always need to copy all relevant classification information if we want to reuse not only single concepts but a full subtree. Do we have to copy all nodes, or can we somehow attach a node to a parent node from another classification? The latter is easy to implement when you think only about traversing the tree (I have more or less done it locally already). But if you also want to make the information available via a name search, for example, we may get problems. How can we easily make sure that a search for taxa in one classification also retrieves certain taxa of another classification? Adding a set of alsoUsedInClassification may be a solution, but probably not a nice one.

So here we need further development in future.

Who is allowed to make changes?

The question also came up of who is allowed to change a concept. The most appropriate answer is probably "nobody", because a taxon concept is fixed by definition, which is one of the reasons why we should be very careful about publishing data online without versioning. You get an extreme inflation of taxa if you do so.

Anyhow, taking the question from a different perspective, we need to ask who should be allowed to change data in the database. I think there is a clear answer to this: the author(s) of the secundum may do so. If such changes are allowed (dynamic approach) and do not lead to a new taxon, then the agreement for reusing a taxon within your classification must be that you accept all changes to the taxon, because you accept its authority.

Updated by Andreas Müller almost 2 years ago · 60 revisions