Andreas Müller, 04/02/2009 04:59 PM


Taxonomic View (Taxonomic Classification)

This page explains the idea of a taxonomic view in the CDM.

Situation in CDM v1.0

In version 1.0 of the CDM, taxonomic concepts were related to each other by TaxonRelationships and SynonymRelationships. The classification of taxa could be expressed with TaxonRelationships of type "is taxonomically included in". A TaxonRelationship of this type linked two Taxon instances, one being the parent and the other the child. Additional information, such as a reference (who says it is a child), could also be stored.

Handling taxonomic classifications this way caused some problems. For example, it was impossible to define the whole classification tree and, in particular, the root taxon (or rather the root taxa; see the explanation below) of this tree (forest). If each taxon had at most one parent, this was possible in theory, because you could traverse the whole graph of taxa and determine the root elements. For performance reasons, however, this approach is deprecated, and it also has the limitations mentioned.

As a workaround, several methods were implemented to retrieve the root of a taxonomic tree/classification with queries like "Give me all taxa that do not have any parent". To restrict the search to a particular taxonomic view, you could additionally pass the sec reference of the root taxon to the method, assuming that the sec reference more or less represents the taxonomic view.
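The v1.0 workaround can be sketched as follows. This is a minimal Python sketch, not CDM Library code: the classes Taxon and TaxonRelationship, the constant INCLUDED_IN and the function find_root_taxa are hypothetical simplifications of the v1.0 model, and the sec reference is modelled as a plain string.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the v1.0 relationship type.
INCLUDED_IN = "is taxonomically included in"


@dataclass(frozen=True)
class Taxon:
    name: str
    sec: str  # secundum reference, modelled as a plain string


@dataclass(frozen=True)
class TaxonRelationship:
    from_taxon: Taxon  # the child
    to_taxon: Taxon  # the parent
    rel_type: str


def find_root_taxa(taxa, relationships, sec=None):
    """Return all taxa without a parent, optionally restricted to one sec reference.

    Mirrors the query "give me all taxa that do not have any parent"; the sec
    filter is the heuristic stand-in for a taxonomic view.
    """
    children = {r.from_taxon for r in relationships if r.rel_type == INCLUDED_IN}
    return [t for t in taxa if t not in children and (sec is None or t.sec == sec)]
```

The query has to scan every relationship in the database, and the sec filter only works if all taxa of a classification share one reference.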

This method implies that all taxa in a taxonomic classification use more or less the same reference. This may be the preferred, pure way of handling taxa (a taxon name mentioned in a taxonomic classification is slightly different from the same taxon name mentioned somewhere else, so a new taxon should be created). However, users often want to take a different approach and, for example, "reuse" taxa. This leads to a situation where the same taxon is used in different taxonomic views. The taxa of different views may therefore have very different sec references and may also have multiple parents.

On the programmatic side, this workaround also had a serious limitation: for performance reasons, the CDM Library stored the parent of a taxon in a cache field, which does not work with multiple parents.
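The cache problem can be illustrated with a minimal Python sketch. The field names parent_cache and parents and the method add_parent are hypothetical; the actual cache field in the CDM Library may be named and maintained differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Taxon:
    name: str
    # Single-valued parent cache, as in the v1.0 workaround (hypothetical name).
    parent_cache: Optional[Taxon] = None
    # All parents recorded via "is taxonomically included in" relationships.
    parents: List[Taxon] = field(default_factory=list)

    def add_parent(self, parent: Taxon) -> None:
        self.parents.append(parent)
        # The cache holds only one value, so each new parent overwrites the previous one.
        self.parent_cache = parent
```

Once a taxon is reused in a second classification, the cached parent silently reflects only the last placement.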

Solution for CDM v2.0

To overcome the problems mentioned above, the following solution, which separates classification data from concept data, is proposed for CDM v2.0:

  1. Creating a new class TaxonomicView that represents one classification / taxonomic view. A taxonomic view may consist of several distinct trees, as not all parts of the classification may be known yet. A view may therefore have multiple root nodes; generally speaking, a taxonomic view represents a forest rather than a tree, even though the abstract idea of a taxonomic view is to represent exactly one tree that contains all included taxa and their relationships.

  2. Creating a new class TaxonNode that represents a taxon within its classification and also knows its unique parent (the parent within the respective classification).

  3. Deleting the taxon relationship type "is taxonomically included in" from the list of TaxonRelationshipTypes.
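The proposed separation can be sketched as follows. This is a minimal Python model under simplifying assumptions (references as strings, no persistence), not the CDM Library's Java API; the method names add_root and add_child are hypothetical. It shows a view as a forest of root nodes and one taxon reused in two views with a different unique parent in each.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Taxon:
    """A taxon concept (name + sec); it knows nothing about classifications."""
    name: str
    sec: str


@dataclass(eq=False)
class TaxonomicView:
    """One classification; it may contain several trees, i.e. a forest."""
    title: str
    root_nodes: List[TaxonNode] = field(default_factory=list)

    def add_root(self, taxon: Taxon) -> TaxonNode:
        node = TaxonNode(taxon=taxon, view=self)
        self.root_nodes.append(node)
        return node


@dataclass(eq=False)
class TaxonNode:
    """A taxon placed in exactly one view, with at most one parent in that view."""
    taxon: Taxon
    view: TaxonomicView
    parent: Optional[TaxonNode] = None
    children: List[TaxonNode] = field(default_factory=list)

    def add_child(self, taxon: Taxon) -> TaxonNode:
        child = TaxonNode(taxon=taxon, view=self.view, parent=self)
        self.children.append(child)
        return child
```

Because the parent lives on the node rather than on the taxon, a reused taxon no longer needs multiple parents, and no "is taxonomically included in" relationship is required.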

Additional changes to the CDM Library are:

  1. Separating the business logic in the Taxon class that handles classification and moving it mainly to the TaxonNode class.

  2. Creating new service layer methods to retrieve taxonomic views and to retrieve the roots of a view.
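The service layer methods could look roughly like this. The class ClassificationService and all its method names are hypothetical and do not reflect the actual CDM Library service API; nodes are stored flat per view with parent pointers, and the taxon is reduced to a name string.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class TaxonNode:
    taxon_name: str  # stands in for the Taxon referenced by the node
    parent: Optional[TaxonNode] = None


@dataclass(eq=False)
class TaxonomicView:
    title: str
    nodes: List[TaxonNode] = field(default_factory=list)


class ClassificationService:
    """Hypothetical in-memory service; method names are illustrative only."""

    def __init__(self) -> None:
        self._views: Dict[str, TaxonomicView] = {}

    def save_view(self, view: TaxonomicView) -> None:
        self._views[view.title] = view

    def list_views(self) -> List[TaxonomicView]:
        return list(self._views.values())

    def get_root_nodes(self, view: TaxonomicView) -> List[TaxonNode]:
        # Roots are scoped to one view, so no sec-reference heuristic is needed.
        return [n for n in view.nodes if n.parent is None]

    def find_views_using(self, taxon_name: str) -> List[TaxonomicView]:
        # A taxon may be reused in several views.
        return [v for v in self._views.values()
                if any(n.taxon_name == taxon_name for n in v.nodes)]
```

Compared with the v1.0 query, the root lookup only touches the nodes of one view.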

The resulting model (without operations) may look like this:

!taxonomicView.png!

Open Issues

Naming

  • Should we call the class that represents the classification "TaxonomicView", "TaxonomicClassification" or just "Classification"?

  • What should we call the reference attribute of TaxonNode that stands for the reference describing the parent-child relationship? Or, more generally: do we need such a reference at all, or must this always be the reference of the taxonomic view itself?
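One possible answer to the second question is to make the node reference optional and fall back to the view's reference. The following Python sketch only illustrates that option; the attribute name reference and the method effective_reference are hypothetical, and references are plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class TaxonomicView:
    title: str
    reference: str  # the reference that represents the whole view


@dataclass(eq=False)
class TaxonNode:
    taxon_name: str
    view: TaxonomicView
    # Hypothetical name: reference stating this parent-child placement.
    reference: Optional[str] = None

    def effective_reference(self) -> str:
        # Fall back to the view's reference when no node-specific one is given.
        return self.reference if self.reference is not None else self.view.reference
```

With this design, the common case stores nothing extra on the node, while deviating placements can still cite their own source.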

Subclassing

Merging of taxonomic views

Sometimes two taxonomic views may need to be merged. This may be the case when:

  • It turns out that the two taxonomic views belong to the same general view but represent different, distinct parts of it. This case is easy to implement: change the taxonomic view of all taxon nodes of the first view to the second view and update the list of nodes in the second view accordingly. The first view is not deleted (as it may be identifiable) but gets an attribute TaxonomicView mergedInto that points to the second view.

  • It turns out that the two views are not distinct but differ in one subtree.
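The first, easy merge case can be sketched in Python as follows. The function merge_distinct_views and the method add are hypothetical; merged_into mirrors the mergedInto attribute mentioned above. Emptying the node list of the first view is an assumption implied by moving its nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TaxonNode:
    taxon_name: str
    view: Optional[TaxonomicView] = None


@dataclass(eq=False)
class TaxonomicView:
    title: str
    nodes: List[TaxonNode] = field(default_factory=list)
    # Set when this view was merged into another one; the view itself is kept.
    merged_into: Optional[TaxonomicView] = None

    def add(self, node: TaxonNode) -> None:
        node.view = self
        self.nodes.append(node)


def merge_distinct_views(source: TaxonomicView, target: TaxonomicView) -> None:
    """Merge two views that cover distinct parts of the same general view."""
    if source is target:
        raise ValueError("cannot merge a view into itself")
    for node in source.nodes:
        target.add(node)  # reassigns node.view and extends the target's node list
    source.nodes.clear()  # assumption: the nodes no longer belong to the source
    source.merged_into = target  # the source stays identifiable; it is not deleted
```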

For example, there may be the Euro+Med view, which includes botanical taxa, and a special view from Taraxacum experts (the "Taraxacum view") that adopts the E+M classification but differs in the classification of Taraxacum.

The greedy solution for this use case may be to create a new taxonomic view ("Taraxacum view"), copy all nodes from Euro+Med, and delete all subnodes of the node representing Taraxacum.
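The greedy solution amounts to a deep copy of the node trees followed by pruning one subtree. In this Python sketch a view is represented simply as its list of root nodes, and the names copy_tree, greedy_derived_view and find are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TaxonNode:
    taxon_name: str  # stands in for the (shared) Taxon object
    children: List[TaxonNode] = field(default_factory=list)

    def find(self, taxon_name: str) -> Optional[TaxonNode]:
        # Depth-first search for the node of the given taxon.
        if self.taxon_name == taxon_name:
            return self
        for child in self.children:
            hit = child.find(taxon_name)
            if hit is not None:
                return hit
        return None


def copy_tree(node: TaxonNode) -> TaxonNode:
    # New node objects for the same taxa: only the placement is copied.
    return TaxonNode(node.taxon_name, [copy_tree(c) for c in node.children])


def greedy_derived_view(roots: List[TaxonNode], replaced_taxon: str) -> List[TaxonNode]:
    """Copy all root trees, then remove the subnodes below the replaced taxon."""
    copies = [copy_tree(r) for r in roots]
    for root in copies:
        target = root.find(replaced_taxon)
        if target is not None:
            target.children.clear()  # the experts' own subtree is attached here later
    return copies
```

The cost is that every node of the general view is duplicated, and later changes to Euro+Med are not reflected in the copy.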

A more sophisticated solution may be to create a new "Taraxacum view" that includes only Taraxacum and its subnodes, and to add to the Taraxacum node (1) the information which node of the "Euro+Med" view is its parent and (2) the information that it replaces the Taraxacum node of "Euro+Med" within the alternative "Taraxacum view". (3) In the Taraxacum view itself, we only have to store the information that the Taraxacum node is the node that connects this view to the more general "Euro+Med" view.

The latter can easily be achieved by adding one attribute to the TaxonNode class that stands for the node this node replaces in some other view, and an attribute that points to this connection node. All other information is implicit and can easily be retrieved by calling methods on these objects. For example, the higher (more general) view of this view can be retrieved via the connection node: take the node it replaces; the view of that replaced node is the higher view of this view.
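The derivations described above can be sketched in Python. The attribute names replaces and connection_node and the methods higher_view and effective_parent are hypothetical; the sketch derives the parent of the connection node from the replaced node instead of storing it separately.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class TaxonomicView:
    title: str
    # Hypothetical name: the node connecting this view to a more general one.
    connection_node: Optional[TaxonNode] = None

    def higher_view(self) -> Optional[TaxonomicView]:
        # connection node -> node it replaces -> view of that replaced node
        if self.connection_node is None or self.connection_node.replaces is None:
            return None
        return self.connection_node.replaces.view


@dataclass(eq=False)
class TaxonNode:
    taxon_name: str
    view: TaxonomicView
    parent: Optional[TaxonNode] = None
    # Hypothetical name: the node in some other view that this node replaces.
    replaces: Optional[TaxonNode] = None

    def effective_parent(self) -> Optional[TaxonNode]:
        # A connection node has no parent in its own view; it inherits the
        # parent of the node it replaces in the more general view.
        if self.parent is None and self.replaces is not None:
            return self.replaces.parent
        return self.parent
```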

Use of synonyms

In general, synonyms are handled on the concept level, not on the classification level. However, some users have expressed the need to use a concept defined elsewhere but with a synonym name within the taxonomic view they maintain.

There are different possibilities to handle this situation:

  • The pragmatic way: A TaxonNode gets an additional attribute called "nameToBeUsed" (type: TaxonNameBase) that represents the synonym name you want to use in your classification. This is not a proper use of the concept idea, because a concept is by definition a pair (TaxonName, SecundumReference), so linking a node to a Taxon and then using another name is contradictory.

  • The pragmatic way 2: An alternative to the pragmatic way is to store the Synonym you want to refer to instead of the TaxonNameBase. This more or less restricts the names you can choose from to those actually used in the secundum reference.

  • The real synonym way: You allow synonyms to be stored as the concept of a taxon node (the taxon attribute is of type TaxonBase instead of Taxon). To retrieve the accepted taxon, you can call synonym.getAcceptedTaxon().

  • The pure way: You create a new Taxon that consists of the taxon name (synonym name) you want to use and that has a concept relationship of type "is congruent to" to the concept you want to refer to. The secundum reference of the new Taxon must then be your view (or rather the reference that represents your view). To be discussed: "What is the best reference to use here?"

  • The explicit congruency way: This is an alternative to the pure way. You create a new Taxon as described there, but you also explicitly store the concept that is meant to be the congruent one in the TaxonNode, whereas the pure way stores it only as a concept relationship between two taxa. The drawback of the pure way is that a taxon can have multiple congruence relationships to other taxa, so storing only the taxon in the TaxonNode may lead to a situation where you do not know exactly which congruent taxon the newly created taxon is meant to relate to. One might say that this type of relationship expresses not just congruence but rather equality. In this context, equality means that the description of the taxon is stored in one and the same place, because the reference of the new taxon just links to the old taxon and adds no information to it (except for the parent-child relationships). The idea of equality also holds in general for synonyms that have the same sec reference as an accepted taxon.

On the other hand, by storing the congruent taxon explicitly, we start to store concept relationship information in two different places (TaxonRelationship and TaxonNode), which leads to the well-known problems of redundant information.

  • The easy way: You leave it to the user to decide how to handle such a use case. (This may force users to think about whether using taxon concepts this way makes sense or not :-) )
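Two of these options, the pragmatic way and the real synonym way, can be sketched side by side in Python. They are alternatives; combining them in one class is only for comparison. All names are hypothetical simplifications: get_accepted_taxon is a Python stand-in for the synonym.getAcceptedTaxon() call mentioned above, not the CDM Library implementation, and names and references are plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)
class Taxon:
    name: str
    sec: str


@dataclass(eq=False)
class Synonym:
    name: str
    sec: str
    accepted_taxon: Taxon

    def get_accepted_taxon(self) -> Taxon:
        # Simplified stand-in for synonym.getAcceptedTaxon().
        return self.accepted_taxon


TaxonBase = Union[Taxon, Synonym]


@dataclass(eq=False)
class TaxonNode:
    # Real synonym way: the node may point to a Taxon or to a Synonym.
    taxon: TaxonBase
    # Pragmatic way: an optional name overriding the concept's own name.
    name_to_be_used: Optional[str] = None

    def accepted_taxon(self) -> Taxon:
        if isinstance(self.taxon, Synonym):
            return self.taxon.get_accepted_taxon()
        return self.taxon

    def display_name(self) -> str:
        return self.name_to_be_used or self.taxon.name
```

The pragmatic variant lets the displayed name drift from the concept's own name, which is exactly the contradiction noted for that option.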

Hybrid taxa

A taxonomic classification is not always a pure tree but may include hybrids that have two "parents". One solution for modelling this special case might be a separate subclass of TaxonNode, called e.g. HybridTaxonNode. It knows how to handle a pair of "parents" that are not really parents, as they do not necessarily have a higher rank but rather express phylogenetic information. As the relations between hybrids and their "parents" are substantially different from a normal parent-child relationship, TaxonNode may be extended by a list of hybrid children (which is empty in most cases).
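The HybridTaxonNode idea can be sketched in Python as follows. The attribute names hybrid_parents and hybrid_children and the helper add_hybrid are hypothetical; the sketch keeps the hybrid relation entirely separate from the normal parent/children links.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class TaxonNode:
    taxon_name: str
    parent: Optional[TaxonNode] = None
    children: List[TaxonNode] = field(default_factory=list)
    # Hybrids derived from this taxon; empty in most cases, separate from `children`.
    hybrid_children: List[HybridTaxonNode] = field(default_factory=list)


@dataclass(eq=False)
class HybridTaxonNode(TaxonNode):
    # The two "parents": not classification parents, they carry phylogenetic
    # information and need not have a higher rank.
    hybrid_parents: Tuple[TaxonNode, ...] = ()


def add_hybrid(name: str, first: TaxonNode, second: TaxonNode) -> HybridTaxonNode:
    hybrid = HybridTaxonNode(name, hybrid_parents=(first, second))
    first.hybrid_children.append(hybrid)
    second.hybrid_children.append(hybrid)
    return hybrid
```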

Updated by Andreas Müller about 15 years ago · 22 revisions