task #6173

Updated by Andreas Kohlbecker almost 6 years ago

A taxonomic classification is required for the algae registry to group taxa logically and semantically. These groups often do not match the commonly used higher classification. For example, it would be scientifically correct (according to some scientists) to use the phylum *Heterokontophyta*, whereas for practical purposes it would be beneficial to use the classes *Xanthophyceae* or *Bacillariophyceae*, which are included in that phylum. The option of using vernacular names for the higher classification is also under discussion.

 A major obstacle to finding a suitable classification is the range of opinions in the phycological community on what constitutes a valid and acceptable classification. Any choice of a specific classification would cause irritation and rejection in parts of the community. It would therefore be wise not to have a classification at all. There is still a need, however, for something like a classification to support the use cases below:

 ~~The algae registry actually has no need for a real classification except for the purpose of making it easier to find names. The complete reference classification will be provided by the name index, which is a separate CDM instance. The classification for a given name will be retrieved from the index by sending a query to a REST service.~~

 These considerations lead to the following draft concept: 

 # Use cases and requirements 

 **Use cases:** 

 1. **Higher ranked names as keywords**: Choose a higher taxon name to find all names covered by this name, independently of its position in a specific classification (e.g. "*A user wants to find the latest registrations in a specific taxon group.*"). Phycobank will offer a couple of prepared **default classifications** as a backbone to which registered names can be attached. ==> *Requirement 1.*
 1. **Higher classification data for each name registered**: For each registered name it must be possible to add and remove the ordered list of taxon names as they occur in the higher classification. Each of these higher classifications must be associated with the corresponding reference for this information. Multiple higher classification name lists, each with its reference, can be assigned to a name. (See #???? ticket for UI implementation):
     1. Starting from the registered names, it must be possible to find the path up to the higher taxa as published in a specific reference.
     1. A name added to one higher classification must also be removable without breaking classification information associated with another name. Nor may the removal modify any default higher classification (see point 1.). ==> *Requirement 2.*


 **Requirements and conclusions from the use cases:**

 1. The phycobank quasi classification should therefore be the union of all classifications relevant for the registrations.
 2. Each registered name must have associations to all the names mentioned in any of its higher classifications. Since newly created higher classification names can be reused by other taxa and users, the ability to delete a higher ranked name must be managed.
 3. Homotypy relations will only be modeled via the basionym name relationship, since phycobank is clearly focused only on nomenclatural information. Expressing basionyms also via taxon relations to the basionym name as synonym would impose a level of taxonomic opinion we must avoid.
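
The two use cases and the first two requirements can be illustrated with a small data-structure sketch. It is not the CDM model: the class `QuasiClassification`, its methods, and the reference labels "Ref A" and "Ref B" are hypothetical names chosen for illustration only. The sketch assumes that each registered name carries one ordered list of higher names per reference, and that a higher name may only be deleted once no other list uses it and it is not part of a default classification.

```python
from dataclasses import dataclass, field


@dataclass
class QuasiClassification:
    """Union of all published higher classifications (hypothetical model)."""

    # (registered name, reference) -> ordered higher names, highest rank first
    paths: dict = field(default_factory=dict)
    # names belonging to a prepared default classification; never orphaned
    default_names: set = field(default_factory=set)

    def add_path(self, name, reference, higher_names):
        self.paths[(name, reference)] = list(higher_names)

    def remove_path(self, name, reference):
        """Detach one list; return the higher names that are now unused."""
        removed = self.paths.pop((name, reference))
        still_used = {h for path in self.paths.values() for h in path}
        return [h for h in removed
                if h not in still_used and h not in self.default_names]

    def higher_names(self, name, reference):
        """Path from a registered name up to the higher taxa, per reference."""
        return self.paths[(name, reference)]

    def names_under(self, higher_name):
        """Keyword search: registered names with higher_name in any list."""
        return sorted({n for (n, _), path in self.paths.items()
                       if higher_name in path})


qc = QuasiClassification(default_names={"Heterokontophyta"})
qc.add_path("Fucus vesiculosus", "Ref A",
            ["Heterokontophyta", "Phaeophyceae", "Fucales"])
qc.add_path("Fucus serratus", "Ref B",
            ["Heterokontophyta", "Phaeophyceae"])
```

Removing the list of one name leaves the list of the other name intact, and only higher names that nothing else uses are reported as deletable.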


   


 **And the winner is ... *N1T* under point 6.) (see below for details)**

 ## 1) Higher classification with concept relations 

 There will be one classification to which the taxa belong. Each taxon will be added to this classification. In the beginning this classification will consist only of the highest rank levels (phylum, class).  
 All lower ranked taxa will only be added as needed, see 2) and 3) below.

 ~~~
 Classification A

            Heterokontophyta <------+---------+---------+
                                    |         |         |
                                    |         |         |
 Classification B                   |         |         |
                                    |         |         |
            Brown algae  +----------+         |         |
                           part of            |         |
            Gold algae   +--------------------+         |
                                                        |
                                                        |
 Classification C                                       |
                                                        |
            Phaeophyceae +------------------------------+
                           classified within

 ~~~

 ## 2) New registration of suprageneric taxa 

 Suprageneric taxa are added to the classification when a registration for the corresponding name is created:

 1. Firstly by the author
 2. Secondly by data curation


 ## 3) Species or infrageneric taxa

 These taxa are added to the genus (automatically, on the basis of the uninomial). If the genus is not yet in the system, it will be created as a post registration in progress. Data curation will then validate this new name.

 ## 4) Higher taxonomy as published by 'TaxonInteractions' 

 The publications also provide a higher taxonomy for the new names. It makes no sense to create a new separate classification for each case. It is sufficient and much more elegant to add these taxa as `TaxonInteractions` to the newly registered name. The label for this feature type could be "*Classification as published*".

 ## 5) Multiple higher classifications with TaxonNode parentChild relationships

 ### 5.a) [**N1TnTN+glue**] One Taxon per name, multiple TaxonNodes, with glue taxa whose sec reference is the reference of the classification they belong to

 This is the idea which was devised after the original idea of modeling the higher 'classification' as taxon graph was mistakenly rejected.  

 ![](v1-1TnTN%2Bglue.png) 

 Adding the secReference to the Taxon entities that are created for the registered names is not needed in this case, since the reference is also associated with the TaxonNode via the Classification. Therefore this idea can be simplified, and we come to the concept named **N1TnTN**.

 ### 5.b) [**N1TnTN**] One Taxon per name, multiple TaxonNodes 

 ![](N1TnNT.png) 

 In this diagram a problem is pointed out which would occur in any graph built on TaxonNode relations. A Classification which is only defined for higher ranks cannot be linked into a branch of another classification, since this would require that a TaxonNode can have multiple parents. I am not sure whether this theoretical problem exists with any proper classification relevant for phycobank. However, if we want to use vernacular names like "Brown algae", this situation would become reality.
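
The single-parent restriction can be made concrete with a toy model. The class `TaxonNodeSketch` below is a simplified stand-in written for this note, not the CDM `TaxonNode` class, and the pairing of "Phaeophyceae" with the vernacular group "Brown algae" is only an illustrative assumption. A tree node rejects a second parent, whereas a graph built on relations simply stores a set of parents per child.

```python
class TaxonNodeSketch:
    """Simplified stand-in for a tree node: at most one parent."""

    def __init__(self, label):
        self.label = label
        self.parent = None

    def set_parent(self, parent):
        if self.parent is not None:
            raise ValueError(
                f"{self.label} already has parent {self.parent.label}")
        self.parent = parent


def try_second_parent():
    """Return True if a tree node accepts a second parent, else False."""
    node = TaxonNodeSketch("Phaeophyceae")
    node.set_parent(TaxonNodeSketch("Heterokontophyta"))
    try:
        node.set_parent(TaxonNodeSketch("Brown algae"))
    except ValueError:
        return False
    return True


# A relation-based graph has no such restriction:
# each child maps to a set of parents.
included_in = {}
for child, parent in [("Phaeophyceae", "Heterokontophyta"),
                      ("Phaeophyceae", "Brown algae")]:
    included_in.setdefault(child, set()).add(parent)
```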

 ### 5.c) [**NnT1TN**] Multiple Taxa per name, one TaxonNode each

 Andreas Müller: "*In the longer discussion about how to handle the different classifications and includedIn relations, didn't we decide to create includedIn relations, so that searches of the form 'give me all names that belong to the following family name' can be carried out with the getIncludedInTaxa method, or whatever it is called, which I wrote for the Red Lists? That would not work if we do not create separate taxa per name in a classification.*"

 ![](v3-NnT1TN.png) 

 The **includedIn** taxon relation used in this idea avoids the problem pointed out in 5.b).

 Only inter-classification relations which are not already implied by the use of the same name in different taxa need to be modeled as **includedIn** taxon relations. So only the taxa ("a") and (H) need to be connected to the graph this way. This must be done manually by curation.

 Disadvantage: if a name is used in multiple taxa and they all have the same includedIn relationship to a higher taxon of another classification, multiple includedIn relationships need to be created.

 Search example (search for the name "Mastogloiales"; this name and the names found under it in the hierarchy are marked green):

 ![](search-5c.png) 

 ## 6) [N1T] Higher taxon graphs with includedIn taxon relationships

 **This strategy has won the competition** - congratulations!!!  

 The original concept was a taxon graph in which one taxon is created per name. The taxa are linked to each other via includedIn taxon relationships. Searching for names from higher taxa down to lower ranks works very well here. What gets lost, however, is the information about which reference a taxon belongs to. The paths from the registered name up through the higher ranks are ambiguous and can no longer be clearly assigned to a reference. Multiple sec references per taxon are simply not possible. There are therefore 2 further options:

 1. One taxon per name, where the sec reference is always Phycobank but one source reference per classification is attached to the taxon.
 2. Multiple taxa per name, where the sec reference always corresponds to the reference of the classification. Would the top-down search with the listIncludedTaxa method work in this case? I think not, because it only considers the relations and does not take identical names of the taxa into account. (Note AM: if the concepts are expected to be equivalent you may add an isCongruentTo relationship, this way the search may work again)

 So only option 1 remains. Henning and I had, however, also ruled this one out for some reason. I can no longer remember why.  
 At the moment I cannot think of anything that speaks against this option. Moreover, this variant also solves the problem of the TaxonNode graphs, in which it is not possible to link a node to multiple parents.

 Note AM: The general problem with 1 is that different concepts are not reflected, so some searches may return too many results. A solution might be to use >1 taxon per name where differences in concepts are known and relationships to parents and children are easy to define.

 Note2 AM: Why not use the nomenclatural reference as sec reference for the taxa, to indicate that we are only talking about names here? Phycobank as secundum is somewhat misleading.
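
The concern about option 2, and the isCongruentTo workaround from the note, can be sketched with a plain breadth-first search. This is not the CDM `listIncludedTaxa` implementation: the function `list_included`, the triple format, and the reference labels "Ref A" and "Ref B" are assumptions made for illustration. Taxa are modeled as (name, sec reference) pairs, so the same name under two references yields two distinct taxa.

```python
from collections import deque


def list_included(start, relations, follow_congruent=False):
    """Top-down search over (from_taxon, relation_type, to_taxon) triples."""
    children = {}
    for src, rel, dst in relations:
        if rel == "includedIn":
            children.setdefault(dst, set()).add(src)
        elif rel == "isCongruentTo" and follow_congruent:
            # congruence is symmetric: step across in both directions
            children.setdefault(dst, set()).add(src)
            children.setdefault(src, set()).add(dst)
    found, queue = set(), deque([start])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in found and child != start:
                found.add(child)
                queue.append(child)
    return found


relations = [
    (("Fucales", "Ref A"), "includedIn", ("Phaeophyceae", "Ref A")),
    (("Fucus vesiculosus", "Ref B"), "includedIn", ("Fucales", "Ref B")),
    (("Fucales", "Ref A"), "isCongruentTo", ("Fucales", "Ref B")),
]
top = ("Phaeophyceae", "Ref A")
only_relations = list_included(top, relations)
with_congruence = list_included(top, relations, follow_congruent=True)
```

Following includedIn alone stops at the taxon for "Fucales" under "Ref A" and never reaches the species attached to the taxon with the same name under "Ref B". Once the congruence relation is followed, the species is found as well.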

 ![](v4-N1T.png) 

 **Outcome of the final discussion:** 

 *The winner is **N1T** under point **6.)*** 

 We (Andreas K, Andreas M & Henning) decided to also create the TaxonNode graph for each standard classification to be imported. Taxa will be reused for multiple TaxonNodes rather than creating one Taxon per TaxonNode.

 The strategy **N1T** makes it possible to discover, in a broad manner, the names found under a specific higher name. It might turn out that this strategy includes too many unwanted names in some cases. These situations could be mitigated by creating multiple taxa for a single name, with the source references distributed among these taxa and each TaxonNode referencing only one of them.

 The search process primarily aims at finding genera. All species and subspecies which fall under each genus found are to be displayed, independently of the higher classification information associated with the individual names. In practice this will be solved by creating for each TaxonName an *includedIn* relation to the Taxon for the name given in the `uninomialOrGenus` and `specificEpithet` fields.
 Ranks such as *Variety* and *Subgenus* cannot be associated automatically, so their relation to the Genus or Species respectively must be created manually if this information is available or if it is required in a specific case. Associating e.g. Species with the corresponding Subgenera should also be considered.
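
A minimal sketch of this rule follows, assuming that a species is attached to the name in `uninomialOrGenus` and a subspecies to the species name formed from `uninomialOrGenus` plus `specificEpithet`. The function `automatic_parent` and the rank strings are hypothetical and not part of any existing API.

```python
def automatic_parent(rank, uninomial_or_genus, specific_epithet=None):
    """Return the name to create an includedIn relation to, or None.

    None means the relation cannot be derived from the name parts and
    has to be created manually (e.g. for Variety or Subgenus).
    """
    if rank == "Species":
        return uninomial_or_genus
    if rank == "Subspecies" and specific_epithet:
        return f"{uninomial_or_genus} {specific_epithet}"
    return None
```

For example, `automatic_parent("Species", "Fucus", "vesiculosus")` yields the genus name `"Fucus"`, while any rank outside the two handled cases yields `None` and is left to curation.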

 We should also consider **reactivating the formerly rejected TaxonRelationship *taxonomicallyIncludedIn***, which was removed in the past. Using this relation type would be semantically more correct than using *includedIn*, which rather expresses that a group of taxa is included in a bigger group of taxa, all of the same rank.


