task #6173

Updated by Andreas Müller almost 6 years ago

A taxonomic classification is required for the algae registry to group taxa logically and semantically. These groups often do not correspond to the commonly used higher classification. E.g. it would be scientifically correct (according to some scientists) to use the phylum *Heterokontophyta*, whereas for practical purposes it would be beneficial to use the classes *Xanthophyceae* or *Bacillariophyceae*, which are included in that phylum. The option of using vernacular higher classification names is also under discussion. 

 A major obstacle to finding a suitable classification is the diversity of opinions in the phycological community on what a valid and acceptable classification is. Any choice of a specific classification would cause irritation and rejection in parts of the community. Therefore it would be wise not to adopt a classification at all. Still, there is a need for something like a classification to support the use cases below: 

 ~~The algae registry actually has no need for a real classification except for the purpose of making it easier to find names. The complete reference classification will be provided by the name index, which is a separate CDM instance. The classification for a given name will be retrieved from the index by sending a query to a REST service.~~ 

 These considerations lead to the following draft concept: 

 # Use cases and requirements 

 **Use cases:** 

 1. **Higher ranked names as keywords**: Choose a higher taxon name to find all names covered by it, independently of the position of this name in a specific classification (e.g. "*A user wants to find the latest registrations in a specific taxon group.*"). Phycobank will offer a couple of prepared **default classifications** as a backbone to which registered names can be attached. ==> *Requirement 1.* 
 1. **Higher classification data for each name registered**: For each registered name it must be possible to add and remove the ordered list of taxon names as they occur in its higher classification. Each of these higher classifications must be associated with the reference that provides this information. Multiple higher classification name lists, each with its reference, can be assigned to a name (see #???? ticket for the UI implementation): 
     1. Starting from a registered name, it must be possible to find the path up to the higher taxa as published in a specific reference. 
     1. A name added to one higher classification must also be removable without breaking the classification information associated with another name. Nor may the removal modify any default higher classification (see point 1). ==> *Requirement 2.* 
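Use case 2 implies a simple data structure. The following is a minimal sketch, assuming plain Python dataclasses; `HigherClassification`, `RegisteredName` and `path_for` are hypothetical names, not part of the CDM library, and the references "Ref A" and "Ref B" are placeholders. Each registered name holds several ordered name lists, each tied to the reference that published it, so one list can be removed without touching the others.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class HigherClassification:
    """Ordered higher taxon names (highest rank first) as published in one reference."""
    reference: str
    names: tuple[str, ...]


@dataclass
class RegisteredName:
    """A registered name with any number of published higher classifications."""
    name: str
    classifications: list[HigherClassification] = field(default_factory=list)

    def add(self, hc: HigherClassification) -> None:
        self.classifications.append(hc)

    def remove(self, reference: str) -> None:
        # Only this name's own list is dropped; other registered names and any
        # default classification are left untouched (requirement 2).
        self.classifications = [
            hc for hc in self.classifications if hc.reference != reference
        ]

    def path_for(self, reference: str) -> list[str] | None:
        """Path from the registered name up to the highest taxon in one reference."""
        for hc in self.classifications:
            if hc.reference == reference:
                return [self.name, *reversed(hc.names)]
        return None


# Hypothetical references "Ref A" and "Ref B" for illustration only.
vaucheria = RegisteredName("Vaucheria litorea")
vaucheria.add(HigherClassification(
    "Ref A", ("Heterokontophyta", "Xanthophyceae", "Vaucheriales", "Vaucheriaceae")))
vaucheria.add(HigherClassification("Ref B", ("Xanthophyceae", "Vaucheriaceae")))
print(vaucheria.path_for("Ref B"))  # ['Vaucheria litorea', 'Vaucheriaceae', 'Xanthophyceae']
vaucheria.remove("Ref A")
print(vaucheria.path_for("Ref A"))  # None
```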


 **Requirements and conclusions from the use cases:** 

 1. The Phycobank quasi classification should therefore be the union of all classifications relevant for the registrations. 
 2. Each registered name must have associations to all names mentioned in any of its higher classifications. As newly created higher classification names can be reused by other taxa and users, deleting a higher-ranked name must be managed, since other names may still depend on it. 
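Requirements 1 and 2 can be sketched as follows, assuming that every higher classification is reduced to a set of (child, parent) name pairs; `QuasiClassification`, `attach`, `detach` and `delete_higher_name` are hypothetical names. The quasi classification is the union of these pairs, and a higher-ranked name is only deleted once no registered name refers to it any more.

```python
from __future__ import annotations

from collections import defaultdict


class QuasiClassification:
    """Union of all higher classifications attached to registered names."""

    def __init__(self) -> None:
        self.edges: set[tuple[str, str]] = set()  # (child, parent) name pairs
        # higher name -> registered names whose higher classification mentions it
        self.used_by: defaultdict[str, set[str]] = defaultdict(set)

    def attach(self, registered: str, higher_names: list[str]) -> None:
        """Merge one higher classification (highest rank first) into the union."""
        chain = [*higher_names, registered]
        for parent, child in zip(chain, chain[1:]):
            self.edges.add((child, parent))
        for higher in higher_names:
            self.used_by[higher].add(registered)

    def detach(self, registered: str) -> None:
        """Remove a registered name; shared higher names stay in the union."""
        self.edges = {e for e in self.edges if e[0] != registered}
        for users in self.used_by.values():
            users.discard(registered)

    def delete_higher_name(self, higher: str) -> bool:
        """Delete a higher-ranked name only if no registered name still uses it."""
        if self.used_by.get(higher):
            return False
        self.used_by.pop(higher, None)
        self.edges = {e for e in self.edges if higher not in e}
        return True


q = QuasiClassification()
q.attach("Vaucheria litorea", ["Xanthophyceae", "Vaucheriaceae"])
q.attach("Vaucheria sessilis", ["Xanthophyceae", "Vaucheriaceae"])
print(q.delete_higher_name("Vaucheriaceae"))  # False: still used by two names
q.detach("Vaucheria litorea")
q.detach("Vaucheria sessilis")
print(q.delete_higher_name("Vaucheriaceae"))  # True
```

Tracking the users of each higher name is only one possible way to manage deletion; a curation workflow could equally block or merge such names.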

 ## 1) Higher classification with concept relations 

 We will have one classification to which all taxa belong. Each taxon will be added to this classification. In the beginning, this classification will only consist of the highest rank levels (phylum, class).  
 All lower-ranked taxa will only be added as needed; see 2) and 3) below. 

 ~~~
 Classification A

            Heterokontophyta <---+-------+-------+
                                 |       |       |
 Classification B                |       |       |
                                 |       |       |
            Brown algae    +-----+       |       |
                            part of      |       |
            Gold algae     +-------------+       |
                                                 |
 Classification C                                |
                                                 |
            Phaeophyceae   +---------------------+
                            classified within
 ~~~

 ## 2) New registration of suprageneric taxa 

 Suprageneric taxa are added to the classification when a registration for the corresponding name is created: 

 1. first by the author, 
 2. then by the data curation. 


 ## 3) Species or infrageneric taxa 

 These taxa are added to their genus automatically, based on the uninomial (the genus name). If the genus is not yet in the system, it will be created as a registration in progress. The data curation will then validate this new name. 
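A minimal sketch of this automatic placement, assuming the genus is the first word (the uninomial) of a species or infrageneric name and that a missing genus is created with the status "in progress" for the data curation to validate. The functions `genus_of` and `place_under_genus` and the `registry` dictionary are hypothetical stand-ins for the real registry.

```python
from __future__ import annotations


def genus_of(name: str) -> str:
    """Return the genus part (first word) of a species or infrageneric name."""
    return name.split()[0]


def place_under_genus(name: str, registry: dict[str, dict]) -> str:
    """Attach `name` to its genus, creating a provisional genus entry if needed.

    `registry` maps a genus name to a record with a 'status' and 'children';
    this structure is hypothetical and only stands in for the real registry.
    """
    genus = genus_of(name)
    if genus not in registry:
        # The genus is created as a registration "in progress";
        # the data curation has to validate it later.
        registry[genus] = {"status": "in progress", "children": []}
    registry[genus]["children"].append(name)
    return genus


registry = {"Vaucheria": {"status": "published", "children": []}}
place_under_genus("Vaucheria litorea", registry)
place_under_genus("Tribonema viride", registry)
print(registry["Tribonema"]["status"])  # in progress
```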

 ## 4) Higher taxonomy as published, modeled with 'TaxonInteractions' 

 The publications also provide a higher taxonomy for the new names. It makes no sense to create a new separate classification for each case. It is sufficient, and much more elegant, to add these taxa as `TaxonInteractions` to the newly registered name. The label for this feature type could be "*Classification as published*". 

 ## 5) Multiple higher classifications with TaxonNode parent-child relationships 

 ### 5.a) [**N1TnTN+glue**] One Taxon per name, multiple TaxonNodes, with glue taxa whose sec reference is the reference of the classification they belong to. 

 This idea was devised after the original idea of modeling the higher 'classification' as a taxon graph had been mistakenly rejected. 

 ![](v1-1TnTN%2Bglue.png) 

 Adding the secReference to the Taxon entities, which are created for the registered names in this case, is not needed, since the reference is already associated with the TaxonNode via the Classification. Therefore this idea can be simplified by omitting this secReference, which leads to the concept named **N1TnTN**. 

 ### 5.b) [**N1TnTN**] One Taxon per name, multiple TaxonNodes 

 ![](N1TnNT.png) 

 This diagram points out a problem that would occur in any graph built on TaxonNode relations: a Classification that is only defined for higher ranks cannot be linked into a branch of another classification, since this would require a TaxonNode to have multiple parents. I am not sure whether this theoretical problem actually occurs with any proper classification relevant for Phycobank. However, if we want to use vernacular names like "Brown algae", this situation would become reality. 
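The limitation can be reproduced with a minimal tree model, assuming (as the diagram suggests) that each TaxonNode stores exactly one parent. The `TaxonNode` class below is a hypothetical stand-in, not the CDM class: once *Phaeophyceae* hangs below *Heterokontophyta*, it cannot also be linked below the vernacular node "Brown algae".

```python
from __future__ import annotations


class TaxonNode:
    """Tree node with a single parent, as in a parent-child TaxonNode hierarchy."""

    def __init__(self, name: str, classification: str) -> None:
        self.name = name
        self.classification = classification
        self.parent: TaxonNode | None = None
        self.children: list[TaxonNode] = []

    def add_child(self, child: TaxonNode) -> None:
        # A second parent would turn the tree into a graph, which this model forbids.
        if child.parent is not None:
            raise ValueError(
                f"{child.name} already has parent {child.parent.name}; "
                "a TaxonNode cannot have multiple parents"
            )
        child.parent = self
        self.children.append(child)


hetero = TaxonNode("Heterokontophyta", "A")
phaeo = TaxonNode("Phaeophyceae", "A")
hetero.add_child(phaeo)
brown = TaxonNode("Brown algae", "B")  # vernacular classification, higher ranks only
try:
    brown.add_child(phaeo)  # would need a second parent for the same node
except ValueError as err:
    print(err)
```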

 ### 5.c) [**NnT1TN**] Multiple Taxa per name, one TaxonNode each 

 Andreas Müller: "*Didn't we decide, in the longer discussion on how to handle the different classifications or includedIn relations, to create includedIn relations, so that searches of the form 'give me all names that belong to the following family name' can be performed with the getIncludedInTaxa method (or whatever it is called) that I wrote for the Red Lists? That would not work if we do not create separate taxa per name in a classification.*" 

 ![](v3-NnT1TN.png) 

 The **includedIn** taxon relation used in this idea avoids the problem pointed out in 5.b). 

 Only inter-classification relations that are not already implied by the use of the same name in different taxa need to be modeled as **includedIn** taxon relations. Thus only the taxa ("a") and (H) need to be connected to the graph this way. This must be done manually by the data curation. 
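The search described in the quote can be sketched for the **NnT1TN** model, assuming taxa are identified by (name, classification) and that the search treats taxa with the same name as implicitly linked. `names_below` is a hypothetical function that only imitates what a getIncludedInTaxa-style search is expected to do here; it is not the CDM method and does not share its signature. In the illustrative data, only the vernacular taxon "Yellow-green algae" needs a manual **includedIn** relation, whereas the two *Xanthophyceae* taxa are connected through their shared name.

```python
from __future__ import annotations

from collections import defaultdict, deque

# One taxon per name *and* classification, identified here by (name, classification).
Taxon = tuple[str, str]


def names_below(
    start_name: str,
    taxa: list[Taxon],
    parent_child: list[tuple[Taxon, Taxon]],
    included_in: list[tuple[Taxon, Taxon]],
) -> set[str]:
    """Collect all names covered by `start_name` across classifications.

    parent_child: (parent, child) pairs from the TaxonNode tree of each classification.
    included_in:  (taxon, higher taxon) pairs added manually by the data curation.
    """
    by_name: defaultdict[str, list[Taxon]] = defaultdict(list)
    for taxon in taxa:
        by_name[taxon[0]].append(taxon)
    below: defaultdict[Taxon, list[Taxon]] = defaultdict(list)
    for parent, child in parent_child:
        below[parent].append(child)
    for taxon, higher in included_in:
        below[higher].append(taxon)

    seen: set[Taxon] = set()
    queue = deque(by_name[start_name])
    while queue:
        taxon = queue.popleft()
        if taxon in seen:
            continue
        seen.add(taxon)
        # Taxa with the same name are linked implicitly, so no includedIn is needed.
        queue.extend(by_name[taxon[0]])
        queue.extend(below[taxon])
    return {name for name, _ in seen} - {start_name}


taxa = [("Heterokontophyta", "A"), ("Xanthophyceae", "A"), ("Xanthophyceae", "C"),
        ("Tribonema", "C"), ("Yellow-green algae", "B"), ("Vaucheria", "B")]
parent_child = [
    (("Heterokontophyta", "A"), ("Xanthophyceae", "A")),
    (("Xanthophyceae", "C"), ("Tribonema", "C")),
    (("Yellow-green algae", "B"), ("Vaucheria", "B")),
]
included_in = [(("Yellow-green algae", "B"), ("Heterokontophyta", "A"))]
print(sorted(names_below("Heterokontophyta", taxa, parent_child, included_in)))
```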

 ## 6) [N1T] Higher taxon graphs with includedIn taxon relationships 

 The original concept was a taxon graph in which one taxon is created per name. The taxa are linked to each other by includedIn taxon relationships. The search from names of higher taxa down to lower ranks works perfectly here. What is lost, however, is the information about which reference a taxon belongs to. The paths from the registered name up through the higher ranks are ambiguous and can no longer be clearly assigned to a reference. Multiple sec references per taxon are simply not possible. There are therefore two further options: 

 1. One taxon per name, where the sec reference is always Phycobank, but one source reference per classification is attached to the taxon. 
 2. Multiple taxa per name, where the sec reference always corresponds to the reference of the classification. Would the top-down search with the listIncludedTaxa method work in this case? I don't think so, because it only considers the relations and does not take identical taxon names into account. 

 So only option 1 remains. However, Henning and I had also ruled this one out, for some reason I can no longer remember.  
 At the moment I cannot think of anything that speaks against this option. Moreover, this variant also solves the problem of TaxonNode graphs, in which it is not possible to link a node to multiple parents. 

 ![](v4-N1T.png) 
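Option 1 can be sketched as follows, assuming one `Taxon` per name whose sec reference is always Phycobank and whose `sources` set collects one source reference per classification. `TaxonGraph`, `add_classification`, `included_taxa` and `path_for` are hypothetical names, and "Ref A"/"Ref B" are placeholder references. The top-down search only follows includedIn links; the path for a given reference is recovered by following only those parents that carry that reference as a source.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field

PHYCOBANK = "Phycobank"


@dataclass
class Taxon:
    name: str
    sec: str = PHYCOBANK                            # always Phycobank (option 1)
    sources: set[str] = field(default_factory=set)  # one source per classification


class TaxonGraph:
    """One taxon per name, linked by includedIn relations (hypothetical model)."""

    def __init__(self) -> None:
        self.taxa: dict[str, Taxon] = {}
        self.parents: defaultdict[str, set[str]] = defaultdict(set)   # includedIn targets
        self.children: defaultdict[str, set[str]] = defaultdict(set)

    def add_classification(self, reference: str, path: list[str]) -> None:
        """Register a published path (highest rank first) from one reference."""
        for name in path:
            self.taxa.setdefault(name, Taxon(name)).sources.add(reference)
        for higher, lower in zip(path, path[1:]):
            self.parents[lower].add(higher)
            self.children[higher].add(lower)

    def included_taxa(self, name: str) -> set[str]:
        """Top-down search: every name reachable below `name` via includedIn."""
        found: set[str] = set()
        stack = [name]
        while stack:
            for child in self.children[stack.pop()]:
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found

    def path_for(self, name: str, reference: str) -> list[str]:
        """Walk upwards, following only parents that carry `reference` as a source."""
        path = [name]
        while True:
            candidates = sorted(
                p for p in self.parents[path[-1]] if reference in self.taxa[p].sources
            )
            if not candidates or candidates[0] in path:  # top reached, or a cycle
                return path
            path.append(candidates[0])  # several candidates would be ambiguous


g = TaxonGraph()
g.add_classification("Ref A", ["Heterokontophyta", "Xanthophyceae", "Vaucheria"])
g.add_classification("Ref B", ["Yellow-green algae", "Vaucheria"])
print(sorted(g.included_taxa("Heterokontophyta")))  # ['Vaucheria', 'Xanthophyceae']
print(g.path_for("Vaucheria", "Ref B"))             # ['Vaucheria', 'Yellow-green algae']
```

If a taxon had two parents carrying the same source reference, the path would again be ambiguous; the sketch then simply takes the alphabetically first parent.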


