Concept for a useful algae registry taxon classification
A taxonomic classification is required for the algae registry in order to group taxa logically and semantically. These groups are often not identical to the commonly used higher classification. For example, it would be scientifically correct (according to some scientists) to use the phylum Heterokontophyta, whereas for practical purposes it would be beneficial to use the classes Xanthophyceae or Bacillariophyceae, which are included in that phylum. The option of using vernacular higher classification names is also under discussion.
A major obstacle in finding a suitable classification is the range of opinions in the phycological community on what constitutes a valid and acceptable classification. Any choice of a specific classification would cause irritation and rejection in parts of the community. Therefore it would be wise not to have a classification at all. Still, something like a classification is needed to support the use cases below.
The algae registry actually has no need for a real classification except for the purpose of making it easier to find names. The complete reference classification will be provided by the name index, which is a separate CDM instance. The classification for a given name will be retrieved from the index by sending a query to a REST service.
These considerations lead to the following draft concept:
Use cases and requirements
- Higher ranked names as keywords: Choosing a higher taxon name finds all names covered by this name, independently of the position of this name in a specific classification (e.g. "A user wants to find the latest registrations in a specific taxon group."). PhycoBank will offer a couple of prepared default classifications as a backbone to which registered names can be attached. ==> Requirement 1.
- Higher classification data for each name registered: For each registered name it must be possible to add and remove the ordered list of taxon names as they occur in the higher classification. Each of these higher classifications must be associated with the corresponding reference for this information. Multiple higher classification name lists, each with its reference, can be assigned to a name. (See #???? ticket for UI implementation):
- Starting from a registered name, it must be possible to find the path up to the higher taxa as published in a specific reference.
- A name added to one higher classification must also be removable without breaking the classification information associated with another name. The removal must also not modify any default higher classification (see point 1). ==> Requirement 2.
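The second use case can be sketched as a small data structure. This is only an illustration in Python, not the CDM model: the class names `HigherClassification` and `RegisteredName`, their methods, and the reference labels are all hypothetical. Each registered name keeps its own list of (reference, ordered higher names) entries, so removing an entry from one name cannot affect another name.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HigherClassification:
    """Ordered higher taxon names as published in one reference (hypothetical type)."""
    reference: str
    names: Tuple[str, ...]  # ordered from the highest rank downwards


@dataclass
class RegisteredName:
    """A registered name with any number of higher classification lists (hypothetical type)."""
    name: str
    classifications: List[HigherClassification] = field(default_factory=list)

    def add_classification(self, reference: str, names: List[str]) -> None:
        entry = HigherClassification(reference, tuple(names))
        if entry not in self.classifications:
            self.classifications.append(entry)

    def remove_classification(self, reference: str) -> None:
        # Only this name's own list is changed; other names are untouched.
        self.classifications = [
            c for c in self.classifications if c.reference != reference
        ]

    def path(self, reference: str) -> Optional[List[str]]:
        """Return the higher names as published in the given reference, if any."""
        for c in self.classifications:
            if c.reference == reference:
                return list(c.names)
        return None


# Usage with made-up reference labels
higher = ["Bacillariophyta", "Bacillariophyceae", "Naviculales"]
first = RegisteredName("Navicula radiosa")
second = RegisteredName("Navicula lanceolata")
first.add_classification("Reference A", higher)
second.add_classification("Reference A", higher)
first.remove_classification("Reference A")
print(first.path("Reference A"))   # the entry is gone for the first name
print(second.path("Reference A"))  # the second name still has its path
```

Because the entries are immutable and owned per name, a prepared default classification held elsewhere would not be modified by such a removal.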
Requirements and conclusions from the use cases:
- The PhycoBank quasi-classification should therefore be the union of all classifications relevant for the registrations.
- Each registered name must have associations to all the names mentioned in any of its higher classifications. As newly created higher classification names can be reused by other taxa and users, the ability to delete a higher ranked name must be managed.
- Homotypy relations will only be modeled via the basionym name relationship, since PhycoBank is clearly focused on nomenclatural information only. Expressing basionyms additionally via taxon relations, with the basionym name as synonym, would impose a level of taxonomic opinion we must avoid.
And the winner is ... N1T (under point 6, see below for details)
1) Higher classification with concept relations
We will have one classification to which the taxa belong. Each taxon will be added to this classification. In the beginning this classification will consist only of the highest rank levels (phylum, class).
All lower ranked taxa will only be added as needed, see 2) and 3) below.
Classification A

    Heterokontophyta <---+---+---+
                         |   |   |
                         |   |   |
Classification B         |   |   |
                         |   |   |
    Brown algae +--------+   |   |
        part of              |   |
    Gold algae +-------------+   |
                                 |
                                 |
Classification C                 |
                                 |
    Phaeophyceae +---------------+
        classified within
2) New registration of suprageneric taxa
Suprageneric taxa are added to the classification when a registration for the corresponding name is created:
- firstly by the author
- secondly by the data curators
3) Species or infrageneric taxa
These taxa are added to the genus (automatically, on the basis of the uninomial). If the genus is not yet in the system, it will be created as a post-registration in progress. The data curators will then validate this new name.
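The automatic attachment to the genus could look roughly like the following Python sketch. It is not taken from the CDM code base: the `Genus` type, the `attach_to_genus` function and the status labels are hypothetical, and the genus is simply assumed to be the first word of the name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Genus:
    """A genus entry in the registry (hypothetical type)."""
    name: str
    status: str  # "registered" or "in progress" (labels are assumptions)
    members: List[str] = field(default_factory=list)


def attach_to_genus(full_name: str, genera: Dict[str, Genus]) -> Genus:
    """Attach a species or infrageneric name to its genus via the uninomial.

    If the genus is unknown, it is created with status "in progress" so that
    the data curators can validate it afterwards.
    """
    uninomial = full_name.split()[0]
    genus = genera.get(uninomial)
    if genus is None:
        genus = Genus(uninomial, status="in progress")
        genera[uninomial] = genus
    if full_name not in genus.members:
        genus.members.append(full_name)
    return genus


# Usage
genera = {"Navicula": Genus("Navicula", status="registered")}
known = attach_to_genus("Navicula radiosa", genera)
created = attach_to_genus("Pinnularia viridis", genera)
print(known.status, created.status)
```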
4) Higher taxonomy as published by 'TaxonInteractions'
The publications also provide a higher taxonomy for the new names. It makes no sense to create a new separate classification for each case. It is sufficient and much more elegant to add these taxa as TaxonInteractions to the newly registered name. The label for this feature type could be "Classification as published".
5) Multiple higher classifications with TaxonNode parentChild relationships
5.a) [N1TnTN+glue] One Taxon per name, multiple TaxonNodes, with glue taxa whose sec reference is the reference of the classification they belong to
This is the idea which was devised after the original idea of modeling the higher 'classification' as taxon graph was mistakenly rejected.
Adding the secReference to the Taxon entities which are created for the registered names in this case is not needed, since the reference is also associated with the TaxonNode via the Classification. Therefore this idea can be simplified, and we arrive at the concept named N1TnTN.
5.b) [N1TnTN] One Taxon per name, multiple TaxonNodes
This diagram points out a problem which would occur in any graph built on TaxonNode relations. A classification which is only defined for higher ranks cannot be linked into a branch of another classification, since this would require that a TaxonNode can have multiple parents. I am not sure whether this theoretical problem exists with any proper classification relevant for PhycoBank. However, if we want to use vernacular names like "Brown algae", this situation would become reality.
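The single-parent restriction described above can be illustrated with a minimal tree node in Python. The `TaxonNode` class here is a simplified stand-in written for this note, not the CDM class of the same name.

```python
class TaxonNode:
    """Simplified tree node: every node has at most one parent."""

    def __init__(self, taxon):
        self.taxon = taxon
        self.parent = None
        self.children = []

    def add_child(self, child):
        # A tree cannot express a second parent for the same node.
        if child.parent is not None:
            raise ValueError(
                f"{child.taxon} already has the parent {child.parent.taxon}"
            )
        child.parent = self
        self.children.append(child)


heterokontophyta = TaxonNode("Heterokontophyta")
brown_algae = TaxonNode("Brown algae")
phaeophyceae = TaxonNode("Phaeophyceae")

heterokontophyta.add_child(phaeophyceae)
try:
    # Linking the same node below a vernacular group fails.
    brown_algae.add_child(phaeophyceae)
    second_parent_allowed = True
except ValueError:
    second_parent_allowed = False
print(second_parent_allowed)
```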
5.c) [NnT1TN] Multiple Taxa per name, one TaxonNode each
Andreas Müller: "In the longer discussion about how to handle the different classifications and includedIn relationships, did we not decide to create includedIn relationships so that searches of the form 'give me all names that belong to the following family name' can be carried out with the getIncludedInTaxa method (or whatever it is called) that I wrote for the Red Lists? That would not work if we do not create separate taxa per name in a classification."
The includedIn taxon relation used in this idea avoids the problem pointed out in 5.b).
Only inter-classification relations which are not implied by the use of the same name in different taxa need to be modeled as includedIn taxon relations. So only the taxa ("a") and (H) need to be connected to the graph this way. This must be done manually by the curators.
Disadvantage: if a name is used in multiple taxa and they all have the same includedIn relationship to a higher taxon of another classification, multiple includedIn relationships need to be created.
Search example (search for the name "Mastogloiales"; this name and the names found under it in the hierarchy are marked green):
6) [N1T] Higher taxon graphs with includedIn taxon relationships
This strategy has won the competition - congratulations!!!
The original concept was a taxon graph in which one taxon is created per name. The taxa are linked to each other via includedIn taxon relationships. Searching for names from higher taxa down to lower ranks works very well here. What is lost, however, is the information about which reference the taxon belongs to. The paths from the registered name up through the higher ranks are ambiguous and can no longer be clearly assigned to a reference. Multiple sec references per taxon are simply not possible. There are therefore 2 further options:
- One taxon per name, where the sec reference is always PhycoBank but one source reference per classification is attached to the taxon.
- Multiple taxa per name, where the sec reference always corresponds to the reference of the classification. Would the top-down search with the listIncludedTaxa method work in this case? I do not think so, because it only considers the relations and does not take identical names of the taxa into account. (Note AM: if the concepts are expected to be equivalent you may add an isCongruentTo relationship; this way the search may work again.)
So only option 1 remains. Henning and I had, however, also ruled this one out for some reason. I can no longer remember why.
At the moment I cannot think of anything that speaks against this option. In addition, this variant also solves the problem of TaxonNode graphs, in which it is not possible to link a node to multiple parents.
Note AM: The general problem with option 1 is that different concepts are not reflected, so some searches may return too many results. A solution might be to use more than one taxon per name where differences in concepts are known and relationships to parents and children are easy to define.
Note 2 AM: Why not use the nomenclatural reference as sec reference for the taxa, to indicate that we are only talking about names here? PhycoBank as secundum is somewhat misleading.
Outcome of the final discussion:
The winner is *N1T* (under point 6).
We (Andreas K, Andreas M & Henning) decided to also create the TaxonNode graph for each standard classification to be imported. Taxa will be reused for multiple TaxonNodes rather than creating one Taxon per TaxonNode.
The N1T strategy allows names found under a specific higher name to be discovered in a broad manner. It might turn out that this strategy includes too many unwanted names in some cases. Such situations could be mitigated by creating multiple taxa for a single name, with the source references distributed among these taxa and each TaxonNode referencing only one of them.
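The broad discovery behaviour of N1T can be sketched as a small graph in Python. This is an illustration only and does not use the CDM API: the `TaxonGraph` class, its methods, and the reference labels are hypothetical. Every name is one node, includedIn edges carry the source references they were published in, and a search collects everything below a higher name. An optional source filter shows one conceivable way of narrowing the result.

```python
from collections import defaultdict


class TaxonGraph:
    """One node per name; includedIn edges annotated with source references."""

    def __init__(self):
        self._children = defaultdict(set)   # parent -> set of children
        self._sources = defaultdict(set)    # (child, parent) -> set of sources

    def add_included_in(self, child, parent, source):
        self._children[parent].add(child)
        self._sources[(child, parent)].add(source)

    def names_under(self, higher_name, source=None):
        """Return all names reachable below higher_name via includedIn edges.

        If source is given, only edges published in that source are followed.
        """
        found = set()
        stack = [higher_name]
        while stack:
            current = stack.pop()
            for child in self._children.get(current, ()):
                if source is not None and source not in self._sources[(child, current)]:
                    continue
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


# Usage with made-up reference labels; a node may have several parents.
graph = TaxonGraph()
graph.add_included_in("Phaeophyceae", "Heterokontophyta", "Reference A")
graph.add_included_in("Phaeophyceae", "Brown algae", "Reference B")
graph.add_included_in("Fucus", "Phaeophyceae", "Reference A")

print(sorted(graph.names_under("Brown algae")))
print(sorted(graph.names_under("Brown algae", source="Reference B")))
```

The unfiltered search returns every name below "Brown algae" regardless of which reference contributed each edge, which is the broad behaviour described above.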
The search process primarily aims at finding genera; all species and subspecies which fall under each genus found are to be displayed independently of the higher classification information associated with the individual names. In practice this will be solved by creating for each TaxonName an includedIn relation to the Taxon for the name used in the
uninomialOrGenus and in the
Ranks such as variety and subgenus cannot be associated automatically, so their relation to the genus or species, respectively, must be created manually if this information is available or if it is required in a specific case. It should also be considered to associate e.g. species with the corresponding subgenera.
We should also consider reactivating the formerly rejected TaxonRelationship *taxonomicallyIncludedIn*, which was removed in the past. Using this relation type would be semantically more correct than using includedIn, which rather expresses that a group of taxa is included in a bigger group of taxa, all of the same rank.
#5 Updated by Wolf-Henning Kusber almost 3 years ago
PhycoBank cannot use classes as published in ICN, ING, NCU because this system differs very much from current classifications. PhycoBank checked two modern resources and provides a compilation including the classes and divisions most likely accepted by the majority of PhycoBank users as search criteria. PhycoBank does not want to establish a classification system of its own as such.
Resources consulted:
- Ruggiero M.A., Gordon D.P., Orrell T.M., Bailly N., Bourgoin T., Brusca R.C., et al. 2015: Correction: A Higher Level Classification of All Living Organisms. PLoS ONE 10(6): e0130114. doi:10.1371/journal.pone.0130114
- Syllabus of Plant Families, 13th edition 2012, 2015, 2017 (ed. Frey, W.) – Borntraeger, Stuttgart.
The main difference is the alternative usage of Bacillariophyta and its classes, as agreed by the majority of diatomists according to Cox in Frey (2015), instead of using subclasses for the sake of consistency within the Ochrophyta.
#7 Updated by Andreas Kohlbecker over 2 years ago
I have just discussed this again with AK by phone. We have tagged the corresponding collective taxon nodes with "unplaced". As a result, they now appear at the end of the list in the editor (at least when sorting by rank + alphabetically; from the next release on also with purely alphabetical sorting).
In addition, I removed the camel case from the names and gave the collective taxa the respective super rank, so e.g. Order Names became Superorder. I left No group assigned as "unranked".
I deleted the rank "Families incertis sedis", which is no longer used.
The individual pseudo taxa within the groups could now also be tagged with unplaced. However, since they are all located in a tagged container, this is not strictly necessary, unless one wants unplaced/incertis sedis to be displayed explicitly on a taxon page at some point (not yet implemented in the portal).
#20 Updated by Andreas Kohlbecker about 2 years ago
- File Phycobank-higher-classification-v1-N1TnTN+glue.odg added
- File Phycobank-higher-classification-v2-NnT1TN.odg added
- File Phycobank-higher-classification-v3-N1TnTN.odg added
#76 Updated by Andreas Kohlbecker almost 2 years ago
Andreas Müller wrote:
How should species (and other names below genus) be related to genus taxa? There might be multiple genus taxa per name. Is this relationship created automatically, or are they only related by special search algorithms?
This relation between the genus and subordinate names needs to be created on the fly when creating a new species and also when changing the genus of a species. The implementation of this mechanism is handled in #7648. All the details currently found in this issue description have been copied as an extract to issue #7648. In summary, there should be only one taxon per name.