Project

General

Profile

task #7976

[PESI][ERMS] Erms taxon import

Added by Andreas Müller 2 months ago. Updated 2 months ago.

Status:
New
Priority:
New
Category:
cdmadapter
Target version:
Start date:
01/10/2019
Due date:
% Done:

0%

Severity:
normal
Tags:

Description

Subticket for #1444:

The taxon import in ERMS is the most complex one therefore we open a new ticket for it.

Kingdoms:

1: - (Super)domain
2: Animalia
3: Plantae
4: Fungi
5: Protozoa
6: Bacteria
7: Chromista ??
8: ??
10: ??
12: ??
14: ??

are 8-14 relevant? they do not appear in any tu.tu_sp tree as root

The Taxon table tu:

id: the id
tsn: ? (currently exported as extension)
tu_name: the epithet or uninomial
tu_displayname: full name without author (html?)
tu_authority: author and year (without et. al.?)
tu_parent: links to the parent, BUT for synonyms the parent is considered the name giving parent, e.g. for a species it is usually the genus given in the name and not the parent of the accepted name, for subgenus it is usually the genus; however, it is NOT always the case that the parent is name giving, there are also parent with no obvious relationship and therefore it is not clear what the parent relationship really means; tu_parent is always the last id in tu.tu_sp and therefore it is essential for creating the tree
tu_sp: the higher classification as tree index, BUT it is not the correct taxonomic classification as synonyms link to their nomenclatural parent and not to their accepted taxon

Associated revisions

Revision cd79d676 (diff)
Added by Andreas Müller about 1 month ago

ref #1444 ref #7976 some changes to ERMS import

History

#2 Updated by Andreas Müller 2 months ago

  • Description updated (diff)

#3 Updated by Andreas Müller 2 months ago

  • Description updated (diff)

Parents are generally nomenclatural parents not taxonomical parents. There are different parent types (this is not really important as we will use parents only for nomenclatural reasons).

Subgenus with acc is same name (296)

SELECT acc.id, acc.tu_rank, acc.tu_displayname ,acc.tu_sp, acc.tu_parent, acc.tu_status,syn.id, 
       syn.tu_name synName, syn.tu_rank synRank, syn.tu_displayname synDisplay, syn.tu_unacceptreason, syn.tu_sp synTree, syn.tu_parent synParent, syn.tu_status synStatus
FROM tu syn INNER JOIN tu acc ON acc.id = syn.tu_acctaxon
WHERE syn.tu_status = 2 AND syn.tu_sp <> acc.tu_sp 
  AND syn.tu_rank < 220 and syn.tu_displayname like '%(%' AND syn.tu_name = acc.tu_name
ORDER BY acc.tu_sp

Subgenus with accepted is not same genus

syn.tu_status = 2 AND syn.tu_sp <> acc.tu_sp AND syn.tu_rank < 220 and syn.tu_displayname like '%(%' AND syn.tu_name <> acc.tu_name

etc.

#4 Updated by Andreas Müller 2 months ago

Errors in Data:

  • Name inheritance:

Some (63) children differ in first name part from parent. Sometimes this is simply a wrong whitepace, but more often it is a misspelling:

   SELECT syn.id, synParent.tu_name parentEpi, synParent.tu_displayname parentDisplay, trim(replace(replace(replace(left (syn.tu_displayname, len(syn.tu_displayname)- len(syn.tu_name)), ' var.', ''), ' f.', ''), ' subsp.', '')) shorted,
       syn.tu_name synName, syn.tu_rank synRank, syn.tu_displayname synDisplay, syn.tu_unacceptreason, right(syn.tu_sp, 30) synTree, syn.tu_parent synParent,    syn.tu_status synStatus
   FROM tu syn 
     INNER JOIN tu synParent ON synParent.id = syn.tu_parent
   WHERE syn.tu_rank >= 220 AND synParent.tu_displayname <> trim(replace(replace(replace(left (syn.tu_displayname, len(syn.tu_displayname)- len(syn.tu_name)), ' var.', ''), ' f.', ''), ' subsp.', ''))

Subgenus seems not to have this problem and for genus and above it is not relevant anyway:

SELECT syn.id, synParent.tu_name parentEpi, synParent.tu_displayname parentDisplay, trim(left (replace(replace(syn.tu_displayname, ' (', ''), ')', ''), len(syn.tu_displayname)- len(syn.tu_name)-3)) shorted,
       syn.tu_name childName, syn.tu_rank childRank, syn.tu_displayname childDisplay, syn.tu_unacceptreason, right(syn.tu_sp, 30) synTree, syn.tu_parent childParent, syn.tu_status childStatus
FROM tu syn 
  INNER JOIN tu synParent ON synParent.id = syn.tu_parent
WHERE syn.tu_rank <220 AND syn.tu_rank >180 AND synParent.tu_displayname <> trim(left (replace(replace(syn.tu_displayname, ' (', ''), ')', ''), len(syn.tu_displayname)- len(syn.tu_name)-3))
  • there is one case, where a nomen nudum is the accepted taxon for an other taxon
SELECT  acc.tu_status, acc.tu_displayname acc, syn.tu_acctaxon, syn.id , syn.tu_displayname syn
FROM    tu acc INNER JOIN
               tu AS syn ON acc.id = syn.tu_acctaxon
WHERE     (acc.tu_status = 3)  AND syn.tu_acctaxon <> syn.id 
  • there are 3317 cases where the tu_acctaxon has not a status "accepted" (2213 are self-referencing, 1104 are really different records, none of the "accepted" link again to another record, though there seem to be such records in WORMS, e.g. http://marinespecies.org/aphia.php?p=taxdetails&id=210439)
SELECT  acc.id,   acc.tu_status AS accStatus, acc.tu_displayname AS accName, 
            syn.id, syn.tu_status AS synStatus, syn.tu_displayname AS synName
FROM    tu acc INNER JOIN
            tu AS syn ON acc.id = syn.tu_acctaxon
WHERE     (acc.tu_status <> 1) -- AND  syn.id = acc.id --  AND acc.id <> acc.tu_acctaxon
ORDER BY accStatus, synStatus 
  • 289 cases where parent taxon is not accepted => probably not critical assuming that the parent child relationship is not a taxonomical one

  • 610 taxa being alternative names but accepted taxon has not status accepted. Especially critical for both having status 5 (alternative name)

SELECT  acc.id, acc.tu_status AS accStatus, acc.tu_displayname AS accName, 
            syn.id, syn.tu_status AS synStatus, syn.tu_displayname AS synName
FROM    tu acc INNER JOIN
            tu AS syn ON acc.id = syn.tu_acctaxon
WHERE     (acc.tu_status <> 1) AND (syn.tu_status IN (1,5,9,10))  AND  syn.id <> acc.id -- AND acc.id <> acc.tu_acctaxon
ORDER BY accStatus, synStatus 

#5 Updated by Andreas Müller 2 months ago

  • Description updated (diff)

#6 Updated by Andreas Müller 2 months ago

699 accepted taxa (acctaxon = id) do have a parent which is not accepted. 2/3 of the parents has status synonym, 1/3 has status alternate representation.
This is CRITICAL as these children are difficult to handle.

SELECT  child.tu_status childStatus, child.tu_displayname child, substring(child.tu_sp, 2,1) kingdom, child.tu_rank childRank, child.id childId, child.tu_acctaxon childAccId,
 par.tu_status parStatus, par.id, par.tu_name, par.tu_displayname, par.tu_acctaxon, par.tu_rank, accPar.tu_displayname accParent
FROM  tu child INNER JOIN tu par ON par.id = child.tu_parent INNER JOIN tu accPar ON par.tu_acctaxon = accPar.id
WHERE child.tu_acctaxon = child.id AND par.tu_acctaxon <> par.id -- AND par.tu_status = 2
ORDER BY  kingdom, child.tu_rank, par.tu_status

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 40 MB)