task #8792
Updated by Andreas Müller over 4 years ago
The functioning of the ERMS import (and PESI export) is explained at: [[Erms_mapping]] ## General information Old databases are available at BGBM-PESISQL\SQLExpress. Many issues have been handled in #1444. However as the ERMS-CDM is complex in often difficult there are many issues to take care for during upcoming imports. ## Data check before import * Check that the rule still is true that synonyms are directly linked to their accepted taxon and not via recursion (at least for field tu_accfinal (which has only less NULL values then tu_accfinal but never differs from tu_acctaxon in value) ~~~ sql SELECT * FROM [tu] INNER JOIN tu acc ON tu.tu_acctaxon = acc.id WHERE tu.tu_accfinal <> tu.id AND acc.id <> acc.tu_accfinal ~~~ * tu_acctaxon should not have relevant information ~~~ sql SELECT * FROM tu WHERE tu_acctaxon is NOT NULL AND tu_accfinal IS NULL OR tu_acctaxon <> tu_accfinal ~~~ * Ranks kingdom is identified by first number in tu_sp. Only the kingdoms 2-7 should are supported, others should not exist and the code may not work ~~~ sql SELECT * FROM tu WHERE tu.tu_sp not like '#2#%' AND tu.tu_sp not like '#3#%' AND tu.tu_sp not like '#4#%' AND tu.tu_sp not like '#5#%' AND tu.tu_sp not like '#6#%' AND tu.tu_sp not like '#7#%' ~~~ ## open issues * Misapplications Currently misapplications are identified by a UNION statement in ErmsTaxonImport.getAcceptedTaxaKeys(). It searches for misidentification patterns in tu.tu_unacceptreason . We also tried to evaluate the tu_authority field but searching for "sensu", "non", "auct". But realized that all these patterns are not used specific enough in ERMS. "Auct" is often simply used for misspellings, "sensu" is used for unassessed taxa, genus transfer, sensu in the meaning of sec and others or simply unclear. "Non" is generally to unspecific as it can be used for homonyms, too. So only the above query is possible but discussion with VLIZ is needed on data cleaning or standardization. Maybe even an own tu_status for misapplications is possible. * Multiple open issues to be hanled by or discussed with VLIZ, mentioned in email to VLIZ 2019-12-23 * the status of taxa with status "alternate representation" should be discussed with VLIZ (and mayb ERS) ## Data cleaning: * tu_accfinal IS NULL ~~~ sql SELECT * FROM tu WHERE tu_accfinal IS NULL ~~~ should be checked. VLIZ should fix these (n=479). Currently they are handled as accepted taxa. * tu_status There should be no synonyms with tu_status == 1 ~~~ sql SELECT * FROM [tu] INNER JOIN tu acc ON tu.tu_acctaxon = acc.id WHERE tu.tu_acctaxon <> tu.id AND tu.tu_status = 1 ~~~ There should be no accepted taxa with status 2,3,5 (unaccepted, nomen nudem, alternate representation) ~~~ sql SELECT * FROM [tu] INNER JOIN tu acc ON tu.tu_acctaxon = acc.id WHERE tu.tu_acctaxon = tu.id AND tu.tu_status NOT IN (1, 7, 10, 8, 9, 6) ORDER BY tu.tu_status ~~~