task #7420

Import for higher taxon graph for phycobank

Added by Andreas Kohlbecker about 1 year ago. Updated 7 months ago.

Status: Closed
Priority: Highest
Category: cdmadapter
Target version: -
Start date: 05/15/2018
Due date:
% Done: 100%
Severity: major
Tags:

Description

The higher taxon graph, as discussed in #6173, will be imported from spreadsheets using the normal implicit (NormalImplicit) import format. The details of the format will be defined in issue #7419.

Taxa will be related both via taxon relationships and via classifications. The latter are only for internal handling, not for displaying or searching taxa in the data portal via classifications.

The secundum reference of the taxa should be the nomenclatural reference (=> nominal taxa).
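The intended model can be illustrated with a minimal sketch. It assumes a simplified in-memory representation; the `Taxon`, `Relationship` and `TaxonGraph` classes and their methods are hypothetical and not part of the CDM API. Each name yields exactly one taxon whose sec reference is its nomenclatural reference, and every source that places a taxon under a parent adds its own "taxonomically included in" relationship.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Taxon:
    """One nominal taxon per name; sec is the name's nomenclatural reference."""
    name: str
    sec: str


@dataclass(frozen=True)
class Relationship:
    """'taxonomically included in': child is included in parent according to source."""
    child: Taxon
    parent: Taxon
    source: str


@dataclass
class TaxonGraph:
    taxa: dict = field(default_factory=dict)  # name -> Taxon
    relationships: set = field(default_factory=set)

    def taxon_for(self, name: str, nom_ref: str) -> Taxon:
        """Return the single taxon for a name, creating it on first use."""
        if name not in self.taxa:
            self.taxa[name] = Taxon(name, sec=nom_ref)
        return self.taxa[name]

    def add_placement(self, child: Taxon, parent: Taxon, source: str) -> None:
        """Record that `source` places `child` under `parent` (duplicates collapse)."""
        self.relationships.add(Relationship(child, parent, source))
```

Because the taxon is shared, placements from different sources (e.g. Frey and WoRMS) become parallel edges between the same two nodes instead of duplicate taxa.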


The resulting taxonGraph is visualized at: http://api.cybertaxonomy.org/taxonGraph/

Algen_Syllabus_NormalImplied_Test.xlsx (13.4 KB) Andreas Müller, 08/09/2018 10:45 AM

Algen_Syllabus_NormalImplied_Worms_Test.xlsx - Test data including WoRMS data (15.2 KB) Andreas Müller, 08/10/2018 02:16 PM


Related issues

Related to AlgenRegistrierung - task #6173: Concept for a useful algae registry taxon classification In Progress 11/01/2016
Related to Edit - task #6137: Urgent imports In Progress 10/19/2016
Blocked by Edit - feature request #7419: Provide example NormalImplicit import file for phycobank higher taxon graph imports Closed 05/15/2018
Copied to AlgenRegistrierung - task #7808: Further name duplicates New 09/11/2018
Copied to Edit - task #7811: Import higher classifications Resolved 10/08/2018

Associated revisions

Revision cc9428e1
Added by Andreas Müller 10 months ago

ref #7420 first version of phycobank higher classification import

Revision 2b1e32ea
Added by Andreas Müller 8 months ago

ref #7420 add config parameter to getWorksheetname for Excel import

Revision f36c41f1
Added by Andreas Müller 8 months ago

ref #7420 adding phycobank as source

Revision 9eca8b4a
Added by Andreas Müller 8 months ago

ref #7420 define WorksheetName as config parameter (needed for Phycobank import)

Revision 781b44c5
Added by Andreas Müller 8 months ago

ref #7420 updates to phycobank import according to requirements

Revision ae79d763
Added by Andreas Müller 7 months ago

ref #7420 latest changes to PhycobankActivator

Revision 21e704da
Added by Andreas Müller 3 months ago

ref #7420 last changes to Phycobank higher classification import

History

#1 Updated by Andreas Kohlbecker about 1 year ago

  • Blocked by feature request #7419: Provide example NormalImplicit import file for phycobank higher taxon graph imports added

#2 Updated by Andreas Kohlbecker about 1 year ago

  • Related to task #6173: Concept for a useful algae registry taxon classification added

#3 Updated by Andreas Müller about 1 year ago

  • Subject changed from Adapt NormalImplecit import to import higher taxon graph phycobank to Import for higher taxon graph for phycobank
  • Target version changed from Unassigned CDM tickets to Release 5.1

#4 Updated by Andreas Müller about 1 year ago

#5 Updated by Andreas Müller about 1 year ago

  • Description updated

#6 Updated by Andreas Müller 11 months ago

  • Target version changed from Release 5.1 to Release 5.2

#7 Updated by Andreas Kohlbecker 10 months ago

  • File Algen_Syllabus_NormalImplied_Test.xls added

Hi, here is the spreadsheet with the first data for the import: attachment:Algen_Syllabus_NormalImplied_Test.xls

The Excel file "Algen_Syllabus_NormalImplied_Test" contains the old sheet "NormalImplied.txt" with the comparison of the Syllabus and WoRMS.

Based on our previous discussions, I have created a new sheet for the Syllabus, "HigherRanksEnfwurfNeu", for further discussion.

Andreas M. asked for all known higher ranks to be named explicitly. Done: all ranks highlighted in yellow do not exist or are not used in the Syllabus.
Empty ranks highlighted in yellow often replace a designation such as "incertae sedis".

Questions about the genus names:
1. Some of them we already have in the system as names, some as registered genera; others are not yet in the system, but we need them as names, possibly as registered names. How should we proceed?
2. Andreas M. asked about nomenclatural authors because of possible homonyms.
3. Do we need information on the status (valid, invalid, illeg.) for the genera? (I would say only if they are registered.)
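How one row of such a higher-ranks sheet could be turned into parent/child placements is sketched below. It assumes a hypothetical layout with one column per rank, ordered from highest to lowest; the `RANKS` list and the `row_to_edges` function are illustrative only and do not describe the actual NormalImplicit format. An empty cell, which in the sheet often stands for "incertae sedis", is skipped, so the lower taxon is attached to the nearest filled higher rank.

```python
# Hypothetical column order of the sheet, highest rank first.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def row_to_edges(row: dict) -> list:
    """Return (child, child_rank, parent, parent_rank) tuples for one sheet row.

    Empty or missing cells are skipped, so a taxon is attached to the
    nearest non-empty higher rank (the gap is treated as incertae sedis).
    """
    edges = []
    parent = None  # (name, rank) of the nearest filled higher rank so far
    for rank in RANKS:
        name = (row.get(rank) or "").strip()
        if not name:
            continue
        if parent is not None:
            edges.append((name, rank, parent[0], parent[1]))
        parent = (name, rank)
    return edges
```

For a row with an empty class cell and no family column, the order is attached directly to the phylum and the genus to the order.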

#8 Updated by Andreas Müller 10 months ago

Please save future versions of the Excel file in the current .xlsx format; .xls is outdated.

#10 Updated by Andreas Müller 10 months ago

  • File deleted (Algen_Syllabus_NormalImplied_Test.xls)

#11 Updated by Andreas Müller 10 months ago

  • Status changed from New to In Progress
  • Priority changed from New to Highest
  • Target version changed from Release 5.2 to Release 5.3
  • % Done changed from 0 to 30
  • Severity changed from normal to major

The first version of the import is ready and has been tested by importing Frey and WoRMS data into an empty local database.

Next step: test by importing into the phycobank database.

#13 Updated by Andreas Müller 10 months ago

What should be done with the existing IAPT data? Should it be adapted to the new data model?

#14 Updated by Andreas Müller 10 months ago

Andreas Müller wrote:

What should be done with the existing IAPT data? Should it be adapted to the new data model?

We also need to decide how to handle the IAPT species. They are currently attached to IAPT genera. Is this still wanted in the future?

In addition, the IAPT genus names have authors. We need to decide whether these genus names should be matched during the import of new names, which currently have no authors.
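One possible matching rule is sketched below, purely as an assumption for discussion: an imported genus name without authors is matched to an existing genus only if exactly one existing genus has that name string, and it is left unmatched when the name is ambiguous (possible homonyms). The `match_genus` function and the `(nameCache, authorship)` pairs are hypothetical.

```python
from typing import Optional


def match_genus(imported_name: str, existing: list) -> Optional[tuple]:
    """Match an authorless genus name against existing (nameCache, authorship) pairs.

    Returns the single existing entry with the same nameCache, or None if
    there is no candidate or more than one (possible homonyms).
    """
    candidates = [entry for entry in existing if entry[0] == imported_name]
    if len(candidates) == 1:
        return candidates[0]
    return None
```

Returning None for ambiguous names keeps the import from silently attaching new names to the wrong homonym; such cases would need manual review.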

#15 Updated by Andreas Müller 8 months ago

  • Status changed from In Progress to Resolved
  • Assignee changed from Andreas Müller to Andreas Kohlbecker
  • % Done changed from 30 to 50

Please review the results of the import on test.cdm_phycobank.

#16 Updated by Andreas Müller 8 months ago

  • Status changed from Resolved to Feedback
  • Assignee changed from Andreas Kohlbecker to Andreas Müller
  • Target version changed from Release 5.3 to Release 5.4

Needs to be adapted

#17 Updated by Andreas Kohlbecker 8 months ago

Hi Andreas,

a small change of strategy: all taxa and TaxonRelations that belong to the classification graph should have Phycobank as their secReference (or, for the relations, as their citation). This way, the taxa and relations belonging to the graph can be identified.

Andreas
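Under this proposal, the graph could be selected with a simple filter. A minimal sketch, assuming relationships are available as plain records with a citation attribute; the `Rel` record and the reference label "Phycobank" are placeholders, not CDM classes:

```python
from typing import NamedTuple


class Rel(NamedTuple):
    child: str
    parent: str
    rel_type: str
    citation: str


def phycobank_graph(relationships, phycobank_ref="Phycobank"):
    """Keep only 'taxonomically included in' relationships cited to Phycobank."""
    return [
        r for r in relationships
        if r.rel_type == "taxonomically included in" and r.citation == phycobank_ref
    ]
```

With this approach, the citation can no longer record which source classification a placement came from.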

#18 Updated by Andreas Müller 8 months ago

Andreas Kohlbecker wrote:

Hi Andreas,

a small change of strategy: all taxa and TaxonRelations that belong to the classification graph should have Phycobank as their secReference (or, for the relations, as their citation). This way, the taxa and relations belonging to the graph can be identified.

Andreas

I understand this for the taxa: since we use only one taxon per name, we cannot give sec references per concept used. This was already decided earlier.

I do not understand it for the taxon relations. I thought their references should be the references of the classification used, so as not to lose this information. Which graph do you need to identify this way? Are there any other "taxonomically included in" relationships expected in the database than those for phycobank? And for which use case do you need to identify the graph?

I should also mention that we are thinking about having a "graph" link for each relationship in the (near) future. This idea comes from discussing different types of DefinedTerm relationship graphs (collections, lists, trees, directed graphs, undefined graphs). This way you could group graphs, whereas using reference (or, soon, sources.reference) is not a good idea for holding graph data together, for multiple reasons.
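A rough sketch of the "graph" link idea, assuming a hypothetical `graph` field on each relationship (not an existing CDM attribute): graph membership is stored explicitly, so the citation remains free to record the source classification, and a graph can be selected without overloading the reference.

```python
from collections import defaultdict
from typing import NamedTuple


class GraphRel(NamedTuple):
    child: str
    parent: str
    citation: str  # source classification, e.g. "Frey" or "WoRMS"
    graph: str     # hypothetical explicit graph identifier


def group_by_graph(relationships):
    """Group relationships by their explicit graph identifier."""
    groups = defaultdict(list)
    for r in relationships:
        groups[r.graph].append(r)
    return dict(groups)
```

Both placements of the same child then stay in one graph while keeping their distinct source citations.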

#19 Updated by Andreas Müller 8 months ago

  • Status changed from Feedback to Resolved
  • Assignee changed from Andreas Müller to Andreas Kohlbecker

I ran a new import into test.cdm_phycobank. Please check the results.

One issue I can see is that the existing IAPT data sometimes has different ranks than the imported data, so the names/taxa are not deduplicated. Example: Cryptophyceae is a division in IAPT but a phylum in Frey + WoRMS. This needs to be sorted out before the final import.
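Why a rank mismatch blocks deduplication can be shown with a minimal sketch. It assumes the matcher compares the name string and the rank together; the actual matching strategy of the import may differ, and `dedup_key` and `find_match` are hypothetical.

```python
def dedup_key(name_cache: str, rank: str) -> tuple:
    # Assumed matching criterion: same name string AND same rank.
    return (name_cache, rank.lower())


def find_match(existing: dict, name_cache: str, rank: str):
    """Return the existing record for (name, rank), or None if a new one would be created."""
    return existing.get(dedup_key(name_cache, rank))
```

One option is to harmonise the ranks (e.g. treat division and phylum as equivalent) before the final import, so that both keys coincide.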

#20 Updated by Andreas Müller 8 months ago

You should check with

-- full names (including authors) that occur more than once
SELECT tn.titleCache, COUNT(*) AS n
FROM TaxonName tn
GROUP BY tn.titleCache
HAVING COUNT(*) > 1

or

-- name strings (without authors) that occur more than once
SELECT tn.nameCache, COUNT(*) AS n
FROM TaxonName tn
GROUP BY tn.nameCache
HAVING COUNT(*) > 1

for multiple occurrences of the same name in phycobank. There are even names with the preliminary flag set (or nameCache == null). This should probably not happen in a registration database.

#21 Updated by Andreas Kohlbecker 8 months ago

Andreas Müller wrote:

You should check with

-- full names (including authors) that occur more than once
SELECT tn.titleCache, COUNT(*) AS n
FROM TaxonName tn
GROUP BY tn.titleCache
HAVING COUNT(*) > 1

or

-- name strings (without authors) that occur more than once
SELECT tn.nameCache, COUNT(*) AS n
FROM TaxonName tn
GROUP BY tn.nameCache
HAVING COUNT(*) > 1

for multiple occurrences of the same name in phycobank. There are even names with the preliminary flag set (or nameCache == null). This should probably not happen in a registration database.

Genus name duplicates are already handled in #7748.

#22 Updated by Andreas Kohlbecker 8 months ago

All other data import and data cleaning issues have been copied to #7808.

#23 Updated by Andreas Kohlbecker 8 months ago

  • Copied to task #7808: Further name duplicates added

#24 Updated by Andreas Kohlbecker 8 months ago

  • Description updated

#25 Updated by Andreas Kohlbecker 8 months ago

  • Status changed from Resolved to Feedback
  • Assignee changed from Andreas Kohlbecker to Andreas Müller
  • % Done changed from 50 to 90

I reviewed the imported taxon graph relations. The resulting graph exactly matches the expectations.
The implementation is ready, so we now need the complete higher classification data for the final imports of the various classifications.
I think we should close this ticket in favor of creating a new ticket for the actual import tasks.

#26 Updated by Andreas Müller 8 months ago

OK, can you close the ticket and open a new one for whatever you still need?

#27 Updated by Andreas Kohlbecker 8 months ago

  • Copied to task #7811: Import higher classifications added

#28 Updated by Andreas Kohlbecker 8 months ago

  • Status changed from Feedback to Closed
  • % Done changed from 90 to 100

New ticket for the actual import tasks created: #7811

#29 Updated by Andreas Müller 7 months ago

  • Target version deleted (Release 5.4)
