task #7420

closed

Import for higher taxon graph for phycobank

Added by Andreas Kohlbecker almost 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Highest
Category: cdmadapter
Target version: -
Start date:
Due date:
% Done: 100%
Estimated time:
Severity: major
Tags:

Description

The higher taxon graph discussed in #6173 will be imported via spreadsheets, using the NormalImplicit import format. The details of the format will be defined in issue #7419.

Taxa will be related through taxon relationships as well as through classifications. The latter are for internal handling only, not for showing or searching taxa in the data portal via classifications.

The secundum (sec.) reference for taxa should be the nomenclatural reference (nom. ref.) => nominal taxa


The resulting taxonGraph is visualized at: http://api.cybertaxonomy.org/taxonGraph/


Files

Algen_Syllabus_NormalImplied_Test.xlsx (13.4 KB), Andreas Müller, 08/09/2018 10:45 AM
Algen_Syllabus_NormalImplied_Worms_Test.xlsx (15.2 KB), Test data including WoRMS data, Andreas Müller, 08/10/2018 02:16 PM

Related issues

Related to PhycoBank - task #6173: Concept for a useful algae registry taxon classification (Closed, Andreas Kohlbecker)
Related to EDIT - task #6137: Urgent imports (In Progress, Andreas Müller)
Related to EDIT - task #7948: Editor for Classification Fragments (Closed, Andreas Kohlbecker)
Related to EDIT - bug #10272: Import WoRMS (and other) higher classifications for Phycobank (New, Andreas Müller)
Blocked by EDIT - feature request #7419: Provide example NormalImplicit import file for phycobank higher taxon graph imports (Closed, Wolf-Henning Kusber)
Copied to PhycoBank - task #7808: Further name duplicates (New, Wolf-Henning Kusber)
Copied to EDIT - task #7811: Import higher classifications (Feedback, Wolf-Henning Kusber)

Actions #1

Updated by Andreas Kohlbecker almost 6 years ago

  • Blocked by feature request #7419: Provide example NormalImplicit import file for phycobank higher taxon graph imports added
Actions #2

Updated by Andreas Kohlbecker almost 6 years ago

  • Related to task #6173: Concept for a useful algae registry taxon classification added
Actions #3

Updated by Andreas Müller almost 6 years ago

  • Subject changed from Adapt NormalImplecit import to import higher taxon graph phycobank to Import for higher taxon graph for phycobank
  • Target version changed from Unassigned CDM tickets to Release 5.1
Actions #4

Updated by Andreas Müller almost 6 years ago

Actions #5

Updated by Andreas Müller almost 6 years ago

  • Description updated (diff)
Actions #6

Updated by Andreas Müller over 5 years ago

  • Target version changed from Release 5.1 to Release 5.2
Actions #7

Updated by Andreas Kohlbecker over 5 years ago

  • File Algen_Syllabus_NormalImplied_Test.xls added

Hi, here comes the spreadsheet with the first data for the import: attachment:Algen_Syllabus_NormalImplied_Test.xls

The Excel file "Algen_Syllabus_NormalImplied_Test" contains the old sheet "NormalImplied.txt" with a side-by-side comparison of Syllabus and WoRMS.

Based on the discussions so far, I have created a new sheet for the Syllabus, "HigherRanksEnfwurfNeu", for further discussion.

Andreas M. asked for all known higher ranks to be named explicitly. Done; all ranks highlighted in yellow do not exist or are not used in the Syllabus.
Yellow-highlighted ranks that are empty often replace a designation such as "incertae sedis".

Questions about the genus names:

  1. Some we already have as names in the system, some as registered genera; others we do not yet have in the system but need as names, possibly as registered names. How should we proceed?
  2. Andreas M. asked about nomenclatural authors, because of possible homonyms.
  3. Do we need information on the status (valid, invalid, illeg.) of the genera? (I would say only if they are registered.)
Actions #8

Updated by Andreas Müller over 5 years ago

Please save future versions of the Excel file in the current .xlsx format; .xls is outdated.

Actions #10

Updated by Andreas Müller over 5 years ago

  • File deleted (Algen_Syllabus_NormalImplied_Test.xls)
Actions #11

Updated by Andreas Müller over 5 years ago

  • Status changed from New to In Progress
  • Priority changed from New to Highest
  • Target version changed from Release 5.2 to Release 5.3
  • % Done changed from 0 to 30
  • Severity changed from normal to major

A first version of the import is ready and has been tested with Frey and WoRMS data, importing into an empty local database.

Next step: test importing into the PhycoBank database.

Actions #13

Updated by Andreas Müller over 5 years ago

What should be done with the existing IAPT data? Should they be adapted to the new data model?

Actions #14

Updated by Andreas Müller over 5 years ago

Andreas Müller wrote:

What should be done with the existing IAPT data? Should they be adapted to the new data model?

We also need to decide how to handle IAPT species. They are currently attached to IAPT genera. Is this still wanted in the future?

Also, IAPT genera do have authors. We need to decide whether these genus names should be matched during the import of new names, which currently have no authors.

Actions #15

Updated by Andreas Müller over 5 years ago

  • Status changed from In Progress to Resolved
  • Assignee changed from Andreas Müller to Andreas Kohlbecker
  • % Done changed from 30 to 50

Please review the results of the import on test.cdm_phycobank.

Actions #16

Updated by Andreas Müller over 5 years ago

  • Status changed from Resolved to Feedback
  • Assignee changed from Andreas Kohlbecker to Andreas Müller
  • Target version changed from Release 5.3 to Release 5.4

Needs to be adapted

Actions #17

Updated by Andreas Kohlbecker over 5 years ago

Hi Andreas,

A small change of strategy: all Taxa and TaxonRelations that belong to the classification graph should have PhycoBank as their secReference, or as their citation in the case of relations. This way the taxa and relations belonging to the graph can be identified.

Andreas
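
The strategy proposed in the comment above could be sketched roughly as follows. This uses plain Python records rather than the CDM classes; the field names `sec` and `citation` follow the wording above, while the record types, the reference label "PhycoBank", and the sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    name: str
    sec: str  # label of the sec reference (hypothetical plain string)


@dataclass(frozen=True)
class TaxonRelation:
    child: str
    parent: str
    citation: str  # label of the relation's citation (hypothetical plain string)


def graph_members(taxa, relations, reference):
    """Select the taxa and relations that carry the given reference."""
    graph_taxa = [t for t in taxa if t.sec == reference]
    graph_relations = [r for r in relations if r.citation == reference]
    return graph_taxa, graph_relations


# Invented sample data: two taxa and one relation belong to the graph.
SAMPLE_TAXA = [
    Taxon("Cryptophyceae", "PhycoBank"),
    Taxon("Cryptophyta", "PhycoBank"),
    Taxon("Cryptomonas", "Some other reference"),
]
SAMPLE_RELATIONS = [
    TaxonRelation("Cryptophyceae", "Cryptophyta", "PhycoBank"),
    TaxonRelation("Cryptomonas", "Cryptophyceae", "Some other reference"),
]

if __name__ == "__main__":
    taxa, relations = graph_members(SAMPLE_TAXA, SAMPLE_RELATIONS, "PhycoBank")
    print([t.name for t in taxa])
    print([(r.child, r.parent) for r in relations])
```

Filtering on a shared reference is simple, but it ties graph membership to a field that also carries bibliographic meaning.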

Actions #18

Updated by Andreas Müller over 5 years ago

Andreas Kohlbecker wrote:

Hi Andreas,

A small change of strategy: all Taxa and TaxonRelations that belong to the classification graph should have PhycoBank as their secReference, or as their citation in the case of relations. This way the taxa and relations belonging to the graph can be identified.

Andreas

I understand this for the taxa: since we use only one taxon per name, we cannot give sec references per concept used. This was already decided earlier.

I do not understand it for taxon relations. I thought their references should be the references of the classification used, so that this information is not lost. Which graph do you need to identify this way? Are any "taxonomically included in" relationships expected in the database other than those for PhycoBank? And for which use case do you need to identify the graph?

I should also mention that we are thinking about adding a "graph" link to each relationship in the (near) future. The idea comes from discussing different types of DefinedTerm relationship graphs (collections, lists, trees, directed graphs, undefined graphs). This would allow relationships to be grouped into graphs, whereas using reference, or soon sources.reference, is not a good way to hold graph data together, for multiple reasons.

Actions #19

Updated by Andreas Müller over 5 years ago

  • Status changed from Feedback to Resolved
  • Assignee changed from Andreas Müller to Andreas Kohlbecker

I ran a new import to test.cdm_phycobank. Please check the results.

One issue I can see is that the existing IAPT data sometimes have different ranks than the imported data, so the names/taxa are not deduplicated. Example: Cryptophyceae is a Division in IAPT but a Phylum in Frey + WoRMS. This needs to be sorted out before the final import.
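
A small sketch can show how such rank conflicts might be listed before the final import. It assumes, purely for illustration, that names are compared as (name, rank) pairs; the actual matching strategy of the import is not described in this ticket, and the function name and the sample rows other than Cryptophyceae are hypothetical.

```python
from collections import defaultdict


def rank_conflicts(existing, imported):
    """Return names that occur in both inputs but with differing ranks.

    Both arguments are iterables of (name, rank) tuples. The result maps
    each conflicting name to (sorted existing ranks, imported rank).
    """
    existing_ranks = defaultdict(set)
    for name, rank in existing:
        existing_ranks[name].add(rank)

    conflicts = {}
    for name, rank in imported:
        known = existing_ranks.get(name)  # .get() does not create new entries
        if known and rank not in known:
            conflicts[name] = (sorted(known), rank)
    return conflicts


# Cryptophyceae is the example from the ticket; the other rows are invented.
EXISTING_IAPT = [("Cryptophyceae", "Division"), ("Cryptomonas", "Genus")]
IMPORTED = [("Cryptophyceae", "Phylum"), ("Cryptomonas", "Genus")]

if __name__ == "__main__":
    print(rank_conflicts(EXISTING_IAPT, IMPORTED))
```

Names reported here would need a decision on the rank to use before they can be matched to existing records.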

Actions #20

Updated by Andreas Müller over 5 years ago

You should check with

-- names whose titleCache occurs more than once
SELECT titleCache, COUNT(*) AS n
FROM TaxonName tn
GROUP BY tn.titleCache
HAVING COUNT(*) > 1

or

-- names whose nameCache occurs more than once
SELECT nameCache, COUNT(*) AS n
FROM TaxonName tn
GROUP BY tn.nameCache
HAVING COUNT(*) > 1

for multiple occurrences of the same name in PhycoBank. There are even names with the preliminary flag set (or nameCache == null). This should probably not happen in a registration database.
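
The two checks can be tried out on a throwaway database. The following is a minimal sketch using Python's built-in `sqlite3` module and a much reduced, made-up `TaxonName` table with only the `titleCache` and `nameCache` columns; it is not the actual CDM schema, and the sample rows are invented. It also counts rows with a missing `nameCache`.

```python
import sqlite3

ALLOWED_COLUMNS = ("titleCache", "nameCache")


def find_duplicates(conn, column):
    """Return (value, count) pairs for values of `column` that occur more than once."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column: " + column)
    sql = (
        f"SELECT {column}, COUNT(*) AS n FROM TaxonName "
        f"GROUP BY {column} HAVING COUNT(*) > 1 ORDER BY {column}"
    )
    return conn.execute(sql).fetchall()


def count_missing_name_cache(conn):
    """Count names without a nameCache."""
    sql = "SELECT COUNT(*) FROM TaxonName WHERE nameCache IS NULL"
    return conn.execute(sql).fetchone()[0]


def build_sample_db():
    """Create an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TaxonName ("
        "id INTEGER PRIMARY KEY, titleCache TEXT, nameCache TEXT)"
    )
    rows = [
        ("Cryptophyceae", "Cryptophyceae"),
        ("Cryptophyceae", "Cryptophyceae"),
        ("Chlorophyta Pascher", "Chlorophyta"),
        ("Chlorophyta", "Chlorophyta"),
        ("Unparsed name string", None),
    ]
    conn.executemany(
        "INSERT INTO TaxonName (titleCache, nameCache) VALUES (?, ?)", rows
    )
    return conn


if __name__ == "__main__":
    conn = build_sample_db()
    print(find_duplicates(conn, "titleCache"))
    print(find_duplicates(conn, "nameCache"))
    print(count_missing_name_cache(conn))
```

Checking `nameCache` in addition to `titleCache` catches names that differ only in their author string, as with the two Chlorophyta rows in the sample.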

Actions #21

Updated by Andreas Kohlbecker over 5 years ago

Andreas Müller wrote:

You should check with

-- names whose titleCache occurs more than once
SELECT titleCache, COUNT(*) AS n
FROM TaxonName tn
GROUP BY tn.titleCache
HAVING COUNT(*) > 1

or

-- names whose nameCache occurs more than once
SELECT nameCache, COUNT(*) AS n
FROM TaxonName tn
GROUP BY tn.nameCache
HAVING COUNT(*) > 1

for multiple occurrences of the same name in PhycoBank. There are even names with the preliminary flag set (or nameCache == null). This should probably not happen in a registration database.

Genus name duplicates are already handled in #7748

Actions #22

Updated by Andreas Kohlbecker over 5 years ago

All other data import and data cleaning issues have been copied to #7808.

Actions #23

Updated by Andreas Kohlbecker over 5 years ago

  • Copied to task #7808: Further name duplicates added
Actions #24

Updated by Andreas Kohlbecker over 5 years ago

  • Description updated (diff)
Actions #25

Updated by Andreas Kohlbecker over 5 years ago

  • Status changed from Resolved to Feedback
  • Assignee changed from Andreas Kohlbecker to Andreas Müller
  • % Done changed from 50 to 90

I reviewed the imported taxon graph relations. The resulting graph exactly matches the expectations.
The implementation is ready, so we now need the complete higher classification data for the final imports of the various classifications.
I think we should close this ticket in favor of creating a new ticket for the actual import tasks.

Actions #26

Updated by Andreas Müller over 5 years ago

OK, can you close the ticket and open a new ticket for whatever you still need?

Actions #27

Updated by Andreas Kohlbecker over 5 years ago

  • Copied to task #7811: Import higher classifications added
Actions #28

Updated by Andreas Kohlbecker over 5 years ago

  • Status changed from Feedback to Closed
  • % Done changed from 90 to 100

A new ticket for the actual import tasks has been created: #7811

Actions #29

Updated by Andreas Müller over 5 years ago

  • Target version deleted (Release 5.4)
Actions #30

Updated by Andreas Müller about 2 years ago

  • Related to task #7948: Editor for Classification Fragments added
Actions #31

Updated by Andreas Müller about 1 year ago

  • Related to bug #10272: Import WoRMS (and other) higher classifications for Phycobank added