task #6009

Import the IAPT database into a cdm instance

Added by Andreas Kohlbecker almost 4 years ago. Updated over 2 years ago.

Status: Closed
Priority: New
Category: Import
Target version:
Start date: 09/16/2016
Due date:
% Done: 100%
Estimated time: (Total: 4.00 h)

Description

Import all algae names into 2 CDM instances:

  1. One instance as a replacement for the old IAPT application: complete import - DONE http://test.e-taxonomy.eu/cdmserver/iapt/
  2. One instance as the basis for the algae registration: filtered import, SQL filter for all algae names: higherRank='%PHYCEAE%' - DONE http://test.e-taxonomy.eu/cdmserver/phycobank/portal/classification
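A note on the filter as written above: in standard SQL, `=` compares against the literal string, so `'%PHYCEAE%'` only acts as a wildcard pattern when used with `LIKE`. The following is a minimal sketch of that difference using an in-memory SQLite table. The table name `registry`, the sample rows, and the assumption that the filter targets the `HigherTaxon` column are illustrative only and are not taken from the actual import.

```python
import sqlite3

# Hypothetical in-memory table standing in for the IAPT source data;
# the table name and the sample rows are illustrative, not real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registry (RegistrationNo_Pk INTEGER, HigherTaxon TEXT)")
conn.executemany(
    "INSERT INTO registry VALUES (?, ?)",
    [
        (1, "BACILLARIOPHYCEAE"),
        (2, "CHLOROPHYCEAE"),
        (3, "FUNGI"),
    ],
)


def select_ids(where_clause):
    """Return the ids of all rows matching the given WHERE clause."""
    sql = f"SELECT RegistrationNo_Pk FROM registry WHERE {where_clause} ORDER BY 1"
    return [row[0] for row in conn.execute(sql)]


# '=' compares against the literal string '%PHYCEAE%', so no row matches.
equals_ids = select_ids("HigherTaxon = '%PHYCEAE%'")

# LIKE treats '%' as a wildcard and selects the names ending in PHYCEAE.
like_ids = select_ids("HigherTaxon LIKE '%PHYCEAE%'")
```

Whether the real filter used `LIKE` is not stated in this ticket; see #6026 for the details of the filtered import.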

Columns in the csv file:

  • RegistrationNo_Pk
  • HigherTaxon
  • FullName
  • AuthorsSpelling
  • LitString
  • Registration
  • Type ==> Needs to be parsed, see comment
  • Caveats
  • FullBasionym
  • FullSynSubst
  • NotesTxt
  • RegDate
  • NameString
  • BasionymString
  • SynSubstStr
  • AuthorString
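Before running an import against such a file, it can help to check that the header really contains the columns listed above. This is a minimal sketch, assuming a comma-separated file with a header row; the function name `missing_columns` and the sample input are hypothetical and not part of the actual import code.

```python
import csv
import io

# Column names as listed in the task description.
EXPECTED_COLUMNS = [
    "RegistrationNo_Pk", "HigherTaxon", "FullName", "AuthorsSpelling",
    "LitString", "Registration", "Type", "Caveats", "FullBasionym",
    "FullSynSubst", "NotesTxt", "RegDate", "NameString", "BasionymString",
    "SynSubstStr", "AuthorString",
]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Illustrative input with a deliberately incomplete header.
sample = io.StringIO(
    "RegistrationNo_Pk,HigherTaxon,FullName\n"
    "1,CHLOROPHYCEAE,Example name\n"
)
missing = missing_columns(sample)
```

For a real file, the same function can be called with `open(path, newline="", encoding=...)`, where the path and encoding depend on how the export was produced.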

CDM instances:

cdm_algea_registry-import-stats-09-20-16.txt (561 Bytes) Andreas Kohlbecker, 02/27/2018 11:51 AM

cdm_algea_registry-import-errors-09-20-16.xls (224 KB) Andreas Kohlbecker, 02/27/2018 11:51 AM

cdm_iapt-import-stats-09-20-16.txt (570 Bytes) Andreas Kohlbecker, 02/27/2018 11:51 AM

IAPT_Import-Analyse-Henning-2016-09-12.xls (28 KB) Andreas Kohlbecker, 02/27/2018 11:51 AM

cdm_iapt-import-errors-09-20-16.xls (2.2 MB) Andreas Kohlbecker, 02/27/2018 11:51 AM


Subtasks

task #6018: Check the diatom name inventory in the IAPT registration - Closed - Wolf-Henning Kusber

task #6093: Algae in 'Incertae sedis' & 'No group assigned' - Closed - Andreas Kohlbecker

task #6034: DataCleaning after final import - Closed

Edit - bug #6276: Nomenclatural references show parsed date string instead of publication year - Closed - Andreas Müller

task #6277: use nomenclatural reference as sec. references instead of IAPT - Closed - Andreas Kohlbecker

task #6279: remove all 'nom. val' status from name - Closed - Wolf-Henning Kusber

task #6294: remove punctuation marks from end of Locality language strings - Closed - Andreas Kohlbecker

task #6171: Higher Rank - Duplicate - Wolf-Henning Kusber

task #6172: Higher Rank - Duplicate - Wolf-Henning Kusber

Associated revisions

Revision 28925 (diff)
Added by Andreas Kohlbecker almost 4 years ago

ref #6009 improved publication date parsing

Revision 28926 (diff)
Added by Andreas Kohlbecker almost 4 years ago

ref #6009 improved publication date parsing
- all dates recognized
- all months recognized
- special time spans not implemented (Winter, Fall, April-June, etc.)
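To illustrate the scope described in this commit message, here is a rough sketch of a parser that accepts an optional day, an optional month name or abbreviation, and a four-digit year, and rejects seasons and month ranges. It is not the code from revision 28926; the accepted input layout, the function names, and the sample strings are assumptions made for illustration.

```python
import re

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# Assumed layout: optional day, optional month name, mandatory four-digit
# year, for example "12 March 1998", "Sept. 2001" or "1998".
DATE_PATTERN = re.compile(
    r"^(?:(?P<day>\d{1,2})\.?\s+)?(?:(?P<month>[A-Za-z]+)\.?\s+)?(?P<year>\d{4})$"
)


def month_number(token):
    """Map a month name or an abbreviation of at least 3 letters to 1..12."""
    token = token.lower()
    if len(token) < 3:
        return None
    for number, name in enumerate(MONTH_NAMES, start=1):
        if name.startswith(token):
            return number
    return None


def parse_publication_date(text):
    """Return (year, month, day) with None for missing parts, or None if unparsable."""
    match = DATE_PATTERN.match(text.strip())
    if not match:
        return None  # e.g. "April-June 1998": ranges are not handled
    month = None
    if match.group("month"):
        month = month_number(match.group("month"))
        if month is None:
            return None  # e.g. "Winter 1998": seasons are not handled
    day = int(match.group("day")) if match.group("day") else None
    return (int(match.group("year")), month, day)
```

Returning `None` for unsupported strings keeps the unparsed cases visible, so they can be collected in an error report rather than silently guessed.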

History

#1 Updated by Andreas Kohlbecker almost 4 years ago

  • Tracker changed from bug to task

#2 Updated by Andreas Kohlbecker almost 4 years ago

  • Description updated (diff)

#3 Updated by Andreas Kohlbecker almost 4 years ago

  • Description updated (diff)
  • Status changed from New to In Progress

#4 Updated by Wolf-Henning Kusber almost 4 years ago

Andreas Kohlbecker wrote:

Import all algae names into 2 CDM instances:

  1. One instance as a replacement for the old IAPT application: complete import
  2. One instance as the basis for the algae registration: filtered import, SQL filter for all algae names: higherRank='%PHYCEAE%'

Columns in the csv file:

  • RegistrationNo_Pk
  • HigherTaxon
  • FullName
  • AuthorsSpelling
  • LitString
  • Registration
  • Type
  • Caveats
  • FullBasionym
  • FullSynSubst
  • NotesTxt
  • RegDate
  • NameString
  • BasionymString
  • SynSubstStr
  • AuthorString

see also subtask

#5 Updated by Andreas Kohlbecker almost 4 years ago

  • Description updated (diff)

The type information is available only as free text and in some cases contains quite a lot of information, including holotype, isotype, and location.
Is it sufficient for the prototype to import this data as free text, or do we need it in atomized form?

Henning:
Eckhard and I discussed this today. Where a holotype or isotype occurs, atomization should be possible, starting in each case with the collection, followed by the barcode or the number of the collection object. Where only "Typus" and "Locality" appear, the texts were copied from the publications, i.e. they cannot be atomized because they were not entered in a standardized way.
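The atomization rule described here (holotype or isotype, then the collection, then the barcode or object number) could be sketched roughly as follows. The exact layout of the `Type` free text is not documented in this ticket, so the pattern, the function name `atomize_type`, and the sample strings are assumptions for illustration only.

```python
import re

# Assumed layout: "<Holotype|Isotype>: <collection code> <barcode or number>".
# The real free text may differ; this pattern is only a starting point.
TYPE_PATTERN = re.compile(
    r"(?P<kind>Holotype|Isotype):\s*"
    r"(?P<collection>[A-Z]+)\s+"
    r"(?P<number>[0-9][0-9 ]*[0-9]|[0-9])"
)


def atomize_type(text):
    """Return one dict per holotype/isotype statement; empty list if none is found."""
    return [
        {
            "kind": m.group("kind"),
            "collection": m.group("collection"),
            "number": m.group("number"),
        }
        for m in TYPE_PATTERN.finditer(text)
    ]


# Invented sample strings, not taken from the IAPT data.
atomized = atomize_type("Holotype: B 40 0040871; Isotype: BM 000123")
free_text_only = atomize_type("Typus: text copied from the publication. Locality: somewhere")
```

Entries that yield an empty list would stay as free text, which matches the distinction made above between standardized holotype/isotype entries and non-standardized "Typus"/"Locality" text.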

#6 Updated by Andreas Kohlbecker almost 4 years ago

  • Target version set to IAPT Import ready

#7 Updated by Andreas Kohlbecker almost 4 years ago

  • Description updated (diff)

for details on the filtered import, see #6026

#8 Updated by Andreas Kohlbecker almost 4 years ago

  • Status changed from In Progress to Resolved

The import is fully implemented and has been successfully run.

#9 Updated by Andreas Kohlbecker over 3 years ago

  • Category set to Import

#10 Updated by Andreas Kohlbecker over 3 years ago

  • Description updated (diff)

#11 Updated by Andreas Kohlbecker over 3 years ago

  • Description updated (diff)

#12 Updated by Andreas Kohlbecker over 3 years ago

  • Description updated (diff)

#13 Updated by Andreas Kohlbecker almost 3 years ago

  • Status changed from Resolved to Closed

We will no longer work on the import to improve data quality, so this issue can be closed.
