CoL Import Documentation


Download

Prepare database

Because the import takes very long (more than 7 days), it is highly recommended not to run it directly against the production database. Instead, use a local database or one of the 2 CoL instances on edit-test (note: edit-test is relatively slow).

Launch

  • The import is launched by ColDwcaImportActivator in cdmlib-apps (https://dev.e-taxonomy.eu/gitweb/cdmlib-apps.git)
  • Before launch, adapt:
    • the filename (URI) in ColDwcaImportActivator.dwca_col_All()
    • the path to the mapping file, databaseMappingFile
      • The mapping file stores the mapping of CoL DwC-A data to CDM. The database-based mapping is required for running the import in parts (next step). It is a temporary folder that can be removed once all data is imported.
    • classificationName
  • The import is split into multiple parts for performance and memory reasons. The classification-creating parts (higher taxa and lower taxa) are especially memory-sensitive, so it is recommended to run them separately. Run taxa first; lower taxa must run after higher taxa; everything else is order-independent.
    • taxa
    • extensions
    • higher taxa
    • lower taxa
    • synonymy
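
The ordering rules above can be written down as a small check. This is only a sketch: the class ColImportOrderCheck and its enum are hypothetical names made up for illustration and are not part of cdmlib-apps. It only encodes the two constraints stated in the list (taxa first, lower taxa after higher taxa).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of cdmlib-apps: checks a planned run order of the import parts. */
final class ColImportOrderCheck {

    /** The five parts named in this document; the enum itself is illustrative. */
    enum Part { TAXA, EXTENSIONS, HIGHER_TAXA, LOWER_TAXA, SYNONYMY }

    /**
     * Returns true if the order contains every part exactly once,
     * starts with TAXA, and runs LOWER_TAXA after HIGHER_TAXA.
     */
    static boolean isValidOrder(List<Part> order) {
        List<Part> all = Arrays.asList(Part.values());
        // Same size plus containsAll means each part appears exactly once.
        if (order.size() != all.size() || !order.containsAll(all)) {
            return false;
        }
        if (order.get(0) != Part.TAXA) {
            return false;
        }
        return order.indexOf(Part.HIGHER_TAXA) < order.indexOf(Part.LOWER_TAXA);
    }

    public static void main(String[] args) {
        List<Part> plan = Arrays.asList(
                Part.TAXA, Part.SYNONYMY, Part.HIGHER_TAXA, Part.EXTENSIONS, Part.LOWER_TAXA);
        System.out.println("Plan is valid: " + isValidOrder(plan));
    }
}
```

Any plan that starts with taxa and keeps higher taxa before lower taxa passes; extensions and synonymy may sit anywhere after taxa.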

Configuration

  • Give the JVM enough memory, e.g. -Xmx9000M
  • Consider defining your own log file and log properties, e.g. with -Dlog4j.configuration=file:///C:/Users/a.mueller/.cdmLibrary/log/properties/log4j_col.properties
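
Since a run takes days, it may be worth confirming at start-up that both JVM settings were actually applied. The following is a minimal sketch, assuming a hypothetical class ColImportPreflight that is not part of cdmlib-apps. The 8000 MB threshold is an assumption, chosen slightly below the suggested -Xmx9000M because Runtime.maxMemory() can report a little less than the configured value.

```java
/** Hypothetical preflight check, not part of cdmlib-apps. */
final class ColImportPreflight {

    // Assumption: threshold a bit below 9000 MB, see the note above.
    static final long MIN_HEAP_MB = 8000;

    /** Returns true if the given heap limit (in bytes) reaches the required number of megabytes. */
    static boolean heapSufficient(long maxMemoryBytes, long requiredMb) {
        return maxMemoryBytes / (1024L * 1024L) >= requiredMb;
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting of the running JVM.
        long maxBytes = Runtime.getRuntime().maxMemory();
        if (!heapSufficient(maxBytes, MIN_HEAP_MB)) {
            System.err.println("Heap limit is only " + maxBytes / (1024L * 1024L)
                    + " MB; consider e.g. -Xmx9000M");
        }
        // The system property is null when no -Dlog4j.configuration was passed.
        String logConfig = System.getProperty("log4j.configuration");
        System.out.println(logConfig == null
                ? "No -Dlog4j.configuration given"
                : "Using log configuration " + logConfig);
    }
}
```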

Installation

  • When ready, move the DB to edit-database (production) and install it: mysql -h localhost -u edit -p cdm_production_col < {filename}
  • Archive the file on edit-database in /var/backup/db_mysql_manual
  • Compute the freetext index using http://160.45.63.176/jenkins/job/REINDEX-col-catalogue-services/
  • Archive the index afterwards on production (160.45.63.173): cd /var/lib/cdmserver, then tar -cjf col_2017-08-04.tar.bz2 index/col
  • Optionally install CoL on edit-test as well (not strictly required)
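
The archive name in the tar step follows the pattern col_<date>.tar.bz2. The sketch below shows one way to derive the name and the tar arguments for a given date. The class ColInstallCommands is a hypothetical illustration, not an existing tool; it only builds the command as a list of strings and does not execute anything.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not an existing tool: builds the index archive command from this document. */
final class ColInstallCommands {

    /** Builds the archive name; LocalDate.toString() yields the ISO form yyyy-MM-dd. */
    static String indexArchiveName(LocalDate date) {
        return "col_" + date + ".tar.bz2";
    }

    /** Arguments of the tar call, meant to be run from /var/lib/cdmserver. */
    static List<String> tarCommand(LocalDate date) {
        return Arrays.asList("tar", "-cjf", indexArchiveName(date), "index/col");
    }

    public static void main(String[] args) {
        // Print the command for today's date so it can be reviewed before running it by hand.
        System.out.println(String.join(" ", tarCommand(LocalDate.now())));
    }
}
```

Keeping the date in the file name makes it easy to tell archived indexes of different imports apart.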

Tickets

# Tracker Status Priority Subject Category Target version % Done
10451 feature request Resolved New Allow adding fullName (name+author) to coldp name table cdmadapter Release 5.42
10450 task New New Handle zip-files consistent in list exports taxeditor Unassigned CDM tickets
10449 feature request Closed Highest Allow filtering out synonyms in list exports cdmadapter Release 5.42
10334 feature request Closed Priority14 Evaluate DescriptionBase.publish in webservices cdmlib-remote Release 5.42
10271 feature request In Progress Priority14 Implement CoL-DP export cdmadapter Release 5.42
6987 bug Closed Highest LIE during name catalogue service call processing cdmlib-remote Release 4.11
6941 bug New New Deduplicate authors during CoL import cdmadapter Unassigned CDM tickets
6937 bug New New Check cascading of partOf named areas (throws NUOE e.g. in CoL import) cdmlib Unassigned CDM tickets
6936 bug New New Deduplicate languages in CoL import cdmadapter Unassigned CDM tickets
6935 bug New New Deduplicate distribution status in CoL cdmadapter Unassigned CDM tickets
6900 bug New New Handle virus names correctly in CoL cdmadapter Unassigned CDM tickets
6899 bug New New Genus names have empty protected authorshipcache (CoL) cdmadapter Unassigned CDM tickets
6898 feature request New New Implement stable identifiers for CoL import cdmadapter Unassigned CDM tickets
6892 bug New New Further improvements to CoL performance cdmadapter Unassigned CDM tickets
6891 bug New New Handle ambiguous synonym in CoL cdmadapter Unassigned CDM tickets
6888 bug New New Deduplicate persons in CoL/DwCA import cdmadapter Unassigned CDM tickets
6887 feature request In Progress Priority12 Some attributes are still missing in CoL import cdmadapter Reviewed Next Major Release
6883 bug New New DwC-A common name language and area import (for CoL) cdmadapter Unassigned CDM tickets
6882 task Closed New Document CoL import, how to documentation Update Documentation
6880 feature request Closed Highest Allow removing auditing from imports cdmadapter Release 4.10
6494 bug Worksforme New Fix null titleCache for CoL import cdmadapter
5063 bug Closed Highest Fix CoL import (split in parts++) cdmadapter Release 3.8
4707 bug In Progress Highest LSID Authority service cannot find CoL LSIDs data Reviewed Next Major Release

Updated by Andreas Müller almost 2 years ago · 17 revisions