CoL Import Documentation


Prepare database

As the import takes very long (>7 days), it is highly recommended not to run it directly against production. Instead, use a local database or one of the two CoL instances on edit-test (note: edit-test is relatively slow).


  • The import is launched by ColDwcaImportActivator in cdmlib-apps (
  • Before launching, adapt:
    • the filename (URI) in ColDwcaImportActivator.dwca_col_All()
    • the path to the mapping file (databaseMappingFile)
      • The mapping file stores the mapping of CoL DwC-A data to CDM. This database-based mapping is required for running the import in parts (see next step). It is a temporary folder and can be removed once all data has been imported.
    • the classificationName
  • The import is split into multiple parts for performance and memory reasons. The parts that create the classification (higher taxa and lower taxa) are especially memory-sensitive, so it is recommended to run them separately. Taxa must be run first, and lower taxa must run after higher taxa; all other parts are order-independent.
    • taxa
    • extensions
    • higher taxa
    • lower taxa
    • synonymy
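The ordering rules for the parts can be expressed as a small check. This is a hypothetical sketch: the class, the Part enum and isValidOrder() are illustrative names and not part of cdmlib-apps; it only encodes the two documented constraints (taxa first, lower taxa after higher taxa).

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not cdmlib-apps code): validates a planned run order of the
 * CoL import parts against the documented constraints.
 */
public class ImportPartOrder {

    /** The import parts listed in the documentation (names are illustrative). */
    public enum Part { TAXA, EXTENSIONS, HIGHER_TAXA, LOWER_TAXA, SYNONYMY }

    /** Returns true if taxa run first and lower taxa never run before higher taxa. */
    public static boolean isValidOrder(List<Part> parts) {
        if (parts.isEmpty() || parts.get(0) != Part.TAXA) {
            return false; // taxa must always be imported first
        }
        Set<Part> done = EnumSet.noneOf(Part.class);
        for (Part part : parts) {
            if (part == Part.LOWER_TAXA && !done.contains(Part.HIGHER_TAXA)) {
                return false; // lower taxa need the higher-taxa classification
            }
            done.add(part); // all other parts are order-independent
        }
        return true;
    }

    public static void main(String[] args) {
        List<Part> plan = List.of(Part.TAXA, Part.HIGHER_TAXA, Part.SYNONYMY,
                Part.LOWER_TAXA, Part.EXTENSIONS);
        System.out.println(isValidOrder(plan)); // true
        System.out.println(isValidOrder(
                List.of(Part.TAXA, Part.LOWER_TAXA, Part.HIGHER_TAXA))); // false
    }
}
```

Running each part as a separate JVM invocation, in an order that passes this check, keeps the memory-sensitive classification parts isolated from each other.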


  • Give the JVM enough memory, e.g. -Xmx9000M
  • Consider defining your own log file and log properties, e.g. via -Dlog4j.configuration=file:///C:/Users/a.mueller/.cdmLibrary/log/properties/
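Before starting a multi-day run, it may be worth confirming that the JVM actually picked up these options. The following is a hypothetical preflight sketch (JvmOptionsCheck and the 8500 MiB threshold are assumptions, not project code) that uses only the standard Runtime.maxMemory() and System.getProperty() calls; start it with the same -Xmx and -D options as the import.

```java
/**
 * Hypothetical preflight check: run with the same JVM options as the import to see
 * whether -Xmx and -Dlog4j.configuration were applied.
 */
public class JvmOptionsCheck {

    /**
     * Assumed lower bound in MiB. Runtime.maxMemory() can report somewhat less than
     * the -Xmx value (depending on the garbage collector), so the threshold is set
     * below the 9000M from the example.
     */
    static final long MIN_HEAP_MIB = 8500;

    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    static boolean heapLooksSufficient(long heapMiB) {
        return heapMiB >= MIN_HEAP_MIB;
    }

    public static void main(String[] args) {
        long heap = maxHeapMiB();
        System.out.println("Max heap: " + heap + " MiB"
                + (heapLooksSufficient(heap) ? "" : " (below the suggested -Xmx9000M)"));
        String logConfig = System.getProperty("log4j.configuration");
        System.out.println("log4j.configuration = "
                + (logConfig == null ? "<not set>" : logConfig));
    }
}
```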


  • When ready, move the DB to edit-database (production) and load it with: mysql -h localhost -u edit -p cdm_production_col < {filename}
  • Archive the dump file on edit-database in /var/backup/db_mysql_manual
  • Compute the free-text index by either using
  • Afterwards, archive the index on production (cd /var/lib/cdmserver && tar -cjf col_2017-08-04.tar.bz2 index/col)
  • Optionally, also install CoL on edit-test (not strictly required)


# | Tracker | Status | Priority | Subject | Category | Target version
6987 | bug | Closed | Highest | LIE during name catalogue service call processing | cdmlib-remote | Release 4.11
6941 | bug | New | New | Deduplicate authors during CoL import | cdmadapter | Unassigned CDM tickets
6937 | bug | New | New | Check cascading of partOf named areas (throws NUOE e.g. in CoL import) | cdmlib | Unassigned CDM tickets
6936 | bug | New | New | Deduplicate languages in CoL import | cdmadapter | Unassigned CDM tickets
6935 | bug | New | New | Deduplicate distribution status in CoL | cdmadapter | Unassigned CDM tickets
6900 | bug | New | New | Handle virus names correctly in CoL | cdmadapter | Unassigned CDM tickets
6899 | bug | New | New | Genus names have empty protected authorshipcache | cdmadapter | Unassigned CDM tickets
6898 | feature request | New | New | Implement stable identifiers for CoL import | cdmadapter | Unassigned CDM tickets
6892 | bug | New | New | Further improvements to CoL performance | cdmadapter | Unassigned CDM tickets
6891 | bug | New | New | Handle ambiguous synonym in CoL | cdmadapter | Unassigned CDM tickets
6888 | bug | New | New | Deduplicate persons in CoL/DwCA import | cdmadapter | Unassigned CDM tickets
6887 | feature request | In Progress | Priority12 | Some attributes are still missing in CoL import | cdmadapter | Reviewed Next Major Release
6883 | bug | New | New | DwC-A common name language and area import (for CoL) | cdmadapter | Unassigned CDM tickets
6882 | task | Closed | New | Document CoL import, how to | documentation | Update Documentation
6880 | feature request | Closed | Highest | Allow removing auditing from imports | cdmadapter | Release 4.10
6494 | bug | Worksforme | New | Fix null titleCache for CoL import | cdmadapter |
5063 | bug | Closed | Highest | Fix CoL import (split in parts++) | cdmadapter | Release 3.8
4707 | bug | In Progress | Highest | LSID Authority service cannot find CoL LSIDs | data | Reviewed Next Major Release
