Revision 14/17, Andreas Müller, 08/04/2017 09:38 AM


CoL Import Documentation

Download

Prepare database

Because the import takes a very long time (more than 2 days), it is highly recommended not to run it directly against the production database. Instead, use a local database or one of the two CoL instances on edit-test (note: edit-test is relatively slow).

Launch

  • The import is launched by ColDwcaImportActivator in cdmlib-apps (https://dev.e-taxonomy.eu/gitweb/cdmlib-apps.git)
  • Before launching, adapt:
    • the file name (URI) in ColDwcaImportActivator.dwca_col_All()
    • the path to the mapping file, databaseMappingFile
      • The mapping file stores the mapping of CoL DwC-A data to CDM. This database-based mapping is required for running the import in parts (next step). It is a temporary folder that can be removed once all data has been imported.
    • the classificationName
  • The import is split into multiple parts for performance and memory reasons. The parts that create the classification (higher taxa and lower taxa) are especially memory-sensitive, so it is recommended to run them separately. Taxa must be run first, and lower taxa must run after higher taxa; all other parts are order-independent. The parts are:
    • taxa
    • extensions
    • higher taxa
    • lower taxa
    • synonymy
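The ordering rules above can be written down as a small check. The Java sketch below is hypothetical. ColImportPlan, its Part enum and the violations method are illustrative names and are not part of ColDwcaImportActivator or cdmlib-apps. The sketch only validates a planned sequence of parts and does not run the import. Each part would still be started as its own run, as recommended above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: validates a planned order of CoL import parts against
 * the rules on this page. Not part of ColDwcaImportActivator.
 */
final class ColImportPlan {

    /** The import parts named on this page. */
    enum Part { TAXA, EXTENSIONS, HIGHER_TAXA, LOWER_TAXA, SYNONYMY }

    /** Returns the rule violations of a plan; an empty list means the plan is valid. */
    static List<String> violations(List<Part> plan) {
        List<String> problems = new ArrayList<>();
        // Rule 1: taxa must be run before any other part.
        if (plan.isEmpty() || plan.get(0) != Part.TAXA) {
            problems.add("taxa must run first");
        }
        // Rule 2: lower taxa must run after higher taxa.
        int higher = plan.indexOf(Part.HIGHER_TAXA);
        int lower = plan.indexOf(Part.LOWER_TAXA);
        if (lower >= 0 && (higher < 0 || higher > lower)) {
            problems.add("lower taxa must run after higher taxa");
        }
        // Extensions and synonymy are order-independent, so no further checks.
        return problems;
    }

    public static void main(String[] args) {
        List<Part> plan = List.of(Part.TAXA, Part.EXTENSIONS, Part.HIGHER_TAXA,
                Part.LOWER_TAXA, Part.SYNONYMY);
        List<String> problems = violations(plan);
        System.out.println(problems.isEmpty() ? "plan ok: " + plan : "invalid plan: " + problems);
    }
}
```

Only the two constraints stated on this page are encoded. Any order of extensions and synonymy after taxa is therefore accepted.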

Configuration

  • Give the JVM enough memory, e.g. -Xmx9000M
  • Consider defining your own log file and log properties, e.g. with -Dlog4j.configuration=file:///C:/Users/a.mueller/.cdmLibrary/log/properties/log4j_col.properties
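A minimal sketch, assuming you want the import run to report its own memory and logging setup at startup. ColJvmCheck is a hypothetical class and is not part of cdmlib-apps. It uses only standard JDK calls (Runtime.maxMemory() and System.getProperty). The 9000 MB threshold is simply the -Xmx example value from above.

```java
/**
 * Hypothetical startup check, not part of cdmlib-apps: reports the JVM's
 * maximum heap and the log4j configuration passed with -Dlog4j.configuration.
 */
final class ColJvmCheck {

    /** The -Xmx9000M example from this page, in megabytes. */
    static final long SUGGESTED_HEAP_MB = 9000;

    /** Converts bytes to megabytes (1 MB = 1024 * 1024 bytes, matching the M suffix of -Xmx). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** True if the given maximum heap size in bytes reaches the suggested value. */
    static boolean heapIsSufficient(long maxHeapBytes) {
        return toMegabytes(maxHeapBytes) >= SUGGESTED_HEAP_MB;
    }

    public static void main(String[] args) {
        // maxMemory() can be somewhat below the -Xmx value because some garbage
        // collectors exclude part of the heap, so a warning close to the limit is only a hint.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap: " + toMegabytes(maxHeap) + " MB");
        if (!heapIsSufficient(maxHeap)) {
            System.out.println("warning: consider starting the JVM with -Xmx9000M or more");
        }
        // Null if the JVM was started without -Dlog4j.configuration=...
        String logConfig = System.getProperty("log4j.configuration");
        System.out.println("log4j.configuration: " + (logConfig != null ? logConfig : "(not set)"));
    }
}
```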

Installation

  • When the import is complete, move the database to edit-database (production) and load it with: mysql -h localhost -u edit -p cdm_production_col < {filename}
  • Compute the free-text index, e.g. with the Jenkins job http://160.45.63.176/jenkins/job/REINDEX-col-catalogue-services/
  • Afterwards, archive the index: cd /var/lib/cdmserver, then tar -cjf col_2017-08-04.tar.bz2 index/col
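The archiving step can also be scripted. This is a hedged Java sketch, assuming a Unix host with tar on the PATH. ColIndexArchiver is a hypothetical helper and is not part of cdmlib or the cdmserver. It covers only the tar step, because mysql -p prompts for a password and is easier to run from an interactive shell. ProcessBuilder.directory() replaces the cd step, and the archive is named after the date, following the col_2017-08-04.tar.bz2 example.

```java
import java.io.File;
import java.io.IOException;
import java.time.LocalDate;
import java.util.List;

/**
 * Hypothetical helper, not part of cdmlib or the cdmserver: builds the tar
 * command that archives the CoL free-text index.
 */
final class ColIndexArchiver {

    /** Archive name in the col_YYYY-MM-DD.tar.bz2 pattern (LocalDate.toString() is ISO-8601). */
    static String archiveName(LocalDate date) {
        return "col_" + date + ".tar.bz2";
    }

    /** Equivalent of: cd <serverDir> ; tar -cjf <archive> index/col */
    static ProcessBuilder tarCommand(File serverDir, LocalDate date) {
        return new ProcessBuilder(List.of("tar", "-cjf", archiveName(date), "index/col"))
                .directory(serverDir) // takes the place of the cd step
                .inheritIO();         // show tar's output and errors on this console
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = tarCommand(new File("/var/lib/cdmserver"), LocalDate.now());
        System.out.println("command: " + String.join(" ", pb.command()));
        // Only run tar when explicitly asked to, e.g. java ColIndexArchiver --run
        if (args.length > 0 && args[0].equals("--run")) {
            int exit = pb.start().waitFor();
            System.out.println("tar exit code: " + exit);
        }
    }
}
```

Run without arguments, it only prints the command it would execute. Pass --run to start tar.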

Tickets

# | Tracker | Status | Priority | Subject | Category | Target version
10451 | feature request | Resolved | New | Allow adding fullName (name+author) to coldp name table | cdmadapter | Release 5.42
10271 | feature request | In Progress | Priority14 | Implement CoL-DP export | cdmadapter | Release 5.42
6887 | feature request | In Progress | Priority12 | Some attributes are still missing in CoL import | cdmadapter | Reviewed Next Major Release
4707 | bug | In Progress | Highest | LSID Authority service cannot find CoL LSIDs | data | Reviewed Next Major Release
10450 | task | New | New | Handle zip-files consistent in list exports | taxeditor | Unassigned CDM tickets
6941 | bug | New | New | Deduplicate authors during CoL import | cdmadapter | Unassigned CDM tickets
6937 | bug | New | New | Check cascading of partOf named areas (throws NUOE e.g. in CoL import) | cdmlib | Unassigned CDM tickets
6936 | bug | New | New | Deduplicate languages in CoL import | cdmadapter | Unassigned CDM tickets
6935 | bug | New | New | Deduplicate distribution status in CoL | cdmadapter | Unassigned CDM tickets
6900 | bug | New | New | Handle virus names correctly in CoL | cdmadapter | Unassigned CDM tickets
6899 | bug | New | New | Genus names have empty protected authorshipcache (CoL) | cdmadapter | Unassigned CDM tickets
6898 | feature request | New | New | Implement stable identifiers for CoL import | cdmadapter | Unassigned CDM tickets
6892 | bug | New | New | Further improvements to CoL performance | cdmadapter | Unassigned CDM tickets
6891 | bug | New | New | Handle ambiguous synonym in CoL | cdmadapter | Unassigned CDM tickets
6888 | bug | New | New | Deduplicate persons in CoL/DwCA import | cdmadapter | Unassigned CDM tickets
6883 | bug | New | New | DwC-A common name language and area import (for CoL) | cdmadapter | Unassigned CDM tickets
10449 | feature request | Closed | Highest | Allow filtering out synonyms in list exports | cdmadapter | Release 5.42
10334 | feature request | Closed | Priority14 | Evaluate DescriptionBase.publish in webservices | cdmlib-remote | Release 5.42
6987 | bug | Closed | Highest | LIE during name catalogue service call processing | cdmlib-remote | Release 4.11
6882 | task | Closed | New | Document CoL import, how to | documentation | Update Documentation
6880 | feature request | Closed | Highest | Allow removing auditing from imports | cdmadapter | Release 4.10
5063 | bug | Closed | Highest | Fix CoL import (split in parts++) | cdmadapter | Release 3.8
6494 | bug | Worksforme | New | Fix null titleCache for CoL import | cdmadapter |

Updated by Andreas Müller over 6 years ago · 14 revisions