Revision 6 (Andreas Müller, 08/03/2017 09:47 PM) → Revision 7/17 (Andreas Müller, 08/03/2017 09:48 PM)

# CoL Import Documentation

 {{>toc}} 

 ## Download 

 * The complete archive can be downloaded from http://www.catalogueoflife.org/DCA_Export/archive.php (see also http://www.catalogueoflife.org/DCA_Export/index.php for partial downloads)
 * Copy the download to `\\bgbm-pesihpc\CoL` or to any other location you have access to
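
Copying the archive can also be scripted. The following is a minimal Java sketch; the class and method names are hypothetical and not part of cdmlib-apps. It copies a downloaded archive into a target folder such as the `\\bgbm-pesihpc\CoL` share, creates the folder if needed and overwrites an older copy with the same file name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: copies a downloaded CoL archive to a shared folder. */
class ColArchiveCopier {

    /**
     * Copies archive into targetDir, keeping its file name.
     * The target folder is created if it does not exist yet;
     * an existing file with the same name is replaced.
     */
    static Path copyToTarget(Path archive, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(archive.getFileName());
        return Files.copy(archive, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: ColArchiveCopier <archive> [targetDir]");
            return;
        }
        // The default target folder is an assumption taken from the text above
        Path archive = Paths.get(args[0]);
        Path targetDir = Paths.get(args.length > 1 ? args[1] : "\\\\bgbm-pesihpc\\CoL");
        System.out.println("Copied to " + copyToTarget(archive, targetDir));
    }
}
```

Replacing an existing file is a deliberate choice here, so that re-downloading a newer export simply overwrites the previous copy.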

 ## Prepare database 
 As the import takes very long (more than 2 days), it is highly recommended not to run it directly against the production database. Instead, use a local database or one of the two CoL instances on edit-test (note: edit-test is relatively slow).

 ## Launch 
 * The import is launched by `ColDwcaImportActivator` in cdmlib-apps (https://dev.e-taxonomy.eu/gitweb/cdmlib-apps.git)
 * Before launching, adapt
   * the file name (URI) in `ColDwcaImportActivator.dwca_col_All()`
   * the path to the mapping file (`databaseMappingFile`)
       * The mapping file stores the mapping of CoL DwC-A data to CDM. This database-based mapping is required for running the import in parts (next step). It is only a temporary folder and can be removed once all data has been imported.
   * the classification name (`classificationName`)
 * The import is split into multiple parts for performance and memory reasons. The parts that create the classification (*higher taxa* and *lower taxa*) are especially memory-sensitive, so it is recommended to run them separately. *taxa* must run **first** and *lower taxa* must run after *higher taxa*; all other parts can run in any order:
     * taxa 
     * extensions 
     * higher taxa 
     * lower taxa 
     * synonymy 
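
The ordering rules above (*taxa* first, *lower taxa* after *higher taxa*, everything else free) can be checked against a planned sequence of runs before starting a multi-day import. The Java sketch below is hypothetical: the enum `ImportPart` and the method `checkOrder` are illustrative names and do not come from `ColDwcaImportActivator`. It assumes the list describes the complete plan of all runs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical names for the five import parts listed above. */
enum ImportPart { TAXA, EXTENSIONS, HIGHER_TAXA, LOWER_TAXA, SYNONYMY }

class ImportOrderCheck {

    /**
     * Returns the rule violations for a complete run plan (empty list = valid).
     * Rules: TAXA runs first; LOWER_TAXA runs after HIGHER_TAXA.
     */
    static List<String> checkOrder(List<ImportPart> order) {
        List<String> problems = new ArrayList<>();
        if (order.isEmpty() || order.get(0) != ImportPart.TAXA) {
            problems.add("TAXA must run first");
        }
        int higher = order.indexOf(ImportPart.HIGHER_TAXA);
        int lower = order.indexOf(ImportPart.LOWER_TAXA);
        // LOWER_TAXA without a preceding HIGHER_TAXA run violates the rule
        if (lower >= 0 && (higher < 0 || higher > lower)) {
            problems.add("LOWER_TAXA must run after HIGHER_TAXA");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<ImportPart> plan = List.of(ImportPart.TAXA, ImportPart.HIGHER_TAXA,
                ImportPart.SYNONYMY, ImportPart.LOWER_TAXA, ImportPart.EXTENSIONS);
        System.out.println(checkOrder(plan)); // prints []
    }
}
```

Returning a list of messages instead of throwing on the first problem makes it possible to report all violations of a plan at once.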

 ## Configuration 
 * Give the JVM enough memory, e.g. `-Xmx9000M`
 * Consider defining your own log file and log properties, e.g. with `-Dlog4j.configuration=file:///C:/Users/a.mueller/.cdmLibrary/log/properties/log4j_col.properties`
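
To confirm that the `-Xmx` setting actually reached the JVM that runs the import, a startup check like the following minimal sketch can help. `Runtime.maxMemory()` reports the maximum heap the JVM will attempt to use. The class name is hypothetical, and the 9000 MB threshold only mirrors the example above; it is not a documented requirement of the import.

```java
/** Minimal sketch: warn if the JVM heap limit looks too small for the CoL import. */
class HeapCheck {

    /** Threshold taken from the -Xmx9000M example (an assumption, not a hard limit). */
    static final long REQUIRED_MB = 9000;

    /**
     * Returns true if maxHeapBytes is at least 90% of requiredMb.
     * The tolerance exists because Runtime.maxMemory() may report somewhat
     * less than the -Xmx value, depending on the garbage collector.
     */
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredMb) {
        long requiredBytes = requiredMb * 1024L * 1024L;
        return maxHeapBytes >= requiredBytes * 9 / 10;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", max / (1024 * 1024));
        if (!hasEnoughHeap(max, REQUIRED_MB)) {
            System.err.println("Warning: heap may be too small, consider e.g. -Xmx9000M");
        }
    }
}
```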

 ## Installation 
 * When ready, move the database to edit-database (production) and install it with `mysql -h localhost -u edit -p cdm_production_col < {filename}`
 * Compute the free-text index, either by xxx or by using jobber
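
The `mysql` call above reads the SQL file from standard input via `<`. If it should be started from Java instead of a shell, `ProcessBuilder.redirectInput(File)` does the same redirection. This is a hypothetical sketch: the class and method names are illustrative, and host, user and database name are copied from the command above. Because `-p` without a value makes `mysql` prompt for the password, whether the prompt works depends on the JVM being started from an interactive console.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch: load a dump into the production CoL database via the mysql client. */
class ColDumpLoader {

    /** Builds the same argument list as: mysql -h <host> -u <user> -p <database> */
    static List<String> buildCommand(String host, String user, String database) {
        // "-p" without a value makes mysql prompt for the password
        return List.of("mysql", "-h", host, "-u", user, "-p", database);
    }

    /** Runs mysql with dumpFile as standard input, like "< {filename}" in a shell. */
    static int load(File dumpFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand("localhost", "edit", "cdm_production_col"));
        pb.redirectInput(dumpFile);
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT); // show mysql output in this console
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb.start().waitFor();                        // exit code 0 means success
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: ColDumpLoader <dumpFile>");
            return;
        }
        System.out.println("mysql exited with " + load(new File(args[0])));
    }
}
```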