Andreas Müller, 08/03/2017 10:14 PM

# CoL Import Documentation
{{>toc}}
## Download
* The complete download is available from http://www.catalogueoflife.org/DCA_Export/archive.php (for partial downloads see http://www.catalogueoflife.org/DCA_Export/index.php).
* Copy the download to `\\bgbm-pesihpc\CoL` or any other location you have access to.
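
Before copying, a quick sanity check can confirm that the download is a complete Darwin Core Archive. The following Java sketch assumes the download is a zip file that, like a standard DwC-A, carries a `meta.xml` descriptor at its root; the class name `DwcaArchiveCheck` and the default file name `col_download.zip` are hypothetical and not part of cdmlib-apps.

```java
import java.io.IOException;
import java.util.zip.ZipFile;

public class DwcaArchiveCheck {

    /** Returns true if the zip file at the given path has a meta.xml entry at its root. */
    public static boolean hasMetaXml(String path) throws IOException {
        // ZipFile is AutoCloseable, so try-with-resources releases the file handle
        try (ZipFile zip = new ZipFile(path)) {
            return zip.getEntry("meta.xml") != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default name; pass the real download path as the first argument
        String path = args.length > 0 ? args[0] : "col_download.zip";
        if (hasMetaXml(path)) {
            System.out.println(path + ": meta.xml found, looks like a DwC-A");
        } else {
            System.out.println(path + ": no meta.xml, check the download");
        }
    }
}
```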
## Prepare database
As the import takes a very long time (more than 2 days), it is highly recommended not to run it directly against the production database. Instead, use a local database or one of the two CoL instances on edit-test (note: edit-test is relatively slow).
## Launch
* The import is launched by `ColDwcaImportActivator` in cdmlib-apps (https://dev.e-taxonomy.eu/gitweb/cdmlib-apps.git).
* Before launching, adapt:
  * the file name (URI) in `ColDwcaImportActivator.dwca_col_All()`
  * the path to the mapping file, `databaseMappingFile`
      * The mapping file stores the mapping of CoL DwC-A data to CDM. This database-based mapping is required for running the import in parts (see below). It is a temporary folder that can be removed once all data has been imported.
  * the classification name, `classificationName`
* The import is split into multiple parts for performance and memory reasons. The parts that build the classification (*higher taxa* and *lower taxa*) are especially memory-intensive, so it is recommended to run them separately. *taxa* must run **first** and *lower taxa* must run after *higher taxa*; all other parts can run in any order. The parts are:
    * taxa
    * extensions
    * higher taxa
    * lower taxa
    * synonymy
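
To illustrate why a persistent mapping makes it possible to run the import in separate parts: each part must be able to look up the CDM objects that earlier runs created for a given CoL record. The Java sketch below is a simplified, hypothetical stand-in that keeps such a mapping in a properties file; the real import uses a database-based mapping (`databaseMappingFile`), and the class `IdMappingStore` and its methods are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.Properties;
import java.util.UUID;

/**
 * Hypothetical, simplified stand-in for the import's mapping store:
 * maps CoL record IDs to CDM UUIDs and persists them between import runs.
 */
public class IdMappingStore {

    private final Path file;
    private final Properties mapping = new Properties();

    /** Loads any mapping saved by an earlier run from the given file. */
    public IdMappingStore(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file)) {
                mapping.load(in);
            }
        }
    }

    /** Records the CDM UUID created for a CoL record. */
    public void put(String colId, UUID cdmUuid) {
        mapping.setProperty(colId, cdmUuid.toString());
    }

    /** Looks up the CDM UUID for a CoL record, if an earlier part created one. */
    public Optional<UUID> get(String colId) {
        String value = mapping.getProperty(colId);
        return value == null ? Optional.empty() : Optional.of(UUID.fromString(value));
    }

    /** Writes the mapping to disk so that a later part (a new JVM run) can reuse it. */
    public void save() throws IOException {
        try (Writer out = Files.newBufferedWriter(file)) {
            mapping.store(out, "CoL record ID -> CDM UUID");
        }
    }
}
```

In this sketch, the *taxa* run would `put` an entry for every taxon it creates and `save` at the end; later runs such as *higher taxa* or *synonymy* would then `get` those UUIDs without holding the earlier run's objects in memory.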
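
The ordering rules for the parts can be expressed as a small check. In this Java sketch the enum `Part` and the method `checkOrder` are hypothetical names, not the API of `ColDwcaImportActivator`; the sketch only encodes the two documented constraints (*taxa* first, *lower taxa* after *higher taxa*) and treats every other part as order independent.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ImportOrderCheck {

    /** Hypothetical names for the import parts listed above. */
    public enum Part { TAXA, EXTENSIONS, HIGHER_TAXA, LOWER_TAXA, SYNONYMY }

    /**
     * Checks a planned sequence of import runs against the documented rules.
     * Returns empty if the order is valid, otherwise a description of the problem.
     */
    public static Optional<String> checkOrder(List<Part> runs) {
        // Rule 1: taxa must be imported before anything else
        if (runs.isEmpty() || runs.get(0) != Part.TAXA) {
            return Optional.of("taxa must run first");
        }
        // Rule 2: lower taxa need the higher classification to exist already
        int higher = runs.indexOf(Part.HIGHER_TAXA);
        int lower = runs.indexOf(Part.LOWER_TAXA);
        if (lower >= 0 && (higher < 0 || higher > lower)) {
            return Optional.of("lower taxa must run after higher taxa");
        }
        // All other parts are order independent
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Part> plan = Arrays.asList(Part.TAXA, Part.EXTENSIONS, Part.HIGHER_TAXA,
                Part.LOWER_TAXA, Part.SYNONYMY);
        System.out.println(checkOrder(plan).orElse("order ok"));
    }
}
```

For example, `checkOrder(Arrays.asList(Part.TAXA, Part.HIGHER_TAXA, Part.LOWER_TAXA, Part.SYNONYMY))` returns an empty `Optional`, whereas a plan starting with `Part.SYNONYMY` returns "taxa must run first".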
## Configuration
* Give the JVM enough memory, e.g. `-Xmx9000M`.
* Consider defining your own log file and log properties, e.g. via `-Dlog4j.configuration=file:///C:/Users/a.mueller/.cdmLibrary/log/properties/log4j_col.properties`.
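
To fail early instead of running out of memory hours into the import, a launcher could check the maximum heap at startup. This Java sketch compares `Runtime.getRuntime().maxMemory()` against the 9000 MB suggested above; the class name `HeapCheck` and the 10% tolerance are assumptions (the JVM may report somewhat less than the `-Xmx` value).

```java
public class HeapCheck {

    /** -Xmx9000M as suggested above; "M" means mebibytes for the JVM. */
    static final long RECOMMENDED_MAX_HEAP = 9000L * 1024 * 1024;

    /**
     * True if maxHeapBytes is at least 90% of the recommendation.
     * The tolerance is an assumption: maxMemory() can report slightly less than -Xmx.
     */
    public static boolean isHeapSufficient(long maxHeapBytes) {
        return maxHeapBytes >= RECOMMENDED_MAX_HEAP / 10 * 9;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", max / (1024 * 1024));
        if (!isHeapSufficient(max)) {
            System.err.println("Warning: heap may be too small, restart the JVM with e.g. -Xmx9000M");
        }
    }
}
```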
## Installation
* When the database is ready, move it to edit-database (production) and install it with `mysql -h localhost -u edit -p cdm_production_col < {filename}`.
* Compute the **freetext index**, either by xxx or by using jobber.
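
If the installation step is scripted from Java rather than run in a shell, the `<` redirection of the `mysql` command maps onto `ProcessBuilder.redirectInput`. The sketch below is a hypothetical helper (`MysqlLoad` is not part of cdmlib-apps) that reuses the host, user and database from the command above and takes `{filename}` as its argument. Note that `-p` still prompts for the password; whether that prompt reaches the terminal from a child process may depend on the platform.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class MysqlLoad {

    /** Builds the equivalent of: mysql -h host -u user -p database < dumpFile */
    public static ProcessBuilder buildLoadCommand(String host, String user, String database, File dumpFile) {
        List<String> cmd = Arrays.asList("mysql", "-h", host, "-u", user, "-p", database);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectInput(ProcessBuilder.Redirect.from(dumpFile)); // replaces the shell's "<"
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);       // show mysql output in this console
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: MysqlLoad <dump file>");
            System.exit(2);
        }
        File dump = new File(args[0]); // the {filename} placeholder from the command above
        Process process = buildLoadCommand("localhost", "edit", "cdm_production_col", dump).start();
        System.exit(process.waitFor()); // propagate mysql's exit code
    }
}
```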
## Tickets
{{ref_issues(-f:version=Release 4.10)}}