CoL Import Dokumentation » History » Version 17

Andreas Müller, 03/28/2022 03:09 PM

# CoL Import Dokumentation

----

{{toc}}

{{child_pages(depth=1)}}

## Download
* The download is available from http://www.catalogueoflife.org/DCA_Export/archive.php (see also http://www.catalogueoflife.org/DCA_Export/index.php for partial downloads)
* Copy the download to `\\bgbm-pesihpc\CoL` or any other place you have access to

## Prepare database
Because the import takes very long (more than 7 days), it is highly recommended not to run it directly against production. Instead, use a local database or one of the two CoL instances on edit-test. Note that edit-test is relatively slow.

## Launch
* The import is launched by `ColDwcaImportActivator` in cdmlib-apps (https://dev.e-taxonomy.eu/gitweb/cdmlib-apps.git)
* Before launch, adapt the following:
  * the filename (URI) in `ColDwcaImportActivator.dwca_col_All()`
  * the path to the mapping file, `databaseMappingFile`
      * the mapping file stores the mapping of CoL DwC-A data to CDM. The database-based mapping is required for running the import in parts (next step). It is a temporary folder that can be removed once all data is imported.
  * `classificationName`
* The import is split into multiple parts for performance and memory reasons. The parts that create the classification (*higher taxa* and *lower taxa*) are especially memory sensitive, so it is recommended to run them separately. *taxa* must run **first**, and *lower taxa* must run after *higher taxa*; everything else is order independent.
    * taxa
    * extensions
    * higher taxa
    * lower taxa
    * synonymy
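
The two ordering rules above can be written down as a small check. The following is only a sketch: `check_part_order` is a hypothetical helper that is not part of cdmlib-apps, and it validates a planned sequence of part names, it does not run the import.

```shell
# Sketch: validate a planned run order of the import parts.
# check_part_order is a hypothetical helper name, not part of cdmlib-apps.
check_part_order() {
  local i=0 higher=-1 lower=-1 part
  # Rule 1: "taxa" must be the first part.
  if [ "${1:-}" != "taxa" ]; then
    echo "error: 'taxa' must run first"
    return 1
  fi
  # Remember the positions of the two classification parts.
  for part in "$@"; do
    case "$part" in
      "higher taxa") higher=$i ;;
      "lower taxa")  lower=$i ;;
    esac
    i=$((i + 1))
  done
  # Rule 2: "lower taxa" must come after "higher taxa".
  if [ "$lower" -ge 0 ] && { [ "$higher" -lt 0 ] || [ "$lower" -lt "$higher" ]; }; then
    echo "error: 'lower taxa' must run after 'higher taxa'"
    return 1
  fi
  echo "ok"
}

# The order in which the parts are listed above passes the check.
check_part_order "taxa" "extensions" "higher taxa" "lower taxa" "synonymy"   # prints: ok
```

A plan such as `check_part_order "taxa" "lower taxa" "higher taxa"` is rejected with an error message and a non-zero return code.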

## Configuration
* Give the JVM enough memory, e.g. `-Xmx9000M`
* Consider defining your own log file and log properties, e.g. with `-Dlog4j.configuration=file:///C:/Users/a.mueller/.cdmLibrary/log/properties/log4j_col.properties`
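
As an illustration, the two options above can be assembled into one string of JVM arguments. This is a minimal sketch: `build_jvm_opts` is a hypothetical helper, and the classpath and main class needed to actually start the import are not given on this page, so they are left out.

```shell
# Sketch: assemble the JVM options named above.
# build_jvm_opts is a hypothetical helper name.
#   $1: maximum heap size in MB
#   $2: optional URI of a log4j properties file
build_jvm_opts() {
  local opts="-Xmx${1}M"
  if [ -n "${2:-}" ]; then
    opts="$opts -Dlog4j.configuration=$2"
  fi
  echo "$opts"
}

# Values taken from the examples above.
build_jvm_opts 9000 "file:///C:/Users/a.mueller/.cdmLibrary/log/properties/log4j_col.properties"
```

The printed options can then be used, for example, as the VM arguments of whatever run configuration starts `ColDwcaImportActivator`.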

## Installation
* When ready, move the DB to edit-database (production) and install it with `mysql -h localhost -u edit -p cdm_production_col < {filename}`
* Archive the file on edit-database in `/var/backup/db_mysql_manual`
* Compute the **freetext index** using http://160.45.63.176/jenkins/job/REINDEX-col-catalogue-services/
* Archive the index afterwards on production (160.45.63.173): `cd /var/lib/cdmserver && tar -cjf col_2017-08-04.tar.bz2 index/col`
* Optionally install CoL on edit-test as well (not required)
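
The index archive step can be scripted so that the archive name carries the date. The following is a sketch under assumptions: `archive_name` and `archive_index` are hypothetical helper names, and using the current date in the file name is an assumption based on the `col_2017-08-04.tar.bz2` example above.

```shell
# Sketch: archive the freetext index under a dated name.
# archive_name and archive_index are hypothetical helper names.

# Print an archive name such as col_2017-08-04.tar.bz2.
#   $1: optional date (YYYY-MM-DD), defaults to today
archive_name() {
  echo "col_${1:-$(date +%Y-%m-%d)}.tar.bz2"
}

# Pack index/col below the given cdmserver directory into a dated archive.
#   $1: cdmserver directory, $2: optional date (YYYY-MM-DD)
archive_index() {
  ( cd "$1" && tar -cjf "$(archive_name "${2:-}")" index/col )
}

archive_name 2017-08-04   # prints: col_2017-08-04.tar.bz2
```

On production this would be called as `archive_index /var/lib/cdmserver`, which writes the archive into that directory.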

## Tickets

{{ref_issues(-f:tags = col, tracker, status, priority, subject, category, fixed_version, done_ratio )}}