CoL Import Documentation, version 13 (Andreas Müller, 08/03/2017 10:59 PM)

# CoL Import Documentation

{{>toc}}

## Download

* The download is available from http://www.catalogueoflife.org/DCA_Export/archive.php (see also http://www.catalogueoflife.org/DCA_Export/index.php for partial downloads)
* Copy the download to `\\bgbm-pesihpc\CoL` or to any other location you have access to

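Before copying, it can help to check that the downloaded file really is an archive and not, for example, an HTML error page. The following is a minimal sketch, assuming the download follows the usual Darwin Core Archive layout (a zip file with a `meta.xml` descriptor at its root); the class `DwcaDownloadCheck` and its methods are hypothetical and not part of cdmlib-apps.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipFile;

/**
 * Hypothetical helper, not part of cdmlib-apps: checks that a downloaded file is a
 * zip archive with a Darwin Core Archive descriptor (meta.xml) at its root and then
 * copies it to the directory the import will read from.
 */
class DwcaDownloadCheck {

    /** True if the file opens as a zip and contains a top-level meta.xml entry. */
    static boolean looksLikeDwca(Path archive) {
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            return zip.getEntry("meta.xml") != null;
        } catch (IOException e) {
            // missing file or not a valid zip, e.g. an HTML page saved by mistake
            return false;
        }
    }

    /** Copies the archive into targetDir, overwriting an older copy with the same name. */
    static Path copyTo(Path archive, Path targetDir) throws IOException {
        return Files.copy(archive, targetDir.resolve(archive.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DwcaDownloadCheck <downloaded-archive> <target-dir>");
            return;
        }
        Path archive = Path.of(args[0]);
        if (!looksLikeDwca(archive)) {
            System.err.println(archive + " does not look like a DwC-A zip, check the download");
            return;
        }
        System.out.println("copied to " + copyTo(archive, Path.of(args[1])));
    }
}
```

On Windows the shared folder can be passed as a UNC path (e.g. `\\bgbm-pesihpc\CoL`), which `Path.of` accepts there; on other systems use the mount point of that share.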
## Prepare database

Because the import takes a very long time (more than 2 days), it is strongly recommended not to run it directly against production. Instead, use a local database or one of the two CoL instances on edit-test (note: edit-test is relatively slow).

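The recommendation above could be enforced by a small guard that runs before the import starts. This is a minimal sketch under stated assumptions: the target database is given as a JDBC URL, and `edit-database` (the production server named under Installation) is the host to protect. `ProductionGuard` and the database name `cdm_col` are hypothetical.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical safety guard, not part of cdmlib-apps: refuses to start the
 * long-running CoL import when the target JDBC URL points at a production host.
 * "edit-database" is assumed to be the production host; adapt PRODUCTION_HOSTS.
 */
class ProductionGuard {

    static final Set<String> PRODUCTION_HOSTS = Set.of("edit-database");

    /** Extracts the host from a URL of the form jdbc:mysql://host:port/db. */
    static String hostOf(String jdbcUrl) {
        String withoutPrefix = jdbcUrl.startsWith("jdbc:") ? jdbcUrl.substring(5) : jdbcUrl;
        String host = URI.create(withoutPrefix).getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in URL: " + jdbcUrl);
        }
        return host.toLowerCase(Locale.ROOT);
    }

    static boolean isProduction(String jdbcUrl) {
        return PRODUCTION_HOSTS.contains(hostOf(jdbcUrl));
    }

    /** Throws if the URL targets a production host. */
    static void requireNonProduction(String jdbcUrl) {
        if (isProduction(jdbcUrl)) {
            throw new IllegalStateException(
                    "refusing to run the CoL import directly against production: " + jdbcUrl);
        }
    }

    public static void main(String[] args) {
        // hypothetical local target database
        requireNonProduction("jdbc:mysql://localhost:3306/cdm_col");
        System.out.println("target is not a production host, import may start");
    }
}
```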
## Launch

* The import is launched by `ColDwcaImportActivator` in cdmlib-apps (https://dev.e-taxonomy.eu/gitweb/cdmlib-apps.git)
* Before launching, adapt:
    * the file name (URI) in `ColDwcaImportActivator.dwca_col_All()`
    * the path to the mapping file, `databaseMappingFile`
        * the mapping file stores the mapping of the CoL DwC-A data to the CDM. This database-based mapping is required for running the import in parts (see next item). It is a temporary folder that can be deleted once all data has been imported
    * `classificationName`
* The import is split into multiple parts for performance and memory reasons. The parts that create the classification (*higher taxa* and *lower taxa*) are especially memory sensitive, so it is recommended to run them separately. **First** you need to run *taxa*; *lower taxa* must run after *higher taxa*; everything else is order independent. The parts are:
    * taxa
    * extensions
    * higher taxa
    * lower taxa
    * synonymy

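The ordering rules for the import parts can be made explicit and checked before starting a series of runs. The following is a minimal sketch that only models the rules stated in the list above; `ColImportPlan` and its `Part` enum are hypothetical names and not the cdmlib-apps API, which selects the parts inside `ColDwcaImportActivator`.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch, not the cdmlib-apps API: models the import parts and checks a
 * planned run order against the documented rules (taxa first, lower taxa after
 * higher taxa, everything else order independent).
 */
class ColImportPlan {

    enum Part {
        TAXA(false), EXTENSIONS(false), HIGHER_TAXA(true), LOWER_TAXA(true), SYNONYMY(false);

        /** Classification-creating parts, recommended to be run separately. */
        final boolean memorySensitive;

        Part(boolean memorySensitive) {
            this.memorySensitive = memorySensitive;
        }
    }

    /** Returns the first rule the order violates, or empty if the order is valid. */
    static Optional<String> violation(List<Part> order) {
        if (order.isEmpty() || order.get(0) != Part.TAXA) {
            return Optional.of("taxa must be run first");
        }
        int higher = order.indexOf(Part.HIGHER_TAXA);
        int lower = order.indexOf(Part.LOWER_TAXA);
        if (lower >= 0 && (higher < 0 || higher > lower)) {
            return Optional.of("lower taxa must run after higher taxa");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Part> plan = List.of(Part.TAXA, Part.EXTENSIONS, Part.HIGHER_TAXA,
                Part.LOWER_TAXA, Part.SYNONYMY);
        System.out.println(violation(plan).orElse("order ok"));
        for (Part part : plan) {
            if (part.memorySensitive) {
                System.out.println(part + ": run separately (memory sensitive)");
            }
        }
    }
}
```

The plan is meant to span all runs: since the mapping is kept in `databaseMappingFile`, each part can be started as a separate run, which keeps the memory-sensitive classification parts isolated.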
## Configuration

* Give the JVM enough memory, e.g. `-Xmx9000M`
* Consider defining your own log file and log properties, e.g. via `-Dlog4j.configuration=file:///C:/Users/a.mueller/.cdmLibrary/log/properties/log4j_col.properties`

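To confirm that these settings actually reached the JVM, a start-up check can print the effective maximum heap and the `log4j.configuration` system property. A minimal sketch; `JvmSettingsCheck` is hypothetical, and the 9000 MB figure only mirrors the example above rather than a measured requirement.

```java
/**
 * Hypothetical start-up check, not part of cdmlib: reports the maximum heap the JVM
 * was started with and the log4j configuration passed via -Dlog4j.configuration.
 */
class JvmSettingsCheck {

    /** Mirrors the -Xmx9000M example; a rough guide only. */
    static final long RECOMMENDED_HEAP_MB = 9000;

    /** Maximum heap in MB as reported by the running JVM. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /**
     * Runtime.maxMemory() can report somewhat less than -Xmx (some garbage collectors
     * exclude a survivor space), so anything from 80% of the recommendation is accepted.
     */
    static boolean enoughHeap(long heapMb, long recommendedMb) {
        return heapMb * 10 >= recommendedMb * 8;
    }

    public static void main(String[] args) {
        long heapMb = maxHeapMb();
        if (!enoughHeap(heapMb, RECOMMENDED_HEAP_MB)) {
            System.err.println("max heap is only " + heapMb + " MB, consider e.g. -Xmx9000M");
        }
        String logConfig = System.getProperty("log4j.configuration");
        System.out.println("log4j.configuration = " + (logConfig != null ? logConfig : "(not set)"));
    }
}
```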
## Installation

* When the import is finished, move the DB to edit-database (production) and install it with `mysql -h localhost -u edit -p cdm_production_col < {filename}`
* Compute the **freetext index** either by xxx or by using jobber

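If the install step is scripted from Java instead of typed into a shell, note that `<` is shell syntax; with `ProcessBuilder` the dump file is attached to mysql's standard input via `redirectInput`. A minimal sketch: `ProductionDumpLoader` is hypothetical, the host, user and database name are copied from the command above, and, as in that command, `-p` is given without a value, so mysql asks for the password interactively (this assumes the program runs in a terminal).

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/**
 * Hypothetical helper, not part of cdmlib-apps: builds the same mysql call as the
 * shell command above, replacing the shell "<" redirection with redirectInput.
 */
class ProductionDumpLoader {

    static ProcessBuilder loadCommand(String host, String user, String database, File dumpFile) {
        ProcessBuilder pb = new ProcessBuilder(
                List.of("mysql", "-h", host, "-u", user, "-p", database));
        pb.redirectInput(dumpFile);                      // equivalent of "< dumpFile"
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: ProductionDumpLoader <dump-file>");
            return;
        }
        Process mysql = loadCommand("localhost", "edit", "cdm_production_col",
                new File(args[0])).start();
        System.out.println("mysql exited with code " + mysql.waitFor());
    }
}
```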
## Tickets

{{ref_issues(-f:tags = col, tracker, status, priority, subject, category, fixed_version, done_ratio )}}