bug #6892 (open)
Further improvements to CoL performance
Status: New
Priority: New
Assignee:
Category: cdmadapter
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Severity: normal
Found in Version:
Description
To further improve the CoL import performance, we could:
improve the synonym import
- add parentFk via SQL (remember the mapping)
- handle synonyms within the current transaction immediately; store the remaining mappings
- handle synonyms of previously imported taxa immediately (but synonyms often come first); store the remaining mappings
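A minimal sketch of the synonym idea. It assumes a simplified staging schema (the tables `taxon` and `pending_synonym` and all column names are hypothetical, not the CDM schema) and uses SQLite instead of MySQL so it runs standalone. The FK is set at insert time when the accepted taxon is already known. Otherwise the source-id pair is remembered, and all remaining links are resolved afterwards with one UPDATE. Note that MySQL rejects a subquery on the table being updated (error 1093), so a multi-table `UPDATE ... JOIN` would be needed there.

```python
import sqlite3

# Hypothetical, simplified schema; the real CDM tables and columns differ.
SCHEMA = """
CREATE TABLE taxon (
    id INTEGER PRIMARY KEY,
    source_id TEXT UNIQUE,
    accepted_id INTEGER REFERENCES taxon(id)  -- the FK to set for synonyms
);
CREATE TABLE pending_synonym (synonym_source_id TEXT, accepted_source_id TEXT);
"""


def import_synonyms(conn, records, id_map):
    """First pass. records: (synonym_source_id, accepted_source_id) pairs;
    id_map: source id -> database id of rows already written (the remembered mapping).
    Returns the number of links that could not be resolved yet."""
    pending = []
    for syn_src, acc_src in records:
        acc_id = id_map.get(acc_src)  # None if the accepted taxon comes later
        cur = conn.execute(
            "INSERT INTO taxon (source_id, accepted_id) VALUES (?, ?)", (syn_src, acc_id))
        id_map[syn_src] = cur.lastrowid
        if acc_id is None:
            pending.append((syn_src, acc_src))
    # Store the unresolved mappings for the later SQL pass.
    conn.executemany("INSERT INTO pending_synonym VALUES (?, ?)", pending)
    return len(pending)


def resolve_pending(conn):
    """Second pass: set all remaining FKs with a single UPDATE."""
    cur = conn.execute("""
        UPDATE taxon
           SET accepted_id = (SELECT acc.id
                                FROM pending_synonym AS p
                                JOIN taxon AS acc ON acc.source_id = p.accepted_source_id
                               WHERE p.synonym_source_id = taxon.source_id)
         WHERE source_id IN (SELECT synonym_source_id FROM pending_synonym)""")
    conn.execute("DELETE FROM pending_synonym")
    return cur.rowcount
```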
improve the classification import
- create taxon nodes in a first run without links to parent and classification; set parent and classification via SQL
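The classification idea might look like this under the same assumptions: a hypothetical `taxon_node` staging table, with SQLite standing in for MySQL, where the self-referencing subquery would again have to become an `UPDATE ... JOIN`. Nodes are written flat with only the parent's source id remembered. One UPDATE then sets parent and classification, regardless of the order in which the nodes arrived.

```python
import sqlite3

# Hypothetical staging table; not the CDM TaxonNode schema.
SCHEMA = """
CREATE TABLE taxon_node (
    id INTEGER PRIMARY KEY,
    taxon_source_id TEXT UNIQUE,   -- id of the taxon in the source data
    parent_source_id TEXT,         -- remembered mapping from the first run
    parent_id INTEGER REFERENCES taxon_node(id),
    classification_id INTEGER
);
"""


def first_run(conn, records):
    """Insert nodes flat: no parent or classification link yet.
    records: (taxon_source_id, parent_source_id) pairs, in any order."""
    conn.executemany(
        "INSERT INTO taxon_node (taxon_source_id, parent_source_id) VALUES (?, ?)",
        records)


def link_nodes(conn, classification_id):
    """Second run: set parent and classification for all nodes with one UPDATE."""
    conn.execute("""
        UPDATE taxon_node
           SET parent_id = (SELECT p.id FROM taxon_node AS p
                             WHERE p.taxon_source_id = taxon_node.parent_source_id),
               classification_id = ?""", (classification_id,))
```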
improve the description import
- create descriptions in a first run; later load only the descriptions, not the taxa
- as above, but save only the description elements and set indescriptionFk via SQL
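For descriptions, a sketch with hypothetical `description` and `description_element` tables, again in SQLite. Descriptions are created once per taxon in the first run. Later runs write only elements that carry the taxon's source id (an assumed staging column), and the element-to-description FK (the indescriptionFk above) is then filled in by SQL, so no taxa need to be loaded.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE description (id INTEGER PRIMARY KEY, taxon_source_id TEXT UNIQUE);
CREATE TABLE description_element (
    id INTEGER PRIMARY KEY,
    taxon_source_id TEXT,          -- hypothetical staging column
    content TEXT,
    in_description_id INTEGER REFERENCES description(id)
);
"""


def create_descriptions(conn, taxon_source_ids):
    """First run: one (empty) description per taxon."""
    conn.executemany("INSERT INTO description (taxon_source_id) VALUES (?)",
                     [(t,) for t in taxon_source_ids])


def save_elements(conn, elements):
    """Later run: write only the elements; no taxa or descriptions are loaded.
    elements: (taxon_source_id, content) pairs."""
    conn.executemany(
        "INSERT INTO description_element (taxon_source_id, content) VALUES (?, ?)",
        elements)


def link_elements(conn):
    """Set the element -> description FK for all unlinked elements via SQL."""
    cur = conn.execute("""
        UPDATE description_element
           SET in_description_id = (SELECT d.id FROM description AS d
                                     WHERE d.taxon_source_id = description_element.taxon_source_id)
         WHERE in_description_id IS NULL""")
    return cur.rowcount
```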
run in parallel
- maybe the synonym, classification and description imports can run in parallel
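Running the three imports in parallel could be sketched with a thread pool. The step functions are placeholders with hypothetical names. The approach assumes, without verifying it, that the steps do not depend on each other's results and that each uses its own database connection and transaction.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(steps, max_workers=3):
    """Run independent import steps concurrently.

    steps: step name -> zero-argument callable.
    Returns step name -> result; future.result() re-raises a failed step's exception.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(step) for name, step in steps.items()}
        return {name: future.result() for name, future in futures.items()}


# Placeholders for the real imports (hypothetical names).
def import_synonyms():
    return "synonyms imported"


def import_classification():
    return "classification imported"


def import_descriptions():
    return "descriptions imported"


results = run_in_parallel({
    "synonyms": import_synonyms,
    "classification": import_classification,
    "descriptions": import_descriptions,
})
```

Threads are a reasonable fit when the steps spend most of their time waiting on the database. If classification, for example, needed the synonym FKs already set, those two steps would have to stay sequential.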
Current import times on a local PESI-HCL MySQL database are:
- create taxa and synonyms: 13.5 h (3.571 M)
- create synonym relationships/FK: 31 h (1.526 M)
- higher taxonomy (> species): 15.5 h (176 M)
- lower taxonomy: >7 d (!!) (1.892)
- extensions: 2-3 d
- vernaculars: 14 h (415 M)
- species profiles: not yet imported
- descriptions (= text distributions): 38 h (1.835 M)
- references (citations): 4 h (1.907 M)
- distributions: 7.5 h (318 M)
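Adding up the lower bounds of the times above shows where the time goes. The sum counts ">7 d" as 168 h and "2-3 d" as 48 h, assumes extensions are a separate step, and leaves out species profiles, which are not imported yet:

```python
# Lower-bound durations in hours from the list above.
# Assumptions: ">7 d" -> 168 h, "2-3 d" -> 48 h, extensions are a separate step.
durations_h = {
    "taxa and synonyms": 13.5,
    "synonym relationships/FK": 31,
    "higher taxonomy": 15.5,
    "lower taxonomy": 7 * 24,
    "extensions": 2 * 24,
    "vernaculars": 14,
    "descriptions (text distributions)": 38,
    "references": 4,
    "distributions": 7.5,
}
total = sum(durations_h.values())
slowest = max(durations_h, key=durations_h.get)
print(f"total >= {total} h (~{total / 24:.1f} d); slowest step: {slowest} "
      f"({durations_h[slowest] / total:.0%} of the total)")
```

This gives at least 339.5 h (about 14 days), and the lower taxonomy alone accounts for roughly half of it.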
Count of accepted taxa (DTYPE 'Taxon') whose rank has an order index <= 44 or that have no rank:
SELECT count(*)
FROM TaxonBase tb INNER JOIN TaxonName tnb ON tnb.id = tb.name_id
LEFT JOIN DefinedTermBase r ON r.id = tnb.rank_id
WHERE tb.DTYPE = 'Taxon' AND (r.orderindex <= 44 OR tnb.rank_id IS NULL);