bug #6892
Updated by Andreas Müller over 6 years ago
To further improve the CoL import performance, we could:
* improve the synonym import
  * add the parentFk via SQL (remember the mapping)
  * immediately handle synonyms within the same transaction; store the remaining mappings
  * immediately handle synonyms whose taxa were imported earlier (though synonyms often come first); store the remaining mappings
* improve the classification import
  * create taxon nodes in the first run without links to parent and classification; set parent and classification afterwards via SQL
* improve the description import
  * create descriptions in the first run; later load only the descriptions, not the taxa
  * as above, but save only the description elements and set the indescriptionFk via SQL
* run in parallel
  * maybe the synonym, classification, and description imports can run in parallel
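The "remember mapping / store remaining mappings" ideas amount to deferring foreign keys: save rows without the FK, record which CDM id each CoL record received, and resolve all pending FKs afterwards in one set-based statement. Below is a minimal sketch of that pattern for synonyms, to be tried in a scratch SQLite or MySQL 8 database. It rests on assumptions: `col_mapping`, `pending_synonym`, and the column `acceptedTaxon_id` are hypothetical names, and `TaxonBase` is reduced to three columns. The same shape would apply to setting the TaxonNode parent/classification or the description element FK.

```sql
-- Simplified stand-in for the CDM table; acceptedTaxon_id is an assumed
-- (hypothetical) name for the synonym -> accepted taxon FK.
CREATE TABLE TaxonBase (
    id               INTEGER PRIMARY KEY,
    DTYPE            VARCHAR(31) NOT NULL,  -- 'Taxon' or 'Synonym'
    acceptedTaxon_id INTEGER NULL           -- left empty in the first run
);

-- Hypothetical mapping filled during the first run:
-- CoL record id -> id of the CDM row created for it.
CREATE TABLE col_mapping (
    col_id INTEGER PRIMARY KEY,  -- primary key doubles as the join index
    cdm_id INTEGER NOT NULL
);

-- Hypothetical "remaining mappings": synonyms that could not be linked
-- immediately, e.g. because the synonym came before its accepted taxon.
CREATE TABLE pending_synonym (
    synonym_id      INTEGER PRIMARY KEY,  -- CDM id of the synonym
    accepted_col_id INTEGER NOT NULL      -- CoL id of its accepted taxon
);

-- First run (sample data): taxa and synonyms are saved without the FK.
INSERT INTO TaxonBase (id, DTYPE) VALUES
    (1, 'Taxon'), (2, 'Taxon'), (3, 'Synonym'), (4, 'Synonym');
INSERT INTO col_mapping (col_id, cdm_id) VALUES
    (100, 1), (200, 2), (300, 3), (400, 4);
INSERT INTO pending_synonym (synonym_id, accepted_col_id) VALUES
    (3, 200), (4, 100);

-- Second step: resolve every pending FK with a single UPDATE instead of
-- loading and saving each synonym individually.
UPDATE TaxonBase
SET acceptedTaxon_id = (
    SELECT m.cdm_id
    FROM pending_synonym p
    INNER JOIN col_mapping m ON m.col_id = p.accepted_col_id
    WHERE p.synonym_id = TaxonBase.id
)
WHERE id IN (SELECT synonym_id FROM pending_synonym);

-- Expected: synonym 3 -> taxon 2, synonym 4 -> taxon 1, taxa 1 and 2 stay NULL.
SELECT id, DTYPE, acceptedTaxon_id FROM TaxonBase ORDER BY id;
```

The correlated subquery keeps the sketch portable to SQLite; on MySQL, a multi-table `UPDATE ... JOIN` is an equivalent alternative.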
Current import times on the local PESI-HCL MySQL database (record counts: k = thousand, M = million):
* create taxa and synonyms: 13.5 h, 3.571 M records
* create synonym relationships/FKs: 31 h, 1.526 M records
* higher taxonomy (above species): 15.5 h, 176 k records
* lower taxonomy: > 7 d (!!), 1.892 M records
* extensions: 2-3 d
* vernaculars: 14 h, 415 k records
* species profiles: not yet imported
* descriptions (text distributions): 38 h, 1.835 M records
* references (citations): 4 h, 1.907 M records
* distributions: 7.5 h, 318 k records
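Comparing the steps is easier as throughput (records per hour). Here is a small sketch that does this arithmetic in SQL. It assumes the counts mean 3.571 M, 1.526 M, 176 k, 1.892 M, 415 k, 1.835 M, 1.907 M, and 318 k records (the sub-thousand "M" values read as thousands, which matches taxa + synonyms ≈ higher + lower taxonomy + synonym relationships). It also enters "> 7 d" as 168 h, so the lower-taxonomy rate is only an upper bound. The `import_timing` table is hypothetical.

```sql
-- Hypothetical table holding the measured timings listed above.
CREATE TABLE import_timing (
    step    VARCHAR(40)  NOT NULL,
    records INTEGER      NOT NULL,  -- rounded counts as given in the list
    hours   DECIMAL(6,1) NOT NULL
);

INSERT INTO import_timing (step, records, hours) VALUES
    ('taxa and synonyms',     3571000,  13.5),
    ('synonym relationships', 1526000,  31.0),
    ('higher taxonomy',        176000,  15.5),
    ('lower taxonomy',        1892000, 168.0),  -- "> 7 d": lower bound on time
    ('vernaculars',            415000,  14.0),
    ('descriptions',          1835000,  38.0),
    ('references',            1907000,   4.0),
    ('distributions',          318000,   7.5);

-- Multiply by 1.0 so SQLite, which may store 168.0 as the integer 168,
-- does not fall back to integer division.
SELECT step, records, hours,
       ROUND(records * 1.0 / hours) AS records_per_hour
FROM import_timing
ORDER BY records_per_hour;
```

With these inputs, both classification steps come out at roughly 11 k records/h (11,262 and 11,355), against roughly 265 k/h for creating taxa and synonyms. This is consistent with listing the classification import as an improvement target.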
~~~ sql
SELECT count(*)
FROM TaxonBase tb
INNER JOIN TaxonName tnb ON tnb.id = tb.name_id
-- LEFT JOIN keeps unranked names (an INNER JOIN drops them); "r" because RANK is reserved in MySQL 8
LEFT JOIN DefinedTermBase r ON r.id = tnb.rank_id
WHERE tb.DTYPE = 'Taxon'
  AND (r.orderindex <= 44 OR tnb.rank_id IS NULL);
~~~
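Because higher and lower taxonomy are imported as separate batches, it can help to get both batch sizes in one pass. The sketch below uses conditional aggregation with the same orderindex cut-off (44) as the query above. Unranked names are counted with the higher batch, matching the `rank_id IS NULL` condition. The stand-in tables and sample rows are hypothetical simplifications, so try it in a scratch database rather than against the CDM schema.

```sql
-- Hypothetical, heavily simplified stand-ins for the CDM tables.
CREATE TABLE DefinedTermBase (id INTEGER PRIMARY KEY, orderindex INTEGER);
CREATE TABLE TaxonName (id INTEGER PRIMARY KEY, rank_id INTEGER NULL);
CREATE TABLE TaxonBase (
    id      INTEGER PRIMARY KEY,
    DTYPE   VARCHAR(31) NOT NULL,
    name_id INTEGER NOT NULL
);

-- Sample rows: ranks at orderindex 10, 44 (cut-off) and 60.
INSERT INTO DefinedTermBase (id, orderindex) VALUES (1, 10), (2, 44), (3, 60);
INSERT INTO TaxonName (id, rank_id) VALUES (1, 1), (2, 2), (3, 3), (4, NULL), (5, 3);
INSERT INTO TaxonBase (id, DTYPE, name_id) VALUES
    (1, 'Taxon', 1), (2, 'Taxon', 2), (3, 'Taxon', 3),
    (4, 'Taxon', 4), (5, 'Synonym', 5);  -- the synonym is excluded below

-- One pass over accepted taxa, split into higher and lower batch sizes.
SELECT
    SUM(CASE WHEN r.orderindex <= 44 OR tn.rank_id IS NULL THEN 1 ELSE 0 END) AS higher_taxa,
    SUM(CASE WHEN r.orderindex > 44 THEN 1 ELSE 0 END) AS lower_taxa
FROM TaxonBase tb
INNER JOIN TaxonName tn ON tn.id = tb.name_id
LEFT JOIN DefinedTermBase r ON r.id = tn.rank_id
WHERE tb.DTYPE = 'Taxon';
-- Expected for the sample rows: higher_taxa = 3, lower_taxa = 1.
```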