bug #6892

Further improvements to CoL performance

Added by Andreas Müller almost 2 years ago. Updated over 1 year ago.

Status: New
Priority: New
Category: cdmadapter
Target version:
Start date: 08/07/2017
Due date:
% Done: 0%
Severity: normal
Found in Version:
Tags: col

Description

To further improve CoL import performance, we could:

  • improve synonym import

    • add parentFk by SQL (remember mapping)
    • handle synonyms within transaction immediately - store remaining mappings
    • handle synonyms with prior existing taxa immediately (but often synonyms come first) - store remaining mappings
  • improve classification import

    • create taxon nodes in first run without link to parent and classification, set parent and classification via SQL
  • improve description import

    • create descriptions in first run, only load descriptions, not taxa later
    • like above, but only save description elements, create indescriptionFk via SQL
  • run in parallel

    • maybe we can run synonym, classification and description import in parallel
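
The "set the FK via SQL" idea that recurs in the bullets above could look roughly like the sketch below. It is only an illustration under assumptions, not the cdmadapter code: it uses Python's sqlite3 with an in-memory database as a stand-in for MySQL, and the table and column names (taxon_node, parent_mapping, source_id, parent_source_id) are hypothetical. Rows are inserted in a first run without their parent link, the source-side parent ids are remembered in a mapping table, and a single set-based UPDATE resolves all links afterwards, so the input order (children before parents) no longer matters.

```python
import sqlite3


def import_two_pass(records):
    """Import (source_id, parent_source_id or None, label) tuples in two passes.

    All table and column names here are hypothetical, for illustration only.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE taxon_node (
            id INTEGER PRIMARY KEY,
            source_id TEXT UNIQUE NOT NULL,
            label TEXT,
            parent_id INTEGER REFERENCES taxon_node(id)
        );
        CREATE TABLE parent_mapping (
            child_source_id TEXT PRIMARY KEY,
            parent_source_id TEXT NOT NULL
        );
    """)
    records = list(records)

    # First run: insert every node without its parent link and remember
    # the source-side parent id in the mapping table.
    con.executemany(
        "INSERT INTO taxon_node (source_id, label) VALUES (?, ?)",
        [(sid, label) for sid, _, label in records],
    )
    con.executemany(
        "INSERT INTO parent_mapping VALUES (?, ?)",
        [(sid, psid) for sid, psid, _ in records if psid is not None],
    )

    # Second run: resolve all parent links with one set-based UPDATE
    # instead of loading and saving each object individually.
    con.execute("""
        UPDATE taxon_node
        SET parent_id = (
            SELECT p.id
            FROM parent_mapping m
            JOIN taxon_node p ON p.source_id = m.parent_source_id
            WHERE m.child_source_id = taxon_node.source_id
        )
        WHERE source_id IN (SELECT child_source_id FROM parent_mapping)
    """)
    con.commit()
    return con


if __name__ == "__main__":
    # The child "s1" arrives before its parent "g1"; the link is still resolved.
    con = import_two_pass([("s1", "g1", "species"), ("g1", None, "genus")])
    print(con.execute(
        "SELECT source_id, parent_id FROM taxon_node ORDER BY id").fetchall())
```

On MySQL the second run would more likely be written as an UPDATE with a JOIN against the mapping table; the correlated subquery is used here only because it also runs on SQLite.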

Current import times on local PESI-HCL MySQL are:

  • 13,5 h (create taxa and synonyms) - 3.571 M
  • 31 h (create synonym relationship/FK) - 1.526 M
  • 15,5 h (higher taxonomy > species) - 176 M
  • >7d !! (lower taxonomy 1.892
  • extensions 2-3 d

    • vernaculars 14 h (415 M)
    • species profile (not yet imported)
    • description = text distributions 38 h (1.835 M)
    • references (citation): 4 h (1.907 M)
    • distribution 7,5 h (318 M)
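
-- Count taxa whose rank has orderindex <= 44 or whose name has no rank.
-- LEFT JOIN keeps names without a rank; the alias r avoids RANK, which is reserved in MySQL 8.0.
SELECT count(*)
FROM TaxonBase tb
INNER JOIN TaxonName tnb ON tnb.id = tb.name_id
LEFT JOIN DefinedTermBase r ON r.id = tnb.rank_id
WHERE tb.DTYPE = 'Taxon' AND (r.orderindex <= 44 OR tnb.rank_id IS NULL);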
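
A note on the count query: whether names without a rank are included depends on how DefinedTermBase is joined. With an inner join, rows whose rank_id is NULL are dropped before the WHERE clause is evaluated, so the IS NULL condition can never match. The sketch below shows this on toy data, using Python's sqlite3 as a stand-in for MySQL. The three tables are reduced to the columns the query uses, the sample rows are made up, and whether the real data contains names without a rank is an assumption.

```python
import sqlite3


def count_taxa(join_kind):
    """Run the count query on toy data with an 'INNER' or 'LEFT' join to the rank table."""
    if join_kind not in ("INNER", "LEFT"):
        raise ValueError("join_kind must be 'INNER' or 'LEFT'")
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE DefinedTermBase (id INTEGER PRIMARY KEY, orderindex INTEGER);
        CREATE TABLE TaxonName (id INTEGER PRIMARY KEY, rank_id INTEGER);
        CREATE TABLE TaxonBase (id INTEGER PRIMARY KEY, DTYPE TEXT, name_id INTEGER);
        -- Made-up sample data: two ranks, three names (one without a rank).
        INSERT INTO DefinedTermBase VALUES (1, 40), (2, 50);
        INSERT INTO TaxonName VALUES (10, 1), (11, 2), (12, NULL);
        INSERT INTO TaxonBase VALUES
            (100, 'Taxon', 10), (101, 'Taxon', 11),
            (102, 'Taxon', 12), (103, 'Synonym', 10);
    """)
    sql = f"""
        SELECT count(*)
        FROM TaxonBase tb
        INNER JOIN TaxonName tnb ON tnb.id = tb.name_id
        {join_kind} JOIN DefinedTermBase r ON r.id = tnb.rank_id
        WHERE tb.DTYPE = 'Taxon' AND (r.orderindex <= 44 OR tnb.rank_id IS NULL)
    """
    return con.execute(sql).fetchone()[0]


if __name__ == "__main__":
    print(count_taxa("INNER"))  # 1: the taxon without a rank is lost in the join
    print(count_taxa("LEFT"))   # 2: the taxon without a rank is counted as well
```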

History

#1 Updated by Andreas Müller almost 2 years ago

  • Description updated (diff)

#2 Updated by Andreas Müller almost 2 years ago

  • Description updated (diff)

#3 Updated by Andreas Müller almost 2 years ago

  • Description updated (diff)

#4 Updated by Andreas Müller over 1 year ago

  • Description updated (diff)

#5 Updated by Andreas Müller over 1 year ago

  • Description updated (diff)
