bug #6892

Updated by Andreas Müller over 6 years ago

To further improve the CoL import performance we could:

 * improve synonym import 
     * add parentFk by SQL (remember mapping) 
     * handle synonyms within transaction immediately - store remaining mappings 
     * handle synonyms with prior existing taxa immediately (but often synonyms come first) - store remaining mappings 
 
 * improve classification import 
     * create taxon nodes in first run without link to parent and classification, set parent and classification via SQL 
  
 * improve description import 
     * create descriptions in first run, only load descriptions, not taxa later 
     * like above, but only save description elements, create indescriptionFk via SQL 

 * run in parallel 
     * maybe we can run synonym, classification and description import in parallel 
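
A minimal sketch of the "create rows first, set the FK via SQL afterwards" idea from the synonym and classification items above. It is only an illustration: the real importer and schema are not shown in this ticket, so the `taxon` and `pending` tables, the `accepted_fk` column and the function `import_two_pass` are hypothetical names, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3


def import_two_pass(records):
    """Import (source_id, name, accepted_source_id or None) tuples in two passes.

    Hypothetical schema for illustration only; not the actual import tables.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE taxon (
            id INTEGER PRIMARY KEY,
            source_id TEXT UNIQUE NOT NULL,
            name TEXT NOT NULL,
            accepted_fk INTEGER REFERENCES taxon(id)
        );
        -- Remembered mapping: synonym source id -> accepted taxon source id.
        CREATE TEMP TABLE pending (
            synonym_source_id TEXT NOT NULL,
            accepted_source_id TEXT NOT NULL
        );
    """)
    records = list(records)

    # Pass 1: insert every row without the FK, so input order does not
    # matter (synonyms may arrive before their accepted taxon).
    con.executemany(
        "INSERT INTO taxon (source_id, name) VALUES (?, ?)",
        [(sid, name) for sid, name, _ in records],
    )
    con.executemany(
        "INSERT INTO pending VALUES (?, ?)",
        [(sid, acc) for sid, _, acc in records if acc is not None],
    )

    # Pass 2: resolve all remembered mappings with a single SQL statement
    # instead of loading and saving each object individually.
    con.execute("""
        UPDATE taxon
        SET accepted_fk = (
            SELECT a.id
            FROM pending p
            JOIN taxon a ON a.source_id = p.accepted_source_id
            WHERE p.synonym_source_id = taxon.source_id
        )
        WHERE source_id IN (SELECT synonym_source_id FROM pending)
    """)
    con.commit()
    return con


# Usage: the synonym is listed before the taxon it points to.
con = import_two_pass([
    ("s1", "Synonym name", "t1"),
    ("t1", "Accepted name", None),
])
```

The same pattern would apply to the parent/classification link of taxon nodes and to the description FK; whether it actually pays off on MySQL would have to be measured.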

 Current import times on local PESI-HCL MySQL are: 

  * 13,5 h (create taxa and synonyms) - 3.571 M 
  * 31 h (create synonym relationship/FK) - 1.526 M 
  * 15,5 xxx h (higher taxonomy > species) - 176 M 
  * xxx h (lower taxonomy) - 1.892 
  * xxx h extensions 
     * vernaculars 
     * species profile    (not yet imported) 
     * distribution 
     * description = text distributions 
     * references (citation) 
 
