task #7446

Updated by Andreas Müller almost 6 years ago

About 560,000 extensions are currently imported.

 We need to check whether all of them are needed or whether some are duplicates. Also check whether they would be better stored as alternative identifiers.

 Once that is done, we need to check whether they should be shown on the data portal.

 ~~~ sql 
 -- Count extensions per extension type and per kind of owning object (agent, term, reference)
 SELECT e.type_id, et.titleCache, ab.DTYPE AS agentDType, r.refType, dtb.DTYPE AS termDType, count(*) AS n
 FROM Extension e
 INNER JOIN DefinedTermBase et ON et.id = e.type_id
 LEFT OUTER JOIN AgentBase_Extension abMN ON abMN.extensions_id = e.id
 LEFT OUTER JOIN AgentBase ab ON ab.id = abMN.AgentBase_id
 LEFT OUTER JOIN DefinedTermBase_Extension dtbMN ON dtbMN.extensions_id = e.id
 LEFT OUTER JOIN DefinedTermBase dtb ON dtb.id = dtbMN.DefinedTermBase_id
 LEFT OUTER JOIN Reference_Extension rMN ON rMN.extensions_id = e.id
 LEFT OUTER JOIN Reference r ON r.id = rMN.Reference_id
 GROUP BY e.type_id, et.titleCache, ab.DTYPE, r.refType, dtb.DTYPE
 ORDER BY et.titleCache, ab.DTYPE, r.refType, dtb.DTYPE, n
 ~~~ 
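
 For the duplicate check, a query along the following lines might be a starting point. It is only a sketch: it assumes that the extension text is stored in a column `Extension.value` (not shown in the query above) and treats two extensions as duplicates if they share the same type and value on the same reference. The same pattern should carry over to `AgentBase_Extension` and `DefinedTermBase_Extension`.

 ~~~ sql
 -- Sketch: references that carry the same extension (type + value) more than once.
 -- Assumption: the extension text is stored in Extension.value.
 SELECT rMN.Reference_id, e.type_id, e.value, count(*) AS n
 FROM Extension e
 INNER JOIN Reference_Extension rMN ON rMN.extensions_id = e.id
 GROUP BY rMN.Reference_id, e.type_id, e.value
 HAVING count(*) > 1
 ORDER BY n DESC
 ~~~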

 Done: 

  * common name reference language deduplicated 

 TODO: 
 
  * Remove experts and species experts; check whether they are now identical to the sec reference, and adapt the PESI import to use sec in the case of E+M.
  * Berlin Model IdInSource currently exists only for synonyms. Where do these IDs come from? Can they be moved to alternative identifiers? Why do accepted taxa not have this ID? Are the semantics always the same? Are there similar fields in other BM tables?
  * Very few references have DateString extensions; these should be unified with similar information in Reference.notes, RefDetail notes, and other comparable fields.
