task #10324

closed

Export Caucasus data

Added by Andreas Müller 11 months ago. Updated 3 months ago.

Status: Closed
Priority: Priority13
Category: cdmadapter
Target version: -
Start date:
Due date:
% Done: 100%
Estimated time:
Severity: normal

Description

... to Georgia, Armenia and Azerbaijan databases.

Filters:

  • Taxonomic: all taxa, published and unpublished, but no Kew/ILDIS data
  • Geographic: all taxa with distribution in the respective countries
    • ... and distributions only from the respective countries
    • include parent taxa (without distribution) and absent status
    • status undefined (currently imported, but not shown)
  • Content: distribution and common names, but the common name filter still needs to be discussed (which languages and which areas; see also #10324#note-28); exclude other features(?)
  • Feature tree
  • Status: do not import aggregated descriptions or descriptions that are empty after filtering
  • Endemism
    • fine tuning
  • Area trees (including Caucasus local areas), excluding common name areas not in use
  • Name duplicates (synonymy <-> name used in source): sent table to ERS (2023-07-04)
  • Orphaned taxa introduced by name reuse, e.g. https://portal.cybertaxonomy.org/georgia/cdm_dataportal/taxon/6ee1e37a-0ec7-4930-bbc2-0d3e7cc80eb8
  • Decision on Tcs/Cc Taxa (see TCS_CS_Taxa.xlsx)
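The geographic filter keeps taxa with a distribution in the target country and also pulls in their parent taxa, even when those parents carry no distribution of their own. A minimal sketch of that closure, assuming a hypothetical in-memory mapping from each taxon to its parent (an illustration, not the cdmlib API):

```python
from typing import Dict, Optional, Set


def with_parent_taxa(selected: Set[str], parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """Return the selected taxa plus all their ancestors up to the root.

    `parent_of` is a hypothetical stand-in for the classification tree:
    it maps each taxon to its parent, or to None for the root.
    """
    result: Set[str] = set()
    for taxon in selected:
        current: Optional[str] = taxon
        # Walk upwards; stop at the root or at an ancestor that is already
        # collected (its own ancestors were collected with it).
        while current is not None and current not in result:
            result.add(current)
            current = parent_of.get(current)
    return result


if __name__ == "__main__":
    tree = {
        "Pinaceae": None,
        "Abies": "Pinaceae",
        "Abies nordmanniana": "Abies",
        "Abies nordmanniana subsp. nordmanniana": "Abies nordmanniana",
        "Picea": "Pinaceae",
    }
    # Toy data: only the subspecies has a distribution in the target country.
    print(sorted(with_parent_taxa({"Abies nordmanniana subsp. nordmanniana"}, tree)))
```

Because the walk stops at the first already-collected ancestor, each tree path is traversed at most once even when many selected taxa share the same family.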

Before final import:

  • check status of Kew taxa with ERS

After import:

  • rename Common Name -> Common Names
  • Update taxon treeindex
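The treeIndex values in the SQL further down this issue look like materialized paths: `#t10#` followed by the node ids from the root down to the node, each terminated by `#`. Assuming that format (inferred from those examples, not taken from cdmlib documentation), recomputing the index after the import could be sketched as follows; `parent_of` is a hypothetical node-id-to-parent-id mapping.

```python
from typing import Dict, List, Optional


def tree_index(node_id: int, parent_of: Dict[int, Optional[int]], classification_id: int) -> str:
    """Build a treeIndex such as '#t10#10#56284#' for a taxon node.

    Assumed format: '#t<classificationId>#' followed by every node id on the
    path from the root node to `node_id`, each terminated by '#'.
    """
    path: List[int] = []
    current: Optional[int] = node_id
    while current is not None:
        path.append(current)
        current = parent_of[current]
    path.reverse()  # root first
    return f"#t{classification_id}#" + "".join(f"{n}#" for n in path)


def rebuild_tree_indices(parent_of: Dict[int, Optional[int]], classification_id: int) -> Dict[int, str]:
    """Recompute the (assumed) treeIndex for every node in the mapping."""
    return {node: tree_index(node, parent_of, classification_id) for node in parent_of}
```

For example, with `{10: None, 56284: 10, 54730: 56284}` and classification id 10, node 54730 gets `#t10#10#56284#54730#`.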

Related issues

Related to EDIT - task #10239: Caucasus databases and portals available (Closed, Andreas Müller)
Related to EDIT - feature request #9197: Handle caucasus conspectus data in E+M and caucasus portal (Rejected, Andreas Müller)
Related to EDIT Platform Etablierung - task #8689: Export Flora of Greece Bupleurum using Cdm2Cdm export (Closed, Andreas Müller, 11/18/2019)
Related to EDIT - task #10183: Import Buxales via Cdm2Cdm (Closed, Andreas Müller)
Related to EDIT - feature request #9771: Implement Cdm2Cdm for vocabularies (Closed, Andreas Müller)
#1

Updated by Andreas Müller 11 months ago

  • Related to task #10239: Caucasus databases and portals available added

#2

Updated by Andreas Müller 11 months ago

#3

Updated by Andreas Müller 11 months ago

  • Related to task #8689: Export Flora of Greece Bupleurum using Cdm2Cdm export added

#4

Updated by Andreas Müller 11 months ago

  • Related to task #10183: Import Buxales via Cdm2Cdm added

#5

Updated by Andreas Müller 11 months ago

#6

Updated by Andreas Müller 11 months ago

What about endemism data? See also #10239#note-2

#7

Updated by Andreas Müller 11 months ago

Also discuss WFO-IDs for names.

NaK:

The WFO-IDs should definitely be included in the Caucasus instances.

#8

Updated by Andreas Müller 10 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 10

#9

Updated by Andreas Müller 10 months ago

ERS:

We did not discuss endemism yesterday. Data on endemism will be important for each checklist.

Which data on endemism do we currently have in Euro+Med Plantbase?

Endemism in Euro+Med Plantbase is not an attribute for a country or other subarea, but for the top area Euro+Med. If we have for a given taxon the area Euro+Med and the indication endemic, and distribution records for only one country, then this taxon is endemic to that country.
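The derivation rule above can be written as a small function. A sketch, assuming a taxon is represented only by its Euro+Med endemic flag and the country codes (e.g. "Gg", "Ar", "Ab") of its distribution records; this representation is hypothetical, not the CDM model:

```python
from typing import Iterable, Optional


def country_endemic_to(endemic_to_euro_med: bool, countries: Iterable[str]) -> Optional[str]:
    """Return the country a taxon is endemic to, or None.

    The taxon must be flagged endemic for the top area Euro+Med AND all of
    its distribution records must lie in one single country.
    """
    distinct = set(countries)
    if endemic_to_euro_med and len(distinct) == 1:
        return next(iter(distinct))
    return None
```

A taxon flagged endemic with records only in "Gg" yields "Gg"; with records in both "Gg" and "Ar" it is endemic to Euro+Med but not to a single country, so the function returns None.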

I would suggest, but this needs to be discussed with the partners, that two (or even three) categories of endemics will be useful for each country:

  • (local endemics (at subarea level, e.g. Nakhchivan or Abkhazia))
  • country endemics (Ab, Ar or Gg)
  • Caucasus endemics (occurring in more than one country, but only in the Caucasus, i.e. only Ab, Ar, Gg or Rf(CS)=North Caucasus).
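The proposed categories could be assigned from E+M area codes as sketched below. Assumptions: subareas are written as `Country(Sub)` like Gg(A) (the code for Nakhchivan is not given here, so none is hard-coded), the taxon is already known to be a Euro+Med endemic, and the category names are illustrative labels only.

```python
from typing import Iterable, Optional

# Countries of the three new instances, plus the North Caucasus part of Russia.
CAUCASUS_COUNTRIES = {"Ab", "Ar", "Gg"}
NORTH_CAUCASUS = "Rf(CS)"


def country_of(area: str) -> str:
    """Strip a subarea suffix: 'Gg(A)' -> 'Gg', 'Ar' -> 'Ar'."""
    return area.split("(", 1)[0]


def endemism_category(endemic_to_euro_med: bool, areas: Iterable[str]) -> Optional[str]:
    """Classify a Euro+Med endemic as a 'local', 'country' or 'Caucasus' endemic.

    Returns None if the taxon is not a Euro+Med endemic, has no areas, or
    occurs anywhere outside Ab, Ar, Gg and Rf(CS).
    """
    area_set = set(areas)
    if not endemic_to_euro_med or not area_set:
        return None
    if not all(country_of(a) in CAUCASUS_COUNTRIES or a == NORTH_CAUCASUS for a in area_set):
        return None
    countries = {country_of(a) for a in area_set}
    if len(countries) == 1:
        # Confined to one country; only Ab, Ar and Gg get country-level categories.
        if not countries <= CAUCASUS_COUNTRIES:
            return None
        if len(area_set) == 1 and "(" in next(iter(area_set)):
            return "local"
        return "country"
    # More than one country, but all within the Caucasus.
    return "Caucasus"
```

A taxon known only from Rf(CS) returns None here, because the Caucasus category as defined above requires occurrence in more than one country.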

So we need to decide how to handle endemism in the three new instances, and which endemism data should be included in the export from E+M.

#10

Updated by Andreas Müller 10 months ago

MM:

Georgia is listed in the distribution of Abies nordmanniana (Steven) Spach without a reference. The references were added in TaxEditor, but it seems they are not published in Euro+Med.
I am reporting the cases that I think may need to be edited in Euro+Med. Please let me know if these are normal and do not need to be reported.

ERS:

Thank you for observing this. Both Cystopteris fragilis and Abies nordmanniana are taxa with subspecies, and it is the typical subspecies that occurs in Georgia. The reference is only given for the subspecies, not for the species. The distribution of the species is calculated from the subspecies, and because it is calculated, the reference is not given directly (but we have the possibility to add it if we want). So you do not need to report further cases; we know that the data here are not always complete.
Maybe this is also something we need to think about when creating the export files for the three Caucasus instances?
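The calculation ERS describes (species distribution derived from the subspecies, with no reference on the calculated records) can be sketched as below. The record type is hypothetical, and the sketch simply keeps each distinct (area, status) pair; it does not decide which status wins when subspecies disagree for the same area.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Distribution:
    area: str                 # E+M area code, e.g. "Gg"
    status: str               # e.g. "native"
    reference: Optional[str]  # source citation; None for calculated records


def aggregate_to_species(subspecies: Dict[str, List[Distribution]]) -> List[Distribution]:
    """Union the subspecies records into species-level records without references."""
    seen: Set[Tuple[str, str]] = set()
    result: List[Distribution] = []
    for records in subspecies.values():
        for rec in records:
            key = (rec.area, rec.status)
            if key not in seen:
                seen.add(key)
                # Calculated record: the reference stays with the subspecies only.
                result.append(Distribution(rec.area, rec.status, None))
    return result
```

If references should appear at species level in the export, the `None` above is the place where they would have to be carried over.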

#11

Updated by Andreas Müller 10 months ago

Available languages for common names:

SELECT deb.language_id, lang.titleCache, lang.uuid, COUNT(*) AS n
FROM DescriptionElementBase deb INNER JOIN DefinedTermBase lang ON lang.id = deb.language_id
GROUP BY deb.language_id, lang.titleCache, lang.uuid
ORDER BY lang.titleCache

#12

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#13

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#14

Updated by Andreas Müller 10 months ago

Removing import sources does not work yet.

#15

Updated by Andreas Müller 10 months ago

Excluded nodes:

SELECT tn.id, tn.UUID, treeIndex, n.nameCache
FROM TaxonNode tn INNER JOIN TaxonBase tb ON tb.id = tn.taxon_id INNER JOIN TaxonName n ON n.id = tb.name_id
WHERE (1=0) OR
 tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#50717#51601#40724#' -- (Amaryllidaceae)
OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#52420#48555#40499#' -- (Araliaceae)
-- OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#50717#51601#40629#' -- (Asparagaceae): mostly done, so I removed it from the negative filter
OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#52436#51035#44434#' -- (Betulaceae)
OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#52436#52441#42193#' -- (Euphorbiaceae)
OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#52436#50871#44688#' -- (Fabaceae)
OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#52436#51035#39628#' -- (Fagaceae)

OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#50717#51601#40557#' -- (Iridaceae)
OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#50717#51116#49935#' -- (Juncaceae)
OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#52420#49316#39608#' -- (Lamiaceae)
OR tn.treeIndex LIKE '#t10#10#56284#54730#51914#54954#50717#51601#40684#' -- (Orchidaceae)
ORDER BY tn.treeIndex
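A `LIKE` pattern without `%` or `_` acts as an equality test, so the query above returns only the family nodes themselves. Testing whether any node lies inside an excluded family is a prefix test on treeIndex (in SQL, a trailing `%` after the closing `#`). The same check in Python, as a sketch with the prefix list shortened to two of the families above:

```python
from typing import Iterable

# Two of the excluded family indices from the query above (list shortened).
EXCLUDED_PREFIXES = (
    "#t10#10#56284#54730#51914#54954#50717#51601#40724#",  # Amaryllidaceae
    "#t10#10#56284#54730#51914#54954#52436#50871#44688#",  # Fabaceae
)


def is_excluded(tree_index: str, prefixes: Iterable[str] = EXCLUDED_PREFIXES) -> bool:
    """True if the node is an excluded family or lies anywhere below one.

    The trailing '#' of each prefix keeps node 40724 from also matching
    an unrelated node such as 407241.
    """
    return any(tree_index.startswith(p) for p in prefixes)
```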
#16

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#17

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#18

Updated by Andreas Müller 10 months ago

  • Description updated (diff)
  • % Done changed from 10 to 30

#19

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#20

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#21

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#22

Updated by Andreas Müller 10 months ago

  • % Done changed from 30 to 50

#23

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#24

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#25

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#26

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#27

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#28

Updated by Andreas Müller 10 months ago

Common names. ERS:

So, if one can filter individually, it should rather look like this:

Georgia:
  • Georgian (Georgia)
  • Russian (Georgia)
  • Russian (Russia)

Armenia:
  • Armenian (Armenia)
  • Russian (Armenia)
  • Russian (Russia)

Azerbaijan:
  • Azerbaijani (Azerbaijan)
  • Russian (Azerbaijan)
  • Russian (Russia)
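The per-country filter above amounts to a whitelist of (language, area) pairs per target instance. A sketch with hypothetical names; note that the later workshop specification in #37 drops Russian common names for Georgia, so the Georgian entry would need updating accordingly.

```python
from typing import Dict, Set, Tuple

# (language, area) pairs to keep per target instance, as listed by ERS in #28.
ALLOWED_COMMON_NAMES: Dict[str, Set[Tuple[str, str]]] = {
    "Georgia": {("Georgian", "Georgia"), ("Russian", "Georgia"), ("Russian", "Russia")},
    "Armenia": {("Armenian", "Armenia"), ("Russian", "Armenia"), ("Russian", "Russia")},
    "Azerbaijan": {("Azerbaijani", "Azerbaijan"), ("Russian", "Azerbaijan"), ("Russian", "Russia")},
}


def keep_common_name(instance: str, language: str, area: str) -> bool:
    """True if a common name in `language` for `area` belongs in `instance`."""
    return (language, area) in ALLOWED_COMMON_NAMES.get(instance, set())
```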
#29

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#30

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#31

Updated by Andreas Müller 10 months ago

  • Target version changed from Release 5.44 to Release 5.40

#32

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#33

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#34

Updated by Andreas Müller 10 months ago

  • Priority changed from New to Highest
  • Target version changed from Release 5.40 to Release 5.44

#35

Updated by Andreas Müller 10 months ago

Check which taxa are already imported (the number of returned rows gives how many):

SELECT tn.id, tn.created, tn.countChildren, tn.treeIndex, tn.parent_id, tn.sortIndex, acc.id tId, acc.titleCache
FROM TaxonNode tn INNER JOIN TaxonBase acc ON acc.id = tn.taxon_id
ORDER BY tn.treeIndex,  acc.titleCache; 
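The query above lists the imported nodes; the number itself can also be fetched directly with `COUNT(*)` over the same join. A self-contained sketch using Python's built-in sqlite3 module and a toy schema containing only the columns the query touches (not the real CDM schema or database):

```python
import sqlite3

COUNT_SQL = """
SELECT COUNT(*)
FROM TaxonNode tn INNER JOIN TaxonBase acc ON acc.id = tn.taxon_id
"""


def count_imported_taxa(conn: sqlite3.Connection) -> int:
    """Count taxon nodes joined to a taxon; nodes without a taxon are skipped by the inner join."""
    (n,) = conn.execute(COUNT_SQL).fetchone()
    return n


def demo() -> int:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TaxonBase (id INTEGER PRIMARY KEY, titleCache TEXT);
        CREATE TABLE TaxonNode (id INTEGER PRIMARY KEY, taxon_id INTEGER, treeIndex TEXT);
        INSERT INTO TaxonBase VALUES (1, 'Abies Mill.'), (2, 'Abies nordmanniana (Steven) Spach');
        -- toy root node without a taxon, plus two taxon nodes
        INSERT INTO TaxonNode VALUES (10, NULL, '#t10#10#'), (11, 1, '#t10#10#11#'), (12, 2, '#t10#10#11#12#');
    """)
    try:
        return count_imported_taxa(conn)
    finally:
        conn.close()
```

On the toy data `demo()` counts the two taxon nodes and ignores the root node, which has no taxon.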
#36

Updated by Andreas Müller 10 months ago

  • Description updated (diff)

#37

Updated by Andreas Müller 9 months ago

ERS:

Specifications, as discussed during the workshop in Georgia:

  • No common names in Russian or English. Georgian common names will be provided by our counterparts for a data import.
  • No heterotypic synonyms, because deleting superfluous synonyms will be much more cumbersome than adding only the needed ones
  • Only taxa with distribution in Gg, Gg(A), Gg(D), Gg(G)
  • Only Euro+Med Taxa (no ILDIS or Kew Taxa)
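The four rules above can be sketched as one filter step per taxon. Everything here is hypothetical (the `source` marker, the record classes), and restricting the kept distributions to the Georgian areas follows the description's "distributions only from the respective countries":

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

GEORGIA_AREAS = {"Gg", "Gg(A)", "Gg(D)", "Gg(G)"}
EXCLUDED_LANGUAGES = {"Russian", "English"}


@dataclass
class Synonym:
    name: str
    homotypic: bool


@dataclass
class Taxon:
    name: str
    source: str  # hypothetical marker: "E+M", "ILDIS" or "Kew"
    areas: Set[str]
    synonyms: List[Synonym] = field(default_factory=list)
    common_names: List[Tuple[str, str]] = field(default_factory=list)  # (name, language)


def export_for_georgia(taxon: Taxon) -> Optional[Taxon]:
    """Apply the Georgia rules; return the filtered taxon, or None if it is skipped."""
    if taxon.source != "E+M" or not (taxon.areas & GEORGIA_AREAS):
        return None
    return Taxon(
        name=taxon.name,
        source=taxon.source,
        areas=taxon.areas & GEORGIA_AREAS,  # distributions only from Georgia
        synonyms=[s for s in taxon.synonyms if s.homotypic],  # no heterotypic synonyms
        common_names=[c for c in taxon.common_names if c[1] not in EXCLUDED_LANGUAGES],
    )
```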

A table of administrative subdivisions of Georgia, as well as shape files, will be provided by our counterparts in Georgia. I believe this can also be added after the instance split.

At present, additional data (mainly names and distributions for grasses) are being entered into the E+M main instance, because they are needed for all three countries. This is planned to continue until the end of August.

Georgians are not happy with the present E+M subdivision of Georgia into Abkhasia (Gg(A)), Adjara (Gg(D)) and the rest of Georgia (Gg(G)), because this somehow implies that those areas are on the same administrative level, which is not the case. For them, Georgia is Georgia and should not be subdivided in E+M, but only in the Georgian instance, where there will be ca. 9-10 subdivisions. Abkhasia and Adjara are just two of them, and existing data for them will be kept as such only in the Georgian slice.

However, after the final export is done, the Georgian subdivisions in Euro+Med Plantbase should be merged and no longer used or shown.

I believe that the Georgian colleagues have a point here, because subdivisions are used somewhat inconsistently in E+M: for instance, Hs(A) is not part of Spain but independent Andorra, Si(M) is not part of Sicily but independent Malta, Au(L) is not part of Austria but independent Liechtenstein, and so on, so one might believe that Gg(A) and Gg(D) are also not part of Georgia but independent countries. On the other hand, Rf(C), Rf(E) and other subdivisions of Russia are certainly part of Russia, but those subdivisions do not correspond to administrative units. So at present, Gg(A) and Gg(D) are the only administrative subdivisions that are not independent countries. We therefore have good reason not to use them in future and to merge their data into Georgia as a whole.

Please come back to me with any questions regarding the export for Georgia.
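The planned merge of Gg(A), Gg(D) and Gg(G) into Gg can be sketched as a code mapping plus de-duplication. This covers area codes only; when the subareas carry different statuses for the same taxon, a rule for which status wins would still have to be decided, and the sketch does not attempt that.

```python
from typing import Iterable, List

GEORGIAN_SUBAREAS = {"Gg(A)", "Gg(D)", "Gg(G)"}


def merge_georgian_subareas(areas: Iterable[str]) -> List[str]:
    """Map Gg(A), Gg(D) and Gg(G) to Gg and drop duplicates, keeping first-seen order."""
    merged: List[str] = []
    for area in areas:
        target = "Gg" if area in GEORGIAN_SUBAREAS else area
        if target not in merged:
            merged.append(target)
    return merged
```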

#38

Updated by Andreas Müller 8 months ago

  • Target version changed from Release 5.44 to Release 5.41

#40

Updated by Andreas Müller 6 months ago

  • Target version changed from Release 5.41 to Release 5.42

#41

Updated by Andreas Müller 3 months ago

  • Status changed from In Progress to Resolved

Are there any open issues?

#42

Updated by Andreas Müller 3 months ago

  • Description updated (diff)

#43

Updated by Andreas Müller 3 months ago

  • Priority changed from Highest to Priority14

#44

Updated by Andreas Müller 3 months ago

  • Priority changed from Priority14 to Priority13

#45

Updated by Andreas Müller 3 months ago

  • Status changed from Resolved to Closed
  • Target version deleted (Release 5.42)
  • % Done changed from 50 to 100

All databases have been exported.
