feature request #2625

closed

[E+M Overview] Data aggregation functionalities for E+M (TransmissionEngine)

Added by Andreas Müller over 12 years ago. Updated about 3 years ago.

Status: Closed
Priority: New
Category: cdmlib
Start date:
Due date:
% Done: 100%
Estimated time: (Total: 0:00 h)
Severity: normal

Description

Implement data aggregation workflows, such as aggregating distribution data from sub areas to larger areas and from lower taxa to higher taxa, following the existing E+M aggregation functionality.

The latest code of the transmission engine is available here: Y:\BDI\PESI\EM_MCL\EM_MCL_DataAndProgramming\Anton_Programming\TransmissionEngineOccurrence (only BGBM internal access)

The latest version of the transmission engine code, with additional comments, is attached to this ticket: TransmissionsEngineOccurrence_V14.bas

Important notes:

source references

Source references of the accumulated distributions are also accumulated into the new distribution. This was implemented specifically for the EuroMed Checklist Vol2 and might not be a general requirement. This was a mistake; there is now a separate ticket for handling the aggregation of source references: #4366

summaryStatus

Each distribution record has a summaryStatus (emOccurSumCat.xlsx). This is a summary of the status codes (emOccurStatCat.xlsx) stored in the emOccurrence fields native, introduced, cultivated, ...

The summaryStatus seems to be equivalent to the distribution status PresenceAbsenceTermBase.

Each summary status has a priority field which specifies the preference of one status over another.

These priorities were defined in a long, intensive process by Anton Güntsch and Eckhard Raab-Straube. It is unclear whether they are project specific or generally applicable. The priorities must therefore not be hard coded; they must be stored in the database so that they can be configured.
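The priority rule can be sketched as follows. This is a minimal illustration in Python, not the cdmlib implementation. The area codes, status names and priority numbers are invented, and the sketch assumes that a larger number means a stronger preference, which this ticket does not specify. The priority table is passed in as data, standing in for values loaded from the database.

```python
# Hypothetical priority table; in practice it would be loaded from the database.
# Assumption: a larger number wins. Statuses missing from the table are skipped.
PRIORITIES = {"native": 3, "introduced": 2, "cultivated": 1}

# Hypothetical mapping from sub area to super area.
SUPER_AREA = {"GER-N": "GER", "GER-S": "GER"}


def aggregate_by_area(distributions, super_area, priorities):
    """Project (area, status) pairs onto super areas, keeping the preferred status."""
    best = {}
    for area, status in distributions:
        target = super_area.get(area)
        if target is None or status not in priorities:
            continue  # no super area known, or status excluded from aggregation
        current = best.get(target)
        if current is None or priorities[status] > priorities[current]:
            best[target] = status
    return best


result = aggregate_by_area(
    [("GER-N", "cultivated"), ("GER-S", "native"), ("GER-S", "doubtful")],
    SUPER_AREA,
    PRIORITIES,
)
print(result)  # {'GER': 'native'}
```

Aggregation from lower to higher taxa could follow the same pattern, with a child taxon to parent taxon mapping in place of SUPER_AREA.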

map generation

When generating maps from the accumulated distribution information some special cases have to be handled:

  1. If entered or imported status information exists for the same area for which calculated (accumulated) data is available, the calculated data has to be given preference over the other data.

  2. If an area has a sub area and both have the same status, only the sub area status should be shown in the map, and the super area should be ignored. See #5050.
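The two map rules above could look roughly like this in code. This is only a sketch under assumptions: the record fields ("area", "status", "computed") and the parent_of mapping are hypothetical names, and only the direct super area is considered.

```python
def filter_for_map(records, parent_of):
    """Apply the two map rules to a list of distribution records.

    records: dicts with the hypothetical keys 'area', 'status', 'computed'.
    parent_of: hypothetical mapping from sub area to its direct super area.
    """
    # Rule 1: where computed data exists for an area, drop entered/imported data.
    computed_areas = {r["area"] for r in records if r["computed"]}
    kept = [r for r in records if r["computed"] or r["area"] not in computed_areas]

    # Rule 2: drop a super area record if one of its sub areas has the same status.
    covered = {
        (parent_of[r["area"]], r["status"]) for r in kept if r["area"] in parent_of
    }
    return [r for r in kept if (r["area"], r["status"]) not in covered]


records = [
    {"area": "GER", "status": "native", "computed": True},
    {"area": "GER", "status": "introduced", "computed": False},
    {"area": "GER-S", "status": "native", "computed": False},
]
shown = filter_for_map(records, {"GER-S": "GER"})
print([(r["area"], r["status"]) for r in shown])  # [('GER-S', 'native')]
```

In this example the entered "introduced" record for GER is dropped by rule 1, and the computed "native" record for GER is dropped by rule 2 because the sub area GER-S has the same status.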


The TransmissionEngineDistribution can be triggered via a REST service:

./description/accumulateDistributions?mode=[byAreas|byRanks|byAreasAndRanks]&frontendBaseUrl=<server-instance-base-URL>&priority=[1...7,DEFAULT:3]

This REST service is still a special implementation for the Euro+Med project. The parameters superAreas (the areas onto which the subordinate areas are projected), lowerRank and upperRank are hardcoded to TDWG_LEVEL3 areas, SUBSPECIES and GENUS, respectively.
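As an illustration of how such a request URL could be assembled, here is a small Python sketch. The host names are placeholders and the helper function is hypothetical; only the path and the three query parameters come from this ticket. Percent-encoding the frontendBaseUrl value is an assumption about what the server expects. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

VALID_MODES = {"byAreas", "byRanks", "byAreasAndRanks"}


def accumulate_distributions_url(service_base, mode, frontend_base_url, priority=3):
    """Build the accumulateDistributions URL (hypothetical helper, no request is sent)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not 1 <= priority <= 7:
        raise ValueError("priority must be between 1 and 7")
    query = urlencode(
        {"mode": mode, "frontendBaseUrl": frontend_base_url, "priority": priority}
    )
    return f"{service_base.rstrip('/')}/description/accumulateDistributions?{query}"


# Placeholder hosts, for illustration only.
url = accumulate_distributions_url(
    "http://cdmserver.example.org/euromed",
    "byAreasAndRanks",
    "http://portal.example.org",
)
print(url)
```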


check if #2083 (CICHORIEAE implement hierarchy for distribution status) is fixed by this once the transmission engine has been run on the cichorieae data


email discussion on how to treat references in the transmission engine: Discussion-Transmission_Referenzen.txt


Files

TransmissionsEngineOccurrence_V14.bas (21.1 KB) TransmissionsEngineOccurrence_V14.bas Andreas Kohlbecker, 02/18/2013 05:04 PM
emOccurStatCat.xlsx (9.44 KB) emOccurStatCat.xlsx Andreas Kohlbecker, 02/18/2013 05:04 PM
emOccurSumCat.xlsx (9.97 KB) emOccurSumCat.xlsx Andreas Kohlbecker, 02/18/2013 05:04 PM
status-mapping-eumed-cdm.ods (19.2 KB) status-mapping-eumed-cdm.ods Andreas Kohlbecker, 02/27/2013 05:14 PM
Discussion-Transmission_Referenzen.txt (3.66 KB) Discussion-Transmission_Referenzen.txt Andreas Kohlbecker, 01/20/2014 05:04 PM

Subtasks 2 (0 open, 2 closed)

bug #4134: Transmissionengine Distribution seems to miss distributions for higher Taxa (Closed, Andreas Müller)
feature request #4366: Transmissionengine Distribution: implement rules for source references (Duplicate, Andreas Müller)

Related issues

Related to EDIT - feature request #8677: Add distribution aggregation to set subtree menu (Closed, Katja Luther)
Related to EDIT - bug #8312: Test, fix, improve and run Transmission engine for E+M (Closed, Andreas Müller)
Related to EDIT - task #8651: Unify description aggregation methods (distribution and structured descriptive data) (Closed, Andreas Müller)
Related to EDIT - task #8679: Further unify description aggregation methods (Closed, Andreas Müller)
Related to EDIT - task #8871: Remaining issues to unify description aggregation methods (In Progress, Andreas Müller)
Related to EDIT - task #8811: Open issues for "Add distribution aggregation to set subtree menu" (New, Katja Luther)

Actions #1

Updated by Andreas Müller over 12 years ago

  • Aggregation of distribution data according to ranks and regions (#2630)
Actions #2

Updated by Andreas Müller over 12 years ago

  • Priority changed from Priority10 to Priority14
Actions #3

Updated by Andreas Kohlbecker about 11 years ago

@Andreas Müller:

what do you think is the best place to put the priorities into the database, Extensions?

Actions #4

Updated by Andreas Kohlbecker about 11 years ago

  • Status changed from New to In Progress
  • Assignee changed from Andreas Müller to Andreas Kohlbecker

I decided to use Extensions to store priorities.

Actions #5

Updated by Andreas Müller about 11 years ago

Replying to a.kohlbecker:

I decided to use Extensions to store priorities.

Sorry for answering only now. I am not sure if extensions are the best choice. They are string based and therefore usually not the first option to express an order. However, we do have already an ExtensionType "Order" (uuid = "ecb7770d-a295-49ee-a88f-e9e137a7cabb") which we could use for it.

For me the more natural choice would be the order of the vocabulary itself, at least as long as there is no other reason for having "PresenceAbsenceTerms" ordered. But if we do so we have to discuss 2 things:

  • Reorder the current PresenceTerm and AbsenceTerm vocabulary according to the "priority" in E+M

  • Merge the presence and the absence terms into 1 class (this was discussed a long time ago), and maybe use an absence flag instead.

However, if you can think about any other semantics for the "ordered" attribute of the presenceAbsence vocabularies we should rethink this solution.

Actions #6

Updated by Andreas Kohlbecker about 11 years ago

Replying to a.mueller:

Anton was not sure that the priorities are necessarily the same for all projects, so we need some flexibility here, and using the term order is not really an option. Furthermore, there are also terms which are omitted during the aggregation process. These terms do not have a priority at all, or maybe a negative one? Where should these terms be put, at the top of the list or at the bottom? Would a term select list still be useful for users, or rather confusing, since the term order would look a bit arbitrary?

I think we should not try superimposing a "secret" meaning into the term order. For the moment using the Extensions is a really good choice since this is not causing a model change.
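A rough sketch of how string-based extension values could be turned into usable priorities, with terms that have no priority left out of the aggregation. The function and the term names are hypothetical and do not reflect the cdmlib Extension API.

```python
def parse_priorities(extension_values):
    """Convert string extension values to integer priorities.

    extension_values: hypothetical mapping from term name to the raw string
    stored in the extension, or None if the term has no such extension.
    Terms without a value are left out, so aggregation can skip them.
    """
    priorities = {}
    for term, raw in extension_values.items():
        if raw is None or not raw.strip():
            continue  # no priority stored: term is omitted during aggregation
        priorities[term] = int(raw)
    return priorities


parsed = parse_priorities({"native": "3", "introduced": "2", "reported in error": None})
print(parsed)  # {'native': 3, 'introduced': 2}
```

Keeping the priorities in a separate lookup like this leaves the vocabulary order untouched, so a term select list can still be ordered in whatever way is most useful for users.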

Actions #7

Updated by Andreas Kohlbecker about 11 years ago

check if #2083 (CICHORIEAE implement hierarchy for distribution status) is fixed by this once the transmission engine has been run on the cichorieae data

Actions #8

Updated by Andreas Kohlbecker almost 11 years ago

dataportal side implemented: r17640

Actions #9

Updated by Andreas Kohlbecker almost 11 years ago

additional work on the library side [17642:17665]

Actions #10

Updated by Andreas Kohlbecker about 10 years ago

adding text of email discussion on how to treat references in the transmission engine: Discussion-Transmission_Referenzen.txt

Actions #11

Updated by Andreas Kohlbecker about 10 years ago

  • Status changed from In Progress to Resolved
  • Assignee changed from Andreas Kohlbecker to e.raab-straube -

As far as I remember this ticket is completed and can now be reviewed.

Actions #12

Updated by Andreas Kohlbecker about 10 years ago

  • Keywords set to Euro+Med,Migration
Actions #13

Updated by Andreas Kohlbecker over 9 years ago

  • Subject changed from [E+M Overview] Data aggregation functionalities for E+M to [E+M Overview] Data aggregation functionalities for E+M (TransmissionEngine)
Actions #14

Updated by Andreas Müller almost 9 years ago

  • Target version changed from Euro+Med Migration to Euro+Med Portal Release
Actions #15

Updated by Andreas Kohlbecker almost 8 years ago

  • Assignee changed from e.raab-straube - to Andreas Kohlbecker
Actions #16

Updated by Andreas Kohlbecker almost 8 years ago

  • Keywords changed from Euro+Med,Migration to Euro+Med,Migration,TransmissionEngineDistribution
Actions #17

Updated by Andreas Kohlbecker almost 7 years ago

  • Private changed from Yes to No
Actions #18

Updated by Andreas Müller over 4 years ago

  • Description updated (diff)
  • Assignee changed from Andreas Kohlbecker to Andreas Müller
Actions #19

Updated by Andreas Müller over 4 years ago

Actions #20

Updated by Andreas Müller over 4 years ago

  • Related to bug #8312: Test, fix, improve and run Transmission engine for E+M added
Actions #21

Updated by Andreas Müller about 3 years ago

  • Related to task #8651: Unify description aggregation methods (distribution and structured descriptive data) added
Actions #22

Updated by Andreas Müller about 3 years ago

  • Related to task #8679: Further unify description aggregation methods added
Actions #23

Updated by Andreas Müller about 3 years ago

  • Related to task #8871: Remaining issues to unify description aggregation methods added
Actions #24

Updated by Andreas Müller about 3 years ago

  • Related to task #8811: Open issues for "Add distribution aggregation to set subtree menu" added
Actions #25

Updated by Andreas Müller about 3 years ago

  • Status changed from Resolved to Closed

This should be fixed with the transmission engine implementations (see subtasks) and with the implementation of description aggregations (see related tickets) as well as the integration of description aggregation into the TaxEditor navigator menu (#8677).
