Project

General

Profile

Actions

bug #9143

closed

Distribution data sources are missing if multiple distributions exist

Added by Andreas Kohlbecker about 4 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Highest
Category:
cdm-dataportal
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Severity:
normal
Found in Version:
Tags:

Description

Distribution data sources are missing if multiple distributions exist for the same area and the same status. This is the case in the below example where aggregated distributions without sources are merged with explicit distributions with sources.

ERS:

etwas anderes ist jetzt komisch: einige Quellen für Verbreitungsdaten werden nicht (mehr) angezeigt, obwohl eindeutig vorhanden.

Vergleiche: (Aster alpinus) Czech Republic, France, Greece.... keine Quellen!

http://www.europlusmed.org/cdm_dataportal/taxon/229a548c-fec8-4f15-9058-8d340bb97c25


Files

picture103-1.png (128 KB) picture103-1.png Andreas Müller, 07/10/2020 06:45 PM

Related issues

Related to EDIT - feature request #4366: Transmissionengine Distribution: implement rules for source referencesDuplicateAndreas Müller

Actions
Related to EDIT - feature request #5050: revise the subAreaPreference rule for filtering DistributionsClosedAndreas Kohlbecker

Actions
Actions #1

Updated by Andreas Kohlbecker about 4 years ago

  • Status changed from New to In Progress
  • Assignee changed from Andreas Kohlbecker to Andreas Müller

AM:

also da sind in der DB auch keine Quellen dran, da kann im Portal auch nichts rauskommen.

Übrigens sehen die Verbreitungsdaten mir aus, als ob die durch einen Import reinkamen oder so. Jedenfalls ist kein CreatedBy genannt. War aber erst am 27.11. letztes Jahr. Müssten wir nochmal checken, was da war, falls dir dazu nichts einfällt, Eckhard.

Actions #2

Updated by Andreas Müller about 4 years ago

so, jetzt seh ich das ein bisschen genauer. Das Problem ist, dass es sich um aggregierte Daten handelt:

Aber nicht nur. Z.B Czech Republic hat sowohl aggregierte Daten als auch Originaldaten an dieser Art. Für aggregierte Daten hatten wir mal gesagt, dass diese keine Quellen anzeigen sollen, also weder einen Link zu den darunter liegenden Taxa noch zu den Originalquellen. Das könnte aber geändert werden.

Das Problem ist allerdings, dass es wohl einen Fehler im Anzeigealgorithmus gibt, wenn das gleiche Areal 2x vorkommt, 1x mit Quellen und 1x ohne. Da werden die Quellen nicht zusammengezogen, wenn es sich um den gleichen Status handelt. Bei abweichendem Status ist das glaube ich gewollt, dass nur der „höchste“ Werte genommen wird und auch nur die Quellen von diesem. Bei gleichrangigen Quellen ist das aber natürlich Quatsch, da muss man die Quellen addieren.

Actions #3

Updated by Andreas Müller about 4 years ago

  • Subject changed from Aparranly distribution data source are missing to Distribution data sources are missing if multiple distributions exist
  • Description updated (diff)
Actions #4

Updated by Andreas Müller about 4 years ago

  • Assignee changed from Andreas Müller to Andreas Kohlbecker
  • Priority changed from New to Highest

The problem is already described in #4366. However, there it is handled as pure data aggregation (transmission engine) issue. But I don't think it is an aggregation issue. If data exists from aggregation and from explicit data sources they should be combined and not taken either from one or the other distribution record (except for use-cases where this is required).

So the problem is in DescriptionUtility.filterDistributions(). There is a rule applied that computed distributions are prefered over explicit distributions:

        // 2) remove not computed distributions for areas for which computed
        //    distributions exists
        //
        if(preferComputed) {

So this means that even if the status of the computed distribution is less it is still taken (e.g. if computed -from taxonomic children- status is "introduced" and the explicit distribution is "native" the computed status "introduced" is taken) which I do not understand.

@Andreas K., as you have written this code do you remember why it was written like this?
@Eckhard: is there a rule why it should be like this?

The consequence of this rule is that the sources of explicit distribution are never shown if a computed distribution exists.

See also DescriptionUtilityTest.testFilterDistributions_computed() for the currently expected behavior.

Actions #5

Updated by Andreas Müller about 4 years ago

  • Status changed from In Progress to Feedback
Actions #6

Updated by Andreas Müller about 4 years ago

  • Related to feature request #4366: Transmissionengine Distribution: implement rules for source references added
Actions #7

Updated by Andreas Kohlbecker about 4 years ago

  • Description updated (diff)
Actions #8

Updated by Andreas Kohlbecker about 4 years ago

  • Assignee changed from Andreas Kohlbecker to Andreas Müller

Andreas Müller wrote:

The problem is already described in #4366. However, there it is handled as pure data aggregation (transmission engine) issue. But I don't think it is an aggregation issue. If data exists from aggregation and from explicit data sources should be combined and not taken either from one or the other distribution record (except for use-cases where this is required).

So the problem is in DescriptionUtility.filterDistributions(). There is a rule applied that computed distributions are prefered over explicit distributions:

        // 2) remove not computed distributions for areas for which computed
        //    distributions exists
        //
        if(preferComputed) {

So this means that even if the status of the computed distribution is less it is still taken (e.g. if computed -from taxonomic children- status is "introduced" and the explicit distribution is "native" the computed status "introduced" is taken) which I do not understand.

@Andreas K., as you have written this code do you remember why it was written like this?
@Eckhard: is there a rule why it should be like this?

The consequence of this rule is that the sources of explicit distribution are never shown if a computed distribution exists.

See also DescriptionUtilityTest.testFilterDistributions_computed() for the currently expected behavior.

As far as I can remember it was meant like that:

Once the transmission engine distribution has run, the computed information is more complete and thus more valuable than the entered or imported data. This fact shoud also be clear after reading the parameter documentation:

Prefer computed rule: Computed distributions are preferred over entered or imported elements. (Computed description elements are identified by the MarkerType.COMPUTED()). This means if a entered or imported status information exist for the same area for which computed data is available, the computed data has to be given preference over other data.

When there is a situation in which the edited/imported distribution status is "native" in contrast to the computed status with value "introduced" I would assume that this is explicitly wanted or that there is a problem in the aggregation algorithm. Otherwise the edited/imported distribution status would in many cases overwrite the computation result. And in this case the question has to be raised why this rule is not incorporated into the aggregation algorithm.

Actions #9

Updated by Andreas Müller about 4 years ago

  • Assignee changed from Andreas Müller to Andreas Kohlbecker

To me your answer seems to imply that computed data and "normal" data are based on the same data and therefore should be equal. But this is not (always) the case.

Example: We have data for A. alba from 2 sources. One has data for A. alba subsp. alba and A. alba subsp. pinus. The other source only handles species level data and has data for A. alba. Source 1 handles both subspecies as introduced, the second source handles them as native (for what ever reason both sources come to this result). Then we have on species level the computed (aggregated) data from source 1 with introduced and the data from source 2 with native. In this case source 2 should win as it has the higher status. But with the above rule it does not. And I can't see a logical reason for this.

@Eckhard: do you agree that in the above case status "native" (from source 2) should win? Or do I oversee something here?

Actions #10

Updated by Andreas Müller about 4 years ago

Andreas Müller wrote:

The problem is already described in #4366. However, there it is handled as pure data aggregation (transmission engine) issue. But I don't think it is an aggregation issue. If data exists from aggregation and from explicit data sources should be combined and not taken either from one or the other distribution record (except for use-cases where this is required).

Do we agree here? My suggestion is that sources need to be combined and not only taken from the computed data. Adapting the above example and saying that also source 1 uses status native this means that the distribution for A. alba gets status native and both sources, source 1 and source 2, are attached, not only source 1 (as it is the case now).

Actions #11

Updated by Andreas Müller about 4 years ago

  • Target version changed from Release 5.18 to Release 5.19

I move this to 5.18 as it won't be fixed today.

Actions #12

Updated by Andreas Müller about 4 years ago

  • Target version changed from Release 5.19 to Release 5.18
Actions #13

Updated by Andreas Kohlbecker almost 4 years ago

Andreas Müller wrote:

Andreas Müller wrote:

The problem is already described in #4366. However, there it is handled as pure data aggregation (transmission engine) issue. But I don't think it is an aggregation issue. If data exists from aggregation and from explicit data sources should be combined and not taken either from one or the other distribution record (except for use-cases where this is required).

Do we agree here? My suggestion is that sources need to be combined and not only taken from the computed data. Adapting the above example and saying that also source 1 uses status native this means that the distribution for A. alba gets status native and both sources, source 1 and source 2, are attached, not only source 1 (as it is the case now).

Eckhard wrote:

Yes, I agree. Both sources should be attached.

Actions #14

Updated by Andreas Müller almost 4 years ago

  • Status changed from Feedback to New
  • Assignee changed from Andreas Kohlbecker to Andreas Müller
  • % Done changed from 0 to 10
Actions #15

Updated by Andreas Müller over 3 years ago

  • Target version changed from Release 5.18 to Release 5.21
Actions #16

Updated by Andreas Müller over 3 years ago

Actions #17

Updated by Andreas Müller over 3 years ago

  • Status changed from New to Resolved
  • Target version changed from Release 5.21 to Release 5.20
  • % Done changed from 10 to 50
Actions #18

Updated by Andreas Müller over 3 years ago

  • Status changed from Resolved to Closed
  • % Done changed from 50 to 100

I tested on test https://test.e-taxonomy.eu/dataportal/preview/euromed/cdm_dataportal/taxon/229a548c-fec8-4f15-9058-8d340bb97c25 and the result looks as expected for Czech Republic, France, Greece so I close this ticket.

Eckhard, please let us know if the above link does not show the expected result or if after the next release in production there are still sources missing.

Actions #19

Updated by Andreas Müller over 3 years ago

ERS: Hi, this looks good now on the test portal for the mentioned areas.

Actions

Also available in: Atom PDF