Project

General

Profile

Actions

task #4232

closed

Check all Comparators for fulfilling the Comparator contract

Added by Andreas Müller almost 10 years ago. Updated about 7 years ago.

Status:
Closed
Priority:
Highest
Assignee:
Category:
cdmlib
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Severity:
critical

Description

Hi Everybody,

    The issue raised with http://dev.e-taxonomy.eu/trac/ticket/4230 <http://dev.e-taxonomy.eu/trac/ticket/4230>  and some other tickets  (#4226, #4114, #3333) is highly critical with Java 7+ and everybody should be aware of it when responsible for writing Comparators.

The Comparator contract requires that

1) compare(o1,o2) == 0 must be true and ONLY be true if o1.equals(o2).

2) sgn(compare(o1,o2)) = -sgn(compare(o2,o1)) (symmetrie)

3) compare function must be transitive

Checking our comparators in cdmlib-model (and some others) I realized that almost none of our comparators does fulfill all of these conditions. So I would like to remind us all to be aware of this. Java 7 is more sensitive on these conditions as the new TimSort sort algorithm throws exceptions otherwise.

Condition 1) for CdmBase objects usually means that compare(o1,o2) MUST return 0 whenever 2 objects do have the same uuid. This can easily be managed by checking if this is the case as a very first step and returning 0 if yes.

However, the second implication is more tricky. You must guarantee that in no other cases 0 is returned.

So having something like

 Compare(o1,02){

    x1 = o1.getX();
    x2 = o1.getX();
    return x1.compare(x2);
 }

which I have seen in many comparators is generally WRONG because x1.compare(x2) will usually not guarantee that o1.getUuid.equals(o2.getUuid()).

A concrete example are taxon comparators which send the comparison to the attached names. In case both taxa do use the exact same name this will definetly return 0 even if name.compare(name) is correctly implemented. However, the taxa do not have the same uuid and therefore they should not return 0.

A possible correct implementation could do

    ...

    int result = name1.compare(name2);
    if (result != 0){
        return result
   }else{
        return taxon1.getUuid().compare(taxon2.getUuid());
   }

Condition 2) and 3) require more individual handling. But be aware that your ordering must never be arbitrary. E.g.

compare(name1, name2){
    rank1 = name1.getRank()
    rank2 = name2.getRank()
    if (rank1 == null) return 1;
    if (rank2 == null) return -1;
    ...
}

will always be INVALID as compare(name1, name2) == compare(name2, name1) == 1 is possible for both ranks == null, but is against the symmetrie contract (2).

So EVERYONE who will write comparators or did so in the past please carefully check your code for correct implementations.

@Katja: we both will need to check this systematically in CdmLib for old code. Also a very critical method is e.g. IdentifiablyEntity.compare()

Cheers,

Andreas M.

P.S:

Be aware that 2) and 3) and partly 1) also hold for the Comparable interface (comparetTo() method).


Files

comparatorlist.xlsx (11.2 KB) comparatorlist.xlsx Katja Luther, 02/09/2017 01:13 PM

Related issues

Copied to EDIT - bug #6311: Refactor IdentifiableEntity.compareTo methodRejectedAndreas Müller

Actions
Actions #1

Updated by Andreas Müller almost 10 years ago

some fixes done with r20934, r20936-20939

Actions #2

Updated by Andreas Kohlbecker almost 9 years ago

  • Target version changed from cdmlib RELEASE 3.5.0 to cdmlib RELEASE 3.5.1

moving tickets to next milestone

Actions #3

Updated by Andreas Müller almost 9 years ago

  • Target version changed from cdmlib RELEASE 3.5.1 to cdmlib RELEASE 3.5.2

move open 3.5.1 tickets to next milestone after release

Actions #4

Updated by Katja Luther over 8 years ago

  • Status changed from New to In Progress

The IdentifiableEntity.compareTo method needs to be updated.

First of all I added the test of equality of the uuids.

Next step would be the problem if the name of two taxa is the same and the sec is the same (for example both taxa don't have a sec ref) the method would return 0, but this should happen only if the taxa are really equal. Normaly this should not happen but may be the last comparison should be a comparison of the created or updated date?

Actions #5

Updated by Andreas Müller about 8 years ago

  • Target version changed from cdmlib RELEASE 3.5.2 to Unassigned CDM tickets
Actions #6

Updated by Andreas Müller about 7 years ago

  • Copied to bug #6311: Refactor IdentifiableEntity.compareTo method added
Actions #7

Updated by Andreas Müller about 7 years ago

  • Description updated (diff)

moved IdentifiableEntity.compareTo issue to new ticket #6311

Actions #8

Updated by Andreas Müller about 7 years ago

  • Target version changed from Unassigned CDM tickets to Release 4.6

Please report results of current effort and maybe check for new Comparators

Actions #9

Updated by Katja Luther about 7 years ago

  • File comparatorlist.xlsx added
Actions #11

Updated by Katja Luther about 7 years ago

  • File deleted (comparatorlist.xlsx)
Actions #12

Updated by Andreas Müller about 7 years ago

KL: AreaNodeComparator in FloraCubaCondensedDistributionComposer

Hier werden die namedAreas verglichen, indem die compareTo Methode von OrderedTermBase aufgerufen wird. Wenn aber jetzt die beiden Terme nicht im gleichen Vokabular enthalten sind, werden sie trotzdem nach dem orderIndex sortiert, dieser kann dann ja gleich sein, weil sie in unterschiedlichen Vokabularen sind. D.h. die compare Methode gibt 0 bei nicht gleichen Termen zurück, oder??

Actions #13

Updated by Andreas Müller about 7 years ago

Andreas Müller wrote:

KL: AreaNodeComparator in FloraCubaCondensedDistributionComposer

Hier werden die namedAreas verglichen, indem die compareTo Methode von OrderedTermBase aufgerufen wird. Wenn aber jetzt die beiden Terme nicht im gleichen Vokabular enthalten sind, werden sie trotzdem nach dem orderIndex sortiert, dieser kann dann ja gleich sein, weil sie in unterschiedlichen Vokabularen sind. D.h. die compare Methode gibt 0 bei nicht gleichen Termen zurück, oder??

Letztlich wird OrderedTermBase.performCompareTo mit skipVocabularyCheck = false aufgerufen. Daher werden alle Fälle mit unterschiedlichen Vokabularen über eine Exception abgefangen und führen zu keinem Ergebnis, auch nicht zu einem falschen. Somit wir der Comparator Contract m.E. nicht verletzt.

Trotzdem ist das ein bisschen gefährlich, falls wir an der Implementierung mal ein bisschen was ändern. Daher habe ich in https://dev.e-taxonomy.eu/redmine/projects/edit/repository/revisions/39d13a15932e979559b21bf484a122454dd00f38 die Methode ein bisschen angepasst. Kannst du das bitte gegenchecken, auf Richtigkeit und auf mögliche NPEs oder Ähnliches.

Actions #14

Updated by Andreas Müller about 7 years ago

KL: Bei DistributionNodeByAreaLabelComparator

Würde hier auch nochmal einen weiteren Vergleich einfügen, wenn die Label gleich sind, oder kommt das nicht vor?

Actions #15

Updated by Andreas Müller about 7 years ago

Andreas Müller wrote:

KL: Bei DistributionNodeByAreaLabelComparator

Würde hier auch nochmal einen weiteren Vergleich einfügen, wenn die Label gleich sind, oder kommt das nicht vor?

Ja vermutlich hast du Recht. Ich habe den Comparator angepasst und auch einen Test geschrieben: https://dev.e-taxonomy.eu/redmine/projects/edit/repository/revisions/375d001e7222d435c799e9232e856fb98de32e1d

Actions #16

Updated by Andreas Müller about 7 years ago

  • Tracker changed from bug to task
  • Status changed from In Progress to Resolved

Probably we can close this ticket now?

Actions #17

Updated by Katja Luther about 7 years ago

  • Status changed from Resolved to Closed

Yes then we can close the ticket.

Actions #18

Updated by Andreas Müller about 7 years ago

  • % Done changed from 0 to 100
  • Private changed from Yes to No
Actions

Also available in: Atom PDF