bug #7829

Improve deduplication of parsed names and references

Added by Andreas Müller over 2 years ago. Updated about 1 month ago.

Status:
Closed
Priority:
Highest
Category:
cdmlib
Target version:
Start date:
10/16/2018
Due date:
% Done:

100%

Severity:
normal
Found in Version:
Tags:

Description

The default matching strategies are too strict for name and nom. ref. data created by a parser. There are also some mistakes in the defined matching.

In general we need to use the EQUAL_OR_SECOND_NULL match mode more often, because parsed data is usually incomplete while existing data might be more complete. E.g. the authors of a reference might be stored with their full names, while full names are usually not available in parsed data. Place published might also be unknown in the parsed data, but this is open to discussion as it is sometimes part of the parsed string.
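To illustrate the difference between a strict and a relaxed mode, here is a minimal sketch. `SketchMatchMode` is a hypothetical, simplified stand-in and not the cdmlib enum. It also assumes that the less complete (parsed) value is passed as the second argument, which this issue does not state.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for illustration; NOT the cdmlib MatchMode enum. */
enum SketchMatchMode {
    EQUAL,
    EQUAL_OR_SECOND_NULL;

    /**
     * Assumption: 'first' is the richer persisted value,
     * 'second' is the possibly incomplete parsed value.
     */
    boolean matches(Object first, Object second) {
        switch (this) {
            case EQUAL:
                // Strict: both values must be equal (two nulls count as equal).
                return Objects.equals(first, second);
            case EQUAL_OR_SECOND_NULL:
                // Relaxed: a missing second value never blocks a match.
                return second == null || Objects.equals(first, second);
            default:
                throw new IllegalStateException("Unhandled mode: " + this);
        }
    }
}
```

With this sketch, a persisted place published of "Berlin" matches a parsed null under EQUAL_OR_SECOND_NULL but not under EQUAL. The relaxation is one-sided: a persisted null does not match a parsed "Berlin".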

We probably need a MatchingStrategyFactory that offers matching strategies specifically for comparing parsed data with richer persisted data.
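One possible shape for such a factory is sketched below. Everything in it is hypothetical: the class name, the nested `Mode` enum, the field names used as map keys, and the representation of a strategy as a plain field-to-mode map are assumptions for illustration, not the cdmlib API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical factory sketch; a strategy is modelled as a field-name -> mode map. */
final class ReferenceMatchSketchFactory {

    enum Mode { EQUAL, EQUAL_OR_SECOND_NULL }

    private ReferenceMatchSketchFactory() {
        // static factory methods only
    }

    /** Strict strategy for comparing two persisted references (assumed field names). */
    static Map<String, Mode> defaultReferenceStrategy() {
        Map<String, Mode> modes = new LinkedHashMap<>();
        modes.put("title", Mode.EQUAL);
        modes.put("datePublished", Mode.EQUAL);
        modes.put("placePublished", Mode.EQUAL);
        return Collections.unmodifiableMap(modes);
    }

    /** Relaxed strategy for parsed data: starts from the default and loosens one field. */
    static Map<String, Mode> parsedReferenceStrategy() {
        Map<String, Mode> modes = new LinkedHashMap<>(defaultReferenceStrategy());
        // Parsed strings often lack the place published, so a missing value is tolerated.
        modes.put("placePublished", Mode.EQUAL_OR_SECOND_NULL);
        return Collections.unmodifiableMap(modes);
    }
}
```

Deriving the parsed-data strategy from the default one keeps the two from drifting apart: only the fields that are deliberately relaxed differ.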

===

There were also wrong matching rules, such as not matching null and empty datePublished values, and taking nomenclaturallyRelevant into account.
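The intended handling of null versus empty datePublished can be sketched as follows. `Period` is a hypothetical minimal type with two optional years; it is not the cdmlib time period class, and the comparison shown is an assumption about the desired behaviour rather than the actual fix.

```java
import java.util.Objects;

/** Hypothetical sketch: treat a null date and an empty date as the same "no date" value. */
final class DatePublishedSketch {

    /** Minimal stand-in for a time period; either bound may be unknown (null). */
    static final class Period {
        final Integer startYear;
        final Integer endYear;

        Period(Integer startYear, Integer endYear) {
            this.startYear = startYear;
            this.endYear = endYear;
        }

        boolean isEmpty() {
            return startYear == null && endYear == null;
        }
    }

    private DatePublishedSketch() {
        // static helper only
    }

    static boolean sameDate(Period a, Period b) {
        boolean aBlank = a == null || a.isEmpty();
        boolean bBlank = b == null || b.isEmpty();
        if (aBlank || bBlank) {
            // null and empty are interchangeable, but neither matches a real date.
            return aBlank && bBlank;
        }
        return Objects.equals(a.startYear, b.startYear)
                && Objects.equals(a.endYear, b.endYear);
    }
}
```

Normalising both sides to "blank" before comparing avoids the case where an empty period object on one side and a null on the other are reported as different.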


Related issues

Related to Edit - feature request #7800: Parse preliminary RefDetails Closed 09/29/2018
Related to Edit - feature request #9085: Improve deduplication of parsed names Closed 06/19/2020
Related to Edit - bug #1119: [PARSER] Duplicate inreferences created during parsing are NOT merged when data gets saved Duplicate 10/12/2009
Related to Edit - bug #9157: Further improve deduplication of names In Progress 07/17/2020

Associated revisions

Revision 14ec817b (diff)
Added by Andreas Müller over 2 years ago

ref #7800 parse preliminary RefDetails (first start)

Revision 1c4a145c (diff)
Added by Andreas Müller over 2 years ago

ref #7829 ref #7800 remove nomenclaturallyRelevant from Reference matching as it is not used at all

Revision fbe910aa (diff)
Added by Andreas Müller over 2 years ago

ref #7829, ref #7800 improve nom. ref. matcher (also changes the result type => needs adaptation in client Apps)

Revision 2843ea50 (diff)
Added by Andreas Müller over 2 years ago

ref #7829 fix missing getMatchModeName in TaxEditor

Revision 244ab5da (diff)
Added by Andreas Müller over 2 years ago

ref #7829 fix compilation errors after changes for MatchStrategy (preliminary)

History

#1 Updated by Andreas Müller over 2 years ago

#2 Updated by Andreas Müller over 2 years ago

  • Tags set to euro+med

#3 Updated by Andreas Müller over 2 years ago

  • Status changed from New to In Progress

#4 Updated by Andreas Müller over 2 years ago

  • Description updated (diff)

#5 Updated by Andreas Müller over 2 years ago

  • Target version changed from Release 5.4 to Release 5.5
  • % Done changed from 0 to 10

#6 Updated by Andreas Müller about 2 years ago

  • Target version changed from Release 5.5 to Release 5.6

#7 Updated by Andreas Müller about 2 years ago

  • Priority changed from New to Highest
  • Target version changed from Release 5.6 to Release 5.7

#8 Updated by Andreas Müller almost 2 years ago

  • Target version changed from Release 5.7 to Release 5.8

#9 Updated by Andreas Müller about 1 month ago

#10 Updated by Andreas Müller about 1 month ago

  • Status changed from In Progress to Closed
  • % Done changed from 10 to 100

This has been finished (with few exceptions) in #9085.

#11 Updated by Andreas Müller about 1 month ago

  • Related to bug #1119: [PARSER] Duplicate inreferences created during parsing are NOT merged when data gets saved added

#12 Updated by Andreas Müller about 1 month ago

  • Related to bug #9157: Further improve deduplication of names added
