bug #7829

closed

Improve deduplication of parsed names and references

Added by Andreas Müller over 3 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Highest
Category:
cdmlib
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Severity:
normal
Found in Version:
Tags:

Description

The default matching strategies are too strict for name and nomenclatural reference data created by a parser. There are also some mistakes in the defined matching.

In general, we need to use the EQUAL_OR_SECOND_NULL match mode more often, because parsed data is usually less complete than existing data. For example, the authors of an existing reference might be stored with their full names, while full names are usually not available in parsed data. The place published might also be unknown in the parsed data, although this is open to discussion, as it is sometimes part of the parsed string.
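
To illustrate the intended semantics, here is a minimal, self-contained sketch. It is not the cdmlib implementation: the enum SketchMatchMode and its matches method are hypothetical, and it assumes the second argument is the parsed (less complete) value.

```java
import java.util.Objects;

/** Illustrative only; a hypothetical stand-in, not cdmlib's MatchMode. */
enum SketchMatchMode {
    EQUAL,
    EQUAL_OR_SECOND_NULL;

    /**
     * Compares two field values under this mode.
     * Assumption: "second" is the value from the parsed, less complete object.
     */
    boolean matches(Object first, Object second) {
        switch (this) {
            case EQUAL:
                // Strict: both values must be equal (two nulls count as equal).
                return Objects.equals(first, second);
            case EQUAL_OR_SECOND_NULL:
                // Relaxed: a missing second value never blocks a match.
                return second == null || Objects.equals(first, second);
            default:
                throw new IllegalStateException("Unhandled mode: " + this);
        }
    }
}
```

With EQUAL, a persisted full name compared against a missing parsed full name fails to match; with EQUAL_OR_SECOND_NULL it matches. The relaxation is one-sided: a missing first value with a non-null second value still fails.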

We probably need a MatchingStrategyFactory that offers matching strategies specifically for comparing parsed data with richer persisted data.
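
One possible shape for such a factory, as a rough sketch only. Everything here is an assumption for illustration: the class MatchStrategyFactorySketch, its nested Mode enum, the method names, and the field keys are hypothetical and do not describe an existing cdmlib API. A strategy is modelled simply as a map from field name to match mode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical factory; class, method, and field names are illustrative assumptions. */
final class MatchStrategyFactorySketch {

    enum Mode { EQUAL, EQUAL_OR_SECOND_NULL }

    private MatchStrategyFactorySketch() {
        // static factory methods only
    }

    /** Strict per-field modes, e.g. for comparing two persisted references. */
    static Map<String, Mode> strictReferenceStrategy() {
        Map<String, Mode> modes = new LinkedHashMap<>();
        modes.put("title", Mode.EQUAL);
        modes.put("authorFullName", Mode.EQUAL);
        modes.put("placePublished", Mode.EQUAL);
        return modes;
    }

    /**
     * Relaxed modes for parsed data: fields a parser usually cannot fill
     * are allowed to be null on the parsed side.
     */
    static Map<String, Mode> parsedReferenceStrategy() {
        Map<String, Mode> modes = strictReferenceStrategy();
        modes.put("authorFullName", Mode.EQUAL_OR_SECOND_NULL);
        modes.put("placePublished", Mode.EQUAL_OR_SECOND_NULL);
        return modes;
    }
}
```

Deriving the parsed strategy from the strict one keeps the two in sync: only the fields that are deliberately relaxed differ.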

===

There were also incorrect matching rules, such as not matching a null datePublished with an empty one, and taking nomenclaturallyRelevant into account.
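
The null versus empty datePublished case can be sketched as follows. This is a simplified model, not cdmlib code: DatePublishedSketch and its three fields are hypothetical stand-ins for a time period, chosen only to show that a missing date and a date with no content should be treated as the same thing.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for a published-date time period. */
final class DatePublishedSketch {
    final Integer startYear;
    final Integer endYear;
    final String freeText;

    DatePublishedSketch(Integer startYear, Integer endYear, String freeText) {
        this.startYear = startYear;
        this.endYear = endYear;
        // Normalize blank free text to null so "" and null behave the same.
        this.freeText = (freeText == null || freeText.trim().isEmpty())
                ? null : freeText.trim();
    }

    /** True if no part of the date carries information. */
    boolean isEmpty() {
        return startYear == null && endYear == null && freeText == null;
    }

    /** Treats null and empty dates as equal; otherwise compares all parts. */
    static boolean matches(DatePublishedSketch a, DatePublishedSketch b) {
        boolean aEmpty = a == null || a.isEmpty();
        boolean bEmpty = b == null || b.isEmpty();
        if (aEmpty || bEmpty) {
            return aEmpty && bEmpty;
        }
        return Objects.equals(a.startYear, b.startYear)
                && Objects.equals(a.endYear, b.endYear)
                && Objects.equals(a.freeText, b.freeText);
    }
}
```

Under this sketch, a reference without any datePublished matches one whose datePublished exists but has no start, end, or free text, while a filled date still fails to match a missing one.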


Related issues

Related to EDIT - feature request #7800: Parse preliminary RefDetails (Closed, Andreas Müller)
Related to EDIT - feature request #9085: Improve deduplication of parsed names (Closed, Andreas Müller)
Related to EDIT - bug #1119: [PARSER] Duplicate inreferences created during parsing are NOT merged when data gets saved (Duplicate, Andreas Müller)
Related to EDIT - bug #9157: Further improve deduplication of names (In Progress, Andreas Müller)

Actions #1

Updated by Andreas Müller over 3 years ago

Actions #2

Updated by Andreas Müller over 3 years ago

  • Tags set to euro+med
Actions #3

Updated by Andreas Müller over 3 years ago

  • Status changed from New to In Progress
Actions #4

Updated by Andreas Müller over 3 years ago

  • Description updated (diff)
Actions #5

Updated by Andreas Müller over 3 years ago

  • Target version changed from Release 5.4 to Release 5.5
  • % Done changed from 0 to 10
Actions #6

Updated by Andreas Müller over 3 years ago

  • Target version changed from Release 5.5 to Release 5.6
Actions #7

Updated by Andreas Müller about 3 years ago

  • Priority changed from New to Highest
  • Target version changed from Release 5.6 to Release 5.7
Actions #8

Updated by Andreas Müller about 3 years ago

  • Target version changed from Release 5.7 to Release 5.8
Actions #9

Updated by Andreas Müller over 1 year ago

Actions #10

Updated by Andreas Müller over 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 10 to 100

This has been finished (with a few exceptions) in #9085.

Actions #11

Updated by Andreas Müller over 1 year ago

  • Related to bug #1119: [PARSER] Duplicate inreferences created during parsing are NOT merged when data gets saved added
Actions #12

Updated by Andreas Müller over 1 year ago

  • Related to bug #9157: Further improve deduplication of names added