bug #7200: México Distrito Federal spelling and ignore accents in area search - EDIT - EDIT Project Management

closed

Added by Katja Luther about 6 years ago. Updated over 2 years ago.

Status: Worksforme
Priority: Priority14
Assignee: Katja Luther
Category: taxeditor
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:
Severity: normal
Found in Version:

Description

NK:

There is a problematic inconsistency among the level 4 areas in Mexico:

Searching for "Mexico" in the Area Wizard returns all sorts of results, including Mexico Distrito Federal, but not the federal state of Mexico ("Mexico State"). That state is listed as "México State" (with an accent on the e, as it is also listed by TDWG) and is only found if the accent is typed as well. At first I assumed Mexico State had been forgotten. "México Distrito Federal", on the other hand, should correctly be written with an accent (as in TDWG), but it is not.

I think it would make sense to write México Distrito Federal with an accent as well, but the search should definitely ignore accents.

Updated by Andreas Müller about 6 years ago

Be aware that there are a couple of other cases with accents and similar. So we need a general solution for the search.

One solution could be to use Lucene search (which is why I am adding AK). But maybe Hibernate also supports this kind of search. I am actually surprised that the version with the accent is not found, because as far as I know SQL does not distinguish accents. Is there a Java equals check after the SQL search? (I have the same problem in the other direction for deduplication, where I need to distinguish accents but SQL "order by" does not.)

Also we will need an update script.

Updated by Andreas Müller about 6 years ago

Description updated (diff)

Updated by Andreas Kohlbecker about 6 years ago

Problems of this type are usually a question of the collation used in the database.

See for example this thread: https://stackoverflow.com/questions/28863402/mysql-diacritic-insensitive-search-arabic#28891336

This simple test:

select 'México' = 'Mexico'  COLLATE utf8_unicode_ci

returns 1, which means MySQL considers these strings equal.

Maybe the collation of the database in question is not set correctly? It should be utf8_unicode_ci.

Updated by Andreas Müller over 2 years ago

Andreas Kohlbecker wrote:

Problems of this type are usually a question of the collation used in the database.

See for example this thread: https://stackoverflow.com/questions/28863402/mysql-diacritic-insensitive-search-arabic#28891336

This simple test:
select 'México' = 'Mexico'  COLLATE utf8_unicode_ci
returns 1, which means MySQL considers these strings equal.

Maybe the collation of the database in question is not set correctly? It should be utf8_unicode_ci.

IMO this is not a DB/SQL issue (select 'México' = 'Mexico' also returns 1 on cichorieae, so both are considered equal) but an issue with the subsequent equality test in Java.
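
To illustrate the mismatch described here, the following is a minimal sketch, not the actual CDM code: the class and method names are hypothetical, and it assumes the Java-side check is a plain string comparison. It contrasts String.equals with a java.text.Collator set to PRIMARY strength, which ignores accent and case differences in roughly the way the MySQL collation does.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; not part of the CDM library.
class AccentComparisonDemo {

    /** Returns true if the two strings differ at most by accents or letter case. */
    static boolean equalsIgnoringAccents(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        // PRIMARY strength compares base letters only.
        collator.setStrength(Collator.PRIMARY);
        // Treat precomposed and decomposed accented characters alike.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String stored = "M\u00e9xico"; // "México", escaped to be independent of source encoding
        String query = "Mexico";

        // Plain equality distinguishes the accent, so a row returned by SQL would be dropped here.
        System.out.println("String.equals: " + stored.equals(query));
        // The collator-based comparison treats both spellings as the same word.
        System.out.println("Collator (PRIMARY): " + equalsIgnoringAccents(stored, query));
    }
}
```

If the later Java filter used a comparison like equalsIgnoringAccents instead of equals, its result would agree with what the database already returned.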

Updated by Andreas Müller over 2 years ago

Target version changed from Unassigned CDM tickets to Release 5.45

How to implement an accent-insensitive compare or contains on the Java side is explained, for example, here:

https://stackoverflow.com/questions/28833797/compare-strings-ignoring-accented-characters and https://stackoverflow.com/questions/8745660/contains-with-collator/8745778#8745778

So maybe we can implement this for all searches in 5.29?
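
A minimal sketch of such an accent-insensitive "contains", along the lines of the linked answers: the class and method names are hypothetical, and this is one possible approach rather than what was eventually implemented. It decomposes both strings with java.text.Normalizer (NFD), strips the combining marks, lower-cases the result, and then uses an ordinary substring check.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper class; not part of the CDM library.
class AccentInsensitiveSearch {

    /** Decomposes the text (NFD), removes combining marks, and lower-cases it. */
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks that NFD splits off from base letters.
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Returns true if candidate contains query, ignoring accents and letter case. */
    static boolean containsIgnoringAccents(String candidate, String query) {
        return fold(candidate).contains(fold(query));
    }

    public static void main(String[] args) {
        String[] areas = { "M\u00e9xico State", "Mexico Distrito Federal", "Peru" };
        for (String area : areas) {
            // Both spellings of the query are expected to match the two Mexican areas.
            System.out.println(area + ": " + containsIgnoringAccents(area, "Mexico")
                    + " / " + containsIgnoringAccents(area, "M\u00e9xico"));
        }
    }
}
```

Folding both sides keeps the check symmetric: a query typed with an accent finds labels stored without one, and vice versa. Characters that do not decompose under NFD (for example "ø" or "ß") are not covered by this sketch.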

Updated by Andreas Müller over 2 years ago

Status changed from New to Resolved
Assignee changed from Katja Luther to Norbert Kilian
Target version changed from Release 5.45 to Release 5.28

This seems to have been fixed in the meantime. At least searching for "Mexico" now also returns "México State". Norbert, can you please verify?

Updated by Andreas Müller over 2 years ago

Copied to bug #9830: México Distrito Federal needs accent added

Updated by Andreas Müller over 2 years ago

NK:

Yes, searching for "Mexico" also returns "México State".
But it is still spelled "Mexico Distrito Federal" instead of "México Distrito Federal" (with accent).

Updated by Andreas Müller over 2 years ago

Status changed from Resolved to Worksforme
Assignee changed from Norbert Kilian to Katja Luther
Target version deleted (Release 5.28)
% Done changed from 0 to 100

I split the ticket. The correct spelling of "México Distrito Federal" is now handled in #9830, as this needs a model change update script.
