bug #10472

open

Improve simple search performance in dataportal

Added by Andreas Müller 5 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Priority13
Category: cdm-dataportal
Target version:
Start date:
Due date:
% Done: 70%
Estimated time:
Severity: normal
Found in Version:
Tags:

Description

The same search is much faster in the TaxEditor. We need to check:

  • if the same method is used => no, different method, TaxEditor uses UuidAndTitleCache result
  • if thumbnails slow down => yes, they do, see #10418
  • impact of common name search => very important, see #note-12 and #10247
  • if paging and/or ordering is an issue => only for common name search, because the count(*) query takes as long as the query for the instances and therefore doubles the time. In general, paging works as long as pure names are not returned (an option that is currently not available in the dataportal). If the method is not used elsewhere, we should remove this part of the code or otherwise split the code, because merging pure names into the result makes paging much more complex
  • handling for concept relationships
  • ...
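
The doubling described in the paging item (one count(*) query plus one query for the instances) can be avoided when the pager only needs to know whether a further page exists. The following is a minimal sketch of that idea, not the dataportal's or the CDM library's implementation; `fetch_page`, `fake_query` and the sample names are hypothetical and only stand in for the slow search call.

```python
from typing import Callable, List, Sequence, Tuple


def fetch_page(
    query: Callable[[int, int], Sequence[str]],
    page_index: int,
    page_size: int,
) -> Tuple[List[str], bool]:
    """Return one page of results plus a has_next flag, without a count query."""
    # Ask for one row more than needed; if it comes back, another page exists.
    rows = list(query(page_index * page_size, page_size + 1))
    has_next = len(rows) > page_size
    return rows[:page_size], has_next


# Hypothetical stand-in for the expensive search (offset/limit semantics).
NAMES = [f"Hieracium sp. {i}" for i in range(7)]


def fake_query(offset: int, limit: int) -> Sequence[str]:
    return NAMES[offset:offset + limit]


first_page, more = fetch_page(fake_query, page_index=0, page_size=3)
```

The trade-off is that the total number of hits is unknown, so the pager can only offer "previous" and "next" instead of a list of page numbers.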

Also consider using Lucene for simple search as well.


Related issues

Related to EDIT - bug #6195: Suppress footnotes in search results page (Closed, Katja Luther)
Related to EDIT - task #3348: should annotations always be returned for cdm instances? [DISCUSS] (Closed, Andreas Müller)
Related to EDIT - feature request #7771: Format Misapplication search results as MAN, not as accepted taxa (Feedback, Andreas Müller)
Related to EDIT - bug #10247: Common name search is slow (New, Andreas Müller)
Related to EDIT - bug #10418: Thumbnail loading takes very long in taxon search (In Progress, Katja Luther)

Actions #1

Updated by Katja Luther 5 months ago

In the editor, the search response contains only UuidAndTitleCache objects, whereas the search in the dataportal returns complete taxon objects.
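
To illustrate the difference in payload, here is a small sketch with two hypothetical record types: a full `Taxon` carrying collections that are costly to load and serialize, and a slim `UuidAndTitle` pair. These are simplified stand-ins, not the CDM classes, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Taxon:
    """Hypothetical full object; the list fields stand for costly associations."""
    uuid: str
    title_cache: str
    sources: List[str] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class UuidAndTitle:
    """Hypothetical slim result: just enough to render a search hit and link to it."""
    uuid: str
    title_cache: str


def to_slim(taxa: List[Taxon]) -> List[UuidAndTitle]:
    # Keep only identifier and display title; drop everything else.
    return [UuidAndTitle(t.uuid, t.title_cache) for t in taxa]


hits = [Taxon("uuid-1", "Mesophyllum alternans", sources=["s1", "s2"])]
slim_hits = to_slim(hits)
```

In a real service the saving comes from never loading the associations in the first place (for example by selecting only the two columns), not from projecting already loaded objects as this sketch does.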

Actions #2

Updated by Andreas Müller 5 months ago

  • Related to bug #6195: Suppress footnotes in search results page added
Actions #3

Updated by Andreas Müller 5 months ago

  • Status changed from New to In Progress

Which parts of the taxon object are needed? Is it enough to provide a taggedText? What information is needed?
Which taggedText types are needed?

Actions #4

Updated by Andreas Müller 5 months ago

  • Description updated (diff)
Actions #5

Updated by Andreas Müller 5 months ago

For each synonym there is 1 call to fetch the accepted taxon. If this is necessary at all, it should be done in the "find" call.

For each taxon there are calls for sources; for each name there are calls for sources, annotations, registrations and nom. refs. Are they needed at all? If yes, shouldn't they be included in the "find" call via property path?

Or can it be done in 1 call that initializes these fields for all taxa at once, rather than on a per-taxon basis?
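
The per-taxon versus all-at-once question can be made concrete with a small sketch. `FakeSourceService` is a hypothetical stand-in for the web service; it only counts round trips so that both loading strategies can be compared. None of the names below exist in the CDM library or the dataportal.

```python
from typing import Dict, Iterable, List


class FakeSourceService:
    """Hypothetical service that counts how many round trips are made."""

    def __init__(self, sources_by_taxon: Dict[str, List[str]]) -> None:
        self._data = sources_by_taxon
        self.calls = 0

    def sources_for(self, taxon_uuid: str) -> List[str]:
        self.calls += 1  # one round trip per taxon
        return list(self._data.get(taxon_uuid, []))

    def sources_for_all(self, taxon_uuids: Iterable[str]) -> Dict[str, List[str]]:
        self.calls += 1  # one round trip for the whole result page
        return {u: list(self._data.get(u, [])) for u in taxon_uuids}


def load_per_taxon(service: FakeSourceService, uuids: List[str]) -> Dict[str, List[str]]:
    return {u: service.sources_for(u) for u in uuids}


def load_batched(service: FakeSourceService, uuids: List[str]) -> Dict[str, List[str]]:
    return service.sources_for_all(uuids)


DATA = {"a": ["s1"], "b": [], "c": ["s2", "s3"]}
per_taxon_service = FakeSourceService(DATA)
batched_service = FakeSourceService(DATA)
per_taxon_result = load_per_taxon(per_taxon_service, list(DATA))
batched_result = load_batched(batched_service, list(DATA))
```

Both strategies return the same data, but the per-taxon variant needs one call per search hit, while the batched variant needs one call per result page.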

Actions #6

Updated by Andreas Müller 5 months ago

  • Status changed from In Progress to Discussed
  • Assignee changed from Andreas Müller to Katja Luther
  • % Done changed from 0 to 10

Before removing DTOs we should first remove the unnecessary calls and, if needed, adapt the property paths. This should save a lot of time. Therefore I pass the ticket to KL.

Actions #7

Updated by Katja Luther 5 months ago

Andreas Müller wrote in #note-5:

For each synonym there is 1 call to fetch the accepted taxon. If this is necessary at all, it should be done in the "find" call.

For each taxon there are calls for sources; for each name there are calls for sources, annotations, registrations and nom. refs. Are they needed at all? If yes, shouldn't they be included in the "find" call via property path?

Or can it be done in 1 call that initializes these fields for all taxa at once, rather than on a per-taxon basis?

The nomenclatural reference call is there to get the formatted nomenclatural citation string. We could provide this as a tagged text or via a getter method that returns the nomenclatural citation.

Actions #8

Updated by Katja Luther 5 months ago

Do we need registration information in the search result?

Actions #9

Updated by Katja Luther 5 months ago

Andreas Müller wrote in #note-5:

For each synonym there is 1 call to fetch the accepted taxon. If this is necessary at all, it should be done in the "find" call.

For each taxon there are calls for sources; for each name there are calls for sources, annotations, registrations and nom. refs. Are they needed at all? If yes, shouldn't they be included in the "find" call via property path?

Or can it be done in 1 call that initializes these fields for all taxa at once, rather than on a per-taxon basis?

The method already has a parameter $show_annotations; for the search it is now set to false. I avoided the calls for the nom. ref. and the accepted taxon by adding a method that returns the accepted uuid and the nomencl. citation.

Please check whether this is OK.

Actions #10

Updated by Andreas Müller 5 months ago

Katja Luther wrote in #note-8:

Do we need registration information in the search result?

I don't think so. If at all, definitely only in Phycobank. But in Phycobank I couldn't find a simple search result with registration information. The advanced search (which is not available via the UI without explicitly manipulating the browser URL) does not show any either: https://www.phycobank.org/cdm_dataportal/search/results/taxon?ws=portal%2Ftaxon%2Fsearch&query=Mes*&form_build_id=form-nDcRBYQhUwll9gAKdhdCQBJL7bZ5-f99bKGq9JqsEFU&form_id=cdm_dataportal_search_taxon_form_advanced&search%5BpageSize%5D=25&search%5BpageIndex%5D=0&search%5Bareas%5D%5Bareas_filter%5D=&search%5Btree%5D=8c51efb4-3d67-4bea-8f87-4bc1cba1310d&search%5BdoTaxa%5D=1&search%5BdoSynonyms%5D=1

Actions #11

Updated by Katja Luther 5 months ago

Andreas Müller wrote in #note-10:

Katja Luther wrote in #note-8:

Do we need registration information in the search result?

I don't think so. If at all, definitely only in Phycobank. But in Phycobank I couldn't find a simple search result with registration information. The advanced search (which is not available via the UI without explicitly manipulating the browser URL) does not show any either: https://www.phycobank.org/cdm_dataportal/search/results/taxon?ws=portal%2Ftaxon%2Fsearch&query=Mes*&form_build_id=form-nDcRBYQhUwll9gAKdhdCQBJL7bZ5-f99bKGq9JqsEFU&form_id=cdm_dataportal_search_taxon_form_advanced&search%5BpageSize%5D=25&search%5BpageIndex%5D=0&search%5Bareas%5D%5Bareas_filter%5D=&search%5Btree%5D=8c51efb4-3d67-4bea-8f87-4bc1cba1310d&search%5BdoTaxa%5D=1&search%5BdoSynonyms%5D=1

I just noticed that the ws call to get the registrations of a name does not return anything, not even for a name that has a registration. For example:
Mesophyllum alternans (Foslie) Cabioch & M.L.Mendoza in Phycologia 37: 209. 24 Jun 1998 (https://www.phycobank.org/cdm_dataportal/name/86acc7e0-e1a9-4523-991a-9dda642f883a/null/null/)
calls https://api.phycobank.org/phycobank/name/86acc7e0-e1a9-4523-991a-9dda642f883a/registrations.json but the result is empty.
The registration is:
https://www.phycobank.org/cdm_dataportal/registration?identifier=http%3A//phycobank.org/2540

So for now we can remove the registration calls from the search, and we should fix the ws for registrations.

Actions #12

Updated by Andreas Müller 5 months ago

  • Description updated (diff)

The most critical remaining issue is the use of "search by common name". In larger databases like E+M this search takes ~12s on edit-test (search for "Hieracium al*"), while the same search without common names takes 1-2s.
Using an index on DescriptionElementBase improves the situation when searching without * at the beginning, but it is no solution for searches with a leading wildcard like "*ieracium al".
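
Why an index helps prefix searches but not searches with a leading wildcard can be shown with a sorted list standing in for an ordered index. This is a sketch under simplifying assumptions: comparison is case-sensitive and by code point (database collations differ), the function names are hypothetical, and the names are sample data only.

```python
import bisect
from typing import List


def prefix_search(sorted_names: List[str], prefix: str) -> List[str]:
    """Binary-search the sorted list to jump straight to the matching range."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # Upper bound: the prefix followed by the highest possible code point.
    hi = bisect.bisect_left(sorted_names, prefix + "\U0010ffff")
    return sorted_names[lo:hi]


def contains_search(names: List[str], fragment: str) -> List[str]:
    """A leading wildcard gives no starting point in the order: scan every entry."""
    return [n for n in names if fragment in n]


SAMPLE = sorted([
    "Hieracium alpinum",
    "Hieracium albiflorum",
    "Hieracium murorum",
    "Crepis alpina",
])

by_prefix = prefix_search(SAMPLE, "Hieracium al")
by_fragment = contains_search(SAMPLE, "ieracium al")
```

The prefix search touches only a logarithmic number of entries plus the matches, whereas the fragment search has to inspect every entry, which is why it stays slow regardless of the index.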

Actions #13

Updated by Andreas Müller 5 months ago

  • Description updated (diff)
Actions #14

Updated by Andreas Müller 5 months ago

  • Description updated (diff)
  • Status changed from Discussed to In Progress
  • % Done changed from 10 to 40
Actions #15

Updated by Andreas Müller 5 months ago

  • Description updated (diff)
Actions #16

Updated by Andreas Müller 4 months ago

  • Related to task #3348: should annotations always be returned for cdm instances? [DISCUSS] added
Actions #17

Updated by Katja Luther about 2 months ago

  • Related to feature request #7771: Format Misapplication search results as MAN, not as accepted taxa added
Actions #18

Updated by Andreas Müller about 1 month ago

  • Description updated (diff)
Actions #19

Updated by Andreas Müller about 1 month ago

  • Related to bug #10247: Common name search is slow added
Actions #20

Updated by Andreas Müller about 1 month ago

  • Target version changed from Release 5.47 to Release 5.43

We may want to split this issue by creating a follow-up ticket.

Actions #21

Updated by Andreas Müller about 1 month ago

  • Related to bug #10418: Thumbnail loading takes very long in taxon search added
Actions #22

Updated by Andreas Müller about 1 month ago

  • Description updated (diff)
Actions #23

Updated by Andreas Müller about 1 month ago

  • Status changed from In Progress to Resolved
  • Assignee changed from Katja Luther to Andreas Müller
  • % Done changed from 40 to 70
Actions #24

Updated by Andreas Müller about 1 month ago

  • Priority changed from Highest to Priority13
Actions #25

Updated by Andreas Müller about 1 month ago

  • Description updated (diff)