Andreas Kohlbecker, 01/24/2012 01:02 PM
EDIT - Freetext Search¶
Hibernate Search brings the power of full text search engines to the persistence domain model by combining Hibernate Core with the capabilities of the Apache Lucene search engine. It therefore looks like a good fit for free text searches in the CDM Library. However, due to the architecture of some parts of the EDIT platform, there are caveats and problems that have to be considered carefully before deciding to make full use of Hibernate Search / Lucene in the CDM Library.
Benefits¶
With plain HQL it is possible to search, for example, for text snippets contained in TextData.multilanguageText using LIKE '%ext snip%'. The situation gets more complex for specific use cases, such as those described in the following tickets: implement advanced search, ALL. Image search, Implement search for multiple areas. The image search, for example, would require a LIKE search over all of the following fields simultaneously, and such a query is unlikely to perform well:
Media.title
Media.description
Media.representations.parts.uri
Description.title
Description.taxon
Description.elements.multilanguageText
Description.elements.name
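To make the size of such a query concrete, the following minimal sketch assembles the LIKE predicate for the fields listed above. The class and method names are hypothetical and not part of the CDM Library, and the output is only an illustration of the predicate: the dotted paths are copied from the list as-is, whereas a real HQL query would need explicit joins for the collection-valued associations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not part of the CDM Library. */
class LikePredicateSketch {

    /**
     * Joins one case-insensitive LIKE condition per path with "or".
     * The result is a predicate fragment, not a complete or valid HQL query.
     */
    static String likePredicate(List<String> paths, String paramName) {
        return paths.stream()
                .map(p -> "lower(" + p + ") like :" + paramName)
                .collect(Collectors.joining("\n   or "));
    }

    public static void main(String[] args) {
        // Field paths taken from the list above.
        List<String> paths = Arrays.asList(
                "Media.title",
                "Media.description",
                "Media.representations.parts.uri",
                "Description.title",
                "Description.taxon",
                "Description.elements.multilanguageText",
                "Description.elements.name");
        System.out.println(likePredicate(paths, "q"));
    }
}
```

Each of the seven conditions is a pattern match with a leading wildcard, and several of the paths cross associations, which is where the joins come from.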
Hibernate Search / Lucene allows building documents that combine multiple fields distributed across the object graph. These documents are indexed and can therefore be searched very quickly without the need to join multiple tables. Further benefits over plain HQL are:
normalization
- lowercase/uppercase - 'lactuca' finds 'Lactuca'
- unicode (diacritics) - 'Angstrom' finds 'Ångström'
- removing special characters from words - 'donalds' finds 'donald's'
real term-based free text search instead of a phrase-based search with wildcards, such as 'term_1 te*rm_2 ter', in HQL
can speed up existing find*() methods in the CDM Library
Lucene can handle spatial searches
retrieve information from the Lucene index (titleCache, UUID, etc.) without the need to initialize any CDM entity
...
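The normalization and term-based matching mentioned in the list above can be illustrated with a small sketch that uses only the Java standard library. It is not the Lucene or Hibernate Search API (Lucene performs this kind of processing in its analyzers), and the class and method names are hypothetical. The idea is that the indexed text and the query are normalized the same way before terms are compared.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration; Lucene does this in its analyzers, not with this code. */
class NormalizationSketch {

    /** Strips diacritics and special characters, then lowercases the text. */
    static String normalize(String input) {
        // NFD splits e.g. "Å" into "A" plus a combining ring mark.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}+", "")                 // drop combining marks
                .replaceAll("[^\\p{L}\\p{N}\\s]", "")      // drop apostrophes, dots, etc.
                .toLowerCase(Locale.ROOT);
    }

    /** Splits normalized text into a set of terms on whitespace. */
    static Set<String> terms(String input) {
        Set<String> result = new HashSet<>(
                Arrays.asList(normalize(input).trim().split("\\s+")));
        result.remove(""); // empty input yields one empty token
        return result;
    }

    /** True if every query term occurs in the text, in any order. */
    static boolean containsAllTerms(String query, String text) {
        return terms(text).containsAll(terms(query));
    }

    public static void main(String[] args) {
        System.out.println(normalize("Lactuca"));
        System.out.println(normalize("\u00C5ngstr\u00F6m")); // "Ångström"
        System.out.println(normalize("donald's"));
        System.out.println(containsAllTerms("lactuca sativa", "Sativa Lactuca L."));
    }
}
```

Because matching works on whole terms, the order of the terms in the query does not matter, which a single LIKE pattern cannot express.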
Open questions¶
- Is the index for type A always updated when an associated object D, reached via a path like A.B.C.D, has been changed?
Projects which require Hibernate search¶
- Vibrant: Task 2 - CDM Datastore as a ViBRANT Index ( ... allows humans to perform full text searches ...)
Problems¶
Side effects¶
Solutions¶
Updated by Andreas Kohlbecker about 12 years ago · 2 revisions