bug #6100
open
Name parser problems
Added by Andreas Kohlbecker almost 8 years ago. Updated about 3 years ago.
0%
Description
When running the IAPT import (#6026), a lot of names cannot be parsed.
For a better overview I split the affected names into a couple of files. The name feature chosen for each split may be related to the problem that occurred in the parser.
The names in Name-parsing-problems-Br.ter.txt are a special case which has been checked by Henning; see #6100#note-1 for details.
Files
Name-parsing-problems.txt (3.21 KB) | Andreas Kohlbecker, 09/20/2016 12:54 PM
Name-parsing-problems-basionyms.txt (3.69 KB) | Andreas Kohlbecker, 09/20/2016 12:54 PM
Name-parsing-problems-Br.ter.txt (809 Bytes) | Andreas Kohlbecker, 09/20/2016 12:54 PM
Name-parsing-problems-ex-authors.txt (4.62 KB) | Andreas Kohlbecker, 09/20/2016 12:54 PM
Name-parsing-problems-hybrid.txt (88 Bytes) | Andreas Kohlbecker, 09/20/2016 12:54 PM
Related issues
Updated by Andreas Kohlbecker almost 8 years ago
- Description updated (diff)
Hello Andreas,
this is a correct special case of the Brummitt & Powell standard:
Since R.Br. occurs three times, the first is R.Br., the second is R.Br.bis, and the third is R.Br.ter.
So everything is fine with the author.
- R.Br. - Robert Brown 1773-1858
- R.Br.ter - Robert, of Campster Brown 1842-1895
- R.Br.bis - Robert, of NZ Brown 1820-1906
Best regards,
Henning
Updated by Andreas Müller over 7 years ago
- Priority changed from New to Highest
- Target version changed from Unassigned CDM tickets to Release 4.6
Updated by Andreas Müller over 7 years ago
Hybrids:
- Pterocypsela x mansuensis (Hayata) C.I Peng => the problem here is the missing dot after "C.I" ! Is this the correct author abbreviation? The hybrid itself parses correctly.
Updated by Wolf-Henning Kusber over 7 years ago
Andreas Müller wrote:
Hybrids:
- Pterocypsela x mansuensis (Hayata) C.I Peng => the problem here is the missing dot after "C.I" ! Is this the correct author abbreviation? The hybrid itself parses correctly.
There is no dot missing. Seems to be a short syllable.
For Standard see IPNI: http://www.ipni.org/ipni/idAuthorSearch.do;jsessionid=345D3EDF92F1C4F8619E25BA977409DB?id=15619-1&back_page=%2Fipni%2FeditAdvAuthorSearch.do%3Bjsessionid%3D345D3EDF92F1C4F8619E25BA977409DB%3Ffind_abbreviation%3D%26find_surname%3DPeng%26find_isoCountry%3D%26find_forename%3D%26output_format%3Dnormal
Updated by Andreas Müller over 7 years ago
Hybrids (2):
- Swida x friedlanderi (W.H.Wagner jun.) Holub => the problem here is "jun." in the author
fixed with cdmlib|4c91a094f879af5
Updated by Andreas Müller over 7 years ago
Ex authors:
- All the cases seem to follow the format Lycopersicon lycopersicoides A.Child ex (Dunal) J.M.H.Shaw, where the basionym author follows the ex author. Is this somehow covered by the code? I do not know this way of formatting ex authors. I would expect "Lycopersicon lycopersicoides (Dunal) A.Child ex J.M.H.Shaw" or "Lycopersicon lycopersicoides (A.Child ex Dunal) J.M.H.Shaw".
Updated by Wolf-Henning Kusber over 7 years ago
Andreas Müller wrote:
Hybrids (2):
- Swida x friedlanderi (W.H.Wagner jun.) Holub => the problem here is "jun." in the author
fixed with cdmlib|4c91a094f879af5
According to IPNI, the standard author form is without "jun.": Swida x friedlanderi (W.H.Wagner) Holub
Updated by Wolf-Henning Kusber over 7 years ago
Andreas Müller wrote:
Ex authors:
- All the cases seem to follow the format Lycopersicon lycopersicoides A.Child ex (Dunal) J.M.H.Shaw, where the basionym author follows the ex author. Is this somehow covered by the code? I do not know this way of formatting ex authors. I would expect "Lycopersicon lycopersicoides (Dunal) A.Child ex J.M.H.Shaw" or "Lycopersicon lycopersicoides (A.Child ex Dunal) J.M.H.Shaw".
Data errors in the original data, see: http://archive.bgbm.org/scripts/ASP/registration/regDetail.asp?Key=2769
Updated by Wolf-Henning Kusber over 7 years ago
Wolf-Henning Kusber wrote:
Andreas Müller wrote:
Hybrids (2):
- Swida x friedlanderi (W.H.Wagner jun.) Holub => the problem here is "jun." in the author
fixed with cdmlib|4c91a094f879af5
According to IPNI, the standard author form is without "jun.": Swida x friedlanderi (W.H.Wagner) Holub
Second comment: if an author is a "jun.", the standard is "f.", with a space before it if the surname is not abbreviated, or without a space if the surname is abbreviated. For Linnaeus (L.), the "jun." is L.f.
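The spacing rule described in this comment can be sketched as a small helper. This is an illustrative Python sketch, not cdmlib code: the function name `filial_form` is hypothetical, and it assumes that "abbreviated" simply means the author string ends with a dot. "Surname" is a placeholder, not a real author.

```python
def filial_form(author: str) -> str:
    """Append the filial marker "f." to a standard author form.

    Assumption: an author string ending in "." is an abbreviated surname.
    """
    author = author.strip()
    if author.endswith("."):
        # Abbreviated surname: no space before the marker ("L." -> "L.f.")
        return author + "f."
    # Unabbreviated surname: the marker is separated by a space
    return author + " f."


print(filial_form("L."))       # L.f.
print(filial_form("Surname"))  # Surname f.
```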
Updated by Andreas Müller over 7 years ago
Basionyms (I):
Most cases use subgen. as the infrageneric marker. According to Art. 5A.1. of the code (http://www.iapt-taxon.org/nomen/main.php?page=art5), subg. is the correct marker. However, IPNI seems to use subgen. in all names: http://www.ipni.org/ipni/simplePlantNameSearch.do?find_wholeName=Pleione+subgen.+Scopulorum&output_format=normal&query_type=by_query&back_page=query_ipni.html
From the code I can't tell whether subgen. is merely not recommended or actually forbidden.
Updated by Andreas Müller over 7 years ago
Basionyms (cont.):
Another problem is the use of forma instead of "f.". I haven't found in the code whether the abbreviation is required, so I guess both are correct?
Some problems are about author names:
- again C.I Peng and similar P.I Mao
- and other authors not recognized: ´t Hart and la Croix
There is also one wrong name:
- Psoroma papuana () Aptroot & Diederich
And some typical fungi authors, e.g.
- Phoma aliena (Fr.: Fr.) v.d. Aa & Boerema
- Setulipes splachnoides (Horn.: Fr.) Bon
- Wegelina barbirostris (Dufour: Fr.) M.E. Barr
Updated by Wolf-Henning Kusber over 7 years ago
't Hart is standard for ´t Hart
http://www.ipni.org/ipni/idAuthorSearch.do?id=10539-1&back_page=%2Fipni%2FeditAdvAuthorSearch.do%3Ffind_abbreviation%3D%26find_surname%3D*Hart%26find_isoCountry%3D%26find_forename%3D%26output_format%3Dnormal
P.I Mao is standard. I = Yi, but I is the standard form.
la Croix is standard according to IPNI:
http://www.ipni.org/ipni/idAuthorSearch.do?id=21929-1&back_page=%2Fipni%2FeditAdvAuthorSearch.do%3Ffind_abbreviation%3D%26find_surname%3DCroix%26find_isoCountry%3D%26find_forename%3D%26output_format%3Dnormal
Updated by Andreas Müller over 7 years ago
fixed subgen. and forma recognition by 7301dfbfa163b (even if maybe not code compliant)
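One way to accept both spellings of a rank marker is to map the long variants to a single canonical form before parsing. The Python sketch below only illustrates that idea and is not the change made in 7301dfbfa163b; the table `RANK_MARKER_VARIANTS` and the function `normalize_rank_markers` are hypothetical names, and "Genus species forma alba" is a made-up placeholder name.

```python
import re

# Hypothetical table: long marker variants seen in the import -> short forms.
RANK_MARKER_VARIANTS = {
    "subgen.": "subg.",
    "forma": "f.",
}

# Match a variant only when it is a whole whitespace-delimited token,
# so that words merely starting with "forma" are left alone.
_VARIANT_RE = re.compile(
    r"(?<!\S)("
    + "|".join(re.escape(v) for v in RANK_MARKER_VARIANTS)
    + r")(?!\S)"
)


def normalize_rank_markers(name: str) -> str:
    """Replace known long rank markers with their short forms."""
    return _VARIANT_RE.sub(lambda m: RANK_MARKER_VARIANTS[m.group(1)], name)


print(normalize_rank_markers("Pleione subgen. Scopulorum"))
print(normalize_rank_markers("Genus species forma alba"))
```

Note that mapping forma to "f." makes the marker look the same as the filial "f." used in author strings, so a real parser would still have to tell the two apart by position.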
Updated by Wolf-Henning Kusber over 7 years ago
Andreas Müller wrote:
Basionyms (cont.):
Another problem is the use of forma instead of "f.". I haven't found in the code whether the abbreviation is required, so I guess both are correct?
Some problems are about author names:
- again C.I Peng and similar P.I Mao
- and other authors not recognized: ´t Hart and la Croix
There is also one wrong name:
- Psoroma papuana () Aptroot & Diederich
And some typical fungi authors, e.g.
- Phoma aliena (Fr.: Fr.) v.d. Aa & Boerema
- Setulipes splachnoides (Horn.: Fr.) Bon
- Wegelina barbirostris (Dufour: Fr.) M.E. Barr
Psoroma papuana () Aptroot & Diederich
is a data error for
Psoroma papuana Aptroot & Diederich
http://www.indexfungorum.org/names/NamesRecord.asp?RecordID=442505
Updated by Andreas Müller over 7 years ago
Author issues (C.I, la Croix and ´t Hart) fixed with 78c828989914a.
Maybe we should automatically map ´t Hart to 't Hart, as ´ is the Unicode character \u00B4 rather than a plain apostrophe (but don't do this now).
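The mapping suggested here could look roughly like the following Python sketch. It is only an illustration of the idea, not cdmlib code: the names `APOSTROPHE_LOOKALIKES` and `normalize_apostrophes` are hypothetical, and which look-alike characters to include besides \u00B4 is an assumption.

```python
# Apostrophe look-alikes mapped to the plain ASCII apostrophe (U+0027).
# Assumption: apart from U+00B4, the entries are examples only.
APOSTROPHE_LOOKALIKES = {
    "\u00B4": "'",  # ACUTE ACCENT, as in the ´t Hart input
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u0060": "'",  # GRAVE ACCENT
}
_TABLE = str.maketrans(APOSTROPHE_LOOKALIKES)


def normalize_apostrophes(author: str) -> str:
    """Replace apostrophe look-alikes with a plain apostrophe."""
    return author.translate(_TABLE)


print(normalize_apostrophes("\u00B4t Hart"))  # 't Hart
```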
Updated by Andreas Müller over 7 years ago
- Copied to feature request #6428: Handle fungi authors in name parser (and model) added
Updated by Andreas Müller over 7 years ago
remaining issue from basionyms copied to new ticket: #6428
Updated by Andreas Müller over 7 years ago
- Related to task #3967: Rethink original spelling strategy added
Updated by Andreas Müller over 7 years ago
Issues from Name-parsing-problems.txt:
- most issues are also about subgen., forma, C.I Peng, ´t Hart etc.
New issues:
- Man in 't Veld in: Phytophthora multivesiculata Ilieva, Man in 't Veld, Veenbaas-Rijks & Pieters => this is critical as it looks like the start of a reference, needs to be handled explicitly => fixed with 454d0d3dec6c21
- Polygala petræa Chodat => is æ a valid character in a name?
- Rhyncho-Hypnum warmingii Hampe => is Rhyncho-Hypnum a valid genus name?
- Arthrowallemia R.F. Castańeda, D. Garc¡a & Guarro => Garc¡a is probably incorrect and should be D. García
- Sorokina caeruleogrisea Spooner, L‘ssøe & Lodge => is L‘ssøe correct? At IPNI I only found a mycologist Læssøe
- Thymus x herberoi De la Torre, Vicedo, Alonso & Payá => problem in parser, capital D should be recognized => fixed with 454d0d3dec6c21
- Chaetoceros schüttii var. circinalis Meunier => is ü a valid name character? I know from i, e, and o that diaeresis is allowed (or at least in use), so probably it is
- Brunneiapiospora K.D. Hyde, j. Fröhl. & J.E. Taylor => I guess this is a mistake and should be J. Fröhl.
- Coelosphaerium evidenter-marginatum M.T.P.Azevedo & Sant Anna => standard form is Sant'Anna, do we want to allow Sant Anna? Anyway, Sant'Anna is also currently not recognized => fixed (at least for standard form) by e838ffccfc7e5a83b
- Hibiscus tiliaceus 'hastatus' => original spelling not yet fully implemented, often discussed in #3967, #3966 and related tickets
obviously incorrect:
- Claviceps citrina Pažoutova, Fučíkovský;, Leyva-Mir & Flieger (;)
- Helianthemum polifolium sensu auct. (sensu auct. should not be part of name)
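The garbled author names in this comment (Garc¡a, L‘ssøe, and possibly Castańeda) are consistent with text written in one single-byte encoding and read in another. The Python sketch below reproduces each substitution. The code page pairs are guesses that happen to produce the same characters, not a confirmed diagnosis of the IAPT data, and the helper name `misread` is made up for this example.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one code page and decode the bytes in another."""
    return text.encode(written_as).decode(read_as)


# í written as CP850 and read as Windows-1252 gives ¡, as in "Garc¡a"
print(misread("Garc\u00eda", "cp850", "cp1252"))

# æ written as CP850 and read as Windows-1252 gives ‘, as in "L‘ssøe"
print(misread("\u00e6", "cp850", "cp1252"))

# ñ written as Latin-1 and read as ISO-8859-2 gives ń, as in "Castańeda"
print(misread("Casta\u00f1eda", "latin-1", "iso8859-2"))
```

The ø in L‘ssøe survived intact, which the same CP850 mix-up would not explain, so the actual history of these strings is probably more complicated than a single wrong code page.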
Updated by Andreas Müller over 7 years ago
- Status changed from New to In Progress
Updated by Andreas Müller over 7 years ago
´t Hart still does not seem to work when compiled with Maven. It should be avoided, as it seems to be 2 characters (1 hidden).
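One possible explanation for the two characters, offered as an assumption rather than a confirmed diagnosis: U+00B4 takes two bytes in UTF-8, so a source file saved as UTF-8 but read with a single-byte encoding yields two characters instead of one. The Python sketch below shows the effect with Windows-1252 as an example; which encoding the Maven build actually used is not known from this ticket.

```python
acute = "\u00B4"  # ACUTE ACCENT, as in "´t Hart"

# In UTF-8 the character is stored as two bytes.
utf8_bytes = acute.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xb4'

# Reading those bytes with a single-byte encoding gives two characters.
as_cp1252 = utf8_bytes.decode("cp1252")
print(len(as_cp1252))                     # 2
print([hex(ord(c)) for c in as_cp1252])   # ['0xc2', '0xb4']
```

If this is the cause, declaring the source encoding in the build (Maven's `project.build.sourceEncoding` property) or writing the character as the escape `\u00B4` in the Java source would avoid the problem.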
Updated by Andreas Müller almost 4 years ago
- Target version changed from Release 4.6 to Release 5.18
Updated by Andreas Müller almost 4 years ago
- Related to task #9014: Unparsable name strings added
Updated by Andreas Müller over 3 years ago
- Target version changed from Release 5.18 to Release 5.19
Updated by Andreas Müller over 3 years ago
- Target version changed from Release 5.19 to Release 5.21
Updated by Andreas Müller over 3 years ago
- Target version changed from Release 5.21 to Release 5.22
Updated by Andreas Müller over 3 years ago
- Target version changed from Release 5.22 to Release 5.25
Updated by Andreas Müller about 3 years ago
- Target version changed from Release 5.25 to Release 5.51