bug #6100

Name parser problems

Added by Andreas Kohlbecker about 2 years ago. Updated almost 2 years ago.

Status:
In Progress
Priority:
Highest
Category:
cdmlib
Target version:
Start date:
09/20/2016
Due date:
% Done:

0%

Severity:
normal
Found in Version:

Description

When running the IAPT import (#6026), a lot of names cannot be parsed.

For a better overview I split the affected names into several files. The name feature chosen for each split may be related to the problem that occurred in the parser.

The names in Name-parsing-problems-Br.ter.txt are a special case which has been checked by Henning, see #6100#note-1 for details.

Name-parsing-problems.txt View (3.21 KB) Andreas Kohlbecker, 09/20/2016 12:54 PM

Name-parsing-problems-basionyms.txt View (3.69 KB) Andreas Kohlbecker, 09/20/2016 12:54 PM

Name-parsing-problems-Br.ter.txt View (809 Bytes) Andreas Kohlbecker, 09/20/2016 12:54 PM

Name-parsing-problems-ex-authors.txt View (4.62 KB) Andreas Kohlbecker, 09/20/2016 12:54 PM

Name-parsing-problems-hybrid.txt View (88 Bytes) Andreas Kohlbecker, 09/20/2016 12:54 PM


Related issues

Copied to Edit - feature request #6428: Handle fungi authors in name parser (and model) New 02/17/2017

Associated revisions

Revision 4c91a094 (diff)
Added by Andreas Müller almost 2 years ago

ref #6100 fix parser for jun. in author string

Revision ce3ed240 (diff)
Added by Andreas Müller almost 2 years ago

ref #6100 fix parser for bis and ter in author string

Revision 7301dfbf (diff)
Added by Andreas Müller almost 2 years ago

ref #6100 fix parser subgen. and forma

Revision 78c82898 (diff)
Added by Andreas Müller almost 2 years ago

ref #6100 fix author issues for la Croix, ´t Hart and C.I

Revision 454d0d3d (diff)
Added by Andreas Müller almost 2 years ago

ref #6100 fix Man in 't Veld and De la Torre (maybe fixed already before) and add some hybrid stuff

Revision e838ffcc (diff)
Added by Andreas Müller almost 2 years ago

ref #6100 fix Sant'Anna parsing

Revision 18ae79e1 (diff)
Added by Andreas Müller almost 2 years ago

ref #6100 comment open issues with accent acute

History

#1 Updated by Andreas Kohlbecker about 2 years ago

  • Description updated (diff)

#2 Updated by Andreas Kohlbecker about 2 years ago

  • Description updated (diff)

Hello Andreas,

this is a correct special case of the Brummitt & Powell standard:

Since R.Br. exists three times, the first one is R.Br., the second R.Br.bis, and the third R.Br.ter.

So everything is fine with the author.

  • R.Br. - Robert Brown 1773-1858
  • R.Br.ter - Robert, of Campster Brown 1842-1895
  • R.Br.bis - Robert, of NZ Brown 1820-1906

Best regards,
Henning
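
To illustrate the bis/ter convention described above, here is a minimal Python sketch. It is not the cdmlib parser code; the pattern and the function name split_ordinal_suffix are hypothetical, and it only assumes that the base abbreviation consists of dotted parts such as "R." and "Br.".

```python
import re

# Hypothetical pattern, not the cdmlib grammar: one or more dotted parts
# (e.g. "R.", "Br."), optionally followed by the suffix "bis" or "ter".
AUTHOR_WITH_ORDINAL = re.compile(r"^((?:[A-Z][a-z]*\.)+)(bis|ter)?$")


def split_ordinal_suffix(author):
    """Return (base abbreviation, suffix or None), or None if nothing matches."""
    match = AUTHOR_WITH_ORDINAL.match(author)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

With this sketch, "R.Br.", "R.Br.bis" and "R.Br.ter" all share the base abbreviation "R.Br." and differ only in the suffix, which is what keeps the three Robert Browns apart.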

#3 Updated by Andreas Müller almost 2 years ago

  • Priority changed from New to Highest
  • Target version changed from Unassigned CDM tickets to Release 4.6

#4 Updated by Andreas Müller almost 2 years ago

Hybrids:

  • Pterocypsela x mansuensis (Hayata) C.I Peng => the problem here is the missing dot after "C.I" ! Is this the correct author abbreviation? The hybrid itself parses correctly.

#5 Updated by Wolf-Henning Kusber almost 2 years ago

Andreas Müller wrote:

Hybrids:
* Pterocypsela x mansuensis (Hayata) C.I Peng => the problem here is the missing dot after "C.I" ! Is this the correct author abbreviation? The hybrid itself parses correctly.

There is no dot missing. Seems to be a short syllable.
For Standard see IPNI: http://www.ipni.org/ipni/idAuthorSearch.do;jsessionid=345D3EDF92F1C4F8619E25BA977409DB?id=15619-1&back_page=%2Fipni%2FeditAdvAuthorSearch.do%3Bjsessionid%3D345D3EDF92F1C4F8619E25BA977409DB%3Ffind_abbreviation%3D%26find_surname%3DPeng%26find_isoCountry%3D%26find_forename%3D%26output_format%3Dnormal
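
If "C.I Peng" is a valid abbreviation, the author pattern has to tolerate a last initial without a dot. A rough Python sketch of such a tolerant pattern follows; the regular expression and the function name are assumptions made for illustration and are not taken from cdmlib.

```python
import re

# Hypothetical pattern: initials followed by dots, where the last initial may
# instead be followed by a plain space (as in "C.I Peng"), then a surname.
INITIALS_AND_SURNAME = re.compile(r"^(?:[A-Z]\.)*[A-Z](?:\. ?| )[A-Z][a-z]+$")


def is_initials_and_surname(author):
    """True if the string looks like initials plus a capitalised surname."""
    return INITIALS_AND_SURNAME.match(author) is not None
```

The sketch accepts "C.I Peng", "P.I Mao", "M.E. Barr" and "J.M.H.Shaw", but still rejects a string such as "CPeng" that has neither a dot nor a space after the initial.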

#6 Updated by Andreas Müller almost 2 years ago

Hybrids (2):

  • Swida x friedlanderi (W.H.Wagner jun.) Holub => the problem here is "jun." in the author

fixed with cdmlib|4c91a094f879af5

#7 Updated by Andreas Müller almost 2 years ago

Ex authors:

  • all the cases seem to follow the format Lycopersicon lycopersicoides A.Child ex (Dunal) J.M.H.Shaw, where the basionym author follows the ex author. Is this somehow covered by the code? I do not know this way of formatting ex authors. I would expect "Lycopersicon lycopersicoides (Dunal) A.Child ex J.M.H.Shaw" or "Lycopersicon lycopersicoides (A.Child ex Dunal) J.M.H.Shaw"

#9 Updated by Andreas Müller almost 2 years ago

R.Br.ter => fixed with ce3ed240325c

#10 Updated by Wolf-Henning Kusber almost 2 years ago

Andreas Müller wrote:

Ex authors:

  • all the cases seem to follow the format Lycopersicon lycopersicoides A.Child ex (Dunal) J.M.H.Shaw, where the basionym author follows the ex author. Is this somehow covered by the code? I do not know this way of formatting ex authors. I would expect "Lycopersicon lycopersicoides (Dunal) A.Child ex J.M.H.Shaw" or "Lycopersicon lycopersicoides (A.Child ex Dunal) J.M.H.Shaw"

Data errors in the original data, see: http://archive.bgbm.org/scripts/ASP/registration/regDetail.asp?Key=2769

#11 Updated by Wolf-Henning Kusber almost 2 years ago

Wolf-Henning Kusber wrote:

Andreas Müller wrote:

Hybrids (2):

  • Swida x friedlanderi (W.H.Wagner jun.) Holub => the problem here is "jun." in the author

fixed with cdmlib|4c91a094f879af5

According to IPNI, the standard author form is without "jun.": Swida x friedlanderi (W.H.Wagner) Holub

http://www.ipni.org/ipni/advPlantNameSearch.do;jsessionid=228A187EEC1CD3D86704AB30DC02DBB8?find_family=&find_genus=Swida&find_species=friedlanderi&find_infrafamily=&find_infragenus=&find_infraspecies=&find_authorAbbrev=&find_includePublicationAuthors=on&find_includePublicationAuthors=off&find_includeBasionymAuthors=on&find_includeBasionymAuthors=off&find_publicationTitle=&find_isAPNIRecord=on&find_isAPNIRecord=false&find_isGCIRecord=on&find_isGCIRecord=false&find_isIKRecord=on&find_isIKRecord=false&find_rankToReturn=all&output_format=normal&find_sortByFamily=on&find_sortByFamily=off&query_type=by_query&back_page=plantsearch

Second comment: if an author were to be marked as "jun.", the standard is "f." with a space before it (if the surname is not abbreviated) or without a space (if the surname is abbreviated). For Linnaeus (L.), the "jun." form is L.f.
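
The spacing rule for "f." can be written down as a small Python sketch. The function name filius_form is hypothetical, and treating "ends with a dot" as "surname is abbreviated" is a simplifying assumption of this example.

```python
def filius_form(author):
    """Append the "f." marker using the spacing rule described above.

    Assumption: an author string ending in "." is an abbreviated surname.
    """
    if author.endswith("."):
        return author + "f."   # abbreviated surname: no space before "f."
    return author + " f."      # unabbreviated surname: space before "f."
```

For example, filius_form("L.") gives "L.f.", while a made-up unabbreviated surname such as "Smith" gives "Smith f.".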

#12 Updated by Andreas Müller almost 2 years ago

Basionyms (I):

Most cases refer to subgen. as infrageneric marker. According to Art. 5A.1. of the code (http://www.iapt-taxon.org/nomen/main.php?page=art5), subg. is the correct marker. However, IPNI seems to use subgen. in all names: http://www.ipni.org/ipni/simplePlantNameSearch.do?find_wholeName=Pleione+subgen.+Scopulorum&output_format=normal&query_type=by_query&back_page=query_ipni.html

From the code I can't tell whether subgen. is merely not recommended or actually forbidden.

#13 Updated by Andreas Müller almost 2 years ago

Basionyms (cont.):

Another problem is the use of forma instead of "f.". I haven't found in the code whether the abbreviation is required, so I guess both are correct?

Some problems are about author names:

  • again C.I Peng and similar P.I Mao
  • and other authors not recognized: ´t Hart and la Croix

There is also one wrong name:

  • Psoroma papuana () Aptroot & Diederich

And some typical fungi authors, e.g.

  • Phoma aliena (Fr.: Fr.) v.d. Aa & Boerema
  • Setulipes splachnoides (Horn.: Fr.) Bon
  • Wegelina barbirostris (Dufour: Fr.) M.E. Barr
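
The colon-separated author pairs in these fungi names could be pulled apart roughly as in the following Python sketch. The pattern and the function name are hypothetical and not taken from cdmlib. In mycological citations the position after the colon is used for the sanctioning author, which is why the sketch returns it as a separate part.

```python
import re

# Hypothetical pattern: "(Author: Second author) Combination authors",
# as in "(Fr.: Fr.) v.d. Aa & Boerema".
COLON_AUTHORSHIP = re.compile(r"^\(([^():]+):\s*([^():]+)\)\s*(.+)$")


def split_colon_authorship(authorship):
    """Return (author, author after colon, combination authors) or None."""
    match = COLON_AUTHORSHIP.match(authorship)
    if match is None:
        return None
    return tuple(part.strip() for part in match.groups())
```

An ordinary basionym authorship such as "(Hayata) C.I Peng" contains no colon inside the parentheses and is therefore left alone (the function returns None).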

#15 Updated by Andreas Müller almost 2 years ago

fixed subgen. and forma recognition by 7301dfbfa163b (even if maybe not code compliant)
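
For illustration only, accepting both spellings of a rank marker can be thought of as a lookup from the variant to one canonical form. The Python sketch below is not the code from the commit; the mapping and the function name normalize_rank_marker are assumptions.

```python
# Assumed mapping from the variants seen in the import data to the
# abbreviations discussed in this ticket; not taken from cdmlib.
RANK_MARKER_VARIANTS = {
    "subgen.": "subg.",
    "forma": "f.",
}


def normalize_rank_marker(name):
    """Replace known rank marker variants where they occur as whole tokens."""
    tokens = name.split(" ")
    return " ".join(RANK_MARKER_VARIANTS.get(token, token) for token in tokens)
```

Because only whole tokens are replaced, an epithet that merely starts with "forma" (for example a made-up "formalis") stays untouched.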

#16 Updated by Wolf-Henning Kusber almost 2 years ago

Andreas Müller wrote:

Basionyms (cont.):

Another problem is the use of forma instead of "f.". I haven't found in the code whether the abbreviation is required, so I guess both are correct?

Some problems are about author names:

  • again C.I Peng and similar P.I Mao
  • and other authors not recognized: ´t Hart and la Croix

There is also one wrong name:

  • Psoroma papuana () Aptroot & Diederich

And some typical fungi authors, e.g.

  • Phoma aliena (Fr.: Fr.) v.d. Aa & Boerema
  • Setulipes splachnoides (Horn.: Fr.) Bon
  • Wegelina barbirostris (Dufour: Fr.) M.E. Barr

Psoroma papuana () Aptroot & Diederich
is a data error for
Psoroma papuana Aptroot & Diederich
http://www.indexfungorum.org/names/NamesRecord.asp?RecordID=442505

#17 Updated by Andreas Müller almost 2 years ago

Author issues (C.I, la Croix and ´t Hart) fixed with 78c828989914a.
Maybe we should automatically map ´t Hart to 't Hart, as ´ is the Unicode character \u00B4 (acute accent) and not the plain apostrophe (but don't do this now)
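
Such a mapping would be a simple character replacement before parsing. A minimal Python sketch follows; the function name normalize_apostrophe is hypothetical, and nothing like this is claimed to exist in cdmlib.

```python
ACUTE_ACCENT = "\u00B4"  # the character used in "´t Hart"
APOSTROPHE = "\u0027"    # the plain ASCII apostrophe used in "'t Hart"


def normalize_apostrophe(author):
    """Map the acute accent to a plain apostrophe; leave everything else."""
    return author.replace(ACUTE_ACCENT, APOSTROPHE)
```

Writing the characters as \u escapes keeps the source file pure ASCII, so the result does not depend on the encoding used when the file is read.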

#18 Updated by Andreas Müller almost 2 years ago

#19 Updated by Andreas Müller almost 2 years ago

remaining issue from basionyms copied to new ticket: #6428

#21 Updated by Andreas Müller almost 2 years ago

Issues from Name-parsing-problems.txt:

  • most issues are also about subgen., forma, C.I Peng, ´t Hart etc.

New issues:

  • Man in 't Veld in: Phytophthora multivesiculata Ilieva, Man in 't Veld, Veenbaas-Rijks & Pieters => this is critical as it looks like the start of a reference, needs to be handled explicitly => fixed with 454d0d3dec6c21
  • Polygala petræa Chodat: is æ a valid character in a name?
  • Rhyncho-Hypnum warmingii Hampe => is Rhyncho-Hypnum a valid genus name?
  • Arthrowallemia R.F. Castańeda, D. Garc¡a & Guarro => Garc¡a is probably incorrect and should be D. García
  • Sorokina caeruleogrisea Spooner, L‘ssøe & Lodge => is L‘ssøe correct? At IPNI I only found a mycologist Læssøe
  • Thymus x herberoi De la Torre, Vicedo, Alonso & Payá => problem in parser, capital D should be recognized => fixed with 454d0d3dec6c21
  • Chaetoceros schüttii var. circinalis Meunier => is ü a valid name character? I know that a diaeresis on i, e, and o is allowed (or at least in use), so it probably is
  • Brunneiapiospora K.D. Hyde, j. Fröhl. & J.E. Taylor => I guess this is a mistake and should be J. Fröhl.
  • Coelosphaerium evidenter-marginatum M.T.P.Azevedo & SantAnna => the standard form is Sant'Anna. Do we want to allow SantAnna? In any case, Sant'Anna is currently not recognized either. => fixed (at least for the standard form) by e838ffccfc7e5a83b
  • Hibiscus tiliaceus 'hastatus' => original spelling not yet fully implemented, often discussed in #3967, #3966 and related tickets

Obviously incorrect:

  • Claviceps citrina Pažoutova, Fučíkovský;, Leyva-Mir & Flieger (;)
  • Helianthemum polifolium sensu auct. (sensu auct. should not be part of name)
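
Data errors of the kinds listed in this ticket (empty parentheses, a stray semicolon, "sensu auct." inside the name) could be flagged before the string reaches the parser. The following Python sketch uses simple substring checks; the function name and the messages are made up for this example.

```python
def find_data_errors(name):
    """Return a list of suspicious features found in a name string."""
    problems = []
    if "()" in name:
        problems.append("empty parentheses")
    if ";" in name:
        problems.append("stray semicolon")
    if "sensu auct." in name:
        problems.append("sensu auct. in name")
    return problems
```

An empty list means that none of these three checks fired. It does not mean that the name is parsable.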

#22 Updated by Andreas Müller almost 2 years ago

  • Status changed from New to In Progress

#23 Updated by Andreas Müller almost 2 years ago

´t Hart still does not seem to work when compiled with Maven. The character should be avoided, as it seems to end up as 2 characters (1 hidden).
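
One possible explanation is an encoding mismatch, but this is an assumption that has not been verified against the Maven build. If the source file is stored as UTF-8 but read with a single-byte encoding, the two UTF-8 bytes of ´ turn into two separate characters. The Python sketch below reproduces that effect; misdecode_utf8 is a hypothetical helper, and cp1252 is only an example of a single-byte platform encoding.

```python
def misdecode_utf8(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8 and decode the bytes with a different encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


original = "\u00B4t Hart"          # 7 characters, starts with the acute accent
garbled = misdecode_utf8(original)  # 8 characters, an extra one in front
```

If this is indeed the cause, writing the character as a \u00B4 escape in the source, or setting the source encoding explicitly in the build, would avoid the extra character.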
