Andreas Müller, 09/29/2009 06:56 PM


NAME PARSER DOCUMENTATION

The taxonomic name parser recognizes and atomizes four main components of a free-text taxonomic reference.

These components are the name, the authorship, the reference and the nomenclatural status.

Not all of them are obligatory.

Each part after the name is separated from its predecessor by the following separators:

|part|separator|example|
|authorship|any whitespace|Abies alba L.|
|reference|comma followed by whitespace OR whitespace + 'in' + whitespace|Abies alba L., Sp. Pl... or Pinus alba in Bull. Soc....|
|nom. status|comma followed by whitespace| |

Valid name texts that are fully recognized by the parser are therefore:

Abies alba (L.) Mill., Sp. Pl.: 105. 1846., nom. illeg. //TODO real example

or

Abies alba (L.) Mill. in Bull. Bot. 3: 99. 1987., nom. illeg. //TODO real example

The name part is obligatory. The authorship part is obligatory only if it is followed by the reference part. The reference part and the status part are optional.
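The separator rules can be illustrated with a minimal Python sketch. It is not the parser's actual implementation: the function name `split_name_text` and the status check (the text after the last comma starts with "nom" or "orth.") are assumptions made for illustration. The sketch only handles botanical names, because the comma before a zoological year would be mistaken for a reference separator.

```python
import re

# Assumption: a status starts with "nom" or "orth." (see the status list).
STATUS_START = re.compile(r"^(nom\b|orth\.)")
# Reference separators: comma + whitespace, or " in ".
REFERENCE_SEPARATOR = re.compile(r", | in ")


def split_name_text(text):
    """Return (name_and_authorship, reference, status); missing parts are None."""
    status = None
    head, sep, tail = text.rpartition(", ")
    # Treat the text after the last comma as a status only if it looks like one.
    if sep and STATUS_START.match(tail):
        text, status = head, tail

    reference = None
    match = REFERENCE_SEPARATOR.search(text)
    if match:
        # Everything before the first separator is name + authorship.
        text, reference = text[:match.start()], text[match.end():]
    return text, reference, status


print(split_name_text("Abies alba (L.) Mill. in Bull. Bot. 3: 99. 1987., nom. illeg."))
```

Splitting name and authorship at whitespace is left out here, because it needs the name syntaxes described in the next section.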

In the following the four parts are described in detail:

Name part

The name part recognizes uninomials, binomials and trinomials. The first word must always start with a capital letter. All other words must be lowercase, except for infrageneric epithets. Only Latin letters are allowed in names (except for the letter XXX).

The name part parser distinguishes six syntaxes.

uninomials

One word starting with a capital letter. As the rank of a uninomial is not unambiguous, the rank is only guessed and a warning is issued to check the rank.

Example: "Cichorieae"

infrageneric names

A capitalized word followed by the infrageneric marker, followed by the infrageneric epithet.

Valid markers are: subgen., subg., sect., subsect., ser., subser., t.infgen.

Example: Desmometopa subg. LitoXXX

species aggregates

Species aggregates are recognized like species, but are followed by a group marker.

Valid markers are: aggr., agg., group

Example: XXX

species

Species names have a genus part (starting with a capital letter) and a species part (lowercase).

Example: Abies alba

Infraspecific names

Infraspecific names have four parts: the genus part, the species part, the infraspecific marker and the infraspecific part. All but the first must not start with a capital letter.

Valid markers are: subsp., convar., var., subvar., f., subf., f.spec., tax.infrasp., tax. infrasp.

Example:

Infraspecific names (old markers)

Some older names (not valid according to the nomenclatural code) use other infraspecific markers.

The recognition of these older names is not yet implemented.
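The five implemented name syntaxes described above can be sketched as regular expressions. This is only an illustration under stated assumptions: the function `classify_name_part` is hypothetical, words are limited to plain ASCII letters, the infrageneric epithet is assumed to be capitalized as in the example, and the rank guessing and warning for uninomials are not modelled.

```python
import re

CAP = r"[A-Z][a-z]+"  # word starting with a capital letter
LOW = r"[a-z]+"       # all-lowercase word

# Marker lists copied from the sections above.
INFRAGENERIC = ["subgen.", "subg.", "sect.", "subsect.", "ser.", "subser.", "t.infgen."]
AGGREGATE = ["aggr.", "agg.", "group"]
INFRASPECIFIC = ["subsp.", "convar.", "var.", "subvar.", "f.", "subf.",
                 "f.spec.", "tax.infrasp.", "tax. infrasp."]


def alternatives(markers):
    """Build a non-capturing regex alternation from literal markers."""
    return "(?:" + "|".join(re.escape(m) for m in markers) + ")"


PATTERNS = [
    ("uninomial", CAP),
    ("infrageneric", rf"{CAP} {alternatives(INFRAGENERIC)} {CAP}"),
    ("species aggregate", rf"{CAP} {LOW} {alternatives(AGGREGATE)}"),
    ("species", rf"{CAP} {LOW}"),
    ("infraspecific", rf"{CAP} {LOW} {alternatives(INFRASPECIFIC)} {LOW}"),
]


def classify_name_part(name):
    """Return the label of the first syntax that matches, or None."""
    normalized = " ".join(name.split())  # collapse runs of whitespace
    for label, pattern in PATTERNS:
        if re.fullmatch(pattern, normalized):
            return label
    return None


print(classify_name_part("Cichorieae"))
print(classify_name_part("Abies alba"))
```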

Authorship part

The authorship part is divided into the original combination authorship and the combination authorship.

The former is enclosed in brackets.

Example (botany): (L.) Mill.

Example (zool): (XXX, 1830) XXX, 1845

Neither part is obligatory on its own, but if any information follows the authorship part, at least one of them (the original combination author or the combination author) must be present.

The parser distinguishes botanical and zoological authors. The latter have a year added, separated by a comma.

Authorship may include single persons and teams. Team members are separated by an '&'.

A placeholder 'al.' may be used for further team members.

Both author types may also include ex-authors, separated by 'ex' or 'ex.'

So further valid author strings are:

Botany: (Greuther & L'Hiver & al. ex Müller & Schmidt)Clark ex Ciardelli
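How such a botanical author string decomposes can be shown with a short Python sketch. The function names are hypothetical, zoological years and the restrictions on special characters are ignored, and the teams on either side of 'ex' are simply returned in textual order without deciding which one is the ex-author.

```python
import re

AUTHORSHIP = re.compile(r"\s*(?:\((?P<original>[^()]*)\))?\s*(?P<combination>[^()]*)")


def split_ex(text):
    """Split 'A & B ex C' into teams in textual order, e.g. [['A', 'B'], ['C']]."""
    if not text or not text.strip():
        return None
    parts = re.split(r"\s+ex\.?\s+", text.strip(), maxsplit=1)
    # Team members are separated by '&'.
    return [[member.strip() for member in part.split("&")] for part in parts]


def parse_authorship(text):
    """Return the bracketed (original) and unbracketed (combination) teams."""
    match = AUTHORSHIP.fullmatch(text)
    if match is None:
        return None
    return {
        "original": split_ex(match.group("original")),
        "combination": split_ex(match.group("combination")),
    }


print(parse_authorship("(L.) Mill."))
```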

The set of allowed special characters such as "'" or "-" is currently beyond the scope of this documentation and will change in the future.

Reference part

The reference part has the following syntax:

{separator}{authorship{,}}{titleEditionVolume}{:}{Detail}{.}{Year}

Zoological new combinations should not have a reference part, as in zoology it is not common to mention the combination reference.

separator

The separator may be a comma {,} or an { in } surrounded by whitespace. The comma indicates a book; the "in" stands for either a journal article or a book section.

authorship

An author is only available for book sections. Articles and book sections are distinguished by examining the first four words that follow the separator. If these words include a comma and the words before the comma are likely to represent an author, the reference is recognized as a book section. Otherwise it is treated as an article. In both cases a warning is issued that the distinction is not reliable.
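The first-four-words heuristic might look roughly like the following Python sketch. The documentation does not say how the parser decides that a string is "likely to represent an author", so `looks_like_author` is a hypothetical stand-in (every word is capitalized or is '&'); abbreviated titles that contain a comma would fool it.

```python
WARNING = "Reference type is only a guess and should be checked."


def looks_like_author(text):
    """Hypothetical check: every word starts with a capital letter or is '&'."""
    words = text.split()
    return bool(words) and all(w == "&" or w[0].isupper() for w in words)


def guess_reference_type(reference):
    """Classify the text that follows an ' in ' separator.

    Returns (type, warning); the warning is issued in both cases.
    """
    first_words = " ".join(reference.split()[:4])
    before, comma, _ = first_words.partition(",")
    if comma and looks_like_author(before):
        return "book section", WARNING
    return "article", WARNING


print(guess_reference_type("Bull. Bot. 3: 99. 1987."))
```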

titleEditionVolume

The titleEditionVolume part includes the title itself as well as an optional edition part and an optional volume part.

The title itself allows most character combinations, but care has to be taken if a ":" is included, as this is the separator for the subsequent detail part. Special characters like '&' and '-' are only allowed if immediately preceded and followed by ordinary characters. Ordinary brackets are allowed.

Edition and volume are separated from the title by whitespace if only one of them exists. If both exist, the latter is separated from the former by a comma. Both are optional, so all of the following four formats are valid:

Sp. Pl.

Sp. Pl. ed. 3

Sp. Pl. ed. 3, 4

Sp. Pl. 4

As can be seen, the edition is recognized by a preceding "ed.", whereas the volume is just a number (or a number followed by another number in brackets, e.g. 4(5)).
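The four formats can be matched with a single regular expression, shown here as a Python sketch. The pattern and the function name are assumptions, not the parser's own grammar, and the sketch is more lenient than the description: it does not insist on the comma between edition and volume and does not check the special-character rules for titles.

```python
import re

TITLE_EDITION_VOLUME = re.compile(
    r"(?P<title>.+?)"                          # title, as short as possible
    r"(?:\s+ed\.\s*(?P<edition>\d+))?"         # optional edition: "ed. 3"
    r"(?:,?\s+(?P<volume>\d+(?:\(\d+\))?))?"   # optional volume: "4" or "4(5)"
)


def parse_title_edition_volume(text):
    """Return a dict with title, edition and volume (None if absent)."""
    match = TITLE_EDITION_VOLUME.fullmatch(text.strip())
    return match.groupdict() if match else None


for example in ["Sp. Pl.", "Sp. Pl. ed. 3", "Sp. Pl. ed. 3, 4", "Sp. Pl. 4"]:
    print(parse_title_edition_volume(example))
```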

The detail part is separated by a colon ":" from the preceding titleEditionVolume part and by "." from the year (botanical names only).
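A small Python sketch of this split for botanical references follows. It assumes a four-digit year and splits at the first ":" it finds, which is why a colon inside the title needs care; the function name is hypothetical.

```python
import re

REFERENCE_BODY = re.compile(
    r"(?P<title_edition_volume>.+?):\s*"   # everything before the first ':'
    r"(?P<detail>.+?)\.\s*"                # detail, terminated by '.'
    r"(?P<year>\d{4})\.?"                  # assumption: four-digit year
)


def split_detail_and_year(body):
    """Split 'Title: detail. year' into its three pieces, or return None."""
    match = REFERENCE_BODY.fullmatch(body.strip())
    return match.groupdict() if match else None


print(split_detail_and_year("Sp. Pl.: 105. 1846."))
```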

Several typical kinds of detail information are recognized, such as plain page numbers (e.g. 345), page ranges (e.g. 345-348), page numbers preceded by

Nomenclatural status

The nomenclatural status is separated from the preceding text by a comma ",". Valid values for a status are at the moment:

nom. superfl., nom. nud., nom. illeg., nom. inval., nom. cons., nom. alternativ., nom. subnud., nom. rej., nom. prop., nom. provis., orth. var.
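A status check against this list could be sketched in Python as follows. The function name is hypothetical, and ignoring trailing periods and extra whitespace is a design choice of this sketch, not documented parser behaviour.

```python
# Status values taken from the list above.
VALID_STATUSES = (
    "nom. superfl.", "nom. nud.", "nom. illeg.", "nom. inval.", "nom. cons.",
    "nom. alternativ.", "nom. subnud.", "nom. rej.", "nom. prop.",
    "nom. provis.", "orth. var.",
)


def normalize_status(text):
    """Collapse whitespace and drop trailing periods before comparing."""
    return " ".join(text.split()).rstrip(".")


NORMALIZED_STATUSES = {normalize_status(status) for status in VALID_STATUSES}


def is_valid_status(text):
    """Return True if the text is one of the listed status values."""
    return normalize_status(text) in NORMALIZED_STATUSES


print(is_valid_status("nom. illeg."))
```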

where the authorship part is optional for

The reference part is separated from the authorship part by either a comma or by " in ". A comma refers to a book whose author is the author of the name.

A reference part starting with " in " refers either to a book section or to a journal article. The default here is an article. But if there is a comma in the title part, preceded by a text string that is likely to represent an author, the reference is considered to be a book section. In both cases a warning is given that the reference type has to be checked.

The whole reference part consists of a separator, an optional authorship, a title, a volume, an edition, an obligatory detail, and a year, which is obligatory for botanical names.

Updated by Andreas Müller almost 15 years ago · 9 revisions