E-Flora Markup Documentation


Imports are generally treated as temporary and are repeated until the quality is good enough that only a few things are left to be done manually. Most of these remaining tasks cannot be automated.


Features may either be explicitly marked up as features, or be retrieved or double-checked from headings and subheadings.


The order of features needs to be stored so that we can later construct the best feature tree and check for cases where the created feature tree causes problems.


Descriptions are imported redundantly.

  1. First we import the full text with feature description.

  2. Characters (sensu lato) are then imported separately, with order information attached.

  3. The same is true for sub-characters.

  4. For characters we try to use fixed UUIDs defined in the MarkupTransformer class.
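
The redundant import described above can be sketched roughly as follows. This is a minimal illustration, not the actual import code: the record classes, `import_description`, and `character_uuid` are hypothetical names, and deriving the UUID from the character name with `uuid5` is only a stand-in for the fixed UUIDs defined in the MarkupTransformer class.

```python
import uuid
from dataclasses import dataclass, field

# Assumption: a fixed namespace so that the same character name always
# yields the same UUID. The real fixed UUIDs live in MarkupTransformer.
_NAMESPACE = uuid.UUID(int=0)


def character_uuid(name: str) -> uuid.UUID:
    """Return a stable UUID for a character name (illustrative only)."""
    return uuid.uuid5(_NAMESPACE, name.strip().lower())


@dataclass
class CharacterRecord:
    name: str
    text: str
    order: int                      # position among its siblings in the source
    feature_uuid: uuid.UUID
    sub_characters: list = field(default_factory=list)


@dataclass
class DescriptionRecord:
    full_text: str                  # step 1: the complete description text
    characters: list = field(default_factory=list)


def import_description(full_text, characters):
    """characters: list of (name, text, [(sub_name, sub_text), ...])."""
    record = DescriptionRecord(full_text=full_text)
    for order, (name, text, subs) in enumerate(characters):
        # step 2: each character is stored again, with its order
        char = CharacterRecord(name, text, order, character_uuid(name))
        for sub_order, (sub_name, sub_text) in enumerate(subs):
            # step 3: sub-characters keep their own order within the parent
            char.sub_characters.append(
                CharacterRecord(sub_name, sub_text, sub_order,
                                character_uuid(sub_name)))
        record.characters.append(char)
    return record
```

Keeping the order as an explicit integer on every record means a feature tree can later be built, or checked for conflicts, without re-parsing the source.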

For now, only the full text is shown in the data portal. Later, when all data is available, we will try to create a feature tree and build the descriptions from the individual features. This needs further research into where it creates problems, whether from ordering or from other issues.


Keys are handled during import by trying to match the couplets to existing taxa. This is done by expanding the taxon names and then matching them. Both taxa and couplets are stored during parsing in a XXX, and whenever a new element is created, it is checked whether the partner element already exists.

At the end of the import, all couplets that did not match any taxon are logged.
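
As a rough sketch of this matching strategy: the class below stands in for the store the text leaves unnamed ("XXX"). All names here (`PartnerRegistry`, `expand_name`, the logger name) are hypothetical, and the name expansion only handles an abbreviated genus, which is an assumption about what "expanding" involves.

```python
import logging

logger = logging.getLogger("key_import")  # hypothetical logger name


def expand_name(name, genus):
    """Expand an abbreviated genus, e.g. 'I. batatas' -> 'Ipomoea batatas'."""
    parts = name.split()
    if parts and parts[0].endswith(".") and genus.startswith(parts[0][:-1]):
        parts[0] = genus
    return " ".join(parts)


class PartnerRegistry:
    """Holds taxa and couplets seen so far and pairs them up on arrival."""

    def __init__(self):
        self.taxa = {}       # expanded name -> taxon
        self.couplets = {}   # expanded name -> couplets still waiting
        self.matches = []    # (couplet, taxon) pairs

    def add_taxon(self, name, taxon):
        self.taxa[name] = taxon
        # A couplet may have been parsed before its taxon: match it now.
        for couplet in self.couplets.pop(name, []):
            self.matches.append((couplet, taxon))

    def add_couplet(self, name, couplet, genus):
        key = expand_name(name, genus)
        if key in self.taxa:
            self.matches.append((couplet, self.taxa[key]))
        else:
            self.couplets.setdefault(key, []).append(couplet)

    def log_unmatched(self):
        """Log and return the names of couplets that never found a taxon."""
        unmatched = sorted(self.couplets)
        for name in unmatched:
            logger.warning("Couplet not matched to any taxon: %s", name)
        return unmatched
```

Because both sides check for their partner on creation, the result does not depend on whether the key or the taxon treatment comes first in the source.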

After Import

There is a list of TODOs for after the final imports at #5540.

Flora Malesiana

FM ser. 1 volumes 2 & 3 were never published. They may only exist as handwritten notes in an archive.

FM ser. 1 volume 1 does not actually contain any treatments; instead it contains the first part of the Cyclopedia of Collectors and many general chapters about the Malesian area (in 1950). ... it's really only interesting for historians.


Taxon Names


Q: The rank of Ipomoea x multifida (RAFIN.) SHINNERS is given as “hybrid”. Hybrid is actually not a rank but a kind of status. The rank of this name is species. Hybrid should not be allowed as a term for rank.

Line 110364 is a bit different, as it is a hybrid formula, not a hybrid name. But the rank is still species. Hybrid formulas consist of two names combined with an x, while hybrid names just have an x somewhere within the name, indicating the rank at which they are hybrids.

I wonder why we do not have structured markup for hybrid names, as this should not be difficult to define.

Have we ever discussed this?

A: The reason they are all marked up with fullName is that otherwise they don't match what is in the keys. With regard to changing the markup so that hybrid is not a term for rank, that's possible. So I add a Boolean class "hybrid" and use the proper rank for ?

A2: Yes. This would be great. If possible, we could even distinguish hybrid formulas from hybrid names. But as there are not many hybrids, I will probably handle them manually anyway. The only important issue is to have the rank available, which makes it easier to build the classification tree correctly from the very beginning.
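
The distinction discussed above could be sketched like this. It is a heuristic under stated assumptions, not the agreed markup: the hybrid sign is assumed to be a separate whitespace-delimited token, a formula is assumed to have a full binomial (at least two tokens) before the sign, and the attribute names returned by `markup_attributes` are hypothetical. Intergeneric formulas with a single-word name before the sign would be misclassified as hybrid names.

```python
from enum import Enum


class HybridKind(Enum):
    NONE = "none"
    NAME = "hybrid name"
    FORMULA = "hybrid formula"


HYBRID_SIGNS = {"x", "X", "×"}


def classify_hybrid(name: str) -> HybridKind:
    """Classify a name string by the position of the hybrid sign."""
    tokens = name.split()
    positions = [i for i, t in enumerate(tokens) if t in HYBRID_SIGNS]
    if not positions:
        return HybridKind.NONE
    i = positions[0]
    # Two or more tokens before the sign and something after it:
    # treated here as two names combined, i.e. a hybrid formula.
    if i >= 2 and i < len(tokens) - 1:
        return HybridKind.FORMULA
    return HybridKind.NAME


def markup_attributes(name: str, rank: str) -> dict:
    """Keep the real rank and record hybrid status separately."""
    kind = classify_hybrid(name)
    return {
        "rank": rank,                           # e.g. "species", never "hybrid"
        "hybrid": kind is not HybridKind.NONE,  # the Boolean flag
        "hybridKind": kind.value,               # hypothetical extra attribute
    }
```

For example, `markup_attributes("Ipomoea x multifida (RAFIN.) SHINNERS", "species")` keeps the rank as species and flags the name as a hybrid name rather than a formula.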

Citation / Reference

Generally we need to distinguish between nomenclatural citations and general citations.



Q: The problem I have here is that appendix is used for all kinds of “suppl.” parts of the citation. However, this may occur as part of the title (pubName), but sometimes it also occurs as part of what we call the details part (e.g. to define the exact place where a taxon name is used). In the latter case it should not necessarily be part of the title.

For me it is actually difficult to distinguish these cases, and, as I do not have a place to store appendix information separately, it creates more problems than it solves.

So my question is whether we could remove the appendix and either put the information into the title (default) or, if there is an obvious reason to put it somewhere else, put it there.

I think appendix itself does not really have any semantics of its own, and therefore marking it up separately is questionable.

A: Appendix information is theoretically split off only when the base publication name exists in the same volume without "Suppl." or "App." appended. So the obvious reason would be "This appears to refer explicitly to a separate part of a work".

Unfortunately, there are a few publication names where "Suppl." or "App." actually belongs to the publication name itself, and where my script may split that information off in error. I currently try to fix those when I encounter them.

If it's in the publication details following the year, I just assume it's part of the details.
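
The splitting rule described in this answer might look roughly like the following. This is only an illustration of the rule as stated, not the actual script: `split_appendix`, the regular expression, and the set of names seen in the same volume are all assumptions.

```python
import re

# Assumption: the suffix is "Suppl." or "App.", optionally followed by a number.
_SUFFIX = re.compile(
    r"^(?P<base>.+?),?\s+(?P<appendix>(?:Suppl|App)\.(?:\s*\d+)?)$"
)


def split_appendix(pub_name, names_in_volume):
    """Split off 'Suppl.'/'App.' only if the base name also occurs on its own.

    names_in_volume: publication names seen in the same volume.
    Returns (publication name, appendix or None).
    """
    cleaned = pub_name.strip()
    m = _SUFFIX.match(cleaned)
    if m and m.group("base") in names_in_volume:
        return m.group("base"), m.group("appendix")
    # Base name never appears alone: treat the suffix as part of the name.
    return cleaned, None
```

With an illustrative set such as `{"Nova Guinea", "Index Kew. Suppl."}`, the name "Nova Guinea Suppl." is split into base and appendix, while "Index Kew. Suppl." is left whole because "Index Kew." never occurs without the suffix. Names where the suffix belongs to the title but the base also happens to occur alone would still be split in error, which matches the caveat above.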

Updated by Andreas Müller almost 6 years ago · 12 revisions