Taxonomic Data Standards

TDWG is the main source of data standards in the biodiversity domain. Standards fall into two classes: document-based formats, which preserve the text flow of a document and tag information semantically, and serialisation formats used for data exchange, which more or less represent objects. The CommonDataModel will be a normalised, object-based format developed in UML.

Currently most standards are built on XML Schema, but RDF/OWL has been gaining popularity. XML Schemas have much more support in existing software frameworks and are the basis for most web services. Whether to use XML Schema or RDFS/OWL to define an exchange format for the CDM has not been decided yet.
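To make the trade-off more concrete, the sketch below serialises the same name record twice: once as a small XML document and once as RDF-style statements in N-Triples syntax. The record fields, the element names and the `http://example.org/` namespace are hypothetical. They are not taken from any TDWG standard or from the CDM.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative only.
record = {"id": "name-1", "genus": "Abies", "epithet": "alba", "rank": "species"}

EX = "http://example.org/taxonomy#"  # assumed namespace, not a real vocabulary
FIELDS = ("genus", "epithet", "rank")


def to_xml(rec: dict) -> str:
    """Serialise the record as one nested XML element (an object-like tree)."""
    root = ET.Element("TaxonName", attrib={"id": rec["id"]})
    for key in FIELDS:
        ET.SubElement(root, key).text = rec[key]
    return ET.tostring(root, encoding="unicode")


def to_ntriples(rec: dict) -> list:
    """Serialise the record as one subject-predicate-object statement per field."""
    subject = f"<{EX}{rec['id']}>"
    return [f'{subject} <{EX}{key}> "{rec[key]}" .' for key in FIELDS]


xml_text = to_xml(record)
triples = to_ntriples(record)
print(xml_text)
print("\n".join(triples))
```

The XML form keeps the record together as one tree that a schema can validate as a whole, while the triple form breaks it into independent statements that can be merged with statements from other sources.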

Nomenclatural codes


Full data models

Relational models



Orthoptera Species File

The Orthoptera Species File (OSF) is a taxonomic database of the world's orthopteroid insects. It uses the Species File Software (SFS), which is based on a published relational data model.

Object oriented formats


The prime candidate to encode taxonomic concepts in XML

TDWG LSID vocabularies

These vocabularies look like a promising way to get started and involved.

CATE Software

CATE's data model is closely based on TCS, extended with additional data types.

Compared to the BerlinModel it lacks:

  • ex-authors

  • a basionym year (this is probably part of a proper basionym name record)

It additionally has:

  • isAnamorphic - boolean flag taken from TCS, used for fungi

Taxis Software

mx Software

The following is a brief overview of fields that are present in the mx data model but not in the Berlin Model (BM). The BM implementation used for comparison was the Euro+Med Plantbase project, which has extensions such as types and distribution that are not implemented in all other BM databases.

'''Name''' - Two fields in the table taxon_names are not accounted for in the BM:

  • nominotypical_subgenus BOOLEAN

  • iczn_group VARCHAR(8) -- species, genus, family only

'''Type''' - The implementation of type information is relatively recent in the Berlin Model and has yet to be implemented in all BM databases. The following columns in taxon_names show a greater level of atomization than the BM's:

  • type_count INT UNSIGNED -- only allowed if status = syntypes

  • type_sex VARCHAR(255) -- male/female/gynandromorph/undetermined

  • type_repository_id INT UNSIGNED

  • type_repository_notes VARCHAR(255)

  • type_geog_id INT UNSIGNED

  • type_locality TEXT

  • type_by VARCHAR(64) -- how did this become a type (monotypy etc.)

  • type_lost BOOLEAN
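The column comments above imply at least one cross-field rule: type_count is only allowed when the status is syntypes. A minimal sketch of how an application layer might mirror these columns and enforce that rule follows. The `TypeInfo` class and the `type_status` field are assumptions made for illustration (the list mentions a status but does not name its column), and this is not how mx itself implements the check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeInfo:
    """Hypothetical mirror of the type-related columns of taxon_names."""

    type_status: Optional[str] = None  # assumed column name, e.g. "syntypes"
    type_count: Optional[int] = None   # only allowed if status = syntypes
    type_sex: Optional[str] = None     # male/female/gynandromorph/undetermined
    type_repository_id: Optional[int] = None
    type_repository_notes: Optional[str] = None
    type_geog_id: Optional[int] = None
    type_locality: Optional[str] = None
    type_by: Optional[str] = None      # how this became a type (monotypy etc.)
    type_lost: Optional[bool] = None

    def __post_init__(self) -> None:
        # Enforce the rule stated in the column comment for type_count.
        if self.type_count is not None:
            if self.type_status != "syntypes":
                raise ValueError("type_count is only allowed if status = syntypes")
            if self.type_count < 0:
                raise ValueError("type_count is an unsigned integer")


syntypes = TypeInfo(type_status="syntypes", type_count=3, type_sex="female")
```

Creating `TypeInfo(type_status="holotype", type_count=1)` raises a `ValueError`, which is the kind of constraint a single catch-all field cannot express.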

'''Reference''' - It is not clear what distinction is being made between the two fields below, but they offer finer granularity than the BM's @RefDetail@, a catch-all that usually refers to a page in a reference.

  • page_validated_on INT

  • page_first_appearance INT

'''Notes fields''' - Of note here is the separate notes field for imports. Experience has shown that this separation would be very useful in future BM implementations, as well as in the EDIT Common Data Model.

  • notes TEXT

  • import_notes TEXT
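One way to read the benefit of this separation is that an import routine can log its own remarks without overwriting or mixing into what a curator has written. The sketch below illustrates that reading only. The `NameRecord` class, the `record_import_issue` helper and the assumption that `notes` holds curator remarks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NameRecord:
    """Hypothetical record carrying the two notes columns listed above."""

    name: str
    notes: str = ""         # assumed to hold remarks written by curators
    import_notes: str = ""  # remarks generated while importing data


def record_import_issue(rec: NameRecord, message: str) -> None:
    """Append an importer message, leaving the curator's notes untouched."""
    rec.import_notes = f"{rec.import_notes}\n{message}".strip()


rec = NameRecord(name="Abies alba", notes="checked against protologue")
record_import_issue(rec, "author string truncated in source")
record_import_issue(rec, "year missing in source")
print(rec.notes)
print(rec.import_notes)
```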

Document based formats


see wiki:TaxonX

BerlinModel import schema



Text Encoding Initiative, a general markup language for documents, not specific to taxonomy.


Updated by Katja Luther over 1 year ago · 12 revisions