Taxonomic Data Standards¶
TDWG is the main source of data standards in the biodiversity realm. Standards can be classified by whether they represent documents, preserving the word flow of the text and tagging information semantically, or are serialisation formats for data exchange that more or less represent objects. The CommonDataModel will be a normalised, object-based format developed in UML.
Currently most standards are built on XML Schemas, but RDF/OWL has been gaining popularity lately. XML Schemas have much more support in existing software frameworks and are the basis for most web services. The decision on whether to use XML Schemas or RDFS/OWL to define an exchange format for the CDM has not been made yet.
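To illustrate the distinction between document-based and object-based formats drawn above, here is a minimal sketch using Python's standard XML library. The element names (`p`, `taxonName`, `TaxonName`, `genus` and so on) are hypothetical and are not taken from any TDWG schema; the example name is only illustrative.

```python
import xml.etree.ElementTree as ET

# Document-based: the original sentence is preserved and the name is tagged inline.
# Element names are hypothetical, not from a real standard.
document_xml = (
    "<p>The species <taxonName>Bellis perennis</taxonName> "
    "L. is common in lawns.</p>"
)

# Object-based: the name is serialised as a record, without surrounding text.
object_xml = (
    "<TaxonName>"
    "<genus>Bellis</genus>"
    "<specificEpithet>perennis</specificEpithet>"
    "<authorship>L.</authorship>"
    "</TaxonName>"
)

doc = ET.fromstring(document_xml)
obj = ET.fromstring(object_xml)

# itertext() yields text in document order, so the word flow can be recovered.
running_text = "".join(doc.itertext())

# The semantic tags can still be queried independently of the text.
tagged_names = [el.text for el in doc.iter("taxonName")]

# The object record maps directly onto named fields.
name_fields = {child.tag: child.text for child in obj}

print(running_text)
print(tagged_names)
print(name_fields)
```

The document form keeps the sentence readable as written, while the object form gives up the sentence in exchange for atomised fields that map directly onto a data model.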
Full data models¶
Implementation of the Prometheus Taxonomic Model: a comparison of database models and query languages and an introduction to the Prometheus Object-Oriented Model. Taxon vol. 51. 2002. pp. 131-142
Orthoptera Species Files¶
The Orthoptera Species File (OSF) is a taxonomic database of the world's orthopteroid insects. It uses the Species File Software (SFS, http://software.speciesfile.org/AboutSFS.aspx). SFS is based on a published relational data model (http://software.speciesfile.org/design/Design.aspx).
Object oriented formats¶
The prime candidate to encode taxonomic concepts in XML
TDWG LSID vocabularies¶
look promising to get started and involved with:
CATE's data model is closely based on TCS, extended with additional data types.
Compared to the BerlinModel it lacks:
the basionym year is probably part of a proper basionym name record.
It additionally has:
- isAnamorphic - boolean flag taken from TCS for fungi
- http://www.bio-tools.net/ is still at version 3.5, but a prototype of Taxis4 already exists
The following is a brief overview of fields that are in the mx data model but not in the Berlin Model. The BM implementation used for comparison was the Euro+Med Plantbase project, which has extensions, such as types and distribution, that are not implemented in all other BM databases.
'''Name''' - Two fields in the table @taxon_names@ not accounted for in the BM are:
iczn_group VARCHAR(8) -- species, genus, family only
'''Type''' - The implementation of type information is relatively recent in the Berlin Model and has yet to be implemented in all BM databases. The following columns in @taxon_names@ show a greater level of atomization than the BM's:
type_count INT UNSIGNED -- only allowed if status = syntypes
type_sex VARCHAR(255) -- male/female/gynandromorph/undetermined
type_repository_id INT UNSIGNED
type_geog_id INT UNSIGNED
type_by VARCHAR(64) -- how did this become a type (monotypy etc.)
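As a rough illustration of how these atomised type columns might be carried in an object model, the sketch below mirrors them in a Python dataclass and enforces the rule stated in the column comment for `type_count`. The class name `TypeInfo` and the `status` field are assumptions made for the example: the listing above refers to a status value but does not show the column itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeInfo:
    """Hypothetical object view of the type columns in taxon_names."""

    status: Optional[str] = None              # assumed field; not shown in the listing
    type_count: Optional[int] = None          # INT UNSIGNED
    type_sex: Optional[str] = None            # VARCHAR(255)
    type_repository_id: Optional[int] = None  # INT UNSIGNED
    type_geog_id: Optional[int] = None        # INT UNSIGNED
    type_by: Optional[str] = None             # VARCHAR(64)

    def __post_init__(self) -> None:
        # Rule from the column comment: a count is only allowed for syntypes.
        if self.type_count is not None and self.status != "syntypes":
            raise ValueError("type_count is only allowed if status = syntypes")
        # Unsigned integer columns cannot hold negative values.
        for name in ("type_count", "type_repository_id", "type_geog_id"):
            value = getattr(self, name)
            if value is not None and value < 0:
                raise ValueError(f"{name} must not be negative")
        # Respect the declared VARCHAR lengths.
        if self.type_sex is not None and len(self.type_sex) > 255:
            raise ValueError("type_sex exceeds 255 characters")
        if self.type_by is not None and len(self.type_by) > 64:
            raise ValueError("type_by exceeds 64 characters")


syntypes = TypeInfo(status="syntypes", type_count=3, type_by="original designation")
print(syntypes)
```

Keeping each attribute in its own field, rather than in a single free-text type statement, is what allows a constraint such as the syntype count rule to be checked at all.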
'''Reference''' - It is not clear what distinction is being made here, but this is a greater level of granularity than the BM's @RefDetail@, a catch-all that usually refers to a page in a reference.
'''Notes fields''' - of note here is the separate notes field for imports - experience has proven that this separation would be very useful in future BM model implementations, as well as in the EDIT Common Data Model.
Document based formats¶
BerlinModel import schema¶
Text Encoding Initiative, a general markup language for documents, not specific to taxonomy.