Taxonomic Data Standards¶
TDWG is the main source of data standards in the biodiversity domain. Standards fall into two classes: document-based formats, which preserve the text flow of a document and tag information semantically, and serialisation formats for data exchange, which more or less represent objects. The CommonDataModel will be a normalised, object-based format developed in UML.
Currently most standards are built on XML Schema, but RDF/OWL has been gaining popularity. XML Schema has much more support in existing software frameworks and is the basis for most web services. Whether to use XML Schema or RDFS/OWL to define an exchange format for the CDM has not been decided yet.
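To make the distinction between document-based and object-based formats concrete, the following minimal Python sketch encodes the same statement both ways. All element and attribute names (`p`, `taxonName`, `TaxonName`, `specificEpithet` and so on) are invented for illustration; they are not taken from any TDWG standard or from the CDM.

```python
import xml.etree.ElementTree as ET

# Document-style markup: the original sentence is preserved and
# information is tagged inline (hypothetical element names).
doc_xml = (
    '<p>The species <taxonName rank="species">Abies alba</taxonName> '
    'was described by <author>Mill.</author> in <year>1768</year>.</p>'
)

# Object-style serialisation: the text flow is gone, only a
# normalised record remains (hypothetical element names).
obj_xml = (
    '<TaxonName id="n1">'
    '<genus>Abies</genus>'
    '<specificEpithet>alba</specificEpithet>'
    '<authorship>Mill.</authorship>'
    '<year>1768</year>'
    '</TaxonName>'
)


def document_text(xml_string):
    """Return the running text of a document-style fragment, tags removed."""
    return "".join(ET.fromstring(xml_string).itertext())


def object_record(xml_string):
    """Return an object-style fragment as a flat dict of field -> value."""
    root = ET.fromstring(xml_string)
    return {child.tag: child.text for child in root}
```

The document-style fragment can always be reduced back to the original sentence, while the object-style fragment maps directly onto fields of a record. That is roughly the trade-off between the two classes of standards.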
Nomenclatural codes¶
Vocabularies¶
Full data models¶
Relational models¶
IOPI¶
Prometheus¶
Implementation of the Prometheus Taxonomic Model: a comparison of database models and query languages and an introduction to the Prometheus Object-Oriented Model. Taxon, vol. 51, 2002, pp. 131-142.
Orthoptera Species Files¶
The Orthoptera Species File (OSF) is a taxonomic database of the world's orthopteroid insects. It uses the Species File Software (SFS, http://software.speciesfile.org/AboutSFS.aspx). SFS is based on a published relational data model (http://software.speciesfile.org/design/Design.aspx).
Object oriented formats¶
TDWG TCS¶
The prime candidate for encoding taxonomic concepts in XML.
TDWG LSID vocabularies¶
The TDWG LSID vocabularies look promising as a way to get started and involved.
CATE Software¶
CATE's data model is closely based on TCS, extended with additional data types.
Compared to the BerlinModel it lacks:
- ex-authors
- the basionym year (probably part of a proper basionym name record)
It additionally has:
- isAnamorphic - boolean flag taken from TCS for fungi
Taxis Software¶
- http://www.bio-tools.net/ is still at version 3.5, but a prototype of Taxis4 already exists
mx Software¶
The following is a brief overview of fields that are in the mx data model but not in the Berlin Model (BM). The BM implementation used for comparison was the Euro+Med Plantbase project, which has extensions, such as types and distribution, that are not implemented in all other BM databases.
'''Name''' - Two fields in the table taxon_names not accounted for in the BM are:
nominotypical_subgenus BOOLEAN
iczn_group VARCHAR(8) -- species, genus, family only
'''Type''' - The implementation of type information is relatively recent in the Berlin Model, and is yet to be implemented in all BM databases. The following columns in taxon_names show a greater level of atomization than the BM's:
type_count INT UNSIGNED -- only allowed if status = syntypes
type_sex VARCHAR(255) -- male/female/gynandromorph/undetermined
type_repository_id INT UNSIGNED
type_repository_notes VARCHAR(255)
type_geog_id INT UNSIGNED
type_locality TEXT
type_by VARCHAR(64) -- how did this become a type (monotypy etc.)
type_lost BOOLEAN
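The comment on type_count ("only allowed if status = syntypes") describes a rule that can be enforced in the database itself. The following is a minimal sketch using Python's built-in sqlite3 module, not the actual mx schema: the table is reduced to a few of the columns listed above, and the column name `type_status` and the helper `insert_name` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE taxon_names (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        type_status TEXT,            -- hypothetical column name
        type_count INTEGER,
        type_sex TEXT,
        type_locality TEXT,
        type_by TEXT,
        type_lost INTEGER DEFAULT 0,
        -- type_count may only be set when the status is 'syntypes';
        -- COALESCE keeps a NULL status from slipping through the check
        CHECK (type_count IS NULL OR COALESCE(type_status, '') = 'syntypes')
    )
    """
)


def insert_name(name, type_status=None, type_count=None):
    """Insert a row; return True if accepted, False if the CHECK rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO taxon_names (name, type_status, type_count) "
                "VALUES (?, ?, ?)",
                (name, type_status, type_count),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this constraint, a row with a syntype count is accepted only when its status is 'syntypes'; rows without a count are always accepted.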
'''Reference''' - It is not clear what distinction is being made here, but these fields are more granular than the BM's @RefDetail@, a catch-all that usually refers to a page in a reference.
page_validated_on INT
page_first_appearance INT
'''Notes fields''' - Of note here is the separate notes field for imports. Experience has shown that this separation would be very useful in future BM implementations, as well as in the EDIT Common Data Model.
notes TEXT
import_notes TEXT
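The benefit of the separation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `NameRecord` and the function `record_import_remark` are hypothetical and are not part of mx, the Berlin Model or the CDM.

```python
from dataclasses import dataclass


@dataclass
class NameRecord:
    name: str
    notes: str = ""         # notes written by an editor
    import_notes: str = ""  # remarks generated while importing data


def record_import_remark(record, remark):
    """Append an import remark without touching the editor's notes."""
    if record.import_notes:
        record.import_notes += "\n"
    record.import_notes += remark
    return record
```

Because import remarks never land in notes, they can be reviewed or cleared after an import without any risk of losing what an editor wrote by hand.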
Document based formats¶
TaxonX¶
see wiki:TaxonX
BerlinModel import schema¶
TaXMLit¶
AMNH's NSF Taxonomic Literature Project pages
TEI¶
Text Encoding Initiative, a general markup language for documents, not specific to taxonomy.
Examples¶
The AMNH has a nice comparison of the same PDF document encoded in different standards:
Examples in Digitising ant publications: http://antbase.org/databases/xml_docs.html
Updated by Katja Luther over 1 year ago · 12 revisions