TDWG is the main source of data standards in the biodiversity realm. Standards can be classified by whether they represent documents, preserving the word flow of the text and semantically tagging the information in it, or whether they are serialisation formats for data exchange that more or less represent objects. The CommonDataModel will be a normalised, object-based format developed in UML.
Currently most standards are built on XML Schemas, but RDF/OWL has been gaining popularity. XML Schemas have much more support in existing software frameworks and are the basis for most web services. Whether to use XML Schemas or RDFS/OWL to define an exchange format for the CDM has not been decided yet.
Nomenclatural codes
Vocabularies
Full data models
Relational models
IOPI
Prometheus
- Implementation of the Prometheus Taxonomic Model: a comparison of database models and query languages and an introduction to the Prometheus Object-Oriented Model. Taxon vol. 51. 2002. pp. 131-142
Orthoptera Species Files
The Orthoptera Species File (OSF) is a taxonomic database of the world's orthopteroid insects. It uses the Species File Software (SFS, http://software.speciesfile.org/AboutSFS.aspx). SFS is based on a published relational data model (http://software.speciesfile.org/design/Design.aspx).
Object oriented formats
TDWG TCS
The prime candidate to encode taxonomic concepts in XML
TDWG LSID vocabularies
look promising to get started and involved with:
CATE Software
CATE's data model is closely based on TCS, extended with additional data types. Compared to the BerlinModel it lacks:
- ex-authors
- the basionym year is probably part of a proper basionym name record.
It additionally has:
- isAnamorphic - boolean flag taken from TCS for fungi
Taxis Software
- http://www.bio-tools.net/ is still at version 3.5, but a prototype of Taxis4 already exists
mx Software
The following is a brief overview of fields that exist in the mx data model but not in the Berlin Model. The BM implementation used for comparison was the Euro+Med Plantbase project, which has extensions, such as types and distribution, that are not implemented in all other BM databases.
Name - Two fields in the table taxon_names not accounted for in the BM are:
- nominotypical_subgenus BOOLEAN
- iczn_group VARCHAR(8) -- species, genus, family only
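As a rough illustration, the two name fields could be modelled as follows. This is a minimal Python sketch, not code from mx: the class name `TaxonNameExtras` is hypothetical, the allowed values come only from the column comment above, and treating an empty `iczn_group` as valid is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the column comment ("species, genus, family only").
ICZN_GROUPS = ("species", "genus", "family")


@dataclass
class TaxonNameExtras:
    """Hypothetical holder for the two mx name fields that the BM lacks."""
    nominotypical_subgenus: bool = False
    iczn_group: Optional[str] = None  # VARCHAR(8) in mx; None allowed by assumption

    def __post_init__(self) -> None:
        # Reject any group that the column comment does not list.
        if self.iczn_group is not None and self.iczn_group not in ICZN_GROUPS:
            raise ValueError(f"iczn_group must be one of {ICZN_GROUPS}")
```

A Berlin Model extension could carry the same two attributes on its name record; the check simply keeps `iczn_group` within the three groups named in the comment.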
Type - The implementation of type information is relatively recent in the Berlin Model, and yet to be implemented in all BM databases. The following columns in taxon_names show a greater level of atomization than the BM's:
- type_count INT UNSIGNED -- only allowed if status = syntypes
- type_sex VARCHAR(255) -- male/female/gynandromorph/undetermined
- type_repository_id INT UNSIGNED
- type_repository_notes VARCHAR(255)
- type_geog_id INT UNSIGNED
- type_locality TEXT
- type_by VARCHAR(64) -- how did this become a type (monotypy etc.)
- type_lost BOOLEAN
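The type columns above, including the rule that a count is only allowed for syntypes, can be sketched as a small record type. This is an illustration under stated assumptions rather than the mx implementation: the class `TypeInfo` is hypothetical, and `type_status` is an assumed name for the "status" column that the comment refers to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeInfo:
    """Hypothetical record mirroring the type_* columns listed above."""
    type_status: Optional[str] = None  # assumed name for the "status" column
    type_count: Optional[int] = None
    type_sex: Optional[str] = None
    type_repository_id: Optional[int] = None
    type_repository_notes: Optional[str] = None
    type_geog_id: Optional[int] = None
    type_locality: Optional[str] = None
    type_by: Optional[str] = None  # how this became a type (monotypy etc.)
    type_lost: bool = False

    def __post_init__(self) -> None:
        if self.type_count is not None:
            # Rule from the column comment: a count is only allowed for syntypes.
            if self.type_status != "syntypes":
                raise ValueError("type_count is only allowed when status is 'syntypes'")
            # The column is INT UNSIGNED, so negative counts are rejected.
            if self.type_count < 0:
                raise ValueError("type_count must not be negative")
```

Keeping each piece of type information in its own attribute is what makes a rule like the syntype count checkable at all; a single free-text type field could not enforce it.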
Reference - it is not clear what distinction is being made here, but this is a greater level of granularity than the BM's RefDetail, a catch-all that usually refers to a page in a reference.
- page_validated_on INT
- page_first_appearance INT
Notes fields - of note here is the separate notes field for imports. Experience has shown that this separation would be very useful in future BM implementations, as well as in the EDIT Common Data Model.
- notes TEXT
- import_notes TEXT
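One way to picture the benefit of the separation is an import routine that only ever writes to `import_notes`. The sketch below is hypothetical: `NotedRecord` and `add_import_note` are illustrative names, and reading `notes` as hand-written remarks is an assumption, not something the mx schema states.

```python
from dataclasses import dataclass


@dataclass
class NotedRecord:
    notes: str = ""         # assumed to hold hand-written remarks
    import_notes: str = ""  # remarks generated during an import run


def add_import_note(record: NotedRecord, message: str) -> None:
    """Append an import remark without touching the general notes field."""
    if record.import_notes:
        record.import_notes += "\n"
    record.import_notes += message
```

With this split, repeated imports can append to or clear `import_notes` without any risk of overwriting what was entered in `notes`.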
Document based formats
TaxonX
see wiki:TaxonX
BerlinModel import schema
TaXMLit
- AMNH's NSF Taxonomic Literature Project pages
- taXMLit-v1-3.xsd
TEI
Text Encoding Initiative, a general markup language for documents, not specific to taxonomy.
Examples
- The AMNH has a nice comparison of the same PDF document encoded in different standards:
- Original PDF
- TaxonX v1
- TaxonX v0.3
- TaXMLit
- ABCD
- SDD
- Examples in Digitising ant publications: http://antbase.org/databases/xml_docs.html
