Preliminary notes on the SFS database model used by the Orthoptera Species File (OSF)
The SFS data model is documented at http://software.speciesfile.org/design/Design.aspx using a set of E/R diagrams for the model's core components (e.g. taxon, citation, people) as well as documents compiling attributes and controlled vocabulary. Unfortunately, the basic ideas of the model are not discussed on the web site and no examples are given, which makes it difficult to understand the model and to compare it with other known models. Furthermore, the E/R diagrams and the textual attribute lists seem to be out of sync. This page therefore provides just an unstructured list of observations and questions; it will be replaced later by a detailed analysis once the SFS model developers have provided more background information on the model.
SFS stores names and taxa in a single table (tblTaxon), which is sufficient for reflecting a taxonomic consensus view. Different taxonomic concepts belonging to the same name cannot be represented.
It seems that names in tblTaxon are composed using just three attributes. TName is the full Latin name without authors. NecAuthor contains the author string for the first publication of the taxon. Parens is a flag indicating whether the author string has to be put in parentheses if a new combination of the name exists.
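As a minimal sketch of how a display name might be assembled from these three attributes: the function name and the sample values are illustrative only, and the assumption that Parens simply wraps the whole author string in parentheses is a guess, since the SFS documentation does not describe the actual formatting rules.

```python
def format_name(tname: str, nec_author: str, parens: bool) -> str:
    """Compose a display name from TName, NecAuthor and Parens.

    Assumption: Parens only controls whether the author string is wrapped
    in parentheses; nothing else about SFS formatting is known here.
    """
    author = nec_author.strip()
    if not author:
        # No author recorded: fall back to the bare name.
        return tname
    if parens:
        author = f"({author})"
    return f"{tname} {author}"


# Sample values for illustration only.
print(format_name("Gryllus campestris", "Linnaeus", False))
print(format_name("Tettigonia viridissima", "Linnaeus", True))
```

Note that the publication year, which zoological names normally carry, is not covered by these three attributes and is therefore left out of the sketch.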
It is very unlikely that SFS uses the free-text structure of names in tblTaxon as its primary storage for scientific names. There is another, more structured name representation (tblNomenclator), linked to tblTaxon via tblIdentification, that is probably used for storing atomized names. tblNomenclator simply has foreign keys representing the four relevant name parts for multinomial names: GenusName, SubgenusName, SpeciesName, SubspeciesName. An important difference to the Berlin Model is that the nomenclator apparently uses catalogue tables for the different name parts, which probably gives more control over the name vocabularies used. Another difference is the subgenus name part needed for zoological names, which is not yet possible in the Berlin Model.
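The catalogue-table idea can be sketched with an in-memory SQLite database. Everything except the table name tblNomenclator and the four name parts is an assumption made for illustration: the catalogue table names (catGenusName and so on), the ID and Name columns, and the helper functions are hypothetical and are not taken from the SFS schema.

```python
import sqlite3

PARTS = ["GenusName", "SubgenusName", "SpeciesName", "SubspeciesName"]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# One catalogue table per name part (hypothetical layout: ID plus unique Name).
for part in PARTS:
    conn.execute(
        f"CREATE TABLE cat{part} "
        f"({part}ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE)"
    )

# The nomenclator row holds only foreign keys into the catalogues.
fk_cols = ", ".join(f"{p}ID INTEGER REFERENCES cat{p}({p}ID)" for p in PARTS)
conn.execute(
    f"CREATE TABLE tblNomenclator (NomenclatorID INTEGER PRIMARY KEY, {fk_cols})"
)


def part_id(part, name):
    """Return the catalogue ID for a name part, inserting it on first use."""
    if name is None:
        return None
    conn.execute(f"INSERT OR IGNORE INTO cat{part} (Name) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {part}ID FROM cat{part} WHERE Name = ?", (name,)
    ).fetchone()
    return row[0]


def add_name(genus, subgenus=None, species=None, subspecies=None):
    """Store an atomized name and return its NomenclatorID."""
    ids = [part_id(p, n) for p, n in zip(PARTS, (genus, subgenus, species, subspecies))]
    cols = ", ".join(f"{p}ID" for p in PARTS)
    cur = conn.execute(
        f"INSERT INTO tblNomenclator ({cols}) VALUES (?, ?, ?, ?)", ids
    )
    return cur.lastrowid


def full_name(nomenclator_id):
    """Reassemble the name; the subgenus goes in parentheses as in zoology."""
    joins = " ".join(f"LEFT JOIN cat{p} ON cat{p}.{p}ID = n.{p}ID" for p in PARTS)
    cols = ", ".join(f"cat{p}.Name" for p in PARTS)
    genus, subgenus, species, subspecies = conn.execute(
        f"SELECT {cols} FROM tblNomenclator n {joins} WHERE n.NomenclatorID = ?",
        (nomenclator_id,),
    ).fetchone()
    pieces = [genus, f"({subgenus})" if subgenus else None, species, subspecies]
    return " ".join(p for p in pieces if p)


# Sample values for illustration only.
first = add_name("Gryllus", species="campestris")
second = add_name("Gryllus", subgenus="Gryllus", species="bimaculatus")
print(full_name(first))
print(full_name(second))
```

Because each distinct genus or epithet is stored once in its catalogue, a misspelt variant shows up as a new catalogue row rather than hiding inside a free-text name, which is presumably the kind of vocabulary control meant above.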
As in the Berlin Model, the protologue is represented with two entities, tblReference and tblCitation. However, tblCitation has a foreign key to tblTaxon, so that multiple citations can be used for a single taxon. The type of citation is specified with CiteTypeID (not specified, validity, acceptance, synonymy, misapplication, correction).
The year of publication, which is needed to construct the proper Latin name in zoology, is not stored directly in the name parts. It is therefore very likely taken from the attributes StatedYear and ActualYear in tblReference. But how are names stored if the original publication is not cited (e.g. imports of name lists)?
tblTaxon has two elements for indicating taxonomic and nomenclatural status values (NameStatus and StatusFlags). It seems that NameStatus contains broad categories (e.g. valid, temporary name, synonym) and StatusFlags is used for more specific status values (e.g. for synonym: literature misspelling, unjustified emendation, nomen dubium, suppressed, etc.). However, I could not find values for the type of synonymy (heterotypic, homotypic), which is probably represented using the type-related elements in tblIdentification.
The SFS has an interesting way to represent multiple status values associated with a name or taxon. They are stored in a single binary number, with each bit representing the presence or absence of one status value. For example, bit 8 switched on means nomen protectum. How can one search efficiently on such a structure? And how can one store the references for the pieces of information belonging to the individual bits?
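To make the search question concrete, here is a small sketch of querying such a bit field with a bitwise mask in SQLite. Only the column name StatusFlags and the statement "bit 8 means nomen protectum" come from the notes above; reading "bit 8" as a 0-based bit index (value 256), the positions of the other two flags, the cut-down tblTaxon layout, and the sample rows are all assumptions.

```python
import sqlite3
from enum import IntFlag


class Status(IntFlag):
    # Bit positions are assumptions, except that nomen protectum is "bit 8",
    # interpreted here as a 0-based index (1 << 8 == 256).
    NOMEN_DUBIUM = 1 << 3      # hypothetical position
    SUPPRESSED = 1 << 5        # hypothetical position
    NOMEN_PROTECTUM = 1 << 8


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblTaxon "
    "(TaxonID INTEGER PRIMARY KEY, TName TEXT, StatusFlags INTEGER)"
)
conn.executemany(
    "INSERT INTO tblTaxon (TName, StatusFlags) VALUES (?, ?)",
    [
        ("Name a", int(Status.NOMEN_PROTECTUM)),
        ("Name b", int(Status.NOMEN_DUBIUM | Status.SUPPRESSED)),
        ("Name c", int(Status.NOMEN_PROTECTUM | Status.NOMEN_DUBIUM)),
        ("Name d", 0),
    ],
)


def names_with(flags):
    """Return names whose StatusFlags have every bit in `flags` switched on."""
    mask = int(flags)
    rows = conn.execute(
        "SELECT TName FROM tblTaxon WHERE (StatusFlags & ?) = ? ORDER BY TaxonID",
        (mask, mask),
    )
    return [r[0] for r in rows]


print(names_with(Status.NOMEN_PROTECTUM))
print(names_with(Status.NOMEN_DUBIUM | Status.SUPPRESSED))

# Decoding a stored value back into individual status values.
stored = Status(264)
print(Status.NOMEN_PROTECTUM in stored, Status.SUPPRESSED in stored)
```

A predicate of the form (StatusFlags & mask) = mask usually cannot make use of an ordinary index on the column, so a query like this tends to scan the whole table. That may be acceptable for a table of moderate size, but it is one reason the efficiency question is worth raising with the SFS developers.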
There is a pointer to ITIS (TSN) in tblNomenclator which is presently unused.
Each taxon is associated with an expert using the ExpertId attribute in tblTaxon.