Taxonomic Core Components


Requirements

The role of the taxonomic core components within the platform is to store, edit and publish taxonomic and other taxon-related data. Researchers must be able to collaborate, but also to hold different opinions about taxonomic groups; these divergent views on taxa lead to different taxonomic concepts, which the system needs to accommodate. At the same time, there is a pronounced need from both within and outside the taxonomic community to present a single voice to the public. The taxonomic core components should therefore facilitate the creation of a consensus taxonomy or preferred view, mainly through a review process, but should not force users to adopt one. One of the most important uses of taxonomic information is to serve as a stable reference for other communities; to support taxonomists in this task, a persistent archive needs to be created. The primary functional components to be defined for such a system are:

  • Storage

  • Archive

  • Editor

  • Public Portal

  • Expert Portal

Ideally, in the spirit of the platform, all of these components will be decoupled, will work dynamically with the common data model (CDM), and will be exchangeable with other, similar components. We have considered the following options as a basis for the components above:

Options considered

CATE

The CATE project is dedicated to delivering a collaborative online revision tool for taxonomy, featuring an elaborate editorial review process. However, the CATE scenario is not the only revision process the EDIT platform seeks to support; individual revisions without reviews - potentially even offline - must also be considered. CATE is in its early stages, and as such its final form is unknown. We therefore aim to integrate CATE into the platform through data exchange interfaces as well as shared components, but would prefer not to adopt CATE as the taxonomic core "reference" implementation. Shared components currently under discussion are an interactive online key powered by defined backend services, and the visualisation of distributional data through simple REST services abstracting the more complex OGC protocols used internally.

Berlin Model based on PHP/AJAX

The Berlin Model is a proven and widely understood relational model for managing taxonomic data, but its use of the proprietary MS SQL Server and Cold Fusion has been an obstacle to more widespread adoption. This option would involve migrating the Cold Fusion code to PHP, while gradually moving business logic out of the database system into a PHP library. Together with the WP6 exemplar groups it was concluded that the current Cold Fusion web-based editor also needs a complete usability redesign. Other foreseen improvements include: modifying the relational model; adding a type specimen module; integrating zoological modifications into the names part of the model; simplifying the reference module; and in general introducing simplifications learned from extensive use of the model over the past several years. The clear disadvantage of this option is the tight coupling of the editor functionality to the relational Berlin Model; the editor would not be compatible with other backends. The amount of development necessary would be substantial, and existing software for the Berlin Model would need to be adapted as well. A modern user interface would require some JavaScript (AJAX); this technology as it currently exists is difficult to maintain and would require implementing similar classes on both the server (PHP) and client (JS) side.

Drupal scratchpad

An extensively discussed option suggested by WP6 is the use of the content management system (CMS) Drupal as an application framework for developing an editor and portal for the taxonomic core. The Berlin Model could be ported to Drupal's relational database (MySQL or PostgreSQL), and separate Drupal modules could be developed supporting the primary objects of the taxonomic core, i.e. names, taxonomic concepts, authors, references, types, descriptions and general facts such as occurrences. Drupal already provides modules to deal with references, user roles and even A9 OpenSearch (see above). On the other hand, such a system also binds the editor and portal components to a particular relational model. Drupal has not been designed as an application framework for complex data manipulations; its strength lies in providing a collaborative environment with community tools. There has been some scepticism as to whether user interfaces for complex editing tasks are even feasible with Drupal at a reasonable cost. The conclusion thus far is that Drupal is without doubt an excellent tool for community building, and well suited as a portal displaying taxonomic information and allowing users to annotate data, but it does not appear to be a good framework for developing an editor and archive.

Proposal

Object Oriented Design

Thus far, we have concluded that all existing data models must be modified, that a new CDM must be created, and that there is a need for serialisation to/from XML for imports/exports and other REST services. Here, a logical solution presents itself: to build the taxonomic core components on the basis of the CDM. An object-oriented domain model, persistence technology and XML binding tools in Java will take care of much of the work. CATE largely follows this approach, but bases its model on an XML schema. We suggest modelling the CDM in UML, deriving Java classes from the model, and then using an object-relational mapping framework (O/RM) together with XML or RDF binding tools that map to the CDM. This last step is necessary because the CDM, being based on existing TDWG standards, might differ slightly from the domain model.
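
To make this concrete, a minimal sketch of what such a dually mapped domain class could look like is given below, assuming JPA annotations for the O/RM layer and JAXB for the XML binding; the class and field names are purely illustrative and do not prescribe the actual CDM:

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.xml.bind.annotation.XmlAccessType;
    import javax.xml.bind.annotation.XmlAccessorType;
    import javax.xml.bind.annotation.XmlElement;
    import javax.xml.bind.annotation.XmlRootElement;

    // Illustrative CDM domain class: the same UML-derived Java class carries
    // both the relational mapping (JPA) and the XML binding (JAXB), so the
    // domain model remains the single source of truth for both layers.
    @Entity
    @XmlRootElement
    @XmlAccessorType(XmlAccessType.FIELD)
    public class TaxonName {

        @Id
        @GeneratedValue
        private Long id;

        // Cached full name string as published, e.g. "Abies alba Mill."
        @XmlElement
        private String titleCache;

        // Nomenclatural rank, kept as a plain string in this sketch.
        @XmlElement
        private String rank;

        public Long getId() { return id; }
        public String getTitleCache() { return titleCache; }
        public void setTitleCache(String titleCache) { this.titleCache = titleCache; }
        public String getRank() { return rank; }
        public void setRank(String rank) { this.rank = rank; }
    }

Any divergence between the TDWG-aligned CDM and this domain model would then be absorbed in the binding annotations rather than in the Java classes themselves.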

To decouple the functional components above, web services can be used, but their interfaces need to be defined for all components. This is a substantial amount of work, as an editor and a portal need many more services than simple object access by ID. Exactly which services are required depends heavily on the user interface functionality, but current rich internet clients built with AJAX or Flash already demonstrate the breadth involved.
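
To illustrate how quickly such an interface grows beyond object access by ID, consider the following hypothetical service façade (reusing the TaxonName sketch above); every method shown is an assumption about what an editor or portal might require, not a defined interface:

    import java.util.List;
    import java.util.UUID;

    // Hypothetical service façade sketching the kinds of calls an editor or
    // portal needs beyond simple object access by ID.
    public interface TaxonService {

        // The baseline: simple object access by identifier.
        TaxonName find(UUID uuid);

        // Paged search by name pattern, as a portal search box would need.
        List<TaxonName> findByTitle(String pattern, int page, int pageSize);

        // Navigation within a classification, as a tree-based editor needs.
        List<TaxonName> getChildren(UUID parentUuid);

        // Alternative concepts for the same name, needed to present
        // diverging taxonomic views side by side.
        List<TaxonName> getAlternativeConcepts(UUID taxonUuid);
    }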

The tight coupling of descriptive data to taxonomic concepts further influenced our discussions. If descriptive data is managed in an application separate from the nomenclature and taxonomic classification, it will be hard to keep the two systems in sync: every change to the description of a taxon ultimately alters the taxon concept, because its circumscription has changed. It would therefore be desirable to have a system that allows several components to stay in sync with each other. We propose a data repository that can store the entire CDM and exposes a single synchronisation interface to other applications. Such a repository can also easily be extended to serve as an archive and as a fallback cache for data resolvers. All data will additionally be accessible per object via CRUD REST services, i.e. exposed methods to create, read, update and delete objects. Further, more complex search interfaces - based wherever possible on OpenSearch - can be created as needed, for example by portals.
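
As a sketch of the per-object CRUD layer, a JAX-RS style resource could look as follows; the URL layout and the in-memory store standing in for the persistence layer are assumptions made for this example only:

    import java.net.URI;
    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.ws.rs.*;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    // Sketch of a per-object CRUD REST resource for TaxonName objects.
    @Path("/name")
    @Produces(MediaType.APPLICATION_XML)
    @Consumes(MediaType.APPLICATION_XML)
    public class TaxonNameResource {

        // Stand-in for the repository's real persistence layer.
        private static final Map<String, TaxonName> store = new ConcurrentHashMap<>();

        @GET
        @Path("/{uuid}")
        public TaxonName read(@PathParam("uuid") String uuid) {
            TaxonName name = store.get(uuid);
            if (name == null) {
                throw new WebApplicationException(Response.Status.NOT_FOUND);
            }
            return name;
        }

        @POST
        public Response insert(TaxonName name) {
            String uuid = UUID.randomUUID().toString();
            store.put(uuid, name);
            return Response.created(URI.create("/name/" + uuid)).build();
        }

        @PUT
        @Path("/{uuid}")
        public Response update(@PathParam("uuid") String uuid, TaxonName name) {
            store.put(uuid, name);
            return Response.ok().build();
        }

        @DELETE
        @Path("/{uuid}")
        public Response delete(@PathParam("uuid") String uuid) {
            store.remove(uuid);
            return Response.noContent().build();
        }
    }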

A taxonomic desktop editor, or other clients, can then be built on top of the domain model, running offline against an embedded SQLite database. This local database can be synchronised with the central repository at any time, which in turn keeps other applications up to date that have subscribed to certain datasets. Examples include a descriptive tool that itself updates the repository, and a read-only field recording tool that subscribes to a certain taxonomic group or to taxa within a certain geographic area. The development of such an editor client can reuse most of the repository code (domain model, persistence, sync API), so that the user interface is the only component requiring additional development.
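
The shape of the synchronisation interface such clients would programme against might be outlined as follows; this is purely a sketch of the idea, with all type and method names invented for illustration:

    import java.util.List;

    // Opaque unit of change exchanged between client and repository.
    interface ChangeSet { }

    // Hypothetical outline of the synchronisation API the central repository
    // would expose; none of these names are part of a defined interface.
    public interface SyncService {

        // Register interest in a dataset, e.g. a taxonomic group or taxa
        // within a geographic area; the repository then tracks changes
        // relevant to this client.
        void subscribe(String clientId, String datasetId);

        // Pull all changes to subscribed datasets committed after the given
        // sync token, so an offline editor can catch up at any time.
        List<ChangeSet> pullSince(String clientId, String syncToken);

        // Push changes made locally (e.g. in the editor's embedded SQLite
        // database) back to the repository; returns the new sync token.
        String push(String clientId, List<ChangeSet> localChanges);
    }

A read-only client such as the field recording tool would only ever call subscribe and pullSince, while the editor would use all three operations.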

Because it is based on data push, such a system scales considerably better: data is only exchanged when needed. The Google Data APIs (GData) are a good example of such a design, though they lack the proposed synchronisation API. Additionally, such an application no longer requires a (web) server, and desktop clients offer far more flexibility than even rich internet clients.