
Andreas Kohlbecker, 06/27/2018 09:01 AM

# Web annotation of taxon-level data - Results of Training Hackathon for Checklist Cross-mapping and Precursor National Checklists Generation from GBIF-mediated data

*Andreas Kohlbecker (leader), Ruud Altenburg, Oskar Kindvall, David Remsen*

Sources of biodiversity occurrence data, such as those catalogued and indexed by GBIF, may serve as a means to both verify and extend the list of taxa found in national species checklists. They might also serve as the means to start a de novo national species list. Team 2 focused on a system design that could be used to present assertions of a taxon occurrence within a country to a presumed expert curator, who might then use their knowledge to assess the assertion and determine whether the taxon should be added to the list. The authoritative Catalogue of Life record, linked through the cross-mapping efforts of Team 1, would then form the record of authority for the national list. In addition, negative matches (i.e., species asserted to occur within the country but determined not to belong there) might be linked to a comment or annotation that could inform future users of the GBIF network of the nature of the suspect occurrence. This led to the articulation of the following user story.

## User Story 2-1

As an owner of a national checklist I want to load my checklist into a system and compare it to the list of taxa assigned to my country within the GBIF index. GBIF taxa missing from my national list may 1) represent legitimate missing taxa that are candidates to add to my list, or 2) represent taxa erroneously assigned to my country that should be annotated with their suspect status for future users of the record.

## Goals

1. Can the federated GBIF portal be used to support the identification and qualification of novel species occurrence records in the development of national or regional species inventories?
2. Can annotation interfaces be used, in combination with authoritative regional or national species lists, to identify and annotate potentially erroneous species occurrences and thus inform future users of GBIF-mobilized data as to this erroneous assessment?

## System Design

Team 2 came up with the following solutions for each step in the workflow described in Figure 2 below.

Figure 2 - Workflow describing the steps needed to enable comparison between a national checklist and the taxa represented by GBIF occurrence data.

1) GBIF. The term taxonKey appeared to be a more solid choice.

2) To retrieve the list of taxa represented by occurrence data, the team used an SQL distinct selection on speciesKey, scientificName, genus, specificEpithet, infraspecificEpithet. This list was stored in the database (n = 43 387).

3) Cross-matching the checklist with the Catalogue of Life

4) Filtering out the negative matches (the taxa missing from the original national list)

A table was created into which all taxa represented by GBIF data were inserted. This table included the following columns: taxonKey, AnnosSysUri, scientificName, blacklisted (bool), taxonStatusGBIF, existsInChecklist, checklistStatus (native, introduced, etc.), occurrenceRecordCount. The fields existsInChecklist and checklistStatus were updated from the Checklist table. The potentially missing taxa were then extracted by selecting the rows in which existsInChecklist is false.
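
Steps 2 and 4 can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. This is not the team's actual code: the `occurrence`, `checklist` and `gbif_taxon` table names, the function name and the sample taxa are assumptions made for illustration, and only the column names of the taxon table follow the list above. A `GROUP BY` is used in place of a plain distinct selection so that occurrenceRecordCount can be filled in the same pass.

```python
import sqlite3


def find_missing_taxa(occurrences, checklist):
    """Return (taxonKey, scientificName, occurrenceRecordCount) rows for
    GBIF taxa that are absent from the national checklist."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical schema; only the gbif_taxon columns come from the text.
    cur.executescript("""
        CREATE TABLE occurrence (speciesKey INTEGER, scientificName TEXT);
        CREATE TABLE checklist (taxonKey INTEGER, checklistStatus TEXT);
        CREATE TABLE gbif_taxon (
            taxonKey INTEGER,
            AnnosSysUri TEXT,
            scientificName TEXT,
            blacklisted INTEGER DEFAULT 0,
            taxonStatusGBIF TEXT,
            existsInChecklist INTEGER DEFAULT 0,
            checklistStatus TEXT,
            occurrenceRecordCount INTEGER
        );
    """)
    cur.executemany("INSERT INTO occurrence VALUES (?, ?)", occurrences)
    cur.executemany("INSERT INTO checklist VALUES (?, ?)", checklist)

    # Step 2: one row per distinct taxon found in the occurrence data.
    cur.execute("""
        INSERT INTO gbif_taxon (taxonKey, scientificName, occurrenceRecordCount)
        SELECT speciesKey, scientificName, COUNT(*)
        FROM occurrence
        GROUP BY speciesKey, scientificName
    """)

    # Step 4: copy the checklist information onto the taxon table.
    cur.execute("""
        UPDATE gbif_taxon
        SET existsInChecklist = EXISTS (
                SELECT 1 FROM checklist c
                WHERE c.taxonKey = gbif_taxon.taxonKey),
            checklistStatus = (
                SELECT c.checklistStatus FROM checklist c
                WHERE c.taxonKey = gbif_taxon.taxonKey)
    """)

    # Taxa with existsInChecklist = false are the candidates for review.
    rows = cur.execute("""
        SELECT taxonKey, scientificName, occurrenceRecordCount
        FROM gbif_taxon
        WHERE existsInChecklist = 0
        ORDER BY taxonKey
    """).fetchall()
    conn.close()
    return rows


# Made-up sample data: taxon 2 occurs in GBIF but not in the checklist.
sample_occurrences = [(1, "Aus bus"), (1, "Aus bus"), (2, "Cus dus"), (3, "Eus fus")]
sample_checklist = [(1, "native"), (3, "introduced")]
print(find_missing_taxa(sample_occurrences, sample_checklist))
```

Each returned row is a taxon that a curator would then either add to the checklist or annotate as suspect.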

5) Annotate the missing taxa.

a. The team used AnnoSys ( to store annotations to the taxon occurrence records. AnnoSys was originally intended to annotate biodiversity occurrence records in ABCD, an XML format. The team extended an Annotation class of AnnoSys so it could handle information about a taxon (as opposed to a taxon occurrence). The team further developed a new annotation model, which is also based on the W3C Open Annotation Data Model ( General technical documentation and documentation of the open annotation model as used by AnnoSys can be found at
The purpose of the annotation in this case was to express that the distribution for the taxon might, or might not, be correct. In order to express the latter, an interim RDF term ( was introduced.

b. The validation information is then supposed to be posted into AnnoSys using its REST API. We suggest that the annotation be related to the URL representing the GBIF taxon page, i.e. the annotation should be expressed so that it is interpreted as: for the taxon with the given taxonKey, all occurrences reported for the specified country whose establishmentMeans does not clearly indicate a non-natural occurrence should be considered expected errors.
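
The interpretation suggested in 5b can be sketched as a small data structure plus a predicate. The following is an illustration under stated assumptions, not the AnnoSys data model or its REST API: the `ex:` terms, the function names, the GBIF species URL pattern, the taxon key and the set of establishmentMeans values treated as non-natural are all hypothetical. Only the `oa:` namespace and its `Annotation`, `hasTarget` and `hasBody` terms come from the W3C Open Annotation vocabulary. No request is sent anywhere.

```python
import json

# "oa" is the W3C Open Annotation namespace; "ex" is a placeholder namespace
# standing in for the interim term mentioned in the text (hypothetical).
CONTEXT = {"oa": "http://www.w3.org/ns/oa#", "ex": "http://example.org/terms#"}

# Assumption: establishmentMeans values that clearly indicate a
# non-natural occurrence.
NON_NATURAL = {"introduced", "naturalised", "invasive", "managed"}


def build_taxon_annotation(taxon_key, country_code, comment):
    """Build an annotation saying the taxon's distribution in a country is suspect."""
    return {
        "@context": CONTEXT,
        "@type": "oa:Annotation",
        # Assumed URL pattern for the GBIF taxon page.
        "oa:hasTarget": {"@id": f"https://www.gbif.org/species/{taxon_key}"},
        "oa:hasBody": {
            "ex:taxonKey": taxon_key,
            "ex:country": country_code,
            "ex:distributionStatus": "ex:suspect",
            "ex:comment": comment,
        },
    }


def is_covered(annotation, occurrence):
    """True if the occurrence should be read as an expected error:
    same taxon, same country, and no clear sign of non-natural occurrence."""
    body = annotation["oa:hasBody"]
    means = (occurrence.get("establishmentMeans") or "").lower()
    return (
        occurrence.get("taxonKey") == body["ex:taxonKey"]
        and occurrence.get("countryCode") == body["ex:country"]
        and means not in NON_NATURAL
    )


# Made-up taxon key and country code, for illustration only.
annotation = build_taxon_annotation(12345, "SE", "Not known to occur in the country.")
print(json.dumps(annotation, indent=2))
```

Keeping the taxon key and country in the annotation body lets a consumer decide, record by record, whether an occurrence falls under the annotation without a further lookup.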