<?xml version="1.0" encoding="UTF-8"?>
  xsi:schemaLocation="http://docbook.org/ns/docbook http://docbook.org/xml/5.0/xsd/docbook.xsd"
  xml:id="cdm-reference-guide" xmlns="http://docbook.org/ns/docbook"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:ns5="http://www.w3.org/1999/xhtml"
  xmlns:ns4="http://www.w3.org/2000/svg"
  xmlns:ns3="http://www.w3.org/1998/Math/MathML"
  xmlns:ns="http://docbook.org/ns/docbook">
<title>EDIT Common Data Model Library</title>
<subtitle>Reference Documentation (Work in Progress)</subtitle>
<inlinegraphic fileref="./resources/images/logo.png" />
<!-- Please add your names here -->
<personname>Ben Clark</personname>
<personname>Andreas Müller</personname>
<releaseinfo>2.1</releaseinfo>
<holder>EDIT - European Distributed Institute of Taxonomy -
http://www.e-taxonomy.eu</holder>
<para>The contents of this file are subject to the Mozilla Public
License Version 1.1. See LICENSE.TXT at the top of this package for the
full license terms.</para>
<preface id="preface">
<title>Preface</title>
<para>EDIT's Internet Platform for Cybertaxonomy is a distributed
computing platform that helps taxonomists do revisionary taxonomy and
taxonomic field work efficiently and expediently via the web. At the core
of the platform lies a common data model that enables interoperability
between the different components. The model describes all the commonly
used data that is dealt with in the platform, and therefore covers
taxonomic names and concepts; literature references; authors; (type)
specimens; structured descriptive data; molecular data; related (binary)
files such as images or compiled keys; controlled vocabularies and terms;
and species-related content of any kind, such as economic use or conservation
<para>The cyberplatform consists of interoperable but independent
components. Platform components can take the form of software applications
(desktop or web-based) for human users or (web) services intended to be
used by other software applications. The platform as envisioned does not
have a single user interface or website; rather, it is a collection of
interacting components which may be combined and assembled according to
the task in hand. To facilitate the development of core CDM applications
such as the CDM Community Server, the CDM Dataportals, and the Taxonomic
Editor, an implementation of the CDM has been created in the Java
programming language. In addition to CDM model classes being modelled as
plain old Java objects (<link
xlink:href="http://en.wikipedia.org/wiki/Plain_Old_Java_Object">POJOs</link>),
a set of Java components has been created that provides common services
across all Java applications using the CDM. These components serve as the
basis of the core components of the Internet Platform for Cybertaxonomy
and also allow the development of other applications using the CDM by
providing basic functionality that can be extended for a particular
purpose.</para>
<para>The CDM Library, as it is known, consists of four major modules that
can be used by any Java application based on the CDM. These libraries are
used as the foundation of the Taxonomic Editor and the CDM Community
Server. In addition, a web application (the CDM Community Server) is
documented here, as its components can be re-purposed or extended by other
web applications based on the CDM.</para>
<title>An overview of the main CDM Components</title>
<imageobject role="html">
<imagedata fileref="resources/images/cdmlib-arch3.png" format="png" />
<imageobject role="fo">
<imagedata contentwidth="160mm"
fileref="resources/images/cdmlib-arch3.png" format="png" />
<caption>The overall architecture of the EDIT Internet Platform for
Cybertaxonomy, showing the core components of the CDM Java Library,
and their use by desktop (Taxonomic Editor) and web-based (CDM
Dataportal, CATE) applications.</caption>
<para>This reference documentation is aimed at anyone who would like to
understand the software components that make up the core of the
cyberplatform: the CDM Java Library and the CDM Server application. More
generic information about the applications that make up the cyberplatform,
information for end-users of specific applications, and information on the
EDIT project itself are beyond the scope of this document. More
information about EDIT can be found on the <link linkend="???">EDIT
website</link>, and more information on the specific software applications
produced by EDIT can be found on the <link linkend="???">Work Package 5
website</link>.</para>
<title>Getting Started</title>
<para>This part of the reference documentation aims to provide simple
step-by-step instructions to enable application developers to start
using the CDM Java Library in their Java application. To do this, we
will create a small toy application. The CDM Java Library is packaged
and published using the Apache Maven software project management and
comprehension tool. To make life easier, we'll use Maven to create our
application too. Assuming that Maven (2.0.x+) is installed, we begin by
creating a new Maven application (substituting the group id, artifact
id, and version of our application):</para>
<screen>mvn archetype:create -DgroupId=<emphasis>org.myproject</emphasis> -DartifactId=<emphasis>myapp</emphasis> -Dversion=<emphasis>1.0</emphasis></screen>
<para>The next step is to add the EDIT Maven repository to your Maven
<emphasis>project object model</emphasis> or <emphasis>pom</emphasis>
<programlisting>. . .
  &lt;repositories&gt;
    &lt;repository&gt;
      &lt;id&gt;EditRepository&lt;/id&gt;
      &lt;url&gt;http://wp5.e-taxonomy.eu/cdmlib/mavenrepo/&lt;/url&gt;
    &lt;/repository&gt;
  &lt;/repositories&gt;
&lt;/project&gt;</programlisting>
<para>We also need to add the specific dependency that we would like our
project to include.</para>
<programlisting>. . .
  &lt;dependencies&gt;
    &lt;dependency&gt;
      &lt;groupId&gt;eu.etaxonomy&lt;/groupId&gt;
      &lt;artifactId&gt;cdmlib-services&lt;/artifactId&gt;
      &lt;version&gt;1.1.1&lt;/version&gt;<!--ben: We will need to change this to reflect the new release once it is available-->
    &lt;/dependency&gt;
  &lt;/dependencies&gt;
. . .
</programlisting>
<para>In most cases, application developers will wish to include the
cdmlib services (which include the data model and persistence layer
too). In some cases, developers might wish to use components from the
<package>cdmlib-io</package> and <package>cdmlib-remote</package>
packages too. New releases of the CDM Java Library are published in the
EDIT Maven Repository, and Maven will download and use these artifacts
automatically if you change the version number of the dependency
specified in your pom file.</para>
<para>All that remains is to set up the cdmlib services within the
application context. The CDM Java Library uses the Spring Framework
to manage its components. Whilst it is not mandatory to wire the CDM
services and DAOs using Spring, it is certainly easier to configure your
application this way. A minimal applicationContext.xml file (placed in
<filename>src/main/resources</filename>) might look like
<programlisting>&lt;import resource="classpath:/eu/etaxonomy/cdm/services.xml" /&gt;

&lt;bean id="dataSource"
      class="eu.etaxonomy.cdm.database.LocalHsqldb"
      destroy-method="destroy"&gt;
  &lt;property name="driverClassName" value="org.hsqldb.jdbcDriver"/&gt;
  &lt;property name="username" value="sa"/&gt;
  &lt;property name="password" value=""/&gt;
  &lt;property name="startServer" value="true"/&gt;
  &lt;property name="silent" value="true"/&gt;
&lt;/bean&gt;

&lt;bean id="hibernateProperties"
      class="org.springframework.beans.factory.config.PropertiesFactoryBean"&gt;
  &lt;property name="properties"&gt;
    &lt;props&gt;
      &lt;prop key="hibernate.hbm2ddl.auto"&gt;create-drop&lt;/prop&gt;
      &lt;prop key="hibernate.dialect"&gt;org.hibernate.dialect.HSQLDialect&lt;/prop&gt;
      &lt;prop key="hibernate.cache.provider_class"&gt;org.hibernate.cache.NoCacheProvider&lt;/prop&gt;
    &lt;/props&gt;
  &lt;/property&gt;
&lt;/bean&gt;</programlisting>
<para>The first element imports the cdmlib service definitions. The two
other beans supply a data source and a properties object that the CDM
library uses to configure the Hibernate session factory and connect to
the database. In this case, we're using an in-memory HSQL database, but
the CDM can be used with many other databases. The only thing left to do
is to start using the CDM services. In real applications, CDM services
may well be autowired into components using Spring or another dependency
injection mechanism. To keep this example simple, we'll initialize the
application context and obtain a service programmatically.</para>
<programlisting>// Load the Spring application context defined in applicationContext.xml
ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");

// Look up the name service by its bean name
INameService nameService = (INameService)context.getBean("nameServiceImpl");

// Create a botanical name at species rank and persist it
BotanicalName botanicalName = BotanicalName.NewInstance(Rank.SPECIES());
botanicalName.setGenusOrUninomial("Arum");
botanicalName.setSpecificEpithet("maculatum");
UUID uuid = nameService.saveTaxonName(botanicalName);<!--ben: Again, this example reflects the 1.1.1 release, and will need to be changed slightly once the new release is available-->

System.out.println("Saved 'Arum maculatum' under uuid " + uuid.toString());
</programlisting>
<para>In this simple example, we've covered the basics of using the CDM
Java Library. We created a simple Maven project, and added the
repository and a single dependency to our pom file. We then created a
simple application context that used the default CDM configuration, and
specified a couple of objects that allowed the CDM to connect to a
database. Finally we initialized these services by loading the
application context, and then retrieved a specific service, and used it
to persist a new taxonomic name.</para>
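<para>As noted above, real applications would more likely have CDM
services injected into their components than look them up by bean name.
The following is a minimal, library-free sketch of that idea and is not
cdmlib code: <classname>NameStore</classname>,
<classname>InMemoryNameStore</classname> and
<classname>NameRegistrar</classname> are hypothetical names standing in
for a CDM service interface, an implementation of it, and an application
component. In a Spring application the container, rather than
<methodname>main</methodname>, would construct the service and pass it
in.</para>
<programlisting language="java"><![CDATA[
```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for a CDM service interface; NOT the real INameService.
interface NameStore {
    UUID save(String scientificName);
    String find(UUID uuid);
}

// In-memory stub so that the sketch runs without a database or Spring.
class InMemoryNameStore implements NameStore {
    private final Map<UUID, String> names = new LinkedHashMap<UUID, String>();

    public UUID save(String scientificName) {
        UUID uuid = UUID.randomUUID();
        names.put(uuid, scientificName);
        return uuid;
    }

    public String find(UUID uuid) {
        return names.get(uuid);
    }
}

// Application component: the service arrives through the constructor,
// so the component never touches an application context itself.
class NameRegistrar {
    private final NameStore store;

    NameRegistrar(NameStore store) {
        this.store = store;
    }

    UUID register(String genus, String epithet) {
        return store.save(genus + " " + epithet);
    }
}

class InjectionSketch {
    public static void main(String[] args) {
        // A dependency injection container would normally supply the store.
        NameStore store = new InMemoryNameStore();
        NameRegistrar registrar = new NameRegistrar(store);
        UUID uuid = registrar.register("Arum", "maculatum");
        System.out.println("Saved '" + store.find(uuid) + "' under uuid " + uuid);
    }
}
```
]]></programlisting>
<para>Because <classname>NameRegistrar</classname> depends only on the
interface, the in-memory stub could be swapped for a database-backed
implementation without changing the component.</para>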
<title>Common Data Model</title>
<para>The Common Data Model (CDM) is the domain model for the core EDIT
cyberplatform components. The CDM is primarily based on the <link
xlink:href="http://wiki.tdwg.org/twiki/bin/view/TAG/LsidVocs">TDWG
Ontology</link> and in most cases there is concordance with relevant
TDWG standards such as <link linkend="???"
xlink:href="http://www.tdwg.org/standards/117/">Taxon Concept Transfer
Schema (TCS)</link>, <link linkend="???"
xlink:href="http://www.tdwg.org/standards/116/">Structured Descriptive
Data (SDD)</link> and <link linkend="???"
xlink:href="http://www.tdwg.org/standards/115/">Access to Biological
Collections Data (ABCD)</link>.</para>
<para>The CDM differs from the TDWG standards in its purpose: it is
intended to serve as the basis of software applications in the
cyberplatform (e.g. the Taxonomic Editor, the CDM Dataportals) rather
than being a standard for data exchange between any resource containing
biodiversity information. Whilst it is certainly possible to exchange
data as CDM domain objects serialized as XML or JSON (the CDM Server and
the CDM Dataportals do this), the common data model is not intended to
replace existing TDWG standards as a general purpose exchange standard.
It is possible to convert data held in a CDM store into a relevant TDWG
standard for exchange and in some cases this may be the desired route
for data held in the CDM (e.g. for exchange with an application that is
not part of the cyberplatform, but which is capable of understanding
data in a TDWG standard).</para>
<para>Thus the CDM is intended for use as</para>
<para>A domain model for applications, particularly those that
enable taxonomists to do revisionary taxonomy and taxonomic field
<para>A standard for exchange between applications that are part of
the EDIT Internet Platform for Cybertaxonomy</para>
<para>In terms of scope, the CDM covers information core to the vision
of the cyberplatform, i.e. descriptive and revisionary taxonomy,
including taxonomic fieldwork:</para>
<para>Taxonomic names and nomenclature, typification</para>
<para>Taxonomic concepts and relationships between accepted names
and synonyms, including the placement of the same taxonomic concept
in different taxonomic hierarchies.</para>
<para>Specimens and Observations of individual organisms, their
collection, location, processing and taxonomic determination.</para>
<para>Structured and unstructured information about names, taxa, and
<para>In addition to this core area, the CDM covers some related domains
that are important:</para>
<para>Literature</para>
<para>People, teams of people and institutions in various roles
(for example as authors, collectors, artists or rights holders)</para>
<para>Media (images, video and audio files, plus more
taxonomy-specific media such as phylogenies and compiled
<para>Molecular data, such as DNA sequences and loci</para>
<para>As you might expect, there are also a number of data entities
representing controlled vocabularies, identity of users (and their roles
and permissions), and ancillary data common to all major classes such as
multilingual text content, annotations and markers.</para>
<title>A UML Package diagram showing the CDM packages and their
<imageobject role="html">
<imagedata fileref="resources/images/ModelOverview20.gif" />
<imageobject role="fo">
<imagedata contentwidth="160mm"
fileref="resources/images/ModelOverview20.gif" />
<xi:include href="base-classes.xml" />
<xi:include href="annotation-and-markers.xml" />
<!--<xi:include href="extensions.xml" />-->
<xi:include href="identifiable-entities.xml" />
<!-- ben: I think that some explanation of how the CDM deals with core
data classes, would be really useful here. In some cases, we're still
trying to understand how it should work and in that case it might
still be useful to have a straw-man that people can disagree with or -->
<!--<xi:include href="taxonomic-names.xml" />-->
<!--<xi:include href="taxonomic-concepts.xml" />-->
<!--<xi:include href="specimens-and-observations.xml" />-->
<!--<xi:include href="descriptive-data.xml" />-->
<!--<xi:include href="terms-and-vocabularies.xml" />-->
<!-- ben: We'll need to touch on _where_ these external files live . . .-->
<!--<xi:include href="media.xml" />-->
<!-- ben: I'll include something about validation once I start
work on the validation framework next month -->
<!--<xi:include href="validation.xml" />-->
<title>Persistence Layer</title>
<para>Even the most basic of taxonomic applications have a requirement
for users to be able to save the information that they create. In
addition, a common component of taxonomic applications is the use of a
database to provide users with the ability to filter or search their
data in one way or another. Some applications will require more advanced
functionality, such as auditing or versioning of data. All of this logic
is contained in the persistence layer, providing clean separation
between data access and more taxonomy-centric business logic in the
service layer.</para>
<para>Persistence is not a simple problem to solve, especially in
applications developed in object-oriented languages, with large amounts
of data, or with many users accessing data at the same time. The CDM
Library uses the <link
xlink:href="http://www.hibernate.org">Hibernate</link> object/relational
persistence and query service as the basis of its persistence layer.
Several member projects of the Hibernate stable, including <link
xlink:href="http://annotations.hibernate.org">Hibernate
Annotations</link>, <link
xlink:href="http://search.hibernate.org">Hibernate Search</link> and
<link xlink:href="http://jboss.org/envers/">Hibernate Envers</link> (part
of Hibernate Core) provide the basis of the more advanced
persistence-related functionality in the CDM Library. As a consequence,
some of the behaviour of the CDM Library is constrained by the
underlying ORM technology. The advantage of using an ORM is that the
same software can be used with multiple database systems with (almost)
no changes to the application. Currently the CDM Library has been tested
with (version numbers and platforms in brackets):</para>
<!--I don't know how many of these have been tested, on which platforms, but it would be good to include some measure of which platform / database combinations
have been used and how, so that potential users can evaluate the technology. In an ideal world, we would pick some databases as "supported" and ensure that
the test suite runs on that platform / db combination (i.e. you don't release until the tests pass). For the others, we still might want to say: "We tested
the CDM on this platform and it seemed to work".-->
<para><link
xlink:href="http://www.ibm.com/software/data/db2/">DB2</link></para>
<para><link xlink:href="???">H2</link> (default local database used
by the Taxonomic Editor, 1.0.73)</para>
<link xlink:href="http://hsqldb.org">HSQLDB</link>
<para><link xlink:href="http://www.mysql.com">MySQL</link> (4.1.20:
Linux; 5.1.32: Windows)</para>
<link xlink:href="???">ODBC</link>
<link
xlink:href="http://www.oracle.com/database/index.html">Oracle
Database 11<emphasis>g</emphasis></link>
<link xlink:href="http://www.postgresql.org/">PostgreSQL</link>
<link xlink:href="???">Microsoft SQL Server 2000</link>
<link
xlink:href="http://www.microsoft.com/sqlserver/2005/">Microsoft
SQL Server 2005</link>
<link linkend="???" xlink:href="http://www.sybase.co.uk/">Sybase
Advantage Database Server</link>
<para>In theory, application developers should not need to use the
persistence layer directly, but should instead use the <link
linkend="api">API</link>, which provides a <emphasis>facade</emphasis>
over the persistence layer and extra business logic that most
applications using the CDM will require.</para>
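<para>The earlier point in this chapter, that an ORM lets the same
application run against several database systems with (almost) no
changes, can be illustrated with a small sketch. It assumes, as in the
Getting Started configuration, that the database-specific part of the
setup is the <literal>hibernate.dialect</literal> property. The class
<classname>DialectSettings</classname> and the short database keys are
hypothetical and not part of the CDM Library, which may select dialects
differently; the dialect class names are the ones shipped with
Hibernate.</para>
<programlisting language="java"><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the CDM Library.
class DialectSettings {
    // Maps an (assumed) short database key to a Hibernate dialect class name.
    private static final Map<String, String> DIALECTS = new HashMap<String, String>();
    static {
        DIALECTS.put("hsqldb", "org.hibernate.dialect.HSQLDialect");
        DIALECTS.put("h2", "org.hibernate.dialect.H2Dialect");
        DIALECTS.put("mysql", "org.hibernate.dialect.MySQLDialect");
        DIALECTS.put("postgresql", "org.hibernate.dialect.PostgreSQLDialect");
    }

    // Builds the Hibernate properties for the given database key.
    static Properties forDatabase(String database) {
        String dialect = DIALECTS.get(database);
        if (dialect == null) {
            throw new IllegalArgumentException("No dialect configured for: " + database);
        }
        Properties props = new Properties();
        props.setProperty("hibernate.dialect", dialect);
        return props;
    }

    public static void main(String[] args) {
        // Only the key changes when the application moves to another database.
        System.out.println(forDatabase("hsqldb").getProperty("hibernate.dialect"));
        System.out.println(forDatabase("mysql").getProperty("hibernate.dialect"));
    }
}
```
]]></programlisting>
<para>In practice the JDBC driver, connection URL and credentials would
change along with the dialect, but the application code that uses the
persistence layer would not.</para>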
<xi:include href="basic-persistence.xml" />
<xi:include href="versioning.xml" />
<xi:include href="free-text-search.xml" />
<title>API Methods</title>
<para>Apart from the Common Data Model classes themselves, the CDM
Service layer contains the components most likely to be used directly by
applications based upon the CDM Java Library. This layer contains a set
of basic service objects that can be used as a facade over the
persistence logic.</para>
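<para>The facade idea described above can be sketched without any CDM
classes. In this hedged, self-contained example,
<classname>SketchNameDao</classname>,
<classname>MapBackedNameDao</classname> and
<classname>SketchNameService</classname> are hypothetical names: the DAO
stands in for the persistence layer, and the service adds a simple
business rule before delegating to it. The real CDM services and DAOs
have their own interfaces, which are documented in the sections that
follow.</para>
<programlisting language="java"><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical persistence-layer interface; not a real cdmlib DAO.
interface SketchNameDao {
    void insert(UUID uuid, String name);
    List<String> findAll();
}

// Map-backed DAO so that the sketch runs without a database.
class MapBackedNameDao implements SketchNameDao {
    private final Map<UUID, String> rows = new HashMap<UUID, String>();

    public void insert(UUID uuid, String name) {
        rows.put(uuid, name);
    }

    public List<String> findAll() {
        return new ArrayList<String>(rows.values());
    }
}

// Hypothetical service: a facade that applies business rules
// and then delegates storage to the DAO.
class SketchNameService {
    private final SketchNameDao dao;

    SketchNameService(SketchNameDao dao) {
        this.dao = dao;
    }

    UUID saveName(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("A name must not be empty");
        }
        UUID uuid = UUID.randomUUID();
        dao.insert(uuid, name.trim());   // normalise whitespace before storing
        return uuid;
    }

    int count() {
        return dao.findAll().size();
    }
}

class FacadeSketch {
    public static void main(String[] args) {
        SketchNameService service = new SketchNameService(new MapBackedNameDao());
        UUID uuid = service.saveName("Arum maculatum");
        System.out.println("Saved under uuid " + uuid + ", names stored: " + service.count());
    }
}
```
]]></programlisting>
<para>Application code talks only to the service, so validation and
similar rules live in one place and the storage mechanism can change
behind it.</para>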
<xi:include href="service.xml" />
<!--<xi:include href="application-controller.xml" />-->
<!--<xi:include href="transactions.xml" />-->
<xi:include href="guid-resolution.xml" />
<xi:include href="security.xml" />
<title>CDM Input / Output Layer</title>
<para>This part describes the input/output routines:</para>
<!--<xi:include href="base-io-usage.xml" />-->
<!--<xi:include href="cdm-xml-input-output.xml" />-->
<!--<xi:include href="abcd-input-output.xml" />-->
<!--<xi:include href="berlinmodel-input-output.xml" />-->
<!--<xi:include href="excel-input-output.xml" />-->
<!--<xi:include href="sdd-input-output.xml" />-->
<!--<xi:include href="taxonx-input-output.xml" />-->
<!--<xi:include href="tcsrdf-input-output.xml" />-->
<!--<xi:include href="tcsxml-input-output.xml" />-->
<title>CDM Server</title>
<para>This part describes the cdm-server application:</para>
<!--<xi:include href="cdm-server.xml" />-->
<!--<xi:include href="instalation.xml" />-->
<!--<xi:include href="configuration.xml" />-->