switched to docbkx plugin v 2.0.9 and added some more documentation
[cdmlib.git] / src / docbkx / ReferenceDocumentation.xml
<?xml version="1.0" encoding="UTF-8"?>
<book version="5.0"
      xsi:schemaLocation="http://docbook.org/ns/docbook http://docbook.org/xml/5.0/xsd/docbook.xsd"
      xml:id="cdm-reference-guide" xmlns="http://docbook.org/ns/docbook"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:ns5="http://www.w3.org/1999/xhtml"
      xmlns:ns4="http://www.w3.org/2000/svg"
      xmlns:ns3="http://www.w3.org/1998/Math/MathML"
      xmlns:ns="http://docbook.org/ns/docbook">
  <bookinfo>
    <title>EDIT Common Data Model Library</title>

    <subtitle>Reference Documentation (Work in Progress)</subtitle>

    <corpauthor>
      <inlinegraphic fileref="./resources/images/logo.png" />
    </corpauthor>

    <!-- Please add your names here -->

    <authorgroup>
      <author>
        <personname>Ben Clark</personname>
      </author>
    </authorgroup>

    <releaseinfo>2.1</releaseinfo>

    <copyright>
      <year>2009</year>

      <holder>EDIT - European Distributed Institute of Taxonomy -
      http://www.e-taxonomy.eu</holder>
    </copyright>

    <legalnotice>
      <para>The contents of this file are subject to the Mozilla Public
      License Version 1.1. See LICENSE.TXT at the top of this package for the
      full license terms.</para>
    </legalnotice>
  </bookinfo>

  <toc />

  <preface id="preface">
    <title>Preface</title>

    <para>EDIT's Internet Platform for Cybertaxonomy is a distributed
    computing platform that helps taxonomists do revisionary taxonomy and
    taxonomic field work efficiently and expediently via the web. At the core
    of the platform lies a common data model that enables interoperability
    between the different components. The model describes the data commonly
    handled in the platform, and therefore covers taxonomic names and
    concepts; literature references; authors; (type) specimens; structured
    descriptive data; molecular data; related (binary) files such as images
    or compiled keys; controlled vocabularies and terms; and species-related
    content of any kind, such as economic use or conservation status.</para>

    <para>The cyberplatform consists of interoperable but independent
    components. Platform components can take the form of software applications
    (desktop or web-based) for human users or (web) services intended to be
    used by other software applications. The platform as envisioned does not
    have a single user interface or website; rather, it is a collection of
    interacting components which may be combined and assembled according to
    the task in hand. To facilitate the development of core CDM applications
    such as the CDM Community Server, the CDM Dataportals, and the Taxonomic
    Editor, an implementation of the CDM has been created in the Java
    programming language. In addition to the CDM model classes, which are
    modelled as plain old Java objects (<link
    xlink:href="http://en.wikipedia.org/wiki/Plain_Old_Java_Object">POJOs</link>),
    a set of Java components has been created that provides common services
    across all Java applications using the CDM. As well as serving as the
    basis of the core components of the Internet Platform for Cybertaxonomy,
    these components allow other applications to be developed on the CDM by
    providing basic functionality that can be extended for a particular
    purpose.</para>

    <para>The CDM Library, as it is known, consists of four major modules that
    can be used by any Java application based on the CDM. These libraries are
    used as the foundation of the Taxonomic Editor and the CDM Community
    Server. In addition, a web application (the CDM Community Server) is
    documented here, as its components can be re-purposed or extended by other
    web applications based on the CDM.</para>

    <figure>
      <mediaobject>
        <imageobject role="html">
          <imagedata fileref="resources/images/cdmlib-arch3.png" format="png" />
        </imageobject>

        <imageobject role="fo">
          <imagedata contentwidth="160mm"
                     fileref="resources/images/cdmlib-arch3.png" format="png"
                     scalefit="1" />
        </imageobject>

        <caption>The overall architecture of the EDIT Internet Platform for
        Cybertaxonomy, showing the core components of the CDM Java Library,
        and their use by desktop (Taxonomic Editor) and web-based (CDM
        Dataportal, CATE) applications.</caption>
      </mediaobject>
    </figure>
  </preface>

  <part>
    <title>Common Data Model</title>

    <partintro>
      <para>The Common Data Model (CDM) is the domain model for the core EDIT
      cyberplatform components. The CDM is primarily based on the <link
      linkend="???"
      xlink:href="http://wiki.tdwg.org/twiki/bin/view/TAG/LsidVocs">TDWG
      Ontology</link> and in most cases there is concordance with relevant
      TDWG standards such as <link linkend="???"
      xlink:href="http://www.tdwg.org/standards/117/">Taxon Concept Transfer
      Schema (TCS)</link>, <link linkend="???"
      xlink:href="http://www.tdwg.org/standards/116/">Structured Descriptive
      Data (SDD)</link> and <link linkend="???"
      xlink:href="http://www.tdwg.org/standards/115/">Access to Biological
      Collections Data (ABCD)</link>.</para>

      <para>The CDM differs from the TDWG standards in its purpose: it is
      intended to serve as the basis of software applications in the
      cyberplatform (e.g. the Taxonomic Editor, the CDM Dataportals) rather
      than being a standard for data exchange between any resources containing
      biodiversity information. Whilst it is certainly possible to exchange
      data as CDM domain objects serialized as XML or JSON (the CDM Server and
      the CDM Dataportals do this), the common data model is not intended to
      replace existing TDWG standards as a general purpose exchange standard.
      It is possible to convert data held in a CDM store into a relevant TDWG
      standard for exchange, and in some cases this may be the desired route
      for data held in the CDM (e.g. for exchange with an application that is
      not part of the cyberplatform, but which is capable of understanding
      data in a TDWG standard).</para>

      <para>Thus the CDM is intended for use as:</para>

      <itemizedlist>
        <listitem>
          <para>A domain model for applications, particularly those that
          enable taxonomists to do revisionary taxonomy and taxonomic field
          work</para>
        </listitem>

        <listitem>
          <para>A standard for exchange between applications that are part of
          the EDIT Internet Platform for Cybertaxonomy</para>
        </listitem>
      </itemizedlist>

      <para>In terms of scope, the CDM covers information core to the vision
      of the cyberplatform, i.e. descriptive and revisionary taxonomy,
      including taxonomic fieldwork:</para>

      <itemizedlist>
        <listitem>
          <para>Taxonomic names, nomenclature and typification</para>
        </listitem>

        <listitem>
          <para>Taxonomic concepts and relationships between accepted names
          and synonyms, including the placement of the same taxonomic concept
          in different taxonomic hierarchies</para>
        </listitem>

        <listitem>
          <para>Specimens and observations of individual organisms, their
          collection, location, processing and taxonomic determination</para>
        </listitem>

        <listitem>
          <para>Structured and unstructured information about names, taxa, and
          specimens</para>
        </listitem>
      </itemizedlist>

      <para>In addition to this core area, the CDM covers some important
      related domains:</para>

      <itemizedlist>
        <listitem>
          <para>Literature</para>
        </listitem>

        <listitem>
          <para>People, teams of people and institutions in various roles
          (e.g. as authors, collectors, artists or rights holders)</para>
        </listitem>

        <listitem>
          <para>Media (images, video and audio files, plus more
          taxonomy-specific media such as phylogenies and compiled
          keys)</para>
        </listitem>

        <listitem>
          <para>Molecular data, such as DNA sequences and loci</para>
        </listitem>
      </itemizedlist>

      <para>As you might expect, there are also a number of data entities
      representing controlled vocabularies, identity of users (and their roles
      and permissions), and ancillary data common to all major classes such as
      multilingual text content, annotations and markers.</para>

      <figure>
        <title>A UML Package diagram showing the CDM packages and their
        members.</title>

        <mediaobject>
          <imageobject role="html">
            <imagedata fileref="resources/images/ModelOverview20.gif" />
          </imageobject>

          <imageobject role="fo">
            <imagedata contentwidth="160mm"
                       fileref="resources/images/ModelOverview20.gif"
                       scalefit="1" />
          </imageobject>
        </mediaobject>
      </figure>
    </partintro>

    <xi:include href="base-classes.xml" />

    <!--<xi:include href="annotation-and-markers.xml" />-->

    <!--<xi:include href="extensions.xml" />-->

    <!--<xi:include href="identifiable-entities.xml" />-->

    <!--<xi:include href="validation.xml" />-->
  </part>

  <part>
    <title>Persistence Layer</title>

    <partintro>
      <para>Even the most basic taxonomic applications require that users can
      save the information they create. In addition, a common component of
      taxonomic applications is a database that lets users filter or search
      their data in one way or another. Some applications will require more
      advanced functionality, such as auditing or versioning of data. All of
      this logic is contained in the persistence layer, providing clean
      separation between data access and the more taxonomy-centric business
      logic in the service layer.</para>

      <para>Persistence is not a simple problem to solve, especially in
      applications developed in object-oriented languages, with large amounts
      of data, or with many users accessing data at the same time. The CDM
      Library uses the <link
      xlink:href="http://www.hibernate.org">Hibernate</link> object/relational
      persistence and query service as the basis of its persistence layer.
      Several member projects of the Hibernate stable, including <link
      xlink:href="http://annotations.hibernate.org">Hibernate
      Annotations</link>, <link linkend="???"
      xlink:href="http://search.hibernate.org">Hibernate Search</link> and
      <link linkend="???">Hibernate Envers</link> (part of Hibernate Core),
      provide the basis of the more advanced persistence-related functionality
      in the CDM Library. As a consequence, some of the behaviour of the CDM
      Library is constrained by the underlying object/relational mapping (ORM)
      technology. The advantage of using an ORM is that the same software can
      be used with multiple database systems with (almost) no changes to the
      application. Currently the CDM Library has been tested with the
      following databases (version numbers &amp; platforms in
      brackets):</para>

      <!--I don't know how many of these have been tested, on which platforms, but it would be good to include some measure of which platform / database combinations
      have been used and how, so that potential users can evaluate the technology. In an ideal world, we would pick some databases as "supported" and ensure that
      the test suite runs on that platform / db combination (i.e. you don't release until the tests pass). For the others, we still might want to say: "We tested
      the CDM on this platform and it seemed to work".-->

      <itemizedlist>
        <listitem>
          <para>IBM <link
          xlink:href="http://www.ibm.com/software/data/db2/">DB2</link></para>
        </listitem>

        <listitem>
          <para><link xlink:href="???">H2</link> (default local database used
          by the Taxonomic Editor, 1.0.73)</para>
        </listitem>

        <listitem>
          <para>
            <link xlink:href="http://hsqldb.org">HSQLDB</link>
          </para>
        </listitem>

        <listitem>
          <para><link xlink:href="http://www.mysql.com">MySQL</link> (4.1.20:
          linux; 5.1.32: windows)</para>
        </listitem>

        <listitem>
          <para>
            <link xlink:href="???">ODBC</link>
          </para>
        </listitem>

        <listitem>
          <para>
            <link
            xlink:href="http://www.oracle.com/database/index.html">Oracle
            Database 11<emphasis>g</emphasis></link>
          </para>
        </listitem>

        <listitem>
          <para>
            <link xlink:href="http://www.postgresql.org/">PostgreSQL</link>
          </para>
        </listitem>

        <listitem>
          <para>
            <link xlink:href="???">Microsoft SQL Server 2000</link>
          </para>
        </listitem>

        <listitem>
          <para>
            <link linkend="???"
            xlink:href="http://www.microsoft.com/sqlserver/2005/">Microsoft
            SQL Server 2005</link>
          </para>
        </listitem>

        <listitem>
          <para>
            <link linkend="???" xlink:href="http://www.sybase.co.uk/">Sybase
            Advantage Database Server</link>
          </para>
        </listitem>
      </itemizedlist>
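
      <para>To make the portability point above concrete, the following is a
      minimal sketch of how a plain Hibernate application is typically pointed
      at a different database: only the connection settings and the SQL
      dialect change, while the mapped classes and queries stay the same. The
      property names are standard Hibernate configuration properties; the
      database names, file path and credentials are hypothetical, and the CDM
      Library may wrap this configuration in its own classes, so this is an
      illustration under those assumptions rather than the library's actual
      setup.</para>

      <programlisting language="xml"><![CDATA[<!-- Variant 1: embedded H2 database (hypothetical file path and credentials) -->
<hibernate-configuration>
  <session-factory>
    <property name="hibernate.connection.driver_class">org.h2.Driver</property>
    <property name="hibernate.connection.url">jdbc:h2:file:./cdm-example</property>
    <property name="hibernate.connection.username">sa</property>
    <property name="hibernate.connection.password"></property>
    <!-- The dialect tells Hibernate which SQL flavour to generate -->
    <property name="hibernate.dialect">org.hibernate.dialect.H2Dialect</property>
  </session-factory>
</hibernate-configuration>

<!-- Variant 2: MySQL server (hypothetical host, schema and credentials) -->
<hibernate-configuration>
  <session-factory>
    <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
    <property name="hibernate.connection.url">jdbc:mysql://localhost:3306/cdm_example</property>
    <property name="hibernate.connection.username">cdm_user</property>
    <property name="hibernate.connection.password">change-me</property>
    <property name="hibernate.dialect">org.hibernate.dialect.MySQL5InnoDBDialect</property>
  </session-factory>
</hibernate-configuration>]]></programlisting>

      <para>In a setup like this, switching between the two variants should
      require no change to application code; differences tend to surface only
      where an application relies on database-specific SQL or
      behaviour.</para>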

      <para>In theory, application developers should not need to use the
      persistence layer directly, but should instead use the <link
      linkend="api">API</link>, which provides a <emphasis>facade</emphasis>
      over the persistence layer and adds the extra business logic that most
      applications using the CDM will require.</para>
    </partintro>

    <xi:include href="basic-persistence.xml" />

    <!--<xi:include href="listing-sorting-initializing.xml" />-->

    <!--<xi:include href="versioning.xml" />-->

    <!--<xi:include href="free-text-search.xml" />-->
  </part>

  <part xml:id="api">
    <title>API Methods</title>

    <partintro>
      <para>This part discusses the service layer.</para>
    </partintro>

    <!--<xi:include href="service.xml" />-->

    <!--<xi:include href="paging-resultsets.xml" />-->

    <!--<xi:include href="application-controller.xml" />-->

    <!--<xi:include href="transactions.xml" />-->

    <!--<xi:include href="guid-resolution.xml" />-->

    <!--<xi:include href="security.xml" />-->
  </part>

  <part>
    <title>CDM Input / Output Layer</title>

    <partintro>
      <para>This part describes the input/output routines.</para>
    </partintro>

    <!--<xi:include href="base-io-usage.xml" />-->

    <!--<xi:include href="cdm-xml-input-output.xml" />-->

    <!--<xi:include href="abcd-input-output.xml" />-->

    <!--<xi:include href="berlinmodel-input-output.xml" />-->

    <!--<xi:include href="excel-input-output.xml" />-->

    <!--<xi:include href="sdd-input-output.xml" />-->

    <!--<xi:include href="taxonx-input-output.xml" />-->

    <!--<xi:include href="tcsrdf-input-output.xml" />-->

    <!--<xi:include href="tcsxml-input-output.xml" />-->
  </part>

  <part>
    <title>CDM Server</title>

    <partintro>
      <para>This part describes the cdm-server application.</para>
    </partintro>

    <!--<xi:include href="cdm-server.xml" />-->

    <!--<xi:include href="instalation.xml" />-->

    <!--<xi:include href="configuration.xml" />-->
  </part>
</book>
415 </part>
416 </book>