<?xml version="1.0" encoding="UTF-8"?>
<book version="5.0"
      xsi:schemaLocation="http://docbook.org/ns/docbook http://docbook.org/xml/5.0/xsd/docbook.xsd"
      xml:id="cdm-reference-guide" xmlns="http://docbook.org/ns/docbook"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:ns5="http://www.w3.org/1999/xhtml"
      xmlns:ns4="http://www.w3.org/2000/svg"
      xmlns:ns3="http://www.w3.org/1998/Math/MathML"
      xmlns:ns="http://docbook.org/ns/docbook">
  <bookinfo>
    <title>EDIT Common Data Model Library</title>

    <subtitle>Reference Documentation (Work in Progress)</subtitle>

    <corpauthor>
      <inlinegraphic fileref="./resources/images/logo.png" />
    </corpauthor>

    <!-- Please add your names here -->

    <authorgroup>
      <author>
        <personname>Ben Clark</personname>
      </author>
      <author>
        <personname>Andreas Müller</personname>
      </author>
    </authorgroup>

    <releaseinfo>2.1</releaseinfo>

    <copyright>
      <year>2009</year>

      <holder>EDIT - European Distributed Institute of Taxonomy -
      http://www.e-taxonomy.eu</holder>
    </copyright>

    <legalnotice>
      <para>The contents of this file are subject to the Mozilla Public
      License Version 1.1. See LICENSE.TXT at the top of this package for the
      full license terms.</para>
    </legalnotice>
  </bookinfo>

  <toc />

  <preface id="preface">
    <title>Preface</title>

    <para>EDIT's Internet Platform for Cybertaxonomy is a distributed
    computing platform that helps taxonomists do revisionary taxonomy and
    taxonomic field work efficiently and expediently via the web. At the core
    of the platform lies a common data model that enables interoperability
    between the different components. The model describes all the commonly
    used data handled by the platform, and therefore covers
    taxonomic names and concepts; literature references; authors; (type)
    specimens; structured descriptive data; molecular data; related (binary)
    files such as images or compiled keys; controlled vocabularies and terms;
    and species-related content of any kind, such as economic use or
    conservation status.</para>

    <para>The cyberplatform consists of interoperable but independent
    components. Platform components can take the form of software applications
    (desktop or web-based) for human users or (web) services intended to be
    used by other software applications. The platform as envisioned does not
    have a single user interface or website; rather, it is a collection of
    interacting components which may be combined and assembled according to
    the task at hand. To facilitate the development of core CDM applications
    such as the CDM Community Server, the CDM Dataportals, and the Taxonomic
    Editor, an implementation of the CDM has been created in the Java
    programming language. In addition to the CDM model classes, which are
    modelled as plain old Java objects (<link
    xlink:href="http://en.wikipedia.org/wiki/Plain_Old_Java_Object">POJOs</link>),
    a set of Java components has been created that provides common services
    to all Java applications using the CDM. These components serve as the
    basis of the core components of the Internet Platform for Cybertaxonomy
    and also support the development of other applications using the CDM by
    providing basic functionality that can be extended for a particular
    purpose.</para>

    <para>The CDM Library, as it is known, consists of four major modules that
    can be used by any Java application based on the CDM. These libraries are
    used as the foundation of the Taxonomic Editor and the CDM Community
    Server. In addition, the CDM Community Server web application itself is
    documented here, as its components can be re-purposed or extended by other
    web applications based on the CDM.</para>

    <figure>
      <title>An overview of the main CDM Components</title>

      <mediaobject>
        <imageobject role="html">
          <imagedata fileref="resources/images/cdmlib-arch3.png" format="png" />
        </imageobject>

        <imageobject role="fo">
          <imagedata contentwidth="160mm"
                     fileref="resources/images/cdmlib-arch3.png" format="png"
                     scalefit="1" />
        </imageobject>

        <caption>The overall architecture of the EDIT Internet platform for
        Cybertaxonomy, showing the core components of the CDM Java Library,
        and their use by desktop (Taxonomic Editor) and web-based (CDM
        Dataportal, CATE) applications.</caption>
      </mediaobject>
    </figure>

    <para>This reference documentation is aimed at anyone who would like to
    understand the software components that make up the core of the
    cyberplatform: the CDM Java Library and the CDM Server application. More
    general information about the applications that make up the cyberplatform,
    information for end-users of specific applications, and information on the
    EDIT project itself are beyond the scope of this document. More
    information about EDIT can be found on the <link
    xlink:href="http://www.e-taxonomy.eu">EDIT website</link>, and more
    information on the specific software applications produced by EDIT can be
    found on the <link linkend="???">Work Package 5 website</link>.</para>
  </preface>

  <part>
    <title>Getting Started</title>

    <partintro>
      <para>This part of the reference documentation aims to provide simple
      step-by-step instructions that enable application developers to start
      using the CDM Java Library in their Java application. To do this, we
      will create a small toy application. The CDM Java Library is packaged
      and published using Apache Maven, a software project management and
      comprehension tool. To make life easier, we'll use Maven to create our
      application too. Assuming that Maven (2.0.x or later) is installed, we
      begin by creating a new Maven application (substituting your own group
      id, artifact id, and version):</para>

      <screen>mvn archetype:create -DgroupId=<emphasis>org.myproject</emphasis> -DartifactId=<emphasis>myapp</emphasis> -Dversion=<emphasis>1.0</emphasis></screen>

      <para>The next step is to add the EDIT Maven repository to your Maven
      <emphasis>project object model</emphasis> or <emphasis>pom</emphasis>
      file, thus:</para>

      <programlisting>. . .
  &lt;repositories&gt;
    &lt;repository&gt;
      &lt;id&gt;EditRepository&lt;/id&gt;
      &lt;url&gt;http://wp5.e-taxonomy.eu/cdmlib/mavenrepo/&lt;/url&gt;
    &lt;/repository&gt;
  &lt;/repositories&gt;
&lt;/project&gt;</programlisting>

      <para>We also need to add the specific dependency that we would like our
      project to include.</para>

      <programlisting>. . .
  &lt;dependencies&gt;
    &lt;dependency&gt;
      &lt;groupId&gt;eu.etaxonomy&lt;/groupId&gt;
      &lt;artifactId&gt;cdmlib-services&lt;/artifactId&gt;
      &lt;version&gt;1.1.1&lt;/version&gt;<!--ben: We will need to change this to reflect the new release once it is available-->
    &lt;/dependency&gt;
  &lt;/dependencies&gt;
  &lt;repositories&gt;
. . .</programlisting>

      <para>In most cases, application developers will wish to include the
      cdmlib services (which also include the data model and the persistence
      layer). In some cases, developers might also wish to use components from
      the <package>cdmlib-io</package> and <package>cdmlib-remote</package>
      packages. New releases of the CDM Java Library are published in the
      EDIT Maven Repository, and Maven will download and use these artifacts
      automatically if you change the version number of the dependency
      specified in your pom file.</para>
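
      <para>If you use more than one CDM module, it can be convenient to keep
      the version number in a single Maven property, so that upgrading to a
      new release means changing one value. The fragment below is a sketch
      rather than a tested configuration: the property name
      <literal>cdmlib.version</literal> is our own (hypothetical) choice, and
      the second dependency assumes that <package>cdmlib-io</package> is
      published under the same <literal>eu.etaxonomy</literal> group id as
      <package>cdmlib-services</package>, with the package name as its
      artifact id.</para>

      <programlisting>. . .
  &lt;properties&gt;
    &lt;!-- hypothetical property name; any name will do --&gt;
    &lt;cdmlib.version&gt;1.1.1&lt;/cdmlib.version&gt;
  &lt;/properties&gt;

  &lt;dependencies&gt;
    &lt;dependency&gt;
      &lt;groupId&gt;eu.etaxonomy&lt;/groupId&gt;
      &lt;artifactId&gt;cdmlib-services&lt;/artifactId&gt;
      &lt;version&gt;${cdmlib.version}&lt;/version&gt;
    &lt;/dependency&gt;
    &lt;!-- assumed artifact id, matching the package name --&gt;
    &lt;dependency&gt;
      &lt;groupId&gt;eu.etaxonomy&lt;/groupId&gt;
      &lt;artifactId&gt;cdmlib-io&lt;/artifactId&gt;
      &lt;version&gt;${cdmlib.version}&lt;/version&gt;
    &lt;/dependency&gt;
  &lt;/dependencies&gt;
. . .</programlisting>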

      <para>All that remains is to set up the cdmlib services within the
      application context. The CDM Java Library uses the Spring Framework
      to manage its components. Whilst it is not mandatory to wire the CDM
      services and DAOs using Spring, it is certainly easier to configure your
      application this way. A minimal <filename>applicationContext.xml</filename>
      file (placed in <filename>src/main/resources</filename>) might look like
      this:</para>

      <programlisting>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-2.5.xsd"&gt;

  &lt;!-- import the cdmlib service definitions --&gt;
  &lt;import resource="classpath:/eu/etaxonomy/cdm/services.xml" /&gt;

  &lt;!-- local HSQL database; Spring calls init() and destroy() on it --&gt;
  &lt;bean id="dataSource"
        lazy-init="true"
        class="eu.etaxonomy.cdm.database.LocalHsqldb"
        init-method="init"
        destroy-method="destroy"&gt;
    &lt;property name="driverClassName" value="org.hsqldb.jdbcDriver"/&gt;
    &lt;property name="username" value="sa"/&gt;
    &lt;property name="password" value=""/&gt;
    &lt;property name="startServer" value="true"/&gt;
    &lt;property name="silent" value="true"/&gt;
  &lt;/bean&gt;

  &lt;!-- properties used to configure the Hibernate session factory --&gt;
  &lt;bean id="hibernateProperties"
        class="org.springframework.beans.factory.config.PropertiesFactoryBean"&gt;
    &lt;property name="properties"&gt;
      &lt;props&gt;
        &lt;!-- create the schema on startup and drop it on shutdown --&gt;
        &lt;prop key="hibernate.hbm2ddl.auto"&gt;create-drop&lt;/prop&gt;
        &lt;prop key="hibernate.dialect"&gt;org.hibernate.dialect.HSQLDialect&lt;/prop&gt;
        &lt;prop key="hibernate.cache.provider_class"&gt;org.hibernate.cache.NoCacheProvider&lt;/prop&gt;
      &lt;/props&gt;
    &lt;/property&gt;
  &lt;/bean&gt;

&lt;/beans&gt;</programlisting>

      <para>The <literal>import</literal> element imports the cdmlib service
      definitions. The two beans supply a data source and a properties object
      that the CDM Library uses to configure the Hibernate session factory and
      connect to the database. In this case, we're using an in-memory HSQL
      database, but the CDM can be used with many other databases. The only
      thing left to do is to start using the CDM services. In real
      applications, CDM services may well be autowired into components using
      Spring or another dependency injection mechanism. To keep this example
      simple, we'll initialize the application context and obtain a service
      programmatically.</para>

      <programlisting>import java.util.UUID;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import eu.etaxonomy.cdm.api.service.INameService;
import eu.etaxonomy.cdm.model.name.BotanicalName;
import eu.etaxonomy.cdm.model.name.Rank;

// e.g. in main(): load applicationContext.xml from the classpath (src/main/resources)
ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml");

INameService nameService = (INameService) context.getBean("nameServiceImpl");

// create the species name "Arum maculatum" and persist it
BotanicalName botanicalName = BotanicalName.NewInstance(Rank.SPECIES());
botanicalName.setGenusOrUninomial("Arum");
botanicalName.setSpecificEpithet("maculatum");
UUID uuid = nameService.saveTaxonName(botanicalName);<!--ben: Again, this example reflects the 1.1.1 release, and will need to be changed slightly once the new release is available-->

System.out.println("Saved 'Arum maculatum' under uuid " + uuid.toString());</programlisting>
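
      <para>Calling <literal>getBean</literal> directly keeps the example
      short. In a larger application you would more typically let Spring
      hand the service to your own components. A minimal sketch of explicit
      setter injection in <filename>applicationContext.xml</filename> might
      look like this; <classname>org.myproject.NameReporter</classname> is a
      hypothetical class of your own that is assumed to expose a
      <literal>setNameService(INameService)</literal> method, and
      <literal>nameServiceImpl</literal> is the bean name used in the Java
      example.</para>

      <programlisting>&lt;!-- hypothetical application bean; Spring calls
     setNameService(...) with the CDM name service bean --&gt;
&lt;bean id="nameReporter" class="org.myproject.NameReporter"&gt;
  &lt;property name="nameService" ref="nameServiceImpl"/&gt;
&lt;/bean&gt;</programlisting>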

      <para>In this simple example, we've covered the basics of using the CDM
      Java Library. We created a simple Maven project and added the
      repository and a single dependency to our pom file. We then created a
      simple application context that used the default CDM configuration and
      specified a couple of objects that allowed the CDM to connect to a
      database. Finally, we initialized these services by loading the
      application context, retrieved a specific service, and used it to
      persist a new taxonomic name.</para>
    </partintro>
  </part>

  <part>
    <title>Common Data Model</title>

    <partintro>
      <para>The Common Data Model (CDM) is the domain model for the core EDIT
      cyberplatform components. The CDM is primarily based on the <link
      xlink:href="http://wiki.tdwg.org/twiki/bin/view/TAG/LsidVocs">TDWG
      Ontology</link> and in most cases there is concordance with relevant
      TDWG standards such as <link
      xlink:href="http://www.tdwg.org/standards/117/">Taxon Concept Transfer
      Schema (TCS)</link>, <link
      xlink:href="http://www.tdwg.org/standards/116/">Structured Descriptive
      Data (SDD)</link> and <link
      xlink:href="http://www.tdwg.org/standards/115/">Access to Biological
      Collections Data (ABCD)</link>.</para>

      <para>The CDM differs from the TDWG standards in its purpose: it is
      intended to serve as the basis of software applications in the
      cyberplatform (e.g. the Taxonomic Editor, the CDM Dataportals) rather
      than as a general standard for exchanging data between resources
      containing biodiversity information. Whilst it is certainly possible to
      exchange data as CDM domain objects serialized as XML or JSON (the CDM
      Server and the CDM Dataportals do this), the Common Data Model is not
      intended to replace existing TDWG standards as a general-purpose
      exchange standard. Data held in a CDM store can be converted into the
      relevant TDWG standard for exchange, and in some cases this may be the
      preferred route (e.g. for exchange with an application that is not part
      of the cyberplatform, but which is capable of understanding data in a
      TDWG standard).</para>

      <para>Thus the CDM is intended for use as:</para>

      <itemizedlist>
        <listitem>
          <para>A domain model for applications, particularly those that
          enable taxonomists to do revisionary taxonomy and taxonomic field
          work</para>
        </listitem>

        <listitem>
          <para>A standard for exchange between applications that are part of
          the EDIT Internet Platform for Cybertaxonomy</para>
        </listitem>
      </itemizedlist>

      <para>In terms of scope, the CDM covers information that is core to the
      vision of the cyberplatform, i.e. descriptive and revisionary taxonomy,
      including taxonomic fieldwork:</para>

      <itemizedlist>
        <listitem>
          <para>Taxonomic names, nomenclature and typification</para>
        </listitem>

        <listitem>
          <para>Taxonomic concepts and relationships between accepted names
          and synonyms, including the placement of the same taxonomic concept
          in different taxonomic hierarchies</para>
        </listitem>

        <listitem>
          <para>Specimens and observations of individual organisms, their
          collection, location, processing and taxonomic determination</para>
        </listitem>

        <listitem>
          <para>Structured and unstructured information about names, taxa, and
          specimens</para>
        </listitem>
      </itemizedlist>

      <para>In addition to this core area, the CDM covers some important
      related domains:</para>

      <itemizedlist>
        <listitem>
          <para>Literature</para>
        </listitem>

        <listitem>
          <para>People, teams of people and institutions in various roles
          (e.g. as authors, collectors, artists, rights holders)</para>
        </listitem>

        <listitem>
          <para>Media (images, video and audio files, plus more
          taxonomy-specific media such as phylogenies and compiled
          keys)</para>
        </listitem>

        <listitem>
          <para>Molecular data, such as DNA sequences and loci</para>
        </listitem>
      </itemizedlist>

      <para>As you might expect, there are also a number of data entities
      representing controlled vocabularies, the identity of users (and their
      roles and permissions), and ancillary data common to all major classes,
      such as multilingual text content, annotations and markers.</para>

      <figure>
        <title>A UML Package diagram showing the CDM packages and their
        members.</title>

        <mediaobject>
          <imageobject role="html">
            <imagedata fileref="resources/images/ModelOverview20.gif" />
          </imageobject>

          <imageobject role="fo">
            <imagedata contentwidth="160mm"
                       fileref="resources/images/ModelOverview20.gif"
                       scalefit="1" />
          </imageobject>
        </mediaobject>
      </figure>
    </partintro>

    <xi:include href="base-classes.xml" />

    <xi:include href="annotation-and-markers.xml" />

    <!--<xi:include href="extensions.xml" />-->

    <xi:include href="identifiable-entities.xml" />

    <!--
    ben: I think that some explanation of how the CDM deals with core
    data classes would be really useful here. In some cases, we're still
    trying to understand how it should work and in that case it might
    still be useful to have a straw-man that people can disagree with or
    improve.
    -->

    <!--<xi:include href="taxonomic-names.xml" />-->

    <!--<xi:include href="taxonomic-concepts.xml" />-->

    <!--<xi:include href="specimens-and-observations.xml" />-->

    <!--<xi:include href="descriptive-data.xml" />-->

    <!--<xi:include href="terms-and-vocabularies.xml" />-->

    <!-- ben: We'll need to touch on _where_ these external files live . . .-->

    <!--<xi:include href="media.xml" />-->

    <!-- ben: I'll include something about validation once I start
    work on the validation framework next month
    -->

    <!--<xi:include href="validation.xml" />-->
  </part>

  <part>
    <title>Persistence Layer</title>

    <partintro>
      <para>Even the most basic taxonomic applications need to let users
      save the information that they create. In addition, a common component
      of taxonomic applications is the use of a database to provide users with
      the ability to filter or search their data in one way or another. Some
      applications will require more advanced functionality, such as auditing
      or versioning of data. All of this logic is contained in the persistence
      layer, providing a clean separation between data access and the more
      taxonomy-centric business logic in the service layer.</para>

      <para>Persistence is not a simple problem to solve, especially in
      applications developed in object-oriented languages, with large amounts
      of data, or with many users accessing data at the same time. The CDM
      Library uses the <link
      xlink:href="http://www.hibernate.org">Hibernate</link> object/relational
      persistence and query service as the basis of its persistence layer.
      Several member projects of the Hibernate stable, including <link
      xlink:href="http://annotations.hibernate.org">Hibernate
      Annotations</link>, <link
      xlink:href="http://search.hibernate.org">Hibernate Search</link> and
      <link xlink:href="http://jboss.org/envers/">Hibernate Envers</link> (part
      of Hibernate Core), provide the basis of the more advanced
      persistence-related functionality in the CDM Library. As a consequence,
      some of the behaviour of the CDM Library is constrained by the
      underlying ORM technology. The advantage of using an ORM is that the
      same software can be used with multiple database systems with (almost)
      no changes to the application. Currently the CDM Library has been tested
      with the following databases (version numbers and platforms in
      brackets, where known):</para>

      <!--I don't know how many of these have been tested, on which platforms, but it would be good to include some measure of which platform / database combinations
      have been used and how, so that potential users can evaluate the technology. In an ideal world, we would pick some databases as "supported" and ensure that
      the test suite runs on that platform / db combination (i.e. you don't release until the tests pass). For the others, we still might want to say: "We tested
      the CDM on this platform and it seemed to work".-->

      <itemizedlist>
        <listitem>
          <para>IBM <link
          xlink:href="http://www.ibm.com/software/data/db2/">DB2</link></para>
        </listitem>

        <listitem>
          <para><link xlink:href="???">H2</link> (default local database used
          by the Taxonomic Editor, 1.0.73)</para>
        </listitem>

        <listitem>
          <para><link xlink:href="http://hsqldb.org">HSQLDB</link></para>
        </listitem>

        <listitem>
          <para><link xlink:href="http://www.mysql.com">MySQL</link> (4.1.20:
          Linux; 5.1.32: Windows)</para>
        </listitem>

        <listitem>
          <para><link xlink:href="???">ODBC</link></para>
        </listitem>

        <listitem>
          <para><link
          xlink:href="http://www.oracle.com/database/index.html">Oracle
          Database 11<emphasis>g</emphasis></link></para>
        </listitem>

        <listitem>
          <para><link xlink:href="http://www.postgresql.org/">PostgreSQL</link></para>
        </listitem>

        <listitem>
          <para><link xlink:href="???">Microsoft SQL Server 2000</link></para>
        </listitem>

        <listitem>
          <para><link
          xlink:href="http://www.microsoft.com/sqlserver/2005/">Microsoft
          SQL Server 2005</link></para>
        </listitem>

        <listitem>
          <para><link xlink:href="http://www.sybase.co.uk/">Sybase
          Advantage Database Server</link></para>
        </listitem>
      </itemizedlist>
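
      <para>In the Getting Started example, the database-specific settings are
      confined to the <literal>dataSource</literal> and
      <literal>hibernateProperties</literal> beans, so moving that example
      from HSQLDB to, say, MySQL should in principle only require replacing
      those two beans. The fragment below is an untested sketch: it assumes
      that the CDM accepts any <classname>javax.sql.DataSource</classname>
      under the bean id <literal>dataSource</literal> (here Spring's
      <classname>DriverManagerDataSource</classname>), that the MySQL
      Connector/J driver is on the classpath, and that a database with the
      hypothetical name <literal>cdm</literal> already exists.</para>

      <programlisting>&lt;bean id="dataSource"
      class="org.springframework.jdbc.datasource.DriverManagerDataSource"&gt;
  &lt;property name="driverClassName" value="com.mysql.jdbc.Driver"/&gt;
  &lt;!-- hypothetical host, database name and credentials --&gt;
  &lt;property name="url" value="jdbc:mysql://localhost:3306/cdm"/&gt;
  &lt;property name="username" value="cdm_user"/&gt;
  &lt;property name="password" value="secret"/&gt;
&lt;/bean&gt;

&lt;bean id="hibernateProperties"
      class="org.springframework.beans.factory.config.PropertiesFactoryBean"&gt;
  &lt;property name="properties"&gt;
    &lt;props&gt;
      &lt;!-- "update" adjusts the schema (e.g. creates missing tables) but,
           unlike create-drop, does not drop it when the application stops --&gt;
      &lt;prop key="hibernate.hbm2ddl.auto"&gt;update&lt;/prop&gt;
      &lt;prop key="hibernate.dialect"&gt;org.hibernate.dialect.MySQLDialect&lt;/prop&gt;
      &lt;prop key="hibernate.cache.provider_class"&gt;org.hibernate.cache.NoCacheProvider&lt;/prop&gt;
    &lt;/props&gt;
  &lt;/property&gt;
&lt;/bean&gt;</programlisting>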

      <para>In theory, application developers should not need to use the
      persistence layer directly, but should instead use the <link
      linkend="api">API</link>, which provides a <emphasis>facade</emphasis>
      over the persistence layer together with the extra business logic that
      most applications using the CDM will require.</para>
    </partintro>

    <xi:include href="basic-persistence.xml" />

    <xi:include href="versioning.xml" />

    <xi:include href="free-text-search.xml" />
  </part>

  <part xml:id="api">
    <title>API Methods</title>

    <partintro>
      <para>Apart from the Common Data Model classes themselves, the CDM
      Service layer contains the components most likely to be used directly by
      applications based upon the CDM Java Library. This layer contains a set
      of basic service objects that can be used as a facade over the
      persistence logic.</para>
    </partintro>

    <xi:include href="service.xml" />

    <!--<xi:include href="application-controller.xml" />-->

    <!--<xi:include href="transactions.xml" />-->

    <xi:include href="guid-resolution.xml" />

    <xi:include href="security.xml" />
  </part>

  <part>
    <title>CDM Input / Output Layer</title>

    <partintro>
      <para>This part describes the CDM input/output routines.</para>
    </partintro>

    <!--<xi:include href="base-io-usage.xml" />-->

    <!--<xi:include href="cdm-xml-input-output.xml" />-->

    <!--<xi:include href="abcd-input-output.xml" />-->

    <!--<xi:include href="berlinmodel-input-output.xml" />-->

    <!--<xi:include href="excel-input-output.xml" />-->

    <!--<xi:include href="sdd-input-output.xml" />-->

    <!--<xi:include href="taxonx-input-output.xml" />-->

    <!--<xi:include href="tcsrdf-input-output.xml" />-->

    <!--<xi:include href="tcsxml-input-output.xml" />-->
  </part>

  <part>
    <title>CDM Server</title>

    <partintro>
      <para>This part describes the CDM Server application.</para>
    </partintro>

    <!--<xi:include href="cdm-server.xml" />-->

    <!--<xi:include href="instalation.xml" />-->

    <!--<xi:include href="configuration.xml" />-->
  </part>
</book>