Preparatory meeting, Berlin, 9th December 2005.

List of attendees:

Minutes

The minutes are a bit loose at some points. Please feel free to correct them if you think something is wrong!

MJ: NHM has a couple of projects publishing taxonomic data online, like Lepidoptera. But with no integration/collaboration aspects.

S: The idea is to develop tools that are maintained for quite some time (>5y) by various institutions?

W: Yes. Users are taxonomists only. Platform of components. Documentation of the entire work process of revisionary taxonomy.

B: What scope? Are standards (introduction to taxonomists) part of the work?

W: As TDWG chair, for sure yes! It's important to tell taxonomists about standards like TDWG. At least a best practice documentation.

MJ: How to get there?

W: By modelling & via intense communication with users (=taxonomists).

MJ: So we will try to change users' work processes (WP)?

W: Yes. Every accepted, introduced software influences WPs. Ultimate aim is to speed up the taxonomic WP. It's a major move to eScience.

B: Modelling diverse processes is one thing. But getting taxonomists on board and getting them to decide on one way to do things gets very difficult. So not only passive modelling is desired.

MJ: So there'll be a big thinking/design phase after modelling.

W/MJ/S: We need a killer app to prove usefulness and promote acceptance of the platform. An online distribution map generator could be one.

W: Modelling, not a single app. Finding commonalities by modelling is the goal.

S: MISSING

MJ: We should not make too many assumptions, but rather try to identify all the diversity of processes we can find.

MD: A general question about scope. What biological domain are we talking about? Botany, zoology, microorganisms, palaeontology, viruses ... all of them?

W: Initially higher plants and entomology (Compositae & Lepidoptera). Later it'll be extended.

ALL: Modellers should have good modelling + communication/social skills to conduct interviews with taxonomists. Some tax. knowledge would be good, but not required.

E: How many & which taxonomists are questioned?

MJ: We should choose taxonomists by:

  • Having just done a revision, so it's fresh in their heads

  • diversity of fields

  • high productivity & quality to understand efficiency

B: Output might be a text, so a start could be existing textbooks like Davis & Heywood, Principles of Angiosperm Taxonomy (1963)

E: and Stuessy, Plant Taxonomy (1990)

S: Taxonomists that just did their 1st revision could be interesting

W: Go through the WP Descr of Work, as there is still time to change workplans.

Conclusions: 
 IT Committee is responsible for taking major decisions,
    for doing an inventory of existing standards, infrastructure, ...
    and for meeting frequently

5.2 Modelling in detail

5.2.1 initial interviews & identification of non taxonomy "agents" for interviews

5.2.2 in depth modelling mainly of inventories (monitoring is well covered by others already)

5.2.4 Tools for group revisions acceptable to taxonomists, esp. those with little IT skills.

R: Wikis are currently everywhere in eScience...

E: Many IT tools are not accepted by taxonomists.

S: Why are taxonomists slow right now? Because they spend their time on different things in meetings, they are too inefficient, ...

W: Bibliography and specimen access is surely a bottleneck.

E: Flora of China is a good example. They are very fast and have preliminary results on the web.

W: We should interview the project leaders!

B: Human organisation is crucial for large projects.

W: How to do the modelling technically?

A: We tried several UML tools without much prior experience. In the end 6 were tried:

  • RationalRose, expensive

  • Objective, no UML2

  • MS Visio, not 2005, code generation only for .NET framework, good intuitive models

  • UModel, Altova

  • Poseidon, Java, slow, open source

  • EnterpriseArchitect, full UML2, ~150$

S: We used SystemArchitect at Kew.

W: Money is not crucial.

MJ: Can we confirm that we want UML? Though I guess it's foolish not to, given its market position and the skills available. We used SSADM, predating UML, at Kew before. Data flow is well supported.

A: Workflow models are what we need. UML is better at that.

R: What exactly should be done with the models? If it's only documentation, Visio is great. Is code generation needed? Only for interfaces?

Another alternative to look at is BPEL - Business Process Execution Language - supported by Oracle, IBM, Sun, Microsoft.

Web service origins. Models activities and workflows for business processes.

Look at http://www.oracle.com/technology/products/ias/bpel/index.html for all the BPEL stuff as Oracle sees it.

Look at http://www-128.ibm.com/developerworks/library/specification/ws-bpel for the BPEL specification.

MJ: Are human physical actions supported by BPEL? Like making notes, reading things, physically doing something?

W: Does BPEL have diagrams? Can UML be produced from it?

MJ: UML activity diagrams were found easy to understand by taxonomists.

MD: Diagrams in general are useful for understanding. They don't have to be UML.

R: Is MDA (model driven architecture) requested? Any code generation at all?

S: Collaboration needs a) diagrams, b) interface specifications.

W/M: We want to use the same tools, as exchange of UML between different applications is a nightmare.

R: Are all processes modelled or only best practices?

W: 1st try many, 2nd synthetic approach, 3rd selecting major ones, cutting down alternatives.

W: Model tool output should be:

  • easy to change

  • intuitive to non-technical people

  • based on an OO approach

  • code generation is still debatable.

M: What interfaces? Web services or Java API? The modelling language depends on this.

S: A survey of existing tools is needed.

B: To me the process modelling & its documentation is the focus, not the code.

R: BPEL is not for data modelling. For that, use XML Schema. BPEL is for data/workflow, synchronisation, interfaces.

MJ: Can it model what's going on inside the boxes behind the interfaces?

R: Better use UML.

S: We need more criteria.

MD: Do we want the entire model to be "covered" with code or only parts?

W: No. Only parts.

MD: So we could have fine-grained modelling incl. code generation for some areas, and no code for others.

B: MISSING

R: What is the integration technology? Java RMI, web services, ...

MJ/S: Web services will be needed at some point, also to be interoperable in future.

B: Seems to me like we are creating 2 outputs:

  • accurate description of business with alternative methods

  • environment, workbench

W: I see only one

MJ: We have 2 documents at least. Even if we create a synthesis of the existing 1st survey diagram, the latter needs to be retained to document the existing processes, "what is".

W: Reusable (complex) components are useful to identify common components.

R: Clarify terminology:

  • prescriptive models prescribe implementations - like blueprints.

  • descriptive models describe existing states, like the current taxonomists' workflows

  • process & data. UML is for data. Specifying when a process can be substituted by another process is extremely difficult. There is no existing specification for this. Computational theory is working on that (bisimulation), but there is no formal basis yet.

W: All work on the minutes is a draft. Everyone to look at them, and at Enterprise Architect to see what is possible.

M: Set up a mailing list to get BPEL links from Robert.

Development Issues

S: Kew is using: Linux, Sybase (for transactional systems) & MySQL, Java, Python, Apache+Tomcat, CVS, Ant, Bugzilla

M: BGBM is using: Linux+Win, SQL Server + PostgreSQL + MySQL, Java, Python, VB, ColdFusion, Subversion, Tomcat. No build system yet, Ant would be OK.

R: What's the architecture vision? 1 server?

W: Scalable.

S: Offline tools as well?

W: Yes. For fieldwork, for example.

W/S: Data is held by researchers. As distributed as possible.

S: A web portal then, like a bookshop? Link collections, downloadable checklists, authority files, tools.

W: Documentation. Collaborative tools. Best practices. A splash in the community is needed soon - geo map tool.

S: Lots of little batch tools.

M: Or a full workflow environment with services like BioMoby + Taverna?

MJ: See what happens. Core parts could be full workflow, others not.

S: Login required?

B: Get in touch with taxonomists fairly early.

W: Article in Taxon early. Mail to the Taxacom list in month 3 would be good.

E: Important to integrate people who teach taxonomy. Spread the word.

W: Involve them at a later stage.

MJ: The teaching WP should do that.

W: Questionnaires don't work. Personal interviews are preferred, but specific questions could be asked via questionnaires. Read existing user surveys, e.g. BioCASE, SYNTHESYS (Edinburgh, M. Watson, M. Pullan); Jesse Kennedy for TCS.

S: Would love MySQL.

W: Tomcat OK, SVN/CVS to be decided. What skills are required of developers?

S: No job interviews at Kew without money.

W: EU signs last. Project start could be days before you get the approval. 1st January is not realistic; after the consortium agreement is signed it could start at any moment. So job interviews could start then.

Developer requirements?

MG:

Kew   16PM starting month3 with modeler in rev.taxonomy [for the 1st 18month]

total 58PM


FU   27?PM 

BGBM 1 coordinator start mth4

     2,5 developer start mth13


Budapest 10PM, modeler

Stuttgart 1 developer, start mth13 -> inventory

S: Kew modeller should be rather senior, as project manager tasks are involved.

W: Remember there are 2 models (inventory/revisionary tax) to be integrated.

MJ: We need technical and team leadership for the Kew postholder to come from one place, and the logical place for this is Berlin. The Kew software development unit will of course give support to the Kew postholder, but the postholder should work with the team managed from Berlin.

S: Technical team leader in Berlin.

W: Project management in BGBM besides the technical team.

PR hopefully gets its own position, maybe in France.

Skills required from developers: Java and MySQL are core:

  • 1 UI developer with JSP, CSS, graphics

  • 1 backend techie

Python for batch tools and glue scripts.

S: Travel budget? Is it possible for programmers to code together for 1 week?

W: Guesthouses available at Kew? Active exchange.

S: No. Only B&B.

MD/S/W: A Skype conference every week to sync work. For diagram discussions, meeting in person proves better.

W: IPR. Kew policies?

MJ: No general policies. Data IPR yes, but code no.

R: 1st rule is to keep track of the licenses used. A mix of commercial & open licenses is nasty.

S: The license with the least overhead should be used.

ALL: Agreed to use the Mozilla Public License if the libraries used allow it.
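
A minimal sketch of how the "keep track of used licenses" rule could be supported by a small Python glue script. The library names, license labels and the set of cleared licenses are hypothetical placeholders, not decisions from the meeting; whether a given license is actually compatible with the Mozilla Public License has to be checked by a person.

```python
from collections import defaultdict


def group_by_license(inventory: dict[str, str]) -> dict[str, list[str]]:
    """Group library names by their license label, in alphabetical order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for library, license_label in sorted(inventory.items()):
        groups[license_label].append(library)
    return dict(groups)


def needs_review(inventory: dict[str, str], cleared: set[str]) -> list[str]:
    """Return libraries whose license has not been cleared by the project."""
    return sorted(lib for lib, lic in inventory.items() if lic not in cleared)


if __name__ == "__main__":
    # Hypothetical inventory: library name -> license label.
    inventory = {
        "example-mapping-lib": "MPL",
        "example-db-driver": "commercial",
        "example-xml-lib": "MPL",
    }
    # Hypothetical set of licenses the project has already checked.
    cleared = {"MPL"}

    for license_label, libraries in group_by_license(inventory).items():
        print(license_label, libraries)
    print("needs review:", needs_review(inventory, cleared))
```

Keeping the inventory as plain data makes it easy to store next to the code in the repository and to rerun the check whenever a library is added.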

W: Use of subversion instead of CVS agreed.

S: single repository with (Rsync) backup. SVN ok for a new project
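
One possible way to do the single-repository backup, sketched as a Python glue script around `svnadmin hotcopy` and `rsync`. All paths and the remote target are hypothetical, and the meeting did not decide on this procedure; it only assumes that both command-line tools are installed on the repository server.

```python
import shutil
import subprocess
from pathlib import Path


def backup_commands(repo: Path, staging: Path, remote: str) -> list[list[str]]:
    """Build the two commands: consistent local copy, then sync to the backup host."""
    return [
        # svnadmin hotcopy makes a consistent copy even while the repository is in use.
        ["svnadmin", "hotcopy", str(repo), str(staging)],
        # rsync -a preserves permissions and timestamps; --delete mirrors removals.
        ["rsync", "-a", "--delete", f"{staging}/", remote],
    ]


def run_backup(repo: Path, staging: Path, remote: str) -> None:
    """Run the backup; raises CalledProcessError if either command fails."""
    # Start from an empty staging path, since a plain hotcopy expects a fresh target.
    shutil.rmtree(staging, ignore_errors=True)
    for command in backup_commands(repo, staging, remote):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical locations; adjust to the real server layout.
    for command in backup_commands(
        Path("/srv/svn/project"),
        Path("/var/backups/svn-staging"),
        "backup@backup.example.org:/backups/svn/",
    ):
        print(" ".join(command))
```

Copying via a hotcopy first, rather than running rsync on the live repository directory, avoids syncing a repository that is in the middle of a commit.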

W: Someone probably wants a wiki. Where to host it? We will set one up in Berlin.

M: What about outsourcing? Wiki, SVN, backup and bugtracker are reliably available for little money:

a free subversion / trac service
https://opensvn.csie.org/

commercial services:
https://secure.cvsdude.org/
http://wush.net/subversion.php

larger list of available svn services:
http://weblogs.asp.net/fmarguerie/archive/2005/04/27/404793.aspx

W: Put the minutes on a wiki for collaborative editing. 10 meetings per year expected.