source: trunk/doc/src/docbook/developerdoc/api_overview.xml @ 3835

Last change on this file since 3835 was 3835, checked in by Nicklas Nordborg, 15 years ago

References #721. Updated documentation.

1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE chapter PUBLIC
3    "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
4    "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">
5<!--
6  $Id: api_overview.xml 3835 2007-10-15 12:26:21Z nicklas $
7
8  Copyright (C) 2007 Peter Johansson, Nicklas Nordborg, Martin Svensson
9
10  This file is part of BASE - BioArray Software Environment.
11  Available at http://base.thep.lu.se/
12
13  BASE is free software; you can redistribute it and/or
14  modify it under the terms of the GNU General Public License
15  as published by the Free Software Foundation; either version 2
16  of the License, or (at your option) any later version.
17
18  BASE is distributed in the hope that it will be useful,
19  but WITHOUT ANY WARRANTY; without even the implied warranty of
20  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
21  GNU General Public License for more details.
22
23  You should have received a copy of the GNU General Public License
24  along with this program; if not, write to the Free Software
25  Foundation, Inc., 59 Temple Place - Suite 330,
26  Boston, MA  02111-1307, USA.
27-->
28
29<chapter id="api_overview">
30  <?dbhtml dir="api"?>
31  <title>API overview (how to use and code examples)</title>
32
33  <sect1 id="api_overview.public_api">
34    <title>The Public API of BASE</title>
35   
36    <para>
37      Not all public classes and methods in the <filename>BASE2Core.jar</filename>
38      and other JAR files shipped with BASE are considered as
39      <emphasis>Public API</emphasis>. This is important knowledge
40      since we will always try to maintain backwards compatibility
41      for classes that are part of the public API. For other
42      classes, changes may be introduced at any time without
43      notice or specific documentation. In other words:
44    </para>
45   
46    <note>
47      <title>Only use the public API when developing plug-ins</title>
48      <para>
49        This will maximize the chance that your plug-in will continue
50        to work with the next BASE release. If you use the non-public API
51        you do so at your own risk.
52      </para>
53    </note>
54   
55    <para>
56      See the <ulink url="http://base.thep.lu.se/chrome/site/doc/api/index.html"
57        >javadoc</ulink> for information about
58      which parts of the API belong to the public API.
59      Methods, classes and other elements that have been tagged as
60      <code>@deprecated</code> should be considered as part of the internal API
61      and may be removed in a subsequent release without warning.
62    </para>
63   
64    <para>
65      See <xref linkend="appendix.incompatible" /> to read more about
66      changes that have been introduced by each release.
67    </para>
68
69    <sect2 id="api_overview.compatibility">
70      <title>What is backwards compatibility?</title>
71     
72      <para>
73        There is a great article about this subject on <ulink 
74        url="http://wiki.eclipse.org/index.php/Evolving_Java-based_APIs"
75          >http://wiki.eclipse.org/index.php/Evolving_Java-based_APIs</ulink>.
76        This is what we will try to comply with. If you do not want to
77        read the entire article, here are some of the most important points:
78      </para>
79     
80     
81      <sect3 id="api_overview.compatibility.binary">
82        <title>Binary compatibility</title>
83        <para>
84        <blockquote>
85          Pre-existing Client binaries must link and run with new releases of the
86          Component without recompiling.
87        </blockquote>
88       
89        For example:
90        <itemizedlist>
91        <listitem>
92          <para>
93            We cannot change the number or types of parameters to a method
94            or constructor.
95          </para>
96        </listitem>
97        <listitem>
98          <para>
99            We cannot add or change methods to interfaces that are intended
100            to be implemented by plug-in or client code.
101          </para>
102        </listitem>
103        </itemizedlist>
104        </para>       
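        <para>
          A minimal sketch of how the first rule is usually followed in practice.
          The class <classname>ItemLookup</classname> and its methods are hypothetical
          and not part of BASE. Since compiled client code links against the exact
          method name and parameter types, new functionality is added as an overload
          and the old signature is left in place:
        </para>

```java
// Hypothetical API class, used only to illustrate the rule above.
class ItemLookup
{
    // Signature shipped in an earlier release. It must stay as it is,
    // because compiled plug-ins link against exactly this parameter list.
    static String describe(String name)
    {
        return describe(name, false);
    }

    // New functionality is added as an overload instead of changing
    // the parameters of the existing method.
    static String describe(String name, boolean includeRemoved)
    {
        return name + (includeRemoved ? " (including removed items)" : "");
    }

    public static void main(String[] args)
    {
        System.out.println(describe("sample"));
        System.out.println(describe("sample", true));
    }
}
```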
105      </sect3>
106     
107      <sect3 id="api_overview.compatibility.contract">
108        <title>Contract compatibility</title>
109        <para>
110          <blockquote>
111          API changes must not invalidate formerly legal Client code.
112          </blockquote>
113       
114          For example:
115          <itemizedlist>
116          <listitem>
117            <para>
118              We cannot change the implementation of a method to do
119              things differently than before. For example, allow <constant>null</constant>
120              as a return value when it was not allowed before.
121            </para>
122          </listitem>
123          </itemizedlist>
124       
125          <note>
126            <para>
127            Sometimes there is a very fine line between what is considered a
128            bug and what is considered a feature. For example, if the
129            actual implementation does not do what the javadoc says,
130            do we change the code or do we change the documentation?
131            This has to be considered from case to case and depends on
132            the age of the code and if we expect plug-ins and clients to be
133            affected by it or not.
134            </para>
135          </note>
136        </para>
137      </sect3>
138     
139      <sect3 id="api_overview.compatibility.source">
140        <title>Source code compatibility</title>
141        <para>
142        This is not an important matter and is not always possible to
143        achieve. In most cases, the problems are easy to fix.
144        Example:
145       
146        <itemizedlist>
147        <listitem>
148          <para>
149          Adding a class may break a plug-in or client that imports
150          classes with <constant>.*</constant> if the same class name
151          exists in another package.
152          </para>
153        </listitem>
154        </itemizedlist>
155        </para>
156      </sect3>
157    </sect2>
158  </sect1>
159
160  <sect1 id="api_overview.data_api" chunked="1">
161    <title>The database schema and the Data Layer API</title>
162
163    <para>
164      This section gives an overview of the entire data layer API.
165      The figure below shows how different modules relate to each other.
166    </para>
167   
168    <note>
169      Not all information has been transferred from the old documentation yet.
170      The old documentation is available at
171      <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/data/index.html"
172        >http://base.thep.lu.se/chrome/site/doc/development/overview/data/index.html</ulink>.
173    </note>
174   
175    <figure id="data_api.figures.overview">
176      <title>Data layer overview</title>
177      <screenshot>
178        <mediaobject>
179          <imageobject>
180            <imagedata 
181              fileref="figures/uml/datalayer.overview.png" format="PNG" />
182          </imageobject>
183        </mediaobject>
184      </screenshot>
185    </figure>
186
187    <sect2 id="data_api.basic">
188      <title>Basic classes and interfaces</title>
189     
190      <para>
191        This document contains information about the basic classes and interfaces in this package.
192        They are important since all data-layer classes must inherit from one of the already
193        existing abstract base classes or implement one or more of the
194        existing interfaces. They contain code that is common to all classes,
195        for example implementations of the <methodname>equals()</methodname>
196        and <methodname>hashCode()</methodname> methods or how to link with the owner of an
197        item.
198      </para>
199     
200      <sect3 id="data_api.basic.uml">
201        <title>UML diagram</title>
202       
203        <figure id="data_api.figures.basic">
204          <title>Basic classes and interfaces</title>
205          <screenshot>
206            <mediaobject>
207              <imageobject>
208                <imagedata 
209                  fileref="figures/uml/datalayer.basic.png" format="PNG" />
210              </imageobject>
211            </mediaobject>
212          </screenshot>
213        </figure>
214      </sect3>
215     
216      <sect3 id="data_api.basic.classes">
217        <title>Classes</title>
218       
219        <variablelist>
220        <varlistentry>
221          <term><classname>BasicData</classname></term>
222          <listitem>
223            <para>
224            The root class. It overrides the <methodname>equals()</methodname>,
225            <methodname>hashCode()</methodname> and <methodname>toString()</methodname> methods
226            from the <classname>Object</classname> class. It also defines the
227            <varname>id</varname> and <varname>version</varname> properties.
228            All data layer classes must inherit from this class or one of its subclasses.
229            </para>
230          </listitem>
231        </varlistentry>
232       
233        <varlistentry>
234          <term><classname>OwnedData</classname></term>
235          <listitem>
236            <para>
237            Extends the <classname>BasicData</classname> class and adds
238            an <varname>owner</varname> property. The owner is a required link to a
239            <classname>UserData</classname> object, representing the user that
240            is the owner of the item.
241            </para>
242          </listitem>
243        </varlistentry>
244
245        <varlistentry>
246          <term><classname>SharedData</classname></term>
247          <listitem>
248            <para>
249            Extends the <classname>OwnedData</classname> class and adds
250            properties (<varname>itemKey</varname> and <varname>projectKey</varname>)
251            that hold access permission information for an item.
252            Access permissions are held in <classname>ItemKeyData</classname> and/or
253            <classname>ProjectKeyData</classname> objects. These objects exist only if
254            the item has been shared.
255            </para>
256          </listitem>
257        </varlistentry>
258
259        <varlistentry>
260          <term><classname>CommonData</classname></term>
261          <listitem>
262            <para>
263            This is a convenience class for items that extend the <classname>SharedData</classname>
264            class and implement the <interfacename>NameableData</interfacename> and
265            <interfacename>RemovableData</interfacename> interfaces. This is one of
266            the most common situations.
267            </para>
268          </listitem>
269        </varlistentry>
270
271        <varlistentry>
272          <term><classname>AnnotatedData</classname></term>
273          <listitem>
274            <para>
275            This is a convenience class for items that can be annotated.
276            Annotations are held in <classname>AnnotationSetData</classname> objects.
277            The annotation set only exists if annotations have been created for the item.
278            </para>
279          </listitem>
280        </varlistentry>
281        </variablelist>
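        <para>
          The inheritance chain described above can be sketched as follows. This is
          a simplified illustration, not the actual BASE source: all class names
          ending in <classname>Sketch</classname> are hypothetical, and the rule that
          two items are equal when they have the same class and the same
          <varname>id</varname> is an assumption made for the example.
        </para>

```java
import java.util.Objects;

// Simplified, hypothetical stand-ins for the data-layer classes described above.
abstract class BasicDataSketch
{
    private final int id;    // generated by the database in the real system
    private int version;     // used to detect concurrent modifications

    BasicDataSketch(int id)
    {
        this.id = id;
    }

    int getId() { return id; }
    int getVersion() { return version; }

    // Assumption: items are equal when they have the same class and id.
    @Override
    public boolean equals(Object other)
    {
        if (this == other) return true;
        if (other == null) return false;
        if (getClass() != other.getClass()) return false;
        return id == ((BasicDataSketch) other).id;
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(getClass().getName(), id);
    }

    @Override
    public String toString()
    {
        return getClass().getSimpleName() + "[id=" + id + "]";
    }
}

class UserSketch extends BasicDataSketch
{
    UserSketch(int id) { super(id); }
}

// Adds the required link to the owner of the item.
abstract class OwnedDataSketch extends BasicDataSketch
{
    private final UserSketch owner;

    OwnedDataSketch(int id, UserSketch owner)
    {
        super(id);
        this.owner = Objects.requireNonNull(owner, "owner is required");
    }

    UserSketch getOwner() { return owner; }
}

// Adds the optional keys. Both stay null until the item has been shared.
abstract class SharedDataSketch extends OwnedDataSketch
{
    private Object itemKey;
    private Object projectKey;

    SharedDataSketch(int id, UserSketch owner) { super(id, owner); }

    void setItemKey(Object itemKey) { this.itemKey = itemKey; }
    void setProjectKey(Object projectKey) { this.projectKey = projectKey; }
    boolean isShared() { return itemKey != null || projectKey != null; }
}

class SampleSketch extends SharedDataSketch
{
    SampleSketch(int id, UserSketch owner) { super(id, owner); }

    public static void main(String[] args)
    {
        UserSketch root = new UserSketch(1);
        SampleSketch a = new SampleSketch(5, root);
        SampleSketch b = new SampleSketch(5, root);
        System.out.println(a.equals(b));
        System.out.println(a.isShared());
    }
}
```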
282       
283      </sect3>
284     
285      <sect3 id="data_api.basic.interfaces">
286        <title>Interfaces</title>
287       
288        <variablelist>
289        <varlistentry>
290          <term><classname>IdentifiableData</classname></term>
291          <listitem>
292            <para>
293            All items are identifiable, which means that they have a unique <varname>id</varname>.
294            The id is unique for all items of a specific type (i.e. class). The id is a number
295            that is automatically generated by the database and has no other meaning
296            outside of the application. The <varname>version</varname> property is used for
297            detecting and preventing concurrent modifications to an item.
298            </para>
299          </listitem>
300        </varlistentry>
301       
302        <varlistentry>
303          <term><classname>OwnableData</classname></term>
304          <listitem>
305            <para>
306            An ownable item is an item which has an owner. The owner is represented as a
307            required link to a <classname>UserData</classname> object.
308            </para>
309          </listitem>
310        </varlistentry>       
311
312        <varlistentry>
313          <term><classname>ShareableData</classname></term>
314          <listitem>
315            <para>
316            A shareable item is an item which can be shared with other users, groups or projects.
317            Access permissions are held in <classname>ItemKeyData</classname> and/or
318            <classname>ProjectKeyData</classname> objects.
319            </para>
320          </listitem>
321        </varlistentry>
322             
323        <varlistentry>
324          <term><classname>NameableData</classname></term>
325          <listitem>
326            <para>
327            A nameable item is an item that has a name (required) and a description
328            (optional). The name doesn't have to be unique, except in a few special
329            cases (for example, the name of a file).
330            </para>
331          </listitem>
332        </varlistentry>
333       
334        <varlistentry>
335          <term><classname>RemovableData</classname></term>
336          <listitem>
337            <para>
338            A removable item is an item that can be flagged as removed. This doesn't
339            remove the information about the item from the database, but can be used by
340            client applications to hide items that the user is not interested in.
341            A trashcan function can be used to either restore or permanently
342            remove items that have the flag set.
343            </para>
344          </listitem>
345        </varlistentry>
346               
347        <varlistentry>
348          <term><classname>SystemData</classname></term>
349          <listitem>
350            <para>
351            A system item is an item which has an additional id in the form of a string. A system id
352            is required when we need to make sure that we can get a specific item without
353            knowing the numeric id. Examples of such items are the root user and the everyone group.
354            A system id is generally constructed like:
355            <constant>net.sf.basedb.core.User.ROOT</constant>. The system ids are defined in the
356            core layer by each item class.
357            </para>
358          </listitem>
359        </varlistentry>
360
361        <varlistentry>
362          <term><classname>DiskConsumableData</classname></term>
363          <listitem>
364            <para>
365            This interface is used by items which occupy a lot of disk space and
366            should be part of the quota system, for example files. The required
367            <classname>DiskUsageData</classname> contains information about the size,
368            location, owner etc. of the item.
369            </para>
370          </listitem>
371        </varlistentry>
372       
373        <varlistentry>
374          <term><classname>AnnotatableData</classname></term>
375          <listitem>
376            <para>
377            This interface is used by items which can be annotated. Annotations are name/value
378            pairs that are attached as extra information to an item. All annotations are
379            contained in an <classname>AnnotationSetData</classname> object.
380            </para>
381          </listitem>
382        </varlistentry>
383       
384        <varlistentry>
385          <term><classname>ExtendableData</classname></term>
386          <listitem>
387            <para>
388            This interface is used by items which can have extra administrator-defined
389            columns. The functionality is similar to annotations. It is not as flexible,
390            since it is a global configuration, but has better performance. BASE will
391            generate extra database columns to store the data in the tables for items that
392            can be extended.
393            </para>
394          </listitem>
395        </varlistentry>
396       
397        <varlistentry>
398          <term><classname>BatchableData</classname></term>
399          <listitem>
400            <para>
401            This interface is a tagging interface which is used by items that need batch
402            functionality in the core.
403            </para>
404          </listitem>
405        </varlistentry>
406        </variablelist>
407
408      </sect3>
409    </sect2>
410   
411    <sect2 id="data_api.authentication">
412      <title>User authentication and access control</title>
413     
414      <para>
415         This section gives an overview of user authentication and
416         how groups, roles and projects are used for access control
417         to items.
418      </para>
419     
420      <sect3 id="data_api.authentication.uml">
421        <title>UML diagram</title>
422       
423        <figure id="data_api.figures.authentication">
424          <title>User authentication and access control</title>
425          <screenshot>
426            <mediaobject>
427              <imageobject>
428                <imagedata 
429                  fileref="figures/uml/datalayer.authentication.png" format="PNG" />
430              </imageobject>
431            </mediaobject>
432          </screenshot>
433        </figure>
434      </sect3>
435     
436      <sect3 id="data_api.authentication.users">
437        <title>Users and passwords</title>     
438     
439        <para>
440          The <classname>UserData</classname> class holds information about users.
441          We keep the passwords in a separate table and use proxies to avoid loading
442          password data each time a user is loaded to minimize security risks. It is
443          only if the password needs to be changed that the <classname>PasswordData</classname>
444          object is loaded. The one-to-one mapping between user and password is controlled
445          by the password class, but a cascade attribute on the user class makes sure
446          that the password is deleted when a user is deleted.
447        </para>
448      </sect3>
449
450      <sect3 id="data_api.authentication.groups">
451        <title>Groups, roles and projects</title>     
452     
453        <para>
454          The <classname>GroupData</classname>, <classname>RoleData</classname> and
455          <classname>ProjectData</classname> classes hold information about groups, roles
456          and projects, respectively. A user may be a member of any number of groups,
457          roles and/or projects. Membership in a project comes with an attached
458          permission value. This is the highest permission the user has in the
459          project. No matter what permission an item has been shared with, the
460          user will not get a higher permission than this. Groups may be members of other groups and
461          of projects.
462        </para>
463       
464      </sect3>
465     
466      <sect3 id="data_api.authentication.keys">
467        <title>Keys</title>     
468     
469        <para>
470          The <classname>KeyData</classname> class and its subclasses
471          <classname>ItemKeyData</classname>, <classname>ProjectKeyData</classname> and
472          <classname>RoleKeyData</classname>, are used to store information about access
473          permissions to items. To get permission to manipulate an item a user must have
474          access to a key giving that permission. There are three types of keys:
475        </para>
476       
477        <variablelist>
478        <varlistentry>
479          <term><classname>ItemKey</classname></term>
480          <listitem>
481            <para>
482            Is used to give a user or group access to a specific item. The item
483            must be a <interfacename>ShareableData</interfacename> item.
484            The permissions are usually set by the owner of the item. Once created, an
485            item key cannot be changed. This allows the core to reuse a key if the
486            permissions match exactly, i.e. for a given set of users/groups/permissions
487            there can be only one item key object.
488            </para>
489          </listitem>
490        </varlistentry>
491
492        <varlistentry>
493          <term><classname>ProjectKey</classname></term>
494          <listitem>
495            <para>
496            Is used to give members of a project access to a specific item. The item
497            must be a <interfacename>ShareableData</interfacename> item. Once created a
498            project key cannot be changed. This allows the core to reuse a key if the
499            permissions match exactly, i.e. for a given set of projects/permissions
500            there can be only one project key object.
501            </para>
502          </listitem>
503        </varlistentry>
504
505        <varlistentry>
506          <term><classname>RoleKey</classname></term>
507          <listitem>
508            <para>
509            Is used to give a user access to all items of a specific type, for example
510            <constant>READ</constant> all <constant>SAMPLES</constant>. The installation
511            will make sure that there already exists a role key for each type of item, and
512            it is not possible to add new or delete existing keys. Unlike the other two types
513            this key can be modified.
514            </para>
515           
516            <para>
517            A role key is also used to assign permissions to plug-ins. If a plug-in has
518            been specified to use permissions the default is to deny everything.
519            The mapping to the role key is used to grant permissions to the plug-in.
520            The <varname>granted</varname> value gives the plug-in access to all items
521            of the related item type regardless of whether the user that is running the plug-in has the
522            permission or not. The <varname>denied</varname> value denies access to all
523            items of the related item type even if the logged-in user has the permission.
524            Permissions that are neither granted nor denied are checked against the
525            logged-in user's regular permissions. Permissions to items that are
526            not linked are always denied.
527            </para>
528          </listitem>
529        </varlistentry>
530        </variablelist>
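        <para>
          The decision rules for a plug-in that has been set to use permissions
          can be summarised in a small sketch. The class, enum and method names are
          hypothetical and only mirror the four cases described for role keys above.
          They are not taken from the BASE core.
        </para>

```java
// Hypothetical sketch of the plug-in permission rules described above.
class PluginPermissionSketch
{
    // How a permission on an item type is configured for the plug-in.
    enum Rule { GRANTED, DENIED, UNSPECIFIED, NOT_LINKED }

    static boolean isAllowed(Rule rule, boolean userHasPermission)
    {
        switch (rule)
        {
            case GRANTED:
                // Allowed even if the user running the plug-in lacks the permission.
                return true;
            case DENIED:
                // Refused even if the logged-in user has the permission.
                return false;
            case UNSPECIFIED:
                // Neither granted nor denied: fall back to the user's own permission.
                return userHasPermission;
            default:
                // NOT_LINKED: the item type is not linked to the plug-in at all.
                return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(isAllowed(Rule.GRANTED, false));
        System.out.println(isAllowed(Rule.UNSPECIFIED, true));
    }
}
```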
531       
532      </sect3>
533
534      <sect3 id="data_api.authentication.permissions">
535        <title>Permissions</title>
536       
537        <para>
538          The <varname>permission</varname> property appearing in many classes is an
539          integer value describing the permission:
540        </para>
541       
542        <informaltable>
543        <tgroup cols="2">
544          <colspec colname="value" />
545          <colspec colname="permission" />
546          <thead>
547            <row>
548              <entry>Value</entry>
549              <entry>Permission</entry>
550            </row>
551          </thead>
552          <tbody>
553            <row>
554              <entry>1</entry>
555              <entry>Read</entry>
556            </row>
557            <row>
558              <entry>3</entry>
559              <entry>Use</entry>
560            </row>
561            <row>
562              <entry>7</entry>
563              <entry>Restricted write</entry>
564            </row>
565            <row>
566              <entry>15</entry>
567              <entry>Write</entry>
568            </row>
569            <row>
570              <entry>31</entry>
571              <entry>Delete</entry>
572            </row>
573            <row>
574              <entry>47 (=32+15)</entry>
575              <entry>Set owner</entry>
576            </row>
577            <row>
578              <entry>79 (=64+15)</entry>
579              <entry>Set permissions</entry>
580            </row>
581            <row>
582              <entry>128</entry>
583              <entry>Create</entry>
584            </row>
585            <row>
586              <entry>256</entry>
587              <entry>Denied</entry>
588            </row>
589          </tbody>
590        </tgroup>
591        </informaltable>
592       
593        <para>
594          The values are constructed so that
595          <constant>READ</constant> -&gt;
596          <constant>USE</constant> -&gt;
597          <constant>RESTRICTED_WRITE</constant> -&gt;
598          <constant>WRITE</constant> -&gt;
599          <constant>DELETE</constant>
600          are chained in the sense that a higher permission always implies the lower permissions
601          also. The <constant>SET_OWNER</constant> and <constant>SET_PERMISSION</constant>
602          both imply <constant>WRITE</constant> permission. The <constant>DENIED</constant>
603          permission is only valid for role keys, and if specified it overrides all
604          other permissions.               
605        </para>
606       
607        <para>
608          When combining permissions for a single item, the permission codes for the different
609          paths are OR-ed together. For example, a user has a role key with <constant>READ</constant>
610          permission for <constant>SAMPLES</constant>, but also an item key with <constant>USE</constant>
611          permission for a specific sample. Of course, the resulting permission for that
612          sample is <constant>USE</constant>. For other samples the resulting permission is
613          <constant>READ</constant>.
614        </para>
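        <para>
          The bit arithmetic behind the table and the combination rule can be checked
          with a short sketch. The numeric values are the ones listed in the table.
          The class and method names are hypothetical, and the special handling of
          <constant>DENIED</constant> is left out.
        </para>

```java
// Hypothetical sketch of the permission codes listed in the table above.
class PermissionSketch
{
    static final int READ = 1;
    static final int USE = 3;
    static final int RESTRICTED_WRITE = 7;
    static final int WRITE = 15;
    static final int DELETE = 31;
    static final int SET_OWNER = 47;       // 32 + WRITE
    static final int SET_PERMISSION = 79;  // 64 + WRITE
    static final int CREATE = 128;
    static final int DENIED = 256;         // not handled in this sketch

    // True if every bit of 'required' is already set in 'granted'.
    static boolean implies(int granted, int required)
    {
        return (granted | required) == granted;
    }

    // OR together the permission codes found along different paths
    // (role key, item key, project key) to the same item.
    static int combine(int... codes)
    {
        int result = 0;
        for (int code : codes)
        {
            result |= code;
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Role key gives READ, item key gives USE: the result is USE.
        System.out.println(combine(READ, USE) == USE);          // prints "true"
        // SET_OWNER implies WRITE but not DELETE.
        System.out.println(implies(SET_OWNER, WRITE));          // prints "true"
        System.out.println(implies(SET_OWNER, DELETE));         // prints "false"
    }
}
```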
615       
616        <para>
617          If the user is also a member of a project which has <constant>WRITE</constant>
618          permission for the same sample, the user will have <constant>WRITE</constant>
619          permission when working with that project.
620        </para>
621       
622        <para>
623          The <constant>RESTRICTED_WRITE</constant> permission is in most cases the same
624          as the <constant>WRITE</constant> permission. So far the <constant>RESTRICTED_WRITE</constant>
625          permission is only given to users for their own <classname>UserData</classname>
626          object so they can change their address and other contact information,
627          but not quota, expiration date and other administrative information.
628        </para>
629
630      </sect3>
631    </sect2>
632
633    <sect2 id="data_api.wares">
634      <title>Hardware and software</title>
635    </sect2>
636   
637    <sect2 id="data_api.reporters">
638      <title>Reporters</title>
639    </sect2>
640
641    <sect2 id="data_api.quota">
642      <title>Quota and disk usage</title>
643    </sect2>
644
645    <sect2 id="data_api.sessions">
646      <title>Client, session and settings</title>
647    </sect2>
648
649    <sect2 id="data_api.files">
650      <title>Files and directories</title>
651
652      <para>
653        This section covers the details of the BASE file
654        system.
655      </para>
656
657      <sect3 id="data_api.files.uml">
658      <title>UML diagram</title>
659     
660        <figure id="data_api.figures.files">
661          <title>Files and directories</title>
662          <screenshot>
663            <mediaobject>
664              <imageobject>
665                <imagedata 
666                  fileref="figures/uml/datalayer.files.png" format="PNG" />
667              </imageobject>
668            </mediaobject>
669          </screenshot>
670        </figure>
671      </sect3>
672     
673      <sect3 id="data_api.files.description">
674        <title>Description</title>
675       
676        <para>
677          The <classname>DirectoryData</classname> class holds
678          information about directories. Directories are organised in the
679          usual way as a tree structure. All directories must have
680          a parent directory, except the system-defined root directory.
681        </para>
682       
683        <para>
684          The <classname>FileData</classname> class holds information about
685          a file. The actual file contents are stored on disk in the directory
686          specified by the <varname>userfiles</varname> setting in
687          <filename>base.config</filename>. The <varname>internalName</varname>
688          property is the name of the file on disk, but this is never exposed to
689          client applications. The filenames and directories
690          on the disk don't correspond to the filenames and directories in
691          BASE.
692        </para>
693       
694        <para>
695          The <varname>location</varname> property can take three values:
696        </para>
697       
698        <itemizedlist>
699        <listitem>
700          <para>
701          0 = The file is offline, i.e. there is no file on the disk.
702          </para>
703        </listitem>
704        <listitem>
705          <para>
706          1 = The file is in primary storage, i.e. it is located on the disk
707          and can be used by BASE.
708          </para>
709        </listitem>
710        <listitem>
711          <para>
712          2 = The file is in secondary storage, i.e. it has been moved to some
713          other place and can't be used by BASE immediately.
714          </para>
715        </listitem>
716        </itemizedlist>
717       
718        <para>
719          The <varname>action</varname> property controls how a file is
720          moved between primary and secondary storage. It can have the following
721          values:
722        </para>
723       
724        <itemizedlist>
725        <listitem>
726          <para>
727          0 = Do nothing
728          </para>
729        </listitem>
730        <listitem>
731          <para>
732          1 = If the file is in secondary storage, move it back to the primary storage
733          </para>
734        </listitem>
735        <listitem>
736          <para>
737          2 = If the file is in primary storage, move it to the secondary storage
738          </para>
739        </listitem>
740        </itemizedlist>
741       
742        <para>
743          The actual moving between primary and secondary storage is done by an
744          external program. See
745          <xref linkend="appendix.base.config.secondary" /> and
746          <xref linkend="plugin_developer.other.secondary" /> for more information.
747        </para>
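        <para>
          The interplay between <varname>location</varname> and
          <varname>action</varname> can be sketched as a small transition function.
          The numeric values are those listed above. The class, constant and method
          names are hypothetical, and the sketch only computes the resulting location.
          It does not move any file, which in BASE is the job of the external program.
        </para>

```java
// Hypothetical sketch of the location/action values described above.
class FileStorageSketch
{
    static final int LOCATION_OFFLINE = 0;
    static final int LOCATION_PRIMARY = 1;
    static final int LOCATION_SECONDARY = 2;

    static final int ACTION_NOTHING = 0;
    static final int ACTION_MOVE_TO_PRIMARY = 1;
    static final int ACTION_MOVE_TO_SECONDARY = 2;

    // Returns the location a file would have after the action is carried out.
    static int locationAfter(int location, int action)
    {
        if (action == ACTION_MOVE_TO_PRIMARY)
        {
            // Only files in secondary storage are moved back.
            if (location == LOCATION_SECONDARY) return LOCATION_PRIMARY;
        }
        if (action == ACTION_MOVE_TO_SECONDARY)
        {
            // Only files in primary storage are moved away.
            if (location == LOCATION_PRIMARY) return LOCATION_SECONDARY;
        }
        // ACTION_NOTHING, or the action does not apply to this location.
        return location;
    }

    public static void main(String[] args)
    {
        System.out.println(locationAfter(LOCATION_PRIMARY, ACTION_MOVE_TO_SECONDARY));
        System.out.println(locationAfter(LOCATION_OFFLINE, ACTION_MOVE_TO_PRIMARY));
    }
}
```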
748     
749        <para>
750          The <varname>md5</varname> property can be used to check for file
751          corruption when it is moved between primary and secondary storage or
752          when a user re-uploads a file that has been offline.
753        </para>
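        <para>
          A minimal sketch of such a check, using the standard
          <classname>java.security.MessageDigest</classname> class. The class and
          method names are hypothetical, and the sketch hashes a byte array held in
          memory. A real implementation would more likely stream the file contents.
        </para>

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that compares data against a stored MD5 checksum.
class Md5Sketch
{
    // Returns the MD5 digest of the data as 32 lowercase hex characters.
    static String md5Hex(byte[] data)
    {
        try
        {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            return String.format("%032x", new BigInteger(1, digest.digest(data)));
        }
        catch (NoSuchAlgorithmException ex)
        {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(ex);
        }
    }

    // True if the data still matches the checksum stored for the file.
    static boolean isIntact(byte[] data, String storedMd5)
    {
        return md5Hex(data).equalsIgnoreCase(storedMd5);
    }

    public static void main(String[] args)
    {
        System.out.println(md5Hex(new byte[0]));
    }
}
```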
754       
755        <para>
756          BASE can store files in a compressed format. This is handled internally
757          and is not visible to client applications. The <varname>compressed</varname>
758          and <varname>diskSize</varname> properties are used to store information
759          about this. A file may always be compressed if the user says so, but
760          BASE can also do this automatically if the file is uploaded
761          to a directory with the <varname>autoCompress</varname> flag set
762          or if the file has a MIME type with the <varname>autoCompress</varname>
763          flag set.
764        </para>
765       
766        <para>
767          The <classname>FileTypeData</classname> class holds information about
768          file types. It is used only to make it easier for users to organise
769          their files.
770        </para>
771       
772        <para>
773          The <classname>MimeTypeData</classname> class is used to register MIME types and
774          map them to file extensions. The information is only used to look up values
775          when needed. Given the filename we can set the <varname>File.mimeType</varname>
776          and <varname>File.fileType</varname> properties. The MIME type is also
777          used to decide if a file should be stored in a compressed format or not.
778          The extension of a MIME type must be unique. Extensions should be registered
779          without a dot, i.e. <emphasis>html</emphasis>, not <emphasis>.html</emphasis>.
780        </para>
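        <para>
          The "no dot" rule for extensions is easy to get wrong in client
          code. The sketch below shows one possible way to normalise an
          extension before registering it and to derive the extension from
          a filename. The class <classname>MimeExtensionSketch</classname>
          and both methods are hypothetical helpers written for this example.
          They are not part of the BASE API and say nothing about how the
          core performs the lookup.
        </para>
        <programlisting>
class MimeExtensionSketch
{
   // Strip a single leading dot so that ".html" becomes "html"
   static String normalize(String extension)
   {
      return extension.startsWith(".") ? extension.substring(1) : extension;
   }

   // Return the extension of a filename without the dot,
   // or null if the filename has no extension
   static String fromFilename(String filename)
   {
      int dot = filename.lastIndexOf('.');
      if (dot == -1 || dot == filename.length() - 1) return null;
      return filename.substring(dot + 1);
   }
}
</programlisting>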
781       
782      </sect3>
783     
784     
785    </sect2>
786   
787    <sect2 id="data_api.platforms">
788      <title>Experimental platforms</title>
789
790      <para>
791         This section gives an overview of experimental platforms
792         and how they are used to enable data storage in files
793         instead of in the database.
794      </para>
795     
796      <itemizedlist>
797        <title>See also</title>
798        <listitem><xref linkend="core_api.data_in_files" /></listitem>
799        <listitem><xref linkend="appendix.rawdatatypes.platforms" /></listitem>
800      </itemizedlist>
801         
802      <sect3 id="data_api.platforms.uml">
803        <title>UML diagram</title>
804       
805        <figure id="data_api.figures.platforms">
806          <title>Experimental platforms</title>
807          <screenshot>
808            <mediaobject>
809              <imageobject>
810                <imagedata 
811                  fileref="figures/uml/datalayer.platforms.png" format="PNG" />
812              </imageobject>
813            </mediaobject>
814          </screenshot>
815        </figure>
816      </sect3>
817     
818      <sect3 id="data_api.platforms.platforms">
819        <title>Platforms</title>
820       
821        <para>
822          The <classname>PlatformData</classname> holds information about a
823          platform. A platform can have one or more <classname>PlatformVariant</classname>:s.
824          Both the platform and variant are identified by an external ID that
825          is fixed and can't be changed. <emphasis>Affymetrix</emphasis>
826          and <emphasis>Illumina</emphasis> are examples of platforms.
827          If the <varname>fileOnly</varname> flag is set, data for the platform
828          can only be stored in files and not imported into the database. If
829          the flag is not set, data can be imported into the database.
830          The <varname>rawDataType</varname> can be used to lock the platform
831          to a specific raw data type. If the value is <constant>null</constant>,
832          the platform can use any raw data type.
833        </para>
834       
835        <para>
836          Each platform and its variants can be connected to one or more
837          <classname>DataFileTypeData</classname> items. Each such item
838          describes the kind of files that are used to hold data for
839          the platform and/or variant. The file types are re-usable between
840          different platforms and variants. Note that a file type may be attached
841          to either only a platform or to a platform with a variant. File
842          types attached to platforms are inherited by the variants. The variants
843          can only define additional file types, not remove or redefine file types
844          that have been attached to the platform.
845        </para>
846        <para>
847          The file type is also identified
848          by a fixed, non-changeable external ID. The <varname>itemType</varname>
849          property tells us what type of item the file holds data for (e.g.
850          array design or raw bioassay). It also links to a <classname>FileType</classname>
851          which is the generic type of data in the file. This allows us to query
852          the database for, as an example, files with the generic type
853          <constant>FileType.RAW_DATA</constant>. If we are in an Affymetrix
854          experiment we will get the CEL file; for another platform we will
855          get another file.
856        </para>
857        <para>
858          The <varname>required</varname> flag in <classname>PlatformFileTypeData</classname>
859          is used to signal that the file is a required file. This will, however, not be
860          enforced by the core. It is intended to be used by client applications
861          for creating a better GUI and/or validation of an experiment.
862        </para>
863
864      </sect3>
865     
866      <sect3 id="data_api.platforms.files">
867        <title>FileStoreEnabled items and data files</title>
868       
869        <para>
870          An item must implement the <interfacename>FileStoreEnabledData</interfacename>
871          interface to be able to store data in files instead of in the database.
872          The interface creates a link to a <classname>FileSetData</classname> object,
873          which can hold several <classname>FileSetMemberData</classname> items.
874          Each member points to a specific <classname>FileData</classname> item.
875          A file set can only store one file of each <classname>DataFileTypeData</classname>.
876        </para>
877       
878      </sect3>
879    </sect2>
880
881    <sect2 id="data_api.protocols">
882      <title>Protocols</title>
883    </sect2>
884
885    <sect2 id="data_api.parameters">
886      <title>Parameters</title>
887    </sect2>
888
889    <sect2 id="data_api.annotations">
890      <title>Annotations</title>
891    </sect2>
892
893    <sect2 id="data_api.plugins">
894      <title>Plug-ins, jobs and job agents</title>
895    </sect2>
896   
897    <sect2 id="data_api.biomaterials">
898      <title>Biomaterials</title>
899    </sect2>
900
901    <sect2 id="data_api.plates">
902      <title>Array LIMS - plates</title>
903    </sect2>
904
905    <sect2 id="data_api.arrays">
906      <title>Array LIMS - arrays</title>
907    </sect2>
908
909    <sect2 id="data_api.rawdata">
910      <title>Hybridizations and raw data</title>
911    </sect2>
912
913    <sect2 id="data_api.experiments">
914      <title>Experiments and analysis</title>
915    </sect2>
916   
917    <sect2 id="data_api.misc">
918      <title>Other classes</title>
919    </sect2>
920
921  </sect1>
922 
923  <sect1 id="api_overview.core_api" chunked="1">
924    <title>The Core API</title>
925   
926    <para>
927      This section gives an overview of various parts of the core API.
928    </para>
929   
930    <sect2 id="core_api.data_in_files">
931      <title>Using files to store data</title>
932     
933      <para>
934        This section is about how BASE can use files to store data instead
935        of importing it into the database. Files can be attached
936        to any item that implements the <interfacename>FileStoreEnabled</interfacename>
937        interface. Currently this is <classname>RawBioAssay</classname>
938        and <classname>ArrayDesign</classname>. The
939        ability to store data in files is not a replacement for storing data in the
940        database. It is possible (for some platforms/raw data types) to have data in
941        files and in the database at the same time. We would have liked to enforce
942        that (raw) data is always present in files, but this will not be backwards compatible
943        with older installations, so there are three cases:
944      </para>
945     
946      <itemizedlist>
947      <listitem>
948        <para>
949        Data in files only
950        </para>
951      </listitem>
952      <listitem>
953        <para>
954        Data in the database only
955        </para>
956      </listitem>
957      <listitem>
958        <para>
959        Data in both files and in the database
960        </para>
961      </listitem>
962      </itemizedlist>
963     
964      <para>
965        Not all three cases are supported for all types of data. This is controlled
966        by the <classname>Platform</classname> class, which may disallow
967        storing data in the database. To check this, call
968        <methodname>Platform.isFileOnly()</methodname> and/or
969        <methodname>Platform.getRawDataType()</methodname>. If the <methodname>isFileOnly()</methodname>
970        method returns <constant>true</constant>, the platform can't store data in
971        the database. If the value is <constant>false</constant> more information
972        can be obtained by calling <methodname>getRawDataType()</methodname>,
973        which may return:
974      </para>
975     
976      <itemizedlist>
977      <listitem>
978        <para>
979          <constant>null</constant>: The platform can store data with any
980          raw data type in the database.
981        </para>
982      </listitem>
983      <listitem>
984        <para>
985        A <classname>RawDataType</classname> that has <code>isStoredInDb() == true</code>:
986        The platform can store data in the database but only data with the specified raw
987        data type.
988        </para>
989      </listitem>
990      <listitem>
991        <para>
992        A <classname>RawDataType</classname> that has <code>isStoredInDb() == false</code>:
993        The platform can't store data in the database.
994        </para>
995      </listitem>
996      </itemizedlist>
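      <para>
        The checks above can be summarised as a small decision function.
        The following is only a sketch under stated assumptions: the class
        <classname>DatabaseStorageSketch</classname> is hypothetical, and its
        boolean parameters are stand-ins for the results of
        <methodname>Platform.isFileOnly()</methodname>,
        <methodname>Platform.getRawDataType()</methodname> and
        <methodname>RawDataType.isStoredInDb()</methodname>, so that the
        example can be compiled without the BASE core.
      </para>
      <programlisting>
class DatabaseStorageSketch
{
   // Stand-in parameters (assumptions, not BASE API):
   //   fileOnly          ~ Platform.isFileOnly()
   //   hasRawDataType    ~ Platform.getRawDataType() != null
   //   rawTypeStoredInDb ~ RawDataType.isStoredInDb(), ignored if no raw data type
   static String describe(boolean fileOnly, boolean hasRawDataType,
      boolean rawTypeStoredInDb)
   {
      // A file-only platform can never store data in the database
      if (fileOnly) return "files only";
      // No raw data type lock: any raw data type can be stored in the database
      if (!hasRawDataType) return "database: any raw data type";
      // Locked to a raw data type that is stored in the database
      if (rawTypeStoredInDb) return "database: locked raw data type only";
      // Locked to a raw data type that is not stored in the database
      return "files only";
   }
}
</programlisting>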
997
998      <para>
999        One major modification is that the registration of raw data types
1000        has changed. The <filename>raw-data-types.xml</filename> file should
1001        only be used for raw data types that are stored in the database. The
1002        <sgmltag>storage</sgmltag> tag has been deprecated and BASE will refuse
1003        to start if it finds a raw data type definition with <code>storage="file"</code>.
1004      </para>
1005     
1006      <para>
1007        For backwards compatibility reasons, each <classname>Platform</classname>
1008        that can only store data in files will create "virtual" raw data type
1009        objects internally. These raw data types all return <constant>false</constant>
1010        when the <methodname>RawDataType.isStoredInDb()</methodname> method
1011        is called. They also have a back-link to the platform/variant that
1012        created them: <methodname>RawDataType.getPlatform()</methodname>
1013        and <methodname>RawDataType.getVariant()</methodname>. These two methods
1014        will always return <constant>null</constant> when called on a raw data type that can be
1015        stored in the database.
1016      </para>
1017     
1018      <itemizedlist>
1019        <title>See also</title>
1020        <listitem><xref linkend="data_api.platforms" /></listitem>
1021        <listitem><xref linkend="appendix.rawdatatypes.platforms" /></listitem>
1022        <listitem>
1023          <xref linkend="appendix.incompatible.2.5" xrefstyle=""/> in
1024          <xref linkend="appendix.incompatible" />
1025        </listitem>
1026      </itemizedlist>
1027     
1028      <sect3 id="core_api.data_in_files.diagram">
1029        <title>Diagram of classes and methods</title>
1030        <figure id="core_api.figures.data_in_files">
1031          <title>Store data in files</title>
1032          <screenshot>
1033            <mediaobject>
1034              <imageobject>
1035                <imagedata 
1036                  fileref="figures/uml/corelayer.datainfiles.png" format="PNG" />
1037              </imageobject>
1038            </mediaobject>
1039          </screenshot>
1040        </figure>
1041      </sect3>
1042     
1043      <sect3 id="core_api.data_in_files.ask">
1044        <title>Asking the user for files</title>
1045
1046        <para>
1047          A client application must know what types of files it makes sense
1048          to ask the user for. In some cases, data may be split into more than
1049          one file so we need a generic way to select files.
1050        </para>
1051       
1052        <para>
1053          Given that we have a <interfacename>FileStoreEnabled</interfacename>
1054          item we want to find out which <classname>DataFileType</classname>
1055          items can be used for that item. The
1056          <methodname>DataFileType.getQuery(FileStoreEnabled)</methodname>
1057          method can be used for this. Internally, the method uses the results from the
1058          <methodname>FileStoreEnabled.getPlatform()</methodname>
1059          and <methodname>FileStoreEnabled.getVariant()</methodname>
1060          methods to restrict the query to only return file types for
1061          a given platform and/or variant. If the item doesn't have
1062          a platform or variant the query will return all file types
1063          that are associated with the given item type. In any case, we get a list
1064          of <classname>DataFileType</classname> items, each one representing a
1065          specific file type that we should ask the user about. Examples:
1066        </para>
1067
1068        <orderedlist>
1069        <listitem>
1070          <para>
1071          The <constant>Affymetrix</constant> platform defines <constant>CEL</constant>
1072          as a raw data file and <constant>CDF</constant> as an array design (feature)
1073          file. If we have a <classname>RawBioAssay</classname> the query will only return
1074          the CEL file type and the client can ask the user for a CEL file.
1075          </para>
1076        </listitem>
1077        <listitem>
1078          <para>
1079          The <constant>Generic</constant> platform defines <constant>PRINT_MAP</constant>
1080          and <constant>REPORTER_MAP</constant> for array designs. If we have
1081          an <classname>ArrayDesign</classname> the query will return those two
1082          items.
1083          </para>
1084        </listitem>
1085        </orderedlist>
1086     
1087        <para>
1088          It might also be interesting to know the currently selected file
1089          for each file type and if the platform has set the <varname>required</varname>
1090          flag for a particular file type. Here is a simple code example
1091          that may be useful to start from:
1092        </para>
1093     
1094        <programlisting>
1095DbControl dc = ...
1096FileStoreEnabled item = ...
1097Platform platform = item.getPlatform();
1098PlatformVariant variant = item.getVariant();
1099
1100// Get list of DataFileTypes used by the platform
1101ItemQuery&lt;DataFileType&gt; query =
1102   FileStoreUtil.getQuery(item);
1103List&lt;DataFileType&gt; types = query.list(dc);
1104
1105// Always check the hasFileSet() method first to avoid
1106// creating the file set if it doesn't exist
1107FileSet fileSet = item.hasFileSet() ?
1108   item.getFileSet() : null;
1109   
1110for (DataFileType type : types)
1111{
1112   // Get the current file, if any
1113   FileSetMember member = fileSet == null || !fileSet.hasMember(type) ?
1114      null : fileSet.getMember(type);
1115   File current = member == null ?
1116      null : member.getFile();
1117   
1118   // Check if a file is required by the platform
1119   PlatformFileType pft = platform == null ?
1120      null : platform.getFileType(type, variant);
1121   boolean isRequired = pft == null ?
1122      false : pft.isRequired();
1123     
1124   // Now we can do something with this information to
1125   // let the user select a file ...
1126}
1127</programlisting>
1128     
1129        <note>
1130          <title>Also remember to catch PermissionDeniedException</title>
1131          <para>
1132            The above code may look complicated, but this is mostly because
1133            of all checks for <constant>null</constant> values. Remember
1134            that many things are optional and may return <constant>null</constant>.
1135            Another thing to look out for is
1136            <exceptionname>PermissionDeniedException</exceptionname>:s. The logged in
1137            user may not have access to all items. The above example doesn't include
1138            any code for this since it would have made it too complex.
1139          </para>
1140        </note>
1141      </sect3>
1142     
1143      <sect3 id="core_api.data_in_files.link">
1144        <title>Link, validate and extract metadata from the selected files</title>
1145        <para>
1146          When the user has selected the file(s) we must store the links
1147          to them in the database. This is done with a <classname>FileSet</classname>
1148          object. A file set can contain any number of files. The only limitation
1149          is that it can only contain one file for each file type.
1150          Call <methodname>FileSet.setMember()</methodname> to store
1151          a file in the set. If a file already exists for the given file type
1152          it is replaced, otherwise a new entry is created. The following
1153          program example assumes that we have a map that relates
1154          <classname>DataFileType</classname> items to <classname>File</classname> items:
1155         
1156        </para>
1157       
1158        <programlisting>
1159DbControl dc = ...
1160FileStoreEnabled item = ...
1161Map&lt;DataFileType, File&gt; files = ...
1162
1163// Store the selected files in the fileset
1164FileSet fileSet = item.getFileSet();
1165for (Map.Entry&lt;DataFileType, File&gt; entry : files.entrySet())
1166{
1167   DataFileType type = entry.getKey();
1168   File file = entry.getValue();
1169   fileSet.setMember(type, file);
1170}
1171
1172// Validate the files and extract metadata
1173fileSet.validate(dc, true);
1174</programlisting>
1175      </sect3>
1176     
1177      <sect3 id="core_api.data_in_files.validate">
1178        <title>How the core validates the files and extracts metadata</title>
1179       
1180        <para>
1181          Validation and extraction of metadata is important since we want
1182          data in files to be equivalent to data in the database. The validation
1183          and metadata extraction is done by the core when the
1184          <methodname>FileSet.validate()</methodname> method is called.
1185          The process is partly pluggable since each <classname>DataFileType</classname> 
1186          can name a class that should do the validation and/or metadata extraction.
1187          Here is the general outline of what is going on:
1188        </para>
1189       
1190       
1191        <orderedlist>
1192        <listitem>
1193          <para>
1194          The core checks the <classname>DataFileType</classname> of all
1195          members in the file set and creates <classname>DataFileValidator</classname>
1196          and <classname>DataFileMetadataReader</classname> objects. Only one instance
1197          of each class is created. If the file set contains members that have the
1198          same validator or metadata reader, they will all share the same instance.
1199          </para>
1200        </listitem>
1201       
1202        <listitem>
1203          <para>
1204          Each validator/metadata reader class is initialised with calls to
1205          <methodname>DataFileHandler.setItem()</methodname> and
1206          <methodname>DataFileHandler.setFile()</methodname>.
1207          </para>
1208        </listitem>
1209       
1210        <listitem>
1211          <para>
1212          Each validator is called. The result of the validation is saved for each
1213          file and can be retrieved by <methodname>FileSetMember.isValid()</methodname>
1214          and <methodname>FileSetMember.getErrorMessage()</methodname>.
1215          </para>
1216        </listitem>
1217       
1218        <listitem>
1219          <para>
1220          Each metadata reader is called, unless the metadata reader is the same class
1221          as the validator and the validation failed. If the metadata reader is a
1222          different class, it is called even if the validation failed.
1223          </para>
1224        </listitem>
1225        </orderedlist>
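        <para>
          The rule in the last step, which decides whether a metadata reader
          is called, can be expressed as a small function. This is a sketch
          only: <classname>MetadataReaderRuleSketch</classname> and its method
          are hypothetical, and class names are passed as plain strings so
          that the example does not depend on the BASE core. It illustrates
          the rule as described above, not the actual implementation in
          <methodname>FileSet.validate()</methodname>.
        </para>
        <programlisting>
class MetadataReaderRuleSketch
{
   // validatorClass and readerClass are the class names registered for a
   // file type; validationPassed is the result of the validator
   static boolean shouldCallMetadataReader(String validatorClass,
      String readerClass, boolean validationPassed)
   {
      boolean sameClass = validatorClass.equals(readerClass);
      // Same class as the validator: only call it if the validation passed
      if (sameClass) return validationPassed;
      // Different class: called even if the validation failed
      return true;
   }
}
</programlisting>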
1226
1227        <note>
1228          <title>Only one instance of each validator class is created</title>
1229          <para>
1230          The validation/metadata extraction is not done until all files have been
1231          added to the fileset. If the same validator/metadata reader is
1232          used for more than one file, the same instance is reused. That is,
1233          the <methodname>setFile()</methodname> method is called once
1234          for each file/file type pair. The <methodname>validate()</methodname>
1235          and <methodname>extractMetadata()</methodname> methods are only
1236          called once.
1237          </para>
1238        </note>
1239       
1240        <para>
1241          All validators and metadata readers should extend
1242          the <classname>AbstractDataFileHandler</classname> class. The reason
1243          is that we may want to add more methods to the <interfacename>DataFileHandler</interfacename>
1244          interface in the future. The <classname>AbstractDataFileHandler</classname> will
1245          be used to provide default implementations for backwards compatibility.
1246        </para>
1247       
1248      </sect3>
1249     
1250      <sect3 id="core_api.data_in_files.import">
1251        <title>Import data into the database</title>
1252       
1253        <para>
1254          This should be done by existing plug-ins in the same way as before.
1255          A slight modification is needed since it is good if the importers
1256          are made aware of already selected files in the <classname>FileSet</classname>
1257          to provide good default values. Something like this:
1258        </para>
1259       
1260        <programlisting>
1261RawBioAssay rba = ...
1262DbControl dc = ...
1263
1264// Get the current raw data file, if any
1265List&lt;File&gt; rawDataFiles = FileStoreUtil.getGenericDataFiles(dc, rba, FileType.RAW_DATA);
1266File defaultFile = rawDataFiles.size() > 0 ?
1267   rawDataFiles.get(0) : null;
1268   
1269// Create parameter asking for input file - use current as default
1270PluginParameter&lt;File&gt; fileParameter = new PluginParameter&lt;File&gt;(
1271   "file",
1272   "Raw data file",
1273   "The file that contains the raw data that you want to import",
1274   new FileParameterType(defaultFile, true, 1)
1275);
1276</programlisting>
1277
1278      <para>
1279        An import plug-in should also save the file that was used to the file set:
1280      </para>
1281     
1282      <programlisting>
1283RawBioAssay rba = ...
1284// The file the user selected to import from
1285File rawDataFile = (File)job.getValue("file");
1286
1287// Save the file to the fileset. The method will check which file
1288// type the platform uses as the raw data type. As a fallback the
1289// GENERIC_RAW_DATA type is used
1290FileStoreUtil.setGenericDataFile(dc, rba, FileType.RAW_DATA,
1291   DataFileType.GENERIC_RAW_DATA, rawDataFile);
1292</programlisting>
1293
1294      </sect3>
1295     
1296      <sect3 id="core_api.data_in_files.experiments">
1297        <title>Using raw data from files in an experiment</title>
1298       
1299        <para>
1300          Just as before, an experiment is still locked to a single
1301          <classname>RawDataType</classname>. This is a design issue that
1302          would break too many things if changed. If data is stored in files
1303          the experiment is also locked to a single <classname>Platform</classname>.
1304          This has been designed to have as little impact on existing
1305          plug-ins as possible. In most cases, the plug-ins will continue
1306          to work as before.
1307        </para>
1308       
1309        <para>
1310          A plug-in (using data from the database) that needs to check if it can
1311          be used within an experiment can still do:
1312        </para>
1313       
1314        <programlisting>
1315Experiment e = ...
1316RawDataType rdt = e.getRawDataType();
1317if (rdt.isStoredInDb())
1318{
1319   // Check number of channels, etc...
1320   // ... run plug-in code ...
1321}
1322</programlisting>
1323       
1324        <para>
1325          A newer plug-in which uses data from files should do:
1326        </para>
1327       
1328        <programlisting>
1329Experiment e = ...
1330DbControl dc = ...
1331RawDataType rdt = e.getRawDataType();
1332if (!rdt.isStoredInDb())
1333{
1334   Platform p = rdt.getPlatform(dc);
1335   PlatformVariant v = rdt.getVariant(dc);
1336   // Check that platform/variant is supported
1337   // ... run plug-in code ...
1338}
1339</programlisting>
1340       
1341      </sect3>
1342     
1343    </sect2>
1344  </sect1>
1345
1346  <sect1 id="api_overview.query_api">
1347    <title>The Query API</title>
1348    <para>
1349      This documentation is only available in the old format.
1350      See <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/query/index.html"
1351        >http://base.thep.lu.se/chrome/site/doc/development/overview/query/index.html</ulink>
1352    </para>
1353   
1354  </sect1>
1355 
1356  <sect1 id="api_overview.dynamic_and_batch_api">
1357    <title>Analysis and the Dynamic and Batch API:s</title>
1358    <para>
1359      This documentation is only available in the old format.
1360      See <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/dynamic/index.html"
1361        >http://base.thep.lu.se/chrome/site/doc/development/overview/dynamic/index.html</ulink>
1362    </para>
1363  </sect1>
1364
1365  <sect1 id="api_overview.other_api">
1366    <title>Other useful classes and methods</title>
1367    <para>
1368      TODO
1369    </para>
1370  </sect1>
1371 
1372</chapter>