source: trunk/doc/src/docbook/developerdoc/api_overview.xml @ 3795

Last change on this file since 3795 was 3795, checked in by Nicklas Nordborg, 15 years ago

References #721: Updated documentation to reflect changes in the code [3793]

1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE chapter PUBLIC
3    "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
4    "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">
5<!--
6  $Id: api_overview.xml 3795 2007-09-27 19:52:01Z nicklas $
7
8  Copyright (C) 2007 Peter Johansson, Nicklas Nordborg, Martin Svensson
9
10  This file is part of BASE - BioArray Software Environment.
11  Available at http://base.thep.lu.se/
12
13  BASE is free software; you can redistribute it and/or
14  modify it under the terms of the GNU General Public License
15  as published by the Free Software Foundation; either version 2
16  of the License, or (at your option) any later version.
17
18  BASE is distributed in the hope that it will be useful,
19  but WITHOUT ANY WARRANTY; without even the implied warranty of
20  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
21  GNU General Public License for more details.
22
23  You should have received a copy of the GNU General Public License
24  along with this program; if not, write to the Free Software
25  Foundation, Inc., 59 Temple Place - Suite 330,
26  Boston, MA  02111-1307, USA.
27-->
28
29<chapter id="api_overview">
30  <?dbhtml dir="api"?>
31  <title>API overview (how to use and code examples)</title>
32
33  <sect1 id="api_overview.public_api">
34    <title>The Public API of BASE</title>
35   
36    <para>
37      Not all public classes and methods in the <filename>BASE2Core.jar</filename>
38      and other JAR files shipped with BASE are considered as
39      <emphasis>Public API</emphasis>. This is important knowledge
40      since we will always try to maintain backwards compatibility
41      for classes that are part of the public API. For other
42      classes, changes may be introduced at any time without
43      notice or specific documentation. In other words:
44    </para>
45   
46    <note>
47      <title>Only use the public API when developing plug-ins</title>
48      <para>
49        This will maximize the chance that your plug-in will continue
50        to work with the next BASE release. If you use the non-public API
51        you do so at your own risk.
52      </para>
53    </note>
54   
55    <para>
56      See the <ulink url="http://base.thep.lu.se/chrome/site/doc/api/index.html"
57        >javadoc</ulink> for information about
58      which parts of the API belong to the public API.
59      Methods, classes and other elements that have been tagged as
60      <code>@deprecated</code> should be considered as part of the internal API
61      and may be removed in a subsequent release without warning.
62    </para>
63   
64    <para>
65      See <xref linkend="appendix.incompatible" /> to read more about
66      changes that have been introduced by each release.
67    </para>
68
69    <sect2 id="api_overview.compatibility">
70      <title>What is backwards compatibility?</title>
71     
72      <para>
73        There is a great article about this subject on <ulink 
74        url="http://wiki.eclipse.org/index.php/Evolving_Java-based_APIs"
75          >http://wiki.eclipse.org/index.php/Evolving_Java-based_APIs</ulink>.
76        This is what we will try to comply with. If you do not want to
77        read the entire article, here are some of the most important points:
78      </para>
79     
80     
81      <sect3 id="api_overview.compatibility.binary">
82        <title>Binary compatibility</title>
83        <para>
84        <blockquote>
85          Pre-existing Client binaries must link and run with new releases of the
86          Component without recompiling.
87        </blockquote>
88       
89        For example:
90        <itemizedlist>
91        <listitem>
92          <para>
93            We cannot change the number or types of parameters to a method
94            or constructor.
95          </para>
96        </listitem>
97        <listitem>
98          <para>
99            We cannot add methods to, or change methods in, interfaces that are intended
100            to be implemented by plug-in or client code.
101          </para>
102        </listitem>
103        </itemizedlist>
104        </para>       
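        <para>
          A minimal sketch of how a method can gain a parameter without
          breaking binary compatibility: the old signature is kept and
          delegates to a new overload. The class
          <classname>ReportService</classname> and its methods are
          hypothetical names used only for illustration; they are not
          part of BASE.
        </para>
        <programlisting language="java"><![CDATA[
/** Hypothetical class used for illustration; not part of the BASE API. */
class ReportService
{
    /**
     * Original signature. Clients compiled against the old release are
     * linked to exactly this method, so it has to stay.
     */
    String createReport(String name)
    {
        // Delegate to the new overload with the old default behaviour.
        return createReport(name, false);
    }

    /** New functionality is added as an overload, not as a changed signature. */
    String createReport(String name, boolean verbose)
    {
        return verbose ? "Report: " + name + " (verbose)" : "Report: " + name;
    }

    public static void main(String[] args)
    {
        ReportService service = new ReportService();
        System.out.println(service.createReport("samples"));
        System.out.println(service.createReport("samples", true));
    }
}
]]></programlisting>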
105      </sect3>
106     
107      <sect3 id="api_overview.compatibility.contract">
108        <title>Contract compatibility</title>
109        <para>
110          <blockquote>
111          API changes must not invalidate formerly legal Client code.
112          </blockquote>
113       
114          For example:
115          <itemizedlist>
116          <listitem>
117            <para>
118              We cannot change the implementation of a method to do
119              things differently than before. For example, allow <constant>null</constant>
120              as a return value when it was not allowed before.
121            </para>
122          </listitem>
123          </itemizedlist>
124       
125          <note>
126            <para>
127            Sometimes there is a very fine line between what is considered a
128            bug and what is considered a feature. For example, if the
129            actual implementation does not do what the javadoc says,
130            do we change the code or do we change the documentation?
131            This has to be considered from case to case and depends on
132            the age of the code and if we expect plug-ins and clients to be
133            affected by it or not.
134            </para>
135          </note>
136        </para>
137      </sect3>
138     
139      <sect3 id="api_overview.compatibility.source">
140        <title>Source code compatibility</title>
141        <para>
142        This is not an important matter and is not always possible to
143        achieve. In most cases, the problems are easy to fix.
144        Example:
145       
146        <itemizedlist>
147        <listitem>
148          <para>
149          Adding a class may break a plug-in or client that imports
150          classes with <constant>.*</constant> if the same class name
151          exists in another package.
152          </para>
153        </listitem>
154        </itemizedlist>
155        </para>
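        <para>
          The same kind of name clash can be reproduced with two JDK
          packages that both declare a class named
          <classname>Date</classname>. The sketch below uses them as a
          stand-in for a BASE package that gains a new class; the class
          <classname>ImportSketch</classname> is a hypothetical name.
          It also shows the usual fix, which is an explicit single-type import.
        </para>
        <programlisting language="java"><![CDATA[
import java.sql.*;
import java.util.*;
// Without the next line the simple name "Date" is ambiguous, because both
// java.sql and java.util declare a class with that name, and the compiler
// rejects the file. A single-type import takes precedence over ".*" imports.
import java.util.Date;

class ImportSketch
{
    /** Returns the epoch as a java.util.Date. */
    static Date epoch()
    {
        return new Date(0L);
    }

    public static void main(String[] args)
    {
        System.out.println(epoch().getTime());
    }
}
]]></programlisting>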
156      </sect3>
157    </sect2>
158  </sect1>
159
160  <sect1 id="api_overview.data_api" chunked="1">
161    <title>The database schema and the Data Layer API</title>
162
163    <para>
164      This section gives an overview of the entire data layer API.
165      The figure below shows how different modules relate to each other.
166    </para>
167   
168    <note>
169      Not all information has been transferred from the old documentation yet.
170      The old documentation is available at
171      <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/data/index.html"
172        >http://base.thep.lu.se/chrome/site/doc/development/overview/data/index.html</ulink>
173    </note>
174   
175    <figure id="data_api.figures.overview">
176      <title>Data layer overview</title>
177      <screenshot>
178        <mediaobject>
179          <imageobject>
180            <imagedata 
181              fileref="figures/uml/datalayer.overview.png" format="PNG" />
182          </imageobject>
183        </mediaobject>
184      </screenshot>
185    </figure>
186
187    <sect2 id="data_api.basic">
188      <title>Basic classes and interfaces</title>
189     
190      <para>
191        This document contains information about the basic classes and interfaces in this package.
192        They are important since all data-layer classes must inherit from one of the already
193        existing abstract base classes or implement one or more of the
194        existing interfaces. They contain code that is common to all classes,
195        for example implementations of the <methodname>equals()</methodname>
196        and <methodname>hashCode()</methodname> methods or how to link with the owner of an
197        item.
198      </para>
199     
200      <sect3 id="data_api.basic.uml">
201        <title>UML diagram</title>
202       
203        <figure id="data_api.figures.basic">
204          <title>Basic classes and interfaces</title>
205          <screenshot>
206            <mediaobject>
207              <imageobject>
208                <imagedata 
209                  fileref="figures/uml/datalayer.basic.png" format="PNG" />
210              </imageobject>
211            </mediaobject>
212          </screenshot>
213        </figure>
214      </sect3>
215     
216      <sect3 id="data_api.basic.classes">
217        <title>Classes</title>
218       
219        <variablelist>
220        <varlistentry>
221          <term><classname>BasicData</classname></term>
222          <listitem>
223            <para>
224            The root class. It overrides the <methodname>equals()</methodname>,
225            <methodname>hashCode()</methodname> and <methodname>toString()</methodname> methods
226            from the <classname>Object</classname> class. It also defines the
227            <varname>id</varname> and <varname>version</varname> properties.
228            All data layer classes must inherit from this class or one of its subclasses.
229            </para>
230          </listitem>
231        </varlistentry>
232       
233        <varlistentry>
234          <term><classname>OwnedData</classname></term>
235          <listitem>
236            <para>
237            Extends the <classname>BasicData</classname> class and adds
238            an <varname>owner</varname> property. The owner is a required link to a
239            <classname>UserData</classname> object, representing the user that
240            is the owner of the item.
241            </para>
242          </listitem>
243        </varlistentry>
244
245        <varlistentry>
246          <term><classname>SharedData</classname></term>
247          <listitem>
248            <para>
249            Extends the <classname>OwnedData</classname> class and adds
250            properties (<varname>itemKey</varname> and <varname>projectKey</varname>)
251            that hold access permission information for an item.
252            Access permissions are held in <classname>ItemKeyData</classname> and/or
253            <classname>ProjectKeyData</classname> objects. These objects only exist if
254            the item has been shared.
255            </para>
256          </listitem>
257        </varlistentry>
258
259        <varlistentry>
260          <term><classname>CommonData</classname></term>
261          <listitem>
262            <para>
263            This is a convenience class for items that extend the <classname>SharedData</classname>
264            class and implement the <interfacename>NameableData</interfacename> and
265            <interfacename>RemovableData</interfacename> interfaces. This is one of
266            the most common situations.
267            </para>
268          </listitem>
269        </varlistentry>
270
271        <varlistentry>
272          <term><classname>AnnotatedData</classname></term>
273          <listitem>
274            <para>
275            This is a convenience class for items that can be annotated.
276            Annotations are held in <classname>AnnotationSetData</classname> objects.
277            The annotation set only exists if annotations have been created for the item.
278            </para>
279          </listitem>
280        </varlistentry>
281        </variablelist>
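        <para>
          The following is a simplified sketch of how such a hierarchy could
          look. It is not the actual BASE source code: all class names ending
          with <classname>Sketch</classname> are hypothetical, and the rule
          that two saved items are equal when they have the same class and
          the same <varname>id</varname> is an assumption made for
          the example.
        </para>
        <programlisting language="java"><![CDATA[
/** Simplified stand-in for the root class. */
abstract class BasicDataSketch
{
    private int id;      // assumed: 0 means "not saved to the database yet"
    private int version; // used to detect concurrent modifications

    int getId() { return id; }
    void setId(int id) { this.id = id; }
    int getVersion() { return version; }

    @Override
    public boolean equals(Object other)
    {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        // Assumption: saved items of the same class are equal if their ids match.
        return id != 0 && id == ((BasicDataSketch) other).id;
    }

    @Override
    public int hashCode() { return id; }

    @Override
    public String toString()
    {
        return getClass().getSimpleName() + "[id=" + id + "; version=" + version + "]";
    }
}

/** Stand-in for the class that holds information about a user. */
class UserDataSketch extends BasicDataSketch {}

/** Adds the required link to the user that owns the item. */
abstract class OwnedDataSketch extends BasicDataSketch
{
    private UserDataSketch owner;

    UserDataSketch getOwner() { return owner; }
    void setOwner(UserDataSketch owner) { this.owner = owner; }
}

/** Hypothetical concrete item that has an owner. */
class SampleDataSketch extends OwnedDataSketch {}
]]></programlisting>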
282       
283      </sect3>
284     
285      <sect3 id="data_api.basic.interfaces">
286        <title>Interfaces</title>
287       
288        <variablelist>
289        <varlistentry>
290          <term><classname>IdentifiableData</classname></term>
291          <listitem>
292            <para>
293            All items are identifiable, which means that they have a unique <varname>id</varname>.
294            The id is unique for all items of a specific type (i.e. class). The id is a number
295            that is automatically generated by the database and has no other meaning
296            outside of the application. The <varname>version</varname> property is used for
297            detecting and preventing concurrent modifications to an item.
298            </para>
299          </listitem>
300        </varlistentry>
301       
302        <varlistentry>
303          <term><classname>OwnableData</classname></term>
304          <listitem>
305            <para>
306            An ownable item is an item which has an owner. The owner is represented as a
307            required link to a <classname>UserData</classname> object.
308            </para>
309          </listitem>
310        </varlistentry>       
311
312        <varlistentry>
313          <term><classname>ShareableData</classname></term>
314          <listitem>
315            <para>
316            A shareable item is an item which can be shared with other users, groups or projects.
317            Access permissions are held in <classname>ItemKeyData</classname> and/or
318            <classname>ProjectKeyData</classname> objects.
319            </para>
320          </listitem>
321        </varlistentry>
322             
323        <varlistentry>
324          <term><classname>NameableData</classname></term>
325          <listitem>
326            <para>
327            A nameable item is an item that has a name (required) and a description
328            (optional). The name doesn't have to be unique, except in a few special
329            cases (for example, the name of a file).
330            </para>
331          </listitem>
332        </varlistentry>
333       
334        <varlistentry>
335          <term><classname>RemovableData</classname></term>
336          <listitem>
337            <para>
338            A removable item is an item that can be flagged as removed. This doesn't
339            remove the information about the item from the database, but can be used by
340            client applications to hide items that the user is not interested in.
341            A trashcan function can be used to either restore or permanently
342            remove items that has the flag set.
343            </para>
344          </listitem>
345        </varlistentry>
346               
347        <varlistentry>
348          <term><classname>SystemData</classname></term>
349          <listitem>
350            <para>
351            A system item is an item which has an additional id in the form of a string. A system id
352            is required when we need to make sure that we can get a specific item without
353            knowing the numeric id. Examples of such items are the root user and the everyone group.
354            A system id is generally constructed like:
355            <constant>net.sf.basedb.core.User.ROOT</constant>. The system ids are defined in the
356            core layer by each item class.
357            </para>
358          </listitem>
359        </varlistentry>
360
361        <varlistentry>
362          <term><classname>DiskConsumableData</classname></term>
363          <listitem>
364            <para>
365            This interface is used by items which occupy a lot of disk space and
366            should be part of the quota system, for example files. The required
367            <classname>DiskUsageData</classname> contains information about the size,
368            location, owner etc. of the item.
369            </para>
370          </listitem>
371        </varlistentry>
372       
373        <varlistentry>
374          <term><classname>AnnotatableData</classname></term>
375          <listitem>
376            <para>
377            This interface is used by items which can be annotated. Annotations are name/value
378            pairs that are attached as extra information to an item. All annotations are
379            contained in an <classname>AnnotationSetData</classname> object.
380            </para>
381          </listitem>
382        </varlistentry>
383       
384        <varlistentry>
385          <term><classname>ExtendableData</classname></term>
386          <listitem>
387            <para>
388            This interface is used by items which can have extra administrator-defined
389            columns. The functionality is similar to annotations. It is not as flexible,
390            since it is a global configuration, but has better performance. BASE will
391            generate extra database columns to store the data in the tables for items that
392            can be extended.
393            </para>
394          </listitem>
395        </varlistentry>
396       
397        <varlistentry>
398          <term><classname>BatchableData</classname></term>
399          <listitem>
400            <para>
401            This interface is a tagging interface which is used by items that need batch
402            functionality in the core.
403            </para>
404          </listitem>
405        </varlistentry>
406        </variablelist>
407
408      </sect3>
409    </sect2>
410   
411    <sect2 id="data_api.authentication">
412      <title>User authentication and access control</title>
413     
414      <para>
415         This section gives an overview of user authentication and
416         how groups, roles and projects are used for access control
417         to items.
418      </para>
419     
420      <sect3 id="data_api.authentication.uml">
421        <title>UML diagram</title>
422       
423        <figure id="data_api.figures.authentication">
424          <title>User authentication and access control</title>
425          <screenshot>
426            <mediaobject>
427              <imageobject>
428                <imagedata 
429                  fileref="figures/uml/datalayer.authentication.png" format="PNG" />
430              </imageobject>
431            </mediaobject>
432          </screenshot>
433        </figure>
434      </sect3>
435     
436      <sect3 id="data_api.authentication.users">
437        <title>Users and passwords</title>     
438     
439        <para>
440          The <classname>UserData</classname> class holds information about users.
441          We keep the passwords in a separate table and use proxies to avoid loading
442          password data each time a user is loaded to minimize security risks. It is
443          only if the password needs to be changed that the <classname>PasswordData</classname>
444          object is loaded. The one-to-one mapping between user and password is controlled
445          by the password class, but a cascade attribute on the user class makes sure
446          that the password is deleted when a user is deleted.
447        </para>
448      </sect3>
449
450      <sect3 id="data_api.authentication.groups">
451        <title>Groups, roles and projects</title>     
452     
453        <para>
454          The <classname>GroupData</classname>, <classname>RoleData</classname> and
455          <classname>ProjectData</classname> classes hold information about groups, roles
456          and projects respectively. A user may be a member of any number of groups,
457          roles and/or projects. The membership in a project comes with an attached
458          permission value. This is the highest permission the user can get in the
459          project. No matter what permission an item has been shared with, the
460          user will not get a higher permission. Groups may be members of other groups
461          and of projects.
462        </para>
463       
464      </sect3>
465     
466      <sect3 id="data_api.authentication.keys">
467        <title>Keys</title>     
468     
469        <para>
470          The <classname>KeyData</classname> class and its subclasses
471          <classname>ItemKeyData</classname>, <classname>ProjectKeyData</classname> and
472          <classname>RoleKeyData</classname>, are used to store information about access
473          permissions to items. To get permission to manipulate an item a user must have
474          access to a key giving that permission. There are three types of keys:
475        </para>
476       
477        <variablelist>
478        <varlistentry>
479          <term><classname>ItemKey</classname></term>
480          <listitem>
481            <para>
482            Is used to give a user or group access to a specific item. The item
483            must be a <interfacename>ShareableData</interfacename> item.
484            The permissions are usually set by the owner of the item. Once created, an
485            item key cannot be changed. This allows the core to reuse a key if the
486            permissions match exactly, i.e. for a given set of users/groups/permissions
487            there can be only one item key object.
488            </para>
489          </listitem>
490        </varlistentry>
491
492        <varlistentry>
493          <term><classname>ProjectKey</classname></term>
494          <listitem>
495            <para>
496            Is used to give members of a project access to a specific item. The item
497            must be a <interfacename>ShareableData</interfacename> item. Once created a
498            project key cannot be changed. This allows the core to reuse a key if the
499            permissions match exactly, i.e. for a given set of projects/permissions
500            there can be only one project key object.
501            </para>
502          </listitem>
503        </varlistentry>
504
505        <varlistentry>
506          <term><classname>RoleKey</classname></term>
507          <listitem>
508            <para>
509            Is used to give a user access to all items of a specific type, for example
510            <constant>READ</constant> all <constant>SAMPLES</constant>. The installation
511            will make sure that there already exists a role key for each type of item, and
512            it is not possible to add new or delete existing keys. Unlike the other two types
513            this key can be modified.
514            </para>
515           
516            <para>
517            A role key is also used to assign permissions to plug-ins. If a plug-in has
518            been specified to use permissions, the default is to deny everything.
519            The mapping to the role key is used to grant permissions to the plug-in.
520            The <varname>granted</varname> value gives the plug-in access to all items
521            of the related item type regardless of whether the user that is running the plug-in has the
522            permission or not. The <varname>denied</varname> value denies access to all
523            items of the related item type even if the logged-in user has the permission.
524            Permissions that are neither granted nor denied are checked against the
525            logged-in user's regular permissions. Permissions to items that are
526            not linked are always denied.
527            </para>
528          </listitem>
529        </varlistentry>
530        </variablelist>
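        <para>
          The rules for plug-in permissions described under the role key above
          can be summarised as a small decision function. This is only a sketch
          under stated assumptions: the class
          <classname>PluginPermissionSketch</classname>, the nested
          <classname>KeyLink</classname> type and the reduced
          <classname>Permission</classname> enum are hypothetical, and checking
          <varname>denied</varname> before <varname>granted</varname> is an
          assumption about how a permission listed in both would be handled.
        </para>
        <programlisting language="java"><![CDATA[
import java.util.EnumSet;
import java.util.Set;

class PluginPermissionSketch
{
    /** Reduced set of permissions, just enough for the example. */
    enum Permission { READ, USE, WRITE, DELETE }

    /** Hypothetical link between a plug-in and the role key of one item type. */
    static final class KeyLink
    {
        final Set<Permission> granted;
        final Set<Permission> denied;

        KeyLink(Set<Permission> granted, Set<Permission> denied)
        {
            this.granted = granted;
            this.denied = denied;
        }
    }

    /**
     * Decide if a plug-in that has been specified to use permissions may
     * use the wanted permission on items of one item type.
     *
     * @param link the plug-in's link to the role key of the item type,
     *   or null if the item type is not linked
     * @param userHasPermission the outcome of the regular permission
     *   check for the logged-in user
     */
    static boolean isAllowed(KeyLink link, Permission wanted, boolean userHasPermission)
    {
        if (link == null) return false;                  // not linked: always denied
        if (link.denied.contains(wanted)) return false;  // denied even if the user has it
        if (link.granted.contains(wanted)) return true;  // granted even if the user lacks it
        return userHasPermission;                        // otherwise the user decides
    }

    public static void main(String[] args)
    {
        KeyLink link = new KeyLink(EnumSet.of(Permission.READ), EnumSet.of(Permission.DELETE));
        System.out.println(isAllowed(link, Permission.READ, false));
        System.out.println(isAllowed(link, Permission.DELETE, true));
        System.out.println(isAllowed(link, Permission.WRITE, true));
        System.out.println(isAllowed(null, Permission.READ, true));
    }
}
]]></programlisting>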
531       
532      </sect3>
533
534      <sect3 id="data_api.authentication.permissions">
535        <title>Permissions</title>
536       
537        <para>
538          The <varname>permission</varname> property appearing in many classes is an
539          integer value describing the permission:
540        </para>
541       
542        <informaltable>
543        <tgroup cols="2">
544          <colspec colname="value" />
545          <colspec colname="permission" />
546          <thead>
547            <row>
548              <entry>Value</entry>
549              <entry>Permission</entry>
550            </row>
551          </thead>
552          <tbody>
553            <row>
554              <entry>1</entry>
555              <entry>Read</entry>
556            </row>
557            <row>
558              <entry>3</entry>
559              <entry>Use</entry>
560            </row>
561            <row>
562              <entry>7</entry>
563              <entry>Restricted write</entry>
564            </row>
565            <row>
566              <entry>15</entry>
567              <entry>Write</entry>
568            </row>
569            <row>
570              <entry>31</entry>
571              <entry>Delete</entry>
572            </row>
573            <row>
574              <entry>47 (=32+15)</entry>
575              <entry>Set owner</entry>
576            </row>
577            <row>
578              <entry>79 (=64+15)</entry>
579              <entry>Set permissions</entry>
580            </row>
581            <row>
582              <entry>128</entry>
583              <entry>Create</entry>
584            </row>
585            <row>
586              <entry>256</entry>
587              <entry>Denied</entry>
588            </row>
589          </tbody>
590        </tgroup>
591        </informaltable>
592       
593        <para>
594          The values are constructed so that
595          <constant>READ</constant> -&gt;
596          <constant>USE</constant> -&gt;
597          <constant>RESTRICTED_WRITE</constant> -&gt;
598          <constant>WRITE</constant> -&gt;
599          <constant>DELETE</constant>
600          are chained in the sense that a higher permission always implies the lower permissions
601          also. The <constant>SET_OWNER</constant> and <constant>SET_PERMISSION</constant>
602          permissions both imply <constant>WRITE</constant> permission. The <constant>DENIED</constant>
603          permission is only valid for role keys, and if specified it overrides all
604          other permissions.               
605        </para>
606       
607        <para>
608          When combining permissions for a single item, the permission codes for the different
609          paths are OR-ed together. For example, a user has a role key with <constant>READ</constant>
610          permission for <constant>SAMPLES</constant>, but also an item key with <constant>USE</constant>
611          permission for a specific sample. Of course, the resulting permission for that
612          sample is <constant>USE</constant>. For other samples the resulting permission is
613          <constant>READ</constant>.
614        </para>
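        <para>
          The arithmetic behind the table and the OR-ing rule can be sketched
          as follows. The numeric values are the ones listed in the table above.
          The class <classname>PermissionSketch</classname> and its methods are
          hypothetical and do not show how the BASE core implements the check;
          in particular, treating <constant>DENIED</constant> as a bit that
          vetoes every other permission is an assumption based on the
          description above.
        </para>
        <programlisting language="java"><![CDATA[
class PermissionSketch
{
    static final int READ = 1;
    static final int USE = 3;
    static final int RESTRICTED_WRITE = 7;
    static final int WRITE = 15;
    static final int DELETE = 31;
    static final int SET_OWNER = 47;       // 32 + WRITE
    static final int SET_PERMISSION = 79;  // 64 + WRITE
    static final int CREATE = 128;
    static final int DENIED = 256;

    /** OR together the permission codes found through the different paths. */
    static int combine(int... paths)
    {
        int result = 0;
        for (int code : paths)
        {
            result |= code;
        }
        return result;
    }

    /** True if 'granted' contains every bit of 'wanted' and is not denied. */
    static boolean has(int granted, int wanted)
    {
        if ((granted & DENIED) != 0) return false; // assumed: DENIED overrides everything
        return (granted & wanted) == wanted;
    }

    public static void main(String[] args)
    {
        // Role key gives READ on all samples, item key gives USE on one sample.
        int forThatSample = combine(READ, USE);
        System.out.println(forThatSample == USE);
        System.out.println(has(DELETE, READ));
        System.out.println(has(SET_OWNER, WRITE));
        System.out.println(has(SET_OWNER, DELETE));
    }
}
]]></programlisting>
        <para>
          Because each value in the chain contains all bits of the lower
          values, OR-ing <constant>READ</constant> (1) with
          <constant>USE</constant> (3) gives 3, which is
          <constant>USE</constant>. <constant>SET_OWNER</constant> (47)
          contains the bits of <constant>WRITE</constant> (15) but not the
          extra bit of <constant>DELETE</constant> (16).
        </para>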
615       
616        <para>
617          If the user is also a member of a project which has <constant>WRITE</constant>
618          permission for the same sample, the user will have <constant>WRITE</constant>
619          permission when working with that project.
620        </para>
621       
622        <para>
623          The <constant>RESTRICTED_WRITE</constant> permission is in most cases the same
624          as the <constant>WRITE</constant> permission. So far the <constant>RESTRICTED_WRITE</constant>
625          permission is only given to users for their own <classname>UserData</classname>
626          object so they can change their address and other contact information,
627          but not quota, expiration date and other administrative information.
628        </para>
629
630      </sect3>
631    </sect2>
632
633    <sect2 id="data_api.wares">
634      <title>Hardware and software</title>
635    </sect2>
636   
637    <sect2 id="data_api.reporters">
638      <title>Reporters</title>
639    </sect2>
640
641    <sect2 id="data_api.quota">
642      <title>Quota and disk usage</title>
643    </sect2>
644
645    <sect2 id="data_api.sessions">
646      <title>Client, session and settings</title>
647    </sect2>
648
649    <sect2 id="data_api.files">
650      <title>Files and directories</title>
651
652      <para>
653        This section covers the details of the BASE file
654        system.
655      </para>
656
657      <sect3 id="data_api.files.uml">
658      <title>UML diagram</title>
659     
660        <figure id="data_api.figures.files">
661          <title>Files and directories</title>
662          <screenshot>
663            <mediaobject>
664              <imageobject>
665                <imagedata 
666                  fileref="figures/uml/datalayer.files.png" format="PNG" />
667              </imageobject>
668            </mediaobject>
669          </screenshot>
670        </figure>
671      </sect3>
672     
673      <sect3 id="data_api.files.description">
674        <title>Description</title>
675       
676        <para>
677          The <classname>DirectoryData</classname> class holds
678          information about directories. Directories are organised in the
679          usual way as a tree structure. All directories must have
680          a parent directory, except the system-defined root directory.
681        </para>
682       
683        <para>
684          The <classname>FileData</classname> class holds information about
685          a file. The actual file contents are stored on disk in the directory
686          specified by the <varname>userfiles</varname> setting in
687          <filename>base.config</filename>. The <varname>internalName</varname>
688          property is the name of the file on disk, but this is never exposed to
689          client applications. The filenames and directories
690          on the disk do not correspond to the filenames and directories in
691          BASE.
692        </para>
693       
694        <para>
695          The <varname>location</varname> property can take three values:
696        </para>
697       
698        <itemizedlist>
699        <listitem>
700          <para>
701          0 = The file is offline, i.e. there is no file on the disk.
702          </para>
703        </listitem>
704        <listitem>
705          <para>
706          1 = The file is in primary storage, i.e. it is located on the disk
707          and can be used by BASE
708          </para>
709        </listitem>
710        <listitem>
711          <para>
712          2 = The file is in secondary storage, i.e. it has been moved to some
713          other place and can't be used by BASE immediately.
714          </para>
715        </listitem>
716        </itemizedlist>
717       
718        <para>
719          The <varname>action</varname> property controls how a file is
720          moved between primary and secondary storage. It can have the following
721          values:
722        </para>
723       
724        <itemizedlist>
725        <listitem>
726          <para>
727          0 = Do nothing
728          </para>
729        </listitem>
730        <listitem>
731          <para>
732          1 = If the file is in secondary storage, move it back to the primary storage
733          </para>
734        </listitem>
735        <listitem>
736          <para>
737          2 = If the file is in primary storage, move it to the secondary storage
738          </para>
739        </listitem>
740        </itemizedlist>
741       
742        <para>
743          The actual moving between primary and secondary storage is done by an
744          external program. See
745          <xref linkend="appendix.base.config.secondary" /> and
746          <xref linkend="plugin_developer.other.secondary" /> for more information.
747        </para>
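        <para>
          As an illustration, the numeric values of the
          <varname>location</varname> and <varname>action</varname> properties
          can be modelled with two enumerations. The class
          <classname>FileStorageSketch</classname>, the enum constant names and
          the <methodname>afterMove()</methodname> method are hypothetical; the
          sketch only restates the rules listed above and says nothing about
          how the external program is invoked.
        </para>
        <programlisting language="java"><![CDATA[
class FileStorageSketch
{
    /** Values of the location property. */
    enum Location
    {
        OFFLINE(0), PRIMARY(1), SECONDARY(2);

        final int value;
        Location(int value) { this.value = value; }

        static Location fromValue(int value)
        {
            for (Location l : values())
            {
                if (l.value == value) return l;
            }
            throw new IllegalArgumentException("Unknown location: " + value);
        }
    }

    /** Values of the action property. */
    enum Action
    {
        NOTHING(0), MOVE_TO_PRIMARY(1), MOVE_TO_SECONDARY(2);

        final int value;
        Action(int value) { this.value = value; }
    }

    /** The location a file would have once the requested action has been carried out. */
    static Location afterMove(Location current, Action action)
    {
        if (action == Action.MOVE_TO_PRIMARY && current == Location.SECONDARY)
        {
            return Location.PRIMARY;
        }
        if (action == Action.MOVE_TO_SECONDARY && current == Location.PRIMARY)
        {
            return Location.SECONDARY;
        }
        return current; // nothing to do, or the action does not apply
    }

    public static void main(String[] args)
    {
        System.out.println(afterMove(Location.fromValue(1), Action.MOVE_TO_SECONDARY));
        System.out.println(afterMove(Location.OFFLINE, Action.MOVE_TO_PRIMARY));
    }
}
]]></programlisting>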
748     
749        <para>
750          The <varname>md5</varname> property can be used to check for file
751          corruption when it is moved between primary and secondary storage or
752          when a user re-uploads a file that has been offline.
753        </para>
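        <para>
          A corruption check of this kind could look like the sketch below,
          which uses the standard
          <classname>java.security.MessageDigest</classname> class. The class
          <classname>Md5Sketch</classname> and its methods are hypothetical,
          and comparing the checksum as a lower-case hexadecimal string is an
          assumption about the stored format.
        </para>
        <programlisting language="java"><![CDATA[
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Sketch
{
    /** Calculate the MD5 checksum of the data as a lower-case hex string. */
    static String md5Hex(byte[] data)
    {
        try
        {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] hash = digest.digest(data);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash)
            {
                // Mask to 0..255 so that negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        }
        catch (NoSuchAlgorithmException ex)
        {
            throw new IllegalStateException("MD5 is not available", ex);
        }
    }

    /** True if the data still matches the checksum stored with the file. */
    static boolean isIntact(byte[] data, String storedMd5)
    {
        return md5Hex(data).equalsIgnoreCase(storedMd5);
    }

    public static void main(String[] args)
    {
        byte[] data = { 'a', 'b', 'c' };
        String stored = md5Hex(data);
        System.out.println(isIntact(data, stored));
        data[0] = 'x'; // simulate corruption
        System.out.println(isIntact(data, stored));
    }
}
]]></programlisting>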
754       
755        <para>
756          BASE can store files in a compressed format. This is handled internally
757          and is not visible to client applications. The <varname>compressed</varname>
758          and <varname>diskSize</varname> properties are used to store information
759          about this. A file may always be compressed if the user says so, but
760          BASE can also do this automatically if the file is uploaded
761          to a directory with the <varname>autoCompress</varname> flag set
762          or if the file has MIME type with the <varname>autoCompress</varname>
763          flag set.
764        </para>
765       
766        <para>
767          The <classname>FileTypeData</classname> class holds information about
768          file types. It is used only to make it easier for users to organise
769          their files.
770        </para>
771       
772        <para>
773          The <classname>MimeTypeData</classname> class is used to register MIME types and
774          map them to file extensions. The information is only used to look up values
775          when needed. Given the filename we can set the <varname>File.mimeType</varname>
776          and <varname>File.fileType</varname> properties. The MIME type is also
777          used to decide if a file should be stored in a compressed format or not.
778          The extension of a MIME type must be unique. Extensions should be registered
779          without a dot, i.e. <emphasis>html</emphasis>, not <emphasis>.html</emphasis>.
780        </para>
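
        <para>
          The lookup by filename can be sketched as follows. The
          <classname>MimeTypeLookup</classname> class is a hypothetical in-memory
          stand-in and not the BASE implementation. Treating extensions as
          case-insensitive is an assumption made for this example.
        </para>

        <programlisting><![CDATA[
```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a MIME type registry, not the BASE implementation.
class MimeTypeLookup {
    // Key: extension without a leading dot, for example "html".
    private final Map<String, String> byExtension = new HashMap<>();

    // Registers a MIME type; each extension may only be registered once.
    void register(String extension, String mimeType) {
        if (extension.startsWith(".")) {
            throw new IllegalArgumentException("Register without a dot: " + extension);
        }
        String key = extension.toLowerCase(Locale.ROOT);
        if (byExtension.containsKey(key)) {
            throw new IllegalArgumentException("Extension already registered: " + key);
        }
        byExtension.put(key, mimeType);
    }

    // Returns the MIME type for a filename, or null if the extension is unknown.
    String findByFilename(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return null; // the filename has no extension
        }
        String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return byExtension.get(extension);
    }
}
```
]]></programlisting>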
781       
782      </sect3>
783     
784     
785    </sect2>
786   
787    <sect2 id="data_api.platforms">
788      <title>Experimental platforms</title>
789
790      <para>
791         This section gives an overview of experimental platforms
792         and how they are used to enable data storage in files
793         instead of in the database.
794      </para>
795     
796      <note>
797        <title>THIS IS A DRAFT!</title>
798        <para>
799          This document is a draft currently being worked on!
800          Changes are expected before the design is finalized.
801        </para>
802      </note>
803     
804      <sect3 id="data_api.platforms.uml">
805        <title>UML diagram</title>
806       
807        <figure id="data_api.figures.platforms">
808          <title>Experimental platforms</title>
809          <screenshot>
810            <mediaobject>
811              <imageobject>
812                <imagedata 
813                  fileref="figures/uml/datalayer.platforms.png" format="PNG" />
814              </imageobject>
815            </mediaobject>
816          </screenshot>
817        </figure>
818      </sect3>
819     
820      <sect3 id="data_api.platforms.platforms">
821        <title>Platforms</title>
822       
823        <para>
824          The <classname>PlatformData</classname> class holds information about a
825          platform. A platform can have one or more <classname>PlatformVariant</classname>:s.
826          Both the platform and variant are identified by an external ID that
827          is fixed and can't be changed. <emphasis>Affymetrix</emphasis>
828          and <emphasis>Illumina</emphasis> are examples of platforms.
829          If the <varname>fileOnly</varname> flag is set, data for the platform
830          can only be stored in files and not imported into the database. If
831          the flag is not set, data can be imported into the database.
832          The <varname>rawDataType</varname> can be used to lock the platform
833          to a specific raw data type. If the value is <constant>null</constant>,
834          the platform can use any raw data type.
835        </para>
836       
837        <para>
838          Each platform and its variants can be connected to one or more
839          <classname>DataFileTypeData</classname> items. Each such item
840          describes the kind of files that are used to hold data for
841          the platform and/or variant. The file types are re-usable between
842          different platforms and variants. Note that a file type may be attached
843          either to a platform only or to a platform with a variant. File
844          types attached to platforms are inherited by the variants. The variants
845          can only define additional file types, not remove or redefine file types
846          that have been attached to the platform.
847        </para>
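
        <para>
          The inheritance rule can be expressed as a simple set union. The sketch
          below is not BASE code. The <classname>FileTypeResolver</classname> class
          is hypothetical, and file types are represented by plain strings
          instead of <classname>DataFileTypeData</classname> items.
        </para>

        <programlisting><![CDATA[
```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the inheritance rule; file type IDs are plain strings.
class FileTypeResolver {
    // Returns the file types that apply to a variant: everything attached to
    // the platform plus the variant's own additions. Nothing is ever removed,
    // and the input sets are left unchanged.
    static Set<String> effectiveTypes(Set<String> platformTypes, Set<String> variantTypes) {
        Set<String> result = new LinkedHashSet<>(platformTypes);
        result.addAll(variantTypes);
        return result;
    }
}
```
]]></programlisting>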
848        <para>
849          The file type is also identified
850          by a fixed, non-changeable external ID. The <varname>itemType</varname>
851          property tells us what type of item the file holds data for (e.g.
852          array design or raw bioassay). The file type also links to a <classname>FileType</classname>
853          which is the generic type of data in the file. This allows us to query
854          the database for, as an example, files with the generic type
855          <constant>FileType.RAW_DATA</constant>. If we are in an Affymetrix
856          experiment we will get the CEL file, for another platform we will
857          get another file.
858        </para>
859        <para>
860          The <varname>required</varname> flag in <classname>PlatformFileTypeData</classname>
861          is used to signal that the file is a required file. This will, however, not be
862          enforced by the core. It is intended to be used by client applications
863          for creating a better GUI and/or validation of an experiment.
864        </para>
865
866      </sect3>
867     
868      <sect3 id="data_api.platforms.files">
869        <title>Data files</title>
870       
871        <para>
872          An item must implement the <interfacename>FileStoreEnabledData</interfacename>
873          interface to be able to store data in files instead of in the database.
874          The interface creates a link to a <classname>FileSetData</classname> object,
875          which can hold several <classname>FileSetMemberData</classname> items.
876          Each member points to a specific <classname>FileData</classname> item.
877          A file set can only store one file of each type.
878        </para>
879       
880      </sect3>
881    </sect2>
882
883    <sect2 id="data_api.protocols">
884      <title>Protocols</title>
885    </sect2>
886
887    <sect2 id="data_api.parameters">
888      <title>Parameters</title>
889    </sect2>
890
891    <sect2 id="data_api.annotations">
892      <title>Annotations</title>
893    </sect2>
894
895    <sect2 id="data_api.plugins">
896      <title>Plug-ins, jobs and job agents</title>
897    </sect2>
898   
899    <sect2 id="data_api.biomaterials">
900      <title>Biomaterials</title>
901    </sect2>
902
903    <sect2 id="data_api.plates">
904      <title>Array LIMS - plates</title>
905    </sect2>
906
907    <sect2 id="data_api.arrays">
908      <title>Array LIMS - arrays</title>
909    </sect2>
910
911    <sect2 id="data_api.rawdata">
912      <title>Hybridizations and raw data</title>
913    </sect2>
914
915    <sect2 id="data_api.experiments">
916      <title>Experiments and analysis</title>
917    </sect2>
918   
919    <sect2 id="data_api.misc">
920      <title>Other classes</title>
921    </sect2>
922
923  </sect1>
924 
925  <sect1 id="api_overview.core_api" chunked="1">
926    <title>The Core API</title>
927   
928    <para>
929      This section gives an overview of various parts of the core API.
930    </para>
931   
932    <sect2 id="core_api.data_in_files">
933      <title>Using files to store data</title>
934      <note>
935        <title>THIS IS A DRAFT!</title>
936        <para>
937          This document is a draft currently being worked on!
938          Changes are expected before the design is finalized.
939        </para>
940      </note>
941     
942      <para>
943        This section is about how BASE can use files to store data instead
944        of importing it into the database. See <xref linkend="data_api.platforms" />
945        for an overview of the database schema for this feature. Files can be attached
946        to any item that implements the <interfacename>FileStoreEnabled</interfacename>
947        interface. Currently this is <classname>RawBioAssay</classname>, <classname>ArrayDesign</classname>,
948        <classname>BioAssaySet</classname> and <classname>BioAssay</classname>. The
949        ability to store data in files is not a replacement for storing data in the
950        database. It is possible (for some platforms/raw data types) to have data in
951        files and in the database at the same time. We would have liked to enforce
952        that (raw) data is always present in files, but this will not be backwards compatible
953        with older installations, so there are three cases:
954      </para>
955     
956      <itemizedlist>
957      <listitem>
958        <para>
959        Data in files only
960        </para>
961      </listitem>
962      <listitem>
963        <para>
964        Data in the database only
965        </para>
966      </listitem>
967      <listitem>
968        <para>
969        Data in both files and in the database
970        </para>
971      </listitem>
972      </itemizedlist>
973     
974      <para>
975        Not all three cases are supported for all types of data. This is controlled
976        by the <classname>Platform</classname> class, which may disallow
977        storing data in the database. To check this, call
978        <methodname>getRawDataType()</methodname> which may return:
979      </para>
980     
981      <itemizedlist>
982      <listitem>
983        <para>
984          <constant>null</constant>: The platform can store data with any
985          raw data type in the database.
986        </para>
987      </listitem>
988      <listitem>
989        <para>
990        A <classname>RawDataType</classname> that has <code>isStoredInDb() == true</code>:
991        The platform can store data in the database but only data with the specified raw
992        data type.
993        </para>
994      </listitem>
995      <listitem>
996        <para>
997        A <classname>RawDataType</classname> that has <code>isStoredInDb() == false</code>:
998        The platform can't store data in the database.
999        </para>
1000      </listitem>
1001      </itemizedlist>
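
      <para>
        The three cases above can be summarised as a small decision function.
        This is a sketch with hypothetical stand-in types
        (<classname>StorageRule</classname>, <classname>RawDataTypeStub</classname>).
        Real code would use the value returned by the platform's
        <methodname>getRawDataType()</methodname> method instead of the stub.
      </para>

      <programlisting><![CDATA[
```java
// Hypothetical stand-ins; only the three cases themselves come from the text.
class StorageRule {
    enum DbStorage { ANY_RAW_DATA_TYPE, LOCKED_RAW_DATA_TYPE, FILES_ONLY }

    // Minimal stand-in for a raw data type with an isStoredInDb() flag.
    static class RawDataTypeStub {
        private final boolean storedInDb;

        RawDataTypeStub(boolean storedInDb) {
            this.storedInDb = storedInDb;
        }

        boolean isStoredInDb() {
            return storedInDb;
        }
    }

    // Classifies the value returned by a platform's getRawDataType() call.
    static DbStorage classify(RawDataTypeStub rawDataType) {
        if (rawDataType == null) {
            // No lock: any raw data type can be stored in the database
            return DbStorage.ANY_RAW_DATA_TYPE;
        }
        return rawDataType.isStoredInDb()
            ? DbStorage.LOCKED_RAW_DATA_TYPE
            : DbStorage.FILES_ONLY;
    }
}
```
]]></programlisting>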
1002
1003      <para>
1004        One major modification is that the registration of raw data types
1005        has changed. The <filename>raw-data-types.xml</filename> file should
1006        only be used for raw data types that are stored in the database. The
1007        <sgmltag>storage</sgmltag> tag has been deprecated and BASE will ignore
1008        any raw data type definitions with <code>storage="file"</code>.
1009        To replace this, each <classname>Platform</classname> that
1010        can only store data in files also defines a "virtual" raw data type.
1011      </para>
1012     
1013      <sect3 id="core_api.data_in_files.diagram">
1014        <title>Diagram of classes and methods</title>
1015        <figure id="core_api.figures.data_in_files">
1016          <title>Store data in files</title>
1017          <screenshot>
1018            <mediaobject>
1019              <imageobject>
1020                <imagedata 
1021                  fileref="figures/uml/corelayer.datainfiles.png" format="PNG" />
1022              </imageobject>
1023            </mediaobject>
1024          </screenshot>
1025        </figure>
1026      </sect3>
1027     
1028      <sect3 id="core_api.data_in_files.ask">
1029        <title>Asking the user for files</title>
1030
1031        <para>
1032          A client application must know what types of files it makes sense
1033          to ask the user for. In some cases, data may be split into more than
1034          one file so we need a generic way to select files.
1035        </para>
1036       
1037        <para>
1038          Given that we have a <interfacename>FileStoreEnabled</interfacename>
1039          item we use the <methodname>DataFileType.getQuery()</methodname>
1040          method to find which file types can be used for that
1041          item. Internally, the <methodname>getQuery()</methodname>
1042          method uses the <methodname>FileStoreEnabled.getPlatform()</methodname>
1043          and <methodname>FileStoreEnabled.getVariant()</methodname>
1044          methods to restrict the query to only return file types for
1045          a given platform and/or variant. If the item doesn't have
1046          a platform or variant the query will only return file types
1047          that are associated with the given item type, but not with any specific
1048          platform. In any case, we get a list of <classname>DataFileType</classname>
1049          items, each one representing a specific file type that
1050          we should ask the user about. Examples:
1051        </para>
1052
1053        <orderedlist>
1054        <listitem>
1055          <para>
1056          The <constant>Affymetrix</constant> platform defines <constant>CEL</constant>
1057          for <constant>FileType.RAW_DATA</constant>
1058          and <constant>CDF</constant> for <constant>FileType.REPORTER_MAP</constant>.
1059          If we have a
1060          <classname>RawBioAssay</classname> the query will only return
1061          the CEL file type and the client can ask the user for a CEL file.
1062          </para>
1063        </listitem>
1064        <listitem>
1065          <para>
1066          More examples.... ???
1067          </para>
1068        </listitem>
1069        </orderedlist>
1070     
1071        <para>
1072          Here is a simple code template that might be useful.
1073        </para>
1074       
1075        <programlisting>
1076DbControl dc = ...
1077FileStoreEnabled item = ...
1078ItemQuery&lt;DataFileType&gt; query =
1079   DataFileType.getQuery(item);
1080List&lt;DataFileType&gt; types = query.list(dc);
1081// We now have a list of file types...
1082// ... ask the user to select a file for each one of them
1083</programlisting>
1084     
1085      </sect3>
1086     
1087      <sect3 id="core_api.data_in_files.link">
1088        <title>Link to the selected files</title>
1089        <para>
1090          When the user has selected the file(s) we must store the links
1091          to them in the database. This is done with a <classname>FileSet</classname>
1092          object. A file set can contain any number of files. The only limitation
1093          is that it can only contain one file for each file type.
1094          Call <methodname>FileSet.setMember()</methodname> to store
1095          a file in the set. If a file already exists for the given file type
1096          it is replaced, otherwise a new entry is created.
1097        </para>
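
        <para>
          The replace-or-create behaviour can be modelled with a map that is
          keyed by file type. The <classname>FileSetSketch</classname> class below
          is a hypothetical in-memory model that uses plain strings for file types
          and files. It does not show the signature of the real
          <methodname>FileSet.setMember()</methodname> method.
        </para>

        <programlisting><![CDATA[
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model, not the BASE FileSet class.
class FileSetSketch {
    // Key: file type ID, value: path of the file stored for that type.
    private final Map<String, String> members = new LinkedHashMap<>();

    // Stores a file for the given type. An existing entry for the same type
    // is replaced; the replaced file is returned, or null if there was none.
    String setMember(String fileTypeId, String filePath) {
        return members.put(fileTypeId, filePath);
    }

    // Returns the file stored for the given type, or null.
    String getMember(String fileTypeId) {
        return members.get(fileTypeId);
    }

    // Number of files in the set, which is also the number of file types.
    int size() {
        return members.size();
    }
}
```
]]></programlisting>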
1098      </sect3>
1099     
1100      <sect3 id="core_api.data_in_files.validate">
1101        <title>Validate the file and extract metadata</title>
1102       
1103        <para>
1104          Validation and extraction of metadata is important since we want
1105          data in files to be equivalent to data in the database. The validation
1106          and metadata extraction is automatically done by the core when a
1107          file is added to a file set. The process is partly pluggable
1108          since each <classname>DataFileType</classname> can name a class
1109          that should do the validation and/or metadata extraction.
1110          Here is the general outline:
1111        </para>
1112       
1113        <programlisting>
1114FileStoreEnabled item = ...
1115DataFileType type = ...
1116File file = ...
1117FileSetMember member = new FileSetMember(file, type);
1118
1119DataFileValidator validator = type.getValidator();
1120DataFileMetadataReader metadata = type.getMetadataReader();
1121validator.setFile(member);
1122validator.setItem(item);
1123// Repeat for 'metadata' if not same as 'validator'
1124
1125validator.validate();
1126metadata.extractMetadata();
1127</programlisting>
1128       
1129        <note>
1130          <title>Only one instance of each validator class is created</title>
1131          <para>
1132          The validation/metadata extraction is not done until all files have been
1133          added to the file set. If the same validator/metadata extractor is
1134          used for more than one file, the same instance is reused. I.e.,
1135          the <methodname>setFile()</methodname> method is called once
1136          for each file/file type pair. The <methodname>validate()</methodname>
1137          and <methodname>extractMetadata()</methodname> methods are only
1138          called once.
1139          </para>
1140        </note>
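
        <para>
          The reuse rule described in the note can be sketched as follows. None
          of the types are BASE classes. <classname>CountingValidator</classname>
          is a hypothetical stand-in that only records how it was called, and
          validators are identified by a class name given as a string.
        </para>

        <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reuse rule; none of these types are BASE classes.
class ValidatorReuseSketch {
    // Stand-in validator that records how it was called.
    static class CountingValidator {
        final List<String> files = new ArrayList<>();
        int validateCalls = 0;

        void setFile(String file) {
            files.add(file);
        }

        void validate() {
            validateCalls++;
        }
    }

    // 'validatorByFile' maps each file to the name of its validator class.
    // One instance is created per distinct name, setFile() is called for every
    // file, and validate() is called once per instance after all files are set.
    static Map<String, CountingValidator> run(Map<String, String> validatorByFile) {
        Map<String, CountingValidator> instances = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : validatorByFile.entrySet()) {
            instances
                .computeIfAbsent(entry.getValue(), name -> new CountingValidator())
                .setFile(entry.getKey());
        }
        for (CountingValidator validator : instances.values()) {
            validator.validate();
        }
        return instances;
    }
}
```
]]></programlisting>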
1141       
1142        <para>
1143          All validators and metadata extractors should extend
1144          the <classname>AbstractDataFileHandler</classname> class. The reason
1145          is that we may want to add more methods to the <interfacename>DataFileHandler</interfacename>
1146          interface in the future. The <classname>AbstractDataFileHandler</classname> will
1147          be used to provide default implementations for backwards compatibility.
1148        </para>
1149       
1150      </sect3>
1151     
1152      <sect3 id="core_api.data_in_files.import">
1153        <title>Import data into the database</title>
1154       
1155        <para>
1156          This should be done by existing plug-ins in the same way as before.
1157          A slight modification is needed since it is good if the importers
1158          are made aware of already selected files in the <classname>FileSet</classname>
1159          to provide good default values. Something like this.
1160        </para>
1161       
1162        <programlisting>
1163File defaultFile = null;
1164RawBioAssay rba = ...;
1165if (rba.hasFileSet())
1166{
1167   FileSet fileSet = rba.getFileSet();
1168   List&lt;FileSetMember&gt; members =
1169      fileSet.getMembers(FileType.RAW_DATA);
1170   if (members.size() &gt; 0)
1171   {
1172      defaultFile = members.get(0).getFile();
1173   }
1174}       
1175</programlisting>
1176      </sect3>
1177     
1178      <sect3 id="core_api.data_in_files.experiments">
1179        <title>Using raw data from files in an experiment</title>
1180       
1181        <para>
1182          Just as before, an experiment is still locked to a single
1183          <classname>RawDataType</classname>. This is a design issue that
1184          would break too many things if changed. If data is stored in files
1185          the experiment is also locked to a single <classname>Platform</classname>.
1186          This has been designed to have as little impact on existing
1187          plug-ins as possible. In most cases, the plug-ins will continue
1188          to work as before.
1189        </para>
1190       
1191        <para>
1192          A plug-in (using data from the database) that needs to check if it can
1193          be used within an experiment can still do:
1194        </para>
1195       
1196        <programlisting>
1197Experiment e = ...
1198RawDataType rdt = e.getRawDataType();
1199if (rdt.isStoredInDb())
1200{
1201   // Check number of channels, etc...
1202   // ... run plug-in code ...
1203}
1204</programlisting>
1205       
1206        <para>
1207          A newer plug-in which uses data from files should do:
1208        </para>
1209       
1210        <programlisting>
1211Experiment e = ...
1212RawDataType rdt = e.getRawDataType();
1213if (!rdt.isStoredInDb())
1214{
1215   Platform p = rdt.getPlatform();
1216   PlatformVariant v = rdt.getVariant();
1217   // Check that platform/variant is supported
1218   // ... run plug-in code ...
1219}
1220</programlisting>
1221       
1222      </sect3>
1223     
1224    </sect2>
1225  </sect1>
1226
1227  <sect1 id="api_overview.query_api">
1228    <title>The Query API</title>
1229    <para>
1230      This documentation is only available in the old format.
1231      See <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/query/index.html"
1232        >http://base.thep.lu.se/chrome/site/doc/development/overview/query/index.html</ulink>
1233    </para>
1234   
1235  </sect1>
1236 
1237  <sect1 id="api_overview.dynamic_and_batch_api">
1238    <title>Analysis and the Dynamic and Batch API:s</title>
1239    <para>
1240      This documentation is only available in the old format.
1241      See <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/dynamic/index.html"
1242        >http://base.thep.lu.se/chrome/site/doc/development/overview/dynamic/index.html</ulink>
1243    </para>
1244  </sect1>
1245
1246  <sect1 id="api_overview.other_api">
1247    <title>Other useful classes and methods</title>
1248    <para>
1249      TODO
1250    </para>
1251  </sect1>
1252 
1253</chapter>