source: trunk/doc/src/docbook/developerdoc/api_overview.xml @ 3762

Last change on this file since 3762 was 3762, checked in by Nicklas Nordborg, 14 years ago

References #554, #746 and fixes #411. File and directories diagram transferred to new MagicDraw and
docbook. Added documentation about compression support.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC
    "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
    "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">
<!--
  $Id: api_overview.xml 3762 2007-09-21 07:18:07Z nicklas $

  Copyright (C) 2007 Peter Johansson, Nicklas Nordborg, Martin Svensson

  This file is part of BASE - BioArray Software Environment.
  Available at http://base.thep.lu.se/

  BASE is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 2
  of the License, or (at your option) any later version.

  BASE is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program; if not, write to the Free Software
  Foundation, Inc., 59 Temple Place - Suite 330,
  Boston, MA  02111-1307, USA.
-->

<chapter id="api_overview">
  <?dbhtml dir="api"?>
  <title>API overview (how to use and code examples)</title>

  <sect1 id="api_overview.public_api">
    <title>The Public API of BASE</title>

    <para>
      Not all public classes and methods in the <filename>BASE2Core.jar</filename>
      and other JAR files shipped with BASE are considered part of the
      <emphasis>Public API</emphasis>. This is important to know,
      since we will always try to maintain backwards compatibility
      for classes that are part of the public API. For other
      classes, changes may be introduced at any time without
      notice or specific documentation. In other words:
    </para>

    <note>
      <title>Only use the public API when developing plug-ins</title>
      <para>
        This will maximize the chance that your plug-in will continue
        to work with the next BASE release. If you use the non-public API,
        you do so at your own risk.
      </para>
    </note>

    <para>
      See the <ulink url="http://base.thep.lu.se/chrome/site/doc/api/index.html"
        >javadoc</ulink> for information about
      which parts of the API contribute to the public API.
      Methods, classes and other elements that have been tagged as
      <code>@deprecated</code> should be considered part of the internal API
      and may be removed in a subsequent release without warning.
    </para>

    <para>
      See <xref linkend="appendix.incompatible" /> to read more about
      changes that have been introduced by each release.
    </para>

    <sect2 id="api_overview.compatibility">
      <title>What is backwards compatibility?</title>

      <para>
        There is a great article about this subject on <ulink 
        url="http://wiki.eclipse.org/index.php/Evolving_Java-based_APIs"
          >http://wiki.eclipse.org/index.php/Evolving_Java-based_APIs</ulink>.
        This is what we will try to comply with. If you do not want to
        read the entire article, here are some of the most important points:
      </para>

      <sect3 id="api_overview.compatibility.binary">
        <title>Binary compatibility</title>
        <para>
        <blockquote>
          Pre-existing Client binaries must link and run with new releases of the
          Component without recompiling.
        </blockquote>

        For example:
        <itemizedlist>
        <listitem>
          <para>
            We cannot change the number or types of parameters to a method
            or constructor.
          </para>
        </listitem>
        <listitem>
          <para>
            We cannot add methods to, or change methods in, interfaces that are
            intended to be implemented by plug-in or client code.
          </para>
        </listitem>
        </itemizedlist>
        </para>
94            or constructor.
95          </para>
96        </listitem>
97        <listitem>
98          <para>
99            We cannot add or change methods to interfaces that are intended
100            to be implemented by plug-in or client code.
101          </para>
102        </listitem>
103        </itemizedlist>
104        </para>       
105      </sect3>
106     
107      <sect3 id="api_overview.compatibility.contract">
108        <title>Contract compatibility</title>
109        <para>
110          <blockquote>
111          API changes must not invalidate formerly legal Client code.
112          </blockquote>
113       
114          For example:
115          <itemizedlist>
116          <listitem>
117            <para>
118              We cannot change the implementation of a method to do
119              things differently than before. For example, allow <constant>null</constant>
120              as a return value when it was not allowed before.
121            </para>
122          </listitem>
123          </itemizedlist>
124       
125          <note>
126            <para>
127            Sometimes there is a very fine line between what is considered a
128            bug and what is considered a feature. For example, if the
129            actual implementation does not do what the javadoc says,
130            do we change the code or do we change the documentation?
131            This has to be considered from case to case and depends on
132            the age of the code and if we expect plug-ins and clients to be
133            affected by it or not.
134            </para>
135          </note>
136        </para>
137      </sect3>
138     
139      <sect3 id="api_overview.compatibility.source">
140        <title>Source code compatibility</title>
141        <para>
142        This is not an important matter and is not always possible to
143        achieve. In most cases, the problems are easy to fix.
144        Example:
145       
146        <itemizedlist>
147        <listitem>
148          <para>
149          Adding a class may break a plug-in or client that import
150          classes with <constant>.*</constant> if the same class name
151          exists in another package.
152          </para>
153        </listitem>
154        </itemizedlist>
155        </para>
156      </sect3>
157    </sect2>
158  </sect1>
159
160  <sect1 id="api_overview.data_api" chunked="1">
161    <title>The database schema and the Data Layer API</title>
162
163    <para>
164      This section gives an overview of the entire data layer API.
165      The figure below show how different modules relate to each other.
166    </para>
167   
168    <note>
169      All information has not yet been transfered from the old documentation.
170      The old documentation is available at
171      <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/data/index.html"
172        >http://base.thep.lu.se/chrome/site/doc/development/overview/data/index.html</ulink>
173    </note>
174   
175    <figure id="data_api.figures.overview">
176      <title>Data layer overview</title>
177      <screenshot>
178        <mediaobject>
179          <imageobject>
180            <imagedata 
181              fileref="figures/uml/datalayer.overview.png" format="PNG" />
182          </imageobject>
183        </mediaobject>
184      </screenshot>
185    </figure>
186
187    <sect2 id="data_api.basic">
188      <title>Basic classes and interfaces</title>
189     
190      <para>
191        This document contains information about the basic classes and interfaces in this package.
192        They are important since all data-layer classes must inherit from one of the already
193        existing abstract base classes or implement one or more of the
194        existing interfaces. They contain code that is common to all classes,
195        for example implementations of the <methodname>equals()</methodname>
196        and <methodname>hashCode()</methodname> methods or how to link with the owner of an
197        item.
198      </para>
199     
200      <sect3 id="data_api.basic.uml">
201        <title>UML diagram</title>
202       
203        <figure id="data_api.figures.basic">
204          <title>Basic classes and interfaces</title>
205          <screenshot>
206            <mediaobject>
207              <imageobject>
208                <imagedata 
209                  fileref="figures/uml/datalayer.basic.png" format="PNG" />
210              </imageobject>
211            </mediaobject>
212          </screenshot>
213        </figure>
214      </sect3>
215     
216      <sect3 id="data_api.basic.classes">
217        <title>Classes</title>
218       
219        <variablelist>
220        <varlistentry>
221          <term><classname>BasicData</classname></term>
222          <listitem>
223            <para>
224            The root class. It overrides the <methodname>equals()</methodname>,
225            <methodname>hashCode()</methodname> and <methodname>toString()</methodname> methods
226            from the <classname>Object</classname> class. It also defines the
227            <varname>id</varname> and <varname>version</varname> properties.
228            All data layer classes must inherit from this class or one of it's subclasses.
229            </para>
230          </listitem>
231        </varlistentry>
232       
233        <varlistentry>
234          <term><classname>OwnedData</classname></term>
235          <listitem>
236            <para>
237            Extends the <classname>BasicData</classname> class and adds
238            an <varname>owner</varname> property. The owner is a required link to a
239            <classname>UserData</classname> object, representing the user that
240            is the owner of the item.
241            </para>
242          </listitem>
243        </varlistentry>
244
245        <varlistentry>
246          <term><classname>SharedData</classname></term>
247          <listitem>
248            <para>
249            Extends the <classname>OwnedData</classname> class and adds
250            properties (<varname>itemKey</varname> and <varname>projectKey</varname>)
251            that holds access permission information for an item.
252            Access permissions are held in <classname>ItemKeyData</classname> and/or
253            <classname>ProjectKeyData</classname> objects. These objects only exists if
254            the item has been shared.
255            </para>
256          </listitem>
257        </varlistentry>
258
259        <varlistentry>
260          <term><classname>CommonData</classname></term>
261          <listitem>
262            <para>
263            This is a convenience class for items that extends the <classname>SharedData</classname>
264            class and implements the <interfacename>NameableData</interfacename> and
265            <interfacename>RemoveableData</interfacename> interfaces. This is one of
266            the most common situations.
267            </para>
268          </listitem>
269        </varlistentry>
270
271        <varlistentry>
272          <term><classname>AnnotatedData</classname></term>
273          <listitem>
274            <para>
275            This is a convenience class for items that can be annotated.
276            Annotations are held in <classname>AnnotationSetData</classname> objects.
277            The annotation set only exists if annotations has been created for the item.
278            </para>
279          </listitem>
280        </varlistentry>
281        </variablelist>
282       
283      </sect3>
284     
285      <sect3 id="data_api.basic.interfaces">
286        <title>Interfaces</title>
287       
288        <variablelist>
289        <varlistentry>
290          <term><classname>IdentifiableData</classname></term>
291          <listitem>
292            <para>
293            All items are identifiable, which means that they have a unique <varname>id</varname>.
294            The id is unique for all items of a specific type (ie. class). The id is number
295            that is automatically generated by the database and has no other meaning
296            outside of the application. The <varname>version</varname> property is used for
297            detecting and preventing concurrent modifications to an item.
298            </para>
299          </listitem>
300        </varlistentry>
301       
302        <varlistentry>
303          <term><classname>OwnableData</classname></term>
304          <listitem>
305            <para>
306            An ownable item is an item which has an owner. The owner is represented as a
307            required link to a <classname>UserData</classname> object.
308            </para>
309          </listitem>
310        </varlistentry>       
311
312        <varlistentry>
313          <term><classname>ShareableData</classname></term>
314          <listitem>
315            <para>
316            A shareable item is an item which can be shared to other users, groups or projects.
317            Access permissions are held in <classname>ItemKeyData</classname> and/or
318            <classname>ProjectKeyData</classname> objects.
319            </para>
320          </listitem>
321        </varlistentry>
322             
323        <varlistentry>
324          <term><classname>NameableData</classname></term>
325          <listitem>
326            <para>
327            A nameable item is an item that has a name (required) and a description
328            (optional). The name doesn't have to be unique, except in a few special
329            cases (for example, the name of a file).
330            </para>
331          </listitem>
332        </varlistentry>
333       
334        <varlistentry>
335          <term><classname>RemovableData</classname></term>
336          <listitem>
337            <para>
338            A removable item is an item that can be flagged as removed. This doesn't
339            remove the information about the item from the database, but can be used by
340            client applications to hide items that the user is not interested in.
341            A trashcan function can be used to either restore or permanently
342            remove items that has the flag set.
343            </para>
344          </listitem>
345        </varlistentry>
346               
347        <varlistentry>
348          <term><classname>SystemData</classname></term>
349          <listitem>
350            <para>
351            A system item is an item which has an additional id in the form of string. A system id
352            is required when we need to make sure that we can get a specific item without
353            knowing the numeric id. Example of such items are the root user and the everyone group.
354            A system id is generally constructed like:
355            <constant>net.sf.basedb.core.User.ROOT</constant>. The system id:s are defined in the
356            core layer by each item class.
357            </para>
358          </listitem>
359        </varlistentry>
360
361        <varlistentry>
362          <term><classname>DiskConsumableData</classname></term>
363          <listitem>
364            <para>
365            This interface is used by items which occupies a lot of disk space and
366            should be part of the quota system, for example files. The required
367            <classname>DiskUsageData</classname> contains information about the size,
368            location, owner etc. of the item.
369            </para>
370          </listitem>
371        </varlistentry>
372       
373        <varlistentry>
374          <term><classname>AnnotatableData</classname></term>
375          <listitem>
376            <para>
377            This interface is used by items which can be annotated. Annotations are name/value
378            pairs that are attached as extra information to an item. All annotations are
379            contained in an <classname>AnnotationSetData</classname> object.
380            </para>
381          </listitem>
382        </varlistentry>
383       
384        <varlistentry>
385          <term><classname>ExtendableData</classname></term>
386          <listitem>
387            <para>
388            This interface is used by items which can have extra administrator-defined
389            columns. The functionality is similar to annotations. It is not as flexible,
390            since it is a global configuration, but has better performance. BASE will
391            generate extra database columns to store the data in the tables for items that
392            can be extended.
393            </para>
394          </listitem>
395        </varlistentry>
396       
397        <varlistentry>
398          <term><classname>BatchableData</classname></term>
399          <listitem>
400            <para>
401            This interface is a tagging interface which is used by items that needs batch
402            functionality in the core.
403            </para>
404          </listitem>
405        </varlistentry>
406        </variablelist>
407
408      </sect3>
409    </sect2>
410   
411    <sect2 id="data_api.authentication">
412      <title>User authentication and access control</title>
413     
414      <para>
415         This section gives an overview of user authentication and
416         how groups, roles and projects are used for access control
417         to items.
418      </para>
419     
420      <sect3 id="data_api.authentication.uml">
421        <title>UML diagram</title>
422       
423        <figure id="data_api.figures.authentication">
424          <title>User authentication and access control</title>
425          <screenshot>
426            <mediaobject>
427              <imageobject>
428                <imagedata 
429                  fileref="figures/uml/datalayer.authentication.png" format="PNG" />
430              </imageobject>
431            </mediaobject>
432          </screenshot>
433        </figure>
434      </sect3>
435     
436      <sect3 id="data_api.authentication.users">
437        <title>Users and passwords</title>     
438     
439        <para>
440          The <classname>UserData</classname> class holds information about users.
441          We keep the passwords in a separate table and use proxies to avoid loading
442          password data each time a user is loaded to minimize security risks. It is
443          only if the password needs to be changed that the <classname>PasswordData</classname>
444          object is loaded. The one-to-one mapping between user and password is controlled
445          by the password class, but a cascade attribute on the user class makes sure
446          that the password is deleted when a user is deleted.
447        </para>
448      </sect3>
449
450      <sect3 id="data_api.authentication.groups">
451        <title>Groups, roles and projects</title>     
452     
453        <para>
454          The <classname>GroupData</classname>, <classname>RoleData</classname> and
455          <classname>ProjectData</classname> classes holds information about groups, roles
456          and projects respectively. A user may be a member of any number of groups,
457          roles and/or projects. The membership in a project comes with an attached
458          permission values. This is the highest permission the user has in the
459          project. No matter what permission an item has been shared with the
460          user will not get higher permission. Groups may be members of other groups and
461          also in projects.
462        </para>
463       
464      </sect3>
465     
466      <sect3 id="data_api.authentication.keys">
467        <title>Keys</title>     
468     
469        <para>
470          The <classname>KeyData</classname> class and it's subclasses
471          <classname>ItemKeyData</classname>, <classname>ProjectKeyData</classname> and
472          <classname>RoleKeyData</classname>, are used to store information about access
473          permissions to items. To get permission to manipulate an item a user must have
474          access to a key giving that permission. There are three types of keys:
475        </para>
476       
477        <variablelist>
478        <varlistentry>
479          <term><classname>ItemKey</classname></term>
480          <listitem>
481            <para>
482            Is used to give a user or group access to a specific item. The item
483            must be a <interfacename>ShareableData</interfacename> item.
484            The permissions are usually set be the owner of the item. Once created an
485            item key cannot be changed. This allows the core to reuse a key if the
486            permissions match exactly, ie. for a given set of users/groups/permissions
487            there can be only one item key object.
488            </para>
489          </listitem>
490        </varlistentry>
491
492        <varlistentry>
493          <term><classname>ProjectKey</classname></term>
494          <listitem>
495            <para>
496            Is used to give members of a project access to a specific item. The item
497            must be a <interfacename>ShareableData</interfacename> item. Once created a
498            project key cannot be changed. This allows the core to reuse a key if the
499            permissions match exactly, ie. for a given set of projects/permissions
500            there can be only one project key object.
501            </para>
502          </listitem>
503        </varlistentry>
504
505        <varlistentry>
506          <term><classname>RoleKey</classname></term>
507          <listitem>
508            <para>
509            Is used to give a user access to all items of a specific type, ie.
510            <constant>READ</constant> all <constant>SAMPLES</constant>. The installation
511            will make sure that there already exists a role key for each type of item, and
512            it is not possible to add new or delete existing keys. Unlike the other two types
513            this key can be modified.
514            </para>
515           
516            <para>
517            A role key is also used to assign permissions to plug-ins. If a plug-in has
518            been specified to use permissions the default is to deny everything.
519            The mapping to the role key is used to grant permissions to the plugin.
520            The <varname>granted</varname> value gives the plugin access to all items
521            of the related item type regardless of if the user that is running the plug-in has the
522            permission or not. The <varname>denied</varname> values denies access to all
523            items of the related item type even if the logged in user has the permission.
524            Permissions that are not granted nor denied are checked against the
525            logged in users regular permissions. Permissions to items that are
526            not linked are always denied.
527            </para>
528          </listitem>
529        </varlistentry>
530        </variablelist>
531       
532      </sect3>
533
534      <sect3 id="data_api.authentication.permissions">
535        <title>Permissions</title>
536       
537        <para>
538          The <varname>permission</varname> property appearing in many classes is an
539          integer values describing the permission:
540        </para>
541       
542        <informaltable>
543        <tgroup cols="2">
544          <colspec colname="value" />
545          <colspec colname="permission" />
546          <thead>
547            <row>
548              <entry>Value</entry>
549              <entry>Permission</entry>
550            </row>
551          </thead>
552          <tbody>
553            <row>
554              <entry>1</entry>
555              <entry>Read</entry>
556            </row>
557            <row>
558              <entry>3</entry>
559              <entry>Use</entry>
560            </row>
561            <row>
562              <entry>7</entry>
563              <entry>Restricted write</entry>
564            </row>
565            <row>
566              <entry>15</entry>
567              <entry>Write</entry>
568            </row>
569            <row>
570              <entry>31</entry>
571              <entry>Delete</entry>
572            </row>
573            <row>
574              <entry>47 (=32+15)</entry>
575              <entry>Set owner</entry>
576            </row>
577            <row>
578              <entry>79 (=64+15)</entry>
579              <entry>Set permissions</entry>
580            </row>
581            <row>
582              <entry>128</entry>
583              <entry>Create</entry>
584            </row>
585            <row>
586              <entry>256</entry>
587              <entry>Denied</entry>
588            </row>
589          </tbody>
590        </tgroup>
591        </informaltable>
592       
593        <para>
594          The values are constructed so that
595          <constant>READ</constant> -&gt;
596          <constant>USE</constant> -&gt;
597          <constant>RESTRICTED_WRITE</constant> -&gt;
598          <constant>WRITE</constant> -&gt;
599          <constant>DELETE</constant>
600          are chained in the sense that a higher permission always implies the lower permissions
601          also. The <constant>SET_OWNER</constant> and <constant>SET_PERMISSION</constant>
602          both implies <constant>WRITE</constant> permission. The <constant>DENIED</constant>
603          permission is only valid for role keys, and if specified it overrides all
604          other permissions.               
605        </para>
606       
607        <para>
608          When combining permission for a single item the permission codes for the different
609          paths are OR-ed together. For example a user has a role key with <constant>READ</constant>
610          permission for <constant>SAMPLES</constant>, but also an item key with <constant>USE</constant>
611          permission for a specific sample. Of course, the resulting permission for that
612          sample is <constant>USE</constant>. For other samples the resulting permission is
613          <constant>READ</constant>.
614        </para>
615       
616        <para>
617          If the user is also a member of a project which has <constant>WRITE</constant>
618          permission for the same sample, the user will have <constant>WRITE</constant>
619          permission when working with that project.
620        </para>
621       
622        <para>
623          The <constant>RESTRICTED_WRITE</constant> permission is in most cases the same
624          as the <constant>WRITE</constant> permission. So far the <constant>RESTRICTED_WRITE</constant>
625          permission is only given to users to their own <classname>UserData</classname>
626          object so they can change their address and other contact information,
627          but not quota, expiration date and other administrative information.
628        </para>
629
630      </sect3>
631    </sect2>
632
633    <sect2 id="data_api.wares">
634      <title>Hardware and software</title>
635    </sect2>
636   
637    <sect2 id="data_api.reporters">
638      <title>Reporters</title>
639    </sect2>
640
641    <sect2 id="data_api.quota">
642      <title>Quota and disk usage</title>
643    </sect2>
644
645    <sect2 id="data_api.sessions">
646      <title>Client, session and settings</title>
647    </sect2>
648
649    <sect2 id="data_api.files">
650      <title>Files and directories</title>
651
652      <para>
653        This section covers the details of the BASE file
654        system.
655      </para>
656
657      <sect3 id="data_api.files.uml">
658      <title>UML diagram</title>
659     
660        <figure id="data_api.figures.files">
661          <title>Files and directories</title>
662          <screenshot>
663            <mediaobject>
664              <imageobject>
665                <imagedata 
666                  fileref="figures/uml/datalayer.files.png" format="PNG" />
667              </imageobject>
668            </mediaobject>
669          </screenshot>
670        </figure>
671      </sect3>
672     
673      <sect3 id="data_api.files.description">
674        <title>Description</title>
675       
676        <para>
677          The <classname>DirectoryData</classname> class holds
678          information about directories. Directories are organised in the
679          ususal way as as tree structure. All directories must have
680          a parent directory, except the system-defined root directory.
681        </para>
682       
683        <para>
684          The <classname>FileData</classname> class holds information about
685          a file. The actual file contents is stored on disk in the directory
686          specified by the <varname>userfiles</varname> setting in
687          <filename>base.config</filename>. The <varname>internalName</varname>
688          property is the name of the file on disk, but this is never exposed to
689          client applications. The filenames and directories
690          on the disk doesn't correspond to the the filenames and directories in
691          BASE.
692        </para>
693       
694        <para>
695          The <varname>location</varname> property can take three values:
696        </para>
697       
698        <itemizedlist>
699        <listitem>
700          <para>
701          0 = The file is offline, ie. there is no file on the disk
702          </para>
703        </listitem>
704        <listitem>
705          <para>
706          1 = The file is in primary storage, ie. it is located on the disk
707          and can be used by BASE
708          </para>
709        </listitem>
710        <listitem>
711          <para>
712          2 = The file is in secondary storage, ie. it has been moved to some
713          other place and can't be used by BASE immediately.
714          </para>
715        </listitem>
716        </itemizedlist>
717       
718        <para>
719          The <varname>action</varname> property controls how a file is
720          moved between primary and seconday storage. It can have the following
721          values:
722        </para>
723       
724        <itemizedlist>
725        <listitem>
726          <para>
727          0 = Do nothing
728          </para>
729        </listitem>
730        <listitem>
731          <para>
732          1 = If the file is in secondary storage, move it back to the primary storage
733          </para>
734        </listitem>
735        <listitem>
736          <para>
737          2 = If the file is in primary storage, move it to the secondary storage
738          </para>
739        </listitem>
740        </itemizedlist>
741       
742        <para>
743          The actual moving between primary and secondary storage is done by an
744          external program. See
745          <xref linkend="appendix.base.config.secondary" /> and
746          <xref linkend="plugin_developer.other.secondary" /> for more information.
747        </para>
748     
749        <para>
750          The <varname>md5</varname> property can be used to check for file
751          corruption when it is moved between primary and secondary storage or
752          when a user re-uploads a file that has been offline.
753        </para>
754       
755        <para>
756          BASE can store files in a compressed format. This is handled internally
757          and is not visible to client applications. The <varname>compressed</varname>
758          and <varname>diskSize</varname> properties are used to store information
          about this. A file can always be compressed if the user says so, but
          BASE can also do this automatically if the file is uploaded
          to a directory with the <varname>autoCompress</varname> flag set
          or if the file has a MIME type with the <varname>autoCompress</varname>
          flag set.
764        </para>
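        <para>
          The decision of when to compress, and why <varname>diskSize</varname>
          can differ from the size of the original content, can be illustrated
          with a hypothetical Java sketch. The <classname>CompressionSketch</classname>
          class and its fields are assumptions, not BASE code, and GZIP is only an
          example; this section does not say which compression format BASE uses.
        </para>
        <programlisting><![CDATA[
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not part of BASE. GZIP is used only as an example format.
class CompressionSketch {

    // What is recorded about a stored file (names are illustrative)
    static final class StoredFile {
        final long size;          // size of the uncompressed content
        final boolean compressed; // corresponds to the 'compressed' property
        final long diskSize;      // corresponds to the 'diskSize' property

        StoredFile(long size, boolean compressed, long diskSize) {
            this.size = size;
            this.compressed = compressed;
            this.diskSize = diskSize;
        }
    }

    // Compress if the user asks for it, or if the directory or the
    // MIME type has the autoCompress flag set
    static boolean shouldCompress(boolean userRequested,
                                  boolean directoryAutoCompress,
                                  boolean mimeTypeAutoCompress) {
        return userRequested || directoryAutoCompress || mimeTypeAutoCompress;
    }

    // Writes the content to an in-memory "disk", compressed or not
    static StoredFile store(byte[] content, boolean compress) {
        if (!compress) {
            return new StoredFile(content.length, false, content.length);
        }
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(disk)) {
            gzip.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StoredFile(content.length, true, disk.size());
    }

    public static void main(String[] args) {
        byte[] content = new byte[10000]; // all zeros, so highly compressible
        StoredFile file = store(content, shouldCompress(false, true, false));
        System.out.println(file.compressed + " " + file.size + " " + file.diskSize);
    }
}
]]></programlisting>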
765       
766        <para>
767          The <classname>FileTypeData</classname> class holds information about
768          file types. It is used only to make it easier for users to organise
769          their files.
770        </para>
771       
772        <para>
          The <classname>MimeTypeData</classname> class is used to register MIME types and
          map them to file extensions. The information is only used to look up values
          when needed. Given the filename, we can set the <varname>File.mimeType</varname>
          and <varname>File.fileType</varname> properties. The MIME type is also
          used to decide if a file should be stored in a compressed format or not.
          The extension of a MIME type must be unique. Extensions should be registered
          without a dot, i.e. <emphasis>html</emphasis>, not <emphasis>.html</emphasis>.
780        </para>
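        <para>
          The extension rules can be illustrated with a small in-memory registry.
          This is a hypothetical Java sketch: <classname>MimeTypeRegistrySketch</classname>
          is not a BASE class. It matches extensions case-sensitively; whether BASE
          ignores case is not stated here.
        </para>
        <programlisting><![CDATA[
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not part of BASE: maps file extensions
// (registered without a dot) to MIME types.
class MimeTypeRegistrySketch {

    private final Map<String, String> mimeTypeByExtension = new HashMap<>();

    void register(String extension, String mimeType) {
        if (extension.startsWith(".")) {
            throw new IllegalArgumentException("Register extensions without a dot: " + extension);
        }
        if (mimeTypeByExtension.containsKey(extension)) {
            // The extension of a MIME type must be unique
            throw new IllegalArgumentException("Extension already registered: " + extension);
        }
        mimeTypeByExtension.put(extension, mimeType);
    }

    // Looks up the MIME type from the extension of a file name
    Optional<String> lookup(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return Optional.empty(); // the file name has no extension
        }
        return Optional.ofNullable(mimeTypeByExtension.get(filename.substring(dot + 1)));
    }

    public static void main(String[] args) {
        MimeTypeRegistrySketch registry = new MimeTypeRegistrySketch();
        registry.register("html", "text/html");
        System.out.println(registry.lookup("index.html").orElse("unknown")); // prints text/html
        System.out.println(registry.lookup("notes").orElse("unknown"));      // prints unknown
    }
}
]]></programlisting>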
781       
782      </sect3>
783     
784     
785    </sect2>
786   
787    <sect2 id="data_api.platforms">
788      <title>Experimental platforms</title>
789
790      <para>
791         This section gives an overview of experimental platforms
792         and how they are used to enable data storage in files
793         instead of in the database.
794      </para>
795     
796      <note>
797        <title>THIS IS A DRAFT!</title>
798        <para>
          This document is a draft currently being worked on!
800          Changes are expected before the design is finalized.
801        </para>
802      </note>
803     
804      <sect3 id="data_api.platforms.uml">
805        <title>UML diagram</title>
806       
807        <figure id="data_api.figures.platforms">
808          <title>Experimental platforms</title>
809          <screenshot>
810            <mediaobject>
811              <imageobject>
812                <imagedata 
813                  fileref="figures/uml/datalayer.platforms.png" format="PNG" />
814              </imageobject>
815            </mediaobject>
816          </screenshot>
817        </figure>
818      </sect3>
819     
820      <sect3 id="data_api.platforms.platforms">
821        <title>Platforms</title>
822       
823        <para>
          The <classname>PlatformData</classname> class holds information about a
          platform. A platform can have one or more <classname>PlatformVariant</classname>
          items. Both the platform and the variant are identified by a system ID that
          is fixed and can't be changed. <emphasis>Affymetrix</emphasis>
          and <emphasis>Illumina</emphasis> are examples of platforms.
          If the <varname>fileOnly</varname> flag is set, data for the platform
          can only be stored in files and can't be imported into the database. If
          the flag is not set, data can also be imported into the database.
          The <varname>rawDataType</varname> property can be used to lock the platform
          to a specific raw data type. If the value is <constant>null</constant>,
          the platform can use any raw data type.
835        </para>
836       
837        <para>
          Each platform and its variants can be connected to one or more
          <classname>FileSetMemberTypeData</classname> items. Each such item
          describes the kind of files that are used to hold data for
          the platform and/or variant. The file types are re-usable between
          different platforms and variants. Note that a file type may be attached
          either to a platform only or to a platform with a variant. File
          types attached to a platform are inherited by its variants. A variant
          can only define additional file types; it can't remove or redefine file types
          that have been attached to the platform.
847        </para>
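        <para>
          The inheritance rule can be expressed as a set operation. In this
          hypothetical Java sketch (not BASE code), file types are plain string
          IDs, and a variant that tries to redefine a file type already attached
          to its platform is rejected.
        </para>
        <programlisting><![CDATA[
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not part of BASE: file types are plain string IDs.
class FileTypeInheritanceSketch {

    // The file types for a variant: all types attached to its platform
    // (inherited) plus the variant's own additional types
    static Set<String> fileTypesForVariant(Set<String> platformTypes, Set<String> variantTypes) {
        Set<String> result = new LinkedHashSet<>(platformTypes);
        for (String type : variantTypes) {
            if (!result.add(type)) {
                // A variant can't redefine a type attached to the platform
                throw new IllegalArgumentException("Already attached to the platform: " + type);
            }
        }
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        Set<String> platform = new LinkedHashSet<>(Arrays.asList("typeA", "typeB"));
        Set<String> variant = new LinkedHashSet<>(Arrays.asList("typeC"));
        System.out.println(fileTypesForVariant(platform, variant)); // prints [typeA, typeB, typeC]
    }
}
]]></programlisting>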
848        <para>
          The file type is also identified
          by a fixed, non-changeable system ID. The <varname>itemType</varname>
          property tells us what type of item the file holds data for (for example,
          array design or raw bioassay). It also links to a <classname>FileType</classname>,
          which is the generic type of data in the file. This makes it possible to query
          the database for, as an example, files with the generic type
          <constant>FileType.RAW_DATA</constant>. In an Affymetrix
          experiment we will get the CEL file; for another platform we will
          get another file.
858        </para>
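        <para>
          The following hypothetical Java sketch shows the idea of looking up
          files by generic type. <classname>GenericFileTypeSketch</classname> and
          its <classname>MemberType</classname> class are stand-ins, not BASE
          classes, and generic types are plain strings such as
          <constant>RAW_DATA</constant>.
        </para>
        <programlisting><![CDATA[
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of BASE: asking for a generic file type
// returns whatever platform-specific file is linked to it.
class GenericFileTypeSketch {

    // A platform-specific file type that links to a generic type
    static final class MemberType {
        final String systemId;
        final String genericType;

        MemberType(String systemId, String genericType) {
            this.systemId = systemId;
            this.genericType = genericType;
        }
    }

    // Returns the files whose member type links to the given generic type
    static List<String> filesOfGenericType(Map<MemberType, String> fileSet, String genericType) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<MemberType, String> entry : fileSet.entrySet()) {
            if (entry.getKey().genericType.equals(genericType)) {
                result.add(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<MemberType, String> affymetrixFiles = new LinkedHashMap<>();
        affymetrixFiles.put(new MemberType("CEL", "RAW_DATA"), "sample1.cel");
        System.out.println(filesOfGenericType(affymetrixFiles, "RAW_DATA")); // prints [sample1.cel]
    }
}
]]></programlisting>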
859
860      </sect3>
861     
862      <sect3 id="data_api.platforms.files">
863        <title>Files</title>
864       
865        <para>
866          An item must implement the <interfacename>FileStoreEnabledData</interfacename>
867          interface to be able to store data in files instead of in the database.
868          The interface creates a link to a <classname>FileSetData</classname> object.
869          In a file set it is only possible to store one file for each
870          <classname>FileSetMemberTypeData</classname> item.
871        </para>
872       
873      </sect3>
874    </sect2>
875
876    <sect2 id="data_api.protocols">
877      <title>Protocols</title>
878    </sect2>
879
880    <sect2 id="data_api.parameters">
881      <title>Parameters</title>
882    </sect2>
883
884    <sect2 id="data_api.annotations">
885      <title>Annotations</title>
886    </sect2>
887
888    <sect2 id="data_api.plugins">
889      <title>Plug-ins, jobs and job agents</title>
890    </sect2>
891   
892    <sect2 id="data_api.biomaterials">
893      <title>Biomaterials</title>
894    </sect2>
895
896    <sect2 id="data_api.plates">
897      <title>Array LIMS - plates</title>
898    </sect2>
899
900    <sect2 id="data_api.arrays">
901      <title>Array LIMS - arrays</title>
902    </sect2>
903
904    <sect2 id="data_api.rawdata">
905      <title>Hybridizations and raw data</title>
906    </sect2>
907
908    <sect2 id="data_api.experiments">
909      <title>Experiments and analysis</title>
910    </sect2>
911   
912    <sect2 id="data_api.misc">
913      <title>Other classes</title>
914    </sect2>
915
916  </sect1>
917 
918  <sect1 id="api_overview.core_api" chunked="1">
919    <title>The Core API</title>
920   
921    <para>
922      This section gives an overview of various parts of the core API.
923    </para>
924   
925    <sect2 id="core_api.data_in_files">
926      <title>Using files to store data</title>
927      <note>
928        <title>THIS IS A DRAFT!</title>
929        <para>
          This document is a draft currently being worked on!
931          Changes are expected before the design is finalized.
932        </para>
933      </note>
934     
935      <para>
936        This section is about how BASE can use files to store data instead
937        of importing it into the database. See <xref linkend="data_api.platforms" />
938        for an overview of the database schema for this feature. Files can be attached
939        to any item that implements the <interfacename>FileStoreEnabled</interfacename>
        interface. Currently, these are <classname>RawBioAssay</classname>, <classname>ArrayDesign</classname>,
        <classname>BioAssaySet</classname> and <classname>BioAssay</classname>. The
        ability to store data in files is not a replacement for storing data in the
        database. It is possible (for some platforms/raw data types) to have data in
        files and in the database at the same time. We would have liked to enforce
        that (raw) data is always present in files, but this would not be backwards compatible
        with older installations, so there are three cases:
947      </para>
948     
949      <itemizedlist>
950      <listitem>
951        <para>
952        Data in files only
953        </para>
954      </listitem>
955      <listitem>
956        <para>
957        Data in the database only
958        </para>
959      </listitem>
960      <listitem>
961        <para>
962        Data in both files and in the database
963        </para>
964      </listitem>
965      </itemizedlist>
966     
967      <para>
        Not all three cases are supported for all types of data. This is controlled
        by the <classname>Platform</classname> class, which may prevent
        data from being stored in the database. To check this, call
        <methodname>Platform.getRawDataType()</methodname>, which may return:
972      </para>
973     
974      <itemizedlist>
975      <listitem>
976        <para>
977          <constant>null</constant>: The platform can store data with any
978          raw data type in the database.
979        </para>
980      </listitem>
981      <listitem>
982        <para>
983        A <classname>RawDataType</classname> that has <code>isStoredInDb() == true</code>:
984        The platform can store data in the database but only data with the specified raw
985        data type.
986        </para>
987      </listitem>
988      <listitem>
989        <para>
990        A <classname>RawDataType</classname> that has <code>isStoredInDb() == false</code>:
991        The platform can't store data in the database.
992        </para>
993      </listitem>
994      </itemizedlist>
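      <para>
        The three cases can be combined into a single check. This is a hypothetical
        Java sketch: <classname>RawDataTypeStub</classname> is a stand-in that only
        models an ID and the <methodname>isStoredInDb()</methodname> flag, and
        <methodname>canStoreInDb()</methodname> is not a BASE method.
      </para>
      <programlisting><![CDATA[
// Hypothetical sketch, not part of BASE: models only the information
// needed for the three cases above.
class RawDataTypeCheckSketch {

    // Stand-in for a raw data type: an ID and the isStoredInDb() flag
    static final class RawDataTypeStub {
        final String id;
        final boolean storedInDb;

        RawDataTypeStub(String id, boolean storedInDb) {
            this.id = id;
            this.storedInDb = storedInDb;
        }

        boolean isStoredInDb() {
            return storedInDb;
        }
    }

    // Can data with the 'wanted' raw data type be imported into the database
    // for a platform whose getRawDataType() returned 'platformType'?
    static boolean canStoreInDb(RawDataTypeStub platformType, RawDataTypeStub wanted) {
        if (platformType == null) {
            // Any raw data type that is stored in the database
            return wanted.isStoredInDb();
        }
        if (platformType.isStoredInDb()) {
            // Only the raw data type the platform is locked to
            return platformType.id.equals(wanted.id);
        }
        // The platform can't store data in the database at all
        return false;
    }

    public static void main(String[] args) {
        RawDataTypeStub twoChannel = new RawDataTypeStub("twoChannel", true);
        RawDataTypeStub virtual = new RawDataTypeStub("fileOnly", false);
        System.out.println(canStoreInDb(null, twoChannel));       // prints true
        System.out.println(canStoreInDb(twoChannel, twoChannel)); // prints true
        System.out.println(canStoreInDb(virtual, virtual));       // prints false
    }
}
]]></programlisting>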
995
996      <para>
997        One major modification is that the registration of raw data types
998        has changed. The <filename>raw-data-types.xml</filename> file should
999        only be used for raw data types that are stored in the database. The
1000        <sgmltag>storage</sgmltag> tag has been deprecated and BASE will ignore
1001        any raw data type definitions with <code>storage="file"</code>.
1002        To replace this, each <classname>Platform</classname> that
1003        can only store data in files also defines a "virtual" raw data type.
1004      </para>
1005     
1006      <sect3 id="core_api.data_in_files.diagram">
1007        <title>Diagram of classes and methods</title>
1008        <figure id="core_api.figures.data_in_files">
1009          <title>Store data in files</title>
1010          <screenshot>
1011            <mediaobject>
1012              <imageobject>
1013                <imagedata 
1014                  fileref="figures/uml/corelayer.datainfiles.png" format="PNG" />
1015              </imageobject>
1016            </mediaobject>
1017          </screenshot>
1018        </figure>
1019      </sect3>
1020     
1021      <sect3 id="core_api.data_in_files.ask">
1022        <title>Asking the user for files</title>
1023
1024        <para>
1025          A client application must know what types of files it makes sense
1026          to ask the user for. In some cases, data may be split into more than
1027          one file so we need a generic way to select files.
1028        </para>
1029       
1030        <para>
          Given that we have a <interfacename>FileStoreEnabled</interfacename>
          item, we use the <methodname>FileSetMemberType.getQuery()</methodname>
          method to find which file types can be used for that
          item. Internally, the <methodname>getQuery()</methodname>
          method uses the <methodname>FileStoreEnabled.getPlatform()</methodname>
          and <methodname>FileStoreEnabled.getVariant()</methodname>
          methods to restrict the query to only return file types for
          a given platform and/or variant. If the item doesn't have
          a platform or variant, the query will only return file types
          that are associated with the given item type, but not with any specific
          platform. In any case, we get a list of <classname>FileSetMemberType</classname>
          items, each one representing a specific file type that
          we should ask the user about. Examples:
1044        </para>
1045
1046        <orderedlist>
1047        <listitem>
1048          <para>
          The <constant>Affymetrix</constant> platform defines <constant>CEL</constant>
          for <constant>FileType.RAW_DATA</constant>
          and <constant>CDF</constant> for <constant>FileType.REPORTER_MAP</constant>.
          If we have a
          <classname>RawBioAssay</classname>, the query will only return
          the CEL file type, since the CDF file holds data for array designs, and
          the client can ask the user for a CEL file.
1055          </para>
1056        </listitem>
1057        <listitem>
1058          <para>
1059          More examples.... ???
1060          </para>
1061        </listitem>
1062        </orderedlist>
1063     
1064        <para>
1065          Here is a simple code template that might be useful.
1066        </para>
1067       
        <programlisting>
DbControl dc = ...
FileStoreEnabled item = ...
// The query is restricted to the platform/variant of the item
ItemQuery&lt;FileSetMemberType&gt; query =
   FileSetMemberType.getQuery(item);
List&lt;FileSetMemberType&gt; types = query.list(dc);
// We now have a list of file types...
// ... ask the user to select a file for each one of them
</programlisting>
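        <para>
          To make the filtering rules concrete, here is a hypothetical in-memory
          model in Java; the real <methodname>getQuery()</methodname> method
          queries the database, and <classname>FileTypeQuerySketch</classname> is
          not part of BASE. For items with a platform, the sketch leaves out file
          types that aren't attached to any platform, since the text above does
          not say whether the real query includes them.
        </para>
        <programlisting><![CDATA[
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of the filtering done by getQuery().
class FileTypeQuerySketch {

    // A file type, the item type it holds data for, and the (optional)
    // platform and variant it is attached to
    static final class Registration {
        final String fileType;
        final String itemType;
        final String platform; // null = not attached to any platform
        final String variant;  // null = attached to the platform itself

        Registration(String fileType, String itemType, String platform, String variant) {
            this.fileType = fileType;
            this.itemType = itemType;
            this.platform = platform;
            this.variant = variant;
        }
    }

    static List<String> fileTypesFor(List<Registration> all, String itemType,
                                     String platform, String variant) {
        List<String> result = new ArrayList<>();
        for (Registration r : all) {
            if (!r.itemType.equals(itemType)) {
                continue; // wrong item type
            }
            boolean matches;
            if (platform == null) {
                // No platform: only types not attached to any platform
                matches = r.platform == null;
            } else {
                // Platform types are inherited by all variants;
                // variant types only apply to that variant
                matches = platform.equals(r.platform)
                    && (r.variant == null || r.variant.equals(variant));
            }
            if (matches) {
                result.add(r.fileType);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Registration> all = Arrays.asList(
            new Registration("CEL", "RawBioAssay", "Affymetrix", null),
            new Registration("CDF", "ArrayDesign", "Affymetrix", null));
        System.out.println(fileTypesFor(all, "RawBioAssay", "Affymetrix", null)); // prints [CEL]
    }
}
]]></programlisting>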
1077     
1078      </sect3>
1079     
1080      <sect3 id="core_api.data_in_files.link">
1081        <title>Link to the selected files</title>
1082        <para>
          When the user has selected the file(s), we must store the links
          to them in the database. This is done with a <classname>FileSet</classname>
          object. A file set can contain any number of files. The only limitation
          is that it can only contain one file for each file type.
1087          Call <methodname>FileSet.setMember()</methodname> to store
1088          a file in the set. If a file already exists for the given file type
1089          it is replaced, otherwise a new entry is created.
1090        </para>
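        <para>
          The one-file-per-type rule maps naturally onto a map keyed by file type.
          In this hypothetical Java sketch, <classname>FileSetSketch</classname> is
          not the BASE <classname>FileSet</classname> class and its
          <methodname>setMember()</methodname> method takes plain strings, but the
          replace-or-create behaviour is the one described above.
        </para>
        <programlisting><![CDATA[
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the BASE FileSet class: file types and files
// are plain strings here.
class FileSetSketch {

    // Keyed by file type, so there can be at most one file per type
    private final Map<String, String> members = new LinkedHashMap<>();

    // Stores the file for the given type. Returns the file that was
    // replaced, or null if a new entry was created.
    String setMember(String fileType, String file) {
        return members.put(fileType, file);
    }

    String getMember(String fileType) {
        return members.get(fileType);
    }

    int size() {
        return members.size();
    }

    public static void main(String[] args) {
        FileSetSketch fileSet = new FileSetSketch();
        fileSet.setMember("CEL", "first.cel");
        String replaced = fileSet.setMember("CEL", "second.cel");
        System.out.println(replaced + " -> " + fileSet.getMember("CEL")); // prints first.cel -> second.cel
        System.out.println(fileSet.size());                                 // prints 1
    }
}
]]></programlisting>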
1091      </sect3>
1092     
1093      <sect3 id="core_api.data_in_files.validate">
1094        <title>Validate the file and extract metadata</title>
1095       
1096        <para>
          Validation and extraction of metadata are important, since we want
          data in files to be equivalent to data in the database. The validation
          and metadata extraction are done automatically by the core when
          files are added to a file set. The process is partly pluggable,
          since each <classname>FileSetMemberType</classname> can name a class
          that should do the validation and/or metadata extraction.
          Here is the general outline:
1104        </para>
1105       
        <programlisting>
FileStoreEnabled item = ...
FileSetMemberType type = ...
File file = ...
FileSetMember member = new FileSetMember(file, type);

FileValidator validator = type.getValidator();
MetadataReader metadata = type.getMetadataReader();
validator.setFile(member);
validator.setItem(item);
// If 'metadata' is not the same object as 'validator',
// call setFile() and setItem() on it as well

validator.validate();
metadata.extractMetadata();
</programlisting>
1121       
1122        <note>
1123          <title>Only one instance of each validator class is created</title>
1124          <para>
          The validation/metadata extraction is not done until all files have been
          added to the file set. If the same validator/metadata extractor class is
          used for more than one file, the same instance is reused. That is,
          the <methodname>setFile()</methodname> method is called once
          for each file/file type pair, but the <methodname>validate()</methodname>
          and <methodname>extractMetadata()</methodname> methods are only
          called once.
1132          </para>
1133        </note>
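        <para>
          The reuse rule in the note can be sketched as follows. This is a
          hypothetical Java model, not the BASE implementation: the
          <classname>Handler</classname> interface only has
          <methodname>setFile()</methodname> and <methodname>validate()</methodname>,
          handler classes are looked up per file type and created with reflection,
          and metadata extraction would follow the same pattern.
        </para>
        <programlisting><![CDATA[
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the BASE implementation: one shared instance
// per handler class, setFile() per file, validate() once per instance.
class ValidatorReuseSketch {

    interface Handler {
        void setFile(String fileType, String file);
        void validate();
    }

    // Example handler that records how it was called
    static class RecordingHandler implements Handler {
        final List<String> files = new ArrayList<>();
        int validateCalls;

        public void setFile(String fileType, String file) {
            files.add(fileType + "=" + file);
        }

        public void validate() {
            validateCalls++;
        }
    }

    // Runs after all files have been added to the file set
    static List<Handler> validateAll(Map<String, String> filesByType,
                                     Map<String, Class<? extends Handler>> handlerByType) {
        Map<Class<? extends Handler>, Handler> instances = new LinkedHashMap<>();
        for (Map.Entry<String, String> member : filesByType.entrySet()) {
            Class<? extends Handler> handlerClass = handlerByType.get(member.getKey());
            if (handlerClass == null) {
                continue; // no validator for this file type
            }
            Handler handler = instances.computeIfAbsent(handlerClass, ValidatorReuseSketch::newInstance);
            handler.setFile(member.getKey(), member.getValue());
        }
        for (Handler handler : instances.values()) {
            handler.validate();
        }
        return new ArrayList<>(instances.values());
    }

    private static Handler newInstance(Class<? extends Handler> handlerClass) {
        try {
            return handlerClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Can't create " + handlerClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("typeA", "a.txt");
        files.put("typeB", "b.txt");
        Map<String, Class<? extends Handler>> handlers = new LinkedHashMap<>();
        handlers.put("typeA", RecordingHandler.class);
        handlers.put("typeB", RecordingHandler.class);
        RecordingHandler h = (RecordingHandler) validateAll(files, handlers).get(0);
        System.out.println(h.files + " " + h.validateCalls); // prints [typeA=a.txt, typeB=b.txt] 1
    }
}
]]></programlisting>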
1134       
1135        <para>
          All validators and metadata extractors should extend
          the <classname>AbstractFileHandler</classname> class. The reason
          is that we may want to add more methods to the <interfacename>FileHandler</interfacename>
          interface in the future. The <classname>AbstractFileHandler</classname> class will
          then provide default implementations of the new methods, so that existing
          handlers remain backwards compatible.
1141        </para>
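        <para>
          The following hypothetical Java sketch shows the pattern. The names are
          illustrative, not the BASE <interfacename>FileHandler</interfacename> and
          <classname>AbstractFileHandler</classname> types: a method added to the
          interface later gets a default implementation in the abstract base class,
          so an older handler that only implements <methodname>validate()</methodname>
          still compiles.
        </para>
        <programlisting><![CDATA[
// Hypothetical sketch, not the BASE classes: shows why extending an abstract
// base class protects implementations when an interface grows.
class FileHandlerEvolutionSketch {

    // Imagine getDescription() was added to the interface in a later release
    interface Handler {
        void validate();
        String getDescription();
    }

    // The abstract base class supplies a default for the newly added method
    abstract static class AbstractHandler implements Handler {
        @Override
        public String getDescription() {
            return getClass().getSimpleName();
        }
    }

    // Written before getDescription() existed; still compiles because it
    // extends the abstract base class instead of implementing Handler directly
    static class OldHandler extends AbstractHandler {
        boolean validated;

        @Override
        public void validate() {
            validated = true;
        }
    }

    public static void main(String[] args) {
        Handler handler = new OldHandler();
        handler.validate();
        System.out.println(handler.getDescription()); // prints OldHandler
    }
}
]]></programlisting>
        <para>
          On Java 8 and later, default methods in the interface itself can serve a
          similar purpose; an abstract base class also works with older Java versions.
        </para>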
1142       
1143      </sect3>
1144     
1145      <sect3 id="core_api.data_in_files.import">
1146        <title>Import data into the database</title>
1147       
1148        <para>
          This should be done by the existing plug-ins, in the same way as before.
          A slight modification is recommended: the importers should be made aware
          of files that have already been selected in the <classname>FileSet</classname>
          so that they can provide good default values. Something like this:
1153        </para>
1154       
        <programlisting>
File defaultFile = null;
RawBioAssay rba = ...;
if (rba.hasFileSet())
{
   // Use the first file with raw data as the default value
   FileSet fileSet = rba.getFileSet();
   List&lt;FileSetMember&gt; members =
      fileSet.getMembers(FileType.RAW_DATA);
   if (members.size() &gt; 0)
   {
      defaultFile = members.get(0).getFile();
   }
}
</programlisting>
1169      </sect3>
1170     
1171      <sect3 id="core_api.data_in_files.experiments">
1172        <title>Using raw data from files in an experiment</title>
1173       
1174        <para>
1175          Just as before, an experiment is still locked to a single
1176          <classname>RawDataType</classname>. This is a design issue that
1177          would break too many things if changed. If data is stored in files
1178          the experiment is also locked to a single <classname>Platform</classname>.
1179          This has been designed to have as little impact on existing
1180          plug-ins as possible. In most cases, the plug-ins will continue
1181          to work as before.
1182        </para>
1183       
1184        <para>
          A plug-in that uses data from the database and needs to check if it can
          be used within an experiment can still do:
1187        </para>
1188       
        <programlisting>
Experiment e = ...
RawDataType rdt = e.getRawDataType();
if (rdt.isStoredInDb())
{
   // Check number of channels, etc...
   // ... run plug-in code ...
}
</programlisting>
1198       
1199        <para>
1200          A newer plug-in which uses data from files should do:
1201        </para>
1202       
        <programlisting>
Experiment e = ...
RawDataType rdt = e.getRawDataType();
if (!rdt.isStoredInDb())
{
   Platform p = rdt.getPlatform();
   PlatformVariant v = rdt.getVariant();
   // Check that platform/variant is supported
   // ... run plug-in code ...
}
</programlisting>
1214       
1215      </sect3>
1216     
1217    </sect2>
1218  </sect1>
1219
1220  <sect1 id="api_overview.query_api">
1221    <title>The Query API</title>
1222    <para>
1223      This documentation is only available in the old format.
1224      See <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/query/index.html"
1225        >http://base.thep.lu.se/chrome/site/doc/development/overview/query/index.html</ulink>
1226    </para>
1227   
1228  </sect1>
1229 
1230  <sect1 id="api_overview.dynamic_and_batch_api">
1231    <title>Analysis and the Dynamic and Batch API:s</title>
1232    <para>
1233      This documentation is only available in the old format.
1234      See <ulink url="http://base.thep.lu.se/chrome/site/doc/development/overview/dynamic/index.html"
1235        >http://base.thep.lu.se/chrome/site/doc/development/overview/dynamic/index.html</ulink>
1236    </para>
1237  </sect1>
1238
1239  <sect1 id="api_overview.other_api">
1240    <title>Other useful classes and methods</title>
1241    <para>
1242      TODO
1243    </para>
1244  </sect1>
1245 
1246</chapter>