source: trunk/doc/src/docbook/developerdoc/base_overview.xml @ 4902

Last change on this file since 4902 was 4902, checked in by Nicklas Nordborg, 14 years ago

Fixes #550: Write "Overview of BASE" section in Developer documentation part

  • Property svn:eol-style set to native
  • Property svn:keywords set to Id
File size: 16.6 KB
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC
    "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
    "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">
<!--
  $Id: base_overview.xml 4902 2009-04-24 10:56:19Z nicklas $

  Copyright (C) 2007 Nicklas Nordborg, Martin Svensson

  This file is part of BASE - BioArray Software Environment.
  Available at

  BASE is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 3
  of the License, or (at your option) any later version.

  BASE is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with BASE. If not, see <>.
-->

<chapter id="base_develop_overview" chunked="0">
  <?dbhtml dir="develop_overview"?>
  <title>Developer overview of BASE</title>

  <para>
    This chapter gives a brief overview of the architecture used
    in BASE. It is a good starting point if you need to know how
    the various parts of BASE are glued together. The figure below
    shows most of the important parts of BASE. The following
    sections briefly describe some parts of the figure
    and give you pointers for further reading if you are interested in
    the details.
  </para>

    <figure id="develop.figures.overview">
      <title>Overview of the BASE application</title>
      <screenshot>
        <mediaobject>
          <imageobject>
            <imagedata
              align="center"
              scalefit="1" width="100%"
              fileref="figures/uml/base.overview.png" format="PNG" />
          </imageobject>
        </mediaobject>
      </screenshot>
    </figure>

    <sect1 id="base_develop_overview.database">
      <title>Fixed vs. dynamic database</title>

      <para>
        BASE stores most of its data in a database. The database is divided into
        two parts: a fixed part and a dynamic part.
      </para>

      <para>
        The fixed part contains tables that correspond
        to the various items found in BASE. There is, for example, one table
        for users, one table for groups and one table for reporters. Some items
        share the same table. Biosources, samples, extracts and labeled extracts are
        all biomaterials and share the <code>BioMaterials</code> table. Access
        to the fixed part of the database goes through Hibernate in most cases,
        or through the Batch API in some cases (for example, access to reporters).
      </para>

      <para>
        The dynamic part of the database contains tables for storing analyzed data.
        Each experiment has its own set of tables, and it is not possible to mix data
        from two experiments. The dynamic part of the database can only be accessed
        through the Batch API and the Query API, which use SQL and JDBC directly.
      </para>

      <note>
        The actual location of the two parts depends on the database that is used.
        MySQL uses two separate databases, while PostgreSQL uses one database with two schemas.
      </note>

      <bridgehead>More information</bridgehead>
      <itemizedlist>
        <listitem>
          <para>
          <xref linkend="api_overview.dynamic_and_batch_api" />
          </para>
        </listitem>
      </itemizedlist>

    </sect1>

    <sect1 id="base_develop_overview.hibernate">
      <title>Hibernate and the DbEngine</title>

      <para>
        Hibernate (<ulink url=""></ulink>) is an
        object/relational mapping software package. It takes plain Java objects
        and stores them in a database. All we have to do is set the properties
        on the objects (for example: <code>user.setName("A name")</code>). Hibernate
        takes care of the SQL generation and database communication for us.
        This is not a magic or fully automatic process. We have to provide mapping
        information about which objects go into which tables and which properties
        go into which columns, as well as other settings such as caching and proxies.
        This is done by annotating the code with Javadoc comments. The classes
        that are mapped to the database are found in the <code></code>
        package, which is shown as the <guilabel>Data classes</guilabel> box in the figure above.
        The <classname docapi="net.sf.basedb.core">HibernateUtil</classname> class contains a
        lot of functionality for interacting with Hibernate.
      </para>
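
      <para>
        A minimal sketch of what such an annotated data class might look like is shown
        below. It assumes XDoclet-style <code>@hibernate.*</code> Javadoc tags that a
        build-time tool turns into Hibernate mapping files. The class name, table name,
        column names and tag attributes are hypothetical and are not copied from the real
        BASE data classes. The Java compiler ignores the tags, so the class itself is just
        a plain object with getters and setters.
      </para>

```java
/**
 * Hypothetical data class: plain getters and setters, no permission checks
 * and no validation. The XDoclet-style tags are only read by a build-time
 * tool that generates the Hibernate mapping file; javac ignores them.
 *
 * @hibernate.class table="Users"
 */
class UserDataSketch {
    private int id;
    private String name;

    /**
     * @hibernate.id column="id" generator-class="native"
     */
    public int getId() {
        return id;
    }

    // Hibernate assigns the id when the object is first saved
    void setId(int id) {
        this.id = id;
    }

    /**
     * @hibernate.property column="name" type="string" length="255" not-null="true"
     */
    public String getName() {
        return name;
    }

    // No checks here: validation belongs in the item class, not the data class
    public void setName(String name) {
        this.name = name;
    }
}
```

      <para>
        Keeping the mapping next to the getters means the mapping files can be regenerated
        whenever a property is added, instead of being maintained by hand.
      </para>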

      <para>
        Hibernate supports many different database systems. In theory, this means
        that BASE should work with all of those databases. In practice, however, we have
        found that this is not the case. For example, Oracle converts empty strings
        to <code>null</code> values, which breaks parts of our code that
        expect non-null values. Another difficulty is that the Batch API and some parts of
        the Query API generate native SQL as well. We try to use database dialect information
        from Hibernate, but that is not always possible. The <interfacename
        docapi="net.sf.basedb.core.dbengine">DbEngine</interfacename> interface and its
        implementations contain the code for generating the SQL that Hibernate can't help us
        with. We have implemented a generic
        <classname docapi="net.sf.basedb.core.dbengine">DefaultDbEngine</classname>,
        which follows the ANSI SQL specification, and specialized implementations for MySQL
        (<classname docapi="net.sf.basedb.core.dbengine">MySQLEngine</classname>) and
        PostgreSQL (<classname docapi="net.sf.basedb.core.dbengine">PostgresDbEngine</classname>).
        We don't expect BASE to work with other databases without modifications.
      </para>
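
      <para>
        The Java sketch below illustrates why database-specific engines are needed, using
        one concrete difference: ANSI SQL and PostgreSQL quote identifiers with double
        quotes, while MySQL (in its default SQL mode) uses backticks. The
        <code>SqlDialectSketch</code> interface and its implementations are hypothetical and
        much simpler than the real <code>DbEngine</code>; they do not reproduce its methods.
      </para>

```java
/** Hypothetical, much-simplified stand-in for a DbEngine: identifier quoting only. */
interface SqlDialectSketch {
    /** Quotes a table or column name so it can be used safely in generated SQL. */
    String quote(String identifier);
}

/** ANSI SQL: double quotes, with embedded double quotes doubled. */
class AnsiDialect implements SqlDialectSketch {
    @Override
    public String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }
}

/** MySQL in its default SQL mode: backticks, with embedded backticks doubled. */
class MySqlDialect implements SqlDialectSketch {
    @Override
    public String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }
}

/** PostgreSQL follows the ANSI rule, so it simply inherits the default behavior. */
class PostgresDialect extends AnsiDialect {
}

/** SQL-generating code depends only on the interface, never on a specific database. */
class SelectBuilder {
    static String selectAll(SqlDialectSketch dialect, String table) {
        return "SELECT * FROM " + dialect.quote(table);
    }
}
```

      <para>
        With this design, supporting another database means adding one more implementation
        rather than changing every piece of code that generates SQL.
      </para>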

      <bridgehead>More information</bridgehead>
      <itemizedlist>
        <listitem>
          <para>
          <xref linkend="core_ref.rules.datalayer" />
          </para>
        </listitem>
        <listitem>
          <para>
          <ulink url=""></ulink>
          </para>
        </listitem>
      </itemizedlist>

    </sect1>

    <sect1 id="base_develop_overview.batchapi">
      <title>The Batch API</title>

      <para>
        Hibernate comes with a price: it affects performance and uses a lot
        of memory. This means that the parts of BASE that often handle
        many items at the same time, for example reporters, array design features
        and raw data, don't work well with Hibernate. We
        have created the Batch API to solve these problems.
      </para>

      <para>
        The Batch API uses JDBC and SQL directly against the database. However, we
        still use metadata and database dialect information available from Hibernate
        to generate most of the SQL we need. In theory, this should make the Batch API
        just as database-independent as Hibernate is. In practice, there is some information
        that we can't extract from Hibernate, so we have implemented a simple
        <interfacename docapi="net.sf.basedb.core.dbengine">DbEngine</interfacename>
        to account for the missing pieces. The Batch API can be used for any
        <classname docapi="">BatchableData</classname> class in the
        fixed part of the database and is the only way to add data to the dynamic part.
      </para>
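
      <para>
        As a rough illustration of the JDBC side, the sketch below generates a
        parameterized <code>INSERT</code> statement once from table and column names and
        then reuses it for every row with standard JDBC batching (<code>addBatch()</code>
        and <code>executeBatch()</code>). It assumes the column list has already been
        obtained, for example from Hibernate metadata. The class and method names are
        hypothetical; this is not the actual Batch API code.
      </para>

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper; not part of the real Batch API. */
class InsertSqlSketch {

    /** Builds e.g. INSERT INTO t (a, b) VALUES (?, ?) from table and column names. */
    static String buildInsert(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        StringJoiner names = new StringJoiner(", ", "(", ")");
        StringJoiner params = new StringJoiner(", ", "(", ")");
        for (String column : columns) {
            names.add(column);
            params.add("?");
        }
        return "INSERT INTO " + table + " " + names + " VALUES " + params;
    }

    /** Inserts all rows with one prepared statement and a single JDBC batch. */
    static int[] insertAll(Connection conn, String table, List<String> columns,
                           List<Object[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildInsert(table, columns))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes start at 1
                }
                ps.addBatch();
            }
            return ps.executeBatch(); // one entry (update count or status) per row
        }
    }
}
```

      <para>
        Only one prepared statement is created and no per-row objects are cached, so memory
        use stays low compared to saving each row as a separate Hibernate-managed object.
      </para>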

      <note>
        The main reason for the Batch API is to avoid the internal caching
        of Hibernate, which eats lots of memory when handling thousands of items.
        Hibernate 3.1 introduced a new stateless API which, among other things, doesn't
        do any caching. This version was released after we had created the Batch API.
        We ran a few tests to check whether it would be better for us to switch back to
        Hibernate, but found that it didn't perform as well as our own Batch API (it was
        about two times slower). In any case, we can't get Hibernate to work with the
        dynamic database, so the Batch API is needed.
      </note>

      <bridgehead>More information</bridgehead>
      <itemizedlist>
        <listitem>
          <para>
          <xref linkend="api_overview.dynamic_and_batch_api" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="core_ref.rules.batchclass" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="core_ref.batch" />
          </para>
        </listitem>
      </itemizedlist>
    </sect1>

    <sect1 id="base_develop_overview.classes">
      <title>Data classes vs. item classes</title>

      <para>
        The data classes are, with few exceptions, for internal use. These are the classes
        that are mapped to the database with Hibernate mapping files. They are very simple
        and contain no logic at all. They don't do any permission checks or any data
        validation.
      </para>

      <para>
        Most of the data classes have a corresponding item class. For example:
        <classname docapi="">UserData</classname>
        and <classname docapi="net.sf.basedb.core">User</classname>,
        <classname docapi="">GroupData</classname> and
        <classname docapi="net.sf.basedb.core">Group</classname>.
        The item classes are what the client applications can see and use. They contain
        logic for permission checking (for example, checking that the logged-in user has
        WRITE permission) and data validation (for example, preventing a required property
        from being set to null).
      </para>
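
      <para>
        The split between data and item classes can be sketched in plain Java as follows:
        the data class only stores values, while the item class wraps it and checks
        permissions and validates input before delegating. All names here, including the
        simplified two-value permission model, are hypothetical and much simpler than the
        real <code>User</code> and <code>Group</code> classes.
      </para>

```java
import java.util.EnumSet;
import java.util.Set;

/** Simplified permission model, assumed for this example only. */
enum PermissionSketch { READ, WRITE }

/** Hypothetical data class: stores values, performs no checks. */
class GroupDataSketch {
    private String name;

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }
}

/** Hypothetical item class: the only object client code is given. */
class GroupSketch {
    private final GroupDataSketch data;
    private final Set<PermissionSketch> permissions;

    GroupSketch(GroupDataSketch data, Set<PermissionSketch> permissions) {
        this.data = data;
        // Copy so the caller cannot change the permissions afterwards
        this.permissions = permissions.isEmpty()
            ? EnumSet.noneOf(PermissionSketch.class)
            : EnumSet.copyOf(permissions);
    }

    String getName() {
        checkPermission(PermissionSketch.READ);
        return data.getName();
    }

    void setName(String name) {
        checkPermission(PermissionSketch.WRITE); // permission check first
        if (name == null || name.trim().isEmpty()) { // then data validation
            throw new IllegalArgumentException("Name is required");
        }
        data.setName(name);
    }

    private void checkPermission(PermissionSketch required) {
        if (!permissions.contains(required)) {
            throw new SecurityException("Missing " + required + " permission");
        }
    }
}
```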

      <para>
        The exceptions to this scheme are the batchable classes, which are
        all subclasses of the <classname docapi="">BatchableData</classname>
        class. For example, there is a <classname docapi="">ReporterData</classname>
        class but no corresponding item class. Instead, there is a
        batcher implementation, <classname docapi="net.sf.basedb.core">ReporterBatcher</classname>,
        which takes care of more or less the same things as an item class,
        but also handles its own SQL generation and JDBC calls, which
        bypass Hibernate and its caching system.
      </para>
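
      <para>
        The sketch below shows the general shape of a batcher under a few assumptions. It
        validates each item the way an item class would, but keeps only a small buffer and
        hands full chunks to a writer. In BASE the writer would execute generated SQL over
        JDBC; here it is a plain <code>Consumer</code> so the example runs without a
        database. <code>ReporterRowSketch</code> and <code>ReporterBatcherSketch</code> are
        hypothetical names, not the real <code>ReporterBatcher</code> API.
      </para>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical row to insert; stands in for a batchable data object. */
class ReporterRowSketch {
    final String externalId;
    final String name;

    ReporterRowSketch(String externalId, String name) {
        this.externalId = externalId;
        this.name = name;
    }
}

/**
 * Hypothetical batcher: validates rows, buffers them and flushes a chunk
 * to the writer whenever the buffer is full. At most batchSize rows are
 * held in memory, no matter how many rows are inserted in total.
 */
class ReporterBatcherSketch implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<ReporterRowSketch>> writer;
    private final List<ReporterRowSketch> buffer = new ArrayList<>();
    private int written = 0;

    ReporterBatcherSketch(int batchSize, Consumer<List<ReporterRowSketch>> writer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.writer = writer;
    }

    void insert(ReporterRowSketch row) {
        // The same kind of validation an item class would do
        if (row.externalId == null || row.externalId.isEmpty()) {
            throw new IllegalArgumentException("externalId is required");
        }
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand over a copy so the buffer can be cleared and reused
        writer.accept(new ArrayList<>(buffer));
        written += buffer.size();
        buffer.clear();
    }

    int getWritten() {
        return written;
    }

    /** Writes any remaining rows. */
    @Override
    public void close() {
        flush();
    }
}
```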

      <bridgehead>More information</bridgehead>
      <itemizedlist>
        <listitem>
          <para>
          <xref linkend="core_ref.rules.datalayer" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="core_ref.rules.itemclass" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="core_ref.rules.batchclass" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="core_ref.accesspermissions" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="core_ref.datavalidation" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="core_ref.batch" />
          </para>
        </listitem>
      </itemizedlist>
    </sect1>

    <sect1 id="base_develop_overview.queryapi">
      <title>The Query API</title>
      <para>
        The Query API is used to build and execute queries against the data in the
        database. It builds a query from objects that represent certain
        operations. For example, there is an <classname
        docapi="net.sf.basedb.core.query">EqRestriction</classname> object
        which tests if two expressions are equal, and there is an <classname
        docapi="net.sf.basedb.core.query">AddExpression</classname>
        object which adds two expressions. In this way it is possible to build
        very complex queries without using SQL or HQL.
      </para>

      <para>
        The Query API can work both through Hibernate and directly with SQL. In the first
        case it generates HQL (Hibernate Query Language) statements, which Hibernate then
        translates into SQL. In the second case, SQL is generated directly.
        In most cases HQL and SQL are identical, but not
        always. Some differences are handled by having the Query API generate
        slightly different query strings (with the help of information from
        Hibernate and the DbEngine). Some query elements can only be used
        with one of the query types.
      </para>
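
      <para>
        The following Java sketch illustrates the object-based approach and how one query
        object can produce both HQL and SQL. In the sketch, a property renders as its Java
        property name in HQL but as its column name in SQL, and a parameter renders as a
        named parameter (<code>:name</code>) in HQL but as a JDBC placeholder
        (<code>?</code>) in SQL. All class, property and column names are hypothetical and
        much simpler than the real <code>EqRestriction</code> and
        <code>AddExpression</code> classes.
      </para>

```java
/** Which query language a query element should be rendered as. */
enum QueryTypeSketch { HQL, SQL }

/** Hypothetical expression: something that evaluates to a value. */
interface ExpressionSketch {
    String toQl(QueryTypeSketch type);
}

/** Hypothetical restriction: something that evaluates to true or false. */
interface RestrictionSketch {
    String toQl(QueryTypeSketch type);
}

/** A mapped property: HQL uses the property name, SQL the column name. */
class PropertySketch implements ExpressionSketch {
    private final String alias;
    private final String property;
    private final String column;

    PropertySketch(String alias, String property, String column) {
        this.alias = alias;
        this.property = property;
        this.column = column;
    }

    @Override
    public String toQl(QueryTypeSketch type) {
        return alias + "." + (type == QueryTypeSketch.HQL ? property : column);
    }
}

/** A query parameter: the value is bound later, never pasted into the string. */
class ParameterSketch implements ExpressionSketch {
    private final String name;

    ParameterSketch(String name) {
        this.name = name;
    }

    @Override
    public String toQl(QueryTypeSketch type) {
        return type == QueryTypeSketch.HQL ? ":" + name : "?";
    }
}

/** Adds two expressions, in the spirit of AddExpression. */
class AddSketch implements ExpressionSketch {
    private final ExpressionSketch left;
    private final ExpressionSketch right;

    AddSketch(ExpressionSketch left, ExpressionSketch right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public String toQl(QueryTypeSketch type) {
        return "(" + left.toQl(type) + " + " + right.toQl(type) + ")";
    }
}

/** Tests two expressions for equality, in the spirit of EqRestriction. */
class EqSketch implements RestrictionSketch {
    private final ExpressionSketch left;
    private final ExpressionSketch right;

    EqSketch(ExpressionSketch left, ExpressionSketch right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public String toQl(QueryTypeSketch type) {
        return left.toQl(type) + " = " + right.toQl(type);
    }
}
```

      <para>
        For example, <code>new EqSketch(new AddSketch(a, b), new ParameterSketch("total"))</code>
        can be rendered for either query type from the same object tree.
      </para>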

      <note>
        The object-based approach makes it a bit difficult to store
        a query for later reuse. The <code>net.sf.basedb.util.jep</code>
        package contains an expression parser that can be used to convert
        a string to <interfacename
        docapi="net.sf.basedb.core.query">Restriction</interfacename> and
        <interfacename
        docapi="net.sf.basedb.core.query">Expression</interfacename> objects for
        the Query API. While it doesn't cover all cases, it should be
        useful for the <code>WHERE</code> part of a query.
      </note>
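
      <para>
        To give an idea of how a string can be turned into query objects, here is a small
        hand-written recursive-descent parser for a tiny grammar (words, <code>+</code>,
        <code>-</code>, parentheses and a single <code>==</code>). It is a hypothetical
        illustration only: it does not use JEP or the classes in
        <code>net.sf.basedb.util.jep</code>, and it builds a generic node tree instead of
        real <code>Restriction</code> and <code>Expression</code> objects.
      </para>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Generic tree node; toString() prints the tree in prefix form, e.g. Add(a, b). */
class NodeSketch {
    final String label;
    final List<NodeSketch> children;

    NodeSketch(String label, List<NodeSketch> children) {
        this.label = label;
        this.children = children;
    }

    @Override
    public String toString() {
        if (children.isEmpty()) {
            return label;
        }
        return children.stream().map(NodeSketch::toString)
            .collect(Collectors.joining(", ", label + "(", ")"));
    }
}

/**
 * Hypothetical recursive-descent parser for the grammar:
 *   restriction := expr "==" expr
 *   expr        := primary (("+" | "-") primary)*
 *   primary     := word | "(" expr ")"      (word = letters, digits, _)
 */
class TinyRestrictionParser {
    private static final Pattern TOKEN = Pattern.compile("\\s*(\\w+|==|[-+()])");
    private final List<String> tokens;
    private int pos = 0;

    private TinyRestrictionParser(String input) {
        this.tokens = tokenize(input.trim());
    }

    static NodeSketch parse(String input) {
        TinyRestrictionParser p = new TinyRestrictionParser(input);
        NodeSketch left = p.expr();
        p.expect("==");
        NodeSketch right = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return new NodeSketch("Eq", List.of(left, right));
    }

    private NodeSketch expr() {
        NodeSketch left = primary();
        // Looping (not recursing) makes a - b - c group as (a - b) - c
        while (pos < tokens.size() && (tokens.get(pos).equals("+") || tokens.get(pos).equals("-"))) {
            String op = tokens.get(pos++);
            left = new NodeSketch(op.equals("+") ? "Add" : "Subtract", List.of(left, primary()));
        }
        return left;
    }

    private NodeSketch primary() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            NodeSketch inner = expr();
            expect(")");
            return inner;
        }
        if (token.matches("\\w+")) {
            return new NodeSketch(token, List.of());
        }
        throw new IllegalArgumentException("Unexpected token: " + token);
    }

    private void expect(String expected) {
        if (pos >= tokens.size() || !tokens.get(pos).equals(expected)) {
            throw new IllegalArgumentException("Expected " + expected);
        }
        pos++;
    }

    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int at = 0;
        while (at < s.length()) {
            m.region(at, s.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at position " + at);
            }
            out.add(m.group(1));
            at = m.end();
        }
        return out;
    }
}
```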

      <bridgehead>More information</bridgehead>
      <itemizedlist>
        <listitem>
          <para>
          <xref linkend="api_overview.query_api" />
          </para>
        </listitem>
      </itemizedlist>
    </sect1>

    <sect1 id="base_develop_overview.controllerapi">
      <title>The Controller API</title>
      <para>
        The Controller API is the very heart of the BASE 2 system. This part
        of the core is used for boring but essential details, such as
        user authentication, database connection management, transaction
        management, data validation, and more. We don't describe this
        part further here, but recommend reading the documents below.
      </para>

      <bridgehead>More information</bridgehead>
      <itemizedlist>
        <listitem>
          <para>
          <xref linkend="core_ref.coreinternals" />
          </para>
        </listitem>
      </itemizedlist>
    </sect1>

    <sect1 id="base_develop_overview.plugins">
      <title>Plug-ins</title>

      <para>
        From the core code's point of view, a plug-in is just another client
        application. A plug-in has no extra powers and no access to a special
        API that lets it do things that other clients can't.
      </para>

      <para>
        However, the core must be able to control when and where a plug-in is
        executed. Some plug-ins may take a long time doing their calculations
        and may use a lot of memory. It would be bad if several users started
        to execute a resource-demanding plug-in at the same time. This problem is
        solved by adding a job queue. Each plug-in that should be executed is
        registered as a <classname
        docapi="net.sf.basedb.core">Job</classname> in the database. A job controller
        checks the job queue at regular intervals. The job controller can then
        decide whether to execute the plug-in or wait, depending on the current
        load on the server.
      </para>
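
      <para>
        The queue-and-controller idea can be sketched as follows, under simplifying
        assumptions: the "current load" is modelled only as the number of running jobs,
        and jobs are identified by name. The controller never runs a job itself; it hands
        the job to a launcher (for example a thread pool) and frees the slot when
        <code>jobFinished()</code> is called. All names are hypothetical and do not
        correspond to the real job controller classes in BASE.
      </para>

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical job controller. checkQueue() is meant to be called at regular
 * intervals; it starts waiting jobs only while there are free slots, so the
 * remaining jobs wait in the queue instead of overloading the server.
 */
class JobControllerSketch {
    private final int maxRunning;
    private final Consumer<String> launcher;
    private final Queue<String> waiting = new ArrayDeque<>();
    private int running = 0;

    JobControllerSketch(int maxRunning, Consumer<String> launcher) {
        if (maxRunning <= 0) {
            throw new IllegalArgumentException("maxRunning must be positive");
        }
        this.maxRunning = maxRunning;
        this.launcher = launcher;
    }

    /** Registers a job; nothing is executed until the next check. */
    synchronized void submit(String jobName) {
        waiting.add(jobName);
    }

    /** Starts as many waiting jobs as the load limit allows; returns how many. */
    synchronized int checkQueue() {
        int started = 0;
        while (running < maxRunning && !waiting.isEmpty()) {
            String job = waiting.poll();
            running++;
            started++;
            launcher.accept(job);
        }
        return started;
    }

    /** Called by the launcher when a job completes, freeing its slot. */
    synchronized void jobFinished() {
        if (running == 0) {
            throw new IllegalStateException("No job is running");
        }
        running--;
    }

    synchronized int waitingCount() {
        return waiting.size();
    }
}
```

      <para>
        In a real deployment, <code>checkQueue()</code> could be scheduled with
        <code>ScheduledExecutorService.scheduleAtFixedRate()</code>, and the launcher could
        submit each job to an <code>ExecutorService</code> that calls
        <code>jobFinished()</code> when the plug-in returns.
      </para>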

      <note>
        BASE ships with two types of job controllers: an internal one that runs
        inside the web application, and an external one that is designed to run
        on separate servers, so-called job agents. The internal job controller
        should work fine in most cases. The drawback with this controller is
        that a badly written plug-in may crash the entire web server. For example,
        a call to <code>System.exit()</code> in the plug-in code shuts down Tomcat
        as well.
      </note>

      <bridgehead>More information</bridgehead>
      <itemizedlist>
        <listitem>
          <para>
          <xref linkend="plugin_developer" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="core_ref.pluginexecution" />
          </para>
        </listitem>
      </itemizedlist>
    </sect1>

    <sect1 id="base_develop_overview.clients">
      <title>Client applications</title>
      <para>
        Client applications are applications that use the BASE Core API. The current web
        application is built with JavaServer Pages (JSP). JSP is supported by several
        application servers, but we have only tested the web application with Tomcat.
        Other client applications are the external job agents that execute plug-ins on
        separate servers, and the migration tool that migrates data from a BASE 1.2.x
        installation to BASE 2.
      </para>

      <para>
        Although it is possible to develop a completely new client application from
        scratch, we don't consider this likely to happen. Instead, there are
        other ways to access data in BASE and to extend the functionality
        of BASE.
      </para>

      <para>
        The first possibility is to use the Web Service API. This allows you to access
        some of the data in the BASE database and download it for further use. The
        Web Service API is currently very limited, but it is not hard to extend it
        to cover more use cases.
      </para>

      <para>
        A second possibility is to use the Extension API. This allows a developer to
        add functionality that appears directly in the web interface, for example
        additional menu items and toolbar buttons. This API is also easy to extend to
        cover more use cases.
      </para>
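
      <para>
        The basic idea of extension points can be illustrated with a small registry: the
        web client defines named extension points, extensions register items for them, and
        the page that renders a menu asks the registry what to show. This is a hypothetical
        sketch; the class names and the extension point id used below are made up for the
        example and are not part of the real Extension API.
      </para>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical menu item contributed by an extension. */
class MenuItemSketch {
    final String title;
    final String url;

    MenuItemSketch(String title, String url) {
        this.title = title;
        this.url = url;
    }
}

/** Hypothetical registry that maps extension point ids to contributed items. */
class ExtensionRegistrySketch {
    // LinkedHashMap and ArrayList keep registration order, so menus render predictably
    private final Map<String, List<MenuItemSketch>> points = new LinkedHashMap<>();

    /** Called by the web client for every place that accepts extensions. */
    void defineExtensionPoint(String id) {
        points.putIfAbsent(id, new ArrayList<>());
    }

    /** Called for each extension that contributes to an extension point. */
    void register(String extensionPointId, MenuItemSketch item) {
        List<MenuItemSketch> items = points.get(extensionPointId);
        if (items == null) {
            throw new IllegalArgumentException("Unknown extension point: " + extensionPointId);
        }
        items.add(item);
    }

    /** Called while rendering a page; returns a read-only view. */
    List<MenuItemSketch> itemsFor(String extensionPointId) {
        return Collections.unmodifiableList(
            points.getOrDefault(extensionPointId, Collections.emptyList()));
    }
}
```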

      <bridgehead>More information</bridgehead>
      <itemizedlist>
        <listitem>
          <para>
          <xref linkend="webservices" />
          </para>
        </listitem>
        <listitem>
          <para>
          <xref linkend="extensions_developer" />
          </para>
        </listitem>
        <listitem>
          <para>
          The <ulink url="">BASE plug-ins site</ulink> also
          has examples of extensions and web service implementations.
          </para>
        </listitem>
      </itemizedlist>
    </sect1>

</chapter>