<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC
    "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
    "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">

<!--
  $Id: features.xml 5845 2011-11-01 16:58:55Z jari $

  Copyright (C) 2008, 2011 Jari Häkkinen

  This file is part of BASE - BioArray Software Environment.
  Available at

  BASE is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 3
  of the License, or (at your option) any later version.

  BASE is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with BASE. If not, see <>.
-->

<chapter id="features" chunked="0">
  <?dbhtml filename="features.html"?>
  <title>BASE features</title>

  <para>
    The BASE application features many components: MIAME compliance,
    multi-user support, data sharing, data access management, array
    and biomaterial LIMS, multiple array platforms, RNAseq support,
    extensibility, configurable plug-ins, annotation customisation,
    streamlined access to analysis tools, integration of <ulink
    url=''>MultiExperiment Viewer (MeV)</ulink>, a web services
    API, and more. To support all these components, the underlying
    relational database has grown very large and complex, especially
    since BASE itself works with objects, which requires additional
    database tables to keep track of the objects stored in the
    relational database. Thus, rather than trying to describe every
    feature in detail here, we highlight some of the more important
    ones.
  </para>

  <sect1 id="features.webinterface">
    <title>Web interface</title>

    <para>
      The entire system is accessed through a web interface over the
      Internet using a standard web browser, such as Firefox, Safari,
      Opera, or Internet Explorer. Access privileges for a particular
      BASE installation are managed through personal accounts in the
      web interface. A local administrator creates new user accounts
      with specific roles and access privileges and has overall
      managerial responsibility for the BASE installation. With the
      exception of the administrator, who has global data access,
      individual users have sole access to, and control over, the data
      they enter. Users can share data they own (or have share
      permission for) with other users of the same BASE installation.
    </para>

  </sect1>

  <sect1 id="features.datamangement">
    <title>Information and annotation management</title>

    <para>
      BASE features a biomaterial LIMS that tracks biological material
      from its source through hybridisation/sequencing and ultimately
      to raw data and analysis. All events throughout sample handling
      are tracked, and information on used and remaining quantities,
      physical sample locations, quality control, and sample relations
      is stored in BASE. Racks or boxes holding biomaterials can be
      created as BioPlates, and plate events make it easy to perform
      extraction or labelling events. Although it is becoming less
      commonly used, the array production LIMS of previous BASE
      versions is retained to support researchers with spotting
      facilities, e.g., for protein array production and BAC array
      printing, which may not be commercially available.
    </para>

    <para>
      Events in the biomaterial and array LIMS can be annotated with
      protocols and event dates, and most items can be annotated with
      customisable annotation types such as floats, integers, dates,
      and Boolean flags. If configured, a change history is kept for
      biomaterial items and can be used to track modifications in the
      database. Annotations are either free form or selected from a
      preset list of values, and can be marked as required for MIAME
      compliance. The annotation system is searchable, and the user
      can select any annotation to be an experimental factor in the
      analysis, which makes it available to analysis plug-ins and plot
      tools.
    </para>

  </sect1>

  <sect1 id="features.sharingandprivacy">
    <title>Data sharing and privacy</title>

    <para>
      One of the important features of BASE is its capability as a
      local data repository. The repository functionality is
      complemented by data grouping, sharing, and privacy policies. A
      BASE project is used to group items (biomaterials, raw data, and
      experiments) into a logical entity, and a BASE experiment is a
      collection of bioassays, e.g., array data, grouped logically
      together for further analysis. All items can co-exist in several
      projects and experiments without unnecessary copying of
      information.
    </para>

    <para>
      Data privacy is controlled by the data owner, and BASE allows the
      owner to set data access rules. To this end, each item in BASE
      is owned by a user, who can then share it with colleagues.
      Grouping data in projects lets the owner share data simply by
      including other users in a project. Each item can have different
      access levels even within a project, and project members can
      have different privileges. The data access rules are very
      flexible and can be overwhelming, since access levels can be set
      individually on almost any item. However, by using projects,
      the appropriate access levels can be set at a single point of
      interaction.
    </para>

  </sect1>

  <sect1 id="features.directorystructure">
    <title>File and directory structure</title>

    <para>
      BASE has an integrated file system that lets researchers collect
      all data files related to a project in a single storage
      location. Data files are uploaded using a web browser or an ftp
      client. The file storage is an integral part of a strategy to
      store all experiment-relevant data in BASE, even data types not
      yet supported in analysis. Collecting all data allows future
      reuse of the data as more data are produced and new analysis
      tools become available.
    </para>

  </sect1>

  <sect1 id="features.plugininfrastructure">
    <title>Plugin and extension infrastructure</title>
    <!-- Analysis, extensions, and plug-ins -->

    <para>
      BASE features a hierarchically organised analysis interface that
      allows data filtering, normalisation, transformation, and other
      analyses. Parameters and settings are automatically stored for
      each step in the analysis. The selection of analysis tools
      depends on the array type and the available plug-ins; a wide
      range of tools is pre-installed with BASE, and optional plug-ins
      can be downloaded from the <ulink
      url=''>BASE plug-in site</ulink>. BASE capitalises on other
      software tools, such as MeV, by integrating them into the user
      interface. Such integration provides streamlined access to
      analysis modules in external tools. BASE even features a
      rudimentary manual transform creator that enables researchers
      to add analysis steps performed independently of BASE to the
      hierarchical overview of the analysis. The transform creator
      enables storage of result files and parameter information for
      archival, tracking, and sharing purposes.
    </para>

    <para>
      The analysis of genomics data is continuously evolving with new
      methods and techniques. To keep pace, BASE provides extension and
      plug-in application programming interfaces (APIs) that make it
      straightforward to add new analysis tools. The use of the APIs
      is well documented, and there are numerous examples of how to
      create extensions. The MeV and ftp-server integrations both use
      the extension mechanism, and the automatically generated
      overview plots available in the experimental analysis view are
      also extensions. The plug-in API is used for all data imports
      and exports and for most analysis tools, which gives new
      developers plenty of example code to study when they create
      BASE plug-ins.
    </para>

  </sect1>

  <sect1 id="features.batchdata">
    <title>Batch upload and download of data</title>

    <para>
      File, annotation, and item upload can be done asynchronously as
      data are generated or information becomes available. To relieve
      researchers of the tedious task of entering data one item at a
      time, a set of batch importers was created: the information
      generated throughout the experimental work is uploaded to BASE
      in plain tab-separated files. These files are supplied to batch
      importer plug-ins that parse the files and create items and
      associations according to the information in the files. The
      same plug-ins can be used to batch update many items. Similarly,
      annotating items is done by creating tab-separated files with
      annotation information, uploading them to BASE, and loading the
      file content into the database using annotation importers. If
      needed, annotations are easily updated with the same mechanism.
    </para>

    <para>
      Files uploaded to BASE are stored in the BASE directory
      structure, and multiple files are easily transferred to BASE
      either packaged in a compressed file with a single upload
      action, or by using an ftp client that supports transferring
      directory structures. Similarly, downloading multiple files is
      straightforward, either with an ftp client or with a single
      click in the BASE web interface. Items are downloaded through
      item list views, where users can filter and select the
      information to download.
    </para>

  </sect1>

  <sect1 id="features.supportedarrays">
    <title>Supported array platforms and raw data formats</title>

    <para>
      There are many types of microarrays, techniques, and brands
      available to researchers: one- or two-channel hybridizations,
      spotted cDNA/oligo arrays, Affymetrix (GeneChip), Illumina (SNP,
      DASL, WGEX, microRNA), aCGH, SNP, tiling arrays, and many
      more. Data are produced in different file formats that must be
      treated differently depending on type.
    </para>

    <para>
      Many platforms and experimental setups are supported in
      downstream analysis, but some microarray techniques cannot
      currently be analysed within BASE simply because of lack of
      support in the available plug-ins. This can be resolved by
      creating new plug-ins, or extending existing ones, to add
      analysis capabilities for platforms and techniques that are not
      yet supported. Extending analysis capabilities to new
      technologies is only a matter of local needs and resources. We
      add support for platforms in use at the Lund University
      microarray facility and make our tools freely available to the
      community.
    </para>

    <para>
      For two-channel array platforms, it is straightforward to
      customize BASE for a specific array platform: the platform
      simply needs to be adapted to the (BASE) Generic platform. The
      adaptation consists of creating a raw data format definition and
      configuring raw data importers, or of using an already available
      raw data format. However, it is not always possible to make a
      natural mapping of a platform to the Generic platform. Platforms
      such as Affymetrix and Illumina cannot naturally be mapped onto
      the Generic two-channel platform. For Affymetrix, BASE comes
      with a dedicated Affymetrix platform, and Illumina can be
      supported by customizing BASE (see the <ulink
      url="">Illumina package</ulink> web site for more information
      on adding Illumina support to BASE).
    </para>

    <para>
      How to adapt new array platforms to the Generic platform format,
      or how to create a new platform type in BASE, is described
      elsewhere in this document. Here we list array platforms used
      with BASE as well as the raw data types supported by BASE.
      However, not all platforms or raw data types listed below are
      available out of the box; for those that are not, a BASE
      administrator must customize the local BASE installation. What
      comes pre-configured when BASE is installed is indicated in the
      lists below.
    </para>

    <sect2 id="features.supportedarrays.platforms">
      <title>Vendor-specific and custom printing array
      platforms</title>

      <para>
        Not all array platforms listed below are available by
        default. The comments on specific platforms explain how to
        enable the use of the platform in BASE. In some cases there is
        no confirmed use of a platform, but we believe it has been
        tested by anonymous users.
      </para>

      <variablelist>
        <varlistentry>
          <term>Affymetrix</term>
          <listitem>
            <para>
              The Affymetrix platform comes pre-configured with a new
              BASE installation. In this context, the Affymetrix
              platform means the Affymetrix expression arrays. So far
              there has been no reason to expand the platform to other
              chip types. In principle, any Affymetrix chip type can be
              stored in BASE, but the current plug-ins always assume
              that expression data are stored and analyzed. This can be
              resolved by adding variants of the Affymetrix platform,
              but the Lund BASE team currently has no plans to create
              Affymetrix variants.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>Agilent</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>Custom printing</term>
          <listitem>
            <para>
              The array layout options are endless and imagination is
              the only limitation... almost. BASE can import many
              in-house array designs and platforms. Custom arrays
              usually fall back on one of the already available raw
              data types, such as GenePix.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>Illumina</term>
          <listitem>
            <para>
              There are several variants of the Illumina platform, which
              allow BASE to adapt its handling to different Illumina
              chip types. Illumina platform support is not included in
              a standard BASE installation, but there is an <ulink
              url="">Illumina package</ulink> available for seamless
              integration of the Illumina array platform into BASE.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>ImaGene</term>
          <listitem>
            <para>
              No successful use has been confirmed, but the ImaGene raw
              data type is available in BASE.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>Unlisted</term>
          <listitem>
            <para>
              In principle, data from any platform that generates a
              matrix of data can be imported into BASE. Simply use the
              available raw data formats and data importers.
            </para>
          </listitem>
        </varlistentry>
      </variablelist>

    </sect2>

    <sect2 id="features.supportedarrays.rawdatatypes">
      <title>Available raw data types</title>

      <para>
        Raw data comes in many different formats. These formats are
        usually defined by scanner software vendors, and BASE must keep
        track of the different formats for analysis and plotting. BASE
        supports many formats out of the box, but some formats need to
        be added manually by the BASE administrator (indicated in the
        list below).
      </para>

      <variablelist>

        <varlistentry>
          <term>Affymetrix</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>AIDA</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>Agilent</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>BZScan</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>ChipSkipper</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>GenePix</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>GeneTAC</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>Illumina</term>
          <listitem>
            <para>
              For the Illumina array platform, we recommend using the
              <emphasis>Illumina Bead Summary (IBS)</emphasis> raw data
              format below.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>Illumina Bead Summary (IBS)</term>
          <listitem>
            <para>
              Not available in BASE by default; it is added by the
              <ulink url="">Illumina plug-in</ulink>, which adds
              Illumina array platform support to BASE.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>ImaGene</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>QuantArray Biotin</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>QuantArray Cy</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>SpotFinder</term>
          <listitem>
            <para>
            </para>
          </listitem>
        </varlistentry>

      </variablelist>

    </sect2>

  </sect1>

  <sect1 id="features.repositoryandstandards">
    <title>Repository and standards</title>

    <para>
      The Microarray Gene Expression Data Society (MGED) develops and
      maintains standards for data acquisition, representation, and
      interchange, such as the MIAME guidelines, the MAGE-TAB
      interchange format, and the MGED Ontology for microarray
      experiments. BASE does not enforce the use of the MGED standards
      but supports storage of the information required by MIAME. BASE
      has an experiment item overview functionality useful for
      validating information related to experiments. The validation
      level is user-selectable, and the MIAME compliance option is the
      most relevant here. When users or server administrators create
      annotation types in BASE, the annotation types can be marked as
      required by MIAME and optionally restricted to a list of
      predefined values from a controlled vocabulary. Validation
      checks for inconsistencies, reports errors, and gives the user
      an opportunity to fix issues immediately or later. Once the
      issues raised by the validation have been resolved, data can be
      exported for submission to public repositories such as
      ArrayExpress, Gene Expression Omnibus (GEO), and CIBEX.
    </para>

  </sect1>

</chapter>