source: trunk/doc/src/docbook/overview/features.xml @ 5846

Last change on this file since 5846 was 5846, checked in by Jari Häkkinen, 10 years ago

Fixes #523.

1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE chapter PUBLIC
3    "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
4    "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">
5
6<!--
7  $Id: features.xml 5846 2011-11-01 18:02:40Z jari $
8
9  Copyright (C) 2008, 2011 Jari Häkkinen
10
11  This file is part of BASE - BioArray Software Environment.
12  Available at http://base.thep.lu.se/
13
14  BASE is free software; you can redistribute it and/or
15  modify it under the terms of the GNU General Public License
16  as published by the Free Software Foundation; either version 3
17  of the License, or (at your option) any later version.
18
19  BASE is distributed in the hope that it will be useful,
20  but WITHOUT ANY WARRANTY; without even the implied warranty of
21  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
22  GNU General Public License for more details.
23
24  You should have received a copy of the GNU General Public License
25  along with BASE. If not, see <http://www.gnu.org/licenses/>.
26-->
27
28<chapter id="features" chunked="0">
29  <?dbhtml filename="features.html"?>
30  <title>BASE features</title>
31
32  <para>
33    The BASE application has many features: MIAME compliance,
34    multi-user support, data sharing, data access management, array
35    and biomaterial LIMS, multiple array platforms, RNAseq
36    support, extensibility, configurable plug-ins, annotation
37    customisation, streamlined access to analysis tools, integration
38    of <ulink url='http://www.tm4.org/mev/'>MultiExperiment Viewer
39    (MeV)</ulink>, a web services API, and more. To support all
40    of these, the underlying relational database has grown
41    very large and complex, especially since BASE itself works with
42    objects, which requires additional database tables to keep track
43    of objects stored in a relational database. Thus, rather than
44    trying to describe every feature in detail here, we highlight
45    some of the more important ones.
46  </para>
47
48  <sect1 id="features.webinterface">
49    <title>Web interface</title>
50
51    <para>
52      The entire system is accessed through a web interface over the
53      Internet using a standard web browser, such as Firefox, Safari,
54      Opera, or Internet Explorer. Access privileges to a particular
55      BASE installation are managed by personal accounts through the
56      web interface. A local administrator creates new user accounts
57      with specific roles and access privileges and has overall
58      managerial responsibility for an individual BASE
59      installation. With the exception of the administrator, who has
60      global data access, individual users have sole access to, and
61      control over, the data they enter. Users can share data
62      they own (or have share credentials for) with other users of the
63      same BASE installation.
64    </para>
65
66  </sect1>
67
68  <sect1 id="features.datamangement">
69    <title>Information and annotation management</title>
70
71    <para>
72      BASE features a biomaterial LIMS tracking biological material
73      from its source to hybridisation/sequencing and ultimately to
74      raw data and analysis. All events throughout sample handling are
75      tracked and information on used and remaining quantities,
76      physical sample locations, quality control information, and
77      sample relations is stored in BASE. Racks or boxes holding
78      biomaterials can be created as BioPlates and plate events are
79      easily performed for extraction or labelling events. Although
80      becoming less commonly used, the array production LIMS of
81      previous BASE versions is retained to support researchers with
82      spotting facilities, e.g., protein array production and BAC
83      array printing that may not be commercially available.
84    </para>
85
86    <para>
87      Events in biomaterial and array LIMS can be annotated with
88      protocols and event dates, and most items can be annotated with
89      customisable annotation types such as floats, integers, dates,
90      and Boolean flags. Change history for biomaterial items is
91      available if configured and can be used to track modifications
92      in the database. Annotations are either free form or from a
93      preset list of values, and can be marked as required for MIAME
94      compliance. The annotation system is searchable and the user can
95      select any annotation to be an experimental factor in analysis,
96      whereby it becomes available to analysis plug-ins and plot tools.
97    </para>
98
99  </sect1>
100
101  <sect1 id="features.sharingandprivacy">
102    <title>Data sharing and privacy</title>
103
104    <para>
105      One of the important features of BASE is its capability as a
106      local data repository. The repository functionality is augmented
107      with data grouping, sharing, and privacy policies. A BASE
108      project is used to group items (biomaterial, raw data, and
109      experiments) into a logical entity, and a BASE experiment is a
110      collection of bioassays, e.g., array data, grouped logically
111      together for further analysis. All items can co-exist in several
112      projects and experiments without any unnecessary copying of
113      information.
114    </para>
115
116    <para>
117      Data privacy is guarded by the data owner and BASE allows the
118      owner to set data access rules. To this end, each item in BASE
119      is owned by a user, who can share it with
120      colleagues. The grouping of data in projects allows the data
121      owner to simply include other users in a project in order to
122      share data. Each item can have different access levels even
123      within a project, and project members can have different
124      privileges. The data access rules are very flexible and can be
125      overwhelming since access levels on almost any item can be
126      individually set. However, using projects, the proper access
127      levels can be set at a single point of interaction.
128    </para>
129
130  </sect1>
131
132  <sect1 id="features.directorystructure">
133    <title>File and directory structure</title>
134
135    <para>
136      BASE has an integrated file system that lets researchers
137      collect all data files related to a project
138      in one single storage location. Data files are uploaded using a
139      web browser or an FTP client. The file storage is an integral
140      part of a strategy to store all experiment-relevant data in
141      BASE, even data types not yet supported in
142      analysis. Collecting all data allows future reuse of the data as
143      more data are produced and new analysis tools become
144      available.
145    </para>
146
147  </sect1>
148
149  <sect1 id="features.plugininfrastructure">
150    <title>Plugin and extension infrastructure</title>
151
152    <para>
153      BASE features a hierarchically organised analysis interface that
154      allows data filtering, normalisation, transformation, and other
155      analyses. Parameters and settings are automatically stored for
156      each step in the analysis. The selection of analysis tools
157      depends on array type and available plug-ins; a wide range
158      of tools is pre-installed with BASE, and optional plug-ins can
159      be downloaded from the <ulink
160      url='http://baseplugins.thep.lu.se'>BASE plug-in site
161      </ulink>. BASE capitalises on other software tools, such as
162      MeV, by integrating them into the user interface. Such
163      integration provides streamlined access to analysis modules in
164      external tools. BASE even features a rudimentary manual
165      transform creator that enables researchers to add analysis
166      steps performed independently of BASE to the hierarchical
167      overview of analyses. The transform creator enables storage of
168      result files and parameter information for archival, tracking,
169      and sharing purposes.
170    </para>
171
172    <para>
173      The analysis of genomics data is continuously evolving with new
174      methods and techniques. To this end, BASE provides extension and
175      plug-in programming interfaces (APIs) to enable straightforward
176      addition of new analysis tools. The use of the APIs is well
177      documented and there are numerous examples of how to create
178      extensions. The MeV and FTP server integrations both utilise the
179      extension mechanism, and the automatically generated overview
180      plots available in the experimental analysis view are also
181      extensions. The plug-in API is used for all data imports and
182      exports, and most analysis tools, providing new developers with
183      plenty of example code to examine when they create BASE plug-ins.
184    </para>
185
186  </sect1>
187
188  <sect1 id="features.batchdata">
189    <title>Batch upload and download of data</title>
190
191    <para>
192      File, annotation, and item upload can be done asynchronously as
193      data are generated or information becomes available. To relieve
194      researchers from the tedious task of entering data one by one, a
195      set of batch importers was created; the information generated
196      throughout the experimental work is uploaded to BASE in plain
197      tab-separated files. These files are supplied to batch importer
198      plug-ins that parse the files and create items and associations
199      according to the information in the files. The same plug-ins can
200      be used to batch update many items. Similarly, annotating items
201      is done by creating tab-separated files with annotation
202      information, uploading these to BASE, and loading the file
203      content into the database using annotation importers. If needed,
204      annotations are easily updated with the same mechanism.
205    </para>
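
    <para>
      As a rough illustration of the batch import files described
      above, a tab-separated file for creating a few samples might
      look like the sketch below. The column names and values are
      hypothetical and chosen only for this example; they are not
      taken from a BASE installation. Columns are aligned with
      spaces here for readability, whereas an actual file would
      separate them with single tab characters.
    </para>

    <programlisting>
Name        External ID    Parent biosource    Description
Sample A    S-0001         Patient 1           Tumour biopsy
Sample B    S-0002         Patient 1           Normal tissue
Sample C    S-0003         Patient 2           Tumour biopsy
    </programlisting>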
206
207    <para>
208      Files uploaded to BASE are stored in the directory structure
209      within BASE and multiple files are easily transferred to BASE
210      either packaged in compressed files with a single upload action,
211      or by using an FTP client supporting transfer of file
212      structures. Similarly, downloading multiple files is
213      straightforward either using an FTP client or by a single click
214      in the BASE web interface. Download of items is done through
215      item listing views enabling users to filter and select what
216      information should be downloaded.
217    </para>
218
219  </sect1>
220
221  <sect1 id="features.supportedarrays">
222    <title>Supported array platforms and raw data formats</title>
223
224    <para>
225      There are many types of microarrays, techniques, and brands
226      available to researchers: one- or two-channel hybridisations,
227      spotted cDNA/oligo arrays, Affymetrix (GeneChip), Illumina (SNP,
228      DASL, WGEX, microRNA), aCGH, SNP, tiling arrays, and many
229      more. In addition, expression data can be derived from sequencing
230      data, i.e., RNAseq. Data is produced in different file formats
231      that must be treated differently depending on type.
232    </para>
233
234    <para>
235      Many platforms and experimental setups are supported in
236      downstream analysis but some microarray techniques cannot
237      currently be analysed within BASE simply because of a lack of
238      support in available plug-ins. The problem is resolved by creating
239      new, or extending available, plug-ins that add analysis capabilities
240      for platforms and techniques not readily supported in
241      analysis. Extending analysis capabilities to new technologies is
242      only a matter of local needs and resources. We add support for
243      platforms in use at the Lund University microarray facility and
244      make our tools freely available to the community.
245    </para>
246
247    <para>
248      For two-channel array platforms it is straightforward to
249      customise BASE for a specific array platform; the platform
250      simply needs to be adapted to the (BASE) Generic platform. The
251      adaptation consists of creating a raw data format definition and
252      configuring raw data importers, or making use of already available
253      raw data formats. However, it is not always possible to make a
254      natural mapping of a platform to the Generic platform. Platforms
255      such as Affymetrix and Illumina cannot naturally be
256      mapped onto the Generic two-channel platform. For Affymetrix,
257      BASE comes with a specific Affymetrix platform and Illumina can
258      be supported by customising BASE (go to the <ulink
259      url="http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina">
260      Illumina package</ulink> web site for more information on adding
261      Illumina support to BASE).
262    </para>
263
264    <para>
265      How to adapt new array platforms to the Generic platform format,
266      or how to create a new platform type in BASE, is described
267      elsewhere in this document. Here we list different array
268      platforms used in BASE and also list raw data types supported by
269      BASE. However, not all platforms or raw data types listed below
270      are available out of the box, and a BASE administrator must
271      customise the local BASE installation for its specific
272      needs. What comes pre-configured when BASE is installed is
273      indicated in the lists below.
274    </para>
275
276    <sect2 id="features.supportedarrays.platforms">
277      <title>Vendor-specific and custom-printed array
278      platforms</title>
279
280      <para>
281        Not all array platforms listed below are available by
282        default. The comments on specific platforms explain how to
283        enable the use of the array platform in BASE. In some cases
284        there is no confirmed usage of a platform but we believe it
285        has been tested by anonymous users.
286      </para>
287
288      <variablelist>
289        <varlistentry>
290          <term>Affymetrix</term>
291          <listitem>
292            <para>
293              The Affymetrix platform comes pre-configured with a new
294              BASE installation. In this context, the Affymetrix platform
295              means the Affymetrix expression arrays. So far there has
296              been no reason to expand the platform to other
297              chip types. In principle any Affymetrix chip type can be
298              stored in BASE but current plug-ins will always assume
299              that expression data is stored and analysed. This can be
300              resolved by adding variants of the Affymetrix platform
301              but the Lund BASE team currently has no plans to create
302              Affymetrix variants.
303            </para>
304          </listitem>
305        </varlistentry>
306
307        <varlistentry>
308          <term>Agilent</term>
309          <listitem>
310            <para>
311            </para>
312          </listitem>
313        </varlistentry>
314
315        <varlistentry>
316          <term>Custom printing</term>
317          <listitem>
318            <para>
319              The array layout options are endless and imagination is
320              the only limitation ... almost. BASE can import many
321              in-house array designs and platforms. The custom arrays
322              usually fall back on one of the raw data types already
323              available such as GenePix.
324            </para>
325          </listitem>
326        </varlistentry>
327
328        <varlistentry>
329          <term>Illumina</term>
330          <listitem>
331            <para>
332              There are several variants of the Illumina
333              platform. Using several variants allows BASE to adapt
334              its handling of different Illumina chip types. Illumina
335              platform support is not included in a standard BASE
336              installation but there is a <ulink
337              url="http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina">
338              Illumina package</ulink> available for seamless
339              integration of the Illumina array platform into BASE.
340            </para>
341          </listitem>
342        </varlistentry>
343
344        <varlistentry>
345          <term>ImaGene</term>
346          <listitem>
347            <para>
348              No successful use confirmed but ImaGene raw data is
349              available in BASE.
350            </para>
351          </listitem>
352        </varlistentry>
353
354        <varlistentry>
355          <term>Sequencing</term>
356          <listitem>
357            <para>
358              Expression data from sequencing experiments. A Cufflinks
359              raw data type is available for expression values from
360              sequencing experiments.
361            </para>
362          </listitem>
363        </varlistentry>
364
365        <varlistentry>
366          <term>Unlisted</term>
367          <listitem>
368            <para>
369              In principle any platform generating a matrix of data
370              can be imported into BASE. Simply utilise the available
371              raw data formats and data importers.
372            </para>
373          </listitem>
374        </varlistentry>
375      </variablelist>
376
377    </sect2>
378
379    <sect2 id="features.supportedarrays.rawdatatypes">
380      <title>Available raw data types</title>
381
382      <para>
383        Raw data comes in many different formats. These formats are
384        usually defined by scanner software vendors and BASE must keep
385        track of the different formats for analysis and plotting. BASE
386        supports many formats out of the box, but some formats need to be
387        added manually by the BASE administrator (indicated in the
388        list below).
389      </para>
390
391      <variablelist>
392
393        <varlistentry>
394          <term>Affymetrix</term>
395          <listitem>
396            <para>
397            </para>
398          </listitem>
399        </varlistentry>
400
401        <varlistentry>
402          <term>AIDA</term>
403          <listitem>
404            <para>
405            </para>
406          </listitem>
407        </varlistentry>
408
409        <varlistentry>
410          <term>Agilent</term>
411          <listitem>
412            <para>
413            </para>
414          </listitem>
415        </varlistentry>
416
417        <varlistentry>
418          <term>BZScan</term>
419          <listitem>
420            <para>
421            </para>
422          </listitem>
423        </varlistentry>
424
425        <varlistentry>
426          <term>ChipSkipper</term>
427          <listitem>
428            <para>
429            </para>
430          </listitem>
431        </varlistentry>
432
433        <varlistentry>
434          <term>Cufflinks</term>
435          <listitem>
436            <para>
437            </para>
438          </listitem>
439        </varlistentry>
440
441        <varlistentry>
442          <term>GenePix</term>
443          <listitem>
444            <para>
445            </para>
446          </listitem>
447        </varlistentry>
448
449        <varlistentry>
450          <term>GeneTAC</term>
451          <listitem>
452            <para>
453            </para>
454          </listitem>
455        </varlistentry>
456
457        <varlistentry>
458          <term>Illumina</term>
459          <listitem>
460            <para>
461              For the Illumina array platform, using the
462              <emphasis>Illumina Bead Summary
463              (IBS)</emphasis> raw data format below is recommended.
464            </para>
465          </listitem>
466        </varlistentry>
467
468        <varlistentry>
469          <term>Illumina Bead Summary (IBS)</term>
470          <listitem>
471            <para>
472              Not available in BASE directly but it is added with the
473              <ulink
474              url="http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina">
475              Illumina plug-in</ulink> that adds Illumina array
476              platform support to BASE.
477            </para>
478          </listitem>
479        </varlistentry>
480
481        <varlistentry>
482          <term>ImaGene</term>
483          <listitem>
484            <para>
485            </para>
486          </listitem>
487        </varlistentry>
488
489        <varlistentry>
490          <term>QuantArray Biotin</term>
491          <listitem>
492            <para>
493            </para>
494          </listitem>
495        </varlistentry>
496
497        <varlistentry>
498          <term>QuantArray Cy</term>
499          <listitem>
500            <para>
501            </para>
502          </listitem>
503        </varlistentry>
504
505        <varlistentry>
506          <term>SpotFinder</term>
507          <listitem>
508            <para>
509            </para>
510          </listitem>
511        </varlistentry>
512
513      </variablelist>
514
515    </sect2>
516
517  </sect1>
518
519  <sect1 id="features.supportedsequencing">
520    <title>Supported sequencing applications</title>
521
522    <para>
523      BASE was originally developed for management and analysis of
524      array-based data. Recent versions, starting with version 3, have
525      been adapted to support sequencing-based data. Being a newly
526      developed feature, it is not as mature as the array part of BASE.
527    </para>
528
529    <para>
530      For sequencing data in general, BASE can be used for data
531      management and sharing. BASE currently has extended support for
532      sequencing applications such as RNAseq where data is transformed
533      to gene expression measurements. For such applications array
534      designs can be created based on gene structure defined in <ulink
535      url='http://en.wikipedia.org/wiki/Gene_transfer_format'>GTF
536      formatted files</ulink>, for example, GTF files for all RefSeqs
537      or known genes.
538    </para>
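
    <para>
      For orientation, a GTF record is a single line of nine
      tab-separated fields: sequence name, source, feature type,
      start, end, score, strand, frame, and attributes, where the
      attributes include a gene_id and a transcript_id. The two
      lines below are an illustrative sketch only; the coordinates
      and identifiers are made-up placeholders, and the fields are
      separated by spaces rather than tabs for readability.
    </para>

    <programlisting>
chr1  RefSeq  exon  1000  1500  .  +  .  gene_id "GENE1"; transcript_id "TX1";
chr1  RefSeq  exon  2000  2300  .  +  .  gene_id "GENE1"; transcript_id "TX1";
    </programlisting>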
539
540  </sect1>
541
542  <sect1 id="features.repositoryandstandards">
543    <title>Repository and standards</title>
544
545    <para>
546      The Microarray Gene Expression Data Society (MGED) develops and
547      maintains standards for data acquisition, representation, and
548      interchange such as the MIAME guidelines, the MAGE-TAB
549      interchange format, and the MGED Ontology for microarray
550      experiments. BASE does not enforce the use of the MGED standards
551      but supports storage of information required by MIAME. BASE has
552      an experiment item overview functionality useful for validating
553      information related to experiments. The validation level is user
554      selectable; the option for MIAME compliance is the
555      most relevant here. When users or server administrators create
556      annotation types in BASE, these annotation types can be marked
557      as required by MIAME and optionally defined to be a list of
558      pre-defined values from a controlled vocabulary. Validation will
559      check for inconsistencies and report errors, and give the user
560      an opportunity to fix issues immediately or later. After
561      resolving the issues raised by the validation, data can be
562      exported for submission to public repositories such as
563      ArrayExpress, Gene Expression Omnibus (GEO), and CIBEX.
564    </para>
565
566  </sect1>
567
568</chapter>