Nov 1, 2011, 5:58:55 PM
Jari Häkkinen

Addresses #523.

1 edited


  • trunk/doc/src/docbook/overview/features.xml

    r5782 r5845  
    7 7  $Id$
    9   Copyright (C) 2008 Jari Häkkinen
     9  Copyright (C) 2008, 2011 Jari Häkkinen
    11 11  This file is part of BASE - BioArray Software Environment.
    32 32  <para>
    33     This chapter will explain the important features of BASE.
     33    The BASE application features many components: MIAME compliance,
     34    multi-user support, data sharing, data access management, array and
     35    biomaterial LIMS, multiple array platforms, RNA-seq
     36    support, extensibility, configurable plug-ins, annotation
     37    customisation, streamlined access to analysis tools, integration
     38    of <ulink url='http://www.tm4.org/mev/'>MultiExperiment Viewer
     39    (MeV)</ulink>, web services API, and more. To support all
     40    these components, the underlying relational database has grown
     41    large and complex, especially since BASE itself works with
     42    objects, which requires additional database tables to keep track
     43    of objects stored in a relational database. Thus, rather than trying to
     44    describe every feature in detail here, we highlight some of the
     45    more important features.
    34 46  </para>
     48  <sect1 id="features.webinterface">
     49    <title>Web interface</title>
     51    <para>
     52      The entire system is accessed through a web interface over the
     53      Internet using a standard web browser, such as Firefox, Safari,
     54      Opera, or Internet Explorer. Access privileges to a particular
     55      BASE installation are managed by personal accounts through the
     56      web interface. A local administrator creates new user accounts
     57      with specific roles and access privileges and has overall
     58      managerial responsibility for an individual BASE
     59      installation. With the exception of the administrator, who has
     60      global data access, individual users have sole access to, and
     61      control over, the data they enter. Users can share data
     62      they own (or have share credentials for) with other users of the
     63      same BASE installation.
     64    </para>
     66  </sect1>
     68  <sect1 id="features.datamangement">
     69    <title>Information and annotation management</title>
     71    <para>
     72      BASE features a biomaterial LIMS tracking biological material
     73      from its source to hybridisation/sequencing and ultimately to
     74      raw data and analysis. All events throughout sample handling are
     75      tracked and information on used and remaining quantities,
     76      physical sample locations, quality control information, and
     77      sample relations is stored in BASE. Racks or boxes holding
     78      biomaterials can be created as BioPlates and plate events are
     79      easily performed for extraction or labelling events. Although
     80      becoming less commonly used, the array production LIMS of
     81      previous BASE versions is retained to support researchers
     82      with spotting facilities, e.g., protein array production and
     83      BAC array printing that may not be commercially available.
     84    </para>
     86    <para>
     87      Events in the biomaterial and array LIMS can be annotated with
     88      protocols and event dates, and most items can be annotated with
     89      customisable annotation types such as floats, integers, dates,
     90      and Boolean flags. Change history for biomaterial items is available
     91      if configured and can be used to track modifications in the database.
     92      Annotations are either free form or from a preset list of values,
     93      and can be marked as required for MIAME compliance. The annotation
     94      system is searchable and the user can select any annotations to be
     95      experimental factors in analysis, whereby they become available to
     96      analysis plug-ins and plot tools.
     97    </para>
     99  </sect1>
     101  <sect1 id="features.sharingandprivacy">
     102    <title>Data sharing and privacy</title>
     104    <para>
     105      One of the important features of BASE is its capabilities as a
     106      local data repository. The repository functionality is complemented
     107      with data grouping, sharing, and privacy policies. A BASE
     108      project is used to group items (biomaterial, raw data, and
     109      experiments) into a logical entity, and a BASE experiment is a
     110      collection of bioassays, e.g., array data, grouped logically together
     111      for further analysis. All items can co-exist in several projects
     112      and experiments without any unnecessary copying of information.
     113    </para>
     115    <para>
     116      Data privacy is guarded by the data owner and BASE allows the
     117      owner to set data access rules. To this end, each item in BASE
     118      is owned by a user, who can share it with
     119      colleagues. The grouping of data in projects allows the data
     120      owner to simply include other users in a project in order to
     121      share data. Each item can have different access levels even
     122      within a project, and project members can have different
     123      privileges. The data access rules are very flexible and can be
     124      overwhelming since access levels on almost any item can be
     125      individually set. However, using projects, the proper access
     126      levels can be set at a single point of interaction.
     127    </para>
     129  </sect1>
     131  <sect1 id="features.directorystructure">
     132    <title>File and directory structure</title>
     134    <para>
     135      BASE has an integrated file system that lets
     136      researchers collect all data files related to a project in
     137      one single storage location. Data files are uploaded using a web
     138      browser or an ftp client. The file storage is an integral part
     139      of a strategy to store all experiment-relevant data in BASE,
     140      even data types not already supported in analysis. Collecting
     141      all data allows future reuse of the data as more data are
     142      produced and new analysis tools become available.
     143    </para>
     145  </sect1>
     147  <sect1 id="features.plugininfrastructure">
     148    <title>Plugin and extension infrastructure</title>
     151    Analysis, extensions, and plug-ins
     153    <para>
     154      BASE features a hierarchically organised analysis interface that
     155      allows data filtering, normalisation, transformation, and other
     156      analyses. Parameters and settings are automatically stored for
     157      each step in the analysis. The selection of analysis tools
     158      depends on array type and available plug-ins; a wide range
     159      of tools is pre-installed with BASE, and optional plug-ins can
     160      be downloaded from the <ulink
     161      url='http://baseplugins.thep.lu.se'>BASE plug-in site
     162      </ulink>. BASE capitalises on other software tools, such as
     163      MeV, by integrating them into the user interface. Such
     164      integration provides streamlined access to analysis modules in
     165      external tools. BASE even features a rudimentary manual
     166      transform creator that enables researchers to add analysis steps
     167      performed independently of BASE to the hierarchical overview of
     168      analyses. The transform creator enables storage of
     169      result files and parameter information for archival, tracking,
     170      and sharing purposes.
     171    </para>
     173    <para>
     174      The analysis of genomics data is continuously evolving with new
     175      methods and techniques. To this end BASE provides extension and
     176      plug-in application programming interfaces (APIs) to enable straightforward
     177      additions of new analysis tools. The use of the APIs is well
     178      documented and there are numerous examples of how to create
     179      extensions. The MeV and ftp-server integrations both utilise the
     180      extension mechanism, and the automatically generated overview
     181      plots available in the experimental analysis view are also
     182      extensions. The plug-in API is used for all data imports and
     183      exports, and most analysis tools, providing new developers a lot
     184      of example code to examine when they create BASE plug-ins.
     185    </para>
     187  </sect1>
     189  <sect1 id="features.batchdata">
     190    <title>Batch upload and download of data</title>
     192    <para>
     193      File, annotation, and item upload can be done asynchronously as
     194      data are generated or information becomes available. To relieve
     195      researchers of the tedious task of entering data one by one, a
     196      set of batch importers was created; the information generated
     197      throughout the experimental work is uploaded to BASE in plain
     198      tab-separated files. These files are supplied to batch importer
     199      plug-ins that parse the files and create items and associations
     200      according to the information in the files. The same plug-ins can
     201      be used to batch update many items. Similarly, annotating items
     202      is done by creating tab-separated files with annotation
     203      information, uploading these to BASE, and loading the file
     204      content into the database using annotation importers. If needed,
     205      annotations are easily updated with the same mechanism.
     206    </para>
     208    <para>
     209      Files uploaded to BASE are stored in the directory structure
     210      within BASE and multiple files are easily transferred to BASE
     211      either packaged in compressed files with a single upload action,
     212      or by using an ftp client supporting transfer of file
     213      structures. Similarly, downloading multiple files is
     214      straightforward either using an ftp client or by a single click
     215      in the BASE web interface. Download of items is done through
     216      item listing views enabling users to filter and select what
     217      information should be downloaded.
     218    </para>
     220  </sect1>
    36 222  <sect1 id="features.supportedarrays">
    39 225    <para>
    40       BASE supports many different vendor-specific and custom-printed
    41       microarray platforms and data formats; there are even users that
    42       use BASE for protein arrays. For 2-channel array platforms it is
    43       straightforward to customize BASE for a specific array platform;
    44       the platform simply needs to be adapted to the (BASE) Generic
    45       platform. The adaptation consists of creating a raw data format
    46       definition and configuring raw data importers, or making use of
    47       already available raw data formats. However, it is not always
    48       possible to make a natural mapping of a platform to the Generic
    49       platform. Platforms such as Affymetrix and Illumina
    50       cannot naturally be mapped on to the Generic 2-channel
    51       platform. For Affymetrix, BASE comes with a specific Affymetrix
    52       platform, and Illumina can be supported by customizing BASE.
     226      There are many types of microarrays, techniques, and brands
     227      available for researchers: one- or two-channel hybridizations,
     228      spotted cDNA/oligo arrays, Affymetrix (GeneChip), Illumina (SNP,
     229      DASL, WGEX, microRNA), aCGH, SNP, tiling arrays, and many
     230      more. Data are produced in different file formats that must be
     231      treated differently depending on type.
     232    </para>
     234    <para>
     235      Many platforms and experimental setups are supported in
     236      downstream analysis but some microarray techniques cannot
     237      currently be analysed within BASE simply because of a lack of support
     238      in available plug-ins. The problem is resolved by creating new,
     239      or extending available, plug-ins that add analysis capabilities
     240      for platforms and techniques not readily supported in
     241      analysis. Extending analysis capabilities to new technologies is
     242      only a matter of local needs and resources. We add support for
     243      platforms in use at the Lund University microarray facility and
     244      make our tools freely available to the community.
     245    </para>
     247    <para>
     248      For two-channel array platforms it is straightforward to
     249      customize BASE for a specific array platform; the platform
     250      simply needs to be adapted to the (BASE) Generic platform. The
     251      adaptation consists of creating a raw data format definition and
     252      configuring raw data importers, or making use of already available
     253      raw data formats. However, it is not always possible to make a
     254      natural mapping of a platform to the Generic platform. Platforms
     255      such as Affymetrix and Illumina cannot naturally be
     256      mapped on to the Generic two-channel platform. For Affymetrix,
     257      BASE comes with a specific Affymetrix platform, and Illumina can
     258      be supported by customizing BASE (go to the <ulink
     259      url="http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina">
     260      Illumina package</ulink> web site for more information on adding
     261      Illumina support to BASE).
    53 262    </para>
    128 337              is
    129 338              an <ulink url="http://baseplugins.thep.lu.se/wiki/net.sf.basedb.illumina">
    130                 Illumina plug-in</ulink> available for seamless
     339                Illumina package</ulink> available for seamless
    131 340              integration of the Illumina array platform to BASE.
    132 341            </para>
    289 498  </sect1>
     500  <sect1 id="features.repositoryandstandards">
     501    <title>Repository and standards</title>
     503    <para>
     504      The Microarray Gene Expression Data Society (MGED) develops and
     505      maintains standards for data acquisition, representation, and
     506      interchange such as the MIAME guidelines, the MAGE-TAB
     507      interchange format, and the MGED Ontology for microarray
     508      experiments. BASE does not enforce the use of the MGED standards
     509      but supports storage of information required by MIAME. BASE has
     510      an experiment item overview functionality useful for validating
     511      information related to experiments. The validation level is
     512      user-selectable; the option regarding MIAME compliance is
     513      most relevant here. When users or server administrators create
     514      annotation types in BASE, the annotation values can be marked
     515      as required by MIAME and optionally defined to be a list of
     516      pre-defined values from a controlled vocabulary. Validation will
     517      check for inconsistencies and report errors, and give the user
     518      an opportunity to fix issues immediately or later. After
     519      resolving the issues raised by the validation, data can be
     520      exported for submission to public repositories such as
     521      ArrayExpress, Gene Expression Omnibus (GEO), and CIBEX.
     522    </para>
     524  </sect1>