Changeset 2798


Timestamp:
Oct 24, 2006, 2:34:57 PM
Author:
Jari Häkkinen
Message:

Updated information about affymetrix in BASE.

File:
1 edited

  • trunk/doc/affymetrix.txt

--- trunk/doc/affymetrix.txt (r2342)
+++ trunk/doc/affymetrix.txt (r2798)
@@ -23,19 +23,5 @@
 CEL file formats. There are EXP and RPT files, and also CHP files.
 
-
-
-Affymetrix/BASE issues:
-
-- The fact that BASE loads all raw result in a single table is a
-  problem for Affymetrix as Affymetrix chips soon have ~6 millions
-  probes.
-
-- Most of the plug-ins/tools freely available for Affymetrix expect
-  the Affymetrix file(s) as input, and since those are in an
-  Affymetrix binary format, it prevents from trying to hack these R
-  packages.
-
-- the fact that Affymetric has stuff like probeset, match/mismatch
-  oligos.
+CEL and CDF files are the only ones supported and needed by BASE.
 
 
@@ -79,11 +65,25 @@
 1) Loading Affymatrix probe values into the database is useless.
 
+   BASE stores CEL files as raw data, i.e. no data from the cel files
+   are pushed into the database.
+
+
 2) Plug-ins should be able to accept files as input.
+
+   The "RMAExpress plug-in for BASE" works with CEL and CDF files.
+
 
 3) BASE should be aware that raw data might not be stored in database
    tables but rather in files.
 
+   See 1)
+
+
 4) The dynamic part of BASE should also deal with data storage in
    files or database tables (or even a mix).
+
+   BASE stores files and directory hierarchies, and the RMAExpress
+   plug-ins pushes its result back into database tables.
+
 
 Clarification for item 3 and 4:
@@ -96,15 +96,4 @@
 Tentative list of descisions:
 
-A) Raw Affymetrix data will not be loaded into database
-   tables. [Target milestone: now]
-
-B) The Feature* classes/tables added in BASE 2 to allow for storage of
-   raw Affymetrix data are deprecated will be removed. [Target
-   milestone: BASE 2.0RC2]
-
-C) The CEL-files will be the raw Affymetrix data in BASE 2. This will
-   require some modifications to the core (rawdata related). [Target
-   milestone: BASE 2.0RC2]
-
 D) We have discussed to allow for four different types of imports,
    implemented in the following order:
@@ -114,49 +103,28 @@
       designs.
 
-   i) CEL-file import using a free C++ tool. The idea is (if possible)
-      to convert this to Java and ship it with BASE 2.0. [Target
-      milestone BASE 2.0RC2]
+      CDF files are stored as Reporter maps, i.e., array designs. CEL
+      files are associated with a CDF file when imported as raw
+      data. The RMAExpress plug-in requires a copy of the CDF when it
+      is running.
+
 
    ii) Create a CEL-file importer using the available R-packages. This
       will include normalization that provides provides minimum gcrma
       support from within base without other support based on
-      RMAExpress code [Target milestone BASE 2.0]
+      RMAExpress code.
 
-   iii) CHP-file import. For completeness but may be fairly
-      useless. No target release set since import of CHP data will
-      probably not be supported. [No target milestone]
+      Other normaliziation should be straightforward to incorporate
+      into the RMAExpress plug-in. If the plug-in is expanded, the
+      plug-in started should be extended to allow the user to choose
+      normalization scheme.
 
-   The current view on where the imported (non-raw) data should be
-   stored in the dynamic tables to allow other plug-ins to manipulate
-   the probe sets. Maybe we should in some cases store data in files?
-   [Target milestone for last question BASE 2.0+]?
-
-E) extended-properties.xml is data format dependent (as well as
-   migration dependent when migrating). The default is 2 channel
-   style. NuGO is to create a template for Affymetrix data. [Target
-   milestone BASE 2.0RC2]
-
-   This means that the extended-properties.xml file should include
-   Affymetrix related annotation (e.g. Ensembl gene id, Uniprot id,
-   refseq id...)
 
 F) Allow to create batch of slides in one go from any given Affymetrix
-   design using an import function [Target milestone BASE 2.0RC2]
+   design using an import function.
 
-G) Allow for upload and parsing of EXP and RPT files loading into
-   specific fields of relevant parameters (scaling
-   factor...). Philippe to document EXP and RPT format and important
-   tags. [Target milestone ?]
-
-
+   To be done.
 
 
 Other remarks and links:
-
-We have a Affymetrix data set available from
-/home/lev/jari/projects/base/base2/testdata/expt_111_cel.zip. The
-experiment is based on Affymetrix design MG_U74Av2. The archive
-contains cell files and a descriptive file formated according to the
-tab2mage specs (http://tab2mage.sourceforge.net/docs/spreadsheet.html)
 
 There is a resource for Affymetrix designs and annotations provided by
@@ -164,16 +132,6 @@
 http://www.bioconductor.org/packages/data/annotation/stable/src/contrib/html/
 
-RMAExpress is available at
-http://www.stat.berkeley.edu/~bolstad/RMAExpress/RMAExpress.html
-
-For item D.ii) One could use the annotation data package and use the
-Reporter Import plugin. This might requires some changes to the
-Reporter table to make it suitable fro Affymetrix annotation or just
-map to some of the fields.
-
-For item D.iii) Chip-to-chip normalization must be done, not possible
-with .CHP files. This will require a plug-in (with documentation). If
-R (BioConductor) is to be used we should contact Henrik Bengtsson,
-creator of a BASE 1.2 R dispatcher.
+For R (BioConductor) we should contact Henrik Bengtsson, creator of a
+BASE 1.2 R dispatcher.
 
 Future development: Connectivity to R/bioc and develop specific