Changeset 2798
Timestamp: Oct 24, 2006, 2:34:57 PM
File: 1 edited
trunk/doc/affymetrix.txt
--- trunk/doc/affymetrix.txt (r2342)
+++ trunk/doc/affymetrix.txt (r2798)
@@ -23,19 +23,5 @@
 CEL file formats. There are EXP and RPT files, and also CHP files.
 
-
-
-Affymetrix/BASE issues:
-
-- The fact that BASE loads all raw result in a single table is a
-problem for Affymetrix as Affymetrix chips soon have ~6 millions
-probes.
-
-- Most of the plug-ins/tools freely available for Affymetrix expect
-the Affymetrix file(s) as input, and since those are in an
-Affymetrix binary format, it prevents from trying to hack these R
-packages.
-
-- the fact that Affymetric has stuff like probeset, match/mismatch
-oligos.
+CEL and CDF files are the only ones supported and needed by BASE.
 
 
@@ -79,11 +65,25 @@
 1) Loading Affymatrix probe values into the database is useless.
 
+BASE stores CEL files as raw data, i.e. no data from the cel files
+are pushed into the database.
+
+
 2) Plug-ins should be able to accept files as input.
+
+The "RMAExpress plug-in for BASE" works with CEL and CDF files.
+
 
 3) BASE should be aware that raw data might not be stored in database
 tables but rather in files.
 
+See 1)
+
+
 4) The dynamic part of BASE should also deal with data storage in
 files or database tables (or even a mix).
+
+BASE stores files and directory hierarchies, and the RMAExpress
+plug-ins pushes its result back into database tables.
+
 
 Clarification for item 3 and 4:
@@ -96,15 +96,4 @@
 Tentative list of descisions:
 
-A) Raw Affymetrix data will not be loaded into database
-tables. [Target milestone: now]
-
-B) The Feature* classes/tables added in BASE 2 to allow for storage of
-raw Affymetrix data are deprecated will be removed. [Target
-milestone: BASE 2.0RC2]
-
-C) The CEL-files will be the raw Affymetrix data in BASE 2. This will
-require some modifications to the core (rawdata related). [Target
-milestone: BASE 2.0RC2]
-
 D) We have discussed to allow for four different types of imports,
 implemented in the following order:
@@ -114,49 +103,28 @@
 designs.
 
-i) CEL-file import using a free C++ tool. The idea is (if possible)
-to convert this to Java and ship it with BASE 2.0. [Target
-milestone BASE 2.0RC2]
+CDF files are stored as Reporter maps, i.e., array designs. CEL
+files are associated with a CDF file when imported as raw
+data. The RMAExpress plug-in requires a copy of the CDF when it
+is running.
+
 
 ii) Create a CEL-file importer using the available R-packages. This
 will include normalization that provides provides minimum gcrma
 support from within base without other support based on
-RMAExpress code [Target milestone BASE 2.0]
+RMAExpress code.
 
-iii) CHP-file import. For completeness but may be fairly
-useless. No target release set since import of CHP data will
-probably not be supported. [No target milestone]
+Other normaliziation should be straightforward to incorporate
+into the RMAExpress plug-in. If the plug-in is expanded, the
+plug-in started should be extended to allow the user to choose
+normalization scheme.
 
-The current view on where the imported (non-raw) data should be
-stored in the dynamic tables to allow other plug-ins to manipulate
-the probe sets. Maybe we should in some cases store data in files?
-[Target milestone for last question BASE 2.0+]?
-
-E) extended-properties.xml is data format dependent (as well as
-migration dependent when migrating). The default is 2 channel
-style. NuGO is to create a template for Affymetrix data. [Target
-milestone BASE 2.0RC2]
-
-This means that the extended-properties.xml file should include
-Affymetrix related annotation (e.g. Ensembl gene id, Uniprot id,
-refseq id...)
 
 F) Allow to create batch of slides in one go from any given Affymetrix
-design using an import function [Target milestone BASE 2.0RC2]
+design using an import function.
 
-G) Allow for upload and parsing of EXP and RPT files loading into
-specific fields of relevant parameters (scaling
-factor...). Philippe to document EXP and RPT format and important
-tags. [Target milestone ?]
-
-
+To be done.
 
 
 Other remarks and links:
-
-We have a Affymetrix data set available from
-/home/lev/jari/projects/base/base2/testdata/expt_111_cel.zip. The
-experiment is based on Affymetrix design MG_U74Av2. The archive
-contains cell files and a descriptive file formated according to the
-tab2mage specs (http://tab2mage.sourceforge.net/docs/spreadsheet.html)
 
 There is a resource for Affymetrix designs and annotations provided by
@@ -164,16 +132,6 @@
 http://www.bioconductor.org/packages/data/annotation/stable/src/contrib/html/
 
-RMAExpress is available at
-http://www.stat.berkeley.edu/~bolstad/RMAExpress/RMAExpress.html
-
-For item D.ii) One could use the annotation data package and use the
-Reporter Import plugin. This might requires some changes to the
-Reporter table to make it suitable fro Affymetrix annotation or just
-map to some of the fields.
-
-For item D.iii) Chip-to-chip normalization must be done, not possible
-with .CHP files. This will require a plug-in (with documentation). If
-R (BioConductor) is to be used we should contact Henrik Bengtsson,
-creator of a BASE 1.2 R dispatcher.
+For R (BioConductor) we should contact Henrik Bengtsson, creator of a
+BASE 1.2 R dispatcher.
 
 Future development: Connectivity to R/bioc and develop specific
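
Illustration (not part of the changeset): the text added in r2798 says
that CEL files stay on disk as raw data, that each CEL file is
associated with a CDF file (the array design), and that the RMAExpress
plug-in needs a copy of the CDF when it runs. A minimal Python sketch of
that file-based association follows. The class names, the
plugin_inputs helper, the file paths, and the rule that all CEL files
in one run share a single CDF are assumptions made for illustration;
none of them is BASE's actual API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ArrayDesign:
    """Hypothetical stand-in for an array design backed by a CDF file."""
    name: str
    cdf_path: PurePosixPath


@dataclass(frozen=True)
class RawBioAssay:
    """Hypothetical stand-in for raw data kept as a CEL file, not as table rows."""
    name: str
    cel_path: PurePosixPath
    design: ArrayDesign


def plugin_inputs(assays):
    """Collect the files a file-based plug-in would need.

    Returns the single shared CDF path and the list of CEL paths.
    Assumption: every CEL file in one run refers to the same design.
    """
    if not assays:
        raise ValueError("no raw bioassays given")
    designs = {assay.design for assay in assays}
    if len(designs) != 1:
        raise ValueError("all CEL files must share one CDF")
    (design,) = designs
    return {"cdf": design.cdf_path, "cel": [a.cel_path for a in assays]}


if __name__ == "__main__":
    # Paths are made up; only the design name MG_U74Av2 appears in the document.
    design = ArrayDesign("MG_U74Av2", PurePosixPath("designs/MG_U74Av2.cdf"))
    assays = [
        RawBioAssay("sample1", PurePosixPath("raw/sample1.cel"), design),
        RawBioAssay("sample2", PurePosixPath("raw/sample2.cel"), design),
    ]
    print(plugin_inputs(assays))
```

Nothing from the CEL files is copied into the model here; only paths
are tracked, which mirrors the statement that no data from the CEL
files are pushed into the database.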