1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
---|
2 | <!-- |
---|
3 | $Id: experiments.html 4509 2008-09-11 20:01:44Z jari $ |
---|
4 | |
---|
5 | Copyright (C) 2005 Jari Hakkinen, Nicklas Nordborg |
---|
6 | Copyright (C) 2006 Jari Hakkinen |
---|
7 | |
---|
8 | This file is part of BASE - BioArray Software Environment. |
---|
9 | Available at http://base.thep.lu.se/ |
---|
10 | |
---|
11 | BASE is free software; you can redistribute it and/or |
---|
12 | modify it under the terms of the GNU General Public License |
---|
13 | as published by the Free Software Foundation; either version 3 |
---|
14 | of the License, or (at your option) any later version. |
---|
15 | |
---|
16 | BASE is distributed in the hope that it will be useful, |
---|
17 | but WITHOUT ANY WARRANTY; without even the implied warranty of |
---|
18 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
---|
19 | GNU General Public License for more details. |
---|
20 | |
---|
21 | You should have received a copy of the GNU General Public License |
---|
22 | along with BASE. If not, see <http://www.gnu.org/licenses/>. |
---|
23 | --> |
---|
24 | <html> |
---|
25 | <head> |
---|
26 | <title>BASE - Core specification - Experiments and analysis</title> |
---|
27 | <link rel=stylesheet type="text/css" href="../../styles.css"> |
---|
28 | </head> |
---|
29 | <body> |
---|
30 | |
---|
31 | <div class="navigation"> |
---|
32 | <a href="../../index.html">BASE</a> |
---|
33 | <img src="../../next.gif"> |
---|
34 | <a href="index.html">Core specification</a> |
---|
35 | <img src="../../next.gif"> |
---|
36 | Experiments and analysis |
---|
37 | </div> |
---|
38 | |
---|
39 | <h1>Experiments and analysis</h1> |
---|
40 | |
---|
41 | <div class="abstract"> |
---|
42 | <p> |
---|
43 | This document covers the details of how BASE groups data |
---|
44 | into experiments and performs analysis on it. |
---|
45 | </p> |
---|
46 | |
---|
47 | <b>Contents</b><br> |
---|
48 | <ol> |
---|
49 | <li><a href="#experiment">Experiment</a> |
---|
50 | <li><a href="#bioassayset">Bioassayset and bioassay</a> |
---|
51 | <li><a href="#intensitymeasure">The intensity measure plugin</a> |
---|
52 | <li><a href="#filtering">Filtering data</a> |
---|
53 | </ol> |
---|
54 | |
---|
55 | <p class="authors"> |
---|
56 | <b>Last updated:</b> $Date: 2008-09-11 20:01:44 +0000 (Thu, 11 Sep 2008) $ |
---|
57 | </p> |
---|
58 | </div> |
---|
59 | |
---|
60 | |
---|
61 | <a name="experiment"> |
---|
62 | <h2>1. Experiment</h2> |
---|
63 | </a> |
---|
64 | |
---|
65 | <ol> |
---|
66 | <li>An experiment represents a study carried out using a set |
---|
67 | of microarrays, including the analysis steps taken. |
---|
68 | <li>Raw data sets can be associated with an experiment, and may be |
---|
69 | dissociated from it at any time. |
---|
70 | <li>The owner of an experiment also owns all the analysis steps |
---|
71 | and other information contained in the experiment, and all access |
---|
72 | control is done on the experiment level. |
---|
73 | <li>An experiment has a number of channels. This is the number of |
---|
74 | intensities handled for each spot in the analysis. There is no |
---|
75 | restriction on the number of channels in the raw data sets |
---|
76 | associated with an experiment. |
---|
77 | <li>[Major implementation detail] It does not need to be possible to |
---|
78 | query against more than one experiment at a time, so the bulk of |
---|
79 | the data for an experiment may be stored in a set of tables |
---|
80 | created specifically for that one experiment. |
---|
81 | </ol> |
---|
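<p>
The implementation detail in the last point above can be illustrated with a
minimal sketch. It assumes an SQLite database, and the table name pattern
<code>exp&lt;id&gt;_spots</code>, the column names and the function
<code>create_experiment_tables</code> are all hypothetical; the
specification does not prescribe any of them.
</p>

```python
import sqlite3


def create_experiment_tables(conn, experiment_id, channels):
    """Create a spot table used by a single experiment (hypothetical layout)."""
    if not isinstance(experiment_id, int) or channels < 1:
        raise ValueError("experiment_id must be an int and channels >= 1")
    # The table name embeds the experiment id, so no other experiment shares it.
    table = f"exp{experiment_id}_spots"
    # One intensity column per channel of the experiment.
    intensity_cols = ", ".join(f"ch{c} REAL" for c in range(1, channels + 1))
    conn.execute(
        f"CREATE TABLE {table} ("
        "bioassay_id INTEGER NOT NULL, "
        "position INTEGER NOT NULL, "
        "reporter_id INTEGER, "
        f"{intensity_cols}, "
        "PRIMARY KEY (bioassay_id, position))"
    )
    return table


conn = sqlite3.connect(":memory:")
table = create_experiment_tables(conn, experiment_id=7, channels=2)
conn.execute(f"INSERT INTO {table} VALUES (1, 1, 42, 1200.0, 800.0)")
rows = conn.execute(f"SELECT ch1, ch2 FROM {table}").fetchall()
```

<p>
Because queries never span experiments, each experiment's table can be
created with exactly as many intensity columns as the experiment has channels.
</p>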
82 | |
---|
83 | |
---|
84 | <a name="bioassayset"> |
---|
85 | <h2>2. Bioassayset and bioassay</h2> |
---|
86 | </a> |
---|
87 | |
---|
88 | <ol> |
---|
89 | <li>A bioassay represents a set of measurements across a number of |
---|
90 | features, reporters, or other entities. Typically, it represents |
---|
91 | intensities measured for the spots of a raw data set. |
---|
92 | <li>A bioassay consists of a number of spots. Each spot has: |
---|
93 | <ul> |
---|
94 | <li>a position number, unique within the bioassay |
---|
95 | <li>a reporter |
---|
96 | <li>as many intensity values as the experiment has channels |
---|
97 | </ul> |
---|
98 | In addition to this, extra values may be attached. See below for |
---|
99 | details on this. |
---|
100 | <li>A bioassay always exists as part of a single bioassayset. |
---|
101 | <li>The bioassaysets of an experiment form a forest of bipartite |
---|
102 | trees, with a transformation separating a bioassayset |
---|
103 | from its parent bioassayset. |
---|
104 | <li>A transformation represents a filtering of the data |
---|
105 | in a bioassayset (in which case it has a single child bioassayset), |
---|
106 | or an arbitrary transformation (in which case there may be zero |
---|
107 | or more child bioassaysets), or the extraction of intensity values |
---|
108 | from the raw data. |
---|
109 | <li>If a bioassayset is not at the root level (i.e., if its parent |
---|
110 | transformation is not a root), its |
---|
111 | bioassays each have a set of parents, which must be part of the |
---|
112 | bioassayset's parent bioassayset. |
---|
113 | <li>Each bioassay points to the set of raw data sets from which its |
---|
114 | intensity values are derived. |
---|
115 | <li>A root bioassayset may be created from any non-empty set of |
---|
116 | raw data sets that are associated with the experiment. This |
---|
117 | creation should be handled by a plugin, as it may be a complex |
---|
118 | task. A plugin for the most common and simple case is described |
---|
119 | in the next section. |
---|
120 | <li>A bioassayset may be marked as containing log ratios rather than |
---|
121 | intensity values. This information is meant to be used by clients |
---|
122 | only, and may be useful when the bioassays are created as |
---|
123 | comparisons between pairs of bioassays. |
---|
124 | |
---|
125 | <li>Bioassays are annotatable, and should inherit annotations from |
---|
126 | their upstream biomaterials, raw data sets and array slides... |
---|
127 | |
---|
128 | <li>Some transformations need to merge spots. This means that in the |
---|
129 | general case positions alone will not be enough to identify the |
---|
130 | parent spot(s) of a bioassay's spots. Therefore there should be |
---|
131 | a position mapping table, where positions on a bioassay are |
---|
132 | mapped to positions on its parent bioassay(s). |
---|
133 | <li>Either all bioassays of a bioassayset use the mapping table, or |
---|
134 | none of them do. |
---|
135 | <li>There is a similar table for mapping to positions on |
---|
136 | the raw data sets. A position on a bioassay may map to multiple |
---|
137 | raw spots (it may also do this merely by being associated with |
---|
138 | multiple raw data sets). |
---|
139 | <li>Either all bioassays of a bioassayset use the raw mapping table, |
---|
140 | or none of them do. |
---|
141 | <li>A root bioassayset may use the raw mapping table. If a bioassayset |
---|
142 | uses the raw mapping table, its descendants must also do so. |
---|
143 | <li>If a bioassayset uses the raw mapping table, each of its bioassays |
---|
144 | may hold the id of an ancestor which had the same raw mapping as |
---|
145 | itself. This will make it possible to avoid unnecessary duplication |
---|
146 | of raw mappings, in the case that the transformation is a filtering |
---|
147 | which does not operate on the raw data. |
---|
148 | <li>When a root bioassayset is created, its bioassays' positions should, |
---|
149 | if possible, uniquely define features on the array designs used for |
---|
150 | the raw data sets. If a lack of LIMS information makes this |
---|
151 | impossible, the positions should at the very least uniquely |
---|
152 | define reporters. This means that if two bioassays are created from |
---|
153 | raw data sets which have different array designs, they should have |
---|
154 | non-overlapping position numbers, but if there is no array design |
---|
155 | information the positions should at least be remapped so that no |
---|
156 | two spots have the same position but different reporters. If any |
---|
157 | bioassay spot ends up with a different position than the |
---|
158 | corresponding raw spot, then the bioassayset must be stored with |
---|
159 | a raw position mapping. |
---|
160 | <li>Extra values may be attached to the spots of a bioassayset. A |
---|
161 | privileged user must first define data types for extra values, |
---|
162 | for example "standard deviation" or "error measure xyz". Other |
---|
163 | value types may be allowed in the future, but for now it will be |
---|
164 | enough to allow floating-point values. Each bioassayset has a |
---|
165 | list of the extra data types its spots have, and each spot must |
---|
166 | have exactly one value of each such data type (and NULL should be |
---|
167 | allowed). |
---|
168 | <li>To make it easier to retain extra values through the analysis |
---|
169 | steps, a bioassayset's list of extra data types may for each |
---|
170 | extra data type point to an ancestor bioassayset whose spots |
---|
171 | already have that extra data type attached. This of course requires |
---|
172 | that the spot positions have not been remapped between the two |
---|
173 | bioassaysets, and that the old bioassayset's extra values are |
---|
174 | still valid for the newer bioassayset. |
---|
175 | <li>Values may be attached to the positions of a bioassayset. As with |
---|
176 | values attached to spots, an admin must create the data type. All |
---|
177 | positions must have an attached value, and again the list of |
---|
178 | attached data types for a bioassayset may point to ancestor |
---|
179 | bioassaysets that have the same attached values. |
---|
180 | <li class=question>[Q] Do we need to duplicate the previous point |
---|
181 | for per-reporter data? Maybe we're OK now that positions map |
---|
182 | uniquely to reporters anyway? |
---|
183 | </ol> |
---|
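<p>
The relationships between spots, bioassays, bioassaysets and transformations
described in this section can be sketched as a small in-memory model. This is
only an illustration: the class names, the field names and the
transformation kind labels (<code>"extraction"</code>, <code>"filter"</code>,
<code>"arbitrary"</code>) are assumptions, and the mapping tables and extra
values are left out.
</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Spot:
    position: int                   # unique within the bioassay
    reporter: str
    intensities: Tuple[float, ...]  # one value per experiment channel


@dataclass
class Bioassay:
    name: str
    spots: Dict[int, Spot] = field(default_factory=dict)
    parents: List["Bioassay"] = field(default_factory=list)

    def add_spot(self, spot, channels):
        if spot.position in self.spots:
            raise ValueError(f"duplicate position {spot.position}")
        if len(spot.intensities) != channels:
            raise ValueError("wrong number of intensity values")
        self.spots[spot.position] = spot


@dataclass(eq=False)
class Transformation:
    kind: str                                # hypothetical label
    source: Optional["BioassaySet"] = None   # None for a root transformation
    children: List["BioassaySet"] = field(default_factory=list)


@dataclass(eq=False)
class BioassaySet:
    channels: int
    parent: Optional[Transformation] = None
    bioassays: List[Bioassay] = field(default_factory=list)

    def is_root(self):
        # Root level means the parent transformation has no source bioassayset.
        return self.parent is None or self.parent.source is None

    def derive(self, kind, n_children=1):
        """Create child bioassaysets through a new transformation."""
        if kind == "filter" and n_children != 1:
            raise ValueError("a filtering has exactly one child bioassayset")
        t = Transformation(kind, source=self)
        t.children = [BioassaySet(self.channels, parent=t)
                      for _ in range(n_children)]
        return t


extraction = Transformation("extraction")
root = BioassaySet(channels=2, parent=extraction)
extraction.children.append(root)
raw = Bioassay("raw-1")
raw.add_spot(Spot(1, "rep-A", (1200.0, 800.0)), root.channels)
root.bioassays.append(raw)
filtered = root.derive("filter").children[0]
```

<p>
Transformations and bioassaysets alternate along every path from a root,
which is what makes each tree bipartite.
</p>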
184 | |
---|
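<p>
The rule in section 2 for positions in a root bioassayset, for the case where
no array design information is available, could be implemented along the
following lines. The strategy shown (keep the raw position unless another
reporter already owns it, otherwise hand out a fresh position) is one possible
choice and not something the specification mandates; the function name and the
data layout are assumptions.
</p>

```python
def assign_positions(raw_data_sets):
    """Assign bioassay positions so that each position has one reporter.

    raw_data_sets: non-empty list of lists of (raw_position, reporter) pairs.
    Returns (mapping, needs_raw_mapping), where mapping holds
    (data_set_index, raw_position, position) triples.
    """
    # Fresh positions start above every raw position, so they never collide.
    next_free = 1 + max(pos for spots in raw_data_sets for pos, _ in spots)
    reporter_at = {}   # position -> reporter that owns it
    remapped = {}      # (raw_position, reporter) -> replacement position
    mapping = []
    for index, spots in enumerate(raw_data_sets):
        for raw_pos, reporter in spots:
            owner = reporter_at.setdefault(raw_pos, reporter)
            if owner == reporter:
                position = raw_pos
            else:
                key = (raw_pos, reporter)
                if key not in remapped:
                    remapped[key] = next_free
                    reporter_at[next_free] = reporter
                    next_free += 1
                position = remapped[key]
            mapping.append((index, raw_pos, position))
    # A raw position mapping is required as soon as any spot has moved.
    needs_raw_mapping = any(raw != pos for _, raw, pos in mapping)
    return mapping, needs_raw_mapping


mapping, needs_raw_mapping = assign_positions([
    [(1, "A"), (2, "B")],
    [(1, "A"), (2, "C")],   # position 2 clashes: reporter C instead of B
])
```

<p>
In this example the second raw data set's spot at raw position 2 is moved to
position 3, so the resulting bioassayset would have to be stored with a raw
position mapping.
</p>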
185 | |
---|
186 | <a name="intensitymeasure"> |
---|
187 | <h2>3. The intensity measure plugin</h2> |
---|
188 | </a> |
---|
189 | |
---|
190 | <ol> |
---|
191 | <li>This is described here because there's nowhere better to place |
---|
192 | it at the moment. |
---|
193 | <li>Extracting one intensity value per channel from the raw data set |
---|
194 | is not entirely trivial, as there may be many measures of intensity |
---|
195 | available, and different ways of doing background correction. |
---|
196 | A (properly privileged) user should be able to define intensity |
---|
197 | measures for a given raw data type and number of channels. For |
---|
198 | each channel, an intensity measure consists of a set of terms, each comprising: |
---|
199 | <ul> |
---|
200 | <li>column specification |
---|
201 | <li>coefficient |
---|
202 | <li>flag: spot value or mean over raw data set |
---|
203 | </ul> |
---|
204 | That is, an intensity measure says (for each channel) which columns |
---|
205 | should be used, how much each should contribute to the intensity |
---|
206 | value, and whether the mean over all spots should be used instead |
---|
207 | of the value for each individual spot. |
---|
208 | <li>The bioassays of a root bioassayset may be created from different |
---|
209 | intensity measures, and each such bioassay should hold information |
---|
210 | about what intensity measure was used to create it. |
---|
211 | </ol> |
---|
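<p>
As an illustration of the intensity measure described above, the following
sketch computes one intensity per channel as a weighted sum of raw data
columns. It assumes that the contributions of the terms are simply added
together, which the text suggests but does not state outright. The column
names (<code>fg1</code>, <code>bg1</code>, ...) and the function name are
hypothetical.
</p>

```python
from statistics import fmean


def extract_intensities(raw_rows, measure):
    """Apply an intensity measure to the spots of one raw data set.

    raw_rows: list of dicts mapping column name -> value, one per raw spot.
    measure:  one list of (column, coefficient, use_mean) terms per channel.
    Returns one tuple of channel intensities per raw spot.
    """
    column_means = {}

    def mean_of(column):
        # Mean over the whole raw data set, computed once per column.
        if column not in column_means:
            column_means[column] = fmean(row[column] for row in raw_rows)
        return column_means[column]

    result = []
    for row in raw_rows:
        intensities = []
        for terms in measure:
            value = 0.0
            for column, coefficient, use_mean in terms:
                source = mean_of(column) if use_mean else row[column]
                value += coefficient * source
            intensities.append(value)
        result.append(tuple(intensities))
    return result


raw_rows = [
    {"fg1": 100.0, "bg1": 10.0, "fg2": 50.0, "bg2": 20.0},
    {"fg1": 200.0, "bg1": 30.0, "fg2": 80.0, "bg2": 40.0},
]
measure = [
    # Channel 1: foreground minus the spot's own background.
    [("fg1", 1.0, False), ("bg1", -1.0, False)],
    # Channel 2: foreground minus the mean background of the raw data set.
    [("fg2", 1.0, False), ("bg2", -1.0, True)],
]
intensities = extract_intensities(raw_rows, measure)
```

<p>
A negative coefficient on a background column gives a simple background
correction, and the flag selects between local and global background.
</p>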
212 | |
---|
213 | |
---|
214 | <a name="filtering"> |
---|
215 | <h2>4. Filtering data</h2> |
---|
216 | </a> |
---|
217 | |
---|
218 | <ol> |
---|
219 | <li>The task of the filtering system is to filter the spots of one |
---|
220 | bioassayset, producing a new bioassayset. |
---|
221 | </ol> |
---|
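<p>
A filtering step of the kind described above might look like the following
sketch. The data layout (a dict of bioassays, each a dict from position to
intensity tuple) and the predicate signature are assumptions made for the
example. Positions are kept unchanged, so the child would not need a position
mapping table.
</p>

```python
def filter_bioassayset(bioassayset, keep):
    """Return a new bioassayset holding only the spots accepted by keep.

    bioassayset: dict mapping bioassay name -> {position: intensities}.
    keep:        predicate called as keep(position, intensities).
    The parent bioassayset is left untouched.
    """
    child = {}
    for name, spots in bioassayset.items():
        child[name] = {
            position: intensities
            for position, intensities in spots.items()
            if keep(position, intensities)
        }
    return child


parent = {
    "assay-1": {1: (1200.0, 800.0), 2: (15.0, 900.0), 3: (640.0, 30.0)},
}
# Keep spots whose intensity is at least 50 in every channel.
child = filter_bioassayset(parent, lambda pos, vals: min(vals) >= 50.0)
```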
222 | |
---|
223 | |
---|
224 | </body> |
---|
225 | </html> |
---|
226 | |
---|