<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!--
  $Id: experiments.html 4509 2008-09-11 20:01:44Z jari $

  Copyright (C) 2005 Jari Hakkinen, Nicklas Nordborg
  Copyright (C) 2006 Jari Hakkinen

  This file is part of BASE - BioArray Software Environment.
  Available at http://base.thep.lu.se/

  BASE is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 3
  of the License, or (at your option) any later version.

  BASE is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with BASE. If not, see <http://www.gnu.org/licenses/>.
-->
<html>
  <head>
    <title>BASE - Core specification - Experiments and analysis</title>
  <link rel=stylesheet type="text/css" href="../../styles.css">
  </head>
<body>

<div class="navigation">
  <a href="../../index.html">BASE</a>
  <img src="../../next.gif">
  <a href="index.html">Core specification</a>
  <img src="../../next.gif">
  Experiments and analysis
</div>

  <h1>Experiments and analysis</h1>

  <div class="abstract">
    <p>
    This document covers the details of how BASE groups data
      into experiments and performs analysis on it.
    </p>

    <b>Contents</b><br>
    <ol>
    <li><a href="#experiment">Experiment</a>
    <li><a href="#bioassayset">Bioassayset and bioassay</a>
    <li><a href="#intensitymeasure">The intensity measure plugin</a>
    <li><a href="#filtering">Filtering data</a>
    </ol>

    <p class="authors">
    <b>Last updated:</b> $Date: 2008-09-11 20:01:44 +0000 (Thu, 11 Sep 2008) $
    </p>
  </div>


  <a name="experiment">
  <h2>1. Experiment</h2>
  </a>

  <ol>
  <li>An experiment represents an experiment carried out using a set
    of microarrays, including the analysis steps taken.
  <li>Raw data sets can be associated with an experiment, and may be
    dissociated from it at any time.
  <li>The owner of an experiment also owns all the analysis steps
    and other information contained in the experiment, and all access
    control is done at the experiment level.
  <li>An experiment has a number of channels. This is the number of
    intensities handled for each spot in the analysis. There is no
    restriction on the number of channels in the raw data sets
    associated with an experiment.
  <li>[Major implementation detail] It does not need to be possible to
    query against more than one experiment at a time, so the bulk of
    the data for an experiment may be stored in a set of tables
    created specifically for that one experiment.
  </ol>


  <a name="bioassayset">
  <h2>2. Bioassayset and bioassay</h2>
  </a>

  <ol>
  <li>A bioassay represents a set of measurements across a number of
    features, reporters, or other entities. Typically, it represents
    intensities measured for the spots of a raw data set.
  <li>A bioassay consists of a number of spots. Each spot has:
    <ul>
    <li>a position number, unique within the bioassay
    <li>a reporter
    <li>as many intensity values as the experiment has channels
    </ul>
    In addition to this, extra values may be attached. See below for
    details on this.
  <li>A bioassay always exists as part of a single bioassayset.
  <li>The bioassaysets of an experiment form a forest of bipartite
    trees, with a transformation separating a bioassayset
    from its parent bioassayset.
  <li>A transformation represents a filtering of the data
    in a bioassayset (in which case it has a single child bioassayset),
    or an arbitrary transformation (in which case there may be zero
    or more child bioassaysets), or the extraction of intensity values
    from the raw data.
  <li>If a bioassayset is not at the root level (i.e., if its parent
    transformation is not a root), its
    bioassays each have a set of parents, which must be part of the
    bioassayset's parent bioassayset.
  <li>Each bioassay points to the set of raw data sets from which its
    intensity values are derived.
  <li>A root bioassayset may be created from any non-empty set of
    raw data sets that are associated with the experiment. This
    creation should be handled by a plugin, as it may be a complex
    task. A plugin for the most common and simple case is described
    in the next section.
  <li>A bioassayset may be marked as containing log ratios rather than
    intensity values. This information is meant to be used by clients
    only, and may be useful when the bioassays are created as
    comparisons between pairs of bioassays.

  <li>Bioassays are annotatable, and should inherit annotations from
    their upstream biomaterials, raw data sets and array slides...

  <li>Some transformations need to merge spots. This means that in the
    general case positions alone will not be enough to identify the
    parent spot(s) of a bioassay's spots. Therefore there should be
    a position mapping table, where positions on a bioassay are
    mapped to positions on its parent bioassay(s).
  <li>Either all bioassays of a bioassayset use the mapping table, or
    none of them do.
  <li>There is a similar table for mapping to positions on
    the raw data sets. A position on a bioassay may map to multiple
    raw spots (it may also do this merely by being associated with
    multiple raw data sets).
  <li>Either all bioassays of a bioassayset use the raw mapping table,
    or none of them do.
  <li>A root bioassayset may use the raw mapping table. If a bioassayset
    uses the raw mapping table, its descendants must also do so.
  <li>If a bioassayset uses the raw mapping table, each of its bioassays
    may hold the id of an ancestor which had the same raw mapping as
    itself. This makes it possible to avoid unnecessary duplication
    of raw mappings in the case that the transformation is a filtering
    which does not operate on the raw data.
  <li>When a root bioassayset is created, the positions of its bioassays
    should, if possible, uniquely define features on the array designs
    used for the raw data sets. If a lack of LIMS information makes this
    impossible, the positions should at the very least uniquely
    define reporters. This means that if two bioassays are created from
    raw data sets which have different array designs, they should have
    non-overlapping position numbers, but if there is no array design
    information the positions should at least be remapped so that no
    two spots have the same position but different reporters. If any
    bioassay spot ends up with a different position than the
    corresponding raw spot, then the bioassayset must be stored with
    a raw position mapping.
  <li>Extra values may be attached to the spots of a bioassayset. A
    privileged user must first define data types for extra values,
    for example "standard deviation" or "error measure xyz". Other
    value types may be allowed in the future, but for now it will be
    enough to allow floating-point values. Each bioassayset has a
    list of the extra data types its spots have, and each spot must
    have exactly one value of each such data type (NULL should be
    allowed).
  <li>To make it easier to retain extra values through the analysis
    steps, a bioassayset's list of extra data types may for each
    extra data type point to an ancestor bioassayset whose spots
    already have that extra data type attached. This of course requires
    that the spot positions have not been remapped between the two
    bioassaysets, and that the old bioassayset's extra values are
    still valid for the newer bioassayset.
  <li>Values may be attached to the positions of a bioassayset. As with
    values attached to spots, an admin must create the data type. All
    positions must have an attached value, and again the list of
    attached data types for a bioassayset may point to ancestor
    bioassaysets that have the same attached values.
  <li class=question>[Q] Do we need to duplicate the previous point
    for per-reporter data? Maybe we're OK now that positions map
    uniquely to reporters anyway?
  </ol>
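
  <p>
  The position rule for root bioassaysets created without array design
  information can be illustrated with a minimal Python sketch. Everything
  named here (<code>remap_positions</code>, the tuple layout, the choice to
  hand out fresh positions above the highest raw position) is a hypothetical
  illustration and not part of BASE. A spot keeps its raw position unless
  that position is already taken by a different reporter, and a raw position
  mapping is returned only when at least one spot was moved.
  </p>

```python
def remap_positions(raw_data_sets):
    """Assign bioassay positions so that one position never carries two reporters.

    raw_data_sets maps a raw data set name to a list of
    (raw_position, reporter) tuples. This layout is an assumption made
    for the sketch, not a BASE data structure.
    """
    owner = {}      # bioassay position to the reporter that owns it
    assigned = {}   # (raw position, reporter) to bioassay position
    # Fresh positions start above every raw position, so they cannot collide.
    next_free = 1 + max(pos for spots in raw_data_sets.values() for pos, _ in spots)
    bioassays = {}
    raw_mapping = {}  # (data set name, bioassay position) to raw position
    remapped = False

    for name, spots in raw_data_sets.items():
        result = []
        for raw_pos, reporter in spots:
            key = (raw_pos, reporter)
            if key not in assigned:
                if owner.setdefault(raw_pos, reporter) == reporter:
                    assigned[key] = raw_pos          # position is free or agrees
                else:
                    assigned[key] = next_free        # conflict: use a fresh position
                    owner[next_free] = reporter
                    next_free += 1
            pos = assigned[key]
            remapped = remapped or pos != raw_pos
            result.append((pos, reporter))
            raw_mapping[(name, pos)] = raw_pos
        bioassays[name] = result

    # A raw position mapping is only required if some spot changed position.
    return bioassays, (raw_mapping if remapped else None)


bioassays, mapping = remap_positions({
    "A": [(1, "R1"), (2, "R2")],
    "B": [(1, "R1"), (2, "R3")],   # position 2 clashes with reporter R2 in A
})
print(bioassays["B"])      # [(1, 'R1'), (3, 'R3')]
print(mapping[("B", 3)])   # 2
```

  <p>
  In the example the two raw data sets agree on position 1, so it is shared,
  while the conflicting spot at position 2 of the second set is moved to
  position 3 and recorded in the raw mapping.
  </p>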


  <a name="intensitymeasure">
  <h2>3. The intensity measure plugin</h2>
  </a>

  <ol>
  <li>This is described here because there's nowhere better to place
    it at the moment.
  <li>Extracting one intensity value per channel from the raw data set
    is not entirely trivial, as there may be many measures of intensity
    available, and different ways of doing background correction.
    A (properly privileged) user should be able to define intensity
    measures for a given raw data type and number of channels. For
    each channel, an intensity measure consists of a set of:
      <ul>
      <li>column specification
      <li>coefficient
      <li>flag: spot value or mean over raw data set
      </ul>
    That is, an intensity measure says (for each channel) which columns
    should be used, how much each should contribute to the intensity
    value, and whether the mean over all spots should be used instead
    of the value for each individual spot.
  <li>The bioassays of a root bioassayset may be created from different
    intensity measures, and each such bioassay should hold information
    about what intensity measure was used to create it.
  </ol>
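
  <p>
  As an illustration of the intensity measure described above, here is a
  minimal Python sketch for a single channel. It assumes that the entries of
  a measure are combined as a weighted sum, which the text suggests but does
  not state outright. The names <code>Term</code> and
  <code>channel_intensities</code> and the column names are hypothetical and
  do not come from BASE.
  </p>

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Term:
    """One entry of an intensity measure for a single channel (hypothetical)."""
    column: str         # column specification: which raw data column to read
    coefficient: float  # how much the column contributes to the intensity
    use_mean: bool      # True: use the mean over the raw data set, not the spot value


def channel_intensities(raw_spots, terms):
    """Return one intensity per raw spot, assuming a weighted sum of the terms."""
    # Column means are only needed for terms that ask for them.
    means = {
        t.column: mean(spot[t.column] for spot in raw_spots)
        for t in terms
        if t.use_mean
    }
    return [
        sum(
            t.coefficient * (means[t.column] if t.use_mean else spot[t.column])
            for t in terms
        )
        for spot in raw_spots
    ]


# Made-up raw data: per-spot foreground and background columns for channel 1.
raw_spots = [
    {"ch1_fg_mean": 100.0, "ch1_bg_median": 20.0},
    {"ch1_fg_mean": 300.0, "ch1_bg_median": 40.0},
]
# Foreground of each spot minus the mean background over the whole raw data set.
measure_ch1 = [
    Term("ch1_fg_mean", 1.0, False),
    Term("ch1_bg_median", -1.0, True),
]
print(channel_intensities(raw_spots, measure_ch1))  # [70.0, 270.0]
```

  <p>
  A negative coefficient on a background column is one way to express
  background correction under this assumption. A full measure would hold one
  such list of terms per channel.
  </p>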


  <a name="filtering">
  <h2>4. Filtering data</h2>
  </a>

  <ol>
  <li>The task of the filtering system is to filter the spots of one
    bioassayset, producing a new bioassayset.
  </ol>


</body>
</html>