source: trunk/doc/historical/specifications/core/hybridizations.html @ 4509

Last change on this file since 4509 was 4509, checked in by Jari Häkkinen, 15 years ago

Addresses #1106. Missed changing the reference to where the GPLv3 license text can be retrieved. And some other changes.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<!--
  $Id: hybridizations.html 4509 2008-09-11 20:01:44Z jari $

  Copyright (C) 2005 Jari Hakkinen, Nicklas Nordborg
  Copyright (C) 2006 Jari Hakkinen

  This file is part of BASE - BioArray Software Environment.
  Available at

  BASE is free software; you can redistribute it and/or
  modify it under the terms of the GNU General Public License
  as published by the Free Software Foundation; either version 3
  of the License, or (at your option) any later version.

  BASE is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with BASE. If not, see <>.
-->
  <head>
    <title>BASE - Core specification - Hybridizations and raw data</title>
  <link rel=stylesheet type="text/css" href="../../styles.css">
  </head>

<div class="navigation">
  <a href="../../index.html">BASE</a>
  <img src="../../next.gif">
  <a href="index.html">Core specification</a>
  <img src="../../next.gif">
  Hybridizations
</div>

  <h1>Hybridizations and raw data</h1>
  <div class="abstract">
    <p>
    This document describes how BASE handles hybridizations and
      raw data.
    </p>

    <b>Contents</b><br>
    <ol>
    <li><a href="#hybridizations">Hybridizations</a>
    <li><a href="#scans">Scans</a>
    <li><a href="#rawdatasets">Raw data sets</a>
    <li><a href="#rawdata">Raw data</a>
    <li><a href="#spotimages">Spot images</a>
    </ol>

    <b>See also</b><br>
    <ul>
    <li><a href="../../development/overview/data/hybridizations.html">Implementation overview</a>
    </ul>

    <p class="authors">
    <b>Last updated:</b> $Date: 2008-09-11 20:01:44 +0000 (Thu, 11 Sep 2008) $
    </p>
  </div>

  <a name="hybridizations">
  <h2>1. Hybridizations</h2>
  </a>

  <ol>
  <li>A hybridization is attached to a <span class="invalid">list</span>
    set of labeled extracts.
    <span class="invalid">The same labeled extract may be used several times.
    The position
    in the list does not mean anything to BASE, but may be used by
    plugins subsequently used to create derived data from raw data.</span>
  <li>A hybridization may be attached to an array slide.
  <li>The hybridization may be dissociated from the array slide and the
    labeled extracts at any time.
  <li>A hybridization protocol <span class="invalid">must</span> may be picked.
  <li>A hybridization may be annotated, but annotations on its
    labeled extracts are not transferred to it.
  </ol>
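
  <p>
  The rules above can be illustrated with a small in-memory sketch. This is
  not the BASE API: the class name <code>Hybridization</code>, its field
  names and the string ids are all hypothetical, chosen only to show that
  the labeled extracts form a set and that the slide and protocol are
  optional.
  </p>

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Hybridization:
    """Hypothetical model of rules 1-4; names are illustrative, not BASE's."""

    name: str
    # Rule 1: a set, so order and duplicates carry no meaning.
    labeled_extracts: Set[str] = field(default_factory=set)
    # Rule 2: an array slide is optional.
    array_slide: Optional[str] = None
    # Rule 4: a protocol may be picked, but does not have to be.
    protocol: Optional[str] = None

    def attach(self, extract_id: str) -> None:
        """Attach a labeled extract; attaching it twice has no effect."""
        self.labeled_extracts.add(extract_id)

    def dissociate(self) -> None:
        """Rule 3: drop the array slide and all labeled extracts."""
        self.array_slide = None
        self.labeled_extracts.clear()


hyb = Hybridization("hyb-1", array_slide="slide-7")
hyb.attach("le-1")
hyb.attach("le-1")  # ignored: the extracts form a set
hyb.attach("le-2")
```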

  <a name="scans">
  <h2>2. Scans</h2>
  </a>

  <ol>
  <li>A scan (or image acquisition) represents the scanning of a slide.
  <li>A hybridization may have any number of scans.
  <li>A scan is associated with a scanner as well as with a scanning
    protocol.
  <li>Images may be attached to a scan.
  <li>An image consists of a pointer to an uploaded file,
    <span class="invalid">information about what channel(s) it has to do with</span>,
    what format it's in (TIFF or JPEG), whether it's a preview or the full
    image, <span class="invalid">and whether it should be used for generating spot images</span>.
  </ol>

  <a name="rawdatasets">
  <h2>3. Raw data sets</h2>
  </a>

  <ol>
  <li>A raw data set describes the result of applying some software
    to a set of images (obtained from scanning a microarray) in order
    to quantify the spots and identify them with features or reporters.
    This includes the generated spot quantifications, which we refer
    to as raw data.
  <li>A raw data set normally belongs to a scan, but it should also
    be possible to create raw data sets with no connection to a
    scan.
  <li>It should be possible to attach a scan-less raw data set to a
    scan at a later stage.
  <li>A raw data set <span class="invalid">is</span> can be associated with a software item.
  <li>The file(s) generated by the software should be attached to
    the raw data set. Typically this is one file, which we refer to
    as a raw result file. <span class="note">[NOTE] As it is implemented, only
    one file can be attached.</span>
  <li>A raw data set may point to the array design it has to do
    with, if any, <span class="invalid">but only if the array design has features</span>.
  <li>The array design of a raw data set is typically that of its
    hybridization's array slide, but it doesn't have to be. A raw
    data set created without connection to a scan may still point to
    an array design.
  </ol>
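
  <p>
  As a rough illustration of the optional links described above, the
  following sketch models a raw data set whose scan, software item, raw
  result file and array design may each be absent. The class and field
  names are hypothetical, and refusing to re-attach a raw data set that
  already has a scan is an assumption made here, not something this
  specification states.
  </p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawDataSet:
    """Hypothetical model of a raw data set; names are illustrative only."""

    raw_data_type: str
    scan_id: Optional[int] = None          # rule 2: normally set, but optional
    software_id: Optional[int] = None      # rule 4: optional software item
    raw_result_file: Optional[str] = None  # rule 5: one file as implemented
    array_design_id: Optional[int] = None  # rules 6-7: independent of the scan

    def attach_to_scan(self, scan_id: int) -> None:
        """Rule 3: attach a scan-less raw data set to a scan later on."""
        if self.scan_id is not None:
            # Assumption: moving between scans is not allowed in this sketch.
            raise ValueError("raw data set already belongs to a scan")
        self.scan_id = scan_id


# A raw data set created without a scan may still point to an array design.
rds = RawDataSet(raw_data_type="example-type", array_design_id=3)
rds.attach_to_scan(12)
```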

  <a name="rawdata">
  <h2>4. Raw data</h2>
  </a>

  <ol>
  <li>Because different software produces different sets of spot
    measurements, it should be possible to define new types of raw
    data.
  <li>A single table is used for all raw data types; it stores the
    information common to all types.
  <li>The columns common to all types of raw data are at least:
    <ul>
    <li>id of the raw data set
    <li>position in the raw data set (typically N for the Nth spot
      in a raw data file)
    <li>id of the reporter thought to occupy the spot
    <li>id of the feature which corresponds to this spot, if any.
      This is only allowed if the raw data set has an array design,
      and the feature must match the spot's coordinates and reporter.
    <li>physical coordinates of the spot (possibly in pixels)
    <li>grid coordinates of the spot, including meta coordinates.
    <li class="invalid">user-provided flagging (see below)
    </ul>

  <li>For each type of raw data, there is one table with type-specific
    data. This table is described in detail in the database. For
    each column, the following is recorded:
    <ol>
    <li>Column name and type
    <li class="invalid">Whether the column holds an intensity, a standard deviation,
      or neither
    <li class="invalid">Whether the column holds a foreground value, a background value,
      or neither
    <li class="invalid">Whether the column holds a mean or median or neither
    <li class="invalid">An optional label id, in the case that the raw data type
      concerns itself with labels.
    </ol>

  <li><span class="invalid">In the table with raw data type specific columns, the spots should
    be identified by raw data set and position.</span> Instead, we use the id of the raw data entry.

  <li class="invalid">The raw data set should store not only what type of raw
    data it contains, but also which of the type-specific and
    non-type-specific columns it uses.

  <li class="invalid">Raw spots may be flagged/commented by users. There should be a
    table with possible comments (modifiable by some users only), and
    each spot may point to such a comment. This is the only property of
    a raw spot that may change after the raw data set is added. The raw
    data set should know the datetime of the last change to one of its
    spots.
  </ol>
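
  <p>
  The split between one common table and one table per raw data type can
  be sketched with an in-memory SQLite database. Every table and column
  name below (<code>RawData</code>, <code>RawDataExample</code>,
  <code>ch1_fg_mean</code> and so on) is made up for illustration and does
  not reflect the actual BASE schema; the uniqueness constraint on raw
  data set and position is likewise an assumption.
  </p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Common table: one row per spot, shared by all raw data types.
    CREATE TABLE RawData (
        id            INTEGER PRIMARY KEY,
        rawdataset_id INTEGER NOT NULL,
        position      INTEGER NOT NULL,
        reporter_id   INTEGER,
        feature_id    INTEGER,            -- only when an array design exists
        x REAL, y REAL,                   -- physical coordinates
        meta_row INTEGER, meta_col INTEGER,
        grid_row INTEGER, grid_col INTEGER,
        UNIQUE (rawdataset_id, position)  -- assumption, not from the spec
    );
    -- Type-specific table: keyed by the id of the common raw data entry.
    CREATE TABLE RawDataExample (
        rawdata_id  INTEGER PRIMARY KEY REFERENCES RawData(id),
        ch1_fg_mean REAL,
        ch2_fg_mean REAL
    );
    """
)

cur = conn.execute(
    "INSERT INTO RawData (rawdataset_id, position, reporter_id, x, y) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, 1, 42, 1050.0, 2310.0),
)
conn.execute(
    "INSERT INTO RawDataExample VALUES (?, ?, ?)",
    (cur.lastrowid, 812.0, 455.5),
)

# Join the common and type-specific rows through the raw data id.
row = conn.execute(
    "SELECT r.position, e.ch1_fg_mean, e.ch2_fg_mean "
    "FROM RawData r JOIN RawDataExample e ON e.rawdata_id = r.id "
    "WHERE r.rawdataset_id = ?",
    (1,),
).fetchone()
```

  <p>
  Keying the type-specific table on the raw data id keeps it to a single
  integer key, at the cost of always needing a join to reach the raw data
  set and position.
  </p>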

  <a name="spotimages">
  <h2>5. Spot images</h2>
  </a>

  <ol>
  <li>By spot images we mean small images of the individual spots
    of a raw data set, meant to convey information about the
    morphology of spots. These images are meant to be shown to users,
    often many at a time from an arbitrary set of spots.

  <li>It should be possible to generate spot images from a raw data
    set whose spots have physical coordinates specified, if it is
    connected to a scan which has sufficient images attached, and
    if it has no more than three channels. If it has more than three
    channels, a user may select up to three images for the spot
    image generation.

  <li>By sufficient images we mean one high-resolution TIFF image
    per channel, possibly stored in a single file.

  <li>The user may need to enter information about how to scale and
    offset the physical spot coordinates to get the corresponding
    image coordinates. This information might be extracted from the
    raw result file (or from the images).

  <li>The size of the area to cut out for each spot image needs to
    be given by the user.

  <li>The input images cannot be visualized without modification, as
    the dynamic range of the scanner far exceeds that of the user's
    screen, and most spots would be completely black if rescaled to
    8 bits per gun. Therefore, the colors of each spot should be
    rescaled to use the full intensity range, with the same rescaling
    done on all 1-3 channels. Gamma correction may also be applied
    before going to 8 bpg.

  <li>The spot images should be saved as JPEG or some other format
    with good compression. To avoid an excessive number of small
    space-wasting files, they should be lumped together in reasonable
    numbers before compression. With JPEG, this means that
    each spot image should be a square whose side is divisible by 8
    pixels (JPEG compresses in 8x8 pixel blocks, so this avoids
    interference between spots).

  <li>The scales, offsets, spot size, gamma correction and JPEG quality
    value should be stored in the database, along with the identities
    of the images used to create spot images.

  <li>It should be possible to remove and re-generate spot images.
    When no spot images exist for a raw data set, the parameters for
    generating them may be altered, but when spot images exist they
    may not.
  </ol>
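
  <p>
  The coordinate mapping, the side-length constraint and the per-spot
  rescaling described above can be sketched in a few lines. This is only
  an illustration under stated assumptions: the function names are
  hypothetical, scaling is applied before the offset, and gamma correction
  is taken to mean raising the normalized value to the power 1/gamma. The
  specification fixes none of these details.
  </p>

```python
def to_image_coords(x, y, scale, offset):
    """Map physical spot coordinates to image pixel coordinates.

    Assumption: scale first, then offset, independently per axis.
    """
    return (x * scale[0] + offset[0], y * scale[1] + offset[1])


def padded_side(side, block=8):
    """Round a spot image side up to the next multiple of `block` pixels."""
    return -(-side // block) * block


def rescale_to_8bit(channels, gamma=1.0):
    """Rescale the pixels of one spot to 0..255.

    `channels` holds one list of raw intensities per channel (1-3 lists).
    The same minimum and maximum are used for every channel, so the
    relative intensities between channels are preserved.
    """
    lo = min(min(ch) for ch in channels)
    hi = max(max(ch) for ch in channels)
    span = (hi - lo) or 1  # avoid division by zero for a flat spot
    return [
        [round(255 * ((v - lo) / span) ** (1.0 / gamma)) for v in ch]
        for ch in channels
    ]


center = to_image_coords(1050.0, 2310.0, scale=(0.5, 0.5), offset=(5.0, -3.0))
side = padded_side(20)
pixels = rescale_to_8bit([[100, 300], [500, 1100]])
```

  <p>
  Here a requested side of 20 pixels is padded to 24, and the darkest and
  brightest raw values across both channels map to 0 and 255 respectively.
  </p>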