source: trunk/doc/src/docbook/user/rawbioassays.xml @ 5783

Last change on this file since 5783 was 5783, checked in by Nicklas Nordborg, 10 years ago

References #1590: Documentation cleanup

Experiments and analysis chapter.

  • Property svn:eol-style set to native
  • Property svn:keywords set to Date Id
File size: 16.3 KB
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE sect1 PUBLIC
3    "-//Dawid Weiss//DTD DocBook V3.1-Based Extension for XML and graphics inclusion//EN"
4    "../../../../lib/docbook/preprocess/dweiss-docbook-extensions.dtd">
5<!--
6  $Id: rawbioassays.xml 5783 2011-10-05 10:34:24Z nicklas $
7
8  Copyright (C) 2007 Peter Johansson, Nicklas Nordborg, Martin Svensson
9
10  This file is part of BASE - BioArray Software Environment.
11  Available at http://base.thep.lu.se/
12
13  BASE is free software; you can redistribute it and/or
14  modify it under the terms of the GNU General Public License
15  as published by the Free Software Foundation; either version 3
16  of the License, or (at your option) any later version.
17
18  BASE is distributed in the hope that it will be useful,
19  but WITHOUT ANY WARRANTY; without even the implied warranty of
20  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
21  GNU General Public License for more details.
22
23  You should have received a copy of the GNU General Public License
24  along with BASE. If not, see <http://www.gnu.org/licenses/>.
25-->
26
27<sect1 id="experiments_analysis.rawbioassay">
28  <?dbhtml filename="rawbioassays.html" ?>
29  <title>Raw bioassays</title>
30  <para>
31    A <guilabel>Raw bioassay</guilabel> is the representation
32    of the result of analyzing data from the physical bioassay
33    down to the point where we have a file or a set of files
34    containing measurements per feature (e.g. spot, gene, etc.)
35    for a single sample or extract. Further analysis is usually
36    needed before we can say something about individual features
37    or samples and how they relate to each other. This
38    kind of analysis is done in <guilabel>Experiments</guilabel>.
39    See <xref linkend="experiments_analysis.experiments" />.
40  </para>
41 
42  <para>
43    The term <guilabel>Raw bioassay</guilabel> is a bit misleading since the
44    real "raw data" is actually the images from a microarray scan or the
45    output from a sequencer. For historical reasons we have chosen to keep
46    the term raw bioassay since this represents the first possibility for
47    a transition between file-based data and database-stored data. Typically,
48    all pre-rawbioassay analysis is done outside of BASE, and although
49    we now have the possibility to track this in detail, it will
50    probably remain so for some time in the future. See
51    <xref linkend="experiments_analysis.derivedbioassays" />.
52 
53  </para>
54 
55  <sect2 id="experiments_analysis.rawbioassay.create">
56    <title>Create raw bioassays</title>
57    <para>
58      Creating a new raw bioassay is a two- or three-step process:
59    </para>
60 
61    <orderedlist>
62      <listitem>
63        <para>
64          Create a new raw bioassay item with the &gbNew; button in the raw bioassays list view.
65          It is also possible to create raw bioassays from the derived bioassays
66          list view and single-item view pages.
67        </para>
68      </listitem>
69      <listitem>
70        <para>
71        Upload the file(s) with the raw data and attach them to the
72        raw bioassay.
73        </para>
74      </listitem>
75      <listitem>
76        <para>
77          The platform used may require that data is imported into the database.
78          See <xref linkend="import_data" />. If the platform is a
79          file-only platform, this step can be skipped.
80        </para>
81      </listitem>
82    </orderedlist>
83 
84    <note>
85      <title>Supported file formats</title>
86      BASE has built-in support for most file formats where the data comes
87      in a tab-separated (or similar) form. Data for one raw bioassay
88      must be in a single file. Support for other file formats
89      may be added through plug-ins.
90    </note>
91  </sect2>
92 
93  <sect2 id="experiments_analysis.rawbioassay.properties">
94    <title>Raw bioassay properties</title>
95   
96    <figure
97      id="experiments_analysis.figures.rawbioassay.edit">
98      <title>Raw bioassay properties</title>
99      <screenshot>
100        <mediaobject>
101          <imageobject>
102            <imagedata
103              fileref="figures/rawbioassay_edit.png" format="PNG" />
104          </imageobject>
105        </mediaobject>
106      </screenshot>
107    </figure> 
108
109    <helptext external_id="rawbioassay.edit" title="Edit raw bioassay">
110   
111    <variablelist>
112      <varlistentry>
113        <term><guilabel>Name</guilabel></term>
114        <listitem>
115          <para>
116            The name of the raw bioassay.
117          </para>
118        </listitem>
119      </varlistentry>
120      <varlistentry>
121        <term>
122          <guilabel>Platform</guilabel>
123        </term>
124        <listitem>
125          <para>
126            Select the platform / variant used for the
127            raw bioassay. The selected option affects which
128            files can be selected on the <guilabel>Data files</guilabel>
129            tab. If the platform supports importing data to the database
130            you must also select a <guilabel>Raw data type</guilabel>.
131          </para>
132        </listitem>
133      </varlistentry>
134      <varlistentry>
135        <term><guilabel>Raw data type</guilabel></term>
136        <listitem>
137          <para>
138            The type of raw data. This option is disabled for file-only
139            platforms and for platforms that are locked to a specific
140            raw data type. This cannot be changed after raw data has been
141            imported. <nohelp>See
142            <xref linkend="experiments_analysis.rawdatatypes" />.</nohelp>
143          </para>
144        </listitem>
145      </varlistentry>
146      <varlistentry>
147        <term><guilabel>Parent bioassay</guilabel></term>
148        <listitem>
149          <para>
150            The derived bioassay that is the parent of this
151            raw bioassay.
152          </para>
153        </listitem>
154      </varlistentry>
155      <varlistentry>
156        <term><guilabel>Parent extract</guilabel></term>
157        <listitem>
158          <para>
159            The extract which this raw bioassay has measured. This is normally selected
160            among the extracts that are linked with the physical bioassay that this
161            raw bioassay is coming from. Selecting the correct extract is important if the
162            physical bioassay contains more than one extract, since otherwise it may affect
163            how annotations are inherited and used in downstream analysis.
164          </para>
165        </listitem>
166      </varlistentry>
167      <varlistentry>
168        <term><guilabel>Array design</guilabel></term>
169        <listitem>
170          <para>
171            The array design used on the array slide (optional).
172            If an array design is specified
173            the import will verify that the raw data has
174            the same reporter on the same position. This
175            prevents mistakes but also speeds up analysis
176            since some optimizations can be used when assigning
177            positions in bioassay sets.
178            The array design can be changed after raw data has been
179            imported, but this triggers a new validation. If the raw data
180            is stored in the database, the features on the new array design must
181            match the raw data. The verification can use three different methods:
182          </para>
183         
184          <itemizedlist>
185          <listitem>
186            <para>
187            Coordinates: Verify block, meta-grid, row and column coordinates.
188            </para>
189          </listitem>
190          <listitem>
191            <para>Position: Verify the position number.</para>
192          </listitem>
193          <listitem>
194            <para>
195            Feature ID: Verify the feature ID. This option can only be used
196            if the raw bioassay is currently connected to an array design that
197            has feature ID values already.
198            </para>
199          </listitem>
200          </itemizedlist>
201          <para>
202            In all three cases it is also verified that the reporter of the raw
203            data matches the reporter of the features.
204          </para>
205
206          <para>
207            For Affymetrix data, the
208            CEL file is validated against the CDF file of the new array design.
209            If the validation fails, the array design is not changed.
210          </para>
211        </listitem>
212      </varlistentry>
213      <varlistentry>
214        <term><guilabel>Software</guilabel></term>
215        <listitem>
216          <para>
217            The software used to generate the raw data (optional).
218          </para>
219        </listitem>
220      </varlistentry>
221      <varlistentry>
222        <term><guilabel>Protocol</guilabel></term>
223        <listitem>
224          <para>
225            The protocol used when generating the raw data (optional).
226            Software parameters may be registered as part of
227            the protocol.
228          </para>
229        </listitem>
230      </varlistentry>
231      <varlistentry>
232        <term><guilabel>Description</guilabel></term>
233        <listitem>
234          <para>
235            A description of the raw bioassay (optional).
236          </para>
237        </listitem>
238      </varlistentry>
239    </variablelist>
240   
241    <seeother>
242      <other external_id="datafiles.edit">Data files</other>
243      <other external_id="annotations.edit">Annotations</other>
244      <other external_id="annotations.edit.inerited">Inherit annotations</other>
245    </seeother>
246    </helptext> 
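    <para>
      The three verification methods described for the
      <guilabel>Array design</guilabel> property can be pictured with the
      following Python sketch. It is only an illustration under stated
      assumptions and not the code BASE uses: the function names
      (feature_key, verify_raw_data), the dictionary keys and the sample
      values are all hypothetical. Each raw data row is matched to a feature
      using the selected method, and in every case the reporters must match
      as well.
    </para>
    <programlisting><![CDATA[
```python
# Hypothetical sketch; names and data layout are assumptions, not BASE's API.

def feature_key(item, method):
    """Return the value used to match a raw data row to a feature."""
    if method == "coordinates":
        return (item["block"], item["meta_grid"], item["row"], item["column"])
    if method == "position":
        return item["position"]
    if method == "feature_id":
        return item["feature_id"]
    raise ValueError("unknown verification method: " + method)


def verify_raw_data(raw_rows, features, method):
    """Return a list of problems; an empty list means the design matches."""
    by_key = {feature_key(f, method): f for f in features}
    problems = []
    for row in raw_rows:
        key = feature_key(row, method)
        feature = by_key.get(key)
        if feature is None:
            problems.append("no feature for " + repr(key))
        elif feature["reporter"] != row["reporter"]:
            # In all three cases the reporter must match as well.
            problems.append("reporter mismatch at " + repr(key))
    return problems


# Made-up sample data: the second raw row carries the wrong reporter.
features = [
    {"block": 1, "meta_grid": 1, "row": 1, "column": 1,
     "position": 1, "feature_id": "F1", "reporter": "R-100"},
    {"block": 1, "meta_grid": 1, "row": 1, "column": 2,
     "position": 2, "feature_id": "F2", "reporter": "R-200"},
]
raw_rows = [
    {"block": 1, "meta_grid": 1, "row": 1, "column": 1,
     "position": 1, "feature_id": "F1", "reporter": "R-100"},
    {"block": 1, "meta_grid": 1, "row": 1, "column": 2,
     "position": 2, "feature_id": "F2", "reporter": "R-999"},
]
```
]]></programlisting>
    <para>
      In this sketch a non-empty result corresponds to a failed validation,
      in which case the array design would be left unchanged.
    </para>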
247       
248    <sect3 id="experiment_analysis.rawbioassay.datafiles">
249      <title>Data files</title>
250      <para>
251        This allows you to select files that contain data for the raw bioassay.
252        Read more about this in <xref linkend="platforms.selectfiles" />.
253      </para>
254    </sect3>
255   
256    <sect3 id="experiment_analysis.rawbioassay.annotations">
257      <title>Annotations and inherited annotations</title>
258      <para>
259        This allows you to enter values for the annotation types that are used to refine
260        the description. Read more about annotations in
261        <xref linkend="annotations" />.
262      </para>
263    </sect3>
264   
265  </sect2>
266 
267  <sect2 id="experiments_analysis.rawbioassay.rawdata">
268    <title>Import raw data</title>
269    <para>
270      Depending on the platform, raw data may have to be imported after
271      you have created the raw bioassay item. This section doesn't apply
272      to file-only platforms. The import is handled by plug-ins. To start
273      the import click on the <guibutton>Import&hellip;</guibutton>
274      button on the single-item view for the raw bioassay.
275      If this button does not appear, it may be because no file
276      format has been specified for the raw data type used by the
277      raw bioassay or because the logged-in user does not have permission
278      to use the import plug-in or file format.
279      See <xref linkend="import_data" /> for more
280      information.
281    </para>
282   
283    <note>
284      <title>File-only platforms</title>
285      File-only platforms, such as Affymetrix, are handled differently and data is not
286      imported into the database.
287    </note>
288   
289  </sect2>
290
291  <sect2 id="experiments_analysis.rawdatatypes">
292    <title>Raw data types</title>
293   
294    <para>
295      A raw data type defines the types of measured values that can be stored
296      for individual features in the database. Usually this includes some
297      kind of foreground and background intensity values. The number and meaning
298      of the values usually depends on the hardware and software used to analyze
299      the data from the experiment. Many tools provide mean and median values,
300      standard deviations, quality control information, etc. Since there are so
301      many existing tools with many different data file formats, BASE uses a
302      separate database table for each raw data type to store data. The raw data
303      tables have been optimized for the type of raw data they can hold and only
304      have the columns that are needed to store the data. BASE ships with a large
305      number of pre-defined raw data types. An administrator may also define
306      additional raw data types. See <xref linkend="appendix.rawdatatypes" /> 
307      for more information.
308    </para>
309   
310    <sect3 id="experiments_analysis.fileonly">
311      <title>File-only platforms</title>
312      <para>
313        In some cases it doesn't make sense to import any data into the
314        database. The main reason is that performance will suffer as the
315        number of entries in the database gets higher. A typical Genepix file
316        contains ~55K spots while an Affymetrix file may have millions.
317      </para>
318      <para>
319        The drawback of keeping the data in files is that none of the generic
320        tools in BASE can read it. For each type of data file, special plug-ins
321        must be developed to analyze and visualize the data.
322        For the Affymetrix platform there are implementations of the RMAExpress
323        and Plier normalizations available on the BASE plug-ins web site.
324        BASE also ships with built-in plug-ins for extracting metadata from
325        Affymetrix CEL and CDF files (i.e. headers, number of spots, etc).
326      </para>
327      <para>
328        Users of other file-only platforms should check the BASE plug-ins
329        website for plug-ins related to their platform. If they can't
330        find any, we recommend that they try to find other users of the same
331        platform and try to cooperate in developing the required tools and
332        plug-ins.
333      </para>
334    </sect3>
335   
336  </sect2>
337
338  <sect2 id="experiments_analysis.spotimages">
339    <title>Spot images</title>
340    <para>
341      This section only applies to microarray platforms where
342      a coordinate system is used to identify spots on the
343      array slides. The raw data must contain X and Y
344      coordinates of each spot.
345    </para>
346   
347    <para>
348      After raw data has been imported into the database
349      you will find that a <guibutton>Create spot images&hellip;</guibutton>
350      button appears in the toolbar on the single-item view
351      for the raw bioassay. Click on this button to open
352      a window that allows you to specify parameters for the
353      spot image extraction.
354    </para>
355   
356    <figure id="experiments_analysis.figures.spotimage">
357      <title>Create spot images</title>
358      <screenshot>
359        <mediaobject>
360          <imageobject><imagedata 
361            fileref="figures/create_spot_images.png" format="PNG"
362            /></imageobject>
363        </mediaobject>
364      </screenshot>
365    </figure>
366   
367    <helptext external_id="rawbioassay.edit.spotimages" 
368      title="Create spot images">
369     
370    <variablelist>
371      <varlistentry>
372        <term><guilabel>X/Y scale and offset</guilabel></term>
373        <listitem>
374          <para>
375            For the spot image creation process to be able
376            to find the spots, the X and Y coordinates from
377            the raw data must be converted into image pixel
378            values. The formula used is:
379            <code>pixelX = (rawX - offsetX) / scaleX</code>
380          </para>
381         
382          <important>
383            It is important that you get these values correct,
384            or the spot image creation process may fail or generate
385            incorrect spot images.
386          </important>
387         
388        </listitem>
389      </varlistentry>
390     
391      <varlistentry>
392        <term><guilabel>Spot size</guilabel></term>
393        <listitem>
394          <para>
395            The spot size is given in pixels and is the width and
396            height of the area around each spot. It should be large enough to contain
397            the spot without having too much empty space or
398            neighbouring spots around it.
399          </para>
400     
401        </listitem>
402      </varlistentry>
403     
404      <varlistentry>
405        <term><guilabel>Gamma correction</guilabel></term>
406        <listitem>
407          <para>
408            Gamma correction is needed to make the images
409            look good on computer displays. A value between
410            1.8 and 2.2 is usually best.
411            See <ulink url="http://en.wikipedia.org/wiki/Gamma_correction"
412              >http://en.wikipedia.org/wiki/Gamma_correction</ulink>
413            for more information.
414          </para>
415     
416        </listitem>
417      </varlistentry>
418     
419      <varlistentry>
420        <term><guilabel>Quality</guilabel></term>
421        <listitem>
422          <para>
423            The quality setting to use when saving the generated
424            spot images as JPEG images. A value between 0 = poor
425            and 1 = good can be used.
426          </para>
427        </listitem>
428      </varlistentry>
429     
430      <varlistentry>
431        <term><guilabel>Red, green and blue image files</guilabel></term>
432        <listitem>
433          <para>
434            You must select which scanned image files to use for the
435            red, green and blue component of the generated spot images.
436            Use the <guibutton>Select&hellip;</guibutton> buttons
437            to select existing images or upload new ones.
438            The original image files must be 8- or
439            16-bit grey scale images. Some scanners, for
440            example Genepix, can create TIFF files with more than
441            one image in each file. BASE supports this and uses
442            the images in the order they appear in the TIFF file.
443          </para>
444         
445          <note>
446            Avoid TIFF images that also contain previews
447            of the full image. BASE may use the wrong image,
448            which results in an error. If you have multi-image
449            TIFF files, these must only contain the full images.
450          </note>
451         
452        </listitem>
453      </varlistentry>
454      <varlistentry>
455        <term><guilabel>Save as</guilabel></term>
456        <listitem>
457          <para>
458            Specify the path and filename where the generated spot
459            images should be saved. The process will create
460            a single zip file containing all the images.
461          </para>
462        </listitem>
463      </varlistentry>
464     
465      <varlistentry>
466        <term><guilabel>Overwrite existing file</guilabel></term>
467        <listitem>
468          <para>
469            If a file with the same name already exists you
470            must mark this checkbox to overwrite it.
471          </para>
472        </listitem>
473      </varlistentry>
474    </variablelist>
475   
476    <para>
477      Click on the <guibutton>Create</guibutton> button
478      to add the spot image creation job to the job queue,
479      or on &gbCancel; to abort.
480    </para>
481   
482    </helptext>
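    <para>
      The conversion described under <guilabel>X/Y scale and offset</guilabel>
      can be tried out with the following Python sketch. It is an
      illustration only and not BASE code: the function names, the sample
      numbers and the idea of centring the <guilabel>Spot size</guilabel>
      area on the converted coordinates are assumptions. The sketch also
      assumes that the Y coordinate is converted with the same formula as
      the X coordinate, using its own scale and offset.
    </para>
    <programlisting><![CDATA[
```python
# Hypothetical sketch; function names and sample numbers are assumptions.

def raw_to_pixel(raw, offset, scale):
    """Apply the documented formula: pixel = (raw - offset) / scale."""
    if scale == 0:
        raise ValueError("scale must not be zero")
    return (raw - offset) / scale


def spot_box(raw_x, raw_y, offset_x, offset_y, scale_x, scale_y, spot_size):
    """Return (left, top, right, bottom) of a square area of spot_size
    pixels centred on the spot. Centring is an assumption of this sketch."""
    px = round(raw_to_pixel(raw_x, offset_x, scale_x))
    py = round(raw_to_pixel(raw_y, offset_y, scale_y))
    half = spot_size // 2
    return (px - half, py - half, px - half + spot_size, py - half + spot_size)


# Made-up numbers: raw coordinates in micrometres, 10 micrometres per pixel.
box = spot_box(5000, 12000, 200, 0, 10, 10, 20)
```
]]></programlisting>
    <para>
      Checking a few known spots this way before starting the job can show
      whether the scale and offset values put the area where the spot
      actually is in the scanned image.
    </para>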
483  </sect2>
484</sect1>
485   