Changeset 4002


Timestamp:
Nov 26, 2007, 1:36:38 PM
Author:
Nicklas Nordborg
Message:

References #596

File:
1 edited

  • trunk/doc/src/docbook/appendix/raw_data_types.xml

    r3944 r4002  
    3131 
    3232  <para>
    33     Raw data can be stored either as files attached to items or in
    34     the database.
    35     The <classname docapi="net.sf.basedb.core">Platform</classname> item has information
    36     about this. Configuration information for the database tables
    37     and columns used to store raw data in the database is found in the
    38     <filename>raw-data-types.xml</filename> file. For detailed information
    39     see <xref linkend="core_api.data_in_files" />.
     33    Raw data can be stored as files attached to items and/or in
     34    the database. The <classname docapi="net.sf.basedb.core">Platform</classname>
     35    item has information about this. For more information see
     36    <xref linkend="core_api.data_in_files" />.
    4037  </para>
    4138 
     
    113110   
    114111    <para>
    115       TODO
    116     </para>
     112      A given platform either supports importing data to the database or it
     113      doesn't. If it supports import, it may be locked to a specific raw data type
     114      or it may use any raw data type. Among the default platforms installed with
     115      BASE, the Affymetrix platform doesn't support importing data. The Generic platform
     116      supports importing to any raw data type.
     117    </para>
     118   
     119    <para>
     120      Raw data types are defined in the <filename>raw-data-types.xml</filename>
     121      file. This file is located in the <filename>&lt;basedir&gt;/www/WEB-INF/classes</filename>
     122      directory and contains information about the database tables and columns to
     123      use for storing raw data. BASE ships with default raw data types for many
     124      different microarray platforms, including Genepix, Agilent and Illumina.
     125    </para>
     126   
     127    <para>
     128      If you want your BASE installation to be configured differently we recommend that
     129      you do it before the first initialisation of the database.
     130      It is possible to change the configuration of an existing BASE installation but it
     131      requires manual updates to the database. Follow this procedure:
     132    </para>
     133
     134  <orderedlist>
     135  <listitem>
     136    <para>
     137    Shut down the BASE web server. If you have installed job agents you should shut
     138    them down as well.
     139    </para>
     140  </listitem>
     141 
     142  <listitem>
     143    <para>
     144    Modify the <filename>raw-data-types.xml</filename> file. If you have installed
     145    job agents, make sure they all have the same version of the file as the web server.
     146    </para>
     147  </listitem>
     148 
     149  <listitem>
     150    <para>
     151    Run the <filename>updatedb.sh</filename> script. Tables for new raw data types
     152    and new columns for existing raw data types will automatically be created, but the script
     153    can't delete tables or columns that have been removed, or modify columns that have
     154    changed datatype. You will have to make these kinds of changes by manually executing
     155    SQL against your database. Check your database documentation for information about SQL syntax.
     156    </para>
     157   
     158    <tip>
     159      <title>Create a parallel installation</title>
     160      <para>
     161      You can always create a new temporary parallel installation to check
     162      what the table generated by the installation script looks like. Compare the
     163      new table to the existing one and make sure they match.
     164      </para>
     165    </tip>
     166  </listitem>
     167 
     168  <listitem>
     169    <para>
     170    Start up the BASE web server and job agents, if any, again.
     171    </para>
     172  </listitem>
     173  </orderedlist>
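          <para>
            As a minimal sketch of step 2, a new column could be added to an existing
            raw data type by inserting one more <sgmltag class="starttag">property</sgmltag>
            tag in <filename>raw-data-types.xml</filename>. The property name, title,
            description and column below are hypothetical and only illustrate the idea;
            the attributes are the ones shown in the format example for this file.
          </para>
          <programlisting language="xml">
        <![CDATA[
        <!-- Hypothetical addition inside an existing <raw-data-type> tag -->
        <property
           name="ch1FgSd"
           title="Channel 1 foreground std dev"
           description="Standard deviation of the foreground intensity in channel 1"
           column="ch1_fg_sd"
           type="float"
           channel="1"
        />
        ]]>
        </programlisting>
          <para>
            Since this change only adds a column, running <filename>updatedb.sh</filename>
            in step 3 should be enough. Removing the property again later would require
            manual SQL, as described in that step.
          </para>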
     174
     175  <tip>
     176    <title>Start with few columns</title>
     177    <para>
     178    It is better to start with too few columns, since it is easier to add
     179    more columns than it is to remove columns that are not needed.
     180    </para>
     181  </tip>
     182
     183  <bridgehead>Format of the raw-data-types.xml file</bridgehead>
     184  <para>
     185    The <filename>raw-data-types.xml</filename> is an XML file.
     186    The following example will serve as a description of the format:
     187  </para>
     188 
     189 
     190  <programlisting language="xml">
     191<![CDATA[
     192<?xml version="1.0" ?>
     193<?xml-stylesheet type="text/xsl" href="raw-data-types.xsl"?>
     194<!DOCTYPE raw-data-types SYSTEM "raw-data-types.dtd" >
     195<raw-data-types>
     196   <raw-data-type
     197      id="genepix"
     198      name="GenePix"
     199      channels="2"
     200      table="RawDataGenePix"
     201      >
     202      <property
     203         name="diameter"
     204         title="Spot diameter"
     205         description="The diameter of the spot in µm"
     206         column="diameter"
     207         type="float"
     208      />
     209      <property
     210         name="ch1FgMedian"
     211         title="Channel 1 foreground median"
     212         description="The median of the foreground intensity in channel 1"
     213         column="ch1_fg_median"
     214         type="float"
     215         channel="1"
     216      />
     217      <!-- skipped a lot of properties -->
     218      <intensity-formula
     219         name="mean"
     220         title="Mean FG - Mean BG"
     221         description="Subtract mean background from mean foreground"
     222         >
     223         <formula
     224            channel="1"
     225            expression="raw('ch1FgMean') - raw('ch1BgMean')"
     226         />
     227         <formula
     228            channel="2"
     229            expression="raw('ch2FgMean') - raw('ch2BgMean')"
     230         />
     231      </intensity-formula>
     232      <!-- and a few more... -->
     233   </raw-data-type>
     234</raw-data-types>
     235]]>
     236</programlisting>
     237 
     238  <para>
     239    Each raw data type is represented by a <sgmltag class="starttag">raw-data-type</sgmltag>
     240    tag. The following attributes can be used:
     241  </para>
     242 
     243    <table frame="all" id="appendix.rawdatatypes.tag">
     244    <title>Attributes for the <sgmltag class="starttag">raw-data-type</sgmltag> tag</title>
     245    <tgroup cols="3" align="left">
     246      <colspec colname="attribute" align="left" />
     247      <colspec colname="required" />
     248      <colspec colname="comment" />
     249      <thead>
     250        <row>
     251          <entry>Attribute</entry>
     252          <entry>Required</entry>
     253          <entry>Comment</entry>
     254        </row>
     255      </thead>
     256      <tbody>
     257        <row>
     258          <entry>id</entry>
     259          <entry>yes</entry>
     260          <entry>
     261            A unique ID of the raw data type. It should contain only letters,
     262            numbers and underscores and the first character must be a letter.
     263          </entry>
     264        </row>
     265        <row>
     266          <entry>name</entry>
     267          <entry>yes</entry>
     268          <entry>
     269            A unique name of the raw data type. The name is usually used by client
     270            applications for display.
     271          </entry>
     272        </row>
     273        <row>
     274          <entry>table</entry>
     275          <entry>yes</entry>
     276          <entry>
     277            The name of the database table to store data in. The table name
     278            must be unique and can only contain letters,
     279            numbers and underscores. The first character must be a letter.
     280          </entry>
     281        </row>
     282        <row>
     283          <entry>channels</entry>
     284          <entry>yes</entry>
     285          <entry>
     286            The number of channels used by this raw data type. It must be
     287            a number &gt; 0.
     288          </entry>
     289        </row>
     290        <row>
     291          <entry>description</entry>
     292          <entry>no</entry>
     293          <entry>
     294            An optional (longer) description of the raw data type.
     295          </entry>
     296        </row>
     297      </tbody>
     298    </tgroup>
     299    </table>
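            <para>
              The attributes in the table above could be combined as in the following
              sketch of a single-channel raw data type. The id, name, table and
              property values are hypothetical and are not among the raw data types
              shipped with BASE.
            </para>
            <programlisting language="xml">
        <![CDATA[
        <!-- Hypothetical raw data type: all values are illustrative only -->
        <raw-data-type
           id="my_scanner"
           name="My scanner"
           channels="1"
           table="RawDataMyScanner"
           description="Single-channel data exported by an in-house scanner"
           >
           <property
              name="intensity"
              title="Intensity"
              description="The measured spot intensity"
              column="intensity"
              type="float"
              channel="1"
           />
        </raw-data-type>
        ]]>
        </programlisting>
            <para>
              Note that the id and table name follow the stated rules: only letters,
              numbers and underscores, starting with a letter.
            </para>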
     300   
     301    <para>
     302      Following the <sgmltag class="starttag">raw-data-type</sgmltag> tag
     303      are one or more <sgmltag class="starttag">property</sgmltag> tags.
     304      Each one defines a column in the database that is designed to hold
     305      data values of a particular type. The following attributes can be used
     306      on this tag:
     307    </para>
     308 
     309    <table frame="all" id="appendix.rawdatatypes.property">
     310    <title>Attributes for the <sgmltag class="starttag">property</sgmltag> tag</title>
     311    <tgroup cols="3" align="left">
     312      <colspec colname="attribute" align="left" />
     313      <colspec colname="required" />
     314      <colspec colname="comment" />
     315      <thead>
     316        <row>
     317          <entry>Attribute</entry>
     318          <entry>Required</entry>
     319          <entry>Comment</entry>
     320        </row>
     321      </thead>
     322      <tbody>
     323        <row>
     324          <entry>*</entry>
     325          <entry></entry>
     326          <entry>
     327            All attributes defined by the
     328            <sgmltag class="starttag">property</sgmltag> tag in
     329            <filename>extended-properties.xml</filename>. See
     330            <xref linkend="appendix.extendedproperties.property" />.
     331          </entry>
     332        </row>
     333        <row>
     334          <entry>channel</entry>
     335          <entry>no</entry>
     336          <entry>
     337            The channel number the property belongs to. Allowed values are 0 to
     338            the number of channels specified for the raw data type. If the property
     339            doesn't belong to any channel, set the value to 0 or leave it
     340            unspecified.
     341          </entry>
     342        </row>
     343      </tbody>
     344    </tgroup>
     345    </table>
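            <para>
              As an illustration of the channel rule, the sketch below shows one property
              that does not belong to any channel and one that belongs to channel 2 of a
              two-channel raw data type. It uses the <sgmltag class="attribute">channel</sgmltag>
              attribute as written in the example file; the property names and columns
              are hypothetical.
            </para>
            <programlisting language="xml">
        <![CDATA[
        <!-- No channel attribute: the property is not tied to a channel (same as channel="0") -->
        <property
           name="x"
           title="X coordinate"
           description="The X coordinate of the spot"
           column="x"
           type="float"
        />
        <!-- Belongs to channel 2; only valid if the raw data type has channels="2" or more -->
        <property
           name="ch2FgMedian"
           title="Channel 2 foreground median"
           description="The median of the foreground intensity in channel 2"
           column="ch2_fg_median"
           type="float"
           channel="2"
        />
        ]]>
        </programlisting>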
     346   
     347    <para>
     348      Following the <sgmltag class="starttag">property</sgmltag> tags come zero
     349      or more <sgmltag class="starttag">intensity-formula</sgmltag> tags.
     350      Each one defines mathematical formulas that can be used to
     351      calculate the intensity values from the raw data. In the Genepix case,
     352      there are several formulas which differ in the way background is
     353      subtracted from foreground intensity values. For other raw data
     354      types, the intensity formula may just copy one of the raw data values.
     355    </para>
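            <para>
              For a raw data type where the intensity is simply copied from a raw data
              value, the formula could look like the following sketch. It assumes a
              hypothetical single-channel raw data type with a property named
              <code>intensity</code>; neither is taken from the default configuration.
            </para>
            <programlisting language="xml">
        <![CDATA[
        <!-- Hypothetical: copies the raw value unchanged for the only channel -->
        <intensity-formula
           name="copy"
           title="Raw intensity"
           description="Use the raw intensity value as it is"
           >
           <formula
              channel="1"
              expression="raw('intensity')"
           />
        </intensity-formula>
        ]]>
        </programlisting>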
     356   
     357    <para>
     358      The intensity formulas are installed as <classname
     359      docapi="net.sf.basedb.core">Formula</classname> items in the database. This
     360      means that you can manually add, change or remove intensity formulas directly
     361      from the web interface. The intensity formulas in the <filename>raw-data-types.xml</filename>
     362      file are only used at installation time.
     363    </para>
     364   
     365    <para>
     366      The <sgmltag class="starttag">intensity-formula</sgmltag> tag has the following
     367      attributes:
     368    </para>
     369   
     370    <table frame="all" id="appendix.rawdatatypes.intensity-formula">
     371    <title>Attributes for the <sgmltag class="starttag">intensity-formula</sgmltag> tag</title>
     372    <tgroup cols="3" align="left">
     373      <colspec colname="attribute" align="left" />
     374      <colspec colname="required" />
     375      <colspec colname="comment" />
     376      <thead>
     377        <row>
     378          <entry>Attribute</entry>
     379          <entry>Required</entry>
     380          <entry>Comment</entry>
     381        </row>
     382      </thead>
     383      <tbody>
     384        <row>
     385          <entry>name</entry>
     386          <entry>yes</entry>
     387          <entry>
     388            A unique name for the formula. This is only used during installation.
     389          </entry>
     390        </row>
     391        <row>
     392          <entry>title</entry>
     393          <entry>yes</entry>
     394          <entry>
     395            The title of the formula. This is used by client applications for
     396            display.
     397          </entry>
     398        </row>
     399        <row>
     400          <entry>description</entry>
     401          <entry>no</entry>
     402          <entry>
     403            An optional, longer, description of the formula.
     404          </entry>
     405        </row>
     406      </tbody>
     407    </tgroup>
     408    </table>
     409   
     410    <para>
     411      The <sgmltag class="starttag">intensity-formula</sgmltag> must contain
     412      one <sgmltag class="starttag">formula</sgmltag> tag for each channel
     413      of the raw data type. The attributes of this tag are:
     414    </para>
     415   
     416    <table frame="all" id="appendix.rawdatatypes.formula">
     417    <title>Attributes for the <sgmltag class="starttag">formula</sgmltag> tag</title>
     418    <tgroup cols="3" align="left">
     419      <colspec colname="attribute" align="left" />
     420      <colspec colname="required" />
     421      <colspec colname="comment" />
     422      <thead>
     423        <row>
     424          <entry>Attribute</entry>
     425          <entry>Required</entry>
     426          <entry>Comment</entry>
     427        </row>
     428      </thead>
     429      <tbody>
     430        <row>
     431          <entry>channel</entry>
     432          <entry>yes</entry>
     433          <entry>
     434            The channel number. One tag for each channel must be specified. No
     435            duplicates are allowed.
     436          </entry>
     437        </row>
     438        <row>
     439          <entry>expression</entry>
     440          <entry>yes</entry>
     441          <entry>
     442            The mathematical expression used to calculate the intensities.
     443            The expression is parsed with the <classname docapi="net.sf.basedb.util.jep">Jep</classname>
     444            parser. It supports the common mathematical operations such as +, -, *, /,
     445            and some mathematical functions such as log2(), ln(), sqrt(), etc. See the API
     446            documentation for Jep for more information. You can also use two special
     447            functions developed specifically for this case:
     448            <itemizedlist>
     449            <listitem>
     450              <para>
     451              raw(name): Get the value from the raw data property with the given name,
     452              for example: <code>raw('ch1FgMedian')</code>.
     453              </para>
     454            </listitem>
     455            <listitem>
     456              <para>
     457              mean(name): Get the mean value of the raw data property with the given name,
     458              for example: <code>mean('ch1BgMean')</code>. The mean is calculated from
     459              all raw data spots in the raw bioassay.
     460              </para>
     461            </listitem>
     462            </itemizedlist>
     463          </entry>
     464        </row>
     465      </tbody>
     466    </tgroup>
     467    </table>
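            <para>
              The two special functions can be combined in one expression. The sketch
              below subtracts the mean background over the whole raw bioassay from
              each spot's foreground median. The formula name and title are hypothetical,
              and the channel 2 property names are assumed by analogy with the channel 1
              names used in this appendix.
            </para>
            <programlisting language="xml">
        <![CDATA[
        <!-- Hypothetical formula for a two-channel raw data type -->
        <intensity-formula
           name="median_global_bg"
           title="Median FG - global mean BG"
           description="Subtract the bioassay-wide mean background from the median foreground"
           >
           <formula
              channel="1"
              expression="raw('ch1FgMedian') - mean('ch1BgMean')"
           />
           <formula
              channel="2"
              expression="raw('ch2FgMedian') - mean('ch2BgMean')"
           />
        </intensity-formula>
        ]]>
        </programlisting>
            <para>
              There is exactly one <sgmltag class="starttag">formula</sgmltag> tag per
              channel and no duplicates, as the table above requires. Because formulas in
              this file are only used at installation time, a formula like this would be
              added to an existing installation through the web interface instead.
            </para>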
    117468   
    118469  </sect1>