Changeset 580


Timestamp:
Feb 11, 2008, 2:50:54 PM
Author:
Nicklas Nordborg
Message:

References #89: Installation program and documentation

Added 'Getting started' section to readme. Reformatted list of column mappings for reporter annotations. Minor changes to importer configurations to make them work with the HumanRef-8_V2_0_R2_11223162_A.bgx file.

Location:
trunk/net/sf/basedb/illumina
Files:
4 edited

  • trunk/net/sf/basedb/illumina/INSTALL

    r576 r580  
    5959 9. The auto-detection should find the plug-ins listed below. You should
    6060   change the 'Install' value to 'yes' for all of them.
    61   * Illumina plug-in package installer (set 'Allow immediate execution' to 'allow')
     61  * Illumina plug-in package installer
    6262  * Illumina BGX feature importer
    6363  * Illumina BGX reporter importer
  • trunk/net/sf/basedb/illumina/META-INF/base-configurations.xml

    r575 r580  
    1515      <label>Min data columns</label>
    1616      <description>The minimum number of columns for a line to be counted as a data line.</description>
    17       <class>java.lang.Integer</class>
    18       <value>10</value>
     17      <class />
     18      <value />
     19    </parameter>
     20    <parameter>
     21      <name>extendedColumnMapping.cytoband</name>
     22      <label>Cytoband</label>
     23      <description>The cytoband from which the reporter is derived</description>
     24      <class />
     25      <value />
    1926    </parameter>
    2027    <parameter>
     
    2633    </parameter>
    2734    <parameter>
    28       <name>extendedColumnMapping.cytoband</name>
    29       <label>Cytoband</label>
    30       <description>The cytoband from which the reporter is derived</description>
     35      <name>extendedColumnMapping.source</name>
     36      <label>Source</label>
     37      <description />
     38      <class>java.lang.String</class>
     39      <value>\Source\</value>
     40    </parameter>
     41    <parameter>
     42      <name>extendedColumnMapping.markers</name>
     43      <label>Markers</label>
     44      <description />
    3145      <class />
    3246      <value />
     
    3953      <value />
    4054    </parameter>
    41      <parameter>
    42       <name>extendedColumnMapping.source</name>
    43       <label>Source</label>
    44       <description />
    45       <class>java.lang.String</class>
    46       <value>\Source\</value>
    47     </parameter>
    48     <parameter>
    49       <name>extendedColumnMapping.markers</name>
    50       <label>Markers</label>
    51       <description />
    52       <class />
    53       <value />
     55    <parameter>
     56      <name>extendedColumnMapping.searchKey</name>
     57      <label>Search key</label>
     58      <description />
     59      <class>java.lang.String</class>
     60      <value>\Search_Key\</value>
    5461    </parameter>
    5562    <parameter>
     
    6168    </parameter>
    6269    <parameter>
    63       <name>extendedColumnMapping.searchKey</name>
    64       <label>Search key</label>
    65       <description />
    66       <class>java.lang.String</class>
    67       <value>\Search_Key\</value>
     70      <name>descriptionColumnMapping</name>
     71      <label>Description</label>
     72      <description>Mapping that picks the reporter's description from the data columns. For example: \Description\</description>
     73      <class>java.lang.String</class>
     74      <value>\Definition\</value>
    6875    </parameter>
    6976    <parameter>
     
    7380      <class />
    7481      <value />
    75     </parameter>
    76     <parameter>
    77       <name>descriptionColumnMapping</name>
    78       <label>Description</label>
    79       <description>Mapping that picks the reporter's description from the data columns. For example: \Description\</description>
    80       <class>java.lang.String</class>
    81       <value>\Definition\</value>
    8282    </parameter>
    8383    <parameter>
     
    106106      <label>LocusLink</label>
    107107      <description />
    108       <class />
    109       <value />
     108      <class>java.lang.String</class>
     109      <value>\Entrez_Gene_ID\</value>
    110110    </parameter>
    111111    <parameter>
     
    152152    </parameter>
    153153    <parameter>
    154       <name>extendedColumnMapping.probeChrCoordinates</name>
    155       <label>Probe chr coordinates</label>
    156       <description />
    157       <class>java.lang.String</class>
    158       <value>\Probe_Chr_Orientation\</value>
     154      <name>extendedColumnMapping.probeCoordinates</name>
     155      <label>Probe coordinates</label>
     156      <description />
     157      <class>java.lang.String</class>
     158      <value>\Probe_Coordinates\</value>
    159159    </parameter>
    160160    <parameter>
     
    191191      <description>A regular expression that matches the header line just before the data begins. For example: Block\tRow\tColumn.*</description>
    192192      <class>java.lang.String</class>
    193       <value>\QSpecies  Source  Search_Key  Transcript  ILMN_Gene Source_Reference_ID RefSeq_ID Unigene_ID  Entrez_Gene_ID  GI  Accession Symbol  Protein_Product Probe_Id  Array_Address_Id  Probe_Type  Probe_Start Probe_Sequence  Chromosome  Probe_Chr_Orientation Probe_Coordinates Definition  Ontology_Component  Ontology_Process  Ontology_Function Synonyms  Obsolete_Probe_Id\E</value>
     193      <value>.*Probe_Id.+Array_Address_Id.*</value>
     194    </parameter>
     195    <parameter>
     196      <name>extendedColumnMapping.controlCompositeMap</name>
     197      <label>Control composite map</label>
     198      <description />
     199      <class />
     200      <value />
    194201    </parameter>
    195202    <parameter>
     
    207214      <value />
    208215    </parameter>
    209      <parameter>
     216    <parameter>
    210217      <name>extendedColumnMapping.probeChrOrientation</name>
    211218      <label>Probe chr orientation</label>
     
    237244    </parameter>
    238245    <parameter>
     246      <name>extendedColumnMapping.controlGroupName</name>
     247      <label>Control group name</label>
     248      <description />
     249      <class />
     250      <value />
     251    </parameter>
     252    <parameter>
     253      <name>extendedColumnMapping.controlGroupId</name>
     254      <label>Control group id</label>
     255      <description />
     256      <class />
     257      <value />
     258    </parameter>
     259    <parameter>
    239260      <name>dataSplitterRegexp</name>
    240261      <label>Data splitter</label>
     
    244265    </parameter>
    245266    <parameter>
     267      <name>extendedColumnMapping.antibiotics</name>
     268      <label>Antibiotics</label>
     269      <description />
     270      <class />
     271      <value />
     272    </parameter>
     273    <parameter>
    246274      <name>reporterIdColumnMapping</name>
    247275      <label>Reporter ID</label>
     
    249277      <class>java.lang.String</class>
    250278      <value>\Probe_Id\</value>
    251     </parameter>
    252     <parameter>
    253       <name>extendedColumnMapping.antibiotics</name>
    254       <label>Antibiotics</label>
    255       <description />
    256       <class />
    257       <value />
    258279    </parameter>
    259280    <parameter>
     
    308329      <description>A regular expression that matches the header line just before the data begins. For example: Block\tRow\tColumn.*</description>
    309330      <class>java.lang.String</class>
    310       <value>\QSpecies  Source  Search_Key  Transcript  ILMN_Gene Source_Reference_ID RefSeq_ID Unigene_ID  Entrez_Gene_ID  GI  Accession Symbol  Protein_Product Probe_Id  Array_Address_Id  Probe_Type  Probe_Start Probe_Sequence  Chromosome  Probe_Chr_Orientation Probe_Coordinates Definition  Ontology_Component  Ontology_Process  Ontology_Function Synonyms  Obsolete_Probe_Id\E</value>
     331      <value>.*Probe_Id.+Array_Address_Id.*</value>
    311332    </parameter>
    312333    <parameter>
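
The header regular expression in both importer configurations changes from a fully quoted literal (`\Q...\E`) that lists every column to the looser `.*Probe_Id.+Array_Address_Id.*`. The following is a minimal Python sketch of the practical difference, not the importer's own code. It assumes the pattern is tested against a whole tab-separated line. BASE uses Java regular expressions; Python has no `\Q...\E`, so `re.escape` stands in for it here, and the shortened column lists and the `is_header` helper are made up for illustration.

```python
import re

# New, looser header pattern from the r580 configuration.
NEW_HEADER = re.compile(r".*Probe_Id.+Array_Address_Id.*")

# Hypothetical, shortened header lines (real BGX headers have more columns).
old_layout = "\t".join(["Species", "Source", "Probe_Id", "Array_Address_Id", "Probe_Type"])
new_layout = "\t".join(["Species", "Source", "ILMN_Gene", "Probe_Id", "Array_Address_Id"])

# Stand-in for the old \Q...\E pattern: an exact, escaped copy of one layout.
OLD_HEADER = re.compile(re.escape(old_layout))


def is_header(pattern, line):
    """Return True if the whole line matches the pattern."""
    return pattern.fullmatch(line) is not None


# A data line (values taken from the README example) for comparison.
data_line = "\t".join(["Homo sapiens", "RefSeq", "ILMN_5998"])
```

With these inputs the exact-copy pattern accepts only the layout it was written for, while the looser pattern accepts any line that contains `Probe_Id` followed later by `Array_Address_Id`, and still rejects the data line.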
  • trunk/net/sf/basedb/illumina/README

    r571 r580  
    6767IBS file including header and 3 rows of data.
    6868{{{
    69   Illumicode,N,Mean GRN,Dev GRN
    70   10008,26,222,47
    71   10010,16,57,11
    72   10014,16,56,13
     69Illumicode,N,Mean GRN,Dev GRN
     7010008,26,222,47
     7110010,16,57,11
     7210014,16,56,13
    7373}}}
     74
    7475The column content in an IBS file is described below.
    7576 - ''' Illumicode ''': A code corresponding to the Array_Address_Id in the
     
    7980 - ''' Dev GRN ''': Standard deviation of the mean intensity.
    8081
     82A new raw data type has been defined in illumina-raw-data-types.xml
     83to hold this kind of data. The name of the raw data type is
     84'''Illumina Bead Summary (IBS)''' and the unique ID is '''illumina_bead_summary'''
     85
     86
    8187== Illumina Sentrix® Array binary manifest (BGX) files ==
    8288
     
    9399described in the BGX file. See below for an example of the Heading section.
    94100{{{
    95   [Heading]
    96   Date  1/3/2007
    97   ContentVersion  1.0
    98   FormatVersion 1.0.0
    99   Number of Probes  48701
    100   Number of Controls  1426
     101[Heading]
     102Date  1/3/2007
     103ContentVersion  1.0
     104FormatVersion 1.0.0
     105Number of Probes  48701
     106Number of Controls  1426
    101107}}}
    102108Following the Heading section is the Probes section which is preceded by a row
     
    112118== Mapping reporter/control annotations from BGX files to BASE ==
    113119
    114 -> map to BASE; * Existing reporter annotation field in BASE; <new field>; -| do not map to BASE
    115 [Probes] ## below are the columns present in the Probes section of the BGX file
    116  1. Species -> *Species
    117  2. Source -> <Source>
    118  3. Search_Key -> <Search_Key>
    119  4. Transcript -|
    120  5. ILMN_Gene -> <ILMN_Gene>
    121  6. Source_Reference_ID -> <Source_Reference_ID>
    122  7. RefSeq_ID -> *RefSeq
    123  8. Unigene_ID -> *Cluster ID
    124  9. Entrez_Gene_ID -> *LocusLink
    125  10. GI -|
    126  11. Accession -> *Accession
    127  12. Symbol -> *Gene symbol
    128  13. Protein_Product -|
    129  14. Probe_Id -> *reporter.externalId
    130  15. Array_Address_Id -> Feature.featureId
    131  16. Probe_Type -> <Isoform_Type>
    132  17. Probe_Start -|
    133  18. Probe_Sequence -> *Sequence
    134  19. Chromosome -> *Chromosome
    135  20. Probe_Chr_Orientation -> <Probe_Chr_Orientation>
    136  21. Probe_Coordinates -> <Probe_Chr_Coordinates>
    137  22. Definition -> *Name
    138  23. Ontology_Component -> *GO cell location
    139  24. Ontology_Process -> *GO biological process
    140  25. Ontology_Function -> *GO molecular function
    141  26. Synonyms -> <Synonyms>
    142 
    143 Example Probe
    144 {{{
    145 1. Homo sapiens
    146 2. RefSeq
    147 3. ILMN_5998
    148 4. ILMN_5998
    149 5. BRCA1
    150 6. NM_007301.2
    151 7. NM_007301.2
    152 8.
    153 9. 672
    154 10. 63252878
    155 11. NM_007301.2
    156 12. BRCA1
    157 13. NP_009232.1
    158 14. ILMN_1738027
    159 15. 0003120095
    160 16. A
    161 17. 6438
    162 18. ATCCAGGACTGTTTATAGCTGTTGGAAGGACTAGGTCTTCCCTAGCCCCC
    163 19. 17
    164 20. -
    165 21. 38449935-38449984
    166 22. Homo sapiens breast cancer 1, early onset (BRCA1), transcript variant BRCA1-delta15-17, mRNA.
    167 23. ubiquitin ligase complex [goid 151] [pmid 14976165] [evidence NAS]; gamma-tubulin ring complex [goid 8274] [pmid 12214252] [evidence NAS]; intracellular [goid 5622] [evidence IEA]; nucleus [goid 5634] [pmid 10918303] [evidence TAS]; BRCA1-BARD1 complex [goid 31436] [pmid 15265711] [evidence IDA]
    168 24. protein ubiquitination [goid 16567] [pmid 15905410] [evidence NAS]; regulation of apoptosis [goid 42981] [pmid 10918303] [evidence TAS]; cell cycle checkpoint [goid 75] [evidence NAS]; positive regulation of transcription, DNA-dependent [goid 45893] [pmid 15572661] [evidence NAS]; androgen receptor signaling pathway [goid 30521] [pmid 15572661] [evidence NAS]; cell cycle [goid 7049] [evidence IEA]; regulation of transcription from RNA polymerase II promoter [goid 6357] [pmid 10910365] [evidence TAS]; negative regulation of progression through cell cycle [goid 45786] [evidence IEA]; positive regulation of DNA repair [goid 45739] [pmid 12242698] [evidence NAS]; DNA damage response, signal transduction by p53 class mediator resulting in transcription of p21 class mediator [goid 6978] [pmid 10918303] [evidence TAS]; DNA damage response, signal transduction resulting in induction of apoptosis [goid 8630] [pmid 14654789] [evidence IDA]; regulation of transcription from RNA polymerase III promoter [goid 6359] [pmid 10918303] [evidence TAS]; negative regulation of centriole replication [goid 46600] [pmid 12214252] [evidence NAS]; regulation of cell proliferation [goid 42127] [pmid 10918303] [evidence TAS]; DNA repair [goid 6281] [evidence IEA]
    169 25. metal ion binding [goid 46872] [evidence IEA]; transcription coactivator activity [goid 3713] [pmid 15572661] [evidence NAS]; DNA binding [goid 3677] [pmid 9662397] [evidence TAS]; androgen receptor binding [goid 50681] [pmid 15572661] [evidence NAS]; protein binding [goid 5515] [pmid 15265711] [evidence IPI]; ubiquitin-protein ligase activity [goid 4842] [pmid 15905410] [evidence NAS]; zinc ion binding [goid 8270] [pmid 8944023] [evidence TAS]; tubulin binding [goid 15631] [pmid 12214252] [evidence NAS]
    170 26. IRIS; PSCP; BRCAI; BRCC1; RNF53
    171 }}}
    172 -> map to BASE; * Existing reporter annotation field in BASE; <new field>; -| do not map to BASE
    173 [Controls]  ## below are the columns present in the Controls section of the BGX file
    174  1. Probe_Id -> *reporter.externalId
    175  2. Array_Address_Id -> Feature.featureId
    176  3. Reporter_Group_Name -> <Control_Group_Name>
    177  4. Reporter_Group_id -> <Control_Group_Id>
    178  5. Reporter_Composite_map -> <Control_Composite_map>
    179  6. Probe_Sequence -> *Sequence
    180 
    181 Example Control
    182 {{{
    183 1. ILMN_943471
    184 2. 0004780609
    185 3. housekeeping
    186 4. housekeeping
    187 5. GI_34304116-S
    188 6. CGTGAAGACCCTGACTGGTAAGACCATCACTCTCGAAGTGGAGCCGAGTG
    189 }}}
    190 
    191 == New raw data type ==
    192 
    193  - name = Illumina Bead Summary (IBS)
    194  - id = illumina_bead_summary
     120The table below shows how the [Probes] section in the BGX file is mapped to
     121reporter annotations in BASE. Annotations in <brackets> are new annotations
     122defined in the illumina-extended-properties.xml file. BGX columns marked
     123with - are not mapped to BASE.
     124
     125|| '''BGX column'''      || '''BASE reporter annotation''' || '''Example value'''       ||
     126|| Species               || Species                        || Homo sapiens              ||
     127|| Source                || <Source>                       || RefSeq                    ||
     128|| Search_Key            || <Search_Key>                   || ILMN_5998                 ||
     129|| Transcript            || -                              || ILMN_5998                 ||
     130|| ILMN_Gene             || <ILMN_Gene>                    || BRCA1                     ||
     131|| Source_Reference_ID   || <Source_Reference_ID>          || NM_007301.2               ||
     132|| RefSeq_ID             || RefSeq                         || NM_007301.2               ||
     133|| Unigene_ID            || Cluster ID                     ||                           ||
     134|| Entrez_Gene_ID        || LocusLink                      || 672                       ||
     135|| GI                    || -                              || 63252878                  ||
     136|| Accession             || Accession                      || NM_007301.2               ||
     137|| Symbol                || Gene symbol                    || BRCA1                     ||
     138|| Protein_Product       || -                              || NP_009232.1               ||
     139|| Probe_Id              || External ID                    || ILMN_1738027              ||
     140|| Array_Address_Id      || Feature ID *                   || 0003120095                ||
     141|| Probe_Type            || <Isoform_Type>                 || A                         ||
     142|| Probe_Start           || -                              || 6438                      ||
     143|| Probe_Sequence        || Sequence                       || ATCCAGGACTGTTTATAGCTGTTGGAAGGACTAGGTCTTCCCTAGCCCCC ||
     144|| Chromosome            || Chromosome                     || 17                        ||
     145|| Probe_Chr_Orientation || <Probe_Chr_Orientation>        ||                           ||
     146|| Probe_Coordinates     || <Probe_Coordinates>            || 38449935-38449984         ||
     147|| Definition            || Description                    || Homo sapiens breast cancer 1, early onset (BRCA1), transcript variant BRCA1-delta15-17, mRNA.                          ||
     148|| Ontology_Component    || GO cell location               || ubiquitin ligase complex [goid 151] [pmid 14976165] [evidence NAS]; ...  ||
     149|| Ontology_Process      || GO biological process          || protein ubiquitination [goid 16567] [pmid 15905410] [evidence NAS]; ... ||
     150|| Ontology_Function     || GO molecular function          || metal ion binding [goid 46872] [evidence IEA]; ... ||
     151|| Synonyms              || <Synonyms>                     || IRIS; PSCP; BRCAI; BRCC1; RNF53 ||
     152
     153The table below shows how the [Controls] section in the BGX file is mapped to
     154reporter annotations in BASE. Annotations in <brackets> are new annotations
     155defined in the illumina-extended-properties.xml file. BGX columns marked
     156with - are not mapped to BASE.
     157
     158|| '''BGX column'''       || '''BASE reporter annotation''' || '''Example value ''' ||
     159|| Probe_Id               || External ID                    || ILMN_943471          ||
     160|| Array_Address_Id       || Feature ID *                   || 0004780609           ||
     161|| Reporter_Group_Name    || <Control_Group_Name>           || housekeeping         ||
     162|| Reporter_Group_id      || <Control_Group_Id>             || housekeeping         ||
     163|| Reporter_Composite_map || <Control_Composite_map>        || GI_34304116-S        ||
     164|| Probe_Sequence         || Sequence                       || CGTGAAGACCCTGACTGGTAAGACCATCACTCTCGAAGTGGAGCCGAGTG ||
     165
     166* The Feature ID is not a reporter annotation. It is used only to
     167identify the probe on an array design.
     168
     169The column mappings for the [Probes] section can be changed by modifying
     170the existing import configuration or creating a new configuration. The
     171column mappings for [Controls] section can't be changed.
     172
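
The [Probes] mapping table above can be read as a simple lookup from BGX column name to BASE reporter annotation. The sketch below is only an illustration of that reading, not how the BGX reporter importer is implemented. The names `PROBE_MAPPING` and `map_probe_row` are hypothetical, and the shortened header in the usage lines is made up; the example values come from the table.

```python
# BGX [Probes] column -> BASE reporter annotation, copied from the table above.
# None marks columns that are not mapped to BASE. Array_Address_Id is also None
# here because the Feature ID is not a reporter annotation.
PROBE_MAPPING = {
    "Species": "Species",
    "Source": "<Source>",
    "Search_Key": "<Search_Key>",
    "Transcript": None,
    "ILMN_Gene": "<ILMN_Gene>",
    "Source_Reference_ID": "<Source_Reference_ID>",
    "RefSeq_ID": "RefSeq",
    "Unigene_ID": "Cluster ID",
    "Entrez_Gene_ID": "LocusLink",
    "GI": None,
    "Accession": "Accession",
    "Symbol": "Gene symbol",
    "Protein_Product": None,
    "Probe_Id": "External ID",
    "Array_Address_Id": None,
    "Probe_Type": "<Isoform_Type>",
    "Probe_Start": None,
    "Probe_Sequence": "Sequence",
    "Chromosome": "Chromosome",
    "Probe_Chr_Orientation": "<Probe_Chr_Orientation>",
    "Probe_Coordinates": "<Probe_Coordinates>",
    "Definition": "Description",
    "Ontology_Component": "GO cell location",
    "Ontology_Process": "GO biological process",
    "Ontology_Function": "GO molecular function",
    "Synonyms": "<Synonyms>",
}


def map_probe_row(header, row):
    """Pair each mapped BGX column with its value from a tab-separated row."""
    columns = header.split("\t")
    values = row.split("\t")
    annotations = {}
    for column, value in zip(columns, values):
        target = PROBE_MAPPING.get(column)
        # Skip unmapped columns and empty values.
        if target is not None and value != "":
            annotations[target] = value
    return annotations


# Usage with a shortened, hypothetical header.
header = "Species\tSource\tTranscript\tProbe_Id\tSymbol"
row = "Homo sapiens\tRefSeq\tILMN_5998\tILMN_1738027\tBRCA1"
annotations = map_probe_row(header, row)
```

Here `Transcript` is dropped because the table marks it with `-`, and the remaining four columns end up under their BASE annotation names.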
     173== Getting started ==
     174
     175 1. Install this package as described in the INSTALL file.
     176 2. Import reporter annotations. You will need the BGX files for this. They can
     177    be downloaded from http://www.switchtoi.com/annotationfiles.ilmn.
     178     * Upload the BGX file(s) to BASE.
     179     * Go to the View -> Reporters menu.
     180     * Click on the Import button.
     181     * Use the auto-detect function or select the Illumina BGX reporter importer plug-in.
     182     * Select the BGX file.
     183     * Finish the job registration and wait for the plug-in to complete.
     184     * Repeat this one time for each BGX file.
     185 3. Create array designs. You will need one array design for each BGX file.
     186     * Go to the Array LIMS -> Array designs menu.
     187     * Click on the New button.
     188     * Choose the Illumina/Expression 1 or the Illumina/Expression 2 platform. The difference
     189       is that the Expression 2 has two IBS files for each raw data set, but Expression 1
     190       only has one.
     191     * We recommend that you give the array design the same name as the BGX file.
     192     * Switch to the Data files tab and select the BGX file.
     193     * Click on Save.
     194     * Click on the newly created array design.
     195     * Click on the Import button and select the Illumina BGX feature importer plug-in.
     196     * Click on Next and select the Duplicate feature=skip option.
     197     * Finish the job registration and wait for the plug-in to complete.
     198     * Repeat this for each BGX file.
     199 4. Import raw data. You will need one or two IBS files.
     200     * Upload the IBS file(s) to BASE.
     201     * Go to the View -> Raw bioassays menu.
     202     * Click on the New button.
     203     * Select the Illumina/Expression 1 or the Illumina/Expression 2 platform. The difference
     204       is that the Expression 2 has two IBS files for each raw data set, but Expression 1
     205       only has one.
     206     * Select one of the array designs created in step 3.
     207     * Switch to the Data files tab and select the IBS file(s).
     208     * Click on Save.
     209     * Click on the newly created raw bioassay.
     210     * Click on the Import button and select the Illumina Bead Summary Importer
     211     * Finish the job registration and wait for the plug-in to complete.
     212     * Repeat this for each set of raw data files.
     213 5. Add your raw data sets to an experiment.
     214 
     215Tip! Steps 1-3 only need to be done once for a BASE installation. If more than
     216one user is going to use the Illumina package we recommend that the array designs created
     217in step 3 are shared with the appropriate users, for example, the Everyone group.
     218
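
Step 4 above imports IBS files, which the README shows as a header line followed by comma-separated rows. Before uploading, it can help to check that a file has that shape. The sketch below reads the README's own sample with Python's standard `csv` module. It is an assumption-laden illustration rather than the Illumina Bead Summary importer's logic: the `read_ibs` helper and the lower-case key names are hypothetical.

```python
import csv
import io

# Sample IBS content taken from the README (header plus 3 rows of data).
IBS_SAMPLE = """Illumicode,N,Mean GRN,Dev GRN
10008,26,222,47
10010,16,57,11
10014,16,56,13
"""


def read_ibs(text):
    """Parse IBS text into a list of dicts with numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        rows.append({
            # Kept as text: it is an identifier, not a quantity.
            "illumicode": record["Illumicode"],
            "n": int(record["N"]),
            "mean_grn": float(record["Mean GRN"]),
            "dev_grn": float(record["Dev GRN"]),
        })
    return rows


rows = read_ibs(IBS_SAMPLE)
```

A file with a missing or renamed column raises `KeyError` in this sketch, which is a cheap way to catch a malformed header early.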
  • trunk/net/sf/basedb/illumina/config/illumina-extended-properties.xml

    r575 r580  
    5858    />
    5959    <property
    60       name="probeChrCoordinates"
    61       title="Probe chr coordinates"
     60      name="probeCoordinates"
     61      title="Probe coordinates"
    6262      description=""
    63       column="probeChrCoordinates"
     63      column="probeCoordinates"
    6464      type="string"
    6565      length="255"