Copyright (C) 2008
This file is part of Illumina plug-in package for BASE.
Available at http://baseplugins.thep.lu.se/
BASE main site: http://base.thep.lu.se/
This is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
The software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with BASE. If not, see <http://www.gnu.org/licenses/>.
Requirements
- BASE 2.9.0 or later.
For expression experiments:
- Illumina Bead Summary (IBS) files. The IBS files contain quantified probe intensities.
- Illumina Sentrix Array binary manifest (BGX) file. The BGX files contain probe annotations.
For SNP experiments:
- Illumina SNP manifest files
- Illumina SNP raw data files
Tested using Illumina BeadArray Reader (version 1.7.0.44) and BeadScan (version 3.5.31.17122). These are the versions we use in Lund.
Introduction
This README file contains general information about the plug-in package and specific information about expression data. See the README_SNP file for specific information about SNP data.
The Illumina BeadArray Reader is a scanner that can read arrays including Illumina Sentrix BeadChips and Sentrix Array Matrices (SAMs). Operation of the BeadArray Reader and image acquisition from Sentrix arrays are handled by the Illumina BeadScan software.
By default, the data output from a BeadArray Reader scanner consists of files including image data (IDAT) files, which can be read by data analysis software such as Illumina BeadStudio.
The Illumina plug-in package for BASE reads Illumina Sentrix Array data from Illumina Bead Summary (IBS) files. The BeadArray Reader does not output IBS files by default; the scanner must be configured to do so. Once configured, the BeadArray Reader outputs IBS files in addition to its default output files. To configure a BeadArray Reader to output IBS files, contact your local Illumina Field Application Scientist.
The IBS files are text files that contain bead-type level data for scanned Sentrix arrays. The file format is explained in detail in the section "Illumina Bead Summary (IBS) files" below.
Illumina Bead Summary (IBS) files
The IBS files contain bead-type level data for scanned Sentrix arrays. They are simple comma-separated text files with the file extension .csv. The BeadArray Reader writes IBS files to the same directory as the other data files from a scan. Note that a BeadArray Reader with default settings does not output IBS files; contact a local Illumina Field Application Scientist to configure the scanner to output them.
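As a rough illustration, the Python sketch below lists the IBS files in a scan output directory. It assumes that an IBS file can be recognised by its .csv extension plus the header row Illumicode,N,Mean GRN,Dev GRN from the example in this README; the function name find_ibs_files is hypothetical and not part of the plug-in.

```python
from pathlib import Path

# Header row copied from the IBS example in this README. Using it as the
# signature of an IBS file is an assumption of this sketch.
IBS_HEADER = "Illumicode,N,Mean GRN,Dev GRN"


def find_ibs_files(scan_dir: str) -> list[Path]:
    """Return the .csv files in scan_dir whose first line is the IBS header."""
    found = []
    for path in sorted(Path(scan_dir).glob("*.csv")):
        # utf-8-sig drops a byte-order mark if the scanner software wrote one.
        with path.open(encoding="utf-8-sig", errors="replace") as handle:
            first_line = handle.readline().strip()
        if first_line == IBS_HEADER:
            found.append(path)
    return found
```

Other .csv files written alongside a scan are simply ignored, so the check can be run directly on the scanner's output directory.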
IBS files are composed of four comma separated columns. See below for an example IBS file including header and 3 rows of data.
    Illumicode,N,Mean GRN,Dev GRN
    10008,26,222,47
    10010,16,57,11
    10014,16,56,13
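A minimal Python sketch of reading the four columns shown above, using only the standard library. It assumes the header row is exactly as in the example; parse_ibs and IbsRow are hypothetical names, and this is not the parser used by the BASE plug-in.

```python
import csv
import io
from typing import NamedTuple


class IbsRow(NamedTuple):
    """One bead-type row from an Illumina Bead Summary (IBS) file."""
    illumicode: str   # kept as a string; see the Illumicode description below
    n: int            # number of beads used for Mean GRN and Dev GRN
    mean_grn: float   # mean intensity
    dev_grn: float    # standard deviation of the mean intensity


def parse_ibs(text: str) -> list[IbsRow]:
    """Parse IBS text whose first line is the Illumicode,N,Mean GRN,Dev GRN header."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [
        IbsRow(
            illumicode=record["Illumicode"].strip(),
            n=int(record["N"]),
            mean_grn=float(record["Mean GRN"]),
            dev_grn=float(record["Dev GRN"]),
        )
        for record in reader
    ]


example = "Illumicode,N,Mean GRN,Dev GRN\n10008,26,222,47\n10010,16,57,11\n10014,16,56,13\n"
for row in parse_ibs(example):
    print(row.illumicode, row.n, row.mean_grn, row.dev_grn)
```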
The column content in an IBS file is described below.
- Illumicode : A code corresponding to the Array_Address_Id in the Illumina Sentrix Array binary manifest (BGX) file. Note that the Illumicode is a string (or integer) of varying length. The Array_Address_Id is a string with a fixed length of 10 characters that consists of an Illumicode padded with leading zeros.
- N : The total number of beads used to calculate Mean GRN and Dev GRN.
- Mean GRN : The mean intensity.
- Dev GRN : Standard deviation of the mean intensity.
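The relationship between Illumicode and Array_Address_Id described above amounts to left-padding the Illumicode with zeros to 10 characters. A small Python sketch (the function name is hypothetical):

```python
from typing import Union

# Fixed width of an Array_Address_Id, as stated above.
ARRAY_ADDRESS_ID_WIDTH = 10


def illumicode_to_array_address_id(illumicode: Union[str, int]) -> str:
    """Left-pad an IBS Illumicode with zeros to form a BGX Array_Address_Id."""
    code = str(illumicode).strip()
    if len(code) > ARRAY_ADDRESS_ID_WIDTH:
        raise ValueError(
            f"Illumicode {code!r} is longer than {ARRAY_ADDRESS_ID_WIDTH} characters"
        )
    return code.zfill(ARRAY_ADDRESS_ID_WIDTH)


print(illumicode_to_array_address_id(10008))      # 0000010008
print(illumicode_to_array_address_id("3120095"))  # 0003120095
```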
IBS files may contain some rows with Illumicodes that are not represented in the BGX files. Our interpretation is that some probes that used to be annotated with a gene have later been considered poor by Illumina, as we have only observed the number of unmatched probes increase in later BGX revisions. To prevent the raw data importer from failing because some probes listed in the IBS file cannot be found in BASE, set the plug-in parameter 'Probe not found=skip' when importing data from an IBS file into BASE.
A new raw data type has been defined in illumina-raw-data-types.xml to hold this kind of data. The name of the raw data type is Illumina Bead Summary (IBS) and the unique ID is illumina_bead_summary.
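The effect of 'Probe not found=skip' can be pictured with the hypothetical Python helper below: IBS rows whose zero-padded Illumicode is not among the array design's Array_Address_Ids are collected instead of aborting the import. It only models the skip/fail choice and is not BASE's plug-in API; the 10-character padding follows the Illumicode description above.

```python
from typing import AbstractSet, Iterable


def match_ibs_codes(
    illumicodes: Iterable[str],
    known_address_ids: AbstractSet[str],
    probe_not_found: str = "fail",
) -> tuple[list[str], list[str]]:
    """Match IBS Illumicodes against the Array_Address_Ids of an array design.

    With probe_not_found="skip", unmatched codes are collected and returned;
    any other value raises LookupError on the first unmatched code.
    """
    matched, skipped = [], []
    for code in illumicodes:
        address_id = str(code).strip().zfill(10)  # same zero padding as in the BGX file
        if address_id in known_address_ids:
            matched.append(address_id)
        elif probe_not_found == "skip":
            skipped.append(address_id)
        else:
            raise LookupError(f"Probe not found in array design: {address_id}")
    return matched, skipped


design = {"0000010008", "0000010014"}
matched, skipped = match_ibs_codes(["10008", "10010", "10014"], design, probe_not_found="skip")
print(len(matched), "matched;", len(skipped), "skipped:", skipped)
```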
Illumina Sentrix Array binary manifest (BGX) files
In addition to IDAT files, BeadStudio requires Illumina Sentrix Array binary manifest (BGX) files, which contain information about the probes on a specific Illumina Sentrix Array, such as gene symbol and probe sequence. In BASE, the BGX files are used to create array designs that describe the probe content of a specific Illumina Sentrix Array.
BGX files are tab-separated text files composed of 3 sections: Heading, Probes, and Controls. The first section is the Heading section, which is preceded by a row containing the text [Heading]. The Heading section presents general information, including the number of Probes and Controls described in the BGX file. Below is an example of the Heading section.
    [Heading]
    Date                1/3/2007
    ContentVersion      1.0
    FormatVersion       1.0.0
    Number of Probes    48701
    Number of Controls  1426
The Heading section is followed by the Probes section, which is preceded by a row containing the text [Probes]. The first row of the Probes section, i.e., the row after [Probes], contains the header for the Probes section. The Probes section is followed by the Controls section, which is preceded by a row containing the text [Controls]. The first row of the Controls section, i.e., the row after [Controls], contains the header for the Controls section. Note that the header row for the Controls section is completely different from the header row for the Probes section. The tables below show the Probes and Controls headers and how information in the BGX file is mapped to BASE.
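A minimal Python sketch of splitting a BGX file into the three sections described above. It assumes plain tab-separated text as described in this README, with each section started by a [Heading], [Probes] or [Controls] row; parse_bgx is a hypothetical name, not part of the plug-in.

```python
def parse_bgx(text: str) -> dict:
    """Split tab-separated BGX text into its Heading, Probes and Controls sections."""
    sections: dict[str, list[list[str]]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        marker = fields[0].strip()
        if marker.startswith("[") and marker.endswith("]"):
            current = marker[1:-1]  # e.g. "Probes"
            sections[current] = []
        elif current is not None:
            sections[current].append(fields)

    # Heading rows are key/value pairs, e.g. "Number of Probes" -> "48701".
    heading = {
        f[0].strip(): f[1].strip() for f in sections.get("Heading", []) if len(f) >= 2
    }

    def rows_as_dicts(rows: list[list[str]]) -> list[dict[str, str]]:
        # The first row after a section marker is that section's own header,
        # so Probes and Controls each get their own column names.
        if not rows:
            return []
        header = [name.strip() for name in rows[0]]
        return [dict(zip(header, (value.strip() for value in row))) for row in rows[1:]]

    return {
        "heading": heading,
        "probes": rows_as_dicts(sections.get("Probes", [])),
        "controls": rows_as_dicts(sections.get("Controls", [])),
    }
```

Comparing int(bgx["heading"]["Number of Probes"]) with len(bgx["probes"]) is a cheap consistency check after parsing.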
Mapping reporter/control annotations from BGX files to BASE
The table below shows how the columns in the [Probes] section of the BGX file are mapped to reporter annotations in BASE. Annotations in <brackets> are new annotations defined in the illumina-extended-properties.xml file. BGX columns marked with - are not mapped to BASE.
BGX column | BASE reporter annotation | Example value |
Species | Species | Homo sapiens |
Source | <Source> | RefSeq |
Search_Key | <Search_Key> | ILMN_5998 |
Transcript | - | ILMN_5998 |
ILMN_Gene | <ILMN_Gene> | BRCA1 |
Source_Reference_ID | <Source_Reference_ID> | NM_007301.2 |
RefSeq_ID | RefSeq | NM_007301.2 |
Unigene_ID | Cluster ID | |
Entrez_Gene_ID | LocusLink | 672 |
GI | - | 63252878 |
Accession | Accession | NM_007301.2 |
Symbol | Gene symbol | BRCA1 |
Protein_Product | - | NP_009232.1 |
Probe_Id | External ID | ILMN_1738027 |
Array_Address_Id | Feature ID * | 0003120095 |
Probe_Type | <Isoform_Type> | A |
Probe_Start | - | 6438 |
Probe_Sequence | Sequence | ATCCAGGACTGTTTATAGCTGTTGGAAGGACTAGGTCTTCCCTAGCCCCC |
Chromosome | Chromosome | 17 |
Probe_Chr_Orientation | <Probe_Chr_Orientation> | |
Probe_Coordinates | <Probe_Coordinates> | 38449935-38449984 |
Definition | Description | Homo sapiens breast cancer 1, early onset (BRCA1), transcript variant BRCA1-delta15-17, mRNA. |
Ontology_Component | GO cell location | ubiquitin ligase complex [goid 151] [pmid 14976165] [evidence NAS]; ... |
Ontology_Process | GO biological process | protein ubiquitination [goid 16567] [pmid 15905410] [evidence NAS]; ... |
Ontology_Function | GO molecular function | metal ion binding [goid 46872] [evidence IEA]; ... |
Synonyms | <Synonyms> | IRIS; PSCP; BRCAI; BRCC1; RNF53 |
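The mapping in the table above can be expressed as a lookup table. In the hypothetical Python sketch below, None marks the columns shown as -, the angle brackets are dropped from the extended annotation names, and the names are the display names from the table, which may differ from BASE's internal property names.

```python
# BGX [Probes] column -> BASE reporter annotation (display names from the table).
# None marks columns that are not mapped to BASE.
PROBE_COLUMN_MAP = {
    "Species": "Species",
    "Source": "Source",                    # extended annotation
    "Search_Key": "Search_Key",            # extended annotation
    "Transcript": None,
    "ILMN_Gene": "ILMN_Gene",              # extended annotation
    "Source_Reference_ID": "Source_Reference_ID",  # extended annotation
    "RefSeq_ID": "RefSeq",
    "Unigene_ID": "Cluster ID",
    "Entrez_Gene_ID": "LocusLink",
    "GI": None,
    "Accession": "Accession",
    "Symbol": "Gene symbol",
    "Protein_Product": None,
    "Probe_Id": "External ID",
    "Probe_Type": "Isoform_Type",          # extended annotation
    "Probe_Start": None,
    "Probe_Sequence": "Sequence",
    "Chromosome": "Chromosome",
    "Probe_Chr_Orientation": "Probe_Chr_Orientation",  # extended annotation
    "Probe_Coordinates": "Probe_Coordinates",          # extended annotation
    "Definition": "Description",
    "Ontology_Component": "GO cell location",
    "Ontology_Process": "GO biological process",
    "Ontology_Function": "GO molecular function",
    "Synonyms": "Synonyms",                # extended annotation
}


def map_probe_row(row: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Split one [Probes] row into its Feature ID and its reporter annotations."""
    # Array_Address_Id is the Feature ID, not an annotation, so it is not in the map.
    feature_id = row["Array_Address_Id"]
    annotations = {}
    for column, value in row.items():
        target = PROBE_COLUMN_MAP.get(column)
        if target is not None and value != "":
            annotations[target] = value
    return feature_id, annotations
```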
The table below shows how the columns in the [Controls] section of the BGX file are mapped to reporter annotations in BASE. Annotations in <brackets> are new annotations defined in the illumina-extended-properties.xml file. BGX columns marked with - are not mapped to BASE.
BGX column | BASE reporter annotation | Example value |
Probe_Id | External ID | ILMN_943471 |
Array_Address_Id | Feature ID * | 0004780609 |
Reporter_Group_Name | <Control_Group_Name> | housekeeping |
Reporter_Group_id | <Control_Group_Id> | housekeeping |
Reporter_Composite_map | <Control_Composite_map> | GI_34304116-S |
Probe_Sequence | Sequence | CGTGAAGACCCTGACTGGTAAGACCATCACTCTCGAAGTGGAGCCGAGTG |
* The Feature ID is not a reporter annotation. It is used only to identify the probe on an array design.
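The [Controls] mapping can be sketched the same way. In this hypothetical Python example, the Array_Address_Id is kept apart as the Feature ID instead of being stored as an annotation, and the annotation names are the display names from the table above.

```python
# BGX [Controls] column -> BASE reporter annotation (display names from the table).
CONTROL_COLUMN_MAP = {
    "Probe_Id": "External ID",
    "Reporter_Group_Name": "Control_Group_Name",        # extended annotation
    "Reporter_Group_id": "Control_Group_Id",            # extended annotation
    "Reporter_Composite_map": "Control_Composite_map",  # extended annotation
    "Probe_Sequence": "Sequence",
}


def map_control_row(row: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Split one [Controls] row into its Feature ID and its reporter annotations."""
    # The Feature ID identifies the probe on the array design only.
    feature_id = row["Array_Address_Id"]
    annotations = {
        CONTROL_COLUMN_MAP[column]: value
        for column, value in row.items()
        if column in CONTROL_COLUMN_MAP and value
    }
    return feature_id, annotations
```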
The column mappings for the [Probes] section can be changed by modifying the existing import configuration or by creating a new configuration. The column mappings for the [Controls] section cannot be changed.
Getting started
1. Install this package as described by the instructions in the INSTALL file.
2. Import reporter annotations. You will need one or more BGX files for this.
   BGX files can be downloaded from http://www.switchtoi.com/annotationfiles.ilmn.
   - Upload the BGX file(s) to BASE.
   - Go to the View -> Reporters menu.
   - Click on the Import button.
   - Use the auto-detect function or select the Illumina BGX reporter importer plug-in.
   - Select the BGX file.
   - Finish the job registration and wait for the plug-in to complete.
   - Repeat this once for each BGX file.
3. Create array designs. You will need one array design for each BGX file.
   - Go to the Array LIMS -> Array designs menu.
   - Click on the New button.
   - Choose the Illumina/Expression 1 or the Illumina/Expression 2 platform. The difference is that Expression 2 uses two IBS files for each raw data set, while Expression 1 uses only one.
   - We recommend that you give the array design the same name as the BGX file.
   - Switch to the Data files tab and select the BGX file.
   - Click on Save.
   - Click on the newly created array design.
   - Click on the Import button and select the Illumina BGX feature importer plug-in.
   - Click on Next and select the Duplicate feature=skip option.
   - Finish the job registration and wait for the plug-in to complete.
   - Repeat this for each BGX file.
4. Import raw data. You will need one or two IBS files for each raw data set, depending on the platform.
   - Upload the IBS file(s) to BASE.
   - Go to the View -> Raw bioassays menu.
   - Click on the New button.
   - Select the Illumina/Expression 1 or the Illumina/Expression 2 platform. The difference is that Expression 2 uses two IBS files for each raw data set, while Expression 1 uses only one.
   - Select one of the array designs created in step 3.
   - Switch to the Data files tab and select the IBS file(s).
   - Click on Save.
   - Click on the newly created raw bioassay.
   - Click on the Import button and select the Illumina Bead Summary Importer plug-in.
   - Finish the job registration and wait for the plug-in to complete.
   - Repeat this for each set of raw data files.
5. Add your raw data sets to an experiment.
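The platform rule described above (one IBS file per raw data set for Illumina/Expression 1, two for Illumina/Expression 2) can be written as a simple pre-upload check. This hypothetical Python helper is only an illustration of that rule and is not BASE's own validation; the platform strings are the names used in this README.

```python
# IBS files expected per raw data set, keyed by the platform names in this README.
IBS_FILES_PER_RAW_DATA_SET = {
    "Illumina/Expression 1": 1,
    "Illumina/Expression 2": 2,
}


def check_ibs_files(platform: str, ibs_files: list[str]) -> None:
    """Raise ValueError unless ibs_files has the count expected for platform."""
    try:
        expected = IBS_FILES_PER_RAW_DATA_SET[platform]
    except KeyError:
        raise ValueError(f"Unknown platform: {platform!r}") from None
    if len(ibs_files) != expected:
        raise ValueError(
            f"{platform} expects {expected} IBS file(s) per raw data set, "
            f"got {len(ibs_files)}"
        )
```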
Tip! Steps 1-3 only need to be done once for a BASE installation. If more than one user is going to use the Illumina package, we recommend that the array designs created in step 3 are shared with the appropriate users, for example, the Everyone group.
Tip! The raw data import in step 4 can be done for an entire experiment at once.