Version 8 (modified by markus, 13 years ago)

Illumina SNP Normalization

tQN is a normalization strategy, based on quantile normalization, that improves the quality of data from Illumina Infinium Whole-Genome Genotyping SNP BeadChips. It is described in:

Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios

J. Staaf, J. Vallon-Christersson, D. Lindgren, G. Juliusson, R. Rosenquist, M. Höglund, Å. Borg, M. Ringnér

submitted

License

The tQN software is available as a stand-alone package, and will become available as a plug-in to BASE as the handling of SNP arrays in BASE is developed. Both versions are available under the GNU General Public License.

Download tQN

The software will be made available when the manuscript describing the method is accepted for publication.

How to use tQN

Requirements

tQN is written in R with a Perl wrapper, so both R and Perl are required. The required Perl modules are File::Spec, Getopt::Long, IO::File and Pod::Usage (http://www.cpan.org). The required R package is limma (http://www.bioconductor.org).

Input data format

tQN is applied to data exported from BeadStudio. For a set of samples, the file exported from BeadStudio should be tab-delimited in the following format:

Name        Chr  Position  sample1.X   sample1.Y  sample2.X   sample2.Y  sample3.X   sample3.Y
rs12354060  1    10004     0.04424883  1.818238   0.03157751  1.632767   0.04973672  1.770216
rs2691310   1    46844     0.7046126   1.305445   0.8322142   1.271329   0.8042333   1.151523
...         ...  ...       ...         ...        ...         ...        ...         ...
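
Before running the split script, it can be useful to confirm that an export has the expected header. The following is a minimal Python sketch, not part of tQN: the function name sample_names_from_header is hypothetical, and it assumes that the first three columns are exactly Name, Chr and Position and that each sample contributes a .X column immediately followed by its .Y column, as in the format above.

```python
def sample_names_from_header(header_line):
    """Return the sample names found in a BeadStudio export header.

    Assumes (as in the format above) that the first three columns are
    Name, Chr, Position and that every sample contributes a '<sample>.X'
    column immediately followed by a '<sample>.Y' column.
    """
    columns = header_line.rstrip("\r\n").split("\t")
    if columns[:3] != ["Name", "Chr", "Position"]:
        raise ValueError("expected Name, Chr, Position as the first three columns")

    intensity_columns = columns[3:]
    if len(intensity_columns) % 2 != 0:
        raise ValueError("each sample needs both an X and a Y column")

    names = []
    # Walk the intensity columns two at a time: (X, Y), (X, Y), ...
    for x_col, y_col in zip(intensity_columns[0::2], intensity_columns[1::2]):
        paired = (
            x_col.endswith(".X")
            and y_col.endswith(".Y")
            and x_col[:-2] == y_col[:-2]
        )
        if not paired:
            raise ValueError(f"unpaired intensity columns: {x_col}, {y_col}")
        names.append(x_col[:-2])
    return names


header = "Name\tChr\tPosition\tsample1.X\tsample1.Y\tsample2.X\tsample2.Y"
print(sample_names_from_header(header))
```

A header that fails these checks raises a ValueError naming the problem, which is usually easier to act on than an error from a later step.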

The data extracted from BeadStudio needs to be split into a separate file for each sample using the script split_beadstudio_samples.pl.

./split_beadstudio_samples.pl --beadstudio_file=example/example_beadstudio_data.txt

where example_beadstudio_data.txt is a file exported from BeadStudio in the format described above.

This script generates one file per sample, together with a file named sample_names.txt, in the tQN subdirectory extracted. These files are used when tQN is run and can be deleted once the samples have been normalized.
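
To illustrate what splitting by sample means for this input format, here is a short Python sketch. It is not the code of split_beadstudio_samples.pl, and it makes no claim about the names or layout of the files that script writes; the function split_by_sample is hypothetical and simply returns the per-sample rows in memory, assuming the tab-delimited layout described above.

```python
import csv
import io


def split_by_sample(beadstudio_text):
    """Split a BeadStudio export into one list of rows per sample.

    Returns {sample: [(snp_name, chrom, position, x, y), ...]}.
    """
    reader = csv.reader(io.StringIO(beadstudio_text), delimiter="\t")
    header = next(reader)
    # Columns 3, 5, 7, ... hold '<sample>.X'; the matching '.Y' follows each.
    samples = [column[:-2] for column in header[3::2]]
    per_sample = {sample: [] for sample in samples}

    for row in reader:
        if not row:  # skip blank lines
            continue
        name, chrom, position = row[0], row[1], int(row[2])
        for i, sample in enumerate(samples):
            x = float(row[3 + 2 * i])
            y = float(row[4 + 2 * i])
            per_sample[sample].append((name, chrom, position, x, y))
    return per_sample


example = (
    "Name\tChr\tPosition\tsample1.X\tsample1.Y\tsample2.X\tsample2.Y\n"
    "rs12354060\t1\t10004\t0.04424883\t1.818238\t0.03157751\t1.632767\n"
    "rs2691310\t1\t46844\t0.7046126\t1.305445\t0.8322142\t1.271329\n"
)
for sample, rows in split_by_sample(example).items():
    print(sample, rows)
```

The chromosome is kept as a string because exports can contain non-numeric labels such as X or Y.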

Performing tQN

Run tQN with the following command:

 ./tQN_normalize_samples.pl --beadchip=humancnv370-duo

This command performs tQN on the samples in the tQN subdirectory extracted that are listed in the file sample_names.txt. To perform tQN on a subset of samples, edit sample_names.txt accordingly.

The normalized data is stored in the tQN subdirectory normalized, with one file of tQN-normalized data per sample. A file tQN_beadstudio.txt is also generated, containing tQN B allele frequencies (BAF) and Log R Ratios for all samples in a format suitable for import into BeadStudio using its import column process.

tQN can also generate output for further analysis with PennCNV and QuantiSNP. Running tQN with the following command:

 ./tQN_normalize_samples.pl --beadchip=humancnv370-duo --output_format=PennCNV

generates one data file per sample in the tQN subdirectory normalized, for further analysis using PennCNV. The other values for --output_format are QuantiSNP, which generates one data file per sample for further analysis with QuantiSNP, and BeadStudio, the default, which generates the single file tQN_beadstudio.txt with data for all samples.

tQN supports any BeadChip type for which there is a cluster file in the tQN subdirectory lib. For PennCNV and QuantiSNP output, SNPs with missing values in either B allele frequency or log R ratio after normalization are excluded from the respective output files.
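
The exclusion rule for PennCNV and QuantiSNP output can be sketched in Python as follows. This is only an illustration of the rule as stated, not tQN's own code (which is written in R and Perl): the function drop_incomplete_snps, the row layout, the use of NaN to mark a missing value, and the example values are all assumptions made for this sketch.

```python
import math


def drop_incomplete_snps(rows):
    """Keep only SNPs whose BAF and log R ratio are both present.

    rows: iterable of (snp_name, baf, lrr) tuples, where a missing value
    is represented by float('nan') (an assumption of this sketch).
    """
    return [
        (name, baf, lrr)
        for name, baf, lrr in rows
        if not (math.isnan(baf) or math.isnan(lrr))
    ]


# Hypothetical normalized values, for illustration only.
normalized = [
    ("snp_a", 0.02, -0.10),
    ("snp_b", float("nan"), 0.05),  # missing BAF: excluded
    ("snp_c", 0.51, float("nan")),  # missing log R ratio: excluded
    ("snp_d", 0.98, 0.12),
]
print(drop_incomplete_snps(normalized))
```

A SNP is dropped if either value is missing, so the per-sample output files may contain fewer SNPs than the input.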

Contact

If you have suggestions, comments or bug reports, please send an email to johan.staaf@…