$Id: README 317 2007-05-28 21:27:31Z peter $


Copyright (C) 2005, 2006 Jari Häkkinen, Peter Johansson
Copyright (C) 2007 Peter Johansson
This file is part of WeNNI,
http://lev.thep.lu.se/trac/baseplugins/wiki/WeNNI
WeNNI is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.
WeNNI is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
USA.

Send comments, suggestions, complaints, or questions about this document to jari@…

Please cite the reference below if you use WeNNI in connection with scientific publications:

"Improving missing value imputation of microarray data by using spot quality weights", P. Johansson and J. Häkkinen. BMC Bioinformatics 7, 306 (2006)

For installation and compilation instructions please read INSTALL.

Comments on WeNNI:

0) WeNNI is presented in the article referenced above.

1) The notion of weights becomes obsolete after running WeNNI, i.e., do not use the weights fed into WeNNI in any subsequent analysis, because after imputation all weights are strictly 1.

2) Running WeNNI as a BASE plug-in means that WeNNI imputes log ratios of channel 1 and channel 2 (M values in the BASE world). A consequence of imputing log ratios is that a change in ratio cannot be assigned to a specific channel. This implies that log(channel1*channel2) (A values in the BASE world) becomes undefined and useless. However, on request from BASE users it was decided that A values should not be affected by the transformation in cases where the A value is well defined before imputation. In cases where an A value does not exist before the transformation (i.e., channel1<=0 or channel2<=0), it was decided that A should be set to 0. NOTE: this does not change the underlying WeNNI algorithm in any way; these are conventions needed for BASE plug-in usage.
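The A-value convention described in comment 2 can be sketched as follows. This is only an illustration in Python, not the plug-in's actual code; the function name a_value_after_imputation and the use of None for an undefined A value are assumptions made for the example.

```python
def a_value_after_imputation(channel1, channel2, a_before):
    """Return the A value reported after imputation.

    Hypothetical helper: A is kept when it was well defined before
    imputation (both channels positive) and set to 0 otherwise.
    """
    if channel1 > 0 and channel2 > 0:
        return a_before
    return 0.0


# Both channels positive: the A value is left untouched.
print(a_value_after_imputation(1200.0, 800.0, 5.98))
# A non-positive channel: A did not exist before, so it is set to 0.
print(a_value_after_imputation(-15.0, 800.0, None))
```

The M value of such a spot is still imputed by WeNNI as usual; only the reported A value follows this convention.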

Remember, if you installed the software as a BASE plug-in, please log on to the server and make the server aware of the new plug-in. The plug-in definition file can be found in directory bin/base_plugin_script as file plugin_WeNNI.base

If you installed the software as a standalone package, read on. BASE plug-in users can also read on if they want some more information about the WeNNI package. There are three programs created during compilation:

BaseFileConverter, NNIFileConverter, nni

(These files are found in directories ./bin/BaseFileConverter, ./bin/NNIFileConverter, and ./bin/nni, respectively.)

These three programs will take you from a BASE file to an imputed matrix when run in the order below (examples 1 to 3). You do not need to run all steps; for example, you can generate a data (matrix) file with an associated weight matrix file yourself and just run nni.

There is a fourth example below where the wenni.pl script (found in directory ./bin/base_plugin_script/) is used to run steps 1 through 3 in one go. wenni.pl also generates a BASE file that a BASE server accepts as a result file (and imports into the database) when wenni.pl is run as a plug-in from within BASE.

1) BaseFileConverter

Extracts columns from a file exported from BASE, and writes the extracted data into files in matrix form. What data should be extracted from the BASE file is set by command line options.

Command syntax:

BaseFileConverter <basefile> <string> -<fieldtype> <fieldname> ...

or

BaseFileConverter <basefile> -show

where

<basefile>   is the file exported from BASE.
<string>     is a string added to the beginning of the names of all matrix files created.
<fieldtype>  defines which field type should be extracted from the BASE file.
<fieldname>  is the name of the column to extract.
...          means that the last two options can be given several times, to facilitate export of several fields in one run.
-show        outputs the available fieldtypes and fieldnames in the BASE file.

Example

a) This is an example of how to extract data from a sample BASE file available in the data directory (data/basefile_in.data):

./BaseFileConverter basefile_in.data wenni_ -assayFields FCh1Mean \
    -assayFields BCh1Mean \
    -assayFields FCh2Mean \
    -assayFields BCh2Mean \
    -assayFields BCh1SD \
    -assayFields BCh2SD

Six files will be created: wenni_BCh1Mean.data, wenni_BCh1SD.data, wenni_BCh2Mean.data, wenni_BCh2SD.data, wenni_FCh1Mean.data, wenni_FCh2Mean.data

b) Another example of how to extract data from a sample BASE file available in the data directory (data/basefile_in.data):

./BaseFileConverter basefile_in.data wenni_ -assayFields intensity1 \
    -assayFields intensity2 \
    -assayFields BCh1SD \
    -assayFields BCh2SD

Four files will be created: wenni_intensity1.data, wenni_BCh1SD.data, wenni_intensity2.data, wenni_BCh2SD.data

2) NNIFileConverter

This program merges the six files created with BaseFileConverter into two files: a logratio matrix file and a weight matrix file.

Command syntax:

NNIFileConverter -o option

with the following available options

-beta      the beta parameter of the SNR to weight calculation.
-datatype  values 'raw' or 'derived'. When 'derived' is given, only foreground files are expected, and background files are ignored (cf. options -fg1, -fg2, -bg1, and -bg2).
-fg1       input file, first foreground
-bg1       input file, first background
-bgstd1    input file, first background standard deviation
-fg2       input file, second foreground
-bg2       input file, second background
-bgstd2    input file, second background standard deviation
-logratio  output file, logratio (first/second)
-weight    output file, weight

Example

c) This is an example of how to merge the files created in the BaseFileConverter example a) above.

./NNIFileConverter -beta 0.6 \
    -bg1 wenni_BCh1Mean.data \
    -bg2 wenni_BCh2Mean.data \
    -bgstd1 wenni_BCh1SD.data \
    -bgstd2 wenni_BCh2SD.data \
    -fg1 wenni_FCh1Mean.data \
    -fg2 wenni_FCh2Mean.data \
    -logratio wenni_logratio.data \
    -weight wenni_weight.data

Two files will be created: wenni_logratio.data, wenni_weight.data

d) Another example of how to merge the files created in the BaseFileConverter example b) above.

./NNIFileConverter -beta 0.6 \
    -bgstd1 wenni_BCh1SD.data \
    -bgstd2 wenni_BCh2SD.data \
    -datatype derived \
    -fg1 wenni_intensity1.data \
    -fg2 wenni_intensity2.data \
    -logratio wenni_logratio.data \
    -weight wenni_weight.data

Two files will be created: wenni_logratio.data, wenni_weight.data. If you followed the examples carefully, using the provided sample files, these result files should be the same as the ones produced in example c).

3) nni

Performs missing value imputation on a data matrix with associated weights. The weights may be given as SNR values; see the options below.

The input matrices are read from files and the resulting matrix is written to stdout.

Command syntax:

nni -o option

with the following available options

-beta           set the beta value for weight calculation.
    This option is only used if the weight file contains SNR values rather than weights. See the WeNNI paper for details on this parameter.

-data           data file name

-neighbours     number of nearest neighbours
    This sets the number of contributions to use in the imputation. For binary weights this corresponds directly to the closest neighbours, whereas for non-binary weights the accumulated weights are compared against the cutoff.

-nni_algorithm  the algorithm to use
    Available algorithms are kNNI and WeNNI; the algorithm name is expected as a string.

-weight         weight file name
    Weights must be within [0,1]. See the -weight_cutoff option if the kNNI algorithm is used.

-weight_cutoff  set the cutoff value for weights
    kNNI requires binary weights. All values larger than the cutoff are treated as 1, and all other values as 0.

-weight_is_snr  weight file contains SNR values rather than weights
    Use -beta to tune the weights.
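How -neighbours acts on binary and non-binary weights can be pictured with the sketch below. It is an illustration only. It assumes that candidate neighbours are already sorted by increasing distance and that contributions are collected until their accumulated weight reaches the -neighbours value. The function name select_contributions and this stopping rule are assumptions made for the example; the exact procedure is the one described in the WeNNI paper and implemented in nni.

```python
def select_contributions(sorted_weights, neighbours):
    """Pick leading candidates until their accumulated weight reaches `neighbours`.

    `sorted_weights` holds the weights (each within [0, 1]) of candidate
    neighbours, ordered from closest to farthest. Hypothetical helper.
    """
    selected = []
    accumulated = 0.0
    for index, weight in enumerate(sorted_weights):
        if accumulated >= neighbours:
            break
        if weight > 0:
            # Candidates with zero weight contribute nothing and are skipped.
            selected.append(index)
            accumulated += weight
    return selected


# Binary weights: the three closest candidates with weight 1 are used.
print(select_contributions([1, 0, 1, 1, 1], 3))
# Non-binary weights: more candidates are needed to accumulate the same weight.
print(select_contributions([0.5, 0.75, 0.25, 1.0, 0.5, 1.0], 3))
```

With binary weights the selection reduces to the usual k closest usable neighbours, which is why the same option serves both kNNI and WeNNI.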

Example

e) This is an example of how to (WeNNI) impute the data file using the associated weights file created by NNIFileConverter above:

./nni -data wenni_logratio.data \
    -neighbours 10 \
    -nni_algorithm WeNNI \
    -weight wenni_weight.data > wenni_imputed.data

One file will be created: wenni_imputed.data

f) This is an example of how to (WeNNI) impute the data file using an SNR file instead of weights:

./nni -beta 0.6 \
    -data wenni_logratio.data \
    -neighbours 10 \
    -nni_algorithm WeNNI \
    -weight wenni_snr.data \
    -weight_is_snr > wenni_imputed.data

One file will be created: wenni_imputed.data. This file may differ slightly from the file generated in example e) above due to limited floating point precision.

g) This is an example of how to (KNNimpute) impute the data file using the associated weights file created by NNIFileConverter above:

./nni -data wenni_logratio.data \
    -neighbours 10 \
    -nni_algorithm kNNI \
    -weight wenni_weight.data \
    -weight_cutoff 0.5 > knni_imputed.data

One file will be created: knni_imputed.data
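The effect of -weight_cutoff on a weight matrix can be sketched in a few lines of Python. This is a minimal illustration of the rule stated for that option (values larger than the cutoff become 1, all others 0), not the code used by nni; the function name binarize_weights and the nested-list matrix representation are assumptions made for the example.

```python
def binarize_weights(weights, cutoff):
    """Map each weight to 1 if it is larger than `cutoff`, otherwise to 0."""
    return [[1 if w > cutoff else 0 for w in row] for row in weights]


weights = [[0.9, 0.5, 0.1],
           [0.51, 0.0, 1.0]]
# A weight equal to the cutoff is not larger than it, so it maps to 0.
print(binarize_weights(weights, 0.5))
# [[1, 0, 0], [1, 0, 1]]
```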

4) wenni.pl

Performs missing value imputation on data in a BASE file. The BASE file is read from stdin, and the resulting BASE file is written to stdout. This is how a plug-in is expected to behave to make the BASE job runner happy.

Command syntax:

wenni.pl --option value < basefile_in.data > basefile_out.data

wenni.pl will read options from basefile_in.data, but the settings may be changed by corresponding command line options. The available command line options are

--beta        the beta parameter (cf. text to example 2).
--datatype    string that defines the data exported from BASE.
    Allowed string values for datatype are 'raw' or 'derived' (without the quote characters). When 'derived' is given, the user defined intensity1 and intensity2 will be used from the BASE file. When 'raw' is used, intensity1 will be calculated as FCh1Mean-BCh1Mean (intensity2 is defined correspondingly).
--neighbours  number of nearest neighbours (cf. text to example 3).
--nodelete    do not clean up, i.e., temporary files will not be deleted.
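The 'raw' calculation of intensity1 as FCh1Mean-BCh1Mean can be sketched as an element-wise subtraction of two matrices. The Python below is only an illustration with made-up numbers; the function name raw_intensities and the nested-list matrix representation are assumptions, and wenni.pl itself may organize the calculation differently.

```python
def raw_intensities(fg_mean, bg_mean):
    """Subtract the background mean from the foreground mean, element by element."""
    return [[fg - bg for fg, bg in zip(fg_row, bg_row)]
            for fg_row, bg_row in zip(fg_mean, bg_mean)]


# Made-up foreground and background means for channel 1 (2 spots x 2 assays).
fch1_mean = [[1200.0, 300.0], [950.0, 80.0]]
bch1_mean = [[200.0, 310.0], [150.0, 60.0]]
intensity1 = raw_intensities(fch1_mean, bch1_mean)
print(intensity1)
# [[1000.0, -10.0], [800.0, 20.0]]
```

Note that a background larger than the foreground gives a non-positive intensity, which is the channel1<=0 situation mentioned in comment 2 above.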

Example

h) wenni.pl will basically run through examples 1a, 2c, and 3e sequentially, and produce a BASE file with the imputed values. The 'raw' type of input data is requested with the command line option --datatype. Other options are read from the input BASE file. Intermediate files will not be deleted.

./wenni.pl --datatype raw --nodelete < basefile_in.data > basefile_out.data

i) wenni.pl will basically run through examples 1b, 2d, and 3e sequentially, and produce a BASE file with the imputed values. The 'derived' type of data is expected ('derived' is the default). Options are read from the input BASE file. Intermediate files are deleted.

./wenni.pl < basefile_in.data > basefile_out.data
