Changeset 1810 for trunk/yat/normalizer/qQuantileNormalizer.h
Timestamp: Feb 20, 2009, 1:52:57 AM
File: 1 edited
trunk/yat/normalizer/qQuantileNormalizer.h
--- trunk/yat/normalizer/qQuantileNormalizer.h (r1803)
+++ trunk/yat/normalizer/qQuantileNormalizer.h (r1810)
@@ -47,27 +47,32 @@
      \brief Perform Q-quantile normalization
 
-     After a Q-quantile normalization each column has approximately
-     the same distribution of data (the Q-quantiles are the
-     same). Also, within each column the rank of an element is not
-     changed.
-
-     The normalization goes like this
-     - Data is not assumed to be sorted.
-     - Partition sorted target data in N parts. N must be 3 larger
+     Perform a Q-quantile normalization on a \a source range, after
+     which it will approximately have the same distribution of data as
+     the \a target range (the Q-quantiles are the same). The rank of
+     an element in the \a source range is not changed.
+
+     The class works also with unweighed ranges, and there is no
+     restriction that weighted \a source range requires weighted \a
+     target range or vice versa.
+
+     Normalization goes like this:
+     - Data are not assumed to be sorted.
+     - Partition sorted \a target data in N parts. N must be 3 or larger
        because of requirements from the underlying cspline fit
-     - Calculate the arithmetic mean for each part, the mean is
+     - Calculate the arithmetic (weighted) mean for each part, the mean is
        assigned to the mid point of each part.
-     - Do the above for the data to be tranformed (called source
+     - Do the above for the data to be tranformed (called \a source
        here).
-     - For each part, calculate the difference between the target and
-       the source. Now we have N differences d_i with associated rank
-       (midpoint of each part).
-     - Create a cubic spline fit to this difference vector d. The
-       resulting curve is used to recalculate all column values.
+     - For each part, calculate the difference between the \a target
+       and \a the source. Now we have \a N differences \f$ d_i \f$
+       with associated rank (midpoint of each part).
+     - Create a cubic spline fit to this difference vector \a d. The
+       resulting curve is used to recalculate all values in \a source.
        - Use the cubic spline fit for values within the cubic spline
          fit range [midpoint 1st part, midpoint last part].
        - For data outside the cubic spline fit use linear
-         extrapolation, i.e., a constant shift. d_first for points
-         below fit range, and d_last for points above fit range.
+         extrapolation, i.e., a constant shift. \f$ d_{first} \f$ for
+         points below fit range, and \f$ d_last \f$ for points above fit
+         range.
 
      \since New in yat 0.5
@@ -77,5 +82,5 @@
    public:
      /**
-        \brief Documentation please.
+        \brief Contructor
 
         \a Q is the number of parts and must be within \f$ [3,N] \f$
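The steps listed in the revised comment can be illustrated with a small standalone sketch. It is not the yat implementation. It handles unweighted data only and swaps in linear interpolation between part midpoints where the class uses a cubic spline fit. The names `PartSummary`, `summarize`, `shift_at` and `normalize_sketch` are hypothetical. The rank convention (ranks scaled to the interval 0 to 1) and the sign convention (difference taken as target mean minus source mean, then added to the source) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of yat: per-part mean and midpoint rank.
struct PartSummary {
  std::vector<double> mid;   // midpoint rank of each part, scaled to (0, 1)
  std::vector<double> mean;  // arithmetic (unweighted) mean of each part
};

// Sort a copy of the data, split it into Q contiguous parts and summarize.
PartSummary summarize(std::vector<double> data, std::size_t Q)
{
  assert(Q >= 3 && Q <= data.size());  // Q within [3, N], as the comment states
  std::sort(data.begin(), data.end());
  const std::size_t n = data.size();
  PartSummary s;
  for (std::size_t i = 0; i < Q; ++i) {
    const std::size_t begin = i * n / Q;
    const std::size_t end = (i + 1) * n / Q;
    const double sum =
      std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
    s.mean.push_back(sum / static_cast<double>(end - begin));
    s.mid.push_back(0.5 * static_cast<double>(begin + end) / n);
  }
  return s;
}

// Shift to apply at rank r. Inside the fit range this interpolates linearly
// (a stand-in for the cubic spline); outside it is the constant d_first or
// d_last.
double shift_at(double r, const std::vector<double>& mid,
                const std::vector<double>& d)
{
  if (r <= mid.front()) return d.front();
  if (r >= mid.back()) return d.back();
  const std::size_t hi =
    std::upper_bound(mid.begin(), mid.end(), r) - mid.begin();
  const double t = (r - mid[hi - 1]) / (mid[hi] - mid[hi - 1]);
  return d[hi - 1] + t * (d[hi] - d[hi - 1]);
}

// Returns the shifted source values in their original order.
std::vector<double> normalize_sketch(const std::vector<double>& target,
                                     const std::vector<double>& source,
                                     std::size_t Q)
{
  const PartSummary t = summarize(target, Q);
  const PartSummary s = summarize(source, Q);
  std::vector<double> d(Q);
  for (std::size_t i = 0; i < Q; ++i) d[i] = t.mean[i] - s.mean[i];

  // order[k] is the index of the k-th smallest source element.
  const std::size_t n = source.size();
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t(0));
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) { return source[a] < source[b]; });

  std::vector<double> result(n);
  for (std::size_t k = 0; k < n; ++k) {
    const double rank = (k + 0.5) / n;  // assumed rank convention
    result[order[k]] = source[order[k]] + shift_at(rank, s.mid, d);
  }
  return result;
}
```

With these conventions, calling `normalize_sketch({2, 4, 6, 8, 10, 12}, {1, 2, 3, 4, 5, 6}, 3)` gives part differences 1.5, 3.5 and 5.5. The smallest and largest source values fall outside the fit range and receive the constant shifts 1.5 and 5.5. The values in between receive interpolated shifts.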