# Changeset 1712 for trunk/yat

Timestamp:
Jan 13, 2009, 7:41:09 PM (12 years ago)
Message:

Addresses #425. Added linear extrapolation at ends. Added documentation. Cleanup of code.

Location:
trunk/yat/normalizer
Files:
2 edited

• ## trunk/yat/normalizer/qQuantileNormalizer.cc

 r1709
     #include "yat/statistics/Averager.h"
     #include "yat/utility/Matrix.h"
    -#include "yat/utility/VectorConstView.h"
     #include "yat/utility/Vector.h"
    +#include "yat/utility/VectorBase.h"
     #include

    -Partitioner::Partitioner(const utility::VectorConstView& vec,
    +Partitioner::Partitioner(const utility::VectorBase& vec,
                              unsigned int N)
       : average_(utility::Vector(N)), index_(utility::Vector(N))

    -qQuantileNormalizer::qQuantileNormalizer(const utility::VectorConstView& target,
    +qQuantileNormalizer::qQuantileNormalizer(const utility::VectorBase& target,
                                              unsigned int Q)
       : target_(Partitioner(target,Q))

     diff-=target_.averages();
     const utility::Vector& idx=target_.index();
     regression::CSplineInterpolation cspline(idx,diff);
     // add linear interpolation for first part
     for (size_t row=0; row
• ## trunk/yat/normalizer/qQuantileNormalizer.h

 r1711
     namespace utility {
       class Matrix;
    -  class VectorConstView;
    +  class VectorBase;
     }
     namespace normalizer {

     /**
    -   \brief Documentation please.
    +   \brief Partition a vector of data into equal sizes.
    +
    +   The class also calculates the average of each part and assigns the average to the mid point of each part. The midpoint is a double, i.e., it is not forced to be an integer index.
     */
     class Partitioner

     public:
       /**
    -     \brief Documentation please.
    +     \brief Create the partition and perform required calculations.
       */
    -  Partitioner(const utility::VectorConstView& vec, unsigned int N);
    +  Partitioner(const utility::VectorBase& vec, unsigned int N);

       /**
    -     \brief Documentation please.
    +     \brief Return the averages for each part.
    +
    +     \return The average vector.
       */
       const utility::Vector& averages(void) const;

       /**
    -     \brief Documentation please.
    +     \brief Return the mid point for each partition.
    +
    +     \return The index vector.
       */
       const utility::Vector& index(void) const;

       /**
          \brief The number of parts.

          \return The number of parts.
       */
       size_t size(void) const;

        \brief Perform Q-quantile normalization

    -   After a Q-quantile normalization each column has the same distribution of data (the Q-quantiles are the same). Also, within each column the rank of an element is not changed.
    +   After a Q-quantile normalization each column has approximately the same distribution of data (the Q-quantiles are the same). Also, within each column the rank of an element is not changed.

        There is currently no weighted version of qQuantileNormalizer

        The normalization goes like this
    -   0. Data is not assumed to be sorted.
    -   1. Partition the target data in N+1 parts. The ends have half size of the "normal" part size ( = \#targetdata/N )
    -   2. Calculate the arithmetic mean for each part
    -   3. Do the same for the data to be transformed (called source here).
    -   4. For each part, calculate the difference between the target and the source. Now we have N differences d_i.
    -   5. Create a cubic spline fit to this difference vector d. The resulting curve is used to recalculate all column values.
    -      I. For values in parts 1 through N-1 we use a cubic spline fit.
    -      II. For end parts 0 and N linear interpolation is used
    -   Linear interpolation simply means a translation.
    +   - Data is not assumed to be sorted.
    +   - Partition the target data in N parts.
    +   - Calculate the arithmetic mean for each part, the mean is assigned to the mid point of each part.
    +   - Do the same for the data to be transformed (called source here).
    +   - For each part, calculate the difference between the target and the source. Now we have N differences d_i with associated rank (midpoint of each part).
    +   - Create a cubic spline fit to this difference vector d. The resulting curve is used to recalculate all column values.
    +     - Use the cubic spline fit for values within the cubic spline fit range [midpoint 1st part, midpoint last part].
    +     - For data outside the cubic spline fit use linear extrapolation, i.e., a constant shift. d_first for points below fit range, and d_last for points above fit range.

        \since New in yat 0.5

        undefined. Keep \f$ N \f$ equal to or less than the smallest number of data points in the target or each data set to be
    -   normalized with a given target.
    +   normalized against a given target.
     */
    -qQuantileNormalizer(const utility::VectorConstView& target, unsigned int Q);
    +qQuantileNormalizer(const utility::VectorBase& target, unsigned int Q);

     /**
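The normalization steps documented in the new `qQuantileNormalizer.h` comment can be illustrated with a small stand-alone sketch. It is not the yat implementation: `PartitionSketch`, `partition` and `correction` are hypothetical names, the part boundaries and the midpoint formula are assumptions, and plain linear interpolation stands in for the cubic spline inside the fit range. Only the constant shift outside the range (d_first below, d_last above) is taken directly from the documentation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the documented Partitioner: per-part mean and
// per-part midpoint rank (a double, so it may fall between two indices).
struct PartitionSketch {
  std::vector<double> averages;
  std::vector<double> index;
};

// Split the data into N contiguous parts of (nearly) equal size.
// Assumes 1 <= N <= data.size(), matching the "keep N small" note in the docs.
PartitionSketch partition(std::vector<double> data, std::size_t N)
{
  std::sort(data.begin(), data.end());  // input is not assumed to be sorted
  PartitionSketch p;
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t begin = i * data.size() / N;
    const std::size_t end = (i + 1) * data.size() / N;  // one past the part
    double sum = 0.0;
    for (std::size_t j = begin; j < end; ++j)
      sum += data[j];
    p.averages.push_back(sum / (end - begin));
    p.index.push_back(0.5 * (begin + end - 1));  // assumed midpoint convention
  }
  return p;
}

// Correction d(rank) built from the per-part differences d_i.
// Outside [idx.front(), idx.back()] the correction is a constant shift.
// Inside, linear interpolation is used here in place of the cubic spline.
double correction(const std::vector<double>& idx,
                  const std::vector<double>& diff, double rank)
{
  if (rank <= idx.front()) return diff.front();  // d_first below fit range
  if (rank >= idx.back()) return diff.back();    // d_last above fit range
  // First midpoint strictly greater than rank; 1 <= k <= idx.size() - 1 here.
  const std::size_t k =
    std::upper_bound(idx.begin(), idx.end(), rank) - idx.begin();
  const double t = (rank - idx[k - 1]) / (idx[k] - idx[k - 1]);
  return diff[k - 1] + t * (diff[k] - diff[k - 1]);
}
```

Assuming the differences are formed as source average minus target average (as `diff-=target_.averages();` in the `.cc` hunk suggests), a value with rank `r` in its sorted column would be normalized as `value - correction(idx, diff, r)`.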