Changeset 494


Timestamp: Jan 10, 2006, 2:44:14 PM (16 years ago)
Author: Peter
Message: added pdf link and creation of Statistics document
Location: trunk
Files: 5 edited

Legend:

  unmodified line
+ added line
- removed line
  • trunk/doc/Makefile.am

r489 → r494

  ## $Id$

- all-local: doxygen-local Statistics
+ all-local: doxygen-local Statistics-local

  clean-local:
-         @rm -rf html latex *~
+         @rm -rf html latex Statistics Statistics.toc Statistics.pdf Statistics.log \
+         Statistics.dvi Statistics.aux *~

  distclean-local: clean-local
  …

  Statistics: Statistics.tex
+         @rm -rf html/Statistics
+         @rm -f Statistics
          latex2html Statistics.tex
+         @mv Statistics html/.
+         @ln -s html/Statistics Statistics
+
+ Statistics-local: Statistics Statistics.pdf
+
+ Statistics.dvi: Statistics.tex
+         @latex Statistics.tex
+         @latex Statistics.tex
+
+ Statistics.pdf: Statistics.dvi
+         @dvipdfm Statistics.dvi
+         @cp -p Statistics.pdf Statistics/.
  • trunk/doc/Statistics.tex

r492 → r494

- \documentstyle[12pt]{article}
+ \documentclass[12pt]{article}
+
+ \usepackage{html}

  \flushbottom
  …
  \topmargin 0pt

- \newcommand{\bea} {\begin{eqnarray}}
- \newcommand{\eea} {\end{eqnarray}}
- \newcommand{\beq} {\begin{equation}}
- \newcommand{\eeq} {\end{equation}}
- \newcommand{\bibl}[5]
-         {#1, {\it #2} {\bf #3} (#4) #5}
- \newcommand{\ol}{\overline}
-
  \renewcommand{\baselinestretch} {1.0}
  \renewcommand{\textfraction} {0.1}
  …
  \newcommand{\ovr}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)}

- % Use these to include comments and remarks into the text, these will
- % obviously appear as footnotes in the final output.
- \newcommand{\CR}[1]{\footnote{CR: #1}}
- \newcommand{\JH}[1]{\footnote{JH: #1}}
- \newcommand{\PE}[1]{\footnote{PE: #1}}
- \newcommand{\PJ}[1]{\footnote{PJ: #1}}
-
  \begin{document}
  …
  {\bf Weighted Statistics}
  \normalsize
+ \begin{htmlonly}
+ This document is also available in
+ \htmladdnormallink{PDF}{Statistics.pdf}.
+ \end{htmlonly}

  \tableofcontents
     
  …
  \section{Introduction}
  There are several different reasons why a statistical analysis
  needs to adjust for weighting. In the literature, the reasons are
  mainly divided into two groups.

  The first group is when some of the measurements are known to be
  more precise than others. The more precise a measurement is, the
  larger the weight it is given. The simplest case is when the
  weights are given before the measurements and can be treated as
  deterministic. It becomes more complicated when the weights cannot
  be determined until afterwards, and even more complicated if the
  weight depends on the value of the observable.

  The second group of situations is when calculating averages over
  one distribution while sampling from another distribution. To
  compensate for this discrepancy, weights are introduced into the
  analysis. A simple example: we are interviewing people, but for
  economic reasons we choose to interview more people from the city
  than from the countryside. When summarizing the statistics, the
  answers from the city are given a smaller weight. In this example
  we are choosing the proportions of people from the countryside and
  from the city being interviewed. Hence, we can determine the
  weights beforehand and consider them to be deterministic. In other
  situations the proportions are not deterministic, but rather a
  result of the sampling, and the weights must be treated as
  stochastic; only in rare situations can the weights be treated as
  independent of the observable.

  Since weights can have various origins in a statistical analysis,
  there are various ways to treat them, and in general the analysis
  should be tailored to treat the weights correctly. We have not
  chosen one situation for our implementations, so see the specific
  function documentation for what assumptions are made. Common to
  all implementations, however, is the following:
  \begin{itemize}
  \item Setting all weights to unity yields the same result as the
  non-weighted version.
  \item Rescaling the weights does not change any function.
  \item Setting a weight to zero is equivalent to removing the data point.
  \end{itemize}
  An important case is when weights are binary (either 1 or 0). Then
  the weighted version gives the same result as applying the
  non-weighted version to only those data points with non-zero
  weight. Hence, using binary weights and the weighted version,
  missing values can be treated in a proper way.

  \section{AveragerWeighted}
  …
  \subsection{Mean}

  For any situation, the weight is always designed so that the
  weighted mean is calculated as $m=\frac{\sum w_ix_i}{\sum w_i}$,
  which obviously fulfills the conditions above.

  In the case of varying measurement errors, it can be motivated that
  the weight should be $w_i = 1/\sigma_i^2$. We assume the measurement
  errors to be Gaussian, so the likelihood of our measurements is
  $L(m)=\prod (2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}}$.
  We maximize the likelihood by taking the derivative of its logarithm
  with respect to $m$,
  $\frac{d\ln L(m)}{dm}=\sum \frac{x_i-m}{\sigma_i^2}$. Hence, the
  Maximum Likelihood method yields the estimator
  $m=\frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2}$.

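To make the formulas concrete, here is a minimal, self-contained C++ sketch of the weighted mean (names such as weighted_mean are hypothetical; this is not the AveragerWeighted interface from trunk/lib/statistics). It also checks the conditions listed in the introduction numerically:

    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Weighted mean m = sum(w_i * x_i) / sum(w_i).
    double weighted_mean(const std::vector<double>& x,
                         const std::vector<double>& w)
    {
      double sum_wx = 0.0;
      double sum_w = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        sum_wx += w[i] * x[i];
        sum_w += w[i];
      }
      return sum_wx / sum_w;
    }

    int main()
    {
      std::vector<double> x = {1.0, 2.0, 4.0};
      // E.g. inverse-variance weights w_i = 1/sigma_i^2 (made up here).
      std::vector<double> w = {0.5, 0.25, 0.25};
      const double m = weighted_mean(x, w);

      // Rescaling the weights does not change the result.
      std::vector<double> w_scaled = {5.0, 2.5, 2.5};
      assert(std::abs(m - weighted_mean(x, w_scaled)) < 1e-12);

      // A zero weight is equivalent to removing the data point.
      std::vector<double> x_extra = {1.0, 2.0, 4.0, 100.0};
      std::vector<double> w_extra = {0.5, 0.25, 0.25, 0.0};
      assert(std::abs(m - weighted_mean(x_extra, w_extra)) < 1e-12);

      // Unit weights give the ordinary mean.
      std::vector<double> ones(x.size(), 1.0);
      assert(std::abs(weighted_mean(x, ones) - 7.0 / 3.0) < 1e-12);

      std::cout << "weighted mean: " << m << '\n';
      return 0;
    }
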
  \subsection{Variance}
  In the case of varying variance, there is no point in estimating a
  single variance, since it is different for each data point.

  Instead we look at the case when we want to estimate the variance
  over $f$ but are sampling from $f'$. For the mean of an observable
  $O$ we have
  $\widehat O=\sum\frac{f}{f'}O_i=\frac{\sum w_iO_i}{\sum w_i}$.
  Hence, an estimator of the variance of $X$ is
  \begin{eqnarray}
  \sigma^2=<X^2>-<X>^2=
  …
  \\\frac{\sum w_i(x_i-m)^2}{\sum w_i}
  \end{eqnarray}
  This estimator is invariant under a rescaling of the weights, and a
  weight equal to zero is equivalent to removing the data point.
  Having all weights equal to unity, we get
  $\sigma^2=\frac{\sum (x_i-m)^2}{N}$, which is the same as returned
  from Averager. Hence, this estimator is slightly biased, but still
  very efficient.

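The variance estimator translates just as directly; a sketch under the same caveats (hypothetical names, not the library interface):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Weighted variance sigma^2 = sum(w_i*(x_i-m)^2) / sum(w_i), with m
    // the weighted mean. With unit weights this reduces to the 1/N
    // (slightly biased) variance mentioned above.
    double weighted_variance(const std::vector<double>& x,
                             const std::vector<double>& w)
    {
      double sum_w = 0.0;
      double sum_wx = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        sum_w += w[i];
        sum_wx += w[i] * x[i];
      }
      const double m = sum_wx / sum_w;
      double ss = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i)
        ss += w[i] * (x[i] - m) * (x[i] - m);
      return ss / sum_w;
    }

    int main()
    {
      std::vector<double> x = {1.0, 2.0, 4.0};
      std::vector<double> w(x.size(), 1.0);
      // Unit weights: m = 7/3 and sigma^2 = 14/9.
      std::cout << "variance: " << weighted_variance(x, w) << '\n';
    }
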
  \subsection{Standard Error}

  The standard error squared is equal to the expected squared error of
  the estimate of $m$. The squared error consists of two parts, the
  variance of the estimator and the squared bias:
  $<m-\mu>^2=<m-<m>+<m>-\mu>^2=<m-<m>>^2+(<m>-\mu)^2$. In the case
  when weights are included in the analysis due to varying measurement
  errors and the weights can be treated as deterministic, we have
- \begin{eqnarray}
+ \begin{equation}
  Var(m)=\frac{\sum w_i^2\sigma_i^2}{\left(\sum w_i\right)^2}=
- \\\frac{\sum w_i^2\frac{\sigma_0^2}{w_i}}{\left(\sum w_i\right)^2}
+ \frac{\sum w_i^2\frac{\sigma_0^2}{w_i}}{\left(\sum w_i\right)^2}=
  \frac{\sigma_0^2}{\sum w_i},
- \end{eqnarray}
+ \end{equation}
  where we need to estimate $\sigma_0^2$. Again we have the likelihood
  $L(\sigma_0^2)=\prod\frac{1}{\sqrt{2\pi\sigma_0^2/w_i}}\exp\left(-\frac{w_i(x_i-m)^2}{2\sigma_0^2}\right)$,
  and taking the derivative with respect to $\sigma_0^2$,
  $\frac{d\ln L}{d\sigma_0^2}=\sum\left(-\frac{1}{2\sigma_0^2}+\frac{w_i(x_i-m)^2}{2\sigma_0^4}\right)$,
  yields the estimator $\sigma_0^2=\frac{1}{N}\sum w_i(x_i-m)^2$. This
  estimator does not properly ignore weights equal to zero: a
  zero-weight point contributes nothing to the sum but still
  increments $N$, biasing the estimate. Therefore, we modify the
  expression as follows,
  $\sigma_0^2=\frac{\sum w_i^2}{\left(\sum w_i\right)^2}\sum w_i(x_i-m)^2$,
  and we get the following estimator of the variance of the mean:
  $Var(m)=\frac{\sum w_i^2}{\left(\sum w_i\right)^3}\sum w_i(x_i-m)^2$.
  This estimator fulfills the conditions above: adding a data point
  with weight zero does not change it, rescaling the weights does not
  change it, and setting all weights to unity yields the same
  expression as in the non-weighted case.

  In cases where it is not a good approximation to treat the weights
  as deterministic, there are two ways to get a better estimate. The
  first is to linearize the expression
  $\left<\frac{\sum w_ix_i}{\sum w_i}\right>$. The second, when the
  situation is more complicated, is to estimate the standard error
  using a bootstrap method.

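The closed-form estimator of the squared standard error, $Var(m)=\frac{\sum w_i^2}{(\sum w_i)^3}\sum w_i(x_i-m)^2$, in the same sketch style (hypothetical names):

    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Squared standard error of the weighted mean,
    // Var(m) = sum(w_i^2) / (sum(w_i))^3 * sum(w_i*(x_i-m)^2).
    double weighted_mean_variance(const std::vector<double>& x,
                                  const std::vector<double>& w)
    {
      double sum_w = 0.0;
      double sum_w2 = 0.0;
      double sum_wx = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        sum_w += w[i];
        sum_w2 += w[i] * w[i];
        sum_wx += w[i] * x[i];
      }
      const double m = sum_wx / sum_w;
      double ss = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i)
        ss += w[i] * (x[i] - m) * (x[i] - m);
      return sum_w2 / (sum_w * sum_w * sum_w) * ss;
    }

    int main()
    {
      std::vector<double> x = {1.0, 2.0, 4.0};
      // Unit weights reduce Var(m) to sum((x_i-m)^2)/N^2.
      std::vector<double> w(x.size(), 1.0);
      std::cout << "standard error: "
                << std::sqrt(weighted_mean_variance(x, w)) << '\n';
    }
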
  \section{AveragerPairWeighted}
  Here data points come in pairs $(x,y)$. We are sampling from
  $f'_{XY}$ but want to measure from $f_{XY}$. To compensate for this
  discrepancy, averages of $g(x,y)$ are taken as
  $\sum \frac{f}{f'}g(x,y)$. Even though $X$ and $Y$ are not
  independent $(f_{XY}\neq f_Xf_Y)$, we assume that we can factorize
  the ratio and get $\frac{\sum w_xw_yg(x,y)}{\sum w_xw_y}$.
  \subsection{Covariance}
  Following the variance calculations for AveragerWeighted, we have
  $Cov=\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sum w_xw_y}$, where
  $m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}$.

  \subsection{Correlation}

  As the mean is estimated as $m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}$,
  the variance is estimated as
  $\sigma_x^2=\frac{\sum w_xw_y(x-m_x)^2}{\sum w_xw_y}$. As in the
  non-weighted case, we define the correlation to be the ratio between
  the covariance and the geometrical average of the variances,

  $\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sqrt{\sum w_xw_y(x-m_x)^2\sum
  w_xw_y(y-m_y)^2}}$.

  This expression fulfills the following:
  \begin{itemize}
  \item Setting all weights to unity reduces the expression to the
  non-weighted expression.
  \item Adding a pair of data in which one weight is zero is
  equivalent to ignoring the data pair.
  \item Correlation is equal to unity if and only if $x$ is equal to
  $y$. Otherwise the correlation is between $-1$ and $1$.
  \end{itemize}
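A sketch of the weighted covariance and correlation for paired data, with each pair weighted by $w_xw_y$ as above (hypothetical names, not the AveragerPairWeighted interface):

    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct PairStats { double covariance; double correlation; };

    // Weighted covariance and correlation; the weight of pair i is
    // wx[i] * wy[i], following the factorization assumption above.
    PairStats weighted_pair_stats(const std::vector<double>& x,
                                  const std::vector<double>& wx,
                                  const std::vector<double>& y,
                                  const std::vector<double>& wy)
    {
      double sw = 0.0, swx = 0.0, swy = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        const double w = wx[i] * wy[i];
        sw += w;
        swx += w * x[i];
        swy += w * y[i];
      }
      const double mx = swx / sw;
      const double my = swy / sw;
      double cov = 0.0, vx = 0.0, vy = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        const double w = wx[i] * wy[i];
        cov += w * (x[i] - mx) * (y[i] - my);
        vx += w * (x[i] - mx) * (x[i] - mx);
        vy += w * (y[i] - my) * (y[i] - my);
      }
      return {cov / sw, cov / std::sqrt(vx * vy)};
    }

    int main()
    {
      std::vector<double> x = {1.0, 2.0, 3.0, 4.0};
      std::vector<double> y = {1.1, 1.9, 3.2, 3.8};
      std::vector<double> wx(4, 1.0);
      std::vector<double> wy = {1.0, 1.0, 1.0, 0.0};  // last pair is ignored
      const PairStats s = weighted_pair_stats(x, wx, y, wy);
      std::cout << "cov: " << s.covariance
                << " corr: " << s.correlation << '\n';
    }
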
  \section{Score}
  …
  \subsection{ROC}

  An interpretation of the ROC curve area is as a probability: if we
  take one sample from class $+$ and one sample from class $-$, what
  is the probability that the sample from class $+$ has the greater
  value? The ROC curve area calculates the fraction of pairs
  fulfilling this,

- \beq
+ \begin{equation}
  \frac{\sum_{\{i,j\}:x^-_i<x^+_j}1}{\sum_{i,j}1}.
- \eeq
+ \end{equation}

  A geometrical interpretation is to have a number of squares, where
  each square corresponds to a pair of samples. The ROC curve follows
  the border between pairs in which the sample from class $+$ has the
  greater value and pairs in which this is not fulfilled. The ROC
  curve area is the area of the former squares, and a natural
  extension is to weight each pair with its two weights; consequently,
  the weighted ROC curve area becomes

- \beq
+ \begin{equation}
  \frac{\sum_{\{i,j\}:x^-_i<x^+_j}w^-_iw^+_j}{\sum_{i,j}w^-_iw^+_j}
- \eeq
+ \end{equation}

  This expression is invariant under a rescaling of the weights.
  Adding a data value with weight zero adds nothing to the expression,
  and having all weights equal to unity yields the non-weighted ROC
  curve area.

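The weighted area formula can be evaluated directly over all pairs; an O(n^2) sketch (ties, where the two values are equal, count as not fulfilling the strict inequality in the sum):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Weighted ROC curve area: total weight w-_i * w+_j of pairs with
    // x-_i < x+_j, divided by the total weight of all pairs.
    double weighted_roc_area(const std::vector<double>& neg,
                             const std::vector<double>& w_neg,
                             const std::vector<double>& pos,
                             const std::vector<double>& w_pos)
    {
      double num = 0.0;
      double den = 0.0;
      for (std::size_t i = 0; i < neg.size(); ++i) {
        for (std::size_t j = 0; j < pos.size(); ++j) {
          const double w = w_neg[i] * w_pos[j];
          den += w;
          if (neg[i] < pos[j])
            num += w;
        }
      }
      return num / den;
    }

    int main()
    {
      std::vector<double> neg = {0.1, 0.4, 0.35};
      std::vector<double> pos = {0.8, 0.6, 0.3};
      std::vector<double> w_neg(neg.size(), 1.0);
      std::vector<double> w_pos(pos.size(), 1.0);
      // Unit weights reproduce the non-weighted ROC curve area.
      std::cout << "area: "
                << weighted_roc_area(neg, w_neg, pos, w_pos) << '\n';
    }
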
  \subsection{tScore}

  Assume that $x$ and $y$ originate from the same distribution
  $N(\mu,\sigma_i^2)$, where $\sigma_i^2=\frac{\sigma_0^2}{w_i}$. We
  then estimate $\sigma_0^2$ as
  \begin{equation}
  \frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
  …
  The variance of the difference of the means becomes
  \begin{eqnarray}
  Var(m_x)+Var(m_y)=\\\frac{\sum w_i^2Var(x_i)}{\left(\sum
  w_i\right)^2}+\frac{\sum w_i^2Var(y_i)}{\left(\sum w_i\right)^2}=
  \frac{\sigma_0^2}{\sum w_i}+\frac{\sigma_0^2}{\sum w_i},
  \end{eqnarray}
  …

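A sketch of the resulting weighted t-score. The denominator of the pooled $\sigma_0^2$ estimate is elided in the hunk above, so the classical $n_x+n_y-2$ is assumed here; the actual document or implementation may use a weighted effective sample size instead:

    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    double weighted_mean(const std::vector<double>& x,
                         const std::vector<double>& w)
    {
      double sw = 0.0, swx = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        sw += w[i];
        swx += w[i] * x[i];
      }
      return swx / sw;
    }

    // Weighted t-score: the pooled-variance numerator follows the
    // document; the denominator n_x + n_y - 2 is an assumption, since
    // that part of the derivation is elided in the hunk above.
    double tscore(const std::vector<double>& x, const std::vector<double>& wx,
                  const std::vector<double>& y, const std::vector<double>& wy)
    {
      const double mx = weighted_mean(x, wx);
      const double my = weighted_mean(y, wy);
      double ssx = 0.0, swx = 0.0, ssy = 0.0, swy = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        ssx += wx[i] * (x[i] - mx) * (x[i] - mx);
        swx += wx[i];
      }
      for (std::size_t i = 0; i < y.size(); ++i) {
        ssy += wy[i] * (y[i] - my) * (y[i] - my);
        swy += wy[i];
      }
      const double s2 = (ssx + ssy) / (x.size() + y.size() - 2);
      // Var(m_x) + Var(m_y) = sigma_0^2/sum(w_x) + sigma_0^2/sum(w_y).
      return (mx - my) / std::sqrt(s2 * (1.0 / swx + 1.0 / swy));
    }

    int main()
    {
      std::vector<double> x = {2.1, 2.5, 2.9};
      std::vector<double> y = {1.2, 1.6, 1.4};
      std::vector<double> w(3, 1.0);
      std::cout << "t = " << tscore(x, w, y, w) << '\n';
    }
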
  \subsection{FoldChange}
  Fold-Change is simply the difference between the weighted means of
  the two groups:
  $\frac{\sum w_xx}{\sum w_x}-\frac{\sum w_yy}{\sum w_y}$.

  \subsection{WilcoxonFoldChange}
  Take all sample pairs (one from class $+$ and one from class $-$)
  and calculate the weighted median of the distances.

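WilcoxonFoldChange hinges on a weighted median. The document does not spell out the convention, so the sketch below assumes one common definition: the smallest value at which the cumulative weight, in sorted order, reaches half of the total weight:

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <utility>
    #include <vector>

    // Weighted median: smallest value whose cumulative weight (in
    // sorted order) reaches half of the total weight.
    double weighted_median(std::vector<std::pair<double, double> > vw)
    {
      std::sort(vw.begin(), vw.end());  // sort by value
      double total = 0.0;
      for (std::size_t i = 0; i < vw.size(); ++i)
        total += vw[i].second;
      double cum = 0.0;
      for (std::size_t i = 0; i < vw.size(); ++i) {
        cum += vw[i].second;
        if (cum >= 0.5 * total)
          return vw[i].first;
      }
      return vw.back().first;  // unreachable for positive total weight
    }

    int main()
    {
      // (distance x+_j - x-_i, pair weight w+_j * w-_i)
      std::vector<std::pair<double, double> > d;
      d.push_back(std::make_pair(0.7, 1.0));
      d.push_back(std::make_pair(0.5, 1.0));
      d.push_back(std::make_pair(0.2, 1.0));
      d.push_back(std::make_pair(0.4, 0.5));
      d.push_back(std::make_pair(0.9, 0.5));
      std::cout << "weighted median: " << weighted_median(d) << '\n';
    }
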
  \section{Kernel}
  \subsection{Polynomial Kernel}
  The polynomial kernel of degree $N$ is defined as $(1+<x,y>)^N$,
  where $<x,y>$ is the linear kernel (the usual scalar product). For
  weights, we define the linear kernel to be
  $<x,y>=\frac{\sum w_xw_yxy}{\sum w_xw_y}$, and the polynomial kernel
  can be calculated as before, $(1+<x,y>)^N$. Is this kernel a proper
  kernel (i.e., always positive semi-definite)? Yes: $<x,y>$ is
  obviously a proper kernel, as it is a scalar product. Adding a
  positive constant to a kernel yields another kernel, so $1+<x,y>$ is
  still a proper kernel. Then $(1+<x,y>)^N$ is also a proper kernel,
  because taking a proper kernel to the $N$th power yields a new
  proper kernel (see any good book on SVMs).
  \subsection{Gaussian Kernel}
  We define the weighted Gaussian kernel as
  $\exp\left(-\frac{\sum w_xw_y(x-y)^2}{\sum w_xw_y}\right)$, which
  fulfills the conditions listed in the introduction.

  Is this kernel a proper kernel? Yes; following the proof for the
  non-weighted kernel, we see that $K=\exp\left(-\frac{\sum
  w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum
  w_xw_y}\right)\exp\left(\frac{2\sum w_xw_yxy}{\sum w_xw_y}\right)$,
  which is a product of two proper kernels. $\exp\left(-\frac{\sum
  w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum
  w_xw_y}\right)$ is a proper kernel, because it is a scalar product,
  and $\exp\left(\frac{2\sum w_xw_yxy}{\sum w_xw_y}\right)$ is a
  proper kernel, because it is a polynomial of the linear kernel with
  positive coefficients. As the product of two kernels is also a
  kernel, the Gaussian kernel is a proper kernel.
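Both kernels, with the weighted linear kernel as the shared building block (a sketch; names are hypothetical):

    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Weighted linear kernel <x,y> = sum(wx*wy*x*y) / sum(wx*wy).
    double linear_kernel(const std::vector<double>& x,
                         const std::vector<double>& wx,
                         const std::vector<double>& y,
                         const std::vector<double>& wy)
    {
      double num = 0.0, den = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        num += wx[i] * wy[i] * x[i] * y[i];
        den += wx[i] * wy[i];
      }
      return num / den;
    }

    // Weighted polynomial kernel of degree N: (1 + <x,y>)^N.
    double polynomial_kernel(const std::vector<double>& x,
                             const std::vector<double>& wx,
                             const std::vector<double>& y,
                             const std::vector<double>& wy, int N)
    {
      return std::pow(1.0 + linear_kernel(x, wx, y, wy), N);
    }

    // Weighted Gaussian kernel: exp(-sum(wx*wy*(x-y)^2) / sum(wx*wy)).
    double gaussian_kernel(const std::vector<double>& x,
                           const std::vector<double>& wx,
                           const std::vector<double>& y,
                           const std::vector<double>& wy)
    {
      double num = 0.0, den = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        num += wx[i] * wy[i] * (x[i] - y[i]) * (x[i] - y[i]);
        den += wx[i] * wy[i];
      }
      return std::exp(-num / den);
    }

    int main()
    {
      std::vector<double> x = {1.0, 0.5};
      std::vector<double> y = {0.5, 1.0};
      std::vector<double> w(2, 1.0);
      std::cout << "polynomial: " << polynomial_kernel(x, w, y, w, 2)
                << " gaussian: " << gaussian_kernel(x, w, y, w) << '\n';
    }
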
  \section{Distance}
  …
  We have the model

- \beq
+ \begin{equation}
  y_i=\alpha+\beta (x_i-m_x)+\epsilon_i,
- \eeq
+ \end{equation}

  where $\epsilon_i$ is the noise. The variance of the noise is
  …
  model parameters, we minimize the sum of quadratic errors,

- \beq
+ \begin{equation}
  Q_0 = \sum w_i\epsilon_i^2.
- \eeq
+ \end{equation}

  Taking the derivatives with respect to $\alpha$ and $\beta$ yields
  two conditions,

- \beq
+ \begin{equation}
  \frac{\partial Q_0}{\partial \alpha} = -2 \sum w_i(y_i - \alpha -
  \beta (x_i-m_x))=0
- \eeq
+ \end{equation}

  and

- \beq
+ \begin{equation}
  \frac{\partial Q_0}{\partial \beta} = -2 \sum
  w_i(x_i-m_x)(y_i-\alpha-\beta(x_i-m_x))=0
- \eeq
+ \end{equation}

  or equivalently

- \beq
+ \begin{equation}
  \alpha = \frac{\sum w_iy_i}{\sum w_i}=m_y
- \eeq
+ \end{equation}

  and

- \beq
+ \begin{equation}
  \beta=\frac{\sum w_i(x_i-m_x)(y_i-m_y)}{\sum
  w_i(x_i-m_x)^2}=\frac{Cov(x,y)}{Var(x)}.
- \eeq
+ \end{equation}

  Note, by having all weights equal we get back the unweighted
  …
  $\alpha$ and $\beta$.

- \beq
+ \begin{equation}
  \textrm{Var}(\alpha )=\frac{\sum w_i^2\frac{\sigma^2}{w_i}}{(\sum w_i)^2}=
  \frac{\sigma^2}{\sum w_i}
- \eeq
+ \end{equation}

  and
- \beq
+ \begin{equation}
  \textrm{Var}(\beta )= \frac{\sum w_i^2(x_i-m_x)^2\frac{\sigma^2}{w_i}}
  {(\sum w_i(x_i-m_x)^2)^2}=
  \frac{\sigma^2}{\sum w_i(x_i-m_x)^2}.
- \eeq
+ \end{equation}

  Finally, we estimate the level of noise, $\sigma^2$. Inspired by the
  unweighted estimator

- \beq
+ \begin{equation}
  s^2=\frac{\sum (y_i-\alpha-\beta (x_i-m_x))^2}{n-2},
- \eeq
+ \end{equation}

  we suggest the following estimator:

- \beq
+ \begin{equation}
  s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum
  w_i-2\frac{\sum w_i^2}{\sum w_i}}.
- \eeq
+ \end{equation}

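Collecting the estimators $\alpha=m_y$, $\beta=Cov(x,y)/Var(x)$ and the suggested $s^2$ into one sketch (hypothetical names, not the library's regression interface):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Weighted linear regression y = alpha + beta*(x - m_x) + noise,
    // with the estimators derived above:
    //   alpha = m_y,  beta = Cov(x,y)/Var(x),
    //   s^2 = sum(w*(y - alpha - beta*(x-m_x))^2)
    //         / (sum(w) - 2*sum(w^2)/sum(w)).
    struct Fit { double alpha, beta, s2; };

    Fit weighted_fit(const std::vector<double>& x,
                     const std::vector<double>& y,
                     const std::vector<double>& w)
    {
      double sw = 0.0, sw2 = 0.0, swx = 0.0, swy = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        sw += w[i];
        sw2 += w[i] * w[i];
        swx += w[i] * x[i];
        swy += w[i] * y[i];
      }
      const double mx = swx / sw;
      const double my = swy / sw;
      double sxy = 0.0, sxx = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += w[i] * (x[i] - mx) * (y[i] - my);
        sxx += w[i] * (x[i] - mx) * (x[i] - mx);
      }
      const double beta = sxy / sxx;
      double ss = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) {
        const double r = y[i] - my - beta * (x[i] - mx);
        ss += w[i] * r * r;
      }
      return {my, beta, ss / (sw - 2.0 * sw2 / sw)};
    }

    int main()
    {
      std::vector<double> x = {0.0, 1.0, 2.0, 3.0};
      std::vector<double> y = {0.1, 1.1, 1.9, 3.1};
      // With unit weights the s^2 denominator reduces to n - 2.
      std::vector<double> w(4, 1.0);
      const Fit f = weighted_fit(x, y, w);
      std::cout << "alpha=" << f.alpha << " beta=" << f.beta
                << " s2=" << f.s2 << '\n';
    }
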
  \section{Outlook}
  \subsection{Hierarchical clustering}
- \label{hc}
  A hierarchical clustering consists of two things: finding the two
  closest data points, and merging these two data points into a new
  data point and calculating the new distances from this point to all
  other points.

  For the first item we need a distance matrix, and if we use
  Euclidean distances, the natural modification of the expression
  would be

- \beq
+ \begin{equation}
  d(x,y)=\frac{\sum w_i^xw_i^y(x_i-y_i)^2}{\sum w_i^xw_i^y}
- \eeq
+ \end{equation}

  For the second item, inspired by average linkage, we suggest

- \beq
+ \begin{equation}
  d(xy,z)=\frac{\sum w_i^xw_i^z(x_i-z_i)^2+\sum
  w_i^yw_i^z(y_i-z_i)^2}{\sum w_i^xw_i^z+\sum w_i^yw_i^z}
- \eeq
+ \end{equation}

  to be the distance between the new merged point $xy$ and $z$, and we
  • trunk/doc/namespaces.doxygen

r458 → r494

  /// @namespace theplu::statistics All classes and functions related to
  /// statistical methods or functions should be defined within this
  /// namespace.
+ /// See <a href="Statistics/index.html">Weighted Statistics document</a><br>
  ///
  /// @brief Statistical methods and functions
  • trunk/lib/statistics/AveragerPairWeighted.h

r490 → r494

  /// a weighted manner.
  ///
- /// <a href="../Statistics/index.html">Weighted Statistics document</a>
+ /// <a href="Statistics/index.html">Weighted Statistics document</a>
  ///
  /// If nothing else stated, each function fulfills the
  • trunk/lib/statistics/AveragerWeighted.h

r492 → r494

  /// \f$. Compensating for this discrepancy averages of observables
  /// are taken to be \f$ \sum \frac{f}{f'}X \f$ For further discussion:
- /// <a href="../Statistics/index.html">Weighted Statistics document</a><br>
+ /// <a href="Statistics/index.html">Weighted Statistics document</a><br>
  ///
  /// If nothing else stated, each function fulfills the