Changeset 1109


Timestamp:
Feb 19, 2008, 10:35:41 PM
Author:
Peter
Message:

fixes #325

Location:
trunk/doc
Files:
3 edited
1 moved

  • trunk/doc/Makefile.am

    r1000 r1109  
    2525# 02111-1307, USA.
    2626
    27 doc: doxygen.config doxygen-local Statistics-local
     27doc: doxygen.config doxygen-local
    2828
    29 dvi-local: Statistics.dvi
     29dvi-local:
    3030
    31 pdf-local: Statistics.pdf
     31pdf-local:
    3232
    33 html-local: doxygen.config doxygen-local html/Statistics/Statistics.html
     33html-local: doxygen.config doxygen-local
    3434
    3535mostlyclean-local:
     
    4848
    4949
    50 html/Statistics/Statistics.html: Statistics.tex
    51   @$(install_sh) -d html/Statistics
    52   @if $(HAVE_LATEX2HTML); then \
    53   latex2html -t "Weighted Statistics used in yat." \
    54   --dir html/Statistics Statistics.tex;\
    55   fi
    56 
    57 Statistics-local: html/Statistics/Statistics.html \
    58   html/Statistics/Statistics.pdf
    59 
    60 Statistics.dvi: Statistics.tex
    61   @if $(HAVE_LATEX); then \
    62   @latex Statistics.tex; \
    63   @latex Statistics.tex; \
    64   fi
    65 
    66 Statistics.pdf: Statistics.dvi
    67   @if $(HAVE_DVIPDFM); then \
    68   dvipdfm Statistics.dvi; \
    69   fi
    70 
    71 html/Statistics/Statistics.pdf: $(pdf)
    72   @if test -f Statistics.pdf; then \
    73     $(install_sh) -d html/Statistics; \
    74     cp Statistics.pdf html/Statistics/.; \
    75   fi
  • trunk/doc/Statistics.doxygen

    r1093 r1109  
    1 \documentclass[12pt]{article}
    2 
    3 % $Id$
    4 %
    5 % Copyright (C) 2005 Peter Johansson
    6 % Copyright (C) 2006 Jari Häkkinen, Markus Ringnér, Peter Johansson
    7 % Copyright (C) 2007 Peter Johansson
    8 %
    9 % This file is part of the yat library, http://trac.thep.lu.se/yat
    10 %
    11 % The yat library is free software; you can redistribute it and/or
    12 % modify it under the terms of the GNU General Public License as
    13 % published by the Free Software Foundation; either version 2 of the
    14 % License, or (at your option) any later version.
    15 %
    16 % The yat library is distributed in the hope that it will be useful,
    17 % but WITHOUT ANY WARRANTY; without even the implied warranty of
    18 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    19 % General Public License for more details.
    20 %
    21 % You should have received a copy of the GNU General Public License
    22 % along with this program; if not, write to the Free Software
    23 % Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
    24 % 02111-1307, USA.
    25 
    26 
    27 
    28 \flushbottom
    29 \footskip 54pt
    30 \headheight 0pt
    31 \headsep 0pt
    32 \oddsidemargin 0pt
    33 \parindent 0pt
    34 \parskip 2ex
    35 \textheight 230mm
    36 \textwidth 165mm
    37 \topmargin 0pt
    38 
    39 \renewcommand{\baselinestretch} {1.0}
    40 \renewcommand{\textfraction} {0.1}
    41 \renewcommand{\topfraction} {1.0}
    42 \renewcommand{\bottomfraction} {1.0}
    43 \renewcommand{\floatpagefraction} {1.0}
    44 
    45 \renewcommand{\d}{{\mathrm{d}}}
    46 \newcommand{\nd}{$^{\mathrm{nd}}$}
    47 \newcommand{\eg}{{\it {e.g.}}}
    48 \newcommand{\ie}{{\it {i.e., }}}
    49 \newcommand{\etal}{{\it {et al.}}}
    50 \newcommand{\eref}[1]{Eq.~(\ref{e:#1})}
    51 \newcommand{\fref}[1]{Fig.~\ref{f:#1}}
    52 \newcommand{\ovr}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)}
    53 
    54 \begin{document}
    55 
    56 \large
    57 {\bf Weighted Statistics}
    58 \normalsize
    59 
    60 \tableofcontents
    61 \clearpage
    62 
    63 \section{Introduction}
     1// $Id$
     2//
     3// Copyright (C) 2005 Peter Johansson
     4// Copyright (C) 2006 Jari Häkkinen, Markus Ringnér, Peter Johansson
     5// Copyright (C) 2007, 2008 Peter Johansson
     6//
     7// This file is part of the yat library, http://trac.thep.lu.se/yat
     8//
     9// The yat library is free software; you can redistribute it and/or
     10// modify it under the terms of the GNU General Public License as
     11// published by the Free Software Foundation; either version 2 of the
     12// License, or (at your option) any later version.
     13//
     14// The yat library is distributed in the hope that it will be useful,
     15// but WITHOUT ANY WARRANTY; without even the implied warranty of
     16// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
     17// General Public License for more details.
     18//
     19// You should have received a copy of the GNU General Public License
     20// along with this program; if not, write to the Free Software
     21// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
     22// 02111-1307, USA.
     23
     24
     25/**
     26\page weighted_statistics Weighted Statistics
     27
     28\section Introduction
    6429There are several different reasons why a statistical analysis needs
    6530to adjust for weighting. In the literature, reasons are mainly divided in
     
    9459function documentation for what assumptions are made. However, the
    9560following are common to all implementations:
    96 \begin{itemize}
    97 \item Setting all weights to unity yields the same result as the
     61
     62 - Setting all weights to unity yields the same result as the
    9863non-weighted version.
    99 \item Rescaling the weights does not change any function.
    100 \item Setting a weight to zero is equivalent to removing the data point.
    101 \end{itemize}
     64 - Rescaling the weights does not change any function.
     65 - Setting a weight to zero is equivalent to removing the data point.
     66
    10267An important case is when weights are binary (either 1 or 0). Then we
    10368get the same result using the weighted version as using the data with
     
    10671in a proper way.
    10772
    108 \section{AveragerWeighted}
    109 
    110 
    111 
    112 \subsection{Mean}
     73\section AveragerWeighted
     74
     75
     76
     77\subsection Mean
    11378
    11479In any situation, the weight is designed so that the weighted mean
    115 is calculated as $m=\frac{\sum w_ix_i}{\sum w_i}$, which obviously
     80is calculated as \f$ m=\frac{\sum w_ix_i}{\sum w_i} \f$, which obviously
    11681fulfills the conditions above.
    11782
     83
     84
    11885In the case of varying measurement errors, a natural choice of
    119 weight is $w_i = 1/\sigma_i^2$. We assume the measurement errors
     86weight is \f$ w_i = 1/\sigma_i^2 \f$. We assume the measurement errors
    12087to be Gaussian, so the likelihood of obtaining our measurements is
    121 $L(m)=\prod
    122 (2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}}$.  We
    123 maximize the likelihood by taking the derivative of the logarithm of
    124 the likelihood with respect to $m$: $\frac{d\ln L(m)}{dm}=\sum
    125 \frac{x_i-m}{\sigma_i^2}$. Hence, the Maximum Likelihood method yields
    126 the estimator $m=\frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2}$.
    127 
    128 
    129 \subsection{Variance}
     88\f$ L(m)=\prod
     89(2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}} \f$.  We
     90maximize the likelihood by taking the derivative of the logarithm of
     91the likelihood with respect to \f$ m \f$: \f$ \frac{d\ln L(m)}{dm}=\sum
     92\frac{x_i-m}{\sigma_i^2} \f$. Hence, the Maximum Likelihood method yields
     93the estimator \f$ m=\frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2} \f$.
     94
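To make this concrete, here is a minimal standalone sketch in C++ (an illustration only, not the yat API; the function name weighted_mean is hypothetical):

\code
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted mean m = sum(w_i*x_i) / sum(w_i).  With w_i = 1/sigma_i^2 this
// is the Maximum Likelihood estimator derived above.  Unit weights give the
// ordinary mean, rescaling all weights leaves the result unchanged, and a
// zero weight removes the corresponding data point, as required by the
// three conditions in the introduction.
double weighted_mean(const std::vector<double>& x, const std::vector<double>& w)
{
  assert(x.size() == w.size());
  double sum_wx = 0.0;
  double sum_w = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_wx += w[i] * x[i];
    sum_w += w[i];
  }
  return sum_wx / sum_w;
}
\endcode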
     95
     96\subsection Variance
    13097In the case of varying variance, there is no point in estimating a
    13198single variance, since it differs between data points.
    13299
    133100Instead we look at the case when we want to estimate the variance over
    134 $f$ but are sampling from $f'$. For the mean of an observable $O$ we
    135 have $\widehat O=\frac{\sum\frac{f}{f'}O_i}{\sum\frac{f}{f'}}=\frac{\sum w_iO_i}{\sum
    136 w_i}$. Hence, an estimator of the variance of $X$ is
    137 \begin{eqnarray}
    138 \sigma^2=<X^2>-<X>^2=
    139 \\\frac{\sum w_ix_i^2}{\sum w_i}-\frac{(\sum w_ix_i)^2}{(\sum w_i)^2}=
    140 \\\frac{\sum w_i(x_i^2-m^2)}{\sum w_i}=
    141 \\\frac{\sum w_i(x_i^2-2mx_i+m^2)}{\sum w_i}=
    142 \\\frac{\sum w_i(x_i-m)^2}{\sum w_i}
    143 \end{eqnarray}
     101\f$f\f$ but are sampling from \f$ f' \f$. For the mean of an observable \f$ O \f$ we
     102have \f$ \widehat O=\frac{\sum\frac{f}{f'}O_i}{\sum\frac{f}{f'}}=\frac{\sum w_iO_i}{\sum
     103w_i} \f$. Hence, an estimator of the variance of \f$ X \f$ is
     104
     105\f$
     106s^2 = <X^2>-<X>^2=
     107\f$
     108
     109\f$
     110 = \frac{\sum w_ix_i^2}{\sum w_i}-\frac{(\sum w_ix_i)^2}{(\sum w_i)^2}=
     111\f$
     112
     113\f$
     114 = \frac{\sum w_i(x_i^2-m^2)}{\sum w_i}=
     115\f$
     116
     117\f$
     118 = \frac{\sum w_i(x_i^2-2mx_i+m^2)}{\sum w_i}=
     119\f$
     120
     121\f$
     122 = \frac{\sum w_i(x_i-m)^2}{\sum w_i}
     123\f$
     124
    144125This estimator is invariant under a rescaling of the weights, and
    145126setting a weight to zero is equivalent to removing the data
    146 point. Having all weights equal to unity we get $\sigma^2=\frac{\sum
    147 (x_i-m)^2}{N}$, which is the same as returned from Averager. Hence,
     127point. Having all weights equal to unity we get \f$ \sigma^2=\frac{\sum
     128(x_i-m)^2}{N} \f$, which is the same as returned from Averager. Hence,
    148129this estimator is slightly biased, but still very efficient.
    149130
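A corresponding standalone sketch of this variance estimator in C++ (illustration only, not the yat API):

\code
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted variance sigma^2 = sum(w_i*(x_i-m)^2) / sum(w_i), where m is the
// weighted mean.  For unit weights this reduces to sum((x_i-m)^2)/N, the
// slightly biased estimator returned by Averager.
double weighted_variance(const std::vector<double>& x,
                         const std::vector<double>& w)
{
  assert(x.size() == w.size());
  double sum_w = 0.0, sum_wx = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_w += w[i];
    sum_wx += w[i] * x[i];
  }
  const double m = sum_wx / sum_w;
  double sum_wdev2 = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    sum_wdev2 += w[i] * (x[i] - m) * (x[i] - m);
  return sum_wdev2 / sum_w;
}
\endcode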
    150 \subsection{Standard Error}
     131\subsection standard_error Standard Error
    151132The standard error squared is equal to the expected squared error of
    152 the estimation of $m$. The squared error consists of two parts, the
    153 variance of the estimator and the squared
    154 bias: $<(m-\mu)^2>=<(m-<m>+<m>-\mu)^2>=<(m-<m>)^2>+(<m>-\mu)^2$. In the
    155 case when weights are included in analysis due to varying measurement
    156 errors and the weights can be treated as deterministic, we have
    157 \begin{equation}
     133the estimation of \f$m\f$. The squared error consists of two parts, the
     134variance of the estimator and the squared bias:
     135
     136\f$
     137<(m-\mu)^2>=<(m-<m>+<m>-\mu)^2>=
     138\f$
     139\f$
     140<(m-<m>)^2>+(<m>-\mu)^2
     141\f$.
     142
     143In the case when weights are included in analysis due to varying
     144measurement errors and the weights can be treated as deterministic, we
     145have
     146
     147\f$
    158148Var(m)=\frac{\sum w_i^2\sigma_i^2}{\left(\sum w_i\right)^2}=
     149\f$
     150\f$
    159151\frac{\sum w_i^2\frac{\sigma_0^2}{w_i}}{\left(\sum w_i\right)^2}=
     152\f$
     153\f$
    160154\frac{\sigma_0^2}{\sum w_i},
    161 \end{equation}
    162 where we need to estimate $\sigma_0^2$. Again we have the likelihood
    163 $L(\sigma_0^2)=\prod\frac{1}{\sqrt{2\pi\sigma_0^2/w_i}}\exp{(-\frac{w_i(x-m)^2}{2\sigma_0^2})}$
    164 and taking the derivative with respect to $\sigma_0^2$, $\frac{d\ln
    165 L}{d\sigma_0^2}=\sum
    166 -\frac{1}{2\sigma_0^2}+\frac{w_i(x-m)^2}{2\sigma_0^4}$ which
    167 yields an estimator $\sigma_0^2=\frac{1}{N}\sum w_i(x-m)^2$. This
     155\f$
     156
     157where we need to estimate \f$ \sigma_0^2 \f$. Again we have the likelihood
     158
     159\f$
     160L(\sigma_0^2)=\prod\frac{1}{\sqrt{2\pi\sigma_0^2/w_i}}\exp{(-\frac{w_i(x-m)^2}{2\sigma_0^2})}
     161\f$
     162and taking the derivative with respect to
     163\f$\sigma_0^2\f$,
     164
     165\f$
     166\frac{d\ln L}{d\sigma_0^2}=
     167\f$
     168\f$
     169\sum -\frac{1}{2\sigma_0^2}+\frac{w_i(x-m)^2}{2\sigma_0^4}
     170\f$
     171
     172which
     173yields an estimator \f$ \sigma_0^2=\frac{1}{N}\sum w_i(x-m)^2 \f$. This
    168174estimator does not ignore zero weights: a zero-weight point contributes
    169175nothing to the sum yet is still counted in N. Therefore, we modify
    170 the expression as follows $\sigma_0^2=\frac{\sum w_i^2}{\left(\sum
    171 w_i\right)^2}\sum w_i(x-m)^2$ and we get the following estimator of
    172 the variance of the mean $Var(m)=\frac{\sum w_i^2}{\left(\sum
    173 w_i\right)^3}\sum w_i(x-m)^2$. This estimator fulfills the conditions
     176the expression as follows \f$\sigma_0^2=\frac{\sum w_i^2}{\left(\sum
     177w_i\right)^2}\sum w_i(x-m)^2\f$ and we get the following estimator of
     178the variance of the mean \f$Var(m)=\frac{\sum w_i^2}{\left(\sum
     179w_i\right)^3}\sum w_i(x-m)^2\f$. This estimator fulfills the conditions
    174180above: adding a zero weight does not change it; rescaling the weights
    175181does not change it; and setting all weights to unity yields the same
     
    178184In a case when it is not a good approximation to treat the weights as
    179185deterministic, there are two ways to get a better estimation. The
    180 first one is to linearize the expression $\left<\frac{\sum
    181 w_ix_i}{\sum w_i}\right>$. The second method, used when the situation is
     186first one is to linearize the expression \f$\left<\frac{\sum
     187w_ix_i}{\sum w_i}\right>\f$. The second method, used when the situation is
    182188more complicated, is to estimate the standard error using a
    183189bootstrapping method.
    184190
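The deterministic-weights case translates directly into code; a standalone C++ sketch (illustration only, not the yat API) of the estimated variance of the mean, \f$Var(m)=\frac{\sum w_i^2}{\left(\sum w_i\right)^3}\sum w_i(x-m)^2\f$:

\code
#include <cmath>
#include <cstddef>
#include <vector>

// Squared standard error of the weighted mean, assuming deterministic
// weights: Var(m) = sum(w^2)/(sum w)^3 * sum(w*(x-m)^2).
double weighted_mean_variance(const std::vector<double>& x,
                              const std::vector<double>& w)
{
  double sum_w = 0.0, sum_w2 = 0.0, sum_wx = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_w += w[i];
    sum_w2 += w[i] * w[i];
    sum_wx += w[i] * x[i];
  }
  const double m = sum_wx / sum_w;
  double sum_wdev2 = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    sum_wdev2 += w[i] * (x[i] - m) * (x[i] - m);
  return sum_w2 / (sum_w * sum_w * sum_w) * sum_wdev2;
}

// The standard error itself is the square root.
double weighted_standard_error(const std::vector<double>& x,
                               const std::vector<double>& w)
{
  return std::sqrt(weighted_mean_variance(x, w));
}
\endcode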
    185 \section{AveragerPairWeighted}
    186 Here data points come in pairs (x,y). We are sampling from $f'_{XY}$
    187 but want to measure from $f_{XY}$. To compensate for this decrepency,
    188 averages of $g(x,y)$ are taken as $\sum \frac{f}{f'}g(x,y)$. Even
    189 though, $X$ and $Y$ are not independent $(f_{XY}\neq f_Xf_Y)$ we
    190 assume that we can factorize the ratio and get $\frac{\sum
    191 w_xw_yg(x,y)}{\sum w_xw_y}$
    192 \subsection{Covariance}
     191\section AveragerPairWeighted
     192Here data points come in pairs (x,y). We are sampling from \f$f'_{XY}\f$
     193but want to measure from \f$f_{XY}\f$. To compensate for this discrepancy,
     194averages of \f$g(x,y)\f$ are taken as \f$\sum \frac{f}{f'}g(x,y)\f$. Even
     195though \f$X\f$ and \f$Y\f$ are not independent \f$(f_{XY}\neq f_Xf_Y)\f$, we
     196assume that we can factorize the ratio and get \f$\frac{\sum
     197w_xw_yg(x,y)}{\sum w_xw_y}\f$.
     198\subsection Covariance
    193199Following the variance calculations for AveragerWeighted we have
    194 $Cov=\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sum w_xw_y}$ where
    195 $m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}$.
    196 
    197 \subsection{correlation}
    198 
    199 As the mean is estimated as $m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}$,
    200 the variance is estimated as $\sigma_x^2=\frac{\sum
    201 w_xw_y(x-m_x)^2}{\sum w_xw_y}$. As in the non-weighted case we define
    202 the correlation to be the ratio between the covariance and the
    203 geometric mean of the variances
    204 
    205 $\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sqrt{\sum w_xw_y(x-m_x)^2\sum
    206 w_xw_y(y-m_y)^2}}$.
     200\f$Cov=\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sum w_xw_y}\f$ where
     201\f$m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}\f$.
     202
     203\subsection Correlation
     204
     205As the mean is estimated as
     206\f$
     207m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}
     208\f$,
     209the variance is estimated as
     210\f$
     211\sigma_x^2=\frac{\sum w_xw_y(x-m_x)^2}{\sum w_xw_y}
     212\f$.
     213As in the non-weighted case we define the correlation to be the ratio
     214between the covariance and the geometric mean of the variances
     215
     216\f$
     217\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sqrt{\sum w_xw_y(x-m_x)^2\sum
     218w_xw_y(y-m_y)^2}}
     219\f$.
     220
    207221
    208222This expression fulfills the following
    209 \begin{itemize}
    210 \item Having N weights the expression reduces to the non-weighted expression.
    211 \item Adding a data pair in which one weight is zero is equivalent
     223 - Having N equal weights the expression reduces to the non-weighted expression.
     224 - Adding a data pair in which one weight is zero is equivalent
    212225to ignoring the data pair.
    213 \item Correlation is unity if and only if $x-m_x$ is a positive multiple
    214 of $y-m_y$. Otherwise the correlation is between -1 and 1.
    215 \end{itemize}
    216 \section{Score}
    217 
    218 
    219 \subsection{Pearson}
    220 
    221 $\frac{\sum w(x-m_x)(y-m_y)}{\sqrt{\sum w(x-m_x)^2\sum w(y-m_y)^2}}$.
     226 - Correlation is unity if and only if \f$x-m_x\f$ is a positive multiple
     227of \f$y-m_y\f$. Otherwise the correlation is between -1 and 1.
     228
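A standalone C++ sketch of the pair-weighted covariance and correlation (illustration only, not the yat API); each pair carries the combined weight \f$w=w_xw_y\f$:

\code
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted correlation: Cov(x,y) / sqrt(Var(x)*Var(y)), where all moments
// are computed with the combined pair weight w_i = wx_i*wy_i.
double weighted_correlation(const std::vector<double>& x,
                            const std::vector<double>& wx,
                            const std::vector<double>& y,
                            const std::vector<double>& wy)
{
  double sum_w = 0.0, mx = 0.0, my = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    sum_w += w;
    mx += w * x[i];
    my += w * y[i];
  }
  mx /= sum_w;
  my /= sum_w;
  double cov = 0.0, varx = 0.0, vary = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    cov += w * (x[i] - mx) * (y[i] - my);
    varx += w * (x[i] - mx) * (x[i] - mx);
    vary += w * (y[i] - my) * (y[i] - my);
  }
  // The normalization sum_w cancels in the ratio, so it is omitted.
  return cov / std::sqrt(varx * vary);
}
\endcode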
     229\section Score
     230
     231\subsection Pearson
     232
     233\f$\frac{\sum w(x-m_x)(y-m_y)}{\sqrt{\sum w(x-m_x)^2\sum w(y-m_y)^2}}\f$.
    222234
    223235See AveragerPairWeighted correlation.
    224236
    225 \subsection{ROC}
     237\subsection ROC
    226238
    227239An interpretation of the ROC curve area is the probability that, if we
    228 take one sample from class $+$ and one from class $-$, the sample
    229 from class $+$ has the greater value. The
     240take one sample from class \f$+\f$ and one from class \f$-\f$, the sample
     241from class \f$+\f$ has the greater value. The
    230242ROC curve area calculates the fraction of pairs fulfilling this
    231243
    232 \begin{equation}
     244\f$
    233245\frac{\sum_{\{i,j\}:x^-_i<x^+_j}1}{\sum_{i,j}1}.
    234 \end{equation}
     246\f$
    235247
    236248A geometric interpretation is to have a number of squares where
    237249each square corresponds to a pair of samples. The ROC curve follows the
    238 border between pairs in which the sample from class $+$ has the greater
     250border between pairs in which the sample from class \f$+\f$ has the greater
    239251value and pairs in which this is not fulfilled. The ROC curve area is
    240252the area of those latter squares and a natural extension is to weight
     
    242254area becomes
    243255
    244 \begin{equation}
     256\f$
    245257\frac{\sum_{\{i,j\}:x^-_i<x^+_j}w^-_iw^+_j}{\sum_{i,j}w^-_iw^+_j}
    246 \end{equation}
     258\f$
    247259
    248260This expression is invariant under a rescaling of the weights. Adding a
     
    250262all weights equal to unity yields the non-weighted ROC curve area.
    251263
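A direct standalone C++ sketch of the weighted area formula (illustration only, not the yat API); ties count as not fulfilling the inequality, exactly as in the formula above:

\code
#include <cstddef>
#include <vector>

// Weighted ROC curve area:
//   sum over pairs (i,j) with neg[i] < pos[j] of wneg[i]*wpos[j],
//   divided by the total weight summed over all pairs.
double weighted_roc_area(const std::vector<double>& pos,
                         const std::vector<double>& wpos,
                         const std::vector<double>& neg,
                         const std::vector<double>& wneg)
{
  double numerator = 0.0, denominator = 0.0;
  for (std::size_t i = 0; i < neg.size(); ++i) {
    for (std::size_t j = 0; j < pos.size(); ++j) {
      const double w = wneg[i] * wpos[j];
      if (neg[i] < pos[j])
        numerator += w;
      denominator += w;
    }
  }
  return numerator / denominator;
}
\endcode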
    252 \subsection{tScore}
    253 
    254 Assume that $x$ and $y$ originate from the same distribution
    255 $N(\mu,\sigma_i^2)$ where $\sigma_i^2=\frac{\sigma_0^2}{w_i}$. We then
    256 estimate $\sigma_0^2$ as
    257 \begin{equation}
     264\subsection tScore
     265
     266Assume that \f$x\f$ and \f$y\f$ originate from the same distribution
     267\f$N(\mu,\sigma_i^2)\f$ where \f$\sigma_i^2=\frac{\sigma_0^2}{w_i}\f$. We then
     268estimate \f$\sigma_0^2\f$ as
     269\f$
    258270\frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
    259271{\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
    260272\frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
    261 \end{equation}
     273\f$
    262274The variance of the difference of the means becomes
    263 \begin{eqnarray}
     275\f$
    264276Var(m_x)+Var(m_y)=\frac{\sum w_x^2Var(x_i)}{\left(\sum
    265277w_x\right)^2}+\frac{\sum w_y^2Var(y_i)}{\left(\sum w_y\right)^2}=
    266278\frac{\sigma_0^2}{\sum w_x}+\frac{\sigma_0^2}{\sum w_y},
    267 \end{eqnarray}
     279\f$
    268280and consequently the squared standard error entering the t-score becomes
    269 \begin{equation}
     281\f$
    270282\frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
    271283{\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
    272284\frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
    273285\left(\frac{1}{\sum w_x}+\frac{1}{\sum w_y}\right),
    274 \end{equation}
    275 
    276 For $w_i=w$ this expression condenses to
    277 \begin{equation}
     286\f$
     287
     288For \f$w_i=w\f$ this expression condenses to
     289\f$
    278290\frac{w\sum (x-m_x)^2+w\sum (y-m_y)^2}
    279291{n_x+n_y-2}
    280292\left(\frac{1}{wn_x}+\frac{1}{wn_y}\right),
    281 \end{equation}
     293\f$
    282294in other words, the familiar expression from the non-weighted case.
    283295
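Assembled from the pieces above, a standalone C++ sketch of the weighted t-score (illustration only, not the yat API), with the weight sums taken per group:

\code
#include <cmath>
#include <cstddef>
#include <vector>

// Helper: weighted sums needed per group.
struct Sums { double w, w2, wx, wdev2; };

static Sums sums(const std::vector<double>& x, const std::vector<double>& w)
{
  Sums s = {0.0, 0.0, 0.0, 0.0};
  for (std::size_t i = 0; i < x.size(); ++i) {
    s.w += w[i];
    s.w2 += w[i] * w[i];
    s.wx += w[i] * x[i];
  }
  const double m = s.wx / s.w;
  for (std::size_t i = 0; i < x.size(); ++i)
    s.wdev2 += w[i] * (x[i] - m) * (x[i] - m);
  return s;
}

// t = (m_x - m_y) / sqrt( s0^2 * (1/sum(wx) + 1/sum(wy)) ), with s0^2
// estimated as above using (sum w)^2/sum(w^2) effective degrees of
// freedom per group.
double weighted_tscore(const std::vector<double>& x, const std::vector<double>& wx,
                       const std::vector<double>& y, const std::vector<double>& wy)
{
  const Sums sx = sums(x, wx);
  const Sums sy = sums(y, wy);
  const double dof = sx.w * sx.w / sx.w2 + sy.w * sy.w / sy.w2 - 2.0;
  const double s0_sq = (sx.wdev2 + sy.wdev2) / dof;
  const double se_sq = s0_sq * (1.0 / sx.w + 1.0 / sy.w);
  return (sx.wx / sx.w - sy.wx / sy.w) / std::sqrt(se_sq);
}
\endcode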
    284 \subsection{FoldChange}
     296\subsection FoldChange
    285297Fold-Change is simply the difference between the weighted means of the
    286 two groups //$\frac{\sum w_xx}{\sum w_x}-\frac{\sum w_yy}{\sum w_y}$
    287 
    288 \subsection{WilcoxonFoldChange}
    289 We take all sample pairs (one from class $+$ and one from class $-$)
     298two groups, \f$\frac{\sum w_xx}{\sum w_x}-\frac{\sum w_yy}{\sum w_y}\f$.
     299
     300\subsection WilcoxonFoldChange
     301We take all sample pairs (one from class \f$+\f$ and one from class \f$-\f$)
    290302and calculate the weighted median of the distances.
    291303
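A standalone C++ sketch of a weighted median (illustration only, not the yat API): sort the values and take the first point where the cumulative weight reaches half of the total weight.

\code
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Weighted median: smallest value v such that the weights of all values
// <= v make up at least half of the total weight.  Each element of vw is
// a (value, weight) pair; the vector is taken by value so we may sort it.
double weighted_median(std::vector<std::pair<double, double> > vw)
{
  std::sort(vw.begin(), vw.end());          // sorts by value first
  double total = 0.0;
  for (std::size_t i = 0; i < vw.size(); ++i)
    total += vw[i].second;
  double cumulative = 0.0;
  for (std::size_t i = 0; i < vw.size(); ++i) {
    cumulative += vw[i].second;
    if (cumulative >= 0.5 * total)
      return vw[i].first;
  }
  return vw.back().first;                   // only reached by rounding
}
\endcode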
    292 \section{Kernel}
    293 \subsection{Polynomial Kernel}
    294 The polynomial kernel of degree $N$ is defined as $(1+<x,y>)^N$, where
    295 $<x,y>$ is the linear kernel (usual scalar product). For the weighted
    296 case we define the linear kernel to be $<x,y>=\sum {w_xw_yxy}$ and the
     304\section Kernel
     305\subsection Polynomial Kernel
     306The polynomial kernel of degree \f$N\f$ is defined as \f$(1+<x,y>)^N\f$, where
     307\f$<x,y>\f$ is the linear kernel (usual scalar product). For the weighted
     308case we define the linear kernel to be \f$<x,y>=\sum {w_xw_yxy}\f$ and the
    297309polynomial kernel can be calculated as before
    298 $(1+<x,y>)^N$. Is this kernel a proper kernel (always positive
    299 semi-definite)? Yes, because $<x,y>$ is obviously a proper kernel
     310\f$(1+<x,y>)^N\f$. Is this kernel a proper kernel (always positive
     311semi-definite)? Yes, because \f$<x,y>\f$ is obviously a proper kernel
    300312as it is a scalar product. Adding a positive constant to a kernel
    301 yields another kernel so $1+<x,y>$ is still a proper kernel. Then also
    302 $(1+<x,y>)^N$ is a proper kernel because taking a proper kernel to the
    303 $N$th power yields a new proper kernel (see any good book on SVM).
     313yields another kernel so \f$1+<x,y>\f$ is still a proper kernel. Then also
     314\f$(1+<x,y>)^N\f$ is a proper kernel because taking a proper kernel to the
     315\f$N\f$th power yields a new proper kernel (see any good book on SVM).
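A standalone C++ sketch of the weighted linear and polynomial kernels (illustration only, not the yat API):

\code
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted linear kernel <x,y> = sum(wx_i*wy_i*x_i*y_i).
double linear_kernel(const std::vector<double>& x, const std::vector<double>& wx,
                     const std::vector<double>& y, const std::vector<double>& wy)
{
  double k = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    k += wx[i] * wy[i] * x[i] * y[i];
  return k;
}

// Weighted polynomial kernel of degree N: (1 + <x,y>)^N.
double polynomial_kernel(const std::vector<double>& x, const std::vector<double>& wx,
                         const std::vector<double>& y, const std::vector<double>& wy,
                         unsigned int N)
{
  return std::pow(1.0 + linear_kernel(x, wx, y, wy), static_cast<double>(N));
}
\endcode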
    304316\subsection Gaussian Kernel
    305 We define the weighted Gaussian kernel as $\exp\left(-\frac{\sum
    306 w_xw_y(x-y)^2}{\sum w_xw_y}\right)$, which fulfills the conditions
     317We define the weighted Gaussian kernel as \f$\exp\left(-\frac{\sum
     318w_xw_y(x-y)^2}{\sum w_xw_y}\right)\f$, which fulfills the conditions
    307319listed in the introduction.
    308320
    309321Is this kernel a proper kernel? Yes, following the proof of the
    310 non-weighted kernel we see that $K=\exp\left(-\frac{\sum
     322non-weighted kernel we see that \f$K=\exp\left(-\frac{\sum
    311323w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum
    312 w_xw_y}\right)\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)$,
    313 which is a product of two proper kernels. $\exp\left(-\frac{\sum
     324w_xw_y}\right)\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)\f$,
     325which is a product of two proper kernels. \f$\exp\left(-\frac{\sum
    314326w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum
    315 w_xw_y}\right)$ is a proper kernel, because it is a scalar product and
    316 $\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)$ is a proper
     327w_xw_y}\right)\f$ is a proper kernel, because it is a scalar product and
     328\f$\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)\f$ is a proper
    317329kernel, because it is a polynomial of the linear kernel with positive
    318330coefficients. As the product of two kernels is also a kernel, the Gaussian
    319331kernel is a proper kernel.
    320332
    321 \section{Distance}
    322 
    323 \section{Regression}
    324 \subsection{Naive}
    325 \subsection{Linear}
     333\section Distance
     334
     335\section Regression
     336\subsection Naive
     337\subsection Linear
    326338We have the model
    327339
    328 \begin{equation}
     340\f$
    329341y_i=\alpha+\beta (x_i-m_x)+\epsilon_i,
    330 \end{equation}
    331 
    332 where $\epsilon_i$ is the noise. The variance of the noise is
     342\f$
     343
     344where \f$\epsilon_i\f$ is the noise. The variance of the noise is
    333345inversely proportional to the weight,
    334 $Var(\epsilon_i)=\frac{\sigma^2}{w_i}$. In order to determine the
     346\f$Var(\epsilon_i)=\frac{\sigma^2}{w_i}\f$. In order to determine the
    335347model parameters, we minimize the sum of squared errors.
    336348
    337 \begin{equation}
     349\f$
    338350Q_0 = \sum \epsilon_i^2
    339 \end{equation}
    340 
    341 Taking the derivative with respect to $\alpha$ and $\beta$ yields two conditions
    342 
    343 \begin{equation}
     351\f$
     352
     353Taking the derivative with respect to \f$\alpha\f$ and \f$\beta\f$ yields two conditions
     354
     355\f$
    344356\frac{\partial Q_0}{\partial \alpha} = -2 \sum w_i(y_i - \alpha -
    345357\beta (x_i-m_x))=0
    346 \end{equation}
     358\f$
    347359
    348360and
    349361
    350 \begin{equation} \frac{\partial Q_0}{\partial \beta} = -2 \sum
     362\f$ \frac{\partial Q_0}{\partial \beta} = -2 \sum
    351363w_i(x_i-m_x)(y_i-\alpha-\beta(x_i-m_x))=0
    352 \end{equation}
     364\f$
    353365
    354366or equivalently
    355367
    356 \begin{equation}
     368\f$
    357369\alpha = \frac{\sum w_iy_i}{\sum w_i}=m_y
    358 \end{equation}
     370\f$
    359371
    360372and
    361373
    362 \begin{equation} \beta=\frac{\sum w_i(x_i-m_x)(y_i-m_y)}{\sum
     374\f$ \beta=\frac{\sum w_i(x_i-m_x)(y_i-m_y)}{\sum
    363375w_i(x_i-m_x)^2}=\frac{Cov(x,y)}{Var(x)}
    364 \end{equation}
     376\f$
    365377
    366378Note, by having all weights equal we get back the unweighted
    367379case. Furthermore, we calculate the variance of the estimators of
    368 $\alpha$ and $\beta$.
    369 
    370 \begin{equation}
     380\f$\alpha\f$ and \f$\beta\f$.
     381
     382\f$
    371383\textrm{Var}(\alpha )=\frac{\sum w_i^2\frac{\sigma^2}{w_i}}{(\sum w_i)^2}=
    372384\frac{\sigma^2}{\sum w_i}
    373 \end{equation}
     385\f$
    374386
    375387and
    376 \begin{equation}
     388\f$
    377389\textrm{Var}(\beta )= \frac{\sum w_i^2(x_i-m_x)^2\frac{\sigma^2}{w_i}}
    378390{(\sum w_i(x_i-m_x)^2)^2}=
    379391\frac{\sigma^2}{\sum w_i(x_i-m_x)^2}
    380 \end{equation}
    381 
    382 Finally, we estimate the level of noise, $\sigma^2$. Inspired by the
     392\f$
     393
     394Finally, we estimate the level of noise, \f$\sigma^2\f$. Inspired by the
    383395unweighted estimation
    384396
    385 \begin{equation}
     397\f$
    386398s^2=\frac{\sum (y_i-\alpha-\beta (x_i-m_x))^2}{n-2}
    387 \end{equation}
     399\f$
    388400
    389401we suggest the following estimator
    390402
    391 \begin{equation} s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum
    392 w_i-2\frac{\sum w_i^2}{\sum w_i}} \end{equation}
    393 
    394 \section{Outlook}
    395 \subsection{Hierarchical clustering}
    396 A hierarchical clustering consists of two things: finding the two
    397 closest data points, and merging these two data points into a new data
    398 point and calculating the new distances from this point to all other points.
    399 
    400 In the first item, we need a distance matrix, and if we use Euclidean
    401 distances the natural modification of the expression would be
    402 
    403 \begin{equation}
    404 d(x,y)=\frac{\sum w_i^xw_j^y(x_i-y_i)^2}{\sum w_i^xw_j^y}
    405 \end{equation}
    406 
    407 For the second item, inspired by average linkage, we suggest
    408 
    409 \begin{equation}
    410 d(xy,z)=\frac{\sum w_i^xw_j^z(x_i-z_i)^2+\sum
    411 w_i^yw_j^z(y_i-z_i)^2}{\sum w_i^xw_j^z+\sum w_i^yw_j^z}
    412 \end{equation}
    413 
    414 to be the distance between the new merged point $xy$ and $z$, and we
    415 also calculate new weights for this point: $w^{xy}_i=w^x_i+w^y_i$
    416 
    417 \end{document}
    418 
    419 
    420 
     403\f$ s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum
     404w_i-2\frac{\sum w_i^2}{\sum w_i}} \f$
     405
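A standalone C++ sketch of the weighted least-squares fit derived above (illustration only, not the yat API), returning \f$\alpha\f$, \f$\beta\f$, and the suggested noise estimate \f$s^2\f$:

\code
#include <cstddef>
#include <vector>

struct WeightedFit { double alpha, beta, s2; };

// alpha = m_y, beta = Cov(x,y)/Var(x), and
// s^2 = sum(w*(y - alpha - beta*(x-m_x))^2) /
//       (sum(w) - 2*sum(w^2)/sum(w)).
WeightedFit weighted_linear_fit(const std::vector<double>& x,
                                const std::vector<double>& y,
                                const std::vector<double>& w)
{
  double sum_w = 0.0, sum_w2 = 0.0, sum_wx = 0.0, sum_wy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_w += w[i];
    sum_w2 += w[i] * w[i];
    sum_wx += w[i] * x[i];
    sum_wy += w[i] * y[i];
  }
  const double mx = sum_wx / sum_w;
  const double my = sum_wy / sum_w;
  double sxy = 0.0, sxx = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sxy += w[i] * (x[i] - mx) * (y[i] - my);
    sxx += w[i] * (x[i] - mx) * (x[i] - mx);
  }
  WeightedFit fit;
  fit.alpha = my;
  fit.beta = sxy / sxx;
  double ss = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double r = y[i] - fit.alpha - fit.beta * (x[i] - mx);
    ss += w[i] * r * r;
  }
  fit.s2 = ss / (sum_w - 2.0 * sum_w2 / sum_w);
  return fit;
}
\endcode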
     406*/
     407
     408
     409
  • trunk/doc/doxygen.config.in

    r1070 r1109  
    350350# with spaces.
    351351
    352 INPUT                  = first_page.doxygen namespaces.doxygen concepts.doxygen ../yat
     352INPUT                  = first_page.doxygen namespaces.doxygen concepts.doxygen Statistics.doxygen ../yat
    353353
    354354# If the value of the INPUT tag contains directories, you can use the
     
    874874DOTFILE_DIRS           =
    875875
    876 # The MAX_DOT_GRAPH_WIDTH tag can be used to set the maximum allowed width
    877 # (in pixels) of the graphs generated by dot. If a graph becomes larger than
    878 # this value, doxygen will try to truncate the graph, so that it fits within
    879 # the specified constraint. Beware that most browsers cannot cope with very
    880 # large images.
    881 
    882 MAX_DOT_GRAPH_WIDTH    = 1024
    883 
    884 # The MAX_DOT_GRAPH_HEIGHT tag can be used to set the maximum allowed height
    885 # (in pixels) of the graphs generated by dot. If a graph becomes larger than
    886 # this value, doxygen will try to truncate the graph, so that it fits within
    887 # the specified constraint. Beware that most browsers cannot cope with very
    888 # large images.
    889 
    890 MAX_DOT_GRAPH_HEIGHT   = 1024
    891 
    892876# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will
    893877# generate a legend page explaining the meaning of the various boxes and
  • trunk/doc/first_page.doxygen

    r1000 r1109  
    3636   href="namespacemembers.html">Namespace Members</a> link above.
    3737
    38    There is a document on the weighted statistics included in the
    39    package with underlying theory and more detailed motivations [ <a
    40    href="Statistics/index.html">html</a> | <a
    41    href="Statistics/Statistics.pdf">pdf</a> ].
    42    
    4338   <b>Future development</b><br>
    4439   We use trac as issue tracking system. Through the <a