Changeset 1109

- Timestamp: Feb 19, 2008, 10:35:41 PM (16 years ago)
- Location: trunk/doc
- Files: 3 edited, 1 moved

Legend: in the diffs below, lines added by the changeset are prefixed with "+",
removed lines with "-", unchanged context carries no prefix, and "…" marks
elided unchanged lines.
trunk/doc/Makefile.am
r1000 → r1109

  # 02111-1307, USA.

- doc: doxygen.config doxygen-local Statistics-local
+ doc: doxygen.config doxygen-local

- dvi-local: Statistics.dvi
+ dvi-local:

- pdf-local: Statistics.pdf
+ pdf-local:

- html-local: doxygen.config doxygen-local html/Statistics/Statistics.html
+ html-local: doxygen.config doxygen-local

  mostlyclean-local:

…

- html/Statistics/Statistics.html: Statistics.tex
- 	@$(install_sh) -d html/Statistics
- 	@if $(HAVE_LATEX2HTML); then \
- 	  latex2html -t "Weighted Statistics used in yat." \
- 	  --dir html/Statistics Statistics.tex;\
- 	fi
-
- Statistics-local: html/Statistics/Statistics.html \
- 	html/Statistics/Statistics.pdf
-
- Statistics.dvi: Statistics.tex
- 	@if $(HAVE_LATEX); then \
- 	  latex Statistics.tex; \
- 	  latex Statistics.tex; \
- 	fi
-
- Statistics.pdf: Statistics.dvi
- 	@if $(HAVE_DVIPDFM); then \
- 	  dvipdfm Statistics.dvi; \
- 	fi
-
- html/Statistics/Statistics.pdf: $(pdf)
- 	@if test -f Statistics.pdf; then \
- 	  $(install_sh) -d html/Statistics; \
- 	  cp Statistics.pdf html/Statistics/.; \
- 	fi
trunk/doc/Statistics.doxygen
r1093 → r1109 (moved: the LaTeX article Statistics.tex was converted into the Doxygen page Statistics.doxygen)

- \documentclass[12pt]{article}
-
- % $Id$
- %
- % Copyright (C) 2005 Peter Johansson
- % Copyright (C) 2006 Jari Häkkinen, Markus Ringnér, Peter Johansson
- % Copyright (C) 2007 Peter Johansson
- %
- % This file is part of the yat library, http://trac.thep.lu.se/yat
- %
- % The yat library is free software; you can redistribute it and/or
- % modify it under the terms of the GNU General Public License as
- % published by the Free Software Foundation; either version 2 of the
- % License, or (at your option) any later version.
- %
- % The yat library is distributed in the hope that it will be useful,
- % but WITHOUT ANY WARRANTY; without even the implied warranty of
- % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- % General Public License for more details.
- %
- % You should have received a copy of the GNU General Public License
- % along with this program; if not, write to the Free Software
- % Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
- % 02111-1307, USA.
-
- \flushbottom
- \footskip 54pt
- \headheight 0pt
- \headsep 0pt
- \oddsidemargin 0pt
- \parindent 0pt
- \parskip 2ex
- \textheight 230mm
- \textwidth 165mm
- \topmargin 0pt
-
- \renewcommand{\baselinestretch} {1.0}
- \renewcommand{\textfraction} {0.1}
- \renewcommand{\topfraction} {1.0}
- \renewcommand{\bottomfraction} {1.0}
- \renewcommand{\floatpagefraction} {1.0}
-
- \renewcommand{\d}{{\mathrm{d}}}
- \newcommand{\nd}{$^{\mathrm{nd}}$}
- \newcommand{\eg}{{\it {e.g.}}}
- \newcommand{\ie}{{\it {i.e., }}}
- \newcommand{\etal}{{\it {et al.}}}
- \newcommand{\eref}[1]{Eq.~(\ref{e:#1})}
- \newcommand{\fref}[1]{Fig.~\ref{f:#1}}
- \newcommand{\ovr}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)}
-
- \begin{document}
-
- \large
- {\bf Weighted Statistics}
- \normalsize
-
- \tableofcontents
- \clearpage
-
- \section{Introduction}
+ // $Id$
+ //
+ // Copyright (C) 2005 Peter Johansson
+ // Copyright (C) 2006 Jari Häkkinen, Markus Ringnér, Peter Johansson
+ // Copyright (C) 2007, 2008 Peter Johansson
+ //
+ // This file is part of the yat library, http://trac.thep.lu.se/yat
+ //
+ // The yat library is free software; you can redistribute it and/or
+ // modify it under the terms of the GNU General Public License as
+ // published by the Free Software Foundation; either version 2 of the
+ // License, or (at your option) any later version.
+ //
+ // The yat library is distributed in the hope that it will be useful,
+ // but WITHOUT ANY WARRANTY; without even the implied warranty of
+ // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ // General Public License for more details.
+ //
+ // You should have received a copy of the GNU General Public License
+ // along with this program; if not, write to the Free Software
+ // Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+ // 02111-1307, USA.
+
+ /**
+ \page weighted_statistics Weighted Statistics
+
+ \section Introduction

There are several different reasons why a statistical analysis needs
to adjust for weighting. In the literature, reasons are mainly divided in
…
function documentation for what assumptions are made. Common to all
implementations, though, is the following:
- \begin{itemize}
- \item Setting all weights to unity yields the same result as the
- non-weighted version.
- \item Rescaling the weights does not change any function.
- \item Setting a weight to zero is equivalent to removing the data point.
- \end{itemize}
+ - Setting all weights to unity yields the same result as the
+ non-weighted version.
+ - Rescaling the weights does not change any function.
+ - Setting a weight to zero is equivalent to removing the data point.

An important case is when weights are binary (either 1 or 0). Then we
get the same result using the weighted version as using the data with
…
in a proper way.

- \section{AveragerWeighted}
+ \section AveragerWeighted

- \subsection{Mean}
+ \subsection Mean

For any situation the weight is always designed so that the weighted mean
- is calculated as $m=\frac{\sum w_ix_i}{\sum w_i}$, which obviously
+ is calculated as \f$ m=\frac{\sum w_ix_i}{\sum w_i} \f$, which obviously
fulfills the conditions above.

In the case of varying measurement error, it can be motivated that
- the weight shall be $w_i = 1/\sigma_i^2$. We assume measurement errors
+ the weight shall be \f$ w_i = 1/\sigma_i^2 \f$. We assume measurement errors
to be Gaussian, and the likelihood of obtaining our measurements is
- $L(m)=\prod
- (2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}}$. We
- maximize the likelihood by taking the derivative with respect to $m$ of
- the logarithm of the likelihood, $\frac{d\ln L(m)}{dm}=\sum
- \frac{x_i-m}{\sigma_i^2}$. Hence, the Maximum Likelihood method yields
- the estimator $m=\frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2}$.
+ \f$ L(m)=\prod
+ (2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}} \f$. We
+ maximize the likelihood by taking the derivative with respect to \f$ m \f$ of
+ the logarithm of the likelihood, \f$ \frac{d\ln L(m)}{dm}=\sum
+ \frac{x_i-m}{\sigma_i^2} \f$. Hence, the Maximum Likelihood method yields
+ the estimator \f$ m=\frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2} \f$.
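As an illustration, the weighted mean can be written out as a standalone
C++ sketch. This only spells out the formula above; it is not the
interface of yat's AveragerWeighted class.

\code
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted mean m = sum(w_i * x_i) / sum(w_i).  Unit weights give the
// ordinary mean, rescaling all weights changes nothing, and a zero
// weight removes the data point.
double weighted_mean(const std::vector<double>& x,
                     const std::vector<double>& w)
{
  assert(x.size() == w.size());
  double sum_wx = 0.0;
  double sum_w = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_wx += w[i] * x[i];
    sum_w += w[i];
  }
  return sum_wx / sum_w;
}
\endcode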
- \subsection{Variance}
+ \subsection Variance

In the case of varying variance, there is no point estimating a single
variance, since it is different for each data point.

Instead we look at the case when we want to estimate the variance over
- $f$ but are sampling from $f'$. For the mean of an observable $O$ we
- have $\widehat O=\sum\frac{f}{f'}O_i=\frac{\sum w_iO_i}{\sum
- w_i}$. Hence, an estimator of the variance of $X$ is
- \begin{eqnarray}
- \sigma^2=<X^2>-<X>^2=
- \\\frac{\sum w_ix_i^2}{\sum w_i}-\frac{(\sum w_ix_i)^2}{(\sum w_i)^2}=
- \\\frac{\sum w_i(x_i^2-m^2)}{\sum w_i}=
- \\\frac{\sum w_i(x_i^2-2mx_i+m^2)}{\sum w_i}=
- \\\frac{\sum w_i(x_i-m)^2}{\sum w_i}
- \end{eqnarray}
+ \f$f\f$ but are sampling from \f$ f' \f$. For the mean of an observable \f$ O \f$ we
+ have \f$ \widehat O=\sum\frac{f}{f'}O_i=\frac{\sum w_iO_i}{\sum
+ w_i} \f$. Hence, an estimator of the variance of \f$ X \f$ is
+
+ \f$
+ s^2 = <X^2>-<X>^2=
+ \f$
+
+ \f$
+ = \frac{\sum w_ix_i^2}{\sum w_i}-\frac{(\sum w_ix_i)^2}{(\sum w_i)^2}=
+ \f$
+
+ \f$
+ = \frac{\sum w_i(x_i^2-m^2)}{\sum w_i}=
+ \f$
+
+ \f$
+ = \frac{\sum w_i(x_i^2-2mx_i+m^2)}{\sum w_i}=
+ \f$
+
+ \f$
+ = \frac{\sum w_i(x_i-m)^2}{\sum w_i}
+ \f$

This estimator fulfills that it is invariant under a rescaling and
having a weight equal to zero is equivalent to removing the data
point.
- Having all weights equal to unity we get $\sigma^2=\frac{\sum
- (x_i-m)^2}{N}$, which is the same as returned from Averager. Hence,
+ Having all weights equal to unity we get \f$ \sigma^2=\frac{\sum
+ (x_i-m)^2}{N} \f$, which is the same as returned from Averager. Hence,
this estimator is slightly biased, but still very efficient.
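Spelled out in the same standalone C++ style (again a sketch of the
formula above, not yat's actual interface), the variance estimator is:

\code
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted variance estimate sum(w_i * (x_i - m)^2) / sum(w_i), with
// m the weighted mean.  For unit weights this reduces to
// sum((x_i - m)^2) / N, the slightly biased estimator mentioned above.
double weighted_variance(const std::vector<double>& x,
                         const std::vector<double>& w)
{
  assert(x.size() == w.size());
  double sum_wx = 0.0;
  double sum_w = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_wx += w[i] * x[i];
    sum_w += w[i];
  }
  const double m = sum_wx / sum_w;

  double sum_wd2 = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double d = x[i] - m;
    sum_wd2 += w[i] * d * d;
  }
  return sum_wd2 / sum_w;
}
\endcode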
- \subsection{Standard Error}
+ \subsection standard_error Standard Error

The standard error squared is equal to the expected squared error of
- the estimation of $m$. The squared error consists of two parts, the
- variance of the estimator and the squared
- bias: $<m-\mu>^2=<m-<m>+<m>-\mu>^2=<m-<m>>^2+(<m>-\mu)^2$. In the
- case when weights are included in the analysis due to varying measurement
- errors and the weights can be treated as deterministic, we have
- \begin{equation}
- Var(m)=\frac{\sum w_i^2\sigma_i^2}{\left(\sum w_i\right)^2}=
- \frac{\sum w_i^2\frac{\sigma_0^2}{w_i}}{\left(\sum w_i\right)^2}=
- \frac{\sigma_0^2}{\sum w_i},
- \end{equation}
+ the estimation of \f$m\f$. The squared error consists of two parts, the
+ variance of the estimator and the squared bias:
+
+ \f$
+ <m-\mu>^2=<m-<m>+<m>-\mu>^2=
+ \f$
+ \f$
+ <m-<m>>^2+(<m>-\mu)^2
+ \f$.
+
+ In the case when weights are included in the analysis due to varying
+ measurement errors and the weights can be treated as deterministic, we
+ have
+
+ \f$
+ Var(m)=\frac{\sum w_i^2\sigma_i^2}{\left(\sum w_i\right)^2}=
+ \f$
+ \f$
+ \frac{\sum w_i^2\frac{\sigma_0^2}{w_i}}{\left(\sum w_i\right)^2}=
+ \f$
+ \f$
+ \frac{\sigma_0^2}{\sum w_i},
+ \f$

- where we need to estimate $\sigma_0^2$. Again we have the likelihood
- $L(\sigma_0^2)=\prod\frac{1}{\sqrt{2\pi\sigma_0^2/w_i}}\exp{(-\frac{w_i(x-m)^2}{2\sigma_0^2})}$
- and taking the derivative with respect to $\sigma_0^2$,
- $\frac{d\ln L}{d\sigma_0^2}=\sum
- -\frac{1}{2\sigma_0^2}+\frac{w_i(x-m)^2}{2\sigma_0^4}$, which
- yields the estimator $\sigma_0^2=\frac{1}{N}\sum w_i(x-m)^2$.
+ where we need to estimate \f$ \sigma_0^2 \f$. Again we have the likelihood
+
+ \f$
+ L(\sigma_0^2)=\prod\frac{1}{\sqrt{2\pi\sigma_0^2/w_i}}\exp{(-\frac{w_i(x-m)^2}{2\sigma_0^2})}
+ \f$
+ and taking the derivative with respect to
+ \f$\sigma_0^2\f$,
+
+ \f$
+ \frac{d\ln L}{d\sigma_0^2}=
+ \f$
+ \f$
+ \sum -\frac{1}{2\sigma_0^2}+\frac{w_i(x-m)^2}{2\sigma_0^4}
+ \f$
+
+ which yields the estimator \f$ \sigma_0^2=\frac{1}{N}\sum w_i(x-m)^2 \f$.

This estimator does not ignore weights equal to zero, because the
observed deviation is most often smaller than the expected one (which
is infinite for a zero weight). Therefore, we modify
- the expression as follows $\sigma_0^2=\frac{\sum w_i^2}{\left(\sum
- w_i\right)^2}\sum w_i(x-m)^2$ and we get the following estimator of
- the variance of the mean $Var(m)=\frac{\sum w_i^2}{\left(\sum
- w_i\right)^3}\sum w_i(x-m)^2$. This estimator fulfills the conditions
+ the expression as follows \f$\sigma_0^2=\frac{\sum w_i^2}{\left(\sum
+ w_i\right)^2}\sum w_i(x-m)^2\f$ and we get the following estimator of
+ the variance of the mean \f$Var(m)=\frac{\sum w_i^2}{\left(\sum
+ w_i\right)^3}\sum w_i(x-m)^2\f$. This estimator fulfills the conditions
above: adding a zero weight does not change it, rescaling the weights
does not change it, and setting all weights to unity yields the same
…

In a case when it is not a good approximation to treat the weights as
deterministic, there are two ways to get a better estimation. The
- first one is to linearize the expression $\left<\frac{\sum
- w_ix_i}{\sum w_i}\right>$. The second method, when the situation is
+ first one is to linearize the expression \f$\left<\frac{\sum
+ w_ix_i}{\sum w_i}\right>\f$. The second method, when the situation is
more complicated, is to estimate the standard error using a
bootstrapping method.

- \section{AveragerPairWeighted}
- Here data points come in pairs (x,y). We are sampling from $f'_{XY}$
- but want to measure from $f_{XY}$. To compensate for this discrepancy,
- averages of $g(x,y)$ are taken as $\sum \frac{f}{f'}g(x,y)$. Even
- though $X$ and $Y$ are not independent $(f_{XY}\neq f_Xf_Y)$, we
- assume that we can factorize the ratio and get $\frac{\sum
- w_xw_yg(x,y)}{\sum w_xw_y}$.
- \subsection{Covariance}
+ \section AveragerPairWeighted
+ Here data points come in pairs (x,y). We are sampling from \f$f'_{XY}\f$
+ but want to measure from \f$f_{XY}\f$. To compensate for this discrepancy,
+ averages of \f$g(x,y)\f$ are taken as \f$\sum \frac{f}{f'}g(x,y)\f$. Even
+ though \f$X\f$ and \f$Y\f$ are not independent \f$(f_{XY}\neq f_Xf_Y)\f$, we
+ assume that we can factorize the ratio and get \f$\frac{\sum
+ w_xw_yg(x,y)}{\sum w_xw_y}\f$.
+ \subsection Covariance
Following the variance calculations for AveragerWeighted we have
- $Cov=\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sum w_xw_y}$ where
- $m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}$.
+ \f$Cov=\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sum w_xw_y}\f$ where
+ \f$m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}\f$.

- \subsection{correlation}
+ \subsection Correlation

As the mean is estimated as
- $m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}$,
- the variance is estimated as
- $\sigma_x^2=\frac{\sum w_xw_y(x-m_x)^2}{\sum w_xw_y}$.
+ \f$
+ m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}
+ \f$,
+ the variance is estimated as
+ \f$
+ \sigma_x^2=\frac{\sum w_xw_y(x-m_x)^2}{\sum w_xw_y}
+ \f$.
As in the non-weighted case, we define the correlation to be the ratio
between the covariance and the geometrical average of the variances

- $\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sqrt{\sum w_xw_y(x-m_x)^2\sum
- w_xw_y(y-m_y)^2}}$.
+ \f$
+ \frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sqrt{\sum w_xw_y(x-m_x)^2\sum
+ w_xw_y(y-m_y)^2}}
+ \f$.

This expression fulfills the following:
- \begin{itemize}
- \item Having N weights the expression reduces to the non-weighted expression.
- \item Adding a data pair in which one weight is zero is equivalent
- to ignoring the data pair.
- \item Correlation is equal to unity if and only if $x$ is equal to
- $y$. Otherwise the correlation is between -1 and 1.
- \end{itemize}
+ - Having N equal weights the expression reduces to the non-weighted expression.
+ - Adding a data pair in which one weight is zero is equivalent
+ to ignoring the data pair.
+ - Correlation is equal to unity if and only if \f$x\f$ is equal to
+ \f$y\f$. Otherwise the correlation is between -1 and 1.

- \section{Score}
+ \section Score

- \subsection{Pearson}
+ \subsection Pearson

- $\frac{\sum w(x-m_x)(y-m_y)}{\sqrt{\sum w(x-m_x)^2\sum w(y-m_y)^2}}$.
+ \f$\frac{\sum w(x-m_x)(y-m_y)}{\sqrt{\sum w(x-m_x)^2\sum w(y-m_y)^2}}\f$.

See AveragerPairWeighted correlation.
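A standalone C++ sketch of this correlation estimator, taking the
already factorized pair weights \f$w_i=w_{x,i}w_{y,i}\f$ as a single
input (a hypothetical helper, not yat's AveragerPairWeighted API):

\code
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted correlation: the covariance divided by the geometrical
// average of the variances, all computed with pair weights w.
double weighted_correlation(const std::vector<double>& x,
                            const std::vector<double>& y,
                            const std::vector<double>& w)
{
  assert(x.size() == y.size() && x.size() == w.size());
  double sum_w = 0.0, sum_wx = 0.0, sum_wy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_w += w[i];
    sum_wx += w[i] * x[i];
    sum_wy += w[i] * y[i];
  }
  const double mx = sum_wx / sum_w;
  const double my = sum_wy / sum_w;

  double sxy = 0.0, sxx = 0.0, syy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double dx = x[i] - mx;
    const double dy = y[i] - my;
    sxy += w[i] * dx * dy;
    sxx += w[i] * dx * dx;
    syy += w[i] * dy * dy;
  }
  return sxy / std::sqrt(sxx * syy);
}
\endcode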
- \subsection{ROC}
+ \subsection ROC

An interpretation of the ROC curve area is the probability that, if we
- take one sample from class $+$ and one sample from class $-$, the
- sample from class $+$ has the greater value. The
+ take one sample from class \f$+\f$ and one sample from class \f$-\f$, the
+ sample from class \f$+\f$ has the greater value. The
ROC curve area calculates the ratio of pairs fulfilling this:

- \begin{equation}
- \frac{\sum_{\{i,j\}:x^-_i<x^+_j}1}{\sum_{i,j}1}.
- \end{equation}
+ \f$
+ \frac{\sum_{\{i,j\}:x^-_i<x^+_j}1}{\sum_{i,j}1}.
+ \f$

A geometrical interpretation is to have a number of squares, where
each square corresponds to a pair of samples. The ROC curve follows the
- border between pairs in which the sample from class $+$ has a greater
+ border between pairs in which the sample from class \f$+\f$ has a greater
value and pairs in which this is not fulfilled. The ROC curve area is
the area of those former squares, and a natural extension is to weight
…
area becomes

- \begin{equation}
- \frac{\sum_{\{i,j\}:x^-_i<x^+_j}w^-_iw^+_j}{\sum_{i,j}w^-_iw^+_j}
- \end{equation}
+ \f$
+ \frac{\sum_{\{i,j\}:x^-_i<x^+_j}w^-_iw^+_j}{\sum_{i,j}w^-_iw^+_j}
+ \f$

This expression is invariant under a rescaling of the weights. Adding a
…
all weights equal to unity yields the non-weighted ROC curve area.

- \subsection{tScore}
+ \subsection tScore

- Assume that $x$ and $y$ originate from the same distribution
- $N(\mu,\sigma_i^2)$ where $\sigma_i^2=\frac{\sigma_0^2}{w_i}$. We then
- estimate $\sigma_0^2$ as
- \begin{equation}
- \frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
- {\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
- \frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
- \end{equation}
+ Assume that \f$x\f$ and \f$y\f$ originate from the same distribution
+ \f$N(\mu,\sigma_i^2)\f$ where \f$\sigma_i^2=\frac{\sigma_0^2}{w_i}\f$. We then
+ estimate \f$\sigma_0^2\f$ as
+ \f$
+ \frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
+ {\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
+ \frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
+ \f$
The variance of the difference of the means becomes
- \begin{eqnarray}
- Var(m_x)+Var(m_y)=\frac{\sum w_i^2Var(x_i)}{\left(\sum
- w_i\right)^2}+\frac{\sum w_i^2Var(y_i)}{\left(\sum w_i\right)^2}=
- \frac{\sigma_0^2}{\sum w_x}+\frac{\sigma_0^2}{\sum w_y},
- \end{eqnarray}
+ \f$
+ Var(m_x)+Var(m_y)=\frac{\sum w_i^2Var(x_i)}{\left(\sum
+ w_i\right)^2}+\frac{\sum w_i^2Var(y_i)}{\left(\sum w_i\right)^2}=
+ \frac{\sigma_0^2}{\sum w_x}+\frac{\sigma_0^2}{\sum w_y},
+ \f$
and consequently the t-score becomes
- \begin{equation}
- \frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
- {\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
- \frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
- \left(\frac{1}{\sum w_x}+\frac{1}{\sum w_y}\right),
- \end{equation}
+ \f$
+ \frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
+ {\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
+ \frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
+ \left(\frac{1}{\sum w_x}+\frac{1}{\sum w_y}\right),
+ \f$

- For $w_i=w$ this expression condenses down to
- \begin{equation}
- \frac{w\sum (x-m_x)^2+w\sum (y-m_y)^2}
- {n_x+n_y-2}
- \left(\frac{1}{wn_x}+\frac{1}{wn_y}\right),
- \end{equation}
+ For \f$w_i=w\f$ this expression condenses down to
+ \f$
+ \frac{w\sum (x-m_x)^2+w\sum (y-m_y)^2}
+ {n_x+n_y-2}
+ \left(\frac{1}{wn_x}+\frac{1}{wn_y}\right),
+ \f$
in other words, the good old expression as for the non-weighted case.
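The weighted ROC curve area above can be evaluated directly from its
double sum. Below is a standalone C++ sketch (a hypothetical helper,
not yat's ROC class); ties are counted as not fulfilling
\f$x^-_i<x^+_j\f$, an assumption the text does not specify:

\code
#include <cstddef>
#include <vector>

// Weighted ROC curve area: the total weight w-_i * w+_j of pairs with
// x-_i < x+_j, divided by the total weight of all pairs.
double weighted_roc_area(const std::vector<double>& x_neg,
                         const std::vector<double>& w_neg,
                         const std::vector<double>& x_pos,
                         const std::vector<double>& w_pos)
{
  double numerator = 0.0;
  double denominator = 0.0;
  for (std::size_t i = 0; i < x_neg.size(); ++i) {
    for (std::size_t j = 0; j < x_pos.size(); ++j) {
      const double wij = w_neg[i] * w_pos[j];
      if (x_neg[i] < x_pos[j])
        numerator += wij;
      denominator += wij;
    }
  }
  return numerator / denominator;
}
\endcode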
- \subsection{FoldChange}
+ \subsection FoldChange
Fold-Change is simply the difference between the weighted means of the
- two groups, $\frac{\sum w_xx}{\sum w_x}-\frac{\sum w_yy}{\sum w_y}$.
+ two groups, \f$\frac{\sum w_xx}{\sum w_x}-\frac{\sum w_yy}{\sum w_y}\f$.

- \subsection{WilcoxonFoldChange}
- Taking all pairs of samples (one from class $+$ and one from class $-$)
+ \subsection WilcoxonFoldChange
+ Taking all pairs of samples (one from class \f$+\f$ and one from class \f$-\f$)
and calculating the weighted median of the differences.

- \section{Kernel}
+ \section Kernel
- \subsection{Polynomial Kernel}
- The polynomial kernel of degree $N$ is defined as $(1+<x,y>)^N$, where
- $<x,y>$ is the linear kernel (the usual scalar product). For the weighted
- case we define the linear kernel to be $<x,y>=\sum {w_xw_yxy}$, and the
- polynomial kernel can be calculated as before,
- $(1+<x,y>)^N$. Is this kernel a proper kernel (always positive
- semi-definite)? Yes, because $<x,y>$ is obviously a proper kernel,
- as it is a scalar product. Adding a positive constant to a kernel
- yields another kernel, so $1+<x,y>$ is still a proper kernel. Then
- $(1+<x,y>)^N$ is also a proper kernel, because taking a proper kernel
- to the $N$th power yields a new proper kernel (see any good book on SVM).
+ \subsection Polynomial Kernel
+ The polynomial kernel of degree \f$N\f$ is defined as \f$(1+<x,y>)^N\f$, where
+ \f$<x,y>\f$ is the linear kernel (the usual scalar product). For the weighted
+ case we define the linear kernel to be \f$<x,y>=\sum {w_xw_yxy}\f$, and the
+ polynomial kernel can be calculated as before,
+ \f$(1+<x,y>)^N\f$. Is this kernel a proper kernel (always positive
+ semi-definite)? Yes, because \f$<x,y>\f$ is obviously a proper kernel,
+ as it is a scalar product. Adding a positive constant to a kernel
+ yields another kernel, so \f$1+<x,y>\f$ is still a proper kernel. Then
+ \f$(1+<x,y>)^N\f$ is also a proper kernel, because taking a proper kernel
+ to the \f$N\f$th power yields a new proper kernel (see any good book on SVM).

- \subsection{Gaussian Kernel}
+ \subsection Gaussian Kernel
- We define the weighted Gaussian kernel as $\exp\left(-\frac{\sum
- w_xw_y(x-y)^2}{\sum w_xw_y}\right)$, which fulfills the conditions
+ We define the weighted Gaussian kernel as \f$\exp\left(-\frac{\sum
+ w_xw_y(x-y)^2}{\sum w_xw_y}\right)\f$, which fulfills the conditions
listed in the introduction.

Is this kernel a proper kernel? Yes; following the proof for the
- non-weighted kernel, we see that $K=\exp\left(-\frac{\sum
- w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum
- w_xw_y}\right)\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)$,
- which is a product of two proper kernels. $\exp\left(-\frac{\sum
- w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum
- w_xw_y}\right)$ is a proper kernel, because it is a scalar product, and
- $\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)$ is a proper
- kernel, because it is a polynomial of the linear kernel with positive
- coefficients. As a product of two kernels is also a kernel, the Gaussian
- kernel is a proper kernel.
+ non-weighted kernel, we see that \f$K=\exp\left(-\frac{\sum
+ w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum
+ w_xw_y}\right)\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)\f$,
+ which is a product of two proper kernels. \f$\exp\left(-\frac{\sum
+ w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum
+ w_xw_y}\right)\f$ is a proper kernel, because it is a scalar product, and
+ \f$\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)\f$ is a proper
+ kernel, because it is a polynomial of the linear kernel with positive
+ coefficients. As a product of two kernels is also a kernel, the Gaussian
+ kernel is a proper kernel.
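The three kernels can be sketched as standalone C++ helpers
(hypothetical names, not yat's Kernel classes; the Gaussian kernel is
written without a width parameter, exactly as defined above):

\code
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted linear kernel <x,y> = sum wx_i * wy_i * x_i * y_i.
double linear_kernel(const std::vector<double>& x, const std::vector<double>& wx,
                     const std::vector<double>& y, const std::vector<double>& wy)
{
  double k = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    k += wx[i] * wy[i] * x[i] * y[i];
  return k;
}

// Weighted polynomial kernel (1 + <x,y>)^N.
double polynomial_kernel(const std::vector<double>& x, const std::vector<double>& wx,
                         const std::vector<double>& y, const std::vector<double>& wy,
                         int degree)
{
  return std::pow(1.0 + linear_kernel(x, wx, y, wy), degree);
}

// Weighted Gaussian kernel exp(-sum w (x-y)^2 / sum w), w = wx_i * wy_i.
double gaussian_kernel(const std::vector<double>& x, const std::vector<double>& wx,
                       const std::vector<double>& y, const std::vector<double>& wy)
{
  double sum_wd2 = 0.0, sum_w = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    const double d = x[i] - y[i];
    sum_wd2 += w * d * d;
    sum_w += w;
  }
  return std::exp(-sum_wd2 / sum_w);
}
\endcode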
- \section{Distance}
+ \section Distance

- \section{Regression}
- \subsection{Naive}
- \subsection{Linear}
+ \section Regression
+ \subsection Naive
+ \subsection Linear
We have the model

- \begin{equation}
- y_i=\alpha+\beta (x_i-m_x)+\epsilon_i,
- \end{equation}
+ \f$
+ y_i=\alpha+\beta (x_i-m_x)+\epsilon_i,
+ \f$

- where $\epsilon_i$ is the noise. The variance of the noise is
+ where \f$\epsilon_i\f$ is the noise. The variance of the noise is
inversely proportional to the weight,
- $Var(\epsilon_i)=\frac{\sigma^2}{w_i}$. In order to determine the
+ \f$Var(\epsilon_i)=\frac{\sigma^2}{w_i}\f$. In order to determine the
model parameters, we minimize the weighted sum of squared errors,

- \begin{equation}
- Q_0 = \sum w_i\epsilon_i^2
- \end{equation}
+ \f$
+ Q_0 = \sum w_i\epsilon_i^2
+ \f$

- Taking the derivatives with respect to $\alpha$ and $\beta$ yields two conditions,
+ Taking the derivatives with respect to \f$\alpha\f$ and \f$\beta\f$ yields two conditions,

- \begin{equation}
- \frac{\partial Q_0}{\partial \alpha} = -2 \sum w_i(y_i - \alpha -
- \beta (x_i-m_x))=0
- \end{equation}
+ \f$
+ \frac{\partial Q_0}{\partial \alpha} = -2 \sum w_i(y_i - \alpha -
+ \beta (x_i-m_x))=0
+ \f$

and

- \begin{equation}\frac{\partial Q_0}{\partial \beta} = -2 \sum
- w_i(x_i-m_x)(y_i-\alpha-\beta(x_i-m_x))=0
- \end{equation}
+ \f$ \frac{\partial Q_0}{\partial \beta} = -2 \sum
+ w_i(x_i-m_x)(y_i-\alpha-\beta(x_i-m_x))=0
+ \f$

or equivalently

- \begin{equation}
- \alpha = \frac{\sum w_iy_i}{\sum w_i}=m_y
- \end{equation}
+ \f$
+ \alpha = \frac{\sum w_iy_i}{\sum w_i}=m_y
+ \f$

and

- \begin{equation}\beta=\frac{\sum w_i(x_i-m_x)(y_i-m_y)}{\sum
- w_i(x_i-m_x)^2}=\frac{Cov(x,y)}{Var(x)}
- \end{equation}
+ \f$ \beta=\frac{\sum w_i(x_i-m_x)(y_i-m_y)}{\sum
+ w_i(x_i-m_x)^2}=\frac{Cov(x,y)}{Var(x)}
+ \f$

Note, by having all weights equal we get back the unweighted
case. Furthermore, we calculate the variance of the estimators of
- $\alpha$ and $\beta$:
+ \f$\alpha\f$ and \f$\beta\f$:

- \begin{equation}
- \textrm{Var}(\alpha )=\frac{\sum w_i^2\frac{\sigma^2}{w_i}}{(\sum w_i)^2}=
- \frac{\sigma^2}{\sum w_i}
- \end{equation}
+ \f$
+ \textrm{Var}(\alpha )=\frac{\sum w_i^2\frac{\sigma^2}{w_i}}{(\sum w_i)^2}=
+ \frac{\sigma^2}{\sum w_i}
+ \f$

and

- \begin{equation}
- \textrm{Var}(\beta )= \frac{\sum w_i^2(x_i-m_x)^2\frac{\sigma^2}{w_i}}
- {(\sum w_i(x_i-m_x)^2)^2}=
- \frac{\sigma^2}{\sum w_i(x_i-m_x)^2}
- \end{equation}
+ \f$
+ \textrm{Var}(\beta )= \frac{\sum w_i^2(x_i-m_x)^2\frac{\sigma^2}{w_i}}
+ {(\sum w_i(x_i-m_x)^2)^2}=
+ \frac{\sigma^2}{\sum w_i(x_i-m_x)^2}
+ \f$

- Finally, we estimate the level of noise, $\sigma^2$. Inspired by the
+ Finally, we estimate the level of noise, \f$\sigma^2\f$. Inspired by the
unweighted estimator

- \begin{equation}
- s^2=\frac{\sum (y_i-\alpha-\beta (x_i-m_x))^2}{n-2}
- \end{equation}
+ \f$
+ s^2=\frac{\sum (y_i-\alpha-\beta (x_i-m_x))^2}{n-2}
+ \f$

we suggest the following estimator:

- \begin{equation} s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum
- w_i-2\frac{\sum w_i^2}{\sum w_i}} \end{equation}
+ \f$ s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum
+ w_i-2\frac{\sum w_i^2}{\sum w_i}} \f$
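Collecting the estimators for \f$\alpha\f$, \f$\beta\f$ and \f$s^2\f$
gives the following standalone C++ sketch (hypothetical names, not
yat's regression classes):

\code
#include <cassert>
#include <cstddef>
#include <vector>

struct WeightedLinearFit {
  double alpha;  // intercept at x = m_x, equal to the weighted mean of y
  double beta;   // slope Cov(x,y) / Var(x)
  double m_x;    // weighted mean of x
  double s2;     // noise level estimate
};

WeightedLinearFit weighted_linear_fit(const std::vector<double>& x,
                                      const std::vector<double>& y,
                                      const std::vector<double>& w)
{
  assert(x.size() == y.size() && x.size() == w.size());
  double sum_w = 0.0, sum_w2 = 0.0, sum_wx = 0.0, sum_wy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum_w += w[i];
    sum_w2 += w[i] * w[i];
    sum_wx += w[i] * x[i];
    sum_wy += w[i] * y[i];
  }
  WeightedLinearFit fit;
  fit.m_x = sum_wx / sum_w;
  fit.alpha = sum_wy / sum_w;  // alpha = m_y

  // beta = sum w (x - m_x)(y - m_y) / sum w (x - m_x)^2
  double sxy = 0.0, sxx = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double dx = x[i] - fit.m_x;
    sxy += w[i] * dx * (y[i] - fit.alpha);
    sxx += w[i] * dx * dx;
  }
  fit.beta = sxy / sxx;

  // s^2 = sum w r^2 / (sum w - 2 * sum w^2 / sum w), r the residual.
  double sum_wr2 = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double r = y[i] - fit.alpha - fit.beta * (x[i] - fit.m_x);
    sum_wr2 += w[i] * r * r;
  }
  fit.s2 = sum_wr2 / (sum_w - 2.0 * sum_w2 / sum_w);
  return fit;
}
\endcode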
- \section{Outlook}
- \subsection{Hierarchical clustering}
- A hierarchical clustering consists of two steps: finding the two
- closest data points, and merging these two data points into a new
- data point and calculating the new distances from this point to all
- other points.
-
- In the first step, we need a distance matrix, and if we use Euclidean
- distances the natural modification of the expression would be
-
- \begin{equation}
- d(x,y)=\frac{\sum w_i^xw_i^y(x_i-y_i)^2}{\sum w_i^xw_i^y}
- \end{equation}
-
- For the second step, inspired by average linkage, we suggest
-
- \begin{equation}
- d(xy,z)=\frac{\sum w_i^xw_i^z(x_i-z_i)^2+\sum
- w_i^yw_i^z(y_i-z_i)^2}{\sum w_i^xw_i^z+\sum w_i^yw_i^z}
- \end{equation}
-
- to be the distance between the new merged point $xy$ and $z$, and we
- also calculate new weights for this point: $w^{xy}_i=w^x_i+w^y_i$.
-
- \end{document}
+
+ */
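Although the Outlook section above was dropped from the page in this
changeset, its proposed weighted distance and merged weights are easy
to write out in the same standalone C++ style (hypothetical names):

\code
#include <cstddef>
#include <vector>

// Weighted squared Euclidean distance between points x and y with
// per-coordinate weights wx and wy.
double weighted_distance2(const std::vector<double>& x, const std::vector<double>& wx,
                          const std::vector<double>& y, const std::vector<double>& wy)
{
  double sum_wd2 = 0.0, sum_w = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double w = wx[i] * wy[i];
    const double d = x[i] - y[i];
    sum_wd2 += w * d * d;
    sum_w += w;
  }
  return sum_wd2 / sum_w;
}

// Weights of a merged point: w^xy_i = w^x_i + w^y_i.
std::vector<double> merged_weights(const std::vector<double>& wx,
                                   const std::vector<double>& wy)
{
  std::vector<double> w(wx.size());
  for (std::size_t i = 0; i < wx.size(); ++i)
    w[i] = wx[i] + wy[i];
  return w;
}
\endcode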
trunk/doc/doxygen.config.in
r1070 → r1109

# with spaces.

- INPUT = first_page.doxygen namespaces.doxygen concepts.doxygen ../yat
+ INPUT = first_page.doxygen namespaces.doxygen concepts.doxygen Statistics.doxygen ../yat

# If the value of the INPUT tag contains directories, you can use the

…

DOTFILE_DIRS =

- # The MAX_DOT_GRAPH_WIDTH tag can be used to set the maximum allowed width
- # (in pixels) of the graphs generated by dot. If a graph becomes larger than
- # this value, doxygen will try to truncate the graph, so that it fits within
- # the specified constraint. Beware that most browsers cannot cope with very
- # large images.
-
- MAX_DOT_GRAPH_WIDTH = 1024
-
- # The MAX_DOT_GRAPH_HEIGHT tag can be used to set the maximum allows height
- # (in pixels) of the graphs generated by dot. If a graph becomes larger than
- # this value, doxygen will try to truncate the graph, so that it fits within
- # the specified constraint. Beware that most browsers cannot cope with very
- # large images.
-
- MAX_DOT_GRAPH_HEIGHT = 1024
-
# If the GENERATE_LEGEND tag is set to YES (the default) Doxygen will
# generate a legend page explaining the meaning of the various boxes and
trunk/doc/first_page.doxygen
r1000 → r1109

href="namespacemembers.html">Namespace Members</a> link above.

- There is a document on the weighted statistics included in the
- package with underlying theory and more detailed motivations [ <a
- href="Statistics/index.html">html</a> | <a
- href="Statistics/Statistics.pdf">pdf</a> ].
-
<b>Future development</b><br>
We use trac as issue tracking system. Through the <a