Changeset 492 for trunk/doc

Jan 9, 2006, 10:38:12 AM (16 years ago)

updated Weighted Statistics document and reimplemented few stuff. NOTE, this revision may not compile. Cannot test compilation on Le Mac.

1 edited


  • trunk/doc/Statistics.tex

    r489 r492  
    2 %\documentstyle[slidesonly]{seminar}
    4 % $Id: error.tex,v 1.11 2004/10/01 09:08:39 peter Exp $
    6 %\input{psfig}
    8 % --- This bit puts ``draft'' over everything! ---
    9 %\special{!userdict begin /bop-hook{gsave 200 30 translate
    10 %65 rotate /Times-Roman findfont 216 scalefont setfont
    11 %0 0 moveto 0.93 setgray (DRAFT) show grestore}def end}
    12 %  --- end of this bit that puts `draft' over everything ---
    68 \section{WeightedAverager}
    70 Building an estimator $T$ of the parameter $\Theta$ there
    71 is a number of criterias that is welcome to be fullfilled.
     54There are several different reasons why a statistical analysis needs to adjust for weighting. In literature reasons are mainly diveded in to groups.
     56The first group is when some of the measurements are known to be more precise than others. The more precise a measuremtns is the larger weight it is given. The simplest case is when the weight are given before the measurements and they can be treated as deterministic. It becomes more complicated when the weight can be determined not until afterwards, and even more complicated if the weight depends on the value of the observable.
     58The second group of situations is when calculating averages over one distribution and sampling from another distribution. Compensating for this discrepency weights are introduced to the analysis. A simple example may be that we are interviewing people but for economical reasons we choose to interview more people from the city than from the countryside. When summarizing the statistics the answers from the city are given a smaller weight. In this example we are choosing the proportions of people from countryside and people from city being intervied. Hence, we can determine the weights before and consider them to be deterministic. In other situations the proportions are not deterministic, but rather a result from the sampling and the weights must be treated as stochastic and only in rare situations the weights can be treated as independent of the observable.
     60Since there are various origin for a weight occuring in a statistical analysis, there are various way to treat the weights and in general the analysis should be tailored to treat the weights correctly. We have not chosen one situation for our implementations, so see specific function documentation for what assumtions are made. Though, common for implementationare the following:
    74 \item The bias $b=<T> - \Theta$ should be zero. The
    75 estimator is unbiased.
    77 \item The estimator is efficient if the mean squared error
    78 $<(T-\Theta)^2>$ is small. If the estimator is unbiased the
    79 mean squared error is equal to the variance
    80 $<(T-<T>)^2>$ of the estimator.
    82 \item The estimator is consistent if the mean squared error goes to
    83 zero in the limit of infinite number of data points.
    85 \item adding a data point with weight zero should not change the
    86 estimator.
     62\item Setting all weights to unity yields the same result as the non-weighted version.
     63\item Rescaling the weights does not change any function.
     64\item Setting a weight to zero is equivalent to removing the data point.
    90 We will use these criterias to find the estimator. We will minimize
    91 the variance with the constraint that the estimator must be
    92 unbiased. Also, we check that the estimator is consistent and handles
    93 zero weight in a desired way.
     66An important case is when weights are binary (either 1 or 0). Then we get same result using the weighted version as using the data with weight not equal to zero and the non-weighted version. Hence, using binary weights and the weighted version missing values can be treated in a proper way.
    97 We start building an estimator of the mean in the case with varying
    98 variance.
    100 The likelihood is
    102 \beq
    103 L(m)=\prod (2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}},
    104 \eeq
    106 which we want to maximize, but can equally well minimize
    108 \beq
    109 -\ln L(m) = \sum \frac{1}{2}\ln2\pi\sigma_i^2+\frac{(x_i-m)^2}{2\sigma_i^2}.
    110 \eeq
    112 Taking the derivity yields
    114 \beq
    115 \frac{d-\ln L(m)}{dm}=\sum - \frac{x_i-m}{\sigma_i^2}
    116 \eeq
    118 Hence, the Maximum Likelihood method yields the estimator
    120 \beq
    121 m=\frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2}
    122 \eeq
    124 Let us check the criterias defined above.
    125 First, we want the estimator to be unbiased.
    127 \beq
    128 b=<m>-\mu=\frac{\sum <x_i>/\sigma_i^2}{\sum 1/\sigma_i^2}-\mu=\frac{\sum \mu/\sigma_i^2}{\sum 1/\sigma_i^2}-\mu=0,
    129 \eeq
    131 so the estimator is unbiased.
    133 Second, we examine how efficient the estimator is, \ie how small the
    134 variance is.
    136 \beq
    137 V(m)=\frac{\sum V(x_i)/\sigma_i^2}{\left(\sum 1/\sigma_i^2\right)^2}=\frac{1}{\sum 1/\sigma_i^2},
    138 \eeq
    140 which obviously goes to zero when number of samples (with finite
    141 $\sigma$) goes to infinity. The estimator is consistent.
    143 Trivially we can see that a zero weight data point does not change the
    144 estimator, which was our last condition.
     74For any situation the weight is always designed so the weighted mean is calculated as $m=\frac{\sum w_ix_i}{\sum w_i}$, which obviously fulfills the conditions above.
     76In the case of varying measurement error, it could be motivated that the weight shall be $w_i = 1/\sigma_i^2$. We assume measurement error to be Gaussian and the likelihood to get our measurements is
     77$L(m)=\prod (2\pi\sigma_i^2)^{-1/2}e^{-\frac{(x_i-m)^2}{2\sigma_i^2}}$.
     78We maximize the likelihood by taking the derivity with respect to $m$ on the logarithm of the likelihood
     79$\frac{d\ln L(m)}{dm}=\sum \frac{x_i-m}{\sigma_i^2}$. Hence, the Maximum Likelihood method yields the estimator
     80$m=\frac{\sum w_i/\sigma_i^2}{\sum 1/\sigma_i^2}$.
    147 Let us now examine the case when we do not know the variance, but only
    148 the weight $w_i$ that is proportional to the inverse
    149 of the variance $\sigma_i^2$
    151 \beq
    152 w_i=\frac{\kappa}{\sigma_i^2},
    153 \eeq
    155 and the variance is
    157 \beq
    158 V(m)=\frac{1}{\sum 1/\sigma_i^2}=\frac{\kappa}{\sum w_i}
    159 \eeq
    161 so we need to estimate $\kappa$. The likelihood is now
    163 \beq
    164 L(k)=P(k)\prod \frac{\sqrt{w_i}} {\sqrt{2\pi k}}e^{-\frac{w_i(x_i-m)^2}{2k}},
    165 \eeq
    167 where $P(k)$ is the prior probabilty distribution. If we have no prior knowledge
    168 about $k$, $P(k)$ is constant.
    170 Taking the derivity of the logarithm again yields
    172 \beq
    173 \frac{d-\ln L(k)}{dk}=-\frac{d\ln P(k)}{dk}+\sum \left(\frac{1}{2k}-\frac{w_i(x_i-m)^2}{2k^2}\right)
     84In case of varying variance, there is no point estimating a variance since it is different for each data point.
     86Instead we look at the case when we want to estimate the variance over $f$ but are sampling from $f'$. For the mean of an observable $O$ we have
     87$\widehat O=\sum\frac{f}{f'}O_i=\frac{\sum w_iO_i}{\sum w_i}$. Hence, an estimator of the variance of $X$ is
     90\\\frac{\sum w_ix_i^2}{\sum w_i}-\frac{(\sum w_ix_i)^2}{(\sum w_i)^2}=
     91\\\frac{\sum w_i(x_i^2-m^2)}{\sum w_i}
     92\\\frac{\sum w_i(x_i^2-2mx_i+m^2)}{\sum w_i}
     93\\\frac{\sum w_i(x_i-m)^2}{\sum w_i}
     95This estimator fulfills that it is invariant under a rescaling and having a weight equal to zero is equivalent to removing the data point. Having all weight equal to unity we get $\sigma=\frac{\sum (x_i-m)^2}{N}$, which is the same as returned from Averager. Hence, this estimator is slightly biased, but still very efficient.
     97\subsection{Standard Error}
     98The standard error squared is equal to the expexted squared error of the estimation of $m$. The squared error consists of two parts, the variance of the estimator and the squared bias. $<m-\mu>^2=<m-<m>+<m>-\mu>^2=<m-<m>>^2+(<m>-\mu)^2$.
     99In the case when weights are included in analysis due to varying measurement errors and the weights can be treated as deterministic ,we have
     101Var(m)=\frac{\sum w_i^2\sigma_i^2}{\left(\sum w_i\right)^2}=
     102\\\frac{\sum w_i^2\frac{\sigma_0^2}{w_i}}{\left(\sum w_i\right)^2}
     103\frac{\sigma_0^2}{\sum w_i},
     105where we need to estimate $\sigma_0^2$. Again we have the likelihood
     106$L(\sigma_0^2)=\prod\frac{1}{\sqrt{2\pi\sigma_0^2/w_i}}\exp{-\frac{w_i(x-m)^2}{2\sigma_0^2}}$ and taking the derivity with respect to $\sigma_o^2$,
     107$\frac{d\ln L}{d\sigma_i^2}=\sum -\frac{1}{2\sigma_0^2}+\frac{w_i(x-m)^2}{2\sigma_0^2\sigma_o^2}$
     108which yields an estimator $\sigma_0^2=\frac{1}{N}\sum w_i(x-m)^2$. This estimator is not ignoring weights equal to zero, because deviation is most often smaller than the expected infinity. Therefore, we modify the expression as follows $\sigma_i^2=\frac{\sum w_i^2}{\left(\sum w_i\right)^2}\sum w_i(x-m)^2$ and we get the following estimator of the variance of the mean
     109 $\sigma_i^2=\frac{\sum w_i^2}{\left(\sum w_i\right)^3}\sum w_i(x-m)^2$. This estimator fulfills the conditions above: adding a weight zero does not change it: rescaling the weights does not change it, and setting all weights to unity yields the same expression as in the non-weighted case.
     111In a case when it is not a good approximation to treat the weights as deterministic, there are two ways to get a better estimation. The first one is to linearize the expression $\left<\frac{\sum w_ix_i}{\sum w_i}\right>$. The second method when the situation is more complicated is to estimate the standard error using a bootstrapping method. 
     114Here data points come in pairs (x,y). We are sampling from $f'_{XY}$ but want to measure from $f_{XY}$. To compensate for this decrepency, averages of $g(x,y)$ are taken as $\sum \frac{f}{f'}g(x,y)$. Even though, $X$ and $Y$ are not independent $(f_{XY}\neq f_Xf_Y)$ we assume that we can factorize the ratio and get $\frac{\sum w_xw_yg(x,y)}{\sum w_xw_y}$ 
     116Following the variance calculations for AveragerWeighted we have $Cov=\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sum w_xw_y}$ where $m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}$
     120As the mean is estimated as $m_x=\frac{\sum w_xw_yx}{\sum w_xw_y}$, the variance is estimated as
     121$\sigma_x^2=\frac{\sum w_xw_y(x-m_x)^2}{\sum w_xw_y}$. As in the non-weighted case we define the correlation to be the ratio between the covariance and geomtrical avergae of the variances 
     123$\frac{\sum w_xw_y(x-m_x)(y-m_y)}{\sqrt{\sum w_xw_y(x-m_x)^2\sum w_xw_y(y-m_y)^2}}$.
     125This expression fulfills the following
     127\item Having N weights the expression reduces to the non-weighted expression.
     128\item Adding a pair of data, in which one weight is zero is equivalent to ignoring the data pair.
     129\item Correlation is equal to unity if and only if $x$ is equal to $y$. Otherwise the correlation is between -1 and 1.
     136$\frac{\sum w(x-m_x)(y-m_y)}{\sqrt{\sum w(x-m_x)^2\sum w(y-m_y)^2}}$.
     138See AveragerPairWeighted correlation.
     142An interpretation of the ROC curve area is the probability that if we take one sample from class $+$ and one sample from class $-$, what is the probability that the sample from class $+$ has greater value. The ROC curve area calculates the ratio of pairs fulfilling this
     148An geometrical interpretation is to have a number of squares where each square correspond to a pair of samples. The ROC curve follows the border between pairs in which the samples from class $+$ has a greater value and pairs in which this is not fulfilled. The ROC curve area is the area of those latter squares and a natural extension is to weight each pair with its two weights and consequently the weighted ROC curve area becomes
     154This expression is invariant under a rescaling of weight. Adding a data value with weight zero adds nothing to the exprssion, and having all weight equal to unity yields the non-weighted ROC curve area.
     158Assume that $x$ and $y$ originate from the same distribution $N(\mu,\sigma_i^2)$ where $\sigma_i^2=\frac{\sigma_0^2}{w_i}$. We then estimate $\sigma_0^2$ as
     160\frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
     161{\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
     162\frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
     164The variance of difference of the means becomes
     166Var(m_x)+Var(m_y)=\\\frac{\sum w_i^2Var(x_i)}{\left(\sum w_i\right)^2}+\frac{\sum w_i^2Var(y_i)}{\left(\sum w_i\right)^2}=
     167\frac{\sigma_0^2}{\sum w_i}+\frac{\sigma_0^2}{\sum w_i},
     169and consequently the t-score becomes
     171\frac{\sum w(x-m_x)^2+\sum w(y-m_y)^2}
     172{\frac{\left(\sum w_x\right)^2}{\sum w_x^2}+
     173\frac{\left(\sum w_y\right)^2}{\sum w_y^2}-2}
     174\left(\frac{1}{\sum w_i}+\frac{1}{\sum w_i}\right),
     177For a $w_i=w$ we this expression get condensed down to
     179\frac{w\sum (x-m_x)^2+w\sum (y-m_y)^2}
     183in other words the good old expression as for non-weighted.
     186Fold-Change is simply the difference between the weighted mean of the two groups //$\frac{\sum w_xx}{\sum w_x}-\frac{\sum w_yy}{\sum w_y}$
     189Taking all pair samples (one from class $+$ and one from class $-$) and calculating the weighted median of the distances.
     192\subsection{Polynomial Kernel}
     193The polynomial kernel of degree $N$ is defined as $(1+<x,y>)^N$, where $<x,y>$ is the linear kenrel (usual scalar product). For weights we define the linear kernel to be $<x,y>=\frac{\sum w_xw_yxy}{\sum w_xw_y}$ and the polynomial kernel can be calculated as before $(1+<x,y>)^N$. Is this kernel a proper kernel (always being semi positive definite). Yes, because $<x,y>$ is obviously a proper kernel as it is a scalar product. Adding a positive constant to a kernel yields another kernel so $1+<x,y>$ is still a proper kernel. Then also $(1+<x,y>)^N$ is a proper kernel because taking a proper kernel to the $Nth$ power yields a new proper kernel (see any good book on SVM).
     194\subsection{Gaussian Kernel}
     195We define the weighted Gaussian kernel as
     196$\exp\left(-\frac{\sum w_xw_y(x-y)^2}{\sum w_xw_y}\right)$, which fulfills the conditions listed in the introduction.
     198Is this kernel a proper kernel? Yes, following the proof of the non-weighted kernel we see that $K=\exp\left(-\frac{\sum w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum w_xw_y}\right)\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)$, which is a product of two proper kernels. $\exp\left(-\frac{\sum w_xw_yx^2}{\sum w_xw_y}\right)\exp\left(-\frac{\sum w_xw_yy^2}{\sum w_xw_y}\right)$ is a proper kernel, because it is a scalar product and $\exp\left(\frac{\sum w_xw_yxy}{\sum w_xw_y}\right)$ is a proper kernel, because it a polynomial of the linear kernel with positive coefficients. As product of two kernel also is a kernel, the Gaussian kernel is a proper kernel.
     205We have the model
     208y_i=\alpha+\beta (x-m_x)+\epsilon_i,
     211where $\epsilon_i$ is the noise. The variance of the noise is
     212inversely proportional to the weight,
     213$Var(\epsilon_i)=\frac{\sigma^2}{w_i}$. In order to determine the
     214model parameters, we minimimize the sum of quadratic errors.
     217Q_0 = \sum \epsilon_i^2
     220Taking the derivity with respect to $\alpha$ and $\beta$ yields two conditions
     223\frac{\partial Q_0}{\partial \alpha} = -2 \sum w_i(y_i - \alpha - \beta (x_i-m_x)=0
     229\frac{\partial Q_0}{\partial \beta} = -2 \sum w_i(x_i-m_x)(y_i-\alpha-\beta(x_i-m_x)=0
    179 k=\frac{1}{N}\sum w_i(x_i-m)^2+2k^2\frac{d\ln P(k)}{dk}
    180 \eeq
    182 In principle, any prior probabilty distribution $P(k)$ could be
    183 used. Here we, for simplicity, focus on the one where the last term
    184 becomes a constant, namely
    186 \beq
    187 P(k)=\exp(-\lambda/k).
    188 \eeq
    190 One problem with this choice is that we have to truncate the
    191 distribution in order to normalize it.
    193 The estimator $k$ becomes
    195 \beq
    196 k=\frac{1}{N}\sum w_i(x_i-m)^2+A,
    197 \eeq
    199 where $A$ is constant (depending on $\lambda$ and the truncation point).
    201 Having an estimation of $\kappa$ we can calculate the variance of $m$
    203 \beq
    204 V(m)=\frac{\frac{1}{N}\sum w_i(x_i-m)^2+A}{\sum w_i}=\frac{\frac{1}{N}\sum (x_i-m)^2/\sigma_i^2}{\sum 1/\sigma_i^2}+\frac{A}{\sum 1/\sigma_i^2}
    205 \eeq
    207 Let us now look at estimation of $\kappa$. Is the criterias above
    208 fullfilled? We start looking at the bias
    210 \beq
    211 b=<k>-\kappa=\frac{1}{N}\sum w_i\left<(x_i-m)^2\right>-\kappa
    212 \eeq
    214 Let us look at
    216 \bea
    217 \left<(x_i-m)^2\right>=
    218 \\\left<(x_i-\mu)^2+(m-\mu)^2-2(x_i-\mu)(m-\mu)\right>
    219 \\V(x_i)+V(m)-2\left<(x_i-\mu)(m-\mu)\right>
    220 \\V(x_i)+V(m)-2\left<(x_i-\mu)(\frac{\sum_j x_j/\sigma_j^2}{\sum_k 1/\sigma_k^2}-\mu)\right>
    221 \\V(x_i)+V(m)-2\left<(\frac{(x_i-\mu)^2/\sigma_j^2}{\sum_k 1/\sigma_k^2})\right>
    222 \\V(x_i)+V(m)-2\frac{1}{\sum_k 1/\sigma_k^2}
    223 \\\sigma_i^2-\frac{1}{\sum_k 1/\sigma_k^2}
    224 \eea
    226 so the bias is
    228 \bea
    229 b=<k>-\kappa=
    230 \\\frac{1}{N}\sum_i (w_i\sigma_i^2-\frac{w_i}{\sum_k 1/\sigma_k^2})-\kappa=
    231 \\\frac{\kappa}{N}\sum_i (1-\frac{w_i}{\sum_k w_k})-\kappa=
    232 \\\frac{\kappa}{N}(N-1)-\kappa
    233 \eea
    235 so the estimator is asymptotically unbiased. If we want the estimation
    236 to be unbiased we could \eg modify $N$ to $N-1$ , exactly as in the
    237 unweighted case, and we get the following estimator of $\kappa$
    239 \beq
    240 k=\frac{1}{N-1}\sum w_i(x_i-m)^2
    241 \eeq
    243 One problem with this estimator is that it is sensitive to weight zero
    244 samples due to the $N$. To solve that we have to express $N$ using
    245 $w$, wich will make the estimator biased. We suggest the substitution
    247 \beq
    248 N\rightarrow\frac{(\sum w_i)^2}{\sum w_i^2}
    249 \eeq
    251 so the estimator finally becomes
    253 \beq
    254 k=\frac{\sum w_i^2}{(\sum w_i)^2-\sum w_i^2}\sum w_i(x_i-m)^2
    255 \eeq
    257 and the variance is
    259 \beq
    260 V(m)=\frac{\sum w_i^2}{(\sum w_i)^2-\sum w_i^2}\frac{\sum w_i(x_i-m)^2}{\sum w_i}+\frac{A}{\sum 1/\sigma_i^2}
    261 \eeq
    264 \section{Score}
    265 \subsection{Pearson}
    267 Pearson correlation is defined as:
    268 \beq
    269 \frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2\sum_i (x_i-\bar{x})^2}}.
    270 \eeq
    271 The weighted version should satisfy the following conditions:
    273 \begin{itemize}
    274 \item Having N equal weights the expression reduces to the unweighted case.
    275 \item Adding a pair of data where one the weight is zero does not change the expression.
    276 \item When $x$ and $y$ are identical, the correlation is one.
    277 \end{itemize}
    279 Therefore we define the weighted correlation to be
    280 \beq
    281 \frac{\sum_iw_i^xw_i^y(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_iw_i^xw_i^y(x_i-\bar{x})^2\sum_iw_i^xw_i^y(x_i-\bar{x})^2}},
    282 \eeq
    283 where
    284 \beq
    285 \bar{x}=\frac{\sum_i w^x_iw^y_ix_i}{\sum_i w^x_iw^y_i}
    286 \eeq
     235\alpha = \frac{\sum w_iy_i}{\sum w_i}=m_y
    288 \beq
    289 \bar{y}=\frac{\sum_i w^x_iw^y_iy_i}{\sum_i w^x_iw^y_i}.
    290 \eeq
    292 \subsection{ROC}
    293 If we have a set of values $x^+$ from class + and a set of values
    294 $x^-$ from class -, the ROC curve area is equal to
    296 \beq
    297 \frac{\sum_{\{i,j\}:x^-_i<x^+_j}1}{\sum_{i,j}1}
    298 \eeq
    300 so a natural extension using weights could be
    302 \beq
    303 \frac{\sum_{\{i,j\}:x^-_i<x^+_j}w^-_iw^+_j}{\sum_{i,j}w^-_iw^+_j}
    304 \eeq
    306 \section{Hierarchical clustering}
     241\beta=\frac{\sum w_i(x_i-m_x)(y-m_y)}{\sum w_i(x_i-m_x)^2}=\frac{Cov(x,y)}{Var(x)}
     244Note, by having all weights equal we get back the unweighted
     245case. Furthermore, we calculate the variance of the estimators of
     246$\alpha$ and $\beta$.
     249\textrm{Var}(\alpha )=\frac{w_i^2\frac{\sigma^2}{w_i}}{(\sum w_i)^2}=
     250\frac{\sigma^2}{\sum w_i}
     255\textrm{Var}(\beta )= \frac{w_i^2(x_i-m_x)^2\frac{\sigma^2}{w_i}}
     256{(\sum w_i(x_i-m_x)^2)^2}=
     257\frac{\sigma^2}{\sum w_i(x_i-m_x)^2}
     260Finally, we estimate the level of noise, $\sigma^2$. Inspired by the
     261unweighted estimation
     264s^2=\frac{\sum (y_i-\alpha-\beta (x_i-m_x))^2}{n-2}
     267we suggest the following estimator
     270s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum w_i-2\frac{\sum w_i^2}{\sum w_i}}
     274\subsection{Hierarchical clustering}
    308276A hierarchical clustering consists of two things: finding the two
    325293also calculate new weights for this point: $w^{xy}_i=w^x_i+w^y_i$
    327 \section{Regression}
    328 We have the model
    330 \beq
    331 y_i=\alpha+\beta (x-m_x)+\epsilon_i,
    332 \eeq
    334 where $\epsilon_i$ is the noise. The variance of the noise is
    335 inversely proportional to the weight,
    336 $Var(\epsilon_i)=\frac{\sigma^2}{w_i}$. In order to determine the
    337 model parameters, we minimimize the sum of quadratic errors.
    339 \beq
    340 Q_0 = \sum \epsilon_i^2
    341 \eeq
    343 Taking the derivity with respect to $\alpha$ and $\beta$ yields two conditions
    345 \beq
    346 \frac{\partial Q_0}{\partial \alpha} = -2 \sum w_i(y_i - \alpha - \beta (x_i-m_x)=0
    347 \eeq
    349 and
    351 \beq
    352 \frac{\partial Q_0}{\partial \beta} = -2 \sum w_i(x_i-m_x)(y_i-\alpha-\beta(x_i-m_x)=0
    353 \eeq
    355 or equivalently
    357 \beq
    358 \alpha = \frac{\sum w_iy_i}{\sum w_i}=m_y
    359 \eeq
    361 and
    363 \beq
    364 \beta=\frac{\sum w_i(x_i-m_x)(y-m_y)}{\sum w_i(x_i-m_x)^2}
    365 \eeq
    367 Note, by having all weights equal we get back the unweighted
    368 case. Furthermore, we calculate the variance of the estimators of
    369 $\alpha$ and $\beta$.
    371 \beq
    372 \textrm{Var}(\alpha )=\frac{w_i^2\frac{\sigma^2}{w_i}}{(\sum w_i)^2}=
    373 \frac{\sigma^2}{\sum w_i}
    374 \eeq
    376 and
    377 \beq
    378 \textrm{Var}(\beta )= \frac{w_i^2(x_i-m_x)^2\frac{\sigma^2}{w_i}}
    379 {(\sum w_i(x_i-m_x)^2)^2}=
    380 \frac{\sigma^2}{\sum w_i(x_i-m_x)^2}
    381 \eeq
    383 Finally, we estimate the level of noise, $\sigma^2$. Inspired by the
    384 unweighted estimation
    386 \beq
    387 s^2=\frac{\sum (y_i-\alpha-\beta (x_i-m_x))^2}{n-2}
    388 \eeq
    390 we suggest the following estimator
    392 \beq
    393 s^2=\frac{\sum w_i(y_i-\alpha-\beta (x_i-m_x))^2}{\sum w_i-2\frac{\sum w_i^2}{\sum w_i}}
    394 \eeq
Note: See TracChangeset for help on using the changeset viewer.