yat: Ticket Query
https://dev.thep.lu.se/yat/query?status=!closed&type=enhancement&order=priority
yat used by Jari and Peter
Trac 1.2.3
https://dev.thep.lu.se/yat/ticket/215
https://dev.thep.lu.se/yat/ticket/215#215: weight information in InputRankerTue, 20 Mar 2007 11:22:58 GMTPeter<p>
Either provide sum of weights for each input, or perhaps average of weights is more appropriate, or an Averager.
</p>
<p>
Averager is probably most general. Could be stored as vector<Averager> where size of vector is number of inputs and Averager is updated when calculating the score.
</p>
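<p>
A minimal sketch of the per-input bookkeeping suggested above. <code>RunningMean</code> is a hypothetical stand-in for Averager, and the <code>weights[i][j]</code> layout (input i, sample j) is an assumption made for illustration, not the InputRanker interface.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an Averager: keeps a running sum and count.
class RunningMean {
public:
  void add(double x) { sum_ += x; ++n_; }
  double sum() const { return sum_; }
  double mean() const { return n_ ? sum_ / n_ : 0.0; }
  std::size_t n() const { return n_; }
private:
  double sum_ = 0.0;
  std::size_t n_ = 0;
};

// weights[i][j] is assumed to hold the weight of input i in sample j.
// Returns one accumulator per input, so that sum and mean are both available.
std::vector<RunningMean>
weight_summary(const std::vector<std::vector<double> >& weights)
{
  std::vector<RunningMean> result(weights.size());
  for (std::size_t i = 0; i < weights.size(); ++i)
    for (std::size_t j = 0; j < weights[i].size(); ++j)
      result[i].add(weights[i][j]);
  return result;
}
```

<p>
Storing the accumulator rather than a single number leaves the choice between sum and average to the caller.
</p>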
Resultshttps://dev.thep.lu.se/yat/ticket/215#changelog
https://dev.thep.lu.se/yat/ticket/216
https://dev.thep.lu.se/yat/ticket/216#216: PCA is lacking functionalitySat, 24 Mar 2007 14:20:53 GMTJari Häkkinen<p>
Currently, PCA only performs a decomposition if the number of rows is not smaller than the number of columns.
</p>
<p>
PCA should also work for this case and choose a functional algorithm automatically.
</p>
<p>
In addition, there is a lot of functionality missing that is useful.
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/216#changelog
https://dev.thep.lu.se/yat/ticket/258
https://dev.thep.lu.se/yat/ticket/258#258: Permutation testThu, 27 Sep 2007 21:40:20 GMTPeter<p>
I tend to write the same code over and over again
</p>
<pre class="wiki">double threshold = score(target);
Averager p;  // fraction of scores that reach the threshold
for (size_t i=0; i<N; ++i){
  // note: the first pass scores the unshuffled target
  if (score(target)>=threshold)
    p.add(1);
  else
    p.add(0);
  target.shuffle();
}
cout << p.mean() << endl;
</pre><p>
Perhaps we should have a function or class for doing it. The main component is the call to score, which should be a functor class. These kinds of calculations can be computationally expensive, so perhaps one should be able to write intermediate results to an ostream* (if NULL, no reporting???).
</p>
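<p>
One possible shape for such a function, as a sketch only. The name <code>permutation_p_value</code>, the <code>report_every</code> parameter, and the use of <code>std::shuffle</code> on a plain vector of labels are assumptions for illustration; a yat version would presumably work on Target instead. Unlike the loop above, this sketch shuffles before every scored pass.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <random>
#include <vector>

// Returns the fraction of random permutations of `target` whose score is at
// least as large as the score of the original ordering.
// If `progress` is non-null, the running estimate is written every
// `report_every` permutations.
template <typename Score, typename Label, typename Rng>
double permutation_p_value(Score score, std::vector<Label> target,
                           std::size_t n_perm, Rng& rng,
                           std::ostream* progress = nullptr,
                           std::size_t report_every = 1000)
{
  assert(n_perm > 0 && report_every > 0);
  const double threshold = score(target);
  std::size_t hits = 0;
  for (std::size_t i = 0; i < n_perm; ++i) {
    std::shuffle(target.begin(), target.end(), rng);
    if (score(target) >= threshold)
      ++hits;
    if (progress && (i + 1) % report_every == 0)
      *progress << (i + 1) << '\t'
                << static_cast<double>(hits) / (i + 1) << '\n';
  }
  return static_cast<double>(hits) / n_perm;
}
```

<p>
Passing <code>&amp;std::cerr</code> as <code>progress</code> would give intermediate estimates during a long run; leaving it null keeps the function silent.
</p>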
Resultshttps://dev.thep.lu.se/yat/ticket/258#changelog
https://dev.thep.lu.se/yat/ticket/302
https://dev.thep.lu.se/yat/ticket/302#302: allow different normalizations in PCAThu, 31 Jan 2008 14:00:43 GMTPeter<p>
This is an offspring from <a class="new ticket" href="https://dev.thep.lu.se/yat/ticket/216" title="#216: enhancement: PCA is lacking functionality (new)">ticket:216</a>
</p>
<p>
Currently (pre0.4) only row-mean-centering is allowed in PCA. Perhaps we could allow other normalizations such as no normalization, median centering, z-score, as well as iterative row-column normalization.
</p>
<p>
This should be implemented as a functor. The PCA can then be templatized on this functor, with default being the current one.
</p>
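<p>
A sketch of how the templatization could look. <code>PcaSketch</code>, <code>RowMeanCenter</code>, <code>NoNormalization</code> and the <code>Matrix</code> typedef are hypothetical names used only for illustration; the decomposition itself is left out.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // one inner vector per row

// Default policy: subtract the row mean from every element of the row.
struct RowMeanCenter {
  void operator()(Matrix& m) const
  {
    for (std::size_t i = 0; i < m.size(); ++i) {
      if (m[i].empty())
        continue;
      double mean = 0.0;
      for (std::size_t j = 0; j < m[i].size(); ++j)
        mean += m[i][j];
      mean /= m[i].size();
      for (std::size_t j = 0; j < m[i].size(); ++j)
        m[i][j] -= mean;
    }
  }
};

// Alternative policy: leave the data untouched.
struct NoNormalization {
  void operator()(Matrix&) const {}
};

// The normalization is a template parameter; the default keeps the
// row-mean-centering behaviour.
template <typename Normalizer = RowMeanCenter>
class PcaSketch {
public:
  explicit PcaSketch(Matrix data, Normalizer normalize = Normalizer())
    : data_(std::move(data))
  {
    normalize(data_);
    // the decomposition would be performed on data_ from here on
  }
  const Matrix& normalized() const { return data_; }
private:
  Matrix data_;
};
```

<p>
Existing code that writes <code>PcaSketch&lt;&gt;</code> keeps the current centering, while median centering or z-scoring would only require a new functor with the same call signature.
</p>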
Resultshttps://dev.thep.lu.se/yat/ticket/302#changelog
https://dev.thep.lu.se/yat/ticket/346
https://dev.thep.lu.se/yat/ticket/346#346: Generalize NNI to utilize different metricsTue, 11 Mar 2008 15:21:30 GMTPeter<p>
It has been reported that it might be beneficial to use a correlation-based distance rather than a Euclidean one (see e.g. <a class="ext-link" href="http://www.biomedcentral.com/1471-2105/9/12/abstract/"><span class="icon"></span>Brock et al.</a>).
</p>
<p>
It seems trivial to simply add a Distance class and thereby achieve the generalization. However, one had better be careful here, because changing the metric should be reflected not only in the calculation of the distance between rows (i.e. genes) but also in the imputation equation (see eq. 10 in the WeNNI paper).
</p>
<p>
In equation 10, the imputation value is simply a weighted average of the values of the nearest neighbors. This is motivated by the assumption that nearest neighbors are also close in the sample being imputed. With a positive semi-definite distance measure such as PearsonDistance, this is suboptimal, because two vectors can have a very small distance even though their element values are very different. The distance can in fact be zero for non-identical vectors. In other words, the distance is translation and scale invariant. Say for instance that we have the following small matrix
</p>
<pre class="wiki">0 12 0 12 mv
4 8 4 8 4
</pre><p>
Obviously imputing the missing value to 4 here is not optimal: the vectors have zero distance prior to imputation, and after imputation the distance would no longer be zero. Rather we would like to set the value to 0. How can that be achieved? The key is in the invariances mentioned above. Recall that a correlation-based distance is equivalent to a z-score normalization followed by Euclidean distance. Therefore, it would be tempting to z-score each row and then use Euclidean distance to impute values. However, that approach has a disadvantage, because a correlation calculation is based solely on pairs of data present in both vectors. The average and variance used in the correlation calculation would therefore differ from those in the z-score normalization, which would yield unwanted behavior such as that mentioned above. A better approach would be to perform the z-score normalization based on the same data that is used in the calculation of the distance.
</p>
<blockquote>
<p>
<code>y' = (y-m)/s</code>
</p>
</blockquote>
<p>
The missing value can then be imputed from the neighbor: <code>y' = x'</code> and the z-score is reversed to get back the original average and scale <code>y = s*y'+m</code> (well technically they will be different due to imputed values, but almost...).
</p>
<p>
Implementation-wise there is probably no reason to perform the normalization back and forth. Instead one could calculate the correlation distance using AveragerPairWeighted and thereby get the nearest neighbors. The trick is then to calculate the imputation value directly using the averages and variances of x and y.
</p>
<p>
<code>y = s_y * y' + m_y = s_y * x' + m_y = s_y * (x-m_x) / s_x + m_y </code>
</p>
<p>
which is similar to the equation for linear regression
</p>
<p>
<code>y = s_y/s_x * r * (x-m_x) + m_y</code>.
</p>
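<p>
As a worked check, the direct formula <code>y = s_y * (x-m_x) / s_x + m_y</code> can be applied to the small matrix above. The sketch below is unweighted, uses a single neighbor, and assumes the neighbor is positively correlated with the row being imputed, as the formula does; the function name <code>impute_from_neighbor</code> is hypothetical. Means and standard deviations are taken only over the positions present in both rows.
</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Imputes y[missing] from neighbor x. The value stored in y[missing] is
// never read. Statistics use only the positions shared by x and y.
double impute_from_neighbor(const std::vector<double>& x,
                            const std::vector<double>& y,
                            std::size_t missing)
{
  assert(x.size() == y.size() && missing < x.size());
  const std::size_t n = x.size() - 1;  // number of shared positions
  assert(n >= 2);

  double m_x = 0.0, m_y = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (i == missing)
      continue;
    m_x += x[i];
    m_y += y[i];
  }
  m_x /= n;
  m_y /= n;

  double ss_x = 0.0, ss_y = 0.0;  // sums of squared deviations
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (i == missing)
      continue;
    ss_x += (x[i] - m_x) * (x[i] - m_x);
    ss_y += (y[i] - m_y) * (y[i] - m_y);
  }
  assert(ss_x > 0.0);

  // s_y / s_x: the common 1/n factor cancels in the ratio
  return std::sqrt(ss_y / ss_x) * (x[missing] - m_x) + m_y;
}
```

<p>
For x = (4, 8, 4, 8, 4) and y = (0, 12, 0, 12, mv), the shared positions give m_x = m_y = 6, s_x = 2 and s_y = 6, so the imputed value is 6 * (4-6) / 2 + 6 = 0, which keeps the correlation distance at zero.
</p>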
Resultshttps://dev.thep.lu.se/yat/ticket/346#changelog
https://dev.thep.lu.se/yat/ticket/364
https://dev.thep.lu.se/yat/ticket/364#364: Histogram cumulative distributionTue, 13 May 2008 12:21:31 GMTPeter<p>
related to <a class="closed ticket" href="https://dev.thep.lu.se/yat/ticket/290" title="#290: enhancement: Histogram output operator (closed: invalid)">ticket:290</a>
</p>
<p>
I use the Histogram class to collect an empirical cumulative distribution. In other words, I want to know how many data points are <= X. To do that I need to know the upper bound of each bin. Currently it is possible to calculate that, but it would be more convenient if there was a function to get the upper bound of bin <code>i</code>.
</p>
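<p>
A sketch of the requested accessor, assuming equal-width bins between a lower and an upper limit. <code>BinLayout</code>, <code>bin_upper_bound</code> and <code>cumulative_counts</code> are hypothetical names and not part of the Histogram interface.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed layout: nof_bins equal-width bins covering [xmin, xmax).
struct BinLayout {
  double xmin;
  double xmax;
  std::size_t nof_bins;
};

// Upper bound of bin i (bins are numbered from 0).
double bin_upper_bound(const BinLayout& h, std::size_t i)
{
  assert(h.nof_bins > 0 && i < h.nof_bins);
  return h.xmin + (i + 1) * (h.xmax - h.xmin) / h.nof_bins;
}

// Running total of the bin counts: element i is the number of data points
// that fell in bins 0..i, i.e. below bin_upper_bound(h, i).
std::vector<std::size_t>
cumulative_counts(const std::vector<std::size_t>& counts)
{
  std::vector<std::size_t> result(counts.size());
  std::partial_sum(counts.begin(), counts.end(), result.begin());
  return result;
}
```

<p>
Pairing <code>bin_upper_bound(h, i)</code> with element <code>i</code> of <code>cumulative_counts</code> gives the points of the empirical cumulative distribution.
</p>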
Resultshttps://dev.thep.lu.se/yat/ticket/364#changelog
https://dev.thep.lu.se/yat/ticket/433
https://dev.thep.lu.se/yat/ticket/433#433: predict in regression::LocalTue, 02 Sep 2008 19:30:20 GMTPeter<p>
After training a <code>Local</code> using the fit function I would like to use the trained model to predict the value of an independent data point. A predict function!!!
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/433#changelog
https://dev.thep.lu.se/yat/ticket/711
https://dev.thep.lu.se/yat/ticket/711#711: speed up exact p-value for small n in KendallTue, 10 Jul 2012 12:51:49 GMTPeter<p>
Class Kendall currently uses a naive algorithm that steps through all N! permutations and calculates the score for each. Since the score calculation scales as N<sup>2</sup> (see <a class="closed ticket" href="https://dev.thep.lu.se/yat/ticket/710" title="#710: enhancement: speed up Kendall::score using NlogN algorithm (closed: fixed)">#710</a>), this becomes expensive even for moderately sized N. It should be possible to calculate the p-value in a recursive fashion similar to ROC. <a class="ext-link" href="http://www.statsdirect.com/help/nonparametric_methods/kend.htm"><span class="icon"></span>statsdirect</a> tabulates values for small N and then uses an Edgeworth expansion for larger N (approximation?).
</p>
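<p>
One recursive scheme that fits this description, sketched under the assumption of no ties: with no ties the number of discordant pairs equals the number of inversions of a permutation, and the number of permutations of m elements with k inversions follows from the counts for m-1 elements, because inserting the m-th element adds between 0 and m-1 inversions. The function names are hypothetical and this is not the algorithm in Kendall.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[k] = number of permutations of n elements with exactly k inversions.
// Doubles are used because the counts sum to n!, which overflows integer
// types quickly; values are exact only while they stay below 2^53.
std::vector<double> inversion_counts(std::size_t n)
{
  assert(n >= 1);
  std::vector<double> counts(1, 1.0);  // one element: a single permutation
  for (std::size_t m = 2; m <= n; ++m) {
    // maximum number of inversions grows by m-1
    std::vector<double> next(counts.size() + m - 1, 0.0);
    for (std::size_t k = 0; k < counts.size(); ++k)
      for (std::size_t j = 0; j < m; ++j)
        next[k + j] += counts[k];
    counts.swap(next);
  }
  return counts;
}

// Probability of observing at most k inversions (discordant pairs) among
// n untied elements under the null hypothesis of a random ordering.
double p_at_most(std::size_t n, std::size_t k)
{
  const std::vector<double> counts = inversion_counts(n);
  double total = 0.0, tail = 0.0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    total += counts[i];
    if (i <= k)
      tail += counts[i];
  }
  return tail / total;
}
```

<p>
The table for N elements is built in polynomial time instead of visiting N! permutations. Handling ties would need a different recursion.
</p>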
Resultshttps://dev.thep.lu.se/yat/ticket/711#changelog
https://dev.thep.lu.se/yat/ticket/31
https://dev.thep.lu.se/yat/ticket/31#31: add Permutation test for weighted ScoreSat, 17 Dec 2005 04:53:07 GMTPeter<p>
Add this function in the base class, taking target, value, weight, and number of permutations, and perhaps a threshold telling us that we are only interested in p-values below this limit. The permutation run should of course exit when we can be sure that the final p-value will not be below the threshold, e.g. quit when (O-E)<sup>2</sup>/E goes above some limit (depending on the p-value threshold).
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/31#changelog
https://dev.thep.lu.se/yat/ticket/231
https://dev.thep.lu.se/yat/ticket/231#231: read, digest, and adopt APR devel guidelinesWed, 20 Jun 2007 15:35:50 GMTPeter<p>
<a class="ext-link" href="http://apr.apache.org/versioning.html"><span class="icon"></span>http://apr.apache.org/versioning.html</a>
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/231#changelog
https://dev.thep.lu.se/yat/ticket/373
https://dev.thep.lu.se/yat/ticket/373#373: Parameterize how to calculate the average in Quantile NormalizationTue, 27 May 2008 15:50:29 GMTPeterResultshttps://dev.thep.lu.se/yat/ticket/373#changelog
https://dev.thep.lu.se/yat/ticket/382
https://dev.thep.lu.se/yat/ticket/382#382: Create an Index object from a string in Target classWed, 18 Jun 2008 16:46:17 GMTPeter<p>
The class could be used to easily get which samples belong to group <em>A</em>.
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/382#changelog
https://dev.thep.lu.se/yat/ticket/422
https://dev.thep.lu.se/yat/ticket/422#422: add(first1, last1, first2) in regression::LocalMon, 25 Aug 2008 14:40:40 GMTPeter<p>
In regression::Local there is a member function <code>add(double, double)</code>; it would be convenient to allow addition of ranges as well.
</p>
<p>
This is the same feature that exists, for example, in <code>AveragerPair</code>.
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/422#changelog
https://dev.thep.lu.se/yat/ticket/428
https://dev.thep.lu.se/yat/ticket/428#428: yat.m4 and yat-config only work when installedWed, 27 Aug 2008 22:51:40 GMTPeter<p>
Perhaps we should add support for having yat as a bundled package???
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/428#changelog
https://dev.thep.lu.se/yat/ticket/127
https://dev.thep.lu.se/yat/ticket/127#127: Use Distance::Euclidean in GaussianKernelFunctionTue, 05 Sep 2006 05:07:15 GMTPeter<p>
K = exp(-(x-y)<sup>2</sup>/width ), where (x-y)<sup>2</sup> is the squared Euclidean distance, so it makes sense to use that class here.
</p>
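<p>
A sketch of the delegation, with the distance passed in as a callable. <code>squared_euclidean</code> and <code>gaussian_kernel</code> are hypothetical free functions written for illustration; they are not the yat Euclidean class or GaussianKernelFunction.
</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of squared element-wise differences.
double squared_euclidean(const std::vector<double>& x,
                         const std::vector<double>& y)
{
  assert(x.size() == y.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double d = x[i] - y[i];
    sum += d * d;
  }
  return sum;
}

// K = exp(-d2(x, y) / width), where d2 returns a squared distance.
template <typename SquaredDistance>
double gaussian_kernel(const std::vector<double>& x,
                       const std::vector<double>& y,
                       double width, SquaredDistance d2)
{
  assert(width > 0.0);
  return std::exp(-d2(x, y) / width);
}
```

<p>
If the distance class returns the plain (not squared) Euclidean distance, the kernel has to square it before dividing by the width.
</p>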
<p>
needs <a class="closed ticket" href="https://dev.thep.lu.se/yat/ticket/126" title="#126: request: Euclidean distance class (closed: fixed)">#126</a>
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/127#changelog
https://dev.thep.lu.se/yat/ticket/400
https://dev.thep.lu.se/yat/ticket/400#400: add functions taking ranges should return object that is modifiedFri, 18 Jul 2008 20:25:56 GMTPeterResultshttps://dev.thep.lu.se/yat/ticket/400#changelog
https://dev.thep.lu.se/yat/ticket/618
https://dev.thep.lu.se/yat/ticket/618#618: redundancy in 'required' and dependency in Makefile.amSun, 04 Apr 2010 16:05:14 GMTPeter<p>
The variable 'required' and the dependencies for .log files declared in Makefile.am are very redundant, so in principle it should be possible to generate those dependencies from the run of a test.
</p>
<p>
I'm thinking of something similar to how gcc and automake generate files in dir '.deps' that are included into the main Makefile.
</p>
Resultshttps://dev.thep.lu.se/yat/ticket/618#changelog