#671 closed enhancement (fixed)
Increase accuracy in Averager
Reported by: | Peter | Owned by: | Peter |
---|---|---|---|
Priority: | minor | Milestone: | yat 0.8 |
Component: | statistics | Version: | trunk |
Keywords: | | Cc: | |
Description
It's been discussed on help-gsl how best to calculate the mean and variance. Our approach in Averager, using the variables n, sum_x and sum_xx, is likely the fastest because each update involves only three additions (and one multiplication for x*x). Unfortunately, it is not the most accurate: the variance is obtained by subtracting two large, nearly equal quantities, so rounding errors can make it negative in some cases, which yields an invalid (NaN) standard deviation. Knuth suggests using the variables n, mean and M2, where the latter is the centered sum of squares, i.e., the sum of squared deviations from the mean. The update rules (including the weighted case) can be found here. The only downside is that it is slightly slower, as each update involves a couple of divisions and multiplications. Any opinions on whether this is worthwhile to implement?
The Averager classes are heavily used, e.g., in the classifiers, so this change would affect more than just the Averager classes. It is a question of speed versus accuracy. The fact that GSL uses the accurate method makes me lean towards it, but I'm not sure.
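The ticket notes that the update rules also cover the weighted case. One common weighted form, usually attributed to West (1979), replaces the count n by the running sum of weights. The sketch below assumes non-negative, frequency-like weights and reports the weighted population variance M2 / sum_w; the class name is hypothetical, and yat's weighted Averager may define its variance differently (for example with a bias correction).

```cpp
#include <cassert>

// Hypothetical weighted running mean/variance (West-style update).
// Assumes non-negative weights; the variance is sum w_i (x_i - mean)^2 / sum w_i.
class WeightedWelford {
 public:
  void add(double x, double w) {
    assert(w >= 0.0);
    if (w == 0.0) return;            // a zero weight contributes nothing
    sum_w_ += w;
    double delta = x - mean_;        // deviation from the old mean
    mean_ += (w / sum_w_) * delta;   // move the mean toward x by the weight fraction
    M2_ += w * delta * (x - mean_);  // weighted centered sum of squares
  }
  double sum_w() const { return sum_w_; }
  double mean() const {
    assert(sum_w_ > 0.0);
    return mean_;
  }
  double variance() const {
    assert(sum_w_ > 0.0);
    return M2_ / sum_w_;
  }

 private:
  double sum_w_ = 0.0;
  double mean_ = 0.0;
  double M2_ = 0.0;
};
```

With integer weights, adding x with weight 2 gives the same mean and variance as adding x twice to an unweighted accumulator, which is a convenient sanity check: adding 1 and 2 with weight 1 and 3 with weight 2 matches the data {1, 2, 3, 3}, with mean 2.25 and variance 0.6875.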
Change History (13)
comment:1 Changed 12 years ago by
comment:2 Changed 12 years ago by
Milestone: | yat 0.x+ → yat 0.8 |
---|---|
Type: | discussion → enhancement |
OK, let's include this in milestone:"yat 0.8"
comment:3 Changed 12 years ago by
Owner: | changed from Jari Häkkinen to Peter |
---|---|
Status: | new → assigned |
comment:4 Changed 12 years ago by
comment:6 Changed 12 years ago by
comment:8 Changed 12 years ago by
Just for the record, here's a paper on the subject: http://www.janinebennett.org/index_files/ParallelStatisticsAlgorithms.pdf
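One reason the (n, mean, M2) representation is attractive, and the subject of the linked paper, is that partial results can be combined without revisiting the data. The sketch below uses the pairwise combination formula of Chan, Golub and LeVeque; the struct and function names are hypothetical and not part of yat.

```cpp
#include <cassert>

// Hypothetical accumulator holding count, mean and M2 (sum of squared
// deviations from the mean), updated with Welford's rule.
struct Moments {
  long n = 0;
  double mean = 0.0;
  double M2 = 0.0;

  void add(double x) {
    ++n;
    double delta = x - mean;
    mean += delta / n;
    M2 += delta * (x - mean);
  }
};

// Combine two partial results (e.g. from separate threads or data chunks):
//   n    = n_a + n_b
//   mean = mean_a + delta * n_b / n,            delta = mean_b - mean_a
//   M2   = M2_a + M2_b + delta^2 * n_a * n_b / n
Moments merge(const Moments& a, const Moments& b) {
  if (a.n == 0) return b;
  if (b.n == 0) return a;
  Moments r;
  r.n = a.n + b.n;
  double delta = b.mean - a.mean;
  double na = static_cast<double>(a.n);
  double nb = static_cast<double>(b.n);
  double n = static_cast<double>(r.n);
  r.mean = a.mean + delta * nb / n;
  r.M2 = a.M2 + b.M2 + delta * delta * na * nb / n;
  return r;
}
```

Merging the accumulators for {1, 2} and {3, 4, 5} gives n = 5, mean 3 and M2 10, the same as adding all five values to a single accumulator.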
comment:12 Changed 12 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I think the accurate way is better. When it is time to optimize for speed, we could create a faster class.