Opened 12 years ago

Closed 12 years ago

Last modified 10 years ago

#671 closed enhancement (fixed)

Increase accuracy in Averager

Reported by: Peter Owned by: Peter
Priority: minor Milestone: yat 0.8
Component: statistics Version: trunk
Keywords: Cc:

Description

It's been discussed on help-gsl how to best calculate mean and variance. Our approach in Averager using variable n, sum_x and sum_xx is likely the fastest because the update only involves three additions. Unfortunately, it is not the most accurate and variance may in some cases become negative which yields invalid standard deviation. Knuth suggests using variables n, mean and M2 where the latter is the centered sum squared. The update rules (including weighted case) can be found here. Only downside is that it is slightly slower as they involve a couple of divisions and multiplications. Any opinions whether this is worthwhile to implement?

The Averager classes are heavily used in, e.g., the classifiers and thus this change would influence not only the Averager classes. Speed versus accuracy. The fact that GSL uses the accurate method make lean towards that, but I'm not sure.

Change History (13)

comment:1 Changed 12 years ago by Jari Häkkinen

I think the accurate way is better. When it is time to optimize speed then we could create a faster class.

comment:2 Changed 12 years ago by Peter

Milestone: yat 0.x+yat 0.8
Type: discussionenhancement

OK, let's include this in milestone:"yat 0.8"

comment:3 Changed 12 years ago by Peter

Owner: changed from Jari Häkkinen to Peter
Status: newassigned

comment:4 Changed 12 years ago by Peter

(In [2558]) refs #671. Follow Knuth's algorithm to calculate average and variance online

comment:5 Changed 12 years ago by Peter

(In [2559]) minor speed-up. refs #671

comment:6 Changed 12 years ago by Peter

(In [2561]) refs #671. Correct so mean is NaN when there is no data

comment:7 Changed 12 years ago by Peter

(In [2562]) refs #671. Reimplement AveragerWeighted?

comment:8 Changed 12 years ago by Peter

Just for the record, here's a paper on the subject: http://www.janinebennett.org/index_files/ParallelStatisticsAlgorithms.pdf

comment:9 Changed 12 years ago by Peter

(In [2565]) refs #671

comment:10 Changed 12 years ago by Peter

(In [2568]) refs #671

comment:11 Changed 12 years ago by Peter

(In [2569]) refs #671 optimization

comment:12 Changed 12 years ago by Peter

Resolution: fixed
Status: assignedclosed

comment:13 Changed 10 years ago by Peter

(In [3039]) avoid building static when static is disabled which might indicate static archives are not available on system. refs #671

Note: See TracTickets for help on using tickets.