Opened 12 years ago

Closed 12 years ago

#461 closed defect (fixed)

Fisher::p_value is incorrect

Reported by: Peter
Owned by: Peter
Priority: major
Milestone: yat 0.4.3
Component: statistics
Version: 0.4.2
Keywords:
Cc:


The p-value behaves strangely, and the code looks even stranger. For instance, p_value_one_sided calls p_value_exact, yet p_value_exact contains the comment "this makes the p_value two-sided".

Change History (3)

comment:1 Changed 12 years ago by Peter

Status: new → assigned

comment:2 Changed 12 years ago by Peter

Looking into this, I've realized that it is not obvious how to define a two-tailed p-value for a non-symmetric distribution. A sloppy definition would be that the p-value is the probability of getting the observed outcome or a more extreme one. The problem when the distribution is not symmetric is choosing where to start the summation in the other tail.

Browsing the web, there seem to be three alternatives, which of course converge when the distribution becomes symmetric.

1) The first alternative is based on the one-sided p-values, specifically the right-sided p-value rp = P(X>=x) and the left-sided p-value lp = P(X<=x). The two-tailed p-value is calculated as p2 = 2 * min(rp, 0.5, lp).

2) The second alternative is based on the odds ratio, or rather its logarithm. Because the middle outcome gives a log odds ratio equal to zero, one can run the sum over outcomes whose absolute log odds ratio is larger than that of the observed outcome.

3) The third alternative focuses on the probabilities of the different outcomes. It runs the sum over outcomes whose probability is smaller than that of the observed one.

Strategy 3 is a bit odd because it only makes sense for one-peaked distributions. The hypergeometric distribution is indeed one-peaked, but the restriction makes the strategy hard to generalize to other tests (although most distributions are one-peaked in practice).
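
To make the difference between the three alternatives concrete, here is a minimal Python sketch. It is not the yat implementation; the function names and the parametrization (row sums n1 and n2, first-column sum t, observed top-left cell x) are made up for illustration. Two choices are my own assumptions: outcomes that tie with the observed one are counted as "at least as extreme", and all margins of the 2x2 table are assumed to be positive. Exact fractions are used so that ties are not decided by rounding.

```python
from fractions import Fraction
from math import comb, inf


def support(n1, n2, t):
    """Possible values of the top-left cell given the margins."""
    return range(max(0, t - n2), min(n1, t) + 1)


def pmf(k, n1, n2, t):
    """Hypergeometric P(X = k) as an exact fraction."""
    return Fraction(comb(n1, k) * comb(n2, t - k), comb(n1 + n2, t))


def two_sided_doubling(x, n1, n2, t):
    """Alternative 1: p2 = 2 * min(rp, 0.5, lp)."""
    ks = support(n1, n2, t)
    lp = sum(pmf(k, n1, n2, t) for k in ks if k <= x)  # P(X <= x)
    rp = sum(pmf(k, n1, n2, t) for k in ks if k >= x)  # P(X >= x)
    return 2 * min(rp, Fraction(1, 2), lp)


def folded_odds_ratio(k, n1, n2, t):
    """max(OR, 1/OR) for the table with top-left cell k.

    Ordering outcomes by this value is the same as ordering them by
    |log(OR)|. Returns inf when a cell is zero (assumes positive margins,
    so a*d and b*c cannot both be zero).
    """
    a, b, c, d = k, n1 - k, t - k, n2 - (t - k)
    ad, bc = a * d, b * c
    if min(ad, bc) == 0:
        return inf
    return Fraction(max(ad, bc), min(ad, bc))


def two_sided_odds_ratio(x, n1, n2, t):
    """Alternative 2: sum over outcomes with |log OR| >= the observed one."""
    observed = folded_odds_ratio(x, n1, n2, t)
    return sum(pmf(k, n1, n2, t) for k in support(n1, n2, t)
               if folded_odds_ratio(k, n1, n2, t) >= observed)


def two_sided_probability(x, n1, n2, t):
    """Alternative 3: sum over outcomes no more probable than the observed one."""
    observed = pmf(x, n1, n2, t)
    return sum(pmf(k, n1, n2, t) for k in support(n1, n2, t)
               if pmf(k, n1, n2, t) <= observed)
```

With n1 = 4, n2 = 6, t = 3 and observed x = 3, the outcome probabilities are 20/120, 60/120, 36/120 and 4/120, and the three functions give 1/15, 1/5 and 1/30 respectively. With the symmetric margins n1 = n2 = 5, t = 5 and x = 4, all three give 13/63.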

comment:3 Changed 12 years ago by Peter

Resolution: fixed
Status: assigned → closed

(In [1624]) fixes #461. Also modified implementation of cdf_hypergeometric_P, which may cause conflict with modifications done in trunk (refs #87). If so, go with the trunk version (which uses GSL 1.8).
