Opened 15 years ago
Closed 15 years ago
#461 closed defect (fixed)
Fisher::p_value is incorrect
Reported by: | Peter | Owned by: | Peter |
---|---|---|---|
Priority: | major | Milestone: | yat 0.4.3 |
Component: | statistics | Version: | 0.4.2 |
Keywords: | | Cc: | |
Description
The p-value behaves strangely, and the code looks even stranger. For instance, p_value_one_sided
calls p_value_exact,
and in p_value_exact
there is a comment saying: this makes the p_value two-sided.
Change History (3)
comment:1 Changed 15 years ago by
Status: | new → assigned |
---|---|
comment:2 Changed 15 years ago by
comment:3 Changed 15 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Looking into this, I've realized that it is not obvious how to define a two-tailed p-value for a non-symmetric distribution. A sloppy definition would be that the p-value is the probability of getting the observed outcome or one more extreme. The problem when the distribution is not symmetric is choosing where to start the summation in the other tail.
Browsing the web, there seem to be three alternatives, which, of course, converge when the distribution becomes symmetric.
1) The first alternative is based on one-sided p-values, specifically the right-sided p-value P(X>=x) and the left-sided p-value P(X<=x). The two-tailed p-value is calculated as p2 = 2 * min(rp, lp, 0.5), where rp and lp are the one-sided p-values; the 0.5 term caps p2 at 1.
2) The second alternative is based on the odds ratio, or rather its logarithm. Because the middle outcome gives a log odds ratio equal to zero, one can run the sum over outcomes whose absolute log odds ratio is at least as large as the absolute value of the observed one.
3) The third alternative focuses on the probabilities of the individual outcomes. It runs the sum over outcomes that have a probability no larger than that of the observed one.
Strategy 3 is a bit weird because it only makes sense for unimodal distributions. The hypergeometric distribution is indeed unimodal, but this makes the strategy hard to generalize to other tests (although most distributions in practice are unimodal).
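The three alternatives can be sketched concretely for Fisher's exact test on a 2x2 table with fixed margins. This is a standalone Python illustration, not yat's C++ API; the function name fisher_p_values, the example table, and the small tolerance used in strategy 3 are my own choices.

```python
from math import comb, log, inf

def fisher_p_values(a, b, c, d):
    """For the 2x2 table [[a, b], [c, d]], return
    (left, right, p_doubled, p_logor, p_minlike):
    the two one-sided p-values and the two-tailed p-value
    under each of the three strategies."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    # support of X = the top-left cell, given fixed margins
    support = range(max(0, c1 - r2), min(r1, c1) + 1)

    def pmf(k):
        # hypergeometric probability of a table with top-left cell k
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    def abs_log_or(k):
        # |log odds ratio| of the table with top-left cell k, same margins
        kb, kc, kd = r1 - k, c1 - k, r2 - (c1 - k)
        if k * kd == 0 or kb * kc == 0:
            return inf          # a zero cell gives an infinite log odds ratio
        return abs(log(k * kd) - log(kb * kc))

    left = sum(pmf(i) for i in support if i <= a)
    right = sum(pmf(i) for i in support if i >= a)

    # 1) double the smaller one-sided p-value, capped at 1
    p_doubled = 2 * min(left, right, 0.5)
    # 2) sum over outcomes with |log odds ratio| >= the observed one
    obs_lor = abs_log_or(a)
    p_logor = sum(pmf(i) for i in support if abs_log_or(i) >= obs_lor)
    # 3) sum over outcomes no more probable than the observed one
    #    (tiny tolerance guards against floating-point ties)
    obs_p = pmf(a)
    p_minlike = sum(pmf(i) for i in support if pmf(i) <= obs_p * (1 + 1e-9))
    return left, right, p_doubled, p_logor, p_minlike
```

For a symmetric table such as [[3, 1], [1, 3]] all three strategies give the same two-tailed p-value (34/70), as expected; for asymmetric tables they generally differ.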