Opened 2 years ago

Closed 2 years ago

#955 closed defect (fixed)

incorrect Kendall's tau p-value

Reported by: Peter
Owned by: Peter
Priority: major
Milestone: yat 0.17.2
Component: statistics
Version: 0.17
Keywords:
Cc:

Description

With statistics::Kendall I get the following result:

Score: 0.260358
One-sided P: 0.447281
Two-sided P: 0.894562

whereas via https://astatsa.com/CorrelationTest I get:

Kendall's rank correlation
sample estimate τ: 0.249107
alternate hypothesis: true τ ≠ 0
z-statistic: 4.11244903545
p-value: 0.000039

The difference between score and tau is acceptable, but something is wrong with the p-value.
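
As a sanity check on the reference value, the reported z-statistic can be converted to a two-sided p-value. This is a minimal sketch that assumes the reference tool uses a normal approximation for z; the function name two_sided_p_from_z is made up for illustration and is not part of yat.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a z-statistic under a standard normal null."""
    # For Z ~ N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-statistic as reported by the reference tool in this ticket.
p = two_sided_p_from_z(4.11244903545)
print(f"two-sided p = {p:.3g}")
```

Under that assumption the result is close to 3.9e-5, consistent with the reference p-value of 0.000039, which suggests the discrepancy lies in how the variance behind the p-value is computed rather than in the tau estimate.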

Attachments (1)

data.txt (891 bytes) - added by Peter 2 years ago.
data


Change History (6)

Changed 2 years ago by Peter

Attachment: data.txt added

data

comment:1 Changed 2 years ago by Peter

In 3972:

add test data for statistics::Kendall (refs #955)

comment:2 Changed 2 years ago by Peter

In 3973:

add dependency for lazycheck (refs #955)

comment:3 Changed 2 years ago by Peter

The problem is that the wrong formula is used in the correction of variance.

There is a term counting triplets

\sum t(t-1)(t-2) * \sum u(u-1)(u-2)

which should be divided by 9n(n-1)(n-2). Omitting that division overestimates the variance, and since the factor grows like n cubed, the error becomes disastrous for large n.
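
To make the effect of the missing division concrete, here is a minimal sketch of the textbook tie-corrected variance of Kendall's S under the normal approximation. It is not yat's implementation: the names tie_sums and kendall_s_variance and the divide_triple_term flag are hypothetical, and the undivided branch only illustrates the behaviour described above, not the exact old code.

```python
from collections import Counter


def tie_sums(values):
    """Sums over tie groups of size t: t(t-1), t(t-1)(t-2), t(t-1)(2t+5)."""
    pairs = triples = adjust = 0
    for t in Counter(values).values():
        pairs += t * (t - 1)
        triples += t * (t - 1) * (t - 2)
        adjust += t * (t - 1) * (2 * t + 5)
    return pairs, triples, adjust


def kendall_s_variance(x, y, divide_triple_term=True):
    """Variance of Kendall's S with tie correction (normal approximation).

    divide_triple_term=False is a hypothetical switch that leaves the
    triplet term undivided, mimicking the bug described in this ticket.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    pairs_x, triples_x, adjust_x = tie_sums(x)
    pairs_y, triples_y, adjust_y = tie_sums(y)

    # Main term, reduced by the per-variable tie adjustments.
    var = (n * (n - 1) * (2 * n + 5) - adjust_x - adjust_y) / 18.0
    # Term from tied pairs in both variables.
    var += pairs_x * pairs_y / (2.0 * n * (n - 1))
    # Term from tied triplets in both variables.
    triple = float(triples_x * triples_y)
    if divide_triple_term:
        triple /= 9.0 * n * (n - 1) * (n - 2)
    return var + triple


x = [1, 1, 1, 2, 3, 4]
y = [1, 2, 2, 2, 3, 4]
correct = kendall_s_variance(x, y)
undivided = kendall_s_variance(x, y, divide_triple_term=False)
print(correct, undivided)
```

For this toy input with n = 6 the divided version gives about 21.63 and the undivided one 57.6. Since the divisor 9n(n-1)(n-2) grows like n cubed, the gap widens quickly for larger data sets.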

comment:4 Changed 2 years ago by Peter

Also, note that the term above is non-zero only when there is at least one triplet in both x and y (i.e. at least three data points share one x-value, and likewise for y). If either variable has no ties, or if its ties each involve only two data points, the term is zero and the bug does not come into play.
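
The condition above can be checked with a small sketch. The helper names triple_sum and triple_term are hypothetical and only mirror the formula in comment:3; they are not yat's count_triple().

```python
from collections import Counter


def triple_sum(values):
    """Sum of t(t-1)(t-2) over tie groups of size t; zero unless some t >= 3."""
    return sum(t * (t - 1) * (t - 2) for t in Counter(values).values())


def triple_term(x, y):
    """Numerator of the triplet term: product of the two per-variable sums."""
    return triple_sum(x) * triple_sum(y)


# Ties between only two data points: each group contributes 2 * 1 * 0 = 0.
only_pairs = triple_term([1, 1, 2, 3], [1, 1, 2, 2])
# A triplet in x but no ties in y: the product is still zero.
one_sided = triple_term([1, 1, 1, 2], [1, 2, 3, 4])
# Triplets in both variables: 6 * 6 = 36, so the term (and the bug) matters.
both = triple_term([1, 1, 1, 2], [5, 5, 5, 6])
print(only_pairs, one_sided, both)
```

Only the last case produces a non-zero term, which is why data without three-way ties in both variables did not expose the problem.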

comment:5 Changed 2 years ago by Peter

Resolution: fixed
Status: new → closed

In 3975:

fixes #955.

The correction term involving count_triple() was not divided by the
correct factor (9n(n-1)(n-2)), resulting in an overestimated variance
and thus overestimated p-values. Besides changing that single line,
also rewrote most of the function to make it easier to read.
