Opened 7 years ago

Closed 5 years ago

## #848 closed request (fixed)

# class performing chi^2 test

Reported by: | Peter | Owned by: | Peter |
---|---|---|---|

Priority: | major | Milestone: | yat 0.16 |

Component: | statistics | Version: | |

Keywords: | Cc: |

### Description

Input could be a matrix, with *expected* calculated as in Fisher, or possibly also a use case taking a data range and an expectation range.

### Change History (6)

### comment:1 Changed 6 years ago by

Summary: | class performin chi^2 test → class performing chi^2 test |
---|

### comment:2 Changed 5 years ago by

### comment:3 Changed 5 years ago by

Milestone: | yat 0.x+ → yat 0.16 |
---|


Some notes:

The expectation matrix M is defined as: M_ij = (sum of row i of X) * (sum of column j of X) / n, where n is the sum of all elements in X.
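A minimal sketch of the expectation matrix, assuming the table is held as a plain `std::vector` of rows. The `Matrix` alias and the function name `expected_counts` are made up for illustration and are not yat's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for illustration; yat has its own matrix class.
using Matrix = std::vector<std::vector<double>>;

// Expected counts under independence: M(i,j) = rowsum_i * colsum_j / n.
Matrix expected_counts(const Matrix& x)
{
  assert(!x.empty() && !x[0].empty());
  const std::size_t rows = x.size();
  const std::size_t cols = x[0].size();

  std::vector<double> rs(rows, 0.0);  // row sums
  std::vector<double> cs(cols, 0.0);  // column sums
  double n = 0.0;                     // grand total
  for (std::size_t i = 0; i < rows; ++i) {
    assert(x[i].size() == cols);
    for (std::size_t j = 0; j < cols; ++j) {
      rs[i] += x[i][j];
      cs[j] += x[i][j];
      n += x[i][j];
    }
  }
  assert(n > 0.0);

  Matrix m(rows, std::vector<double>(cols, 0.0));
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      m[i][j] = rs[i] * cs[j] / n;
  return m;
}
```

For X = {{10, 20}, {30, 40}} the row sums are 30 and 70, the column sums 40 and 60, and n = 100, so M = {{12, 18}, {28, 42}}.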

chi2 is calculated as sum((X_ij - M_ij)^2 / M_ij), where the sum runs over all elements.

If all elements in M are large (>5 or >10), chi2 approximately follows a chi2(df) distribution and the p-value can be calculated from that. df is the number of degrees of freedom, (#rows - 1) * (#cols - 1).
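As a sketch, the statistic and the degrees of freedom could be computed as below. The names `chi2_statistic` and `degrees_of_freedom` and the `Matrix` alias are hypothetical, and the p-value step is left out: it would take the upper tail of the chi2(df) distribution from a numerical library (GSL's `gsl_cdf_chisq_Q` is one candidate).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for illustration; yat has its own matrix class.
using Matrix = std::vector<std::vector<double>>;

// chi2 = sum over all cells of (X_ij - M_ij)^2 / M_ij.
double chi2_statistic(const Matrix& x, const Matrix& m)
{
  assert(x.size() == m.size());
  double chi2 = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    assert(x[i].size() == m[i].size());
    for (std::size_t j = 0; j < x[i].size(); ++j) {
      assert(m[i][j] > 0.0);  // expected counts must be positive
      const double d = x[i][j] - m[i][j];
      chi2 += d * d / m[i][j];
    }
  }
  return chi2;
}

// df = (#rows - 1) * (#cols - 1).
std::size_t degrees_of_freedom(std::size_t rows, std::size_t cols)
{
  assert(rows > 0 && cols > 0);
  return (rows - 1) * (cols - 1);
}
```

With X = {{10, 20}, {30, 40}} and M = {{12, 18}, {28, 42}}, every cell differs from its expectation by 2, so chi2 = 4/12 + 4/18 + 4/28 + 4/42, roughly 0.794, with df = 1.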

If M contains small values, the p-value can be inferred using Monte Carlo. Generate a randomly permuted variant of X, calculate chi2, and count how often the permuted chi2 is greater than or equal to the real chi2. The random matrix should have the same row sums (rs) and column sums (cs) as X. The most straightforward method is to generate a vector of size n, with the first rs[0] elements set to 0, the next rs[1] elements set to 1, etc., and then random_shuffle the vector. The number of 0s among the first cs[0] elements defines element R(0, 0), the number of 1s among them defines R(1, 0), and so on; the next cs[1] elements define column 1 in the same way.
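The shuffle-based procedure can be sketched as follows. All names here (`Counts`, `chi2`, `monte_carlo_p_value`) are hypothetical, and `std::shuffle` stands in for `std::random_shuffle`, which was removed in C++17. The p-value is taken as the plain fraction of permutations with chi2 at least as large as the observed one; other estimators, such as adding one to numerator and denominator, are possible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Counts = std::vector<std::vector<unsigned>>;  // hypothetical alias

// chi2 of table t against the expectation implied by margins rs, cs, total n.
double chi2(const Counts& t, const std::vector<unsigned>& rs,
            const std::vector<unsigned>& cs, unsigned n)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < rs.size(); ++i)
    for (std::size_t j = 0; j < cs.size(); ++j) {
      const double m = static_cast<double>(rs[i]) * cs[j] / n;
      const double d = t[i][j] - m;
      sum += d * d / m;
    }
  return sum;
}

double monte_carlo_p_value(const Counts& x, unsigned n_perm, std::mt19937& rng)
{
  assert(!x.empty() && !x[0].empty() && n_perm > 0);
  const std::size_t rows = x.size();
  const std::size_t cols = x[0].size();
  std::vector<unsigned> rs(rows, 0), cs(cols, 0);
  unsigned n = 0;
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j) {
      rs[i] += x[i][j];
      cs[j] += x[i][j];
      n += x[i][j];
    }
  for (unsigned r : rs) assert(r > 0);  // zero margins give M_ij == 0
  for (unsigned c : cs) assert(c > 0);
  const double observed = chi2(x, rs, cs, n);

  // One row label per observation: rs[0] zeros, then rs[1] ones, etc.
  std::vector<std::size_t> labels;
  labels.reserve(n);
  for (std::size_t i = 0; i < rows; ++i)
    labels.insert(labels.end(), rs[i], i);

  unsigned hits = 0;
  for (unsigned k = 0; k < n_perm; ++k) {
    std::shuffle(labels.begin(), labels.end(), rng);
    // The first cs[0] labels fill column 0, the next cs[1] column 1, etc.
    Counts r(rows, std::vector<unsigned>(cols, 0));
    std::size_t pos = 0;
    for (std::size_t j = 0; j < cols; ++j)
      for (unsigned c = 0; c < cs[j]; ++c)
        ++r[labels[pos++]][j];
    if (chi2(r, rs, cs, n) >= observed) ++hits;
  }
  return static_cast<double>(hits) / n_perm;
}
```

Each permutation costs O(n) for the shuffle, so the total cost grows with both the number of observations and the number of permutations.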

A possibly faster method to generate a random matrix is to use the hypergeometric distribution repeatedly to fill the matrix.
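One way this could look, as a sketch only: fill the matrix column by column, and within each column draw each cell from a hypergeometric distribution conditioned on what is still unassigned. The names `hypergeometric` and `random_table` are hypothetical. The sampler below is a naive O(draws) loop written only to keep the example self-contained, so it gives no speed-up by itself; any gain would come from swapping in a library sampler (GSL's `gsl_ran_hypergeometric` is one candidate).

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Counts = std::vector<std::vector<unsigned>>;  // hypothetical alias

// Number of marked items among `draws` items taken without replacement
// from `total` items, of which `marked` are marked (naive stand-in).
unsigned hypergeometric(unsigned marked, unsigned total, unsigned draws,
                        std::mt19937& rng)
{
  assert(marked <= total && draws <= total);
  unsigned hits = 0;
  for (unsigned k = 0; k < draws; ++k) {
    std::uniform_int_distribution<unsigned> pick(1, total);
    if (pick(rng) <= marked) {
      ++hits;
      --marked;
    }
    --total;
  }
  return hits;
}

// Random table with row sums rs and column sums cs.
Counts random_table(const std::vector<unsigned>& rs,
                    const std::vector<unsigned>& cs, std::mt19937& rng)
{
  const unsigned n = std::accumulate(rs.begin(), rs.end(), 0u);
  assert(n == std::accumulate(cs.begin(), cs.end(), 0u));

  Counts r(rs.size(), std::vector<unsigned>(cs.size(), 0));
  std::vector<unsigned> rem(rs);  // per row: observations not yet placed
  unsigned pool = n;              // total observations not yet placed
  for (std::size_t j = 0; j < cs.size(); ++j) {
    unsigned need = cs[j];   // still to place in column j
    unsigned avail = pool;   // unplaced observations in rows i, i+1, ...
    for (std::size_t i = 0; i < rs.size(); ++i) {
      const unsigned x = hypergeometric(rem[i], avail, need, rng);
      r[i][j] = x;
      need -= x;
      avail -= rem[i];
      rem[i] -= x;
    }
    pool -= cs[j];
  }
  return r;
}
```

The number of hypergeometric draws is #rows * #cols per table, independent of n, which is where a speed-up over shuffling a vector of size n could come from when n is large relative to the table size.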