Opened 13 years ago

Closed 13 years ago

#301 closed discussion (wontfix)

NCC - how to calculate the centroid

Reported by: Peter
Owned by: Markus Ringnér
Priority: major
Milestone:
Component: classifier
Version: trunk
Keywords:
Cc:

Description (last modified by Peter)

I read on http://www-stat.stanford.edu/~tibs/PAM/Rdist/howwork.html that the Stanford people calculate the standardized centroid as the average gene expression for each gene in each class divided by the within-class standard deviation for that gene.

Is that something we should allow?
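To make the question concrete, here is a minimal Python sketch of the standardized centroid as described above. It is not yat code. The function name `standardized_centroids` and the data layout (one list of sample values per gene) are made up for illustration, and it assumes that "within-class standard deviation" means the standard deviation pooled over all classes; the linked page does not spell that out.

```python
import math

def standardized_centroids(expression, labels):
    """Return {class: [standardized centroid value per gene]}.

    expression: one list per gene, holding one value per sample (hypothetical layout).
    labels: class label of each sample, in the same sample order.
    Assumption: the within-class standard deviation is pooled over all classes.
    """
    classes = sorted(set(labels))
    n_samples = len(labels)
    result = {c: [] for c in classes}
    for gene in expression:
        class_means = {}
        ss_within = 0.0
        for c in classes:
            values = [x for x, lab in zip(gene, labels) if lab == c]
            m = sum(values) / len(values)
            class_means[c] = m
            # Accumulate squared deviations from the class mean.
            ss_within += sum((x - m) ** 2 for x in values)
        # Pooled within-class standard deviation for this gene.
        # Note: a gene that is constant within every class gives zero here.
        pooled_sd = math.sqrt(ss_within / (n_samples - len(classes)))
        for c in classes:
            result[c].append(class_means[c] / pooled_sd)
    return result

# One gene, two samples per class.
centroids = standardized_centroids([[1.0, 3.0, 5.0, 7.0]], ["a", "a", "b", "b"])
# centroids["a"][0] is about 1.414 and centroids["b"][0] is about 4.243
```

The sketch does not handle genes with zero within-class variance; a real implementation would need a policy for that case.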

Change History (6)

comment:1 Changed 13 years ago by Peter

Description: modified (diff)

comment:2 Changed 13 years ago by Peter

Well well, I guess that with the standardized centroid NCC becomes very similar to NBC, so this is not really needed.

Another thing is that the PNAS paper implicitly assumes that the data is centralized: the authors switch between shrinking towards zero and shrinking towards the overall centroid. I think we should mention in the NCC docs that it could be a good idea to centralize data (in particular when using Pearson distance).
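A small Python sketch of why the two formulations coincide for centralized data. This is a simplification, not the full procedure from the paper: it applies plain soft thresholding to the difference between a class centroid and the overall centroid and leaves out the standard-deviation scaling. All function names are hypothetical.

```python
def soft_threshold(value, delta):
    """Move value towards zero by delta, stopping at zero."""
    magnitude = abs(value) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if value > 0 else -magnitude

def shrink_towards_overall(class_centroid, overall_centroid, delta):
    """Shrink each gene of the class centroid towards the overall centroid."""
    return [o + soft_threshold(c - o, delta)
            for c, o in zip(class_centroid, overall_centroid)]

def shrink_towards_zero(class_centroid, delta):
    """Shrink each gene of the class centroid towards zero."""
    return [soft_threshold(c, delta) for c in class_centroid]

class_centroid = [2.0, -0.5, 1.0]

# Centralized data: the overall centroid is zero for every gene,
# so both formulations give the same shrunken centroid.
centered = shrink_towards_overall(class_centroid, [0.0, 0.0, 0.0], 1.0)
towards_zero = shrink_towards_zero(class_centroid, 1.0)

# Non-centralized data: the overall centroid is not zero and the results differ.
not_centered = shrink_towards_overall(class_centroid, [1.5, 0.0, 0.0], 1.0)
```

With a zero overall centroid the two results are identical, which is the implicit assumption noted above; with a non-zero overall centroid they are not.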

comment:3 Changed 13 years ago by Markus Ringnér

I agree regarding the importance of documenting centralization recommendations. How to centralize is crucial, and so is how to make the normalization of a training data set comparable to that of a test data set. Sometimes you want to predict a large test data set, and you want to centralize each gene across the entire test data set as this will correspond to your view of the training data. Sometimes you predict a test data set that does not reflect an entire relevant cohort of samples, and you cannot centralize it separately and still obtain relevant results (in which case you need to centralize the data in some other way so that it is comparable to the training data).
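The two situations can be sketched in Python as follows. This is an illustration only, not yat code: the helper names and the layout (one list of sample values per gene) are hypothetical, and reusing the training means is just one possible "other way" of centralizing a small test set.

```python
def gene_means(data):
    """Mean of each gene (row) across all samples."""
    return [sum(gene) / len(gene) for gene in data]

def centralize(data, means):
    """Subtract the given per-gene mean from every sample value."""
    return [[x - m for x in gene] for gene, m in zip(data, means)]

train = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # 2 genes, 3 samples
test = [[4.0, 6.0], [20.0, 20.0]]              # 2 genes, 2 samples

# Case 1: a large, representative test set is centralized on its own means.
test_own = centralize(test, gene_means(test))

# Case 2: a test set that does not reflect a whole cohort reuses the
# per-gene means estimated from the training data (an assumed strategy).
test_from_train = centralize(test, gene_means(train))
```

For the first gene the two choices give different centralized values, which is why the choice matters for the prediction.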

comment:4 Changed 13 years ago by Jari Häkkinen

Milestone: changed from To Be Determined to 0.5

comment:5 Changed 13 years ago by Markus Ringnér

I have changed my mind regarding documenting that it is a good idea to centralize data, because this kind of documentation would be enormous to write for the entire yat and hard to maintain. I think we should document functionality; to make good recommendations one may essentially need to write textbooks and articles.

comment:6 Changed 13 years ago by Jari Häkkinen

Milestone: yat 0.5
Resolution: wontfix
Status: changed from new to closed