Opened 12 years ago

Closed 12 years ago

#464 closed defect (fixed)

WeNNI does not handle NaNs well

Reported by: Jari Häkkinen
Owned by: Jari Häkkinen
Priority: major
Milestone: yat 0.5
Component: utility
Version: trunk
Keywords:
Cc:

Description (last modified by Peter)

NaNs in the data matrix stay NaNs, even when the weight is zero for the corresponding element in the data matrix.

Change History (11)

comment:1 Changed 12 years ago by Peter

Description: modified (diff)
Milestone: yat 0.5

comment:2 Changed 12 years ago by Jari Häkkinen

I suppose we want 0.0*NaN to be 0? ... or should WeNNI really care at all? It should be the caller's problem to make sure that NaNs are removed before calling WeNNI. I think this is an invalid ticket. Any thoughts?

comment:3 in reply to:  2 Changed 12 years ago by Peter

Replying to jari:

I suppose we want 0.0*NaN to be 0? ... or should WeNNI really care at all? It should be the caller's problem to make sure that NaNs are removed before calling WeNNI. I think this is an invalid ticket. Any thoughts?

I think WeNNI should take care of this. Not only because I would expect her to do so, but also to be in line with other classes in yat. AveragerWeighted, for example, ignores data points where the weight is zero (also when the data value is NaN). What causes your hesitation?
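To make the behaviour described above concrete, here is a minimal sketch of a weighted mean that ignores zero-weight data points. It is not yat's AveragerWeighted; the function name `weighted_mean_skip_zero` and the use of `std::vector` are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not yat's AveragerWeighted: a weighted mean that
// skips every element whose weight is exactly zero, so a NaN paired with
// a zero weight never enters the sums.
double weighted_mean_skip_zero(const std::vector<double>& x,
                               const std::vector<double>& w)
{
  assert(x.size() == w.size());
  double sum_wx = 0.0;
  double sum_w = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (w[i] == 0.0)
      continue;  // treated as a removed data point, whatever x[i] is
    sum_wx += w[i] * x[i];
    sum_w += w[i];
  }
  // With no non-zero weight the mean is undefined; NaN signals that.
  return sum_w != 0.0 ? sum_wx / sum_w
                      : std::numeric_limits<double>::quiet_NaN();
}
```

A NaN with a non-zero weight still propagates to the result, so only the {NaN, 0} pair is silenced.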

comment:4 Changed 12 years ago by Jari Häkkinen

If other classes ignore data with weight zero then probably WeNNI should also do so.

My reluctance to take care of NaNs for zero weights is

  1. The algorithms must check for zero weight all the time.
  2. Algorithmic problems/bugs may be hidden by the fact that a zero weight will remove NaNs that shouldn't be possible in the algorithm (i.e. there is a bug).
  3. Is 0*NaN well specified? Is 0*Infinity well specified?
  4. In my specific use case, I actually feed the NaNs to WeNNI in my data matrix and have set a zero weight for the NaN elements. I could also set the data matrix values to zero to avoid WeNNI issues.

However, we must be internally consistent in yat. Does this mean that we change WeNNI or the others?

comment:5 in reply to: 4 Changed 12 years ago by Jari Häkkinen

Replying to jari:

However, we must be internally consistent in yat. Does this mean that we change WeNNI or the others?

I just want to clarify, I think we should change WeNNI to comply with the rest of the classes. However, we need to document our choice because it is not an obviously natural choice.

comment:6 Changed 12 years ago by Peter

Replying to jari:

If other classes ignore data with weight zero then probably WeNNI should also do so.

Yeah, well I think we have to see this question in a bigger picture because it is related to how to handle missing values overall. If I remember correctly, we (all three!) had a discussion in spring 2007 on how to deal with missing values. Some (the third man) wanted missing values to be represented by NaNs and yat to take care of them accordingly. You argued that was stupid, with speed as the main argument. Eventually we agreed on a compromise saying that unweighted methods should not check for NaNs but should be as fast as possible. When having missing values one should instead use the weighted method with a weight of zero, and thus we have the function nan(const Matrix&, Matrix&) to simplify this workflow.

To be honest, I can't remember if we explicitly discussed whether weighted algorithms should deal with NaNs, but I think that was what the third man and I concluded. That is how we have designed the classifiers and related statistical classes such as Distance.
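The workflow mentioned above, turning NaNs in the data into zero weights, could look roughly like the sketch below. It is a stand-in written for flat vectors, not the yat function itself; the name `weights_from_nan` and the convention of weight 1 for every valid element are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the nan(const Matrix&, Matrix&) helper
// mentioned above, written for flat vectors. Assumed semantics: the
// returned weight is 0 where data is NaN and 1 elsewhere.
std::vector<double> weights_from_nan(const std::vector<double>& data)
{
  std::vector<double> weight(data.size(), 1.0);
  for (std::size_t i = 0; i < data.size(); ++i) {
    if (std::isnan(data[i]))
      weight[i] = 0.0;  // mark the missing value
  }
  return weight;
}
```

The resulting weights can then be passed, together with the untouched data, to a weighted method.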

My reluctance to take care of NaNs for zero weights is

  1. The algorithms must check for zero weight all the time.

I see the cost.

  2. Algorithmic problems/bugs may be hidden by the fact that a zero weight will remove NaNs that shouldn't be possible in the algorithm (i.e. there is a bug).

Getting NaN unexpectedly is a good way to detect that something is wrong. However, as I would expect WeNNI to impute NaNs, I would get unexpected NaNs although nothing is wrong. This could of course be changed with proper docs so I know better what to expect. Anyhow, we are only hiding bugs in the case of {NaN, 0}, so in all cases where the weight is non-zero you will still detect your bugs.

  3. Is 0*NaN well specified? Is 0*Infinity well specified?

I would guess that some ISO document says that any operator with a NaN as argument will return NaN. Same story with 0*Inf, which is defined to be NaN. In math it is another story, as infinity does not really exist other than as a limit case. But I guess your question was rhetorical!

This is not about how some multiplication is defined. This is about how we want to deal with missing values in yat. I'm saying that a missing value should be represented by a zero weight. You're saying that a missing value should be represented by a zero weight but the data value must not be NaN, Inf, or -Inf.
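For reference, the floating-point behaviour guessed at above can be checked with a small sketch like the following. It assumes an IEEE 754 (IEC 559) `double` and default compiler settings (no fast-math style flags); the function names are made up for illustration.

```cpp
#include <cmath>
#include <limits>

// The checks below only make sense for IEEE 754 doubles.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// 0 * NaN: any arithmetic operation with a NaN operand yields NaN.
inline bool zero_times_nan_is_nan()
{
  const double qnan = std::numeric_limits<double>::quiet_NaN();
  return std::isnan(0.0 * qnan);
}

// 0 * Inf: an invalid operation, which also yields NaN.
inline bool zero_times_inf_is_nan()
{
  const double inf = std::numeric_limits<double>::infinity();
  return std::isnan(0.0 * inf);
}
```

So multiplying by a zero weight does not by itself clear a NaN or an infinity; ignoring such elements requires an explicit check on the weight.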

  4. In my specific use case, I actually feed the NaNs to WeNNI in my data matrix and have set a zero weight for the NaN elements. I could also set the data matrix values to zero to avoid WeNNI issues.

Sure. That workaround would, of course, work.

However, we must be internally consistent in yat. Does this mean that we change WeNNI or the others?

Changing the others would have a great impact. It would change the behaviour of all weighted statistics including regression and classifiers. I'm obviously against that and I hope the argument that it would break compatibility with yat 0.4 is enough to convince you that it is the wrong way to go.

If you insist that we should not change WeNNI, I'd prefer to be inconsistent rather than changing everything else.

comment:7 in reply to:  5 Changed 12 years ago by Peter

Replying to jari:

Replying to jari:

However, we must be internally consistent in yat. Does this mean that we change WeNNI or the others?

I just want to clarify, I think we should change WeNNI to comply with the rest of the classes. However, we need to document our choice because it is not an obviously natural choice.

OK, I could extend the introduction in Weighted Statistics with a comment that the statement "Setting a weight to zero is equivalent to removing the data point" should also hold when dealing with NaNs.

comment:8 Changed 12 years ago by Peter

(In [1639]) adding a comment that data with zero weight should be ignored also when data value is NaN. refs #464

comment:9 Changed 12 years ago by Jari Häkkinen

Status: new → assigned

I am changing WeNNI to treat weight=0 such that, irrespective of what the data value is, the result is 0.
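The rule stated above can be sketched as a small helper. This is only an illustration of the idea, not the code committed for this ticket; the name `weighted_value` is hypothetical.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper illustrating the rule, not the committed WeNNI
// code: a zero weight yields 0 regardless of the data value, so NaN and
// +/-Inf are never multiplied into the result.
inline double weighted_value(double weight, double value)
{
  return weight == 0.0 ? 0.0 : weight * value;
}
```

With a non-zero weight a NaN or Inf data value still propagates, which keeps genuine bugs visible.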

comment:10 Changed 12 years ago by Jari Häkkinen

(In [1725]) Addresses #464. Weight zero will kill NaNs and Infs.

comment:11 Changed 12 years ago by Jari Häkkinen

Resolution: fixed
Status: assigned → closed

(In [1727]) Fixes #464. Final tests.
