Opened 14 years ago

Closed 13 years ago

#223 closed defect (fixed)

test contain comparison of doubles with the == operator

Reported by: Jari Häkkinen Owned by: Peter
Priority: major Milestone: yat 0.4
Component: test Version: trunk
Keywords: Cc:

Description (last modified by Jari Häkkinen)

This is poor programming style. Use fabs(a-b)<some_small_number

averager_test.cc and regression_test.cc are two test that needs to be fixed, more may exist.

All tests should be checked for double comparisons with operator == and changed accordingly. There may be cases when exact equality is required.

Change History (12)

comment:1 Changed 14 years ago by Jari Häkkinen

Description: modified (diff)

comment:2 Changed 13 years ago by Markus Ringnér

Perhaps we should do something like:

  • If we compare doubles calculated in the same test program we should have a slack_bound related to slack_bound=std::numeric_limits<double>::epsilon(). Perhaps slac_bound=10*epsilon()? So we get tests that scale with the precision of systems.
  • Sometimes we compare doubles with doubles read from files distributed with the test suite. Here we should use a slack_bound that reflects the precision in the files.
  • We should also make sure that slack_bound always is scaled to correspond to a single double. I.e. if we compare the sum of two vectors we should divide the sums with the number of elements (N) or compare with N*slack_bound.
  • Also we should change all fabs to std::abs.

comment:3 in reply to:  2 ; Changed 13 years ago by Markus Ringnér

Replying to markus:

  • We should also make sure that slack_bound always is scaled to correspond to a single double. I.e. if we compare the sum of two vectors we should divide the sums with the number of elements (N) or compare with N*slack_bound.

Oops, in the example I meant if we do sum(fabs(x_i-y_i)), this sum should be related to N*slack_bound.

comment:4 Changed 13 years ago by Peter

This is a lot of functionality. I just wanna remind of ticket:224 (Test Suite Class) that could host all these.

comment:5 in reply to:  3 ; Changed 13 years ago by Peter

Replying to markus:

Replying to markus:

  • We should also make sure that slack_bound always is scaled to correspond to a single double. I.e. if we compare the sum of two vectors we should divide the sums with the number of elements (N) or compare with N*slack_bound.

Oops, in the example I meant if we do sum(fabs(x_i-y_i)), this sum should be related to N*slack_bound.

Many tests are not simple sums. For example what if we test correlation calculated from Averager and AveragerWeighted?. Or even more complicated if you test the prediction from a Classifier, then the actual calculation is way more complicated than just a simple sum. What should N be in this case?

Also remember that epsilon is the relative precision, i.e., the precision around unity. If you compare numbers around 1,000 the error is typically 1000 larger.

comment:6 in reply to:  5 Changed 13 years ago by Markus Ringnér

Replying to peter:

Replying to markus:

Replying to markus:

  • We should also make sure that slack_bound always is scaled to correspond to a single double. I.e. if we compare the sum of two vectors we should divide the sums with the number of elements (N) or compare with N*slack_bound.

Oops, in the example I meant if we do sum(fabs(x_i-y_i)), this sum should be related to N*slack_bound.

Many tests are not simple sums. For example what if we test correlation calculated from Averager and AveragerWeighted?. Or even more complicated if you test the prediction from a Classifier, then the actual calculation is way more complicated than just a simple sum. What should N be in this case?

Also remember that epsilon is the relative precision, i.e., the precision around unity. If you compare numbers around 1,000 the error is typically 1000 larger.

With N, I meant the number of differences between doubles that are summed up. So what I meant was that if we compare two vectors, tests such as vector_error=sum(abs(x_i-y_i)) should be scaled so vector_error is compared with N*cutoff. In our tests there were examples of where abs(a-b) were compared to the same cutoff as vector_error. So as I said N should be used so that the cutoff always reflects the error of a single double. If we calculate a correlation and compare it with some expected value we have N=1.

My suggestion with epsilon was to scale it with some value, e.g. use 10*epsilon, as mentioned above. While this has drawbacks as you point out I think it is an improvement compared to having a mixture of different relatively arbitrary cutoffs that do not even scale with the precision of the system. Of course using a relative error may be better: abs((calculated-"true")/"true")<X*epsilon.

comment:7 Changed 13 years ago by Peter

(in [1210]) refs #223 change fabs to std::abs

comment:8 Changed 13 years ago by Peter

operator== is not necessarily bad. There is some functionality that should be equal, for example, median should just return percentile(50), so having a percentile(50)==median() should never fail.

It is of course another story when we start to write tests of type x==sqrt(x*x), which is bad.

comment:9 Changed 13 years ago by Peter

(in [1214]) fixed for averager_test.cc

comment:10 Changed 13 years ago by Peter

Owner: changed from Jari Häkkinen to Peter
Status: newassigned

comment:11 Changed 13 years ago by Peter

There are two examples when scaling the error allowance with epsilon is bad

1 Static comparison such as equal( pi(), 3.14). For this comparison the precision on your system is not really relevant (as long as it can handle 3 digits). On the contrary if we would let the error allowance scale with system precision, the test will fail on a system with very very good precision.

2 comparison of sqr: For example if we compare variance it makes perhaps sense to scale the error allowance with precision epsilon. But if we compare the square root of variance, i.e. standard deviation, we would expect the relative error to be much larger. We would expect the relative error to be at least the square root of the previous error allowance. And then we have not taken into account the error due to sqr calculation.

Currently, there are a few tests with very large N because of this. That is not good. Instead we should have a function when comparing these kind of numbers.

comment:12 Changed 13 years ago by Peter

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.