Opened 14 years ago
Closed 13 years ago
#223 closed defect (fixed)
test contain comparison of doubles with the == operator
Reported by: | Jari Häkkinen | Owned by: | Peter |
---|---|---|---|
Priority: | major | Milestone: | yat 0.4 |
Component: | test | Version: | trunk |
Keywords: | Cc: |
Description (last modified by )
This is poor programming style. Use fabs(a-b)<some_small_number
averager_test.cc and regression_test.cc are two test that needs to be fixed, more may exist.
All tests should be checked for double comparisons with operator == and changed accordingly. There may be cases when exact equality is required.
Change History (12)
comment:1 Changed 14 years ago by
Description: | modified (diff) |
---|
comment:2 follow-up: 3 Changed 13 years ago by
comment:3 follow-up: 5 Changed 13 years ago by
Replying to markus:
- We should also make sure that slack_bound always is scaled to correspond to a single double. I.e. if we compare the sum of two vectors we should divide the sums with the number of elements (N) or compare with N*slack_bound.
Oops, in the example I meant if we do sum(fabs(x_i-y_i)), this sum should be related to N*slack_bound.
comment:4 Changed 13 years ago by
This is a lot of functionality. I just wanna remind of ticket:224 (Test Suite Class) that could host all these.
comment:5 follow-up: 6 Changed 13 years ago by
Replying to markus:
Replying to markus:
- We should also make sure that slack_bound always is scaled to correspond to a single double. I.e. if we compare the sum of two vectors we should divide the sums with the number of elements (N) or compare with N*slack_bound.
Oops, in the example I meant if we do sum(fabs(x_i-y_i)), this sum should be related to N*slack_bound.
Many tests are not simple sums. For example what if we test correlation calculated from Averager and AveragerWeighted?. Or even more complicated if you test the prediction from a Classifier, then the actual calculation is way more complicated than just a simple sum. What should N be in this case?
Also remember that epsilon is the relative precision, i.e., the precision around unity. If you compare numbers around 1,000 the error is typically 1000 larger.
comment:6 Changed 13 years ago by
Replying to peter:
Replying to markus:
Replying to markus:
- We should also make sure that slack_bound always is scaled to correspond to a single double. I.e. if we compare the sum of two vectors we should divide the sums with the number of elements (N) or compare with N*slack_bound.
Oops, in the example I meant if we do sum(fabs(x_i-y_i)), this sum should be related to N*slack_bound.
Many tests are not simple sums. For example what if we test correlation calculated from Averager and AveragerWeighted?. Or even more complicated if you test the prediction from a Classifier, then the actual calculation is way more complicated than just a simple sum. What should N be in this case?
Also remember that epsilon is the relative precision, i.e., the precision around unity. If you compare numbers around 1,000 the error is typically 1000 larger.
With N, I meant the number of differences between doubles that are summed up. So what I meant was that if we compare two vectors, tests such as vector_error=sum(abs(x_i-y_i)) should be scaled so vector_error is compared with N*cutoff. In our tests there were examples of where abs(a-b) were compared to the same cutoff as vector_error. So as I said N should be used so that the cutoff always reflects the error of a single double. If we calculate a correlation and compare it with some expected value we have N=1.
My suggestion with epsilon was to scale it with some value, e.g. use 10*epsilon, as mentioned above. While this has drawbacks as you point out I think it is an improvement compared to having a mixture of different relatively arbitrary cutoffs that do not even scale with the precision of the system. Of course using a relative error may be better: abs((calculated-"true")/"true")<X*epsilon.
comment:8 Changed 13 years ago by
operator== is not necessarily bad. There is some functionality that should be equal, for example, median should just return percentile(50), so having a percentile(50)==median()
should never fail.
It is of course another story when we start to write tests of type x==sqrt(x*x)
, which is bad.
comment:10 Changed 13 years ago by
Owner: | changed from Jari Häkkinen to Peter |
---|---|
Status: | new → assigned |
comment:11 Changed 13 years ago by
There are two examples when scaling the error allowance with epsilon is bad
1 Static comparison such as equal( pi(), 3.14). For this comparison the precision on your system is not really relevant (as long as it can handle 3 digits). On the contrary if we would let the error allowance scale with system precision, the test will fail on a system with very very good precision.
2 comparison of sqr: For example if we compare variance it makes perhaps sense to scale the error allowance with precision epsilon. But if we compare the square root of variance, i.e. standard deviation, we would expect the relative error to be much larger. We would expect the relative error to be at least the square root of the previous error allowance. And then we have not taken into account the error due to sqr calculation.
Currently, there are a few tests with very large N because of this. That is not good. Instead we should have a function when comparing these kind of numbers.
comment:12 Changed 13 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Perhaps we should do something like:
slack_bound=std::numeric_limits<double>::epsilon()
. Perhapsslac_bound=10*epsilon()
? So we get tests that scale with the precision of systems.