Opened 16 years ago

Closed 16 years ago

#245 closed defect (fixed)

Distance should take Iterators

Reported by: Peter Owned by: Peter
Priority: major Milestone: yat 0.4
Component: statistics Version: trunk
Keywords: Cc:

Description (last modified by Markus Ringnér)

needs ticket:244

Well, in fact it should not take Iterators, but be templatized such that Iterators are accepted. But also std::vector<double>::const_iterator should be accepted.

In [889] this was implemented for iterators to unweighted containers. Extending the solution to iterators to weighted containers depends on ticket:247 and ticket:246.

In [890] this was implemented for iterators to weighted containers.

Change History (8)

comment:1 Changed 16 years ago by Peter

There are two ways to do this

  1. pure virtual template function that is implmented in inherited classes
  1. Using a two step rocket, in which we first have a free template function that fill upp an AveragerPair? that is passed to a virtual function.

I admit neither of these solutions are perfect, so if someone has a better solution, spread the word. The bad side of 2) is that it forces a Distance to be calculated grom an AveragerPair?. That is perhaps not such a big limitation, but it makes, e.g., rank-based distances such as Spearman correlation impossible. The bad side about 1) is that we have a virtual templatized function, which seems a bit fishy to me and I've read somewhere that "Avoid having..."

comment:2 Changed 16 years ago by Markus Ringnér

Well I guess then there is only one way to this becuase the ANSI/ISO Standard says (14.5.2 p 3): "A member function template shall not be virtual." So alternative 1. will not compile.

But I also do not like being limited to AveragerPair? based distances. So I will try to figure out a third alternative for discussion.

comment:3 Changed 16 years ago by Peter

Ok, I have looked at the code example (provided off the record), and it looks good.

Still to decide though is where the actual calculation of the distance will be performed. If we trash the Distance inheritance structure, how will it be done? Switch case blocks on a distance_tag?

comment:4 Changed 16 years ago by Markus Ringnér

No the traits pattern avoids needing such things as switch case blocks (I guess you can think of it as if the switch/case is done at compile and not run-time). Here are the key components, briefly. I ignore any additional traits for the iterator types to separate for example weighted and unweighted implementations.

A function that needs to calculate a distance takes an object of type distance_tag (all names are tentative; I will try to check in a version tomorrow).

So for example

void f(const distance_tag& disttype) {

Do some things including calculating the following distance: double dist=vector_distance(begin1,begin1,end2,disttype);

}

The general distance function vector_distance only finds out what trait disttype has and calls the corresponding specialized vector_distance accordingly:

template <typename Iter, typename Dist> double vector_distance(Iter beg1,Iter end1, Iter beg2,const Dist& disttype) {

return vector_distance(beg1,end1,beg2,

typename distance_traits<Dist>::distance());

the word typename before distance_traits is needed by the compiler to have a chance to parse properly

}

Based on which object distance_traits<Dist>::distance() returns, different vector_distance functions are called, and each of them implements a specific algorithm.

Since all traits in this case only are typedefs the only overhead is an additional call to the general vector_distance function (which perhaps is optimized away sometimes?). "Everyone" should call the general vector_distance. Either way this is comparable to calling a virtual function in the solution "virtual template member function" that would not work anyway.

For classes such as NCC they would take nad keep as a private member a const distance_tag just as the function f.

Finally for completeness here is what distance_traits would look like together with two different distance tags:

template <class T> struct distance_traits {

typedef typename T::distance distance;

};

struct distance_type {

typedef distance_type distance;

}; struct euclidean_type

: public distance_type { typedef euclidean_type distance;

}; struct pearson_type

: public distance_type { typedef pearson_type distance;

};

And all we would have to do to call f in a main program with pearson distance would be to do:

f(pearson_type());

Now f will call the general vector_distance and the general vector_distance will call:

template <typename Iter> double vector_distance(Iter beg1,Iter beg2,Iter end, const pearson_type& disttype) {

Calculate and return pearson distance using beg1, beg2 and beg3

}

because distance_traits<Dist>::distance() will in this case inside f return an object of type pearson_type.

comment:5 in reply to:  4 Changed 16 years ago by Markus Ringnér

Replying to markus:

Based on which object distance_traits<Dist>::distance() returns, different vector_distance functions are called, and each of them implements a specific algorithm.

It should say creates instead of returns above.

comment:6 in reply to:  4 Changed 16 years ago by Markus Ringnér

distance_tag should be replaced with distance_type everywhere:

Here ...

A function that needs to calculate a distance takes an object of type distance_tag (all names are tentative; I will try to check in a version tomorrow).

So for example

void f(const distance_tag& disttype)

... and here ...

For classes such as NCC they would take nad keep as a private member a const distance_tag just as the function f.

... and here

Finally for completeness here is what distance_traits would look like together with two different distance tags:

comment:7 Changed 16 years ago by Markus Ringnér

Description: modified (diff)

comment:8 Changed 16 years ago by Markus Ringnér

Description: modified (diff)
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.