Opened 8 years ago

Closed 8 years ago

#784 closed request (fixed)

BamPairIterator

Reported by: Peter
Owned by: Peter
Priority: major
Milestone: yat 0.12
Component: omic
Version:
Keywords:
Cc:

Description

I need something similar to bam_pair_analyse, but rather than processing the entire range in one go, I'd like to control when to iterate forward. In other words, an iterator with value type std::pair<BamRead, BamRead> that works on a sorted range of BamRead, very much like bam_pair_analyse, but where we control when to iterate and when to access the data. Basically, split the functionality of bam_pair_analyse into an iterating part and an analysing part. Once this iterator exists, bam_pair_analyse should be reimplemented using it, which will be trivial.

My use case is an app that works on two bam files. I want to work on the read pairs in these files in such a way that the files stay almost in sync, i.e., the current position is almost equal in the two files. To accomplish that, I need to control which file to iterate.
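To make the request concrete, here is a minimal sketch of such an iterator and of the two-file use case. It is not the yat implementation: Read is a hypothetical stand-in for BamRead, mates are matched by name only, and PairIterator, done() and walk_in_sync are names invented for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for BamRead: only what pairing needs.
struct Read {
  std::string name;  // query name shared by the two mates
  int pos;           // leftmost mapped position
};

// Yields one std::pair<Read, Read> per completed pair from a sorted range.
class PairIterator {
public:
  typedef std::pair<Read, Read> value_type;

  explicit PairIterator(const std::vector<Read>& reads)
    : cur_(reads.begin()), end_(reads.end()), valid_(false) { advance(); }

  bool done() const { return !valid_; }
  const value_type& operator*() const { assert(valid_); return x_; }
  const value_type* operator->() const { assert(valid_); return &x_; }
  PairIterator& operator++() { advance(); return *this; }

private:
  // Consume reads until a read whose mate was seen earlier turns up.
  void advance() {
    valid_ = false;
    while (cur_ != end_ && !valid_) {
      auto mate = pending_.find(cur_->name);
      if (mate == pending_.end()) {
        pending_[cur_->name] = *cur_;  // first mate: remember it
      } else {
        x_ = value_type(mate->second, *cur_);  // second mate: emit the pair
        pending_.erase(mate);
        valid_ = true;
      }
      ++cur_;
    }
  }

  std::vector<Read>::const_iterator cur_, end_;
  std::map<std::string, Read> pending_;  // first mates waiting for a partner
  value_type x_;                         // the pair currently pointed to
  bool valid_;
};

// Step whichever file is behind, so both stay at nearly the same position.
// Returns the read names in the order the pairs were visited.
std::vector<std::string> walk_in_sync(PairIterator a, PairIterator b) {
  std::vector<std::string> order;
  while (!a.done() || !b.done()) {
    bool step_a = b.done() || (!a.done() && a->second.pos <= b->second.pos);
    PairIterator& it = step_a ? a : b;
    order.push_back(it->first.name);
    ++it;
  }
  return order;
}
```

The caller decides which iterator to increment, which is exactly the control bam_pair_analyse does not offer. The vectors passed to PairIterator must outlive the iterators in this sketch.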

Change History (7)

comment:1 Changed 8 years ago by Peter

Status: new → assigned

comment:2 Changed 8 years ago by Peter

(In [3173]) A first version of BamPairIterator. refs #784

comment:3 Changed 8 years ago by Peter

(In [3174]) Speed up copy and assignment by keeping pointers to the maps rather than hard copies. Incrementing an input iterator is expected to [possibly] modify the underlying resource. Take istream_iterator as an example: incrementing it obviously modifies the istream it points to and thus affects other iterators that point to the same istream. The concept description expresses this by guaranteeing only that input iterators are single-pass. Given that users do not expect input iterators to be independent, we can use that to speed up the copy constructor by copying only a [smart] pointer rather than the whole map, which might get quite heavy if there are a lot of pairs mapped far away from each other.
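The idea can be sketched as follows. This is an illustration only, not the code from [3174]: CheapCopyIterator and its members are hypothetical, the map stores a plain position instead of a BamRead, and std::shared_ptr is assumed as the smart pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical iterator skeleton; only the bookkeeping of unmatched mates
// is shown.
class CheapCopyIterator {
public:
  typedef std::map<std::string, int> PendingMap;  // read name -> position

  CheapCopyIterator() : pending_(std::make_shared<PendingMap>()) {}

  // Record a read whose mate has not been seen yet.
  void remember(const std::string& name, int pos) {
    assert(pending_);
    (*pending_)[name] = pos;
  }

  std::size_t pending_size() const { return pending_->size(); }

  // True if both iterators refer to the same underlying map.
  bool shares_state_with(const CheapCopyIterator& other) const {
    return pending_ == other.pending_;
  }

private:
  // The compiler-generated copy constructor and assignment copy this
  // pointer, not the map, so copies are cheap but not independent.
  std::shared_ptr<PendingMap> pending_;
};
```

As with istream_iterator, a change made through one copy is visible through the others, which is acceptable for a single-pass input iterator.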

comment:4 Changed 8 years ago by Peter

The member variable pair<BamRead, BamRead> x_ is annoying, as it introduces a copy of two BamRead for every increment(). However, we want to return something similar to pair<BamRead, BamRead>, as that is what I expect as a user. It might be possible to have a proxy class returned as reference_type, especially since operator* is typically not used explicitly; the user calls iterator->first and iterator->second. One problem is that std::pair exposes first and second as public data members (not via public functions); otherwise one could have had a proxy class with a function first() that returns a BamRead. Perhaps operator* could return our own new class BamPairRead that only holds pointers to the relevant info and has functions first(void) and second(void) that return what you expect.

Input is welcome.

comment:5 Changed 8 years ago by Peter

My thoughts on this are messy, which is well reflected in the comment above. I'll try to itemize the objections in a more structured way:

1) value_type should be a class that is stand-alone and independent of the iterator, i.e., it should hold its own copies of the BamReads. This is currently fulfilled by std::pair<BamRead, BamRead>.

2) Avoid keeping a value_type privately in the iterator, as it enforces copying two BamReads on every increment.

3) reference_type has to be convertible to value_type, so that we allow value_type val = *iterator as well as func(*iterator), where func takes a value_type or const value_type.

4) reference_type must have the same interface as value_type, or at least almost. If value_type is std::pair<BamRead, BamRead>, for example, we expect to be able to write BamRead b = iterator->first or BamRead b = iterator->second, or similarly, given a function func(const BamRead&), to write func(iterator->first). If value_type is std::pair, this forces reference_type to have public members first and second that are convertible to BamRead. This is obviously fulfilled if reference_type is const std::pair<BamRead, BamRead>&.
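One way to satisfy points 1) to 4) at once might look like the sketch below. It is not the BamPairProxy from yat: Read stands in for BamRead, and ReadRef, PairProxy and position are hypothetical names chosen for this illustration. The proxy holds only pointers, exposes public members first and second that convert to Read, and converts as a whole to the stand-alone value_type.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for BamRead.
struct Read {
  std::string name;
  int pos;
};

// Refers to a Read stored elsewhere; converts to Read without owning a copy.
class ReadRef {
public:
  explicit ReadRef(const Read* r) : ptr_(r) { assert(r); }
  operator const Read&() const { return *ptr_; }
private:
  const Read* ptr_;
};

// Plays the role of reference_type: same member names as std::pair, but
// holds two pointers instead of two Read copies.
struct PairProxy {
  ReadRef first;
  ReadRef second;

  PairProxy(const Read* a, const Read* b) : first(a), second(b) {}

  // Point 3: convertible to the stand-alone value_type (copies both reads).
  operator std::pair<Read, Read>() const {
    const Read& a = first;
    const Read& b = second;
    return std::pair<Read, Read>(a, b);
  }
};

// Example of a function taking const Read&, as in point 4.
int position(const Read& r) { return r.pos; }
```

With this, Read b = proxy.first, position(proxy.second) and std::pair<Read, Read> val = proxy all compile, and reads are copied only when the caller asks for a copy. The proxy must not outlive the reads it points to.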

comment:6 Changed 8 years ago by Peter

(In [3175]) New class BamPair. refs #784

comment:7 Changed 8 years ago by Peter

Resolution: fixed
Status: assigned → closed

(In [3176]) closes #784. New class BamPairProxy to finalize implementation of BamPairIterator.
