Opened 7 years ago

Closed 4 months ago

#798 closed request (fixed)

output sorted bam buffer

Reported by: Peter Owned by: Peter
Priority: major Milestone: yat 0.19
Component: omic Version:
Keywords: Cc:

Description

A common pattern is to read an InBamFile, find pairs, process them, and send them to a new OutBamFile. Finding pairs is easily done with bam_pair_analyse, but there is no easy way to transform the bam pairs into a sorted stream of bam reads.

I suggest a buffer class that you can feed with bam pairs. The class sorts the reads on the fly and sends them to an OutBamFile or, more generally, copies them to an output iterator. We don't want to buffer the whole range of reads, so there should be a way to tell the buffer "I will not feed you with any reads smaller than POS", after which the buffer can flush those reads.
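A minimal sketch of such a buffer, assuming a generic element type and an output iterator as the sink. This is not yat's actual class: the name `SortedBufferSketch` and its `push`/`flush`/`flush_all` members are hypothetical. Pending elements live in a `std::multiset`, and `flush(threshold)` encodes the promise "I will not feed you with anything smaller than POS".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <optional>
#include <set>
#include <vector>

// Hypothetical sketch of a sorting buffer; names are illustrative and are
// not yat's BamPairBuffer/SortedBuffer interface.
template <typename T,
          typename OutputIterator = std::back_insert_iterator<std::vector<T>>,
          typename Compare = std::less<T>>
class SortedBufferSketch {
public:
  explicit SortedBufferSketch(OutputIterator out, Compare comp = Compare())
      : buffer_(comp), out_(out), comp_(comp) {}

  // Add an element in any order. The caller must not push anything smaller
  // than the most recent flush threshold (checked when asserts are enabled).
  void push(const T& x) {
    assert(!threshold_ || !comp_(x, *threshold_));
    buffer_.insert(x);
  }

  // Promise that no future element is smaller than `threshold`; every
  // buffered element smaller than it is written out in sorted order.
  void flush(const T& threshold) {
    auto end = buffer_.lower_bound(threshold);
    for (auto it = buffer_.begin(); it != end; ++it)
      *out_++ = *it;
    buffer_.erase(buffer_.begin(), end);
    threshold_ = threshold;
  }

  // End of input: write out everything still buffered.
  void flush_all() {
    for (const T& x : buffer_)
      *out_++ = x;
    buffer_.clear();
  }

  std::size_t size() const { return buffer_.size(); }

private:
  std::multiset<T, Compare> buffer_;
  OutputIterator out_;
  Compare comp_;
  std::optional<T> threshold_;  // last promise made via flush()
};
```

For bam reads, `T` would be a read type compared by (chromosome, position): both mates of each pair are pushed, and `flush` is called with the smallest position that can still arrive. Writing to an output iterator rather than directly to an OutBamFile keeps the buffer usable for any sink.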

Obviously there is a great danger here. Say, for example, that we have a pair with one read mapped to the first chr and one read mapped to the last chr; then the buffer will need to store all reads in between in order to create its sorted output. That could be far too many to fit in any reasonable amount of memory, in which case the user probably needs to cache on file before creating the final output. One could, e.g., have one cache for properly paired pairs, one for..., and in writing such an algorithm this suggested buffer could be a useful building block.
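The memory problem can be made concrete with a small offline simulation. It is a sketch under stated assumptions: each read is reduced to one genome-wide coordinate (chromosomes laid end to end), and a pair is fed to the buffer only once its second mate has been read from a coordinate-sorted input, so the flush threshold can be no larger than the first mate of any pair still waiting. `Pair`, `SimResult` and `simulate` are hypothetical names, not part of yat.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model: a pair is (first mate, second mate), first <= second,
// each mate given as a single genome-wide coordinate.
using Pair = std::pair<long, long>;

struct SimResult {
  std::size_t peak = 0;      // most reads held after any flush
  std::vector<long> output;  // reads in the order they were written
};

SimResult simulate(std::vector<Pair> pairs) {
  // Assumption: pairs become available in the order of their second mate.
  std::sort(pairs.begin(), pairs.end(),
            [](const Pair& a, const Pair& b) { return a.second < b.second; });

  // suffix_min[i] = smallest first-mate position among pairs i..end.
  const long inf = std::numeric_limits<long>::max();
  std::vector<long> suffix_min(pairs.size() + 1, inf);
  for (std::size_t i = pairs.size(); i-- > 0;)
    suffix_min[i] = std::min(suffix_min[i + 1], pairs[i].first);

  SimResult result;
  std::multiset<long> buffer;
  for (std::size_t i = 0; i < pairs.size(); ++i) {
    assert(pairs[i].first <= pairs[i].second);
    buffer.insert(pairs[i].first);
    buffer.insert(pairs[i].second);
    // Only reads up to the current input position have been seen, and a
    // pair still waiting for its second mate may hold an earlier first mate.
    long threshold = std::min(pairs[i].second, suffix_min[i + 1]);
    auto end = buffer.lower_bound(threshold);
    result.output.insert(result.output.end(), buffer.begin(), end);
    buffer.erase(buffer.begin(), end);
    result.peak = std::max(result.peak, buffer.size());
  }
  // End of input: everything left can be written.
  result.output.insert(result.output.end(), buffer.begin(), buffer.end());
  return result;
}
```

For pairs (10,20), (30,40), (50,60) the buffer never holds more than one read after a flush. Adding a single pair (1,1000) that spans them raises the peak to six, i.e. every read in between, which is the situation described above.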

Change History (2)

comment:1 Changed 4 months ago by Peter

Milestone: yat 0.x+ → yat 0.19
Status: new → accepted

comment:2 Changed 4 months ago by Peter

Resolution: fixed
Status: accepted → closed

In 4057:

closes #798; new classes BamPairBuffer and a more general SortedBuffer.
