Opened 8 years ago

Last modified 3 years ago

#774 new discussion

iterate bam file over set of regions

Reported by: Peter Owned by: Peter
Priority: major Milestone: yat 0.x+
Component: omic Version:
Keywords: Cc:

Description

Creating a bam read iterator is relatively expensive because it has to seek to the location of the first read; the bam index helps, but the seek still has a cost. When doing that repeatedly, say to iterate over all exons, it might be cheaper to just read through to the next region rather than create a new BamReadIterator and seek again. Which is cheaper depends on how many reads lie between the current region and the next. The idea here is to have an iterator that reads through the bam file when that is faster and seeks via the bam index when that is faster.
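A minimal sketch of how the seek-or-read-through decision could look, not an actual yat implementation. Region, Step, plan_steps and max_gap are hypothetical names introduced here for illustration. The sketch assumes regions on a single reference, sorted and non-overlapping, and it uses the gap in bases between consecutive regions as a crude stand-in for the number of reads in between, which is not known up front.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open genomic interval [begin, end) on a single reference sequence.
// Hypothetical type, for illustration only.
struct Region {
  std::int64_t begin;
  std::int64_t end;
};

// How to reach the start of a region.
enum class Step { Seek, ReadThrough };

// Decide, per region, whether to seek via the index or keep reading.
// Assumption: regions are sorted by begin and do not overlap.
// Assumption: the gap in bases approximates the cost of reading through;
// max_gap is a tuning parameter that would have to be found by benchmarking.
std::vector<Step> plan_steps(const std::vector<Region>& regions,
                             std::int64_t max_gap)
{
  std::vector<Step> steps;
  steps.reserve(regions.size());
  for (std::size_t i = 0; i < regions.size(); ++i) {
    // The first region always needs a seek; later ones only if the gap
    // to the previous region is larger than max_gap.
    if (i == 0 || regions[i].begin - regions[i - 1].end > max_gap)
      steps.push_back(Step::Seek);
    else
      steps.push_back(Step::ReadThrough);
  }
  return steps;
}
```

With regions [100, 200), [250, 300) and [100000, 100500) and a max_gap of 1000, this plan seeks to the first region, reads through to the second, and seeks again for the third. A real iterator would probably want a better cost estimate than the gap in bases, for example one derived from the index.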

Change History (1)

comment:1 Changed 3 years ago by Peter

From htslib v1.7 there is a multi-region iterator:

The new structure takes a list of regions and
iterates over all, deduplicating reads in the process, and producing a
full list of file offset intervals. This is usually much faster than
repeatedly using the old single-region iterator on a series of regions. 