Opened 11 years ago
Closed 10 years ago
#746 closed request (fixed)
non-const access to data in BamRead
Reported by: | Peter | Owned by: | Peter |
---|---|---|---|
Priority: | major | Milestone: | yat 0.11 |
Component: | omic | Version: | trunk |
Keywords: | Cc: |
Description
I want to modify my bam reads. Everything in core is modifiable via core(), but the contents of data are not accessible via the interface other than through some const functions. Specifically I want to modify quality, but we should think about allowing modification of any part of the void* data. For the other variables (int l_aux, data_len, m_data), I see no need to allow modifications, ATM.
Change History (15)
comment:1 Changed 11 years ago by
comment:3 Changed 10 years ago by
Status: | new → assigned |
---|
comment:4 Changed 10 years ago by
comment:5 Changed 10 years ago by
The uint8_t* data is concatenated from the following strings:
name
cigar
sequence
quality
aux
where we currently (in trunk) can modify the cigar, a sequence element (not the size), and a quality element (not the size). There is no way to change the name, modify the size of the sequence (or quality), or modify the aux [sequence]. There are several ways to allow these things. One would be to give full access to the data pointer, which is probably a bad idea, because I think the fact that these five strings are concatenated into one jumbo data array should be considered an implementation detail and hidden from the public interface. We should remember that since everything is concatenated into data, modifying name etc. implies copying the entire array into a new array (see the cigar function for an example). We could have a big function taking name, cigar, etc. which updates the data array accordingly, but I'm not sure how to interface that: should one take vector<uint8_t>, or uint8_t* with an int telling the length of the array? Difficult to say without having any real-world use cases.
I can see that I would like to modify aux, so I think we should have a function doing that, similar to the cigar function. I could also see that one would like to trim the sequence, but doing that for sure means trimming the quality as well, and changing the length of the sequence implies the cigar is invalid. Just thinking out loud here, but I think the conclusion is that I add an aux function and then wait until closer to release to see if we learn what would make sense. Feedback and suggestions are obviously most welcome here.
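To illustrate why changing the name implies copying the whole array, here is a minimal sketch. ToyRecord and set_name are hypothetical stand-ins invented for this example; they are not yat's BamRead or its interface, and the real layout carries more bookkeeping than a single length field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the concatenated data array; not yat's BamRead.
struct ToyRecord {
  std::vector<std::uint8_t> data;  // name | cigar | sequence | quality | aux
  std::size_t name_len = 0;        // length of the name segment, incl. '\0'
};

// Replace the name segment. Since the segments are stored back to back,
// everything after the name is copied into a newly built array.
inline void set_name(ToyRecord& rec, const std::string& name)
{
  assert(rec.name_len <= rec.data.size());
  std::vector<std::uint8_t> fresh;
  fresh.reserve(name.size() + 1 + rec.data.size() - rec.name_len);
  fresh.insert(fresh.end(), name.begin(), name.end());
  fresh.push_back('\0');
  // Copy cigar, sequence, quality and aux unchanged.
  fresh.insert(fresh.end(),
               rec.data.begin() + static_cast<std::ptrdiff_t>(rec.name_len),
               rec.data.end());
  rec.data.swap(fresh);
  rec.name_len = name.size() + 1;
}
```

The caller only passes the new name; the fact that the remaining segments are shifted stays hidden behind the function, which is the point made above about keeping the jumbo array an implementation detail.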
comment:6 Changed 10 years ago by
The Bam API has append and del functions, so it is probably better to wrap them than to provide a single set function.
comment:7 Changed 10 years ago by
comment:8 Changed 10 years ago by
comment:10 Changed 10 years ago by
The aux interface is almost complete. There is no way to access a specific tag and modify it in place; instead one has to access, remove, and re-append the modified data. A bit cumbersome, but at least doable. The other thought I had while implementing this was whether one should have type-specific get and append functions. For get I decided no, because it is easy to use the bam_aux2? functions to convert uint8_t* to the corresponding type. For the append case, I'm still considering adding append functions for the supported types. The argument for is that it is cumbersome to convert the data to uint8_t*, and it is easier to just pass a double or whatever type the data is. Also, it is redundant to calculate the length of the data and the argument char that describes which type the data is. The argument against is that int8_t and char are typically the same type on systems and can therefore not be overloaded, while in the SAM spec char is a printable character and int8_t is a one-byte integer. Also, it's a bit bloating to add a function for each type, but that could be shaped up by a templatized interface. It's easier to add later than to remove later, so I'm probably going for the "wait and see..." option.
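A rough sketch of what such a templatized append could look like, assuming a plain byte vector as a stand-in for the aux part of the data array. AuxType, aux_append and aux_append_char are hypothetical names made up for this example and are neither yat nor samtools functions. The type characters ('c', 'C', 's', 'S', 'i', 'I', 'f', 'A') are those of the SAM/BAM format; the value bytes are copied in host byte order, so the sketch assumes a little-endian machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Maps a C++ type to its one-character aux type code.
// Left undefined for unsupported types, so misuse fails at compile time.
template<typename T> struct AuxType;
template<> struct AuxType<std::int8_t>   { static char code() { return 'c'; } };
template<> struct AuxType<std::uint8_t>  { static char code() { return 'C'; } };
template<> struct AuxType<std::int16_t>  { static char code() { return 's'; } };
template<> struct AuxType<std::uint16_t> { static char code() { return 'S'; } };
template<> struct AuxType<std::int32_t>  { static char code() { return 'i'; } };
template<> struct AuxType<std::uint32_t> { static char code() { return 'I'; } };
template<> struct AuxType<float>         { static char code() { return 'f'; } };

// Append tag, type code and raw value; type code and length follow from T.
template<typename T>
void aux_append(std::vector<std::uint8_t>& aux, const char* tag, T value)
{
  assert(tag != nullptr);
  aux.push_back(static_cast<std::uint8_t>(tag[0]));
  aux.push_back(static_cast<std::uint8_t>(tag[1]));
  aux.push_back(static_cast<std::uint8_t>(AuxType<T>::code()));
  std::uint8_t raw[sizeof(T)];
  std::memcpy(raw, &value, sizeof(T));
  aux.insert(aux.end(), raw, raw + sizeof(T));
}

// Printable character ('A') gets its own name rather than an overload,
// which avoids relying on char and int8_t being distinct types.
inline void aux_append_char(std::vector<std::uint8_t>& aux, const char* tag,
                            char value)
{
  assert(tag != nullptr);
  aux.push_back(static_cast<std::uint8_t>(tag[0]));
  aux.push_back(static_cast<std::uint8_t>(tag[1]));
  aux.push_back(static_cast<std::uint8_t>('A'));
  aux.push_back(static_cast<std::uint8_t>(value));
}
```

With this shape, a call such as aux_append(aux, "NM", std::int32_t(3)) needs neither a length nor a type character from the caller, and the separately named function for printable characters is one possible way around the char versus int8_t concern raised above.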
comment:12 Changed 10 years ago by
comment:13 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:14 Changed 10 years ago by
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Since everything else can be modified, I think we should allow modifying the name as well.
comment:15 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
(In [2985]) refs #746. added function to modify one element in sequence
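For readers unfamiliar with the storage, modifying one sequence element does not change the size of the data array because BAM packs the sequence at 4 bits per base, two bases per byte, with the even-indexed base in the high nibble. The sketch below shows that technique on a plain byte vector; set_base and get_base are hypothetical names for this example and are not the function added in the changeset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 4-bit base codes of the BAM format: index in this string is the code.
static const char kBases[] = "=ACMGRSVTWYHKDBN";

// Translate a base character to its 4-bit code; unknown characters become N.
inline std::uint8_t encode_base(char base)
{
  for (std::uint8_t code = 0; code < 16; ++code)
    if (kBases[code] == base)
      return code;
  return 15;
}

// Overwrite base i in a packed sequence without touching its neighbour.
inline void set_base(std::vector<std::uint8_t>& seq, std::size_t i, char base)
{
  assert(i / 2 < seq.size());
  const std::uint8_t code = encode_base(base);
  std::uint8_t& byte = seq[i / 2];
  if (i % 2 == 0)  // even index: high nibble
    byte = static_cast<std::uint8_t>((byte & 0x0F) | (code << 4));
  else             // odd index: low nibble
    byte = static_cast<std::uint8_t>((byte & 0xF0) | code);
}

// Read base i back as a character.
inline char get_base(const std::vector<std::uint8_t>& seq, std::size_t i)
{
  assert(i / 2 < seq.size());
  const std::uint8_t byte = seq[i / 2];
  return kBases[i % 2 == 0 ? (byte >> 4) : (byte & 0x0F)];
}
```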