#377 closed request (fixed)
class for nucleotides
Reported by: | Peter | Owned by: | Peter |
---|---|---|---|
Priority: | major | Milestone: | yat 0.7 |
Component: | omic | Version: | trunk |
Keywords: | Cc: |
Description
I need some utilities for dealing with sequences of nucleotides, both DNA and RNA. DNA and RNA should be supported in two different classes (possibly via inheritance, or by encapsulating the shared functionality in a member variable). Typically DNA is represented by A, C, G, and T, but I'd also like support for
N = any
R = A | G
W = A | T
Y = C | T
M = A | C
K = G | T
S = G | C
H = !G
B = !A
V = !T
D = !C
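One possible representation of these ambiguity codes, sketched here as an assumption rather than yat's actual design, is a 4-bit set with one bit per base (A=1, C=2, G=4, T=8). Each code is then simply the union of the bases it stands for. The names `encode` and `decode` are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>

// Hypothetical 4-bit encoding: one bit per unambiguous base.
constexpr std::uint8_t A = 1, C = 2, G = 4, T = 8;

// Map an IUPAC letter (either case) to the set of bases it may represent.
std::uint8_t encode(char c) {
  switch (std::toupper(static_cast<unsigned char>(c))) {
    case 'A': return A;
    case 'C': return C;
    case 'G': return G;
    case 'T': return T;
    case 'R': return A | G;
    case 'Y': return C | T;
    case 'W': return A | T;
    case 'S': return G | C;
    case 'M': return A | C;
    case 'K': return G | T;
    case 'H': return A | C | T;      // not G
    case 'B': return C | G | T;      // not A
    case 'V': return A | C | G;      // not T
    case 'D': return A | G | T;      // not C
    case 'N': return A | C | G | T;  // any
    default: throw std::invalid_argument("not a nucleotide code");
  }
}

// Inverse mapping: the table is indexed by the mask; index 0 (empty set) is unused.
char decode(std::uint8_t mask) {
  assert(mask >= 1 && mask <= 15);
  return "-ACMGRSVTWYHKDBN"[mask];
}
```

Storing sets rather than letters keeps the later operations cheap, since union, intersection and complement all become bit manipulations.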
Support for translating RNA->DNA and DNA->RNA.
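The translation can be sketched as a character substitution, assuming sequences are plain `std::string` values of IUPAC letters; `dna_to_rna` and `rna_to_dna` are hypothetical names. Only T (DNA) and U (RNA) differ between the two alphabets, and the ambiguity codes are shared, so everything else passes through unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

namespace {
// Replace T/t with U/u; all other letters are left as they are.
char t_to_u(char c) {
  if (c == 'T') return 'U';
  if (c == 't') return 'u';
  return c;
}

// Replace U/u with T/t; all other letters are left as they are.
char u_to_t(char c) {
  if (c == 'U') return 'T';
  if (c == 'u') return 't';
  return c;
}
}  // namespace

// Hypothetical helper: DNA -> RNA (takes a copy and converts it in place).
std::string dna_to_rna(std::string seq) {
  std::transform(seq.begin(), seq.end(), seq.begin(), t_to_u);
  return seq;
}

// Hypothetical helper: RNA -> DNA.
std::string rna_to_dna(std::string seq) {
  std::transform(seq.begin(), seq.end(), seq.begin(), u_to_t);
  return seq;
}
```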
Comparison operators
For equality, the most important thing is to compare in such a way that equal means could be equal, i.e. 'N'=='A' and 'S'=='G', but 'S'!='A'. When it comes to LessThanComparison, it is not obvious how to define it so that it is in line with this definition of equality and also fulfills Strict Weak Ordering.
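Under the assumed 4-bit set encoding (A=1, C=2, G=4, T=8), "could be equal" reduces to checking whether the two base sets overlap. A minimal sketch; `mask_of` and `could_be_equal` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <stdexcept>

// Hypothetical encoding: the position of an IUPAC letter in this string, plus
// one, equals its base set (A=1, C=2, G=4, T=8), e.g. 'M' -> 3 = A|C.
int mask_of(char c) {
  static const char codes[] = "ACMGRSVTWYHKDBN";
  const char up = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  const char* p = up ? std::strchr(codes, up) : nullptr;
  if (!p) throw std::invalid_argument("not a nucleotide code");
  return static_cast<int>(p - codes) + 1;
}

// "Could be equal": some base is compatible with both codes,
// i.e. the two base sets have a non-empty intersection.
bool could_be_equal(char lhs, char rhs) {
  return (mask_of(lhs) & mask_of(rhs)) != 0;
}
```

This relation is reflexive and symmetric but not transitive: 'A' and 'C' can each equal 'N' without equaling each other, which is what makes a consistent LessThanComparison hard to define.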
Sequence
For the sequence, i.e. an array of nucleotides, one could use a std::vector, a wrapper class around vector, or possibly string_stream<T> (??).
Iterators are essential (to enable use of the std algorithms). Both normal and reverse iterators should be supported, as well as complement and reverse_complement iterators. The latter pair transform according to Watson-Crick (WC) pairing, such that A becomes T and C becomes G, and vice versa. Complement iterators should probably only be provided as const iterators, since a mutable transform iterator is a bit tricky (when it appears on the LHS of an assignment).
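A minimal sketch of a read-only complement iterator, assuming sequences are stored as `std::string` of uppercase IUPAC letters; `complement`, `complement_iterator` and `make_complement_iterator` are hypothetical names, not yat's API. Wrapping a reverse iterator yields the reverse complement. Because dereferencing returns a value rather than a reference, the adaptor is declared as an input iterator, which also makes it naturally const. Boost.Iterator's `transform_iterator` is an existing library alternative to hand-rolling this.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Watson-Crick complement of one uppercase IUPAC letter. Ambiguity codes map
// to the code of the complemented set, e.g. R (A|G) -> Y (C|T).
char complement(char c) {
  switch (c) {
    case 'A': return 'T'; case 'T': return 'A';
    case 'C': return 'G'; case 'G': return 'C';
    case 'R': return 'Y'; case 'Y': return 'R';
    case 'K': return 'M'; case 'M': return 'K';
    case 'B': return 'V'; case 'V': return 'B';
    case 'D': return 'H'; case 'H': return 'D';
    default:  return c;  // S, W, N are self-complementary; others pass through
  }
}

// Hypothetical const adaptor: *it yields the complement of the underlying
// element by value, so it can never be the LHS of an assignment.
template <typename Iter>
class complement_iterator {
 public:
  using iterator_category = std::input_iterator_tag;
  using value_type = char;
  using difference_type = typename std::iterator_traits<Iter>::difference_type;
  using pointer = void;
  using reference = char;

  explicit complement_iterator(Iter it) : it_(it) {}

  char operator*() const { return complement(*it_); }
  complement_iterator& operator++() { ++it_; return *this; }
  complement_iterator operator++(int) {
    complement_iterator tmp = *this;
    ++it_;
    return tmp;
  }

  friend bool operator==(const complement_iterator& a, const complement_iterator& b) {
    return a.it_ == b.it_;
  }
  friend bool operator!=(const complement_iterator& a, const complement_iterator& b) {
    return !(a == b);
  }

 private:
  Iter it_;
};

template <typename Iter>
complement_iterator<Iter> make_complement_iterator(Iter it) {
  return complement_iterator<Iter>(it);
}
```

Usage: `std::string(make_complement_iterator(s.rbegin()), make_complement_iterator(s.rend()))` builds the reverse complement of `s`, and the same iterator pair can be passed to any std algorithm that accepts input iterators.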
Change History (11)
comment:1 Changed 15 years ago by
comment:3 follow-up: 4 Changed 13 years ago by
I have a DNA class declared as follows:
```cpp
#include <boost/operators.hpp>
#include <ostream>

// boost::operators derives e.g. !=, &, | and ^ from the operators below
class DNA : boost::operators<DNA>
{
public:
  DNA(void);
  DNA(char);
  DNA complement(void) const;
  char get(void) const;
  DNA& operator&=(const DNA& other);
  DNA& operator|=(const DNA& other);
  DNA& operator^=(const DNA& other);
};

bool operator==(const DNA& lhs, const DNA& rhs);
std::ostream& operator<<(std::ostream&, const DNA&);
```
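The declared interface could be implemented on top of a 4-bit set (A=1, C=2, G=4, T=8). The following is a Boost-free sketch, not the class that was committed; the meaning of &=, |= and ^= (intersection, union, symmetric difference), the default value N, and printing an empty set as '-' are all assumptions. With this bit order, the Watson-Crick complement is just a reversal of the four bits, which also handles ambiguity codes (R = A|G becomes Y = C|T).

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <ostream>
#include <stdexcept>

class DNA {
 public:
  DNA() : mask_(15) {}               // assumption: default value is N (any base)
  DNA(char c) : mask_(mask_of(c)) {}

  // Reverse the four bits: A(1)<->T(8), C(2)<->G(4).
  DNA complement() const {
    DNA result;
    result.mask_ = ((mask_ & 1u) << 3) | ((mask_ & 2u) << 1) |
                   ((mask_ & 4u) >> 1) | ((mask_ & 8u) >> 3);
    return result;
  }

  // IUPAC letter for the stored set; '-' for the empty set (assumption).
  char get() const {
    assert(mask_ < 16);
    return "-ACMGRSVTWYHKDBN"[mask_];
  }

  // Assumed semantics: intersection, union, symmetric difference of base sets.
  DNA& operator&=(const DNA& other) { mask_ &= other.mask_; return *this; }
  DNA& operator|=(const DNA& other) { mask_ |= other.mask_; return *this; }
  DNA& operator^=(const DNA& other) { mask_ ^= other.mask_; return *this; }

  // "Could be equal": the two base sets overlap.
  friend bool operator==(const DNA& lhs, const DNA& rhs) {
    return (lhs.mask_ & rhs.mask_) != 0;
  }
  friend bool operator!=(const DNA& lhs, const DNA& rhs) { return !(lhs == rhs); }

 private:
  // Position in this string plus one equals the base set, e.g. 'M' -> 3 = A|C.
  static unsigned mask_of(char c) {
    static const char codes[] = "ACMGRSVTWYHKDBN";
    const char up = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    const char* p = up ? std::strchr(codes, up) : nullptr;
    if (!p) throw std::invalid_argument("not a nucleotide code");
    return static_cast<unsigned>(p - codes) + 1;
  }

  unsigned mask_;
};

std::ostream& operator<<(std::ostream& os, const DNA& d) { return os << d.get(); }
```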
The behavior is not exactly the same as described above, but is there any interest in such a class?
comment:4 Changed 13 years ago by
Replying to peter:
The behavior is not exactly the same as described above, but is there any interest in such a class?
I will probably become interested after the summer, so please add it.
comment:5 follow-up: 6 Changed 13 years ago by
Milestone: | yat 0.x+ → yat 0.7 |
---|---|
Owner: | changed from Jari Häkkinen to Peter |
Status: | new → assigned |
Summary: | sequence of nucleotides → class for nucleotides |
Type: | discussion → request |
Which namespace? utility?
comment:6 Changed 13 years ago by
Replying to peter:
Which namespace? utility?
Sure. If we find a better namespace later, we can move it (prior to 1.0 ;-) )
comment:7 Changed 13 years ago by
I think I put this in a new namespace, omic, together with a small class for GenomicPosition.
comment:8 Changed 13 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:9 Changed 13 years ago by
Component: | utility → omic |
---|
It is not possible to define a LessThanComparison that obeys Strict Weak Ordering when those fuzzy nucleotides are included. The reason is most easily illustrated by N, which should be equivalent to any other nucleotide. However, Strict Weak Ordering requires that if x and y are equivalent, and x and z are equivalent, then y and z are equivalent.
Setting x to N, anything would become equivalent to anything. Since a and b are equivalent exactly when !(a < b) && !(b < a), operator< would then return false all the time, which is pretty meaningless.
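The argument can be checked mechanically. A minimal sketch, assuming the same hypothetical 4-bit encoding (A=1, C=2, G=4, T=8) and taking "could be equal" (overlapping base sets) as the candidate equivalence: the search below finds codes x, y, z with x~y and x~z but not y~z, so this relation cannot be the equivalence induced by any strict weak ordering. All names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical encoding: position in this string plus one is the base set,
// e.g. 'M' -> 3 = A|C and 'N' -> 15 = A|C|G|T.
const char kCodes[] = "ACMGRSVTWYHKDBN";

int mask_of(char c) {
  const char* p = std::strchr(kCodes, c);
  assert(c != '\0' && p != nullptr);
  return static_cast<int>(p - kCodes) + 1;
}

// The "could be equal" relation: the two base sets overlap.
bool overlap(char a, char b) { return (mask_of(a) & mask_of(b)) != 0; }

struct Counterexample { char x, y, z; bool found; };

// Look for x, y, z with x~y and x~z but not y~z. The equivalence induced by a
// strict weak ordering, !(a < b) && !(b < a), must be transitive, so any such
// triple rules out "overlap" as that equivalence.
Counterexample find_intransitive_triple() {
  const std::size_t n = std::strlen(kCodes);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k) {
        const char x = kCodes[i], y = kCodes[j], z = kCodes[k];
        if (overlap(x, y) && overlap(x, z) && !overlap(y, z)) return {x, y, z, true};
      }
  return {'\0', '\0', '\0', false};
}
```

Requiring exact equality of codes would restore transitivity, at the cost of 'N'=='A' no longer holding.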