Opened 15 years ago

Closed 13 years ago

Last modified 13 years ago

#377 closed request (fixed)

class for nucleotides

Reported by: Peter Owned by: Peter
Priority: major Milestone: yat 0.7
Component: omic Version: trunk
Keywords: Cc:

Description

I need some utilities for dealing with sequences of nucleotides, both DNA and RNA. DNA and RNA should be supported in two different classes (possibly related by inheritance, or by encapsulating the shared functionality in a member variable). DNA is typically represented by A, C, G, and T, but I'd also like support for the ambiguity codes:

N = any
R = A | G
W = A | T
Y = C | T
M = A | C
K = G | T
S = G | C
H = !G
B = !A
V = !T
D = !C
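One way to represent the codes above is as a 4-bit set with one bit per unambiguous base, so that `|` and `!` in the list become ordinary bit operations. This is only a sketch: the namespace `iupac` and the bit assignment are assumptions made for illustration, not something decided in this ticket.

```cpp
#include <cstdint>

// Hypothetical encoding: one bit per unambiguous base.
namespace iupac {
constexpr std::uint8_t A = 1, C = 2, G = 4, T = 8;
constexpr std::uint8_t N = A | C | G | T;  // any
constexpr std::uint8_t R = A | G;
constexpr std::uint8_t W = A | T;
constexpr std::uint8_t Y = C | T;
constexpr std::uint8_t M = A | C;
constexpr std::uint8_t K = G | T;
constexpr std::uint8_t S = G | C;
constexpr std::uint8_t H = N & ~G;  // not G
constexpr std::uint8_t B = N & ~A;  // not A
constexpr std::uint8_t V = N & ~T;  // not T
constexpr std::uint8_t D = N & ~C;  // not C

// "not G" leaves exactly the three other bases.
static_assert(H == (A | C | T), "H must be A, C, or T");
}
```

With this layout every non-empty 4-bit value corresponds to exactly one of the 15 codes.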

Support for translating RNA->DNA and DNA->RNA.
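As a rough illustration of the conversion, the two directions differ only in whether T or U is used. The sketch below works on plain `std::string` rather than on the two classes requested above, and the function names `dna_to_rna` and `rna_to_dna` are hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: DNA -> RNA replaces every T with U.
inline std::string dna_to_rna(std::string seq)
{
	std::replace(seq.begin(), seq.end(), 'T', 'U');
	return seq;
}

// Hypothetical helper: RNA -> DNA replaces every U with T.
inline std::string rna_to_dna(std::string seq)
{
	std::replace(seq.begin(), seq.end(), 'U', 'T');
	return seq;
}
```

All other letters are left untouched by both helpers.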

Comparison operators

For equality, the most important thing is to compare in such a way that equal means "could be equal", i.e. 'N'=='A' and 'S'=='G', but 'S'!='A'. For LessThanComparison it is not obvious how to define it so that it is consistent with this definition of equality and also fulfills Strict Weak Ordering.
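A minimal sketch of the "could be equal" comparison, assuming each code is mapped to the set of bases it may stand for (A=1, C=2, G=4, T=8) and two codes compare equal when those sets overlap. The names `mask` and `could_be_equal` are made up for this example.

```cpp
#include <string>

// Bit set of the bases a code may stand for. The letters are ordered so
// that the position of each letter in the table is its bit mask.
// Unknown characters map to the empty set.
inline unsigned mask(char code)
{
	static const std::string table = "-ACMGRSVTWYHKDBN";
	const std::string::size_type i = table.find(code);
	return i == std::string::npos ? 0u : static_cast<unsigned>(i);
}

// Two codes "could be equal" if they share at least one base.
inline bool could_be_equal(char lhs, char rhs)
{
	return (mask(lhs) & mask(rhs)) != 0;
}
```

With this definition `could_be_equal('N', 'A')` and `could_be_equal('S', 'G')` hold, while `could_be_equal('S', 'A')` does not.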

Sequence

When it comes to the sequence i.e. an array of nucleotides either a std::vector could be used, or a wrapper class around vector, or possibly string_stream<T> (??).

Iterators are essential (to enable the standard algorithms). Both normal and reverse iterators, as well as complement and reverse_complement iterators, should be supported. The latter pair transform according to WC-pairing such that A becomes T and C becomes G, and vice versa. Complement iterators should probably only be supported as const iterators, since a mutable transform iterator is a bit tricky (when it appears on the LHS of an assignment).
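The read-only reverse-complement traversal can be approximated with standard components alone: a per-letter `complement` function applied through `std::transform` over reverse iterators. This is a sketch on `std::string`, not the proposed iterator classes, and the function names are assumptions. A real complement iterator could wrap the same function in something like `boost::transform_iterator`.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Watson-Crick complement of a single code. Ambiguity codes map to the
// code of the complemented set, e.g. R (A|G) becomes Y (T|C).
// Characters not in the table are returned unchanged.
inline char complement(char c)
{
	static const std::string from = "ACGTRYMKBVDHSWN";
	static const std::string to   = "TGCAYRKMVBHDSWN";
	const std::string::size_type i = from.find(c);
	return i == std::string::npos ? c : to[i];
}

// Read the sequence back to front, complementing each letter on the way.
inline std::string reverse_complement(const std::string& seq)
{
	std::string out;
	out.reserve(seq.size());
	std::transform(seq.rbegin(), seq.rend(), std::back_inserter(out),
	               complement);
	return out;
}
```

Because the input is only read, the question of what assignment through a complement iterator should mean does not arise here.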

Change History (11)

comment:1 Changed 15 years ago by Peter

It is not possible to define a LessThanComparison obeying Strict Weak Ordering with those fuzzy nucleotides. The reason is most easily illustrated by N, which should be equivalent to any other nucleotide. However, Strict Weak Ordering requires that

If x and y are equivalent as well as x and z, then it follows that y and z are equivalent.

Replacing x with N, anything would obviously become equivalent to anything else; in other words, operator< would have to return false all the time, which is pretty meaningless.
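The argument can be checked by brute force. The sketch below assumes the codes are encoded as 4-bit sets (A=1, C=2, G=4, T=8, so N=15) and that "equivalent" means the sets overlap; both the encoding and the function names are assumptions for illustration.

```cpp
// Codes are 4-bit sets (A=1, C=2, G=4, T=8), so 1..15 covers every code.
constexpr bool could_be_equal(unsigned x, unsigned y)
{
	return (x & y) != 0;
}

// Count triples where x~y and x~z hold but y~z does not, i.e. cases in
// which the equivalence fails to be transitive.
inline int count_transitivity_violations()
{
	int n = 0;
	for (unsigned x = 1; x <= 15; ++x)
		for (unsigned y = 1; y <= 15; ++y)
			for (unsigned z = 1; z <= 15; ++z)
				if (could_be_equal(x, y) && could_be_equal(x, z) &&
				    !could_be_equal(y, z))
					++n;
	return n;
}
```

The triple x=N, y=A, z=C is one such violation: N overlaps both A and C, but A and C do not overlap each other.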

comment:2 Changed 14 years ago by Jari Häkkinen

How about sequence of amino acids?

comment:3 Changed 13 years ago by Peter

I have a DNA class declared as follows:

class DNA : boost::operators<DNA>
{
public:
	DNA(void);
	DNA(char);
	DNA complement(void) const;
	char get(void) const;
	DNA& operator&=(const DNA& other);
	DNA& operator|=(const DNA& other);
	DNA& operator^=(const DNA& other);
};  
bool operator==(const DNA& lhs, const DNA& rhs);
std::ostream& operator<<(std::ostream&, const DNA&);

The behavior is not exactly the same as described above, but is there any interest in such a class?

comment:4 in reply to:  3 Changed 13 years ago by Jari Häkkinen

Replying to peter:

The behavior is not exactly the same as described above, but is there any interest in such a class?

I will probably become interested after the summer so please add it.

comment:5 Changed 13 years ago by Peter

Milestone: changed from yat 0.x+ to yat 0.7
Owner: changed from Jari Häkkinen to Peter
Status: changed from new to assigned
Summary: changed from sequence of nucleotides to class for nucleotides
Type: changed from discussion to request

which namespace? utility?

comment:6 in reply to:  5 Changed 13 years ago by Jari Häkkinen

Replying to peter:

which namespace? utility?

Sure. If we find a better namespace later we can move it (prior to 1.0 ;-) )

comment:7 Changed 13 years ago by Peter

I think I'll put this in a new namespace omic, together with a small class for GenomicPosition.

comment:8 Changed 13 years ago by Peter

Resolution: fixed
Status: changed from assigned to closed

(In [2341]) New namespace omic and new class omic::DNA. closes #377

comment:9 Changed 13 years ago by Peter

Component: changed from utility to omic

comment:10 Changed 13 years ago by Peter

(In [2342]) adding missing files. refs #377

comment:11 Changed 13 years ago by Peter

(In [2343]) adding missing file. refs #377
