Opened 12 years ago

Closed 9 years ago

#568 closed request (fixed)

random::RNG is not multi-thread safe

Reported by: Peter Owned by: Peter
Priority: major Milestone: yat 0.9
Component: random Version: trunk
Keywords: Cc:

Description (last modified by Jari Häkkinen)

The problem is

if (!instance_)
  instance_=new RNG;
return instance_;

where multiple threads in theory can reach inside the if block simultaneously.

A multi-thread safe singleton could be implemented as

if (!instance_) {
  LOCK;
  if (instance_)
    instance_=new RNG;
  UNLOCK;
}
return instance_;

where LOCK and UNLOCK comes from e.g. boost::thread and does the obvious thing. The purpose of the outer if(!instance_) is to avoid extra calls to LOCK and UNLOCK that likely are expensive compared to an if statement.

Getting this in place, however, would be rather tedious because it involves inclusion of a thread library such as the one mentioned above from boost or the one likely coming in future std. How to do that and do it in an optional way (that is the builder can choose to ignore all this) is not obvious to me.

ticket:714 related (compile problems on Mac OS X Lion)

Change History (26)

comment:1 Changed 10 years ago by Peter

IIUC the next std will contain some sort of thread support so perhaps implement it with that.

comment:2 Changed 10 years ago by Jari Häkkinen

Are the underlying GSL functions thread safe when a new random number is generated? If not, then making RNG thread safe is only a first step for thread safe random number generation in yat. If yes, then solving this ticket is the last step for thread safe random numbers in yat. However, thread safe random numbers probably costs a lot for heavy users of random numbers. Maybe we should allow for several RNG objects in yat? Currently RNG is a singleton.

comment:3 in reply to:  2 ; Changed 10 years ago by Peter

Replying to jari:

Are the underlying GSL functions thread safe when a new random number is generated? If not, then making RNG thread safe is only a first step for thread safe random number generation in yat. If yes, then solving this ticket is the last step for thread safe random numbers in yat.

I'm not following here. I thought the guard showed in the description would be sufficient.

However, thread safe random numbers probably costs a lot for heavy users of random numbers. Maybe we should allow for several RNG objects in yat? Currently RNG is a singleton.

I cannot see how this would be expensive. The code change only affects the constructor of RNG and as RNG is a singleton this the code is only reached once per execution. Well, potentially the code inside

if (!instance_) {
...
}

can be reached once per thread but only if it happens simultaneously. Then after that when the RNG has been constructed, the code will look exactly as before i.e.

if (!instance_) -> false, so return instance_

comment:4 in reply to:  3 Changed 10 years ago by Peter

Replying to peter:

Replying to jari:

Are the underlying GSL functions thread safe when a new random number is generated? If not, then making RNG thread safe is only a first step for thread safe random number generation in yat. If yes, then solving this ticket is the last step for thread safe random numbers in yat.

I'm not following here. I thought the guard showed in the description would be sufficient.

After reading GSL documentation

http://www.gnu.org/software/gsl/manual/gsl-ref.html#Random-Number-Generation

I would say GSL is not thread-safe and I also start to understand what you are talking about. And yes that makes this ticket rather useless without a deeper and broader discussion on RNG and multi-threads.

comment:5 Changed 10 years ago by Peter

Haven't had a detailed look yet but this paper looks interesting: http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/prng.pdf

comment:6 Changed 9 years ago by Peter

Boost provides a thread_specific_ptr that IUUC when used as a global variable will have its own value for each thread. Perhaps could be useful when redesigning/creating a new RNG class for multi-threading.

comment:7 Changed 9 years ago by Peter

Milestone: yat 0.x+yat 0.9
Type: discussionrequest

I've tested the boost::thread_specific_ptr and we could simply replace

static RNG* instance_;

with

static boost::thread_specific_ptr<RNG> instance_;

Doing that will change the behavior and each thread will get their own RNG. And, in fact, if they're not reseeded they will generate the same series of random number. This is perhaps anted sometimes and not sometimes, but I think we leave to the user to reseed. Using boost::thread_specific_ptr will add a link/runtime dependency to libboost_thread-mt. This should be checked for in configure, provided in yat-config, and linked into libyat.la.

Another design would be to have a true singleton so there is only one long series of random numbers. But that implies making that RNG thread safe, i.e., locking it every time a number is generated and that seems very expensive for applications that use RNG heavily, and more importantly it doesn't scale but the more threads an application uses the more crowded it will get outside the RNG office.

Should perhaps mention that doing this research I've bumped into some very strong opinions essentially saying that Singletons are worse than global variables while other say that it very much depends on the specific case and that the pattern could be justified in certain cases.

Also, I should say that there was a submission of a Singleton Library to Boost, which was rejected after a +4/-4 vote, and the author wasn't willing to follow up on the feedback.

Any thoughts?

comment:8 in reply to:  7 ; Changed 9 years ago by Jari Häkkinen

Replying to peter:

Doing that will change the behavior and each thread will get their own RNG. And, in fact, if they're not reseeded they will generate the same series of random number. This is perhaps anted sometimes and not sometimes, but I think we leave to the user to reseed.

How do you imagine the interface for reseeding? What should the default behaviour be? I'd say random seeds drawn from a previous RNG ... because I think having the same default for all RNGs created will probably cause headache for users. Also, normally you do not want to reseed you RNG once you started to use it, so how does a user know that a reseed is needed. How is a new RNG created when it is needed (for a new thread)? By cloning or a completely new RNG. I like cloning because then the seed automatically becomes "random". I try to argue for that a RNG object should know when a new RNG is created for another thread.

Another design would be to have a true singleton so there is only one long series of random numbers. But that implies making that RNG thread safe, i.e., locking it every time a number is generated and that seems very expensive for applications that use RNG heavily, and more importantly it doesn't scale but the more threads an application uses the more crowded it will get outside the RNG office.

We should not take this patth. The CPU cost of having a single RNG for all threads is the reason why we haven't spent time on resolving this issue. How important is it with a single RNG for all threads? We could thread safety as you suggest and open another ticket for a inherent thread safe RNG ... or forget about it until we actually need a single multithread RNG

Should perhaps mention that doing this research I've bumped into some very strong opinions essentially saying that Singletons are worse than global variables while other say that it very much depends on the specific case and that the pattern could be justified in certain cases.

I feel that random numbers is a use case for singletons.

Any thoughts?

I think the solution you suggest is the way to go.

comment:9 in reply to:  8 ; Changed 9 years ago by Peter

Status: newassigned

Replying to jari:

Replying to peter:

Doing that will change the behavior and each thread will get their own RNG. And, in fact, if they're not reseeded they will generate the same series of random number. This is perhaps anted sometimes and not sometimes, but I think we leave to the user to reseed.

How do you imagine the interface for reseeding?

It will be as before via functions seed(), seed_from_dev_urandom(), and set_state().

What should the default behaviour be? I'd say random seeds drawn from a previous RNG ... because I think having the same default for all RNGs created will probably cause headache for users. Also, normally you do not want to reseed you RNG once you started to use it,

OK, all make sense to me. We should try to get first RNG behave as before and secondaries to behave differently, i.e., with a different seed. How deterministic do we wanna be? Using a primary rng to seed secondaries, might cause randomness as it depends on speed of the different threads.

so how does a user know that a reseed is needed. How is a new RNG created when it is needed (for a new thread)? By cloning or a completely new RNG. I like cloning because then the seed automatically becomes "random". I try to argue for that a RNG object should know when a new RNG is created for another thread.

The creation is done in function RNG::instance just as before. What do you mean by cloning? Using the copy ctor which is not implemented in current code??? In the straightforward version there is no access to other threads RNGs. All that is hidden in boost::thread_specific_ptr. It's probably more productive if I check in a first version that we can use as starting point (in our discussion).

comment:10 in reply to:  9 Changed 9 years ago by Jari Häkkinen

Replying to peter:

The point with my comments concerns the state of new RNGs with the future thread safe RNG. With a single monolithic RNG it is straightforward to seed it at program initialization. If you write a threaded program, you of course know that you create new threads and can during thread initialization seed new RNGs as you wish. We could require the above strategy (and maybe this is how it is done in real world programming) but I'd prefer to safeguard against naive use of RNGs.

How deterministic do we wanna be? Using a primary rng to seed secondaries, might cause randomness as it depends on speed of the different threads.

Hm, is this resolved by others? I have too little experience with threaded programs to attack this issue.

The creation is done in function RNG::instance just as before. What do you mean by cloning? Using the copy ctor which is not implemented in current code??? In the straightforward version there is no access to other threads RNGs. All that is hidden in boost::thread_specific_ptr. It's probably more productive if I check in a first version that we can use as starting point (in our discussion).

Sure, currently RNG::instance creates an object if there is none but it does not do anything else. My understanding is that the instance function does not have to be changed with the boost thread safe pointer. My concern is relates to what state the new RNG should be in when delivered to the callee, today all of the new RNG will be reset to default state ... in principle but since only one RNG is created (today) but with thread safe RNG there will be several.

Anyhow, let us continue this discussion when you start committing.

comment:11 Changed 9 years ago by Peter

(In [2767]) look for boost thread library. refs #568

comment:12 Changed 9 years ago by Peter

(In [2768]) First version of redesigned RNG that provides one instance per thread rather than one global instance. refs #568

comment:13 Changed 9 years ago by Peter

There is a problem if user creates an RNG instance in one thread and passes it to another thread, or passes it to several thread, then we are back to the same problem that several threads are accessing the same state. This perhaps seems unlikely but consider that you have a visitor pattern and you create different visitors for your different threads, further imagine that your visitor has a member random::Discrete*, which in turn holds an RNG*, and voila you are in effect passing around an RNG instance. Seems like the root of this problem is that RNG::instance returns something that is thread specific (that can be passed across threads). Perhaps better if RNG remains a global singleton, and holds thread_specific_ptr<gsl_rng> rather than gsl_rng.

comment:14 Changed 9 years ago by Peter

(In [2769]) refs #568. test using multithreaded RNG

comment:15 Changed 9 years ago by Peter

(In [2770]) change back to RNG being a singleton, which now stores gsl_rng* in a thread_specific_ptr<gsl_rng>. refs #568

comment:16 Changed 9 years ago by Peter

I suppose with current design we could just add an int counting how many gsl_rng we allocated and use that count to seed the gsl_rng. How sensitive is the seeding; is it OK with a simple

++count_;
if (count_>1)
  seed(count_)

in RNG::rng(void)? where the if-statement is because we dont wanna seed the first gsl_rng.

comment:17 Changed 9 years ago by Jari Häkkinen

Description: modified (diff)

comment:18 in reply to:  16 ; Changed 9 years ago by Jari Häkkinen

Replying to peter:

How sensitive is the seeding; is it OK with a simple

The question is how much we want to help the caller. With the first RNG we do not set any seed, we simply use the default value. What you suggest is maybe the minimum amount of help for the caller since it avoids using the same seed for all RNGs. And possibly allows for deterministic random numbers. More random number savvy users will seed their RNGs at new thread initialization. In short, let us start with tour simplistic seeding scheme.

comment:19 Changed 9 years ago by Peter

(In [2789]) adding a comment. refs #568

comment:20 in reply to:  18 ; Changed 9 years ago by Peter

Replying to jari:

Replying to peter:

How sensitive is the seeding; is it OK with a simple

The question is how much we want to help the caller. With the first RNG we do not set any seed, we simply use the default value. What you suggest is maybe the minimum amount of help for the caller since it avoids using the same seed for all RNGs. And possibly allows for deterministic random numbers. More random number savvy users will seed their RNGs at new thread initialization. In short, let us start with tour simplistic seeding scheme.

Looking at the code, I realized my suggestion is not very good. A typical use case must be that you set the seed in the main thread and use the generate random number in daughter threads. With my suggestion you would get the same series of random number regardless of how you seed (in main thread). I would like to see the seed propagate to all threads without making the identical.

comment:21 in reply to:  20 ; Changed 9 years ago by Peter

Replying to peter:

shall we just record the latest seed used and then seed new gsl_rng* with seed_+count_?

comment:22 in reply to:  20 ; Changed 9 years ago by Jari Häkkinen

Replying to peter:

Looking at the code, I realized my suggestion is not very good. A typical use case must be that you set the seed in the main thread and use the generate random number in daughter threads. With my suggestion you would get the same series of random number regardless of how you seed (in main thread). I would like to see the seed propagate to all threads without making the identical.

I suppose you can reseed each thread RNG independently? We are discussing the default behaviour?

comment:23 in reply to:  21 Changed 9 years ago by Jari Häkkinen

Replying to peter:

Replying to peter:

shall we just record the latest seed used and then seed new gsl_rng* with seed_+count_?

This is better than the previous suggestion.

comment:24 in reply to:  22 Changed 9 years ago by Peter

Replying to jari:

I suppose you can reseed each thread RNG independently? We are discussing the default behaviour?

Allowing users to seed each thread independantly is definitely a design criterion.

comment:25 Changed 9 years ago by Peter

(In [2800]) adding variables seed_ and count_ to make random numbers different in different threads. refs #568

comment:26 Changed 9 years ago by Peter

Resolution: fixed
Status: assignedclosed

(In [2802]) adding documentation and tests. Remove private member count_ (not needed). closes #568

Note: See TracTickets for help on using tickets.