Opened 9 years ago

Closed 8 years ago

#800 closed specification (fixed)


Reported by: Peter Owned by: Peter
Priority: major Milestone: yat 0.13
Component: utility Version: trunk
Keywords: Cc:


It would be nice to have a class that can take care of and handle tasks when running on multiple threads. It should have a user-definable number of workers (one per thread). It should be possible to submit jobs to the scheduler and it will take care of passing it to appropriate worker. It should be possible to define dependencies, so that task A is not started until task B and C are completed. A bit like a Makefile, but dynamic in the sense that jobs can create sub-jobs, which are passed to the scheduler and completed when a worker is available for work.

Change History (12)

comment:1 Changed 9 years ago by Peter

Milestone: yat 0.13

If we allow jobs to create sub-jobs, would that imply that the super-job (creating sub-jobs) is not considered completed until its sub-jobs are completed? My gut feeling is that yes that is more intuitive, but also more complicated to implement. While waiting for sub-jobs one wants the super-job's thread utilized as much as possible, so if sub-jobs are available run them in super-job thread. When all sub-jobs' are either finished or already running, one could wait but it's important to be careful to avoid a total freeze when all threads are waiting. How does GNU make work with a rules like this:

   cd foo && make all -jN

comment:2 Changed 8 years ago by Peter

Status: newassigned

comment:3 Changed 8 years ago by Peter

(In [3346]) first version of Scheduler class. refs #800.

comment:4 Changed 8 years ago by Peter

Note that in the current implementation sub-jobs created jobs created within a job are submitted to the queue first when the mother job has finished. See comment above for why this is not optimal.

comment:5 Changed 8 years ago by Peter

I'm a bit undecided when it comes to recursively creating Jobs. Painting with broad brushes I see three designs:

1) No recursive job creation. 2) Job creation as is, i.e., a Job has an interface to submit jobs but there is a lag and these jobs are not propagated to the Scheduler until *this Job has completed. 3) Try to mimick GNU Make which within a Job can call another Makefile with e.g.: cd subdir && make...

I'm a bit foggy about the different pro and cons for the different options, so will try to list them here to [hopefully] accomplish some clarity.

First of all, an obvious pro with 1) is that it's minimalistic and allows to be extended without breaking any API. That goes without saying. Second, I realize while writing this that if going with 1), the user can still have jobs creating jobs by letting the Job have access to the Scheduler by holding e.g. holding a pointer to it. That kills option to as I can see it because that means that a Job can submit jobs to the Scheduler and one can choose whether that Jobs is runnable immediately or first when this Job is completed. In other words, that solution is more flexible than 2) in which Jobs are runnable first when this Job has completed. Also by scrapping 2) we get rid of some hard-wired API such as Job's protected function submit which smells this class does too many things.

So thinking loud here, it means it stands between 1) and 3). A very attractive alternative is to start with 1) and extend towards 3) in the future. The attractive thing with 3) is the use case when a Job the result of its sub-jobs to calculate its result. Say for example one want to implement a multithreaded version of function

double brown(int n, int x)
  if (n)
    return 0.5*(brown(n-1, x-1) + brown(n-1, x+1));

  if (x>0)
    return 1;
  if (x<0)
    return 0;
  return 0.5

This is just a stupid example that could benefit from some improvements. My point, however, is that when translating this to MT would be to implement each function call as a Job, which is not possible with neither 1) nor 2) because there is no way to say: OK hold here until Job x and y have finished. And implementing that naively creates first of all a hold on a thread, i.e., the Job is sleeping and we are not using the CPU as the user would hope. Second, putting a Job on hold implies the risk that all running Jobs are put on hold and that they are all waiting for some jobs in the queue to finish. That is worse than a bottleneck it's a stopper. So those two problems have to be solved when designing 3). Normally one has one worker per thread so if one has 8 threads there are 8 workers. If a worker can be held waiting for subjobs, one solution would be to dynamically increase number of workers, just temporarily until the worker wakes up and use the CPU again. Another alternative would be to, when a Job is waiting for subjobs to finish, let the worker work on jobs in the queue.

So in summary, I think I'll change the API to a more minimalist a la 1) and then we can discuss how to accomplish support for 3). But of course input are very welcome [as usual].

comment:6 Changed 8 years ago by Peter

(In [3347]) remove protected function Scheduler::Job::submit. refs #800.

comment:7 Changed 8 years ago by Peter

(In [3348]) Replace Scheduler::launch() with a function Scheduler::wait(). Before Jobs did not run until launch() was called, now the Scheduler tries to run them as soon as they are submitted.

refs #800

comment:8 Changed 8 years ago by Peter

(In [3349]) avoid having a std::set just to count number of jobs. refs #800

comment:9 Changed 8 years ago by Peter

(In [3401]) refs #800

Avoid using parents and children terminologoy to describe Job relationships as it was confusing. Move add_dependency function from class Scheduler::Job to class Scheduler. Likewise move observers to class Job rather than holding an extra map<Job, observers> in Scheduler.

comment:10 Changed 8 years ago by Peter

(In [3402]) add interface to set priority of Scheduler::Job and make the order the tasks are executed closer to FIFO. refs #800

comment:11 Changed 8 years ago by Peter

(In [3406]) docs. refs #800

comment:12 Changed 8 years ago by Peter

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.