Lunarc Info
Submitting jobs
There are several ways of running jobs on Lunarc. The following information is relevant for the cluster named milleotto (milleotto.lunarc.lu.se). See http://www.lunarc.lu.se for more information.
Lunarc uses Torque, which is the same queueing system as on Selma, so everything that works on Selma should also work at Lunarc.
The system also supports submitting a batch of unique jobs using mpiexec.
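For reference, a minimal Torque job script could look something like this (the resource request, job name and program name are only placeholders; adjust them to your own job):
> #!/bin/sh
> #PBS -N myjob
> #PBS -l nodes=1,walltime=01:00:00
> # Run from the directory the job was submitted from.
> cd $PBS_O_WORKDIR
> ./myprogram
Save it as e.g. myjob.pbs and submit it with
> qsub myjob.pbs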
Each job has access to three different directories:
- /home/username/ -- this is backed up, but only 700 MB.
- /disk/global/username/ -- no backup.
- /disk/local/ -- the discs belonging to the individual nodes. On milleotto you don't seem to have access to this directory; your job can work there, but you can't see what it's doing until it's finished.
Jobs that use the disc a lot should be run on /disk/local/; jobs that don't use the disc much can be run on /disk/global/ instead. You shouldn't run jobs directly from /home/. The network has been unstable at times, so it is not a bad idea to use the local discs; jobs running there won't crash as easily if the network goes down.
The following scripts can be used to do the above.
- source:trunk/lunarc/globaljob.pbs -- a script for running jobs from /disk/global/.
- source:trunk/lunarc/localjob.pbs -- a script for running jobs from /disk/local/.
- source:trunk/lunarc/localjob2.pbs -- another script for running jobs from /disk/local/. The difference from localjob.pbs is that a separate script, source:trunk/lunarc/runlocal.sh, is called within the pbs-script; this script is where everything happens. This separation allows for more complicated instructions, although in this example nothing fancy happens.
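As a rough sketch, a job on /disk/local/ could be organised roughly like this (the scripts above are the authoritative versions; the work directory, input and output file names below are made up):
> #!/bin/sh
> #PBS -l nodes=1,walltime=01:00:00
> # Create a private work directory on the node's local disc.
> WORKDIR=/disk/local/$USER/$PBS_JOBID
> mkdir -p $WORKDIR
> # Copy the input there, run the program, and copy the results back.
> cp /disk/global/$USER/input.dat $WORKDIR/
> cd $WORKDIR
> ./myprogram input.dat > output.dat
> cp output.dat /disk/global/$USER/
> rm -rf $WORKDIR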
Job Status
On top of Torque, there is an additional layer of administration at Lunarc called Maui. This gives you more information about your jobs; here are the two most useful commands:
- showq -- shows the whole queue and gives you an idea of when your job will start, finish, etc.
- checkjob -- gives you detailed information about a job: elapsed time, remaining time, which nodes are used, etc. Pass the job's id-number as argument.
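For example (123456 stands for the id-number that qsub printed when you submitted the job):
> showq
> checkjob 123456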
Docenten specifics
If you are running a job from /disk/local/ on Docenten (the older cluster) and want to have a look at your files, use rsh (instead of ssh) to access the individual nodes; you can find out which nodes are used with checkjob (see above). If you want to copy the files to /disk/global/ before the job is finished, use rcp. The script source:trunk/lunarc/cpjob.sh takes a jobid as argument and copies data from all nodes used by the job to the directory from which it is called.
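On Docenten this could look something like the following (the node name and file names are of course just examples):
> checkjob 123456    # find out which nodes the job uses
> rsh n42            # log in to one of those nodes
> rcp n42:/disk/local/output.dat /disk/global/username/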
mpiexec
The command mpiexec used in the scripts above requires that your program either uses an MPI-package or "manually" handles the arguments passed by mpiexec. The script source:trunk/lunarc/runlocal.sh is an example of the latter, and the code in source:trunk/lunarc/mpich_main.cc shows what it might look like using the C++ MPI-package.
A program like mpich_main.cc should be compiled with mpiCC, which can only be used after it has been activated by the lines
> . use_modules
> module load mpich-gcc3
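With the module loaded, compiling could look something like this (the output name is arbitrary):
> mpiCC mpich_main.cc -o mpich_main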
If your program doesn't take any arguments, you can of course just ignore the whole thing. If the different runs in a batch differ in a complex way, you can either start them individually as separate jobs or use a solution like localjob2.pbs.
Compiling
If you want to use a newer version of gcc (at the time of writing, 4.1.2 instead of 3.4.6) the following commands are required
> . use_modules
> module load gcc
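After loading the module, g++ should point to the newer compiler; a quick check and an example compilation (the file names are just examples):
> g++ --version
> g++ -O2 -o myprogram myprogram.cc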