Launcher

On this page we describe the different launchers for experiments. It is possible to use the local machine or batch schedulers like Slurm or Oracle Grid Engine (formerly SGE) to run experiments. The launchers can be selected by adding further arguments to the simex experiments launch command or by setting up a launchers.yml file. The structure of this file will also be explained on this page.

In the following sections, we consider the sorting expample from the Quick Start guide.

Fork Launcher

Simexpal uses the fork launcher if no further configurations are made. The fork launcher launches experiments on the local machine. In contrast to the Queue Launcher, the fork launcher launches the experiments immediately in the current terminal and blocks until completion of the experiments.

Launching Experiments

To launch experiments through the fork launcher we can simply use the simex experiments launch command (if no launcher configurations were made), set up a launchers.yml file or add the --launch-through=fork argument to the simex experiments launch command:

$ simex experiments launch   # no launcher configurations were made beforehand

and

$ simex experiments launch --launch-through=fork

will output

Launching run bubble-sort/uniform-n1000-s1[0] on local machine
Launching run bubble-sort/uniform-n1000-s2[0] on local machine
Launching run bubble-sort/uniform-n1000-s3[0] on local machine
Launching run insertion-sort/uniform-n1000-s1[0] on local machine
Launching run insertion-sort/uniform-n1000-s2[0] on local machine
Launching run insertion-sort/uniform-n1000-s3[0] on local machine

Queue Launcher

The queue launcher is mainly useful in scenarios where no sophisticated batch scheduler (e.g. Slurm) is available, especially when launching experiments on a remote machine over ssh. In this case it does not require an active connection. As the name suggests, the queue launcher puts the jobs in a queue and executes them sequentially. It is possible to start a queue launcher as a daemon or interactively in a terminal. You can launch experiments and monitor/stop/kill the launcher.

Starting the Launcher

As Daemon

To start the queue launcher as a daemon you need to have the systemd service manager installed. Then, use simex queue daemon to start the launcher:

$ simex queue daemon
Running as unit run-rdc0128bd4cd343e393681898359e5c69.service. # systemd output

Terminal

You can start the queue launcher in a terminal so that it is possible to directly see internal states of the launcher. You will then work from another terminal window in order to send commands to the launcher. To start the launcher, use simex queue interactive:

$ simex queue interactive
Serving on /home/username/.extl.sock  # internal message of the queue launcher

Troubleshooting

If the daemon was not closed properly via the stop or kill command, a UNIX socket remains on the file system. In this case you need to add the --force argument to the simex queue interactive or simex queue daemon command to start the launcher. Alternatively, you can delete the socket manually. The default path for the socket is ~/.extl.sock.

Launching Experiments

To launch experiments through the queue launcher we need to set up a launchers.yml file or add the --launch-through=queue argument to the simex experiments launch command:

$ simex experiments launch --launch-through=queue
Submitting run bubble-sort/uniform-n1000-s1[0] on to local queue launcher
Submitting run bubble-sort/uniform-n1000-s2[0] on to local queue launcher
Submitting run bubble-sort/uniform-n1000-s3[0] on to local queue launcher
Submitting run insertion-sort/uniform-n1000-s1[0] to local queue launcher
Submitting run insertion-sort/uniform-n1000-s2[0] to local queue launcher
Submitting run insertion-sort/uniform-n1000-s3[0] to local queue launcher

Show the Status

It is possible to query the launcher for the current run, pending runs and the number of completed runs. To do that, you can use simex queue show:

$ simex queue show
Currently running:   bubble-sort/uniform-n1000-s3[0]
Pending runs:        insertion-sort/uniform-n1000-s1[0]
                     insertion-sort/uniform-n1000-s2[0]
                     insertion-sort/uniform-n1000-s3[0]
Completed runs:      2

Terminate

The launcher can be terminated by using the stop or kill command. The differences of those commands are stated below.

  • simex queue stop: terminate the launcher after finishing all pending jobs

  • simex queue kill: terminate the launcher immediately

Slurm Launcher

Slurm is a cluster management and job scheduling system, which simexpal can use to submit experiments.

Launching Experiments

To launch experiments through the Slurm launcher we need to set up a launchers.yml file. Alternatively, we can add the --launch-through=slurm and --queue=<queue> (where <queue> is the name of the partition to use) argument to the simex experiments launch command. If --queue is omitted, the default slurm partition will be used:

$ simex e launch --launch-through=slurm
Submitting run bubble-sort/uniform-n1000-s1[0] to default slurm partition
Submitting run bubble-sort/uniform-n1000-s2[0] to default slurm partition
Submitting run bubble-sort/uniform-n1000-s3[0] to default slurm partition
Submitted batch job 287454

Submitting run insertion-sort/uniform-n1000-s1[0] to default slurm partition
Submitting run insertion-sort/uniform-n1000-s2[0] to default slurm partition
Submitting run insertion-sort/uniform-n1000-s3[0] to default slurm partition
Submitted batch job 287455

Show the Status

The status of experiments can be listed via the simex experiments (or short: simex e) command. When encountering experiments that are currently submitted to or started by Slurm, simexpal will additionally query Slurm using its squeue command to verify the status. If the squeue command outputs an unexpected status, simexpal will return the broken status.

$ simex e
Experiment                              Instance                            Status
----------                              --------                            ------
bubble-sort                             uniform-n1000-s1                    [0] started
bubble-sort                             uniform-n1000-s2                    [0] started
bubble-sort                             uniform-n1000-s3                    [0] started
insertion-sort                          uniform-n1000-s1                    [0] submitted
insertion-sort                          uniform-n1000-s2                    [0] submitted
insertion-sort                          uniform-n1000-s3                    [0] submitted

Terminate

It is possible to terminate experiments that are already submitted to or started by Slurm. To do so, we can use simex experiments kill (internally scancel will be used). To confirm the termination of experiments, we need to add the -f argument to the command.

$ simex experiments kill -f
Killing run bubble-sort/uniform-n1000-s1[0] with Slurm jobid 287454_0
Killing run bubble-sort/uniform-n1000-s2[0] with Slurm jobid 287454_1
Killing run bubble-sort/uniform-n1000-s3[0] with Slurm jobid 287454_2
Killing run insertion-sort/uniform-n1000-s1[0] with Slurm jobid 287455_0
Killing run insertion-sort/uniform-n1000-s2[0] with Slurm jobid 287455_1
Killing run insertion-sort/uniform-n1000-s3[0] with Slurm jobid 287455_2

Troubleshooting

When encountering unexpected statuses, we can use simex experiments print to check the experiments error output and manually check the error/output files of Slurm located in aux/_slurm/. Each Slurm job has a respective <job_id>-<array_id>.err and <job_id>-<array_id>.out file. If the job is not part of a job array -<array_id> is omitted in the name.

“launchers.yml” File

The launchers.yml file contains a list of launchers. By setting up a launchers.yml file we can omit additional arguments in the simex experiments launch command or select launchers, which are defined in it. In the following sections, we will see how to setup and use our launchers.yml. First, we need to create the launchers.yml file in the ~/.simexpal/ folder:

$ mkdir ~/.simexpal     # Create the ~/.simexpal/ folder if it does not exist already
$ cd ~/.simexpal        # Navigate into ~/.simexpal/
$ touch launchers.yml   # Create an empty launchers.yml file

Launchers

To specify launchers in the launchers.yml file we need to set the

  • launchers: list of dictionaries containing launchers

key. Each dictionary contains the

  • name: name of the launcher

  • default: boolean (true/false) - whether this is the default launcher or not

  • scheduler: type of the launcher

keys.

Fork Launcher

To define a fork launcher we need to add a list entry to the launchers key, which contains a dictionary with the

  • name: name of the launcher

  • default: boolean (true/false) - whether this is the default launcher or not

  • scheduler: type of the launcher

key.

How to specify a fork launcher in the launchers.yml file.
1 launchers:
2     - name: local-fork
3       default: true
4       scheduler: fork

In this way we created a fork launcher with the name local-fork. We also set it to be the default launcher.

Queue Launcher

To define a queue launcher we need to set scheduler: queue:

How to specify a queue launcher in the launchers.yml file.
1 launchers:
2     - name: local-queue
3       default: true
4       scheduler: queue

In this way we created a queue launcher with the name local-queue. We also set it to be the default launcher.

Slurm Launcher

To define a Slurm launcher we need to set scheduler: slurm. Also, we can choose the partition the experiments are run on by setting the

  • queue: name of the partition to use

key. If queue is not specified, Slurm will use the default partition.

How to specify a Slurm launcher in the launchers.yml file.
1 launchers:
2     - name: local-cluster
3       default: true
4       scheduler: slurm
5       queue: fat-nodes

In this way we created a Slurm launcher with the name local-cluster. We also set it to be the default launcher and use the partition fat-nodes.

Command Line Interface

Default Launcher

When setting default: true for a launcher, simex experiments launch will run experiments with this launcher.

Warning

There can only be one launcher with default: true. Having multiple launchers with default: true will lead to a RuntimeError.

Selecting the Launcher

When launching experiments using simex experiments launch, you can specify the --launcher option to select a certain launcher defined in the launchers.yml file. For example:

Assume you have a launchers.yml file set up as in the Queue Launcher section, then simex experiments launch --launcher local-queue will select the launcher named local-queue from the launchers.yml file to run experiments.