Quick Start
This guide introduces the fundamental mechanics of simexpal. The quick start covers installation and the core features: managing instances and input parameters, launching experiments, monitoring them, and evaluating the output data. After that, we take a closer look at the individual phases of an experiment set up with simexpal: running experiments, evaluating results, managing instances, parameterizing algorithms, builds and revisions, the run matrix, and launchers with support for batch schedulers.
Installation
simexpal requires Python 3.
Either you install it via pip3 using:
$ pip3 install simexpal
Alternatively, install the latest commit of our default branch from GitHub. You can also check out a specific version (using our tags) or a branch before installing.
$ git clone https://github.com/hu-macsy/simexpal.git
$ cd simexpal
$ pip3 install -e .
Minimal Example and Fundamentals
The simexpal repository contains a small example which can be used to quickly get to know the tool. For this section, we compare different sorting algorithms using multiple instances as input.
Install simexpal as detailed above.
Clone the simexpal repository and navigate to the examples/sorting/ directory:
$ git clone https://github.com/hu-macsy/simexpal.git
$ cd simexpal/examples/sorting/
This directory contains an experiments.yml file which describes the configuration of all instances and experiments. In this case, each instance is generated using the generate.py script. Each experiment calls my_sort.py, which implements the experiment given the parameters. eval.py is a script that uses simexpal’s Python interface to evaluate the experiment results.
Generate some instances for the benchmark. First, we list all instances:
# List the instances declared in experiments.yml.
# Note that missing instances will appear in red.
$ simex instances
Output:
Instance short name    Instance sets
-------------------    -------------
uniform-n1000-s1
uniform-n1000-s2
uniform-n1000-s3
Then, install all instances:
# Generate missing instance files.
$ simex instances install
Output:
Generating instance 'uniform-n1000-s1'
Generating instance 'uniform-n1000-s2'
Generating instance 'uniform-n1000-s3'
In experiments.yml, the instances are declared as part of the instances stanza. This stanza also declares how to invoke the generator script generate.py.
Launch the algorithms on all instances:
List experiment configurations from experiments.yml.
$ simex e
Output:
Experiment        Instance            Status
----------        --------            ------
bubble-sort       uniform-n1000-s1    [0]
bubble-sort       uniform-n1000-s2    [0]
bubble-sort       uniform-n1000-s3    [0]
insertion-sort    uniform-n1000-s1    [0]
insertion-sort    uniform-n1000-s2    [0]
insertion-sort    uniform-n1000-s3    [0]
6 experiments in total
Launch all experiments using process forks.
$ simex e launch --launch-through=fork
Output:
Launching run bubble-sort/uniform-n1000-s1[0] on local machine
Launching run bubble-sort/uniform-n1000-s2[0] on local machine
Launching run bubble-sort/uniform-n1000-s3[0] on local machine
Launching run insertion-sort/uniform-n1000-s1[0] on local machine
Launching run insertion-sort/uniform-n1000-s2[0] on local machine
Launching run insertion-sort/uniform-n1000-s3[0] on local machine
View the status of the experiments.
$ simex e list
Output:
Experiment        Instance            Status
----------        --------            ------
bubble-sort       uniform-n1000-s1    [0] finished
bubble-sort       uniform-n1000-s2    [0] finished
bubble-sort       uniform-n1000-s3    [0] finished
insertion-sort    uniform-n1000-s1    [0] finished
insertion-sort    uniform-n1000-s2    [0] finished
insertion-sort    uniform-n1000-s3    [0] finished
6 experiments in total
Evaluate the results:
To evaluate the experiment results, we call the eval.py script, which uses the pandas package to aggregate the results. Please make sure that the Python package pandas is installed on your machine, or install it via pip3 install pandas. The script also uses the simexpal Python interface (i.e., the functions collect_successful_results() and open_output_file()) to gather all results.
$ python3 eval.py
Output:
                comparisons          swaps      time
experiment
bubble-sort        499500.0  253437.333333  0.053750
insertion-sort     241891.0  257609.000000  0.027219
Tip
Simexpal supports autocomplete via argcomplete.
To enable autocomplete, install argcomplete
and enable global completion:
$ pip3 install argcomplete
$ activate-global-python-argcomplete
Running Experiments
Let us now take a closer look at running experiments. As an example, we compare the sorting algorithms insertion sort and bubble sort using a set of instances. Please find the resources in our example folder.
If you look at the sorting directory in our examples, you will find my_sort.py, which contains an implementation of the two algorithms insertion sort and bubble sort.
my_sort.py
expects two arguments:
The algorithm name (i.e. insertion-sort or bubble-sort).
The path to the instance.
In the example above, we generated several instances, which were written to the instances folder located next to the experiments.yml file.
For this example, we generate a new instance using the generate.py script.
First, we navigate to the sorting folder:
$ cd examples/sorting
Thereafter, we generate a new instance file which contains 500 randomly generated integers in list form with one integer per line:
$ python3 generate.py -o ./instances/random-500 500
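To make the file format concrete, here is a minimal sketch of what such a generator could look like. It is not the actual generate.py: the function names and the upper bound on the values are assumptions made for illustration, and the output format (one `- <integer>` item per line) is inferred from the description above.

```python
import random
from pathlib import Path


def make_values(n, seed=None, upper=10_000_000):
    """Return n pseudo-random integers in [0, upper); `upper` is an assumed bound."""
    rng = random.Random(seed)  # a fixed seed makes the instance reproducible
    return [rng.randrange(upper) for _ in range(n)]


def format_instance(values):
    """Render the values as a list with one '- <int>' item per line."""
    return "".join(f"- {v}\n" for v in values)


def write_instance(path, n, seed=None):
    """Write a generated instance to `path`, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(format_instance(make_values(n, seed)))
```

Calling `write_instance('./instances/random-500', 500)` would then produce a file comparable to the one generated by the command above.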
Now we can run insertion-sort on the newly generated instance:
$ python3 my_sort.py --algo=insertion-sort ./instances/random-500
Output:
...
- 9949624
- 9984385
- 9984385
swaps: 65114
time: 0.006702899932861328
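The reported metrics can be understood through a small sketch of an instrumented insertion sort. This is an illustration only and not the code in my_sort.py; the function name and the exact way comparisons and swaps are counted are assumptions.

```python
import time


def insertion_sort(values):
    """Sort a copy of `values`; return (sorted_list, comparisons, swaps)."""
    data = list(values)
    comparisons = swaps = 0
    for i in range(1, len(data)):
        j = i
        while j > 0:
            comparisons += 1
            if data[j - 1] <= data[j]:
                break  # element has reached its position
            data[j - 1], data[j] = data[j], data[j - 1]
            swaps += 1
            j -= 1
    return data, comparisons, swaps


start = time.time()
result, comparisons, swaps = insertion_sort([5, 2, 4, 1])
elapsed = time.time() - start

# Print the metrics as simple "key: value" lines.
print(f"comparisons: {comparisons}")
print(f"swaps: {swaps}")
print(f"time: {elapsed}")
```

Printing the metrics as `key: value` lines keeps the output easy to parse in a later evaluation step.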
Let us continue by configuring simexpal to automate the experimental pipeline.
Consider the experiments.yml file in the root directory. simexpal reads this configuration to run the experiments on the desired instances. The file is structured as follows:
instances:
  - generator:
      args: ['./generate.py', '--seed=1', '1000']
    items:
      - uniform-n1000-s1
  - generator:
      args: ['./generate.py', '--seed=2', '1000']
    items:
      - uniform-n1000-s2
  - generator:
      args: ['./generate.py', '--seed=3', '1000']
    items:
      - uniform-n1000-s3

experiments:
  - name: insertion-sort
    args: ['./my_sort.py', '--algo=insertion-sort', '@INSTANCE@']
    stdout: out
  - name: bubble-sort
    args: ['./my_sort.py', '--algo=bubble-sort', '@INSTANCE@']
    stdout: out
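The args lists in the experiments stanza contain the placeholder @INSTANCE@, which stands in for the instance a run operates on. The following sketch illustrates the general idea of expanding such placeholders into a concrete command line. It is not simexpal's implementation: `expand_args` is a hypothetical helper, and the instance path used below is only an example value.

```python
def expand_args(args, variables):
    """Replace @NAME@ placeholders in each argument with values from `variables`."""
    expanded = []
    for arg in args:
        for name, value in variables.items():
            arg = arg.replace(f"@{name}@", value)
        expanded.append(arg)
    return expanded


template = ['./my_sort.py', '--algo=insertion-sort', '@INSTANCE@']
command = expand_args(template, {'INSTANCE': './instances/uniform-n1000-s1'})
print(command)
```

The sketch substitutes plain strings only; a placeholder that expands to several arguments would need extra handling.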
The structure of the configuration file will get more attention later on. At
this point, our sorting
directory looks like this:
sorting
├── my_sort.py
├── experiments.yml
└── instances
    └── random-500
To use the new instance in our experiments, we must add it to the instances stanza as a local file under its name random-500 (no extension):
instances:
  - generator:
      args: ['./generate.py', '--seed=1', '1000']
    items:
      - uniform-n1000-s1
  - generator:
      args: ['./generate.py', '--seed=2', '1000']
    items:
      - uniform-n1000-s2
  - generator:
      args: ['./generate.py', '--seed=3', '1000']
    items:
      - uniform-n1000-s3
  - repo: local
    items:
      - random-500

experiments:
  - name: insertion-sort
    args: ['./my_sort.py', '--algo=insertion-sort', '@INSTANCE@']
    stdout: out
  - name: bubble-sort
    args: ['./my_sort.py', '--algo=bubble-sort', '@INSTANCE@']
    stdout: out
After completing this step, we can use simexpal to run our experiments including the new instance. For each run, the @-variable @INSTANCE@ is replaced by the path to the respective instance. A complete list of experiments and their status can be shown with:
$ simex e list
The color of each line represents the status of the experiment:
green 🮆 represents finished
yellow 🮆 represents running
red 🮆 represents failed
and the default color represents not executed
Experiments can be launched with:
$ simex e launch --launch-through=fork
This command launches all experiments that have not been executed yet on the local machine. After all experiments have been run, every experiment entry should be marked as finished:
$ simex e list
Output:
Experiment Instance Status
---------- -------- ------
bubble-sort random-500 [0] finished
bubble-sort uniform-n1000-s1 [0] finished
bubble-sort uniform-n1000-s2 [0] finished
bubble-sort uniform-n1000-s3 [0] finished
insertion-sort random-500 [0] finished
insertion-sort uniform-n1000-s1 [0] finished
insertion-sort uniform-n1000-s2 [0] finished
insertion-sort uniform-n1000-s3 [0] finished
8 experiments in total
Evaluating Results
After the experiments have been run, simexpal can assist with locating and collecting output data. For this purpose, simexpal can be imported as a Python package. As simexpal is output format and algorithm agnostic, you need to provide functionality to parse output files and evaluate results. Parsing output files can usually be greatly simplified by using standardized formats and appropriate libraries.
The example below (i.e., eval.py
from examples/sorting/
)
demonstrates this concept. It uses the simexpal Python package
to obtain all output files and meta data about them. In particular,
it uses the functions collect_successful_results()
and
open_output_file()
for this purpose.
A user-supplied parsing function is employed to parse the output files.
import pandas
import simexpal
import yaml


def parse(run, f):
    # Parse the YAML output of one run and keep the metrics of interest.
    output = yaml.load(f, Loader=yaml.Loader)
    return {
        'experiment': run.experiment.name,
        'instance': run.instance.shortname,
        'comparisons': output['comparisons'],
        'swaps': output['swaps'],
        'time': output['time']
    }


# Locate the experiment configuration and collect all successful runs.
cfg = simexpal.config_for_dir()
results = []
for successful_run in cfg.collect_successful_results():
    with successful_run.open_output_file() as f:
        results.append(parse(successful_run, f))

# Average each metric per experiment.
df = pandas.DataFrame(results)
print(df.groupby('experiment')[['comparisons', 'swaps', 'time']].agg('mean'))
Run this Python script to evaluate the experiments:
$ python3 eval.py
Output:
comparisons swaps time
experiment
bubble-sort 405812.5 205647.25 0.043447
insertion-sort 196834.5 208978.00 0.021463
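To clarify what the final groupby and mean step computes, here is a sketch of the same aggregation using only the Python standard library. The helper name `mean_by_experiment` is hypothetical, and the sample records are made-up values that only show the shape of the data; they are not results of the sorting example.

```python
from collections import defaultdict
from statistics import mean


def mean_by_experiment(results, columns):
    """Group result dicts by 'experiment' and average the given columns."""
    groups = defaultdict(list)
    for row in results:
        groups[row['experiment']].append(row)
    return {
        name: {col: mean(r[col] for r in rows) for col in columns}
        for name, rows in groups.items()
    }


# Made-up records, only to show the shape of the data.
sample = [
    {'experiment': 'bubble-sort', 'instance': 'a', 'swaps': 10, 'time': 0.5},
    {'experiment': 'bubble-sort', 'instance': 'b', 'swaps': 20, 'time': 1.5},
    {'experiment': 'insertion-sort', 'instance': 'a', 'swaps': 12, 'time': 0.25},
]
summary = mean_by_experiment(sample, ['swaps', 'time'])
print(summary)
```

Each row produced by a parsing function contributes one record, and every experiment ends up with one averaged value per metric.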
Managing Instances
Before launching the experiments, we need to make sure that all our instances are available. Instances can be checked with:
$ simex instances list
An example output:
Instance short name Instance sets
------------------- -------------
random-500
uniform-n1000-s1
uniform-n1000-s2
uniform-n1000-s3
Unavailable instances will be shown in red, available instances will be shown in green.
If instances are taken from a public repository, they can be downloaded automatically. The YAML file under examples/download_instances, shown below, is configured to use instances from SNAP.
instances:
  - repo: snap
    items:
      - 'facebook_combined'
      - 'cit-HepTh'

instdir: "./graphs"
Please navigate to the subfolder examples/download_instances
and list all
instances:
$ simex instances list
Output:
Instance short name Instance sets
------------------- -------------
cit-HepTh
facebook_combined
The listed instances are shown as not available. To download them, use the following command:
$ simex instances install
Output:
Downloading instance 'cit-HepTh' from snap repository
[==================================================]100% (1.26MB/1.26MB)
Downloading instance 'facebook_combined' from snap repository
[==================================================]100% (0.21MB/0.21MB)
Now when repeating the simex instances list
command, you’ll see that all
instances are available.
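Conceptually, an instance counts as available once its file exists in the instance directory. The following sketch shows such a check under that assumption. It is not how simexpal itself determines availability, and `split_by_availability` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def split_by_availability(instdir, names):
    """Return (available, missing) instance names, judged by file existence."""
    base = Path(instdir)
    available = [n for n in names if (base / n).is_file()]
    missing = [n for n in names if n not in available]
    return available, missing


# Demonstrate the check in a temporary directory with one instance present.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'cit-HepTh').write_text('')
    available, missing = split_by_availability(
        tmp, ['cit-HepTh', 'facebook_combined'])

print(available, missing)
```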
Parameterize Algorithms
When benchmarking algorithms, it is often useful to compare different variants
or parameter configurations. simexpal can manage those variants without
requiring you to duplicate the experiments
stanza multiple times.
As an example, imagine that you want to benchmark the running time of a merge
sort algorithm using different minimum block sizes and sorting algorithms for
these blocks. The my_sort.py
script under examples/sorting
provides an
implementation of merge sort.
The following extends the original experiments.yml
file with additional
variants for merge sort after we defined our instances:
experiments:
  - name: 'merge-sort'
    stdout: out
    args: ['python3', 'my_sort.py', '--algo=merge-sort', '@EXTRA_ARGS@', '@INSTANCE@']

variants:
  - axis: 'block-size'
    items:
      - name: 'bs2'
        extra_args: ['--block_size=2']
      - name: 'bs20'
        extra_args: ['--block_size=20']
      - name: 'bs200'
        extra_args: ['--block_size=200']
  - axis: 'block-algo'
    items:
      - name: 'bba-insertion'
        extra_args: ['--block_algorithm=insertion-sort']
      - name: 'bba-selection'
        extra_args: ['--block_algorithm=bubble-sort']

matrix:
  include:
    - experiments: [insertion-sort, bubble-sort]
      axes: []
    - experiments: [merge-sort]
      axes: [block-size, block-algo]
First, we define the new experiment merge-sort and add the @-variable @EXTRA_ARGS@ to its arguments. This allows simexpal to insert additional program arguments. Second, we define variants as possible parameter groups for our experiment. Finally, we add a matrix stanza that groups the experiments: for insertion-sort and bubble-sort the axes remain empty, as we do not want to parameterize these experiments, while the group containing merge-sort uses the variant axes block-size and block-algo.
After successfully running all (remaining) experiments, listing the experiments with:
simex e list --full
prints:
Experiment                           started    finished    failures    other
----------                           -------    --------    --------    -----
bubble-sort                                     4/4
insertion-sort                                  4/4
merge-sort ~ bba-insertion, bs2                 4/4
merge-sort ~ bba-insertion, bs20                4/4
merge-sort ~ bba-insertion, bs200               4/4
merge-sort ~ bba-selection, bs2                 4/4
merge-sort ~ bba-selection, bs20                4/4
merge-sort ~ bba-selection, bs200               4/4
32 experiments in total
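The total of 32 follows from combining every variant of each axis with every instance. The sketch below reproduces this count with itertools.product. The variable names are chosen for illustration; the variant and instance names are taken from the configuration and listings above.

```python
from itertools import product

axes = {
    'block-size': ['bs2', 'bs20', 'bs200'],
    'block-algo': ['bba-insertion', 'bba-selection'],
}
instances = ['random-500', 'uniform-n1000-s1',
             'uniform-n1000-s2', 'uniform-n1000-s3']

# merge-sort is parameterized over both axes: one run per combination.
merge_variants = list(product(axes['block-algo'], axes['block-size']))

# insertion-sort and bubble-sort use no axes, so each counts once.
plain_experiments = ['insertion-sort', 'bubble-sort']

total_runs = (len(merge_variants) + len(plain_experiments)) * len(instances)
print(len(merge_variants), total_runs)
```

Six merge-sort variants plus the two unparameterized experiments give eight rows, each run on four instances.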
Builds and Revisions
To make sure that experiments always run using the same program binaries, simexpal can manage internal projects as well as external Git repositories and automate the build process.
Automated builds are controlled by the builds and revisions stanzas in the experiments.yml.
For the remainder of this section, we will use the C++ implementation of the sorting example. We will use simexpal to resolve the dependency and to configure and compile the C++ project.
simexpal will invoke CMake commands to build the program; since these steps are merely a list of shell input strings, you may use any build environment.
To enable automated builds, we need to add builds and revisions stanzas to experiments.yml. So that the experiments use the correct project, they must reference the simexpal build:
builds:
  - name: simexpal
    git: 'https://github.com/hu-macsy/simexpal'
    configure:
      - args:
          - 'cmake'
          - '-DCMAKE_INSTALL_PREFIX=@THIS_PREFIX_DIR@'
          - '@THIS_CLONE_DIR@/examples/sorting_cpp/'
    compile:
      - args:
          - 'make'
          - '-j@PARALLELISM@'
    install:
      - args:
          - 'make'
          - 'install'

revisions:
  - name: main
    build_version:
      'simexpal': 'd8d421e3c2eaa32311a6c678b15e9e22ea0d8eac'

instances:
  - generator:
      args: ['./generate.py', '--seed=1', '10000']
    items:
      - uniform-n1000-s1
  - generator:
      args: ['./generate.py', '--seed=2', '10000']
    items:
      - uniform-n1000-s2
  - generator:
      args: ['./generate.py', '--seed=3', '10000']
    items:
      - uniform-n1000-s3

experiments:
  - name: quick-sort
    use_builds: [simexpal]
    args: ['quicksort', '@INSTANCE@', '@EXTRA_ARGS@']
    stdout: out

variants:
  - axis: 'block-algo'
    items:
      - name: 'ba-insert'
        extra_args: ['insertion_sort']
      - name: 'ba-bubble'
        extra_args: ['bubble_sort']
  - axis: 'block-size'
    items:
      - name: 'bs32'
        extra_args: ['32']
      - name: 'bs64'
        extra_args: ['64']

matrix:
  include:
    - experiments: [quick-sort]
      variants: [ba-insert, bs32]
      revisions: [main]
    - experiments: [quick-sort]
      variants: [ba-bubble]
      revisions: [main]
After navigating to examples/sorting_cpp
, we must first generate the
instances:
$ simex instances install
Now we build the C++ project:
$ simex builds make
Once the build process is finished, the experiments can be started as usual using:
$ simex e launch --launch-through=fork
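Because the configure, compile and install phases of a build are just lists of command arguments, the build process amounts to running them in order. The sketch below shows this idea with subprocess. It is not simexpal's build code: `run_steps` is a hypothetical helper, and the commands are harmless stand-ins so that the example runs without CMake.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each argument list in order; stop at the first failing command."""
    for args in steps:
        completed = subprocess.run(args)
        if completed.returncode != 0:
            return False
    return True


# Stand-in commands so the sketch runs anywhere; a real build would use
# argument lists such as ['cmake', ...], ['make', ...], ['make', 'install'].
steps = [
    [sys.executable, '-c', 'print("configure")'],
    [sys.executable, '-c', 'print("compile")'],
    [sys.executable, '-c', 'print("install")'],
]
ok = run_steps(steps)
print(ok)
```

Stopping at the first failure avoids installing a project whose configure or compile phase did not succeed.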
Run Matrix
In the Parameterize Algorithms section we saw how we can use simexpal to
specify axes and variants of parameters. For the following example we will take
a look at the variants
stanza of the C++ Sorting Example:
builds:
  - name: simexpal
    git: 'https://github.com/hu-macsy/simexpal'
    configure:
      - args:
          - 'cmake'
          - '-DCMAKE_INSTALL_PREFIX=@THIS_PREFIX_DIR@'
          - '@THIS_CLONE_DIR@/examples/sorting_cpp/'
    compile:
      - args:
          - 'make'
          - '-j@PARALLELISM@'
    install:
      - args:
          - 'make'
          - 'install'

revisions:
  - name: main
    build_version:
      'simexpal': 'd8d421e3c2eaa32311a6c678b15e9e22ea0d8eac'

instances:
  - generator:
      args: ['./generate.py', '--seed=1', '10000']
    items:
      - uniform-n1000-s1
  - generator:
      args: ['./generate.py', '--seed=2', '10000']
    items:
      - uniform-n1000-s2
  - generator:
      args: ['./generate.py', '--seed=3', '10000']
    items:
      - uniform-n1000-s3

experiments:
  - name: quick-sort
    use_builds: [simexpal]
    args: ['quicksort', '@INSTANCE@', '@EXTRA_ARGS@']
    stdout: out

variants:
  - axis: 'block-algo'
    items:
      - name: 'ba-insert'
        extra_args: ['insertion_sort']
      - name: 'ba-bubble'
        extra_args: ['bubble_sort']
  - axis: 'block-size'
    items:
      - name: 'bs32'
        extra_args: ['32']
      - name: 'bs64'
        extra_args: ['64']

matrix:
  include:
    - experiments: [quick-sort]
      variants: [ba-insert, bs32]
      revisions: [main]
    - experiments: [quick-sort]
      variants: [ba-bubble]
      revisions: [main]
By default, simexpal builds every combination of the experiment, instance, variant and revision sets. However, there are cases where this is not desired. For example, you might only want to run certain instance/variant combinations, not all.
Assume you want to run the quick sort algorithm with insertion sort as base
block algorithm and 32
as minimal block size. Additionally you want to run
quick sort with bubble sort as base block algorithm and use both 32
and
64
as minimal block sizes.
To achieve this, we need to add a matrix
stanza to experiments.yml
. In
our example, this looks like:
matrix:
  include:
    - experiments: [quick-sort]
      variants: [ba-insert, bs32]
      revisions: [main]
    - experiments: [quick-sort]
      variants: [ba-bubble]
      revisions: [main]
We could explicitly specify [ba-bubble, bs32, bs64]
for the variants of
quick-sort
. In this case however, it is not necessary as bs32
and
bs64
are all the possible values for the block-size
axis.
Using simex experiments list --full
we can confirm that we got our desired
experiments:
Experiment Instance Status
---------- -------- ------
quick-sort ~ ba-bubble, bs32 @ main uniform-n1000-s1 [0] not submitted
quick-sort ~ ba-bubble, bs32 @ main uniform-n1000-s2 [0] not submitted
quick-sort ~ ba-bubble, bs32 @ main uniform-n1000-s3 [0] not submitted
quick-sort ~ ba-bubble, bs64 @ main uniform-n1000-s1 [0] not submitted
quick-sort ~ ba-bubble, bs64 @ main uniform-n1000-s2 [0] not submitted
quick-sort ~ ba-bubble, bs64 @ main uniform-n1000-s3 [0] not submitted
quick-sort ~ ba-insert, bs32 @ main uniform-n1000-s1 [0] not submitted
quick-sort ~ ba-insert, bs32 @ main uniform-n1000-s2 [0] not submitted
quick-sort ~ ba-insert, bs32 @ main uniform-n1000-s3 [0] not submitted
9 experiments in total
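The listing can be reproduced with a short sketch of the expansion rule described above: if a matrix entry names no variant of an axis, all variants of that axis are used. This is an illustration of the rule and not simexpal's implementation; `expand_entry` and the variable names are hypothetical.

```python
from itertools import product

AXES = {
    'block-algo': ['ba-insert', 'ba-bubble'],
    'block-size': ['bs32', 'bs64'],
}


def expand_entry(selected):
    """Expand one matrix entry; axes without a selected variant keep all items."""
    choices = []
    for axis, items in AXES.items():
        picked = [v for v in items if v in selected]
        choices.append(picked or items)
    return list(product(*choices))


entries = [['ba-insert', 'bs32'], ['ba-bubble']]
combos = [c for entry in entries for c in expand_entry(entry)]
instances = ['uniform-n1000-s1', 'uniform-n1000-s2', 'uniform-n1000-s3']
print(combos, len(combos) * len(instances))
```

The two entries expand to three variant combinations, and running each on three instances yields the nine experiments listed above.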
Launchers and Support for Batch Schedulers
To submit experiments to a batch scheduler, simexpal allows you to define
“launchers”. A launcher specifies where and how simexpal should submit
experiments. If no launcher (not even a default launcher or
--launch-through
) is specified, simexpal launches experiments on the local
machine.
Launchers must be defined in the file ~/.simexpal/launchers.yml. For example, to submit jobs to the Slurm partition cluster9, a launcher configuration could look like this:
launchers:
  - name: local-cluster
    default: true
    scheduler: slurm
    queue: cluster9
When launching experiments using simex experiments launch, you can specify the --launcher option (e.g., simex experiments launch --launcher local-cluster) to select a certain launcher.
Note
The default: true attribute of a launcher overrides the default behavior of launching on the local machine. Hence, there can only be one launcher with default: true.
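The selection rules described in this section can be summarized in a small sketch: an explicitly named launcher wins, otherwise the launcher marked as default is used, and without either the experiments run locally. This is an illustration under those assumptions, not simexpal's code; `select_launcher` is a hypothetical helper, and the launcher list mirrors the example configuration as a Python structure.

```python
def select_launcher(launchers, name=None):
    """Pick a launcher by name, else the default one, else None (run locally)."""
    defaults = [entry for entry in launchers if entry.get('default')]
    if len(defaults) > 1:
        raise ValueError('only one launcher may set default: true')
    if name is not None:
        for entry in launchers:
            if entry['name'] == name:
                return entry
        raise KeyError(name)
    return defaults[0] if defaults else None


launchers = [
    {'name': 'local-cluster', 'default': True,
     'scheduler': 'slurm', 'queue': 'cluster9'},
]
chosen = select_launcher(launchers)
print(chosen['name'])
```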