.. _QuickStart:

Quick Start
===========

.. highlight:: none

The following introduces you to the most fundamental mechanics of simexpal.
The quick start guide explains how to install simexpal and covers its core
features: managing instances and input parameters, launching experiments,
monitoring experiments, and evaluating experiments based on their output data.
Thereafter, we take a closer look at the different phases of an experimental
setup with simexpal, including running experiments, evaluating results, managing
instances, parameterizing algorithms, builds and revisions, the run matrix, and
launchers with support for batch schedulers.

Installation
------------

simexpal requires Python 3.

You can install it via pip3:

.. code-block:: bash

    $ pip3 install simexpal

Alternatively, you can install the latest commit of our default branch from
GitHub. Of course, you can also choose to install a specific version (using our
tags) or branch.

.. code-block:: bash

    $ git clone https://github.com/hu-macsy/simexpal.git
    $ cd simexpal
    $ pip3 install -e .

.. _sortingExample:

Minimal Example and Fundamentals
--------------------------------

The simexpal repository contains a small example that can be used to quickly
get to know the tool. In this section, we compare different sorting algorithms
on multiple input instances.

1. Install simexpal as detailed above.

2. Clone the simexpal repository and navigate to the ``examples/sorting/`` directory:

   .. code-block:: bash

       $ git clone https://github.com/hu-macsy/simexpal.git
       $ cd simexpal/examples/sorting/

   This directory contains an ``experiments.yml`` file, which describes the
   configuration of all instances and experiments. In this case, each instance
   is generated by the ``generate.py`` script. Each experiment calls
   ``my_sort.py``, which runs the sorting algorithm selected by the given
   parameters. ``eval.py`` is a script that uses simexpal's Python interface to
   evaluate the experiment results.

3. Generate some instances for the benchmark. First, we list all instances:

   .. code-block:: bash

       # List the instances declared in experiments.yml.
       # Note that missing instances will appear in red.
       $ simex instances

   Output:

   ::

       Instance short name    Instance sets
       -------------------    -------------
       uniform-n1000-s1
       uniform-n1000-s2
       uniform-n1000-s3

   Then, install all instances:

   .. code-block:: bash

       # Generate missing instance files.
       $ simex instances install

   Output:

   ::

       Generating instance 'uniform-n1000-s1'
       Generating instance 'uniform-n1000-s2'
       Generating instance 'uniform-n1000-s3'

   In ``experiments.yml``, the instances are declared in the ``instances``
   stanza. This stanza also declares how to invoke the generator script
   ``generate.py``.

4. Launch the algorithms on all instances.

   First, list the experiment configurations from ``experiments.yml``:

   .. code-block:: bash

       $ simex e

   Output:

   ::

       Experiment      Instance              Status
       ----------      --------              ------
       bubble-sort     uniform-n1000-s1 [0]
       bubble-sort     uniform-n1000-s2 [0]
       bubble-sort     uniform-n1000-s3 [0]
       insertion-sort  uniform-n1000-s1 [0]
       insertion-sort  uniform-n1000-s2 [0]
       insertion-sort  uniform-n1000-s3 [0]
       6 experiments in total

   Launch all experiments using process forks:

   .. code-block:: bash

       $ simex e launch --launch-through=fork

   Output:

   ::

       Launching run bubble-sort/uniform-n1000-s1[0] on local machine
       Launching run bubble-sort/uniform-n1000-s2[0] on local machine
       Launching run bubble-sort/uniform-n1000-s3[0] on local machine
       Launching run insertion-sort/uniform-n1000-s1[0] on local machine
       Launching run insertion-sort/uniform-n1000-s2[0] on local machine
       Launching run insertion-sort/uniform-n1000-s3[0] on local machine

   View the status of the experiments:

   .. code-block:: bash

       $ simex e list

   Output:

   ::

       Experiment      Instance              Status
       ----------      --------              ------
       bubble-sort     uniform-n1000-s1 [0]  finished
       bubble-sort     uniform-n1000-s2 [0]  finished
       bubble-sort     uniform-n1000-s3 [0]  finished
       insertion-sort  uniform-n1000-s1 [0]  finished
       insertion-sort  uniform-n1000-s2 [0]  finished
       insertion-sort  uniform-n1000-s3 [0]  finished
       6 experiments in total

5. Evaluate the results.

   To evaluate the experiment results, we call the ``eval.py`` script, which
   uses the pandas package to aggregate the results. Please make sure that
   pandas is installed on your machine, or install it via
   ``pip3 install pandas``. The script also uses the simexpal Python interface
   (i.e., the functions ``collect_successful_results()`` and
   ``open_output_file()``) to gather all results.

   .. code-block:: bash

       $ python3 eval.py

   Output:

   ::

                       comparisons          swaps      time
       experiment
       bubble-sort        499500.0  253437.333333  0.053750
       insertion-sort     241891.0  257609.000000  0.027219

.. tip::

   simexpal supports autocomplete via `argcomplete `_.
   To enable autocomplete, install ``argcomplete`` and enable global completion:

   .. code-block:: bash

       $ pip3 install argcomplete
       $ activate-global-python-argcomplete

Running Experiments
-------------------

Let us now take a closer look at running experiments. As an example, we compare
the sorting algorithms *insertion sort* and *bubble sort* on a set of
instances. You can find the resources in our `example folder
`_.

If you look at the `sorting directory
`_ in our
examples, you will find `my_sort.py
`_,
which implements the two algorithms *insertion sort* and *bubble sort*.

``my_sort.py`` expects two arguments:

1. The algorithm name, passed via ``--algo`` (e.g., *insertion-sort* or
   *bubble-sort*).
2. The path to the instance.
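
The argument handling inside ``my_sort.py`` is not reproduced in this guide.
As a rough orientation, the following hypothetical sketch shows a command-line
interface of the same shape (a ``--algo`` option plus a positional instance
path) and a reader for the one-integer-per-line instance format described
below. The helper names ``parse_args``, ``parse_instance`` and
``read_instance`` are our own and do not come from the simexpal repository.

.. code-block:: python

    import argparse
    from pathlib import Path


    def parse_args(argv=None):
        """Parse a command line shaped like: my_sort.py --algo=NAME INSTANCE.

        Hypothetical sketch; the real my_sort.py may define its options differently.
        """
        parser = argparse.ArgumentParser(description="Sort the integers of an instance file.")
        parser.add_argument("--algo", required=True,
                            help="sorting algorithm, e.g. insertion-sort or bubble-sort")
        parser.add_argument("instance", type=Path, help="path to the instance file")
        return parser.parse_args(argv)


    def parse_instance(lines):
        """Convert lines holding one integer each into a list; blank lines are skipped."""
        return [int(line) for line in lines if line.strip()]


    def read_instance(path):
        """Read an instance file from disk (one integer per line)."""
        with open(path) as f:
            return parse_instance(f)

Calling ``parse_args(["--algo=insertion-sort", "./instances/random-500"])``
yields ``algo == "insertion-sort"`` together with the instance path, mirroring
the invocation used in this section.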

In the example above, we generated several instances, which were written to
the ``instances`` folder located next to the ``experiments.yml`` file.

For this example, we generate a new instance using the ``generate.py`` script.
First, we navigate to the ``sorting`` folder:

.. code-block:: bash

    $ cd examples/sorting

Thereafter, we generate a new instance file that contains 500 randomly
generated integers, one integer per line:

.. code-block:: bash

    $ python3 generate.py -o ./instances/random-500 500
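
``generate.py`` itself is not shown here. The following hypothetical sketch
illustrates what such a generator does: it draws ``n`` pseudo-random integers
from a seedable source (compare the ``--seed`` arguments used in
``experiments.yml``) and writes one integer per line. The value range below
10,000,000 is an arbitrary choice for illustration, not taken from
``generate.py``.

.. code-block:: python

    import random


    def generate_instance(n, seed=None):
        """Return n pseudo-random integers; the same seed yields the same list."""
        rng = random.Random(seed)
        # Hypothetical value range, chosen only for illustration.
        return [rng.randrange(10_000_000) for _ in range(n)]


    def write_instance(path, values):
        """Write one integer per line, the instance format used in this example."""
        with open(path, "w") as f:
            for value in values:
                f.write(f"{value}\n")

For instance, ``write_instance("instances/random-500", generate_instance(500, seed=1))``
would produce a file of the same shape as the command above. Seeding makes the
generated data reproducible; the generator invocations in ``experiments.yml``
pass a distinct ``--seed`` value for each instance.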

Now we can run *insertion-sort* on the newly generated instance:

.. code-block:: bash

    $ python3 my_sort.py --algo=insertion-sort ./instances/random-500

Output:

::

    ...
    - 9949624
    - 9984385
    - 9984385
    swaps: 65114
    time: 0.006702899932861328
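
The implementation inside ``my_sort.py`` is not reproduced here. To make the
reported counters concrete, the following hypothetical sketch implements both
algorithms on a copy of the input and counts comparisons and adjacent swaps.
The counting conventions of ``my_sort.py`` may differ from the ones chosen
here.

.. code-block:: python

    def insertion_sort(values):
        """Return (sorted copy, counters); each swap moves an element one step left."""
        a = list(values)
        comparisons = swaps = 0
        for i in range(1, len(a)):
            j = i
            while j > 0:
                comparisons += 1
                if a[j - 1] <= a[j]:
                    break  # element is in order relative to the sorted prefix
                a[j - 1], a[j] = a[j], a[j - 1]
                swaps += 1
                j -= 1
        return a, {"comparisons": comparisons, "swaps": swaps}


    def bubble_sort(values):
        """Return (sorted copy, counters) of a bubble sort without early exit."""
        a = list(values)
        comparisons = swaps = 0
        n = len(a)
        for i in range(n - 1):
            # After pass i, the largest i + 1 elements are in their final places.
            for j in range(n - 1 - i):
                comparisons += 1
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
                    swaps += 1
        return a, {"comparisons": comparisons, "swaps": swaps}

In both sketches, the number of swaps equals the number of inversions in the
input. This bubble sort variant always performs n(n - 1)/2 comparisons, i.e.,
499500 for n = 1000, which matches the ``comparisons`` value reported for
``bubble-sort`` in the evaluation output above.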

Let us continue by configuring simexpal to automate the experimental pipeline.
Consider the ``experiments.yml`` `file
`_
in the ``sorting`` directory. It contains the simexpal configuration that is
read to run the experiments on the desired instances. The file is structured as
follows:

.. literalinclude:: ../examples/sorting/experiments.yml
   :linenos:
   :language: yaml
   :caption: Example of an experiments.yml simexpal configuration file.

The structure of the configuration file will get more attention later on. At
this point, our ``sorting`` directory looks like this (simplified):

::

    sorting
    ├── eval.py
    ├── experiments.yml
    ├── generate.py
    ├── my_sort.py
    └── instances
        ├── random-500
        ├── uniform-n1000-s1
        ├── uniform-n1000-s2
        └── uniform-n1000-s3

To use this instance in our experiments, we add a local instance with its name
*random-500* (without extension) to the ``instances`` stanza:

.. code-block:: yaml

    instances:
      - generator:
          args: ['./generate.py', '--seed=1', '1000']
        items:
          - uniform-n1000-s1
      - generator:
          args: ['./generate.py', '--seed=2', '1000']
        items:
          - uniform-n1000-s2
      - generator:
          args: ['./generate.py', '--seed=3', '1000']
        items:
          - uniform-n1000-s3
      - repo: local
        items:
          - random-500

    experiments:
      - name: insertion-sort
        args: ['./my_sort.py', '--algo=insertion-sort', '@INSTANCE@']
        stdout: out
      - name: bubble-sort
        args: ['./my_sort.py', '--algo=bubble-sort', '@INSTANCE@']
        stdout: out
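
The ``args`` lists above contain the placeholder ``@INSTANCE@``. As a rough
illustration of what this placeholder means (not simexpal's actual
implementation), the following hypothetical sketch expands the command line of
every experiment/instance pair. It assumes that each instance resolves to a
file named after it in the ``instances`` folder.

.. code-block:: python

    def expand_args(args, instance_path):
        """Substitute the @INSTANCE@ placeholder in every argument."""
        return [arg.replace("@INSTANCE@", instance_path) for arg in args]


    def expand_all(experiments, instances, instance_dir="instances"):
        """Yield (experiment name, instance name, argv) for every combination."""
        for experiment in experiments:
            for instance in instances:
                # Assumed layout: instances/<instance short name>.
                path = f"{instance_dir}/{instance}"
                yield experiment["name"], instance, expand_args(experiment["args"], path)

With the two experiments and four instances from the configuration above,
``expand_all`` yields eight command lines, which corresponds to the
``8 experiments in total`` reported by ``simex e list`` below.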

After completing this step, we can start using simexpal to run our
experiments, including the new instance. The @-variable ``@INSTANCE@`` is
replaced by each instance in turn. A complete list of experiments and their
status can be displayed with:

.. code-block:: bash

    $ simex e list

The color of each line represents the status of the experiment:

.. role:: green
.. role:: yellow
.. role:: red

.. raw:: html

    <style> .green {color: green;} .yellow {color: yellow;} .red {color: red;} </style>

- green :green:`🮆` represents *finished*
- yellow :yellow:`🮆` represents *running*
- red :red:`🮆` represents *failed*
- the default color represents *not executed*

Experiments can be launched with:

.. code-block:: bash

    $ simex e launch --launch-through=fork

This command launches all experiments that have not been executed yet on the
local machine. After all experiments have been run, all entries should be
marked as finished:

.. code-block:: bash

    $ simex e list

Output:

::

    Experiment      Instance              Status
    ----------      --------              ------
    bubble-sort     random-500 [0]        finished
    bubble-sort     uniform-n1000-s1 [0]  finished
    bubble-sort     uniform-n1000-s2 [0]  finished
    bubble-sort     uniform-n1000-s3 [0]  finished
    insertion-sort  random-500 [0]        finished
    insertion-sort  uniform-n1000-s1 [0]  finished
    insertion-sort  uniform-n1000-s2 [0]  finished
    insertion-sort  uniform-n1000-s3 [0]  finished
    8 experiments in total

Evaluating Results
------------------

After the experiments have been run, simexpal can assist with locating and
collecting output data. For this purpose, simexpal can be imported as a Python
package. As simexpal is agnostic to output formats and algorithms, you need to
provide functionality to parse output files and evaluate results. Parsing
output files can usually be greatly simplified by using standardized formats
and appropriate libraries.

The example below (i.e., ``eval.py`` from ``examples/sorting/``) demonstrates
this concept. It uses the simexpal Python package to obtain all output files
and metadata about them. In particular, it uses the functions
``collect_successful_results()`` and ``open_output_file()`` for this purpose.
A user-supplied parsing function is employed to parse the output files.

.. literalinclude:: ../examples/sorting/eval.py
   :linenos:
   :lines: 7-24
   :language: python

Run this Python script to evaluate the experiments:

.. code-block:: bash

    $ python3 eval.py

Output:

::

                    comparisons      swaps      time
    experiment
    bubble-sort        405812.5  205647.25  0.043447
    insertion-sort     196834.5  208978.00  0.021463
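
If you prefer to avoid third-party dependencies, parsing and averaging can also
be sketched with the Python standard library alone. The following is a
hypothetical alternative to the pandas-based ``eval.py``, not its actual code.
It assumes that each run prints ``key: number`` lines, as in the ``my_sort.py``
output shown earlier, and that you pass it the text of each output file
located via simexpal.

.. code-block:: python

    from collections import defaultdict
    from statistics import mean


    def parse_output(text):
        """Collect 'key: number' lines of one run; list items and other lines are skipped."""
        metrics = {}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if not sep or line.lstrip().startswith("-"):
                continue
            try:
                metrics[key.strip()] = float(value)
            except ValueError:
                continue  # not a numeric metric
        return metrics


    def mean_by_experiment(results):
        """Average each metric per experiment, given (experiment, metrics) pairs."""
        grouped = defaultdict(lambda: defaultdict(list))
        for experiment, metrics in results:
            for key, value in metrics.items():
                grouped[experiment][key].append(value)
        return {exp: {key: mean(vals) for key, vals in per_key.items()}
                for exp, per_key in grouped.items()}

Feeding ``mean_by_experiment`` one ``(experiment name, parse_output(text))``
pair per run yields per-experiment averages of the kind shown in the table
above.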

Managing Instances
------------------

Before launching the experiments, we need to make sure that all our instances
are available. Instances can be checked with:

.. code-block:: bash

    $ simex instances list

An example output:

::

    Instance short name    Instance sets
    -------------------    -------------
    random-500
    uniform-n1000-s1
    uniform-n1000-s2
    uniform-n1000-s3

Unavailable instances are shown in red, while available instances are shown in
green.

If instances are taken from a public repository, simexpal can download them
automatically. The `YAML file
`_
under ``examples/download_instances``, shown below, is configured to use
instances from `SNAP
`_.

.. literalinclude:: ../examples/download_instances/experiments.yml
   :linenos:
   :language: yaml
   :caption: experiments.yml with instances from public repositories.

Please navigate to the subfolder ``examples/download_instances`` and list all
instances:

.. code-block:: bash

    $ simex instances list

Output:

::

    Instance short name    Instance sets
    -------------------    -------------
    cit-HepTh
    facebook_combined

Both instances are shown as unavailable. To download them, use the following
command:

.. code-block:: bash

    $ simex instances install

Output:

::

    Downloading instance 'cit-HepTh' from snap repository
    [==================================================]100% (1.26MB/1.26MB)
    Downloading instance 'facebook_combined' from snap repository
    [==================================================]100% (0.21MB/0.21MB)

When you now repeat the ``simex instances list`` command, all instances are
shown as available.
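
``simex instances list`` already reports which instances are available. If you
want a similar pre-flight check in your own scripts, the following hypothetical
sketch tests for the instance files directly. It assumes that each instance is
stored as a file named after its short name in the ``instances`` folder (as
with *random-500* in the sorting example); this layout is an assumption and may
not hold for every repository type.

.. code-block:: python

    from pathlib import Path


    def instance_status(names, instance_dir="instances"):
        """Map each instance short name to whether its (assumed) file exists."""
        base = Path(instance_dir)
        return {name: (base / name).is_file() for name in names}


    def missing_instances(names, instance_dir="instances"):
        """Return the short names whose files are not present yet."""
        return [name for name, present in instance_status(names, instance_dir).items()
                if not present]

For example, ``missing_instances(["cit-HepTh", "facebook_combined"])`` run
before ``simex instances install`` would list the instances that still need to
be downloaded, provided the layout assumption holds.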

.. _parametersAndVariants:

Parameterize Algorithms
-----------------------

When benchmarking algorithms, it is often useful to compare different variants
or parameter configurations. simexpal can manage those variants without
requiring you to duplicate the ``experiments`` stanza multiple times.

As an example, imagine that you want to benchmark the running time of a *merge
sort* algorithm using different minimum block sizes and different sorting
algorithms for these blocks. The ``my_sort.py`` script under
``examples/sorting`` provides an implementation of *merge sort*.

The following extends the original ``experiments.yml`` file with variants for
*merge sort* (the ``instances`` stanza remains unchanged and is omitted):

.. code-block:: yaml

    experiments:
      - name: insertion-sort
        args: ['./my_sort.py', '--algo=insertion-sort', '@INSTANCE@']
        stdout: out
      - name: bubble-sort
        args: ['./my_sort.py', '--algo=bubble-sort', '@INSTANCE@']
        stdout: out
      - name: 'merge-sort'
        stdout: out
        args: ['python3', 'my_sort.py', '--algo=merge-sort', '@EXTRA_ARGS@', '@INSTANCE@']

    variants:
      - axis: 'block-size'
        items:
          - name: 'bs2'
            extra_args: ['--block_size=2']
          - name: 'bs20'
            extra_args: ['--block_size=20']
          - name: 'bs200'
            extra_args: ['--block_size=200']
      - axis: 'block-algo'
        items:
          - name: 'bba-insertion'
            extra_args: ['--block_algorithm=insertion-sort']
          - name: 'bba-selection'
            extra_args: ['--block_algorithm=bubble-sort']

    matrix:
      include:
        - experiments: [insertion-sort, bubble-sort]
          axes: []
        - experiments: [merge-sort]
          axes: [block-size, block-algo]

First, we define the new experiment ``merge-sort`` and add the @-variable
``@EXTRA_ARGS@`` to its arguments. This allows simexpal to insert additional
program arguments, namely the ``extra_args`` of the selected variants. Second,
we define ``variants``: each ``axis`` groups the possible values of one
parameter. Finally, the ``matrix`` stanza defines groups of experiments: for
``insertion-sort`` and ``bubble-sort`` the axes remain empty, as we do not want
to parameterize these experiments, while the group with ``merge-sort`` uses the
variant axes ``block-size`` and ``block-algo``.

After successfully running all (remaining) experiments, listing the experiments
with:

.. code-block:: bash

    $ simex e list --full

prints:

::

    Experiment                          started  finished  failures  other
    ----------                          -------  --------  --------  -----
    bubble-sort                                  4/4
    insertion-sort                               4/4
    merge-sort ~ bba-insertion, bs2              4/4
    merge-sort ~ bba-insertion, bs20             4/4
    merge-sort ~ bba-insertion, bs200            4/4
    merge-sort ~ bba-selection, bs2              4/4
    merge-sort ~ bba-selection, bs20             4/4
    merge-sort ~ bba-selection, bs200            4/4
    32 experiments in total
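
The total of 32 follows from expanding each ``matrix`` group: every group
contributes one configuration per combination of the variants on its axes, and
each configuration runs on all four instances, i.e., (1 + 1 + 3 × 2) × 4 = 32.
The following hypothetical sketch performs this enumeration and splices the
``extra_args`` of the chosen variants into the ``@EXTRA_ARGS@`` position. The
helper names are our own, and treating ``@EXTRA_ARGS@`` as "insert all
``extra_args`` of the selected variants here" is an assumption about
simexpal's behavior, not a description of its internals.

.. code-block:: python

    from itertools import product

    # Variant axes of the merge-sort example: axis -> {variant name: extra_args}.
    VARIANTS = {
        "block-size": {
            "bs2": ["--block_size=2"],
            "bs20": ["--block_size=20"],
            "bs200": ["--block_size=200"],
        },
        "block-algo": {
            "bba-insertion": ["--block_algorithm=insertion-sort"],
            "bba-selection": ["--block_algorithm=bubble-sort"],
        },
    }


    def variant_combinations(axes):
        """Yield (variant names, extra_args) for each combination on the given axes.

        With no axes, itertools.product yields a single empty combination, so an
        unparameterized experiment still gets exactly one configuration.
        """
        choices = [list(VARIANTS[axis].items()) for axis in axes]
        for combo in product(*choices):
            names = tuple(name for name, _ in combo)
            extra = [arg for _, args in combo for arg in args]
            yield names, extra


    def expand_extra_args(args, extra):
        """Replace an '@EXTRA_ARGS@' list element by the given extra arguments."""
        expanded = []
        for arg in args:
            expanded.extend(extra if arg == "@EXTRA_ARGS@" else [arg])
        return expanded

For example, ``variant_combinations(["block-size", "block-algo"])`` yields six
configurations and ``variant_combinations([])`` yields one, matching the
eight experiment rows in the listing above.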

Builds and Revisions
--------------------

To make sure that experiments always run using the same program binaries,
simexpal can manage internal projects as well as external Git repositories,
automating the build process.

Automated builds are controlled by the ``builds`` and ``revisions`` stanzas
in ``experiments.yml``.

For the remainder of this section, we will use the `C++ implementation
`_ of the
sorting example. We will use simexpal to resolve the dependency and to configure
and compile the C++ project.

In this example, simexpal invokes CMake commands to build the program. Since
the build steps are merely lists of shell commands, you may use any build
system.

To enable automated builds, we need to add ``builds`` and ``revisions`` stanzas
to ``experiments.yml``. For the experiments to use the correct program, they
must use the ``simexpal`` build.

.. literalinclude:: ../examples/sorting_cpp/experiments.yml
   :emphasize-lines: 1, 18, 39
   :linenos:
   :language: yaml
   :caption: experiments.yml for the C++ example of sorting algorithms.

After navigating to ``examples/sorting_cpp``, we must first generate the
instances:

.. code-block:: bash

    $ simex instances install

Now we build the C++ project:

.. code-block:: bash

    $ simex builds make

Once the build process is finished, the experiments can be started as usual
using:

.. code-block:: bash

    $ simex e launch --launch-through=fork
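
As noted above, build steps are plain shell commands. Conceptually, executing
them amounts to running each command in order and stopping at the first
failure. The following hypothetical sketch illustrates this idea with Python's
``subprocess`` module; it is not how ``simex builds make`` is implemented.

.. code-block:: python

    import subprocess


    def run_steps(steps, cwd=None):
        """Run shell command strings in order; raise CalledProcessError on failure."""
        for step in steps:
            # shell=True lets each string use shell syntax, as if typed in a terminal.
            subprocess.run(step, shell=True, check=True, cwd=cwd)

A CMake project could, for instance, be configured and compiled with
``run_steps(["cmake -S . -B build", "cmake --build build"], cwd="path/to/source")``,
where ``path/to/source`` is a placeholder.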

Run Matrix
----------

In the :ref:`parametersAndVariants` section, we saw how to use simexpal to
specify axes and variants of parameters. For the following example, we take a
look at the ``variants`` stanza of the `C++ Sorting Example
`_:

.. literalinclude:: ../examples/sorting_cpp/experiments.yml
   :emphasize-lines: 43
   :linenos:
   :language: yaml
   :caption: experiments.yml for the C++ example of sorting algorithms.

By default, simexpal runs every combination (i.e., the Cartesian product) of
the experiment, instance, variant, and revision sets. However, there are cases
where this is not desired. For example, you might only want to run certain
instance/variant combinations, not all.

Assume you want to run the *quick sort* algorithm with *insertion sort* as base
block algorithm and ``32`` as minimum block size. Additionally, you want to run
*quick sort* with *bubble sort* as base block algorithm and use both ``32`` and
``64`` as minimum block sizes.

To achieve this, we need to add a ``matrix`` stanza to ``experiments.yml``. In
our example, this looks like:

.. code-block:: yaml

    matrix:
      include:
        - experiments: [quick-sort]
          variants: [ba-insert, bs32]
          revisions: [main]
        - experiments: [quick-sort]
          variants: [ba-bubble]
          revisions: [main]

We could explicitly specify ``[ba-bubble, bs32, bs64]`` for the variants of the
second entry. In this case, however, this is not necessary: ``bs32`` and
``bs64`` are the only values of the ``block-size`` axis, and since the entry
does not restrict this axis, all of its values are used.
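
The effect of the two ``include`` entries can be reproduced with a small
hypothetical model: within an entry, listed variants restrict their axis, while
axes without a listed variant contribute all of their values, as described
above. The variant names below are taken from the listing that follows and may
be incomplete; the axis name ``block-algo`` is a placeholder of ours, and
experiments and revisions are left out for brevity.

.. code-block:: python

    from itertools import product

    # Axis values as they appear in the listing; "block-algo" is a placeholder
    # name, and the real experiments.yml may define further variants.
    AXES = {
        "block-algo": ["ba-insert", "ba-bubble"],
        "block-size": ["bs32", "bs64"],
    }
    INSTANCES = ["uniform-n1000-s1", "uniform-n1000-s2", "uniform-n1000-s3"]


    def expand_entry(variants):
        """Variant combinations selected by one include entry.

        Listed variants restrict their axis; an axis without any listed variant
        contributes all of its values.
        """
        per_axis = []
        for values in AXES.values():
            chosen = [value for value in values if value in variants]
            per_axis.append(chosen or values)
        return list(product(*per_axis))


    def expand_matrix(include):
        """Set of (variant combination, instance) runs selected by all entries."""
        runs = set()
        for entry in include:
            for combo in expand_entry(entry["variants"]):
                for instance in INSTANCES:
                    runs.add((combo, instance))
        return runs

``expand_matrix([{"variants": ["ba-insert", "bs32"]}, {"variants": ["ba-bubble"]}])``
contains nine runs: one variant combination from the first entry and two from
the second, each on three instances.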

Using ``simex experiments list --full``, we can confirm that we got the desired
experiments:

::

    Experiment                           Instance              Status
    ----------                           --------              ------
    quick-sort ~ ba-bubble, bs32 @ main  uniform-n1000-s1 [0]  not submitted
    quick-sort ~ ba-bubble, bs32 @ main  uniform-n1000-s2 [0]  not submitted
    quick-sort ~ ba-bubble, bs32 @ main  uniform-n1000-s3 [0]  not submitted
    quick-sort ~ ba-bubble, bs64 @ main  uniform-n1000-s1 [0]  not submitted
    quick-sort ~ ba-bubble, bs64 @ main  uniform-n1000-s2 [0]  not submitted
    quick-sort ~ ba-bubble, bs64 @ main  uniform-n1000-s3 [0]  not submitted
    quick-sort ~ ba-insert, bs32 @ main  uniform-n1000-s1 [0]  not submitted
    quick-sort ~ ba-insert, bs32 @ main  uniform-n1000-s2 [0]  not submitted
    quick-sort ~ ba-insert, bs32 @ main  uniform-n1000-s3 [0]  not submitted
    9 experiments in total

Launchers and Support for Batch Schedulers
------------------------------------------

To submit experiments to a batch scheduler, simexpal allows you to define
"launchers". A launcher specifies where and how simexpal should submit
experiments. If no launcher is specified (neither a default launcher nor the
``--launch-through`` option), simexpal launches experiments on the local
machine.

Launchers are defined in the file ``~/.simexpal/launchers.yml``. For example,
to submit jobs to the Slurm partition ``cluster9``, a launcher configuration
could look like this:

.. code-block:: yaml

    launchers:
      - name: local-cluster
        default: true
        scheduler: slurm
        queue: cluster9

When launching experiments using ``simex experiments launch``, you can specify
the ``--launcher`` option (e.g., ``simex experiments launch --launcher
local-cluster``) to select a specific launcher.

.. note::

   The ``default: true`` attribute of a launcher overrides the default behavior
   of launching on the local machine. Hence, at most one launcher may be marked
   with ``default: true``.
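
The selection rules described in this section can be summarized in a short
hypothetical sketch that operates on the already parsed contents of
``launchers.yml`` (a list of dictionaries). It is not simexpal's
implementation: the ``--launch-through`` option is not modeled, and the
returned ``{"name": "local"}`` entry merely stands in for launching on the
local machine.

.. code-block:: python

    def select_launcher(launchers, requested=None):
        """Pick a launcher: explicit request, else the unique default, else local."""
        by_name = {launcher["name"]: launcher for launcher in launchers}
        if requested is not None:
            if requested not in by_name:
                raise KeyError(f"unknown launcher: {requested}")
            return by_name[requested]
        defaults = [launcher for launcher in launchers if launcher.get("default")]
        if len(defaults) > 1:
            raise ValueError("at most one launcher may set 'default: true'")
        # Hypothetical stand-in for "launch on the local machine".
        return defaults[0] if defaults else {"name": "local"}

With the ``local-cluster`` configuration above, ``select_launcher(launchers)``
returns that launcher even without ``--launcher``, which reflects the override
described in the note.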