Instances

On this page we describe how to specify instances in the experiments.yml file. You can list local instances that consist of zero or more files. Moreover, simexpal can download remote instances from the SNAP repository, Git repositories and arbitrary URLs. It is also possible to assign instances to instance sets, which make the command line interface more efficient to use and are useful when defining the run matrix.

Instance Directory

The instance directory is the directory that stores all the instances. The path can be set via the instdir key:

How to set the instance directory in the experiments.yml file.
instdir: "<path_to_instance_directory>"

If instdir is not set, it will default to <path_to_experiments.yml_directory>/instances. The instance directory will be created if it does not exist already.

Local Instances

To add local instances, we set the value of the instances key to a list of dictionaries. Each dictionary has two keys and an optional third key:

  • repo: source of the instances

  • subdir: subdirectory of the instances in the instdir directory (optional)

  • items: list of instances

An example of how to list a local set of instances is:

How to list local instances in the experiments.yml file.
instances:
  - repo: local
    subdir: large
    items:
      - random_500.list
      - partially_sorted_500.list
  - repo: local
    subdir: small
    items:
      - random_100.list
      - partially_sorted_100.list

instdir: "./instances"

The above setup resembles the following structure:

example
├── instances
│   ├── large
│   │   ├── partially_sorted_500.list
│   │   └── random_500.list
│   └── small
│       ├── partially_sorted_100.list
│       └── random_100.list
└── experiments.yml
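The mapping from the configuration to this directory layout can be sketched in Python. This is only an illustration, not simexpal's implementation: the helper name local_instance_paths is hypothetical, the configuration is passed in as an already parsed dictionary, and a relative instdir is assumed to be interpreted relative to the directory containing experiments.yml.

```python
from pathlib import Path
from typing import List


def local_instance_paths(config: dict, yml_dir: Path) -> List[Path]:
    """Return the expected file path of every local instance (hypothetical helper)."""
    # instdir defaults to <path_to_experiments.yml_directory>/instances.
    # Assumption: a relative instdir is relative to the experiments.yml directory.
    instdir = yml_dir / config.get("instdir", "instances")
    paths = []
    for block in config.get("instances", []):
        if block.get("repo") != "local":
            continue
        # A missing subdir means the files sit directly in instdir.
        subdir = block.get("subdir", "")
        for item in block["items"]:
            paths.append(instdir / subdir / item)
    return paths


# The example configuration from above, written as a parsed dictionary.
example_config = {
    "instdir": "./instances",
    "instances": [
        {"repo": "local", "subdir": "large",
         "items": ["random_500.list", "partially_sorted_500.list"]},
        {"repo": "local", "subdir": "small",
         "items": ["random_100.list", "partially_sorted_100.list"]},
    ],
}

for path in local_instance_paths(example_config, Path("example")):
    print(path)
```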

Remote Instances

It is possible to let simexpal download instances from SNAP, a URL and a Git repository. In the sections below, we will see how to list the different kinds of remote instances in the experiments.yml.

After listing the instances we need to use

$ simex instances install

to download the instances into the instance directory.

Note

1st December 2020: It is no longer possible to automatically download KONECT instances as the website is no longer publicly available. It is still possible to list them and execute supported actions, e.g., transforming the instances to edgelist format via simex instances run-transform --transform='to_edgelist', if you already have them saved locally.

Instances From SNAP

To list instances from the SNAP repository, set the value of repo to snap and put the file names without the .txt.gz extension in the items list.

For instances from the KONECT repository, set the value of repo to konect and put the internal names of the KONECT instances in the items list.

How to list instances from the SNAP and KONECT repositories in the experiments.yml file.
instdir: "<path_to_instance_directory>"
instances:
  - repo: snap
    items:
      - facebook_combined
      - wiki-Vote
  - repo: konect
    items:
      - dolphins
      - ucidata-zachary

Instances From a URL

To list instances from a URL, we use the following keys:

  • method: download method

  • url: URL of the instance

We set the value of the method key to 'url' and specify the URL of the instance in the url key.

How to list instances from a URL in the experiments.yml file.
instdir: "<path_to_instance_directory>"
instances:
  - method: url
    url: 'https://raw.githubusercontent.com/hu-macsy/simexpal/master/simexpal/schemes/@INSTANCE_FILENAME@'
    items:
      - 'experiments.json'
      - 'launchers.json'

The @-variable @INSTANCE_FILENAME@ in the URL (from the example above) resolves to the elements in the items key. Thus, we have listed the two instances experiments.json and launchers.json, which come from https://raw.githubusercontent.com/hu-macsy/simexpal/master/simexpal/schemes/experiments.json and https://raw.githubusercontent.com/hu-macsy/simexpal/master/simexpal/schemes/launchers.json respectively.
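The substitution described above can be sketched in a few lines of Python. The function name resolve_instance_urls is hypothetical and the plain string replacement is an assumption about how the @-variable behaves, not a description of simexpal's internals.

```python
from typing import Dict, List


def resolve_instance_urls(url_template: str, items: List[str]) -> Dict[str, str]:
    """Map each item to the URL obtained by filling in @INSTANCE_FILENAME@ (hypothetical helper)."""
    return {
        item: url_template.replace("@INSTANCE_FILENAME@", item)
        for item in items
    }


template = (
    "https://raw.githubusercontent.com/hu-macsy/simexpal/"
    "master/simexpal/schemes/@INSTANCE_FILENAME@"
)
for name, url in resolve_instance_urls(template, ["experiments.json", "launchers.json"]).items():
    print(name, "<-", url)
```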

Instances From Git

To list instances from a Git repository, we use the following keys:

  • method: download method

  • git: link to the Git repository

  • repo_name: name of the directory to clone into

  • commit: SHA-1 hash

  • git_subdir: subdirectory of the instance in the Git repository

We set the value of the method key to 'git' and specify the Git URL of the instance in the git key. The repo_name key states the local directory name of the Git repository. When installing the instance, the Git repository will be stored in <instance_dir>/<repo_name>. The commit value specifies the version of the instance, given as a SHA-1 hash. It is also possible to specify other revision parameters, e.g. ref names such as branch names. For reproducibility reasons, a SHA-1 hash is recommended, since a ref name can point to different commits over time. If the instance is not located in the root directory of the Git repository, we need to specify the subdirectory of the instance in git_subdir.

How to list instances from a Git repository in the experiments.yml file.
instdir: "<path_to_instance_directory>"
instances:
  - method: git
    git: 'https://github.com/hu-macsy/simexpal'
    repo_name: 'foo'
    commit: 'master'
    items:
      - 'setup.py'
      - 'pytest.ini'
  - method: git
    git: 'https://github.com/hu-macsy/simexpal'
    repo_name: 'foo'
    commit: 'd5e598f292b90cd7ef2e77d7a478ec52d42279df'
    git_subdir: 'simexpal/schemes/'
    items:
      - 'experiments.json'
      - 'launchers.json'

In the example above we clone the simexpal repository into <instance_dir>/foo. Then setup.py and pytest.ini of the current master branch and simexpal/schemes/experiments.json and simexpal/schemes/launchers.json of the specified commit d5e598f292b90cd7ef2e77d7a478ec52d42279df will be downloaded into the instance directory.
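To make explicit which file of which revision each item refers to, the following Python sketch pairs every item with its revision and its path inside the repository. The helper git_instance_sources is hypothetical, and joining git_subdir and the item name is an assumption based on the description above; the sketch does not call Git and does not reflect how simexpal retrieves the files internally.

```python
import posixpath
from typing import List, Tuple


def git_instance_sources(block: dict) -> List[Tuple[str, str]]:
    """Return (revision, path inside the repository) for every item of a Git block."""
    # Without git_subdir the items are assumed to live in the repository root.
    subdir = block.get("git_subdir", "")
    return [(block["commit"], posixpath.join(subdir, item)) for item in block["items"]]


blocks = [
    {"method": "git", "git": "https://github.com/hu-macsy/simexpal",
     "repo_name": "foo", "commit": "master",
     "items": ["setup.py", "pytest.ini"]},
    {"method": "git", "git": "https://github.com/hu-macsy/simexpal",
     "repo_name": "foo", "commit": "d5e598f292b90cd7ef2e77d7a478ec52d42279df",
     "git_subdir": "simexpal/schemes/",
     "items": ["experiments.json", "launchers.json"]},
]

for block in blocks:
    for revision, path in git_instance_sources(block):
        print(f"{revision}:{path}")
```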

Multiple Input Files

Until now we only considered experiments with one input file, which might not always be the case. Below we distinguish two cases:

  1. The input filenames only differ in the extension, e.g. foo.graph and foo.xyz.

  2. The input filenames are arbitrary.

Multiple Extensions

Listing instances with multiple extensions is similar to listing Local Instances. The difference is that we will add the following key:

  • extensions: list of extensions that the instance has

How to list instances with multiple extensions in the experiments.yml file.
instdir: "<path_to_instance_directory>"
instances:
  - repo: local
    extensions:
      - graph
      - xyz
    items:
      - foo
      - bar

The experiments.yml file above will create the instance foo which contains the files foo.graph and foo.xyz and the instance bar which contains the files bar.graph and bar.xyz.
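The expansion of items and extensions into file names can be sketched as follows. The function name expand_extensions is hypothetical; the sketch only mirrors the naming rule described above (instance name, a dot, then the extension).

```python
from typing import Dict, List


def expand_extensions(items: List[str], extensions: List[str]) -> Dict[str, List[str]]:
    """Map each instance name to the files it consists of (hypothetical helper)."""
    return {name: [f"{name}.{ext}" for ext in extensions] for name in items}


for name, files in expand_extensions(["foo", "bar"], ["graph", "xyz"]).items():
    print(name, "->", files)
```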

Arbitrary Input Files

To get an instance with arbitrary input files we will put a list of dictionaries as value for the items key. The dictionaries contain two keys:

  • name: name of the instance

  • files: list of files the instance consists of

How to list instances with arbitrary input files in the experiments.yml file.
instdir: "<path_to_instance_directory>"
instances:
  - repo: local
    items:
      - name: foo
        files:
          - file1
          - file2
      - name: bar
        files:
          - file3
          - file4

The experiments.yml file above will create the instance foo which contains the files file1 and file2 and the instance bar which contains the files file3 and file4.

Fileless Instances

There are cases where instances are not defined by a file but rather by some input parameters, e.g. algorithms that generate their data themselves and only need input parameters like --seed 10. Specifying fileless instances works similarly to specifying Arbitrary Input Files. The difference is that we set files: [] to indicate that we are dealing with a fileless instance and use the

  • extra_args: list of extra arguments

key to specify our extra arguments.

How to list fileless instances in the experiments.yml file.
instances:
  - repo: local
    items:
      - name: foo
        files: []
        extra_args: ['--seed', '10']

Note

If you get an error message pointing out that the experiment is fileless, check if you forgot to remove the @INSTANCE@ variable in the experiment argument list. Since the experiment does not take an instance as input, this variable must not be part of the argument list!

Generator Instances

It is possible to let simexpal generate instances by providing a program that writes to /dev/stdout. In order to do so, we specify the

  • generator: dictionary containing generator arguments

  • items: list of instances

keys.

How to list generator instances in the experiments.yml file.
instances:
  - generator:
      args: ['./generate.py', '--seed=1', '1000']
    items:
      - uniform-n1000-s1
  - generator:
      args: ['./generate.py', '--seed=2', '1000']
    items:
      - uniform-n1000-s2
  - generator:
      args: ['./generate.py', '--seed=3', '1000']
    items:
      - uniform-n1000-s3

In the example above we list the three instances uniform-n1000-s1, uniform-n1000-s2 and uniform-n1000-s3, which will be created from the program generate.py. It takes the following optional parameters

  • -o: path of output file (default: /dev/stdout),

  • --seed: seed for random generator (default: current system time),

  • --range: range of integers (default: 10e6)

and mandatory parameter

  • n: number of integers to generate

as input.

The command ./generate.py --seed=2 1000 creates 1000 random numbers from the seed 2 and writes them to /dev/stdout. Simexpal redirects the output written to /dev/stdout to the file /<instance_directory>/<instance_name>.
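For illustration, a generator with the interface described above could look like the following Python sketch. It is not the generate.py used in the example: the output format (one integer per line) and the value range [0, range) are assumptions, and the function names are hypothetical.

```python
#!/usr/bin/env python3
import argparse
import random
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate n random integers.")
    parser.add_argument("-o", dest="output", default="/dev/stdout",
                        help="path of output file")
    parser.add_argument("--seed", type=int, default=None,
                        help="seed for the random generator")
    parser.add_argument("--range", dest="value_range", type=int, default=int(10e6),
                        help="range of integers")
    parser.add_argument("n", type=int, help="number of integers to generate")
    return parser


def generate(n: int, seed=None, value_range: int = int(10e6)) -> list:
    # With seed=None, Python seeds from the system time or an OS randomness source.
    rng = random.Random(seed)
    # Assumption: values are drawn uniformly from [0, value_range).
    return [rng.randrange(value_range) for _ in range(n)]


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    numbers = generate(args.n, args.seed, args.value_range)
    # Assumption: one integer per line.
    with open(args.output, "w") as out:
        for number in numbers:
            out.write(f"{number}\n")


# Run the command line interface only when arguments were passed,
# so that loading the functions alone has no side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Because the sketch writes to /dev/stdout by default, it satisfies the requirement that a generator program writes its instance to /dev/stdout.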

Finally, we need to use

$ simex instances install

to generate the instances.

It is also possible to list more than one instance in the items key, e.g.

How to list generator instances with more than one item in the experiments.yml file.
instances:
  - generator:
      args: ['./generate.py', '1000']
    items:
      - uniform-n1000-s1
      - uniform-n1000-s2

Here we list the two instances uniform-n1000-s1 and uniform-n1000-s2. Both ./generate.py 1000 commands will use their respective system times as seed and create 1000 random numbers from them.

Extra Arguments

We can set extra arguments for instance blocks and individual instances, which can be appended to the experiment arguments when the respective instance is used. In order to specify such instances, we use the

  • extra_args: list of extra arguments

key.

Note

It is possible to have instances with common extra arguments and additional individual extra arguments by adding the extra_args key to the respective places (see below).

In order to specify common extra arguments for instance blocks, we simply add the extra_args key to them.

How to list instances with common extra arguments in the experiments.yml file.
instdir: "<path_to_instance_directory>"
instances:
  - repo: local
    extra_args: ['some','extra_args']
    ...
    items:
      - instance1
      - instance2
  - repo: local
    extra_args: ['some','extra_args']
    ...
    items:
      - instance3
      - instance4

In order to specify individual extra arguments for Local Instances, Remote Instances and Multiple Extensions, we need to change the items key from a list of instances to a list of dictionaries containing the

  • name: name of the instance and

  • extra_args

keys, e.g.,

How to list local/remote/multiple extension instances with individual extra arguments in the experiments.yml file.
instances:
  - repo: local # local instances with extra arguments
    items:
      - name: inst1
        extra_args: ['some','extra_args']
      - name: inst2
        extra_args: ['some','extra_args']
  - repo: snap # remote instances with extra arguments
    items:
      - name: facebook_combined
        extra_args: ['some', 'extra_args']
      - name: wiki-Vote
        extra_args: ['some', 'extra_args']
  - repo: local # multiple extension instances with extra args
    items:
      - name: inst3
        extra_args: ['some', 'extra_args']
      - name: inst4
        extra_args: ['some', 'extra_args']
    extensions: [ext1, ext2]

For Arbitrary Input Files we only need to add the extra_args key to the dictionaries of the instances, e.g.,

How to list arbitrary input file instances with individual extra arguments in the experiments.yml file.
instances:
  - repo: local
    items:
      - name: inst3
        files: [file1, file2]
        extra_args: ['some', 'extra', 'argument']
      - name: inst4
        files: [file1, file2]
        extra_args: ['some', 'extra', 'argument']
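How common and individual extra arguments might combine for each instance can be sketched in Python. The helper instance_extra_args is hypothetical, and the order used here (common arguments of the block first, individual arguments after them) is an assumption that this page does not confirm.

```python
from typing import Dict, List


def instance_extra_args(block: dict) -> Dict[str, List[str]]:
    """Collect the extra arguments of every instance in one instance block."""
    common = block.get("extra_args", [])
    result = {}
    for item in block["items"]:
        if isinstance(item, dict):
            # Assumption: common arguments come first, individual ones follow.
            result[item["name"]] = common + item.get("extra_args", [])
        else:
            # Plain item names only receive the common arguments.
            result[item] = list(common)
    return result


block = {
    "repo": "local",
    "extra_args": ["--common"],
    "items": [
        {"name": "inst1", "extra_args": ["--seed", "10"]},
        "inst2",
    ],
}
print(instance_extra_args(block))
```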

Instance Sets

It is possible to assign instances to instance sets. This is useful when trying to run experiments that have common instances. Assume you want to run an experiment on the two instances instance1 and instance2 and a different experiment on instance2 and instance3. To do this, you can use the following key:

  • set: list of sets the instance belongs to

How to assign instances to instance sets in the experiments.yml file.
instdir: "<path_to_instance_directory>"
instances:
  - repo: local
    set: [set1]
    items:
      - instance1
  - repo: local
    set: [set1, set2]
    items:
      - instance2
  - repo: local
    set: [set2]
    items:
      - instance3

In this way we have created the instance set set1, which contains instance1 and instance2, and the instance set set2, which contains instance2 and instance3.
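The resulting membership can be derived from the configuration with a short Python sketch. The function collect_instance_sets is a hypothetical helper operating on the parsed instances list; it only illustrates how the set key groups instances.

```python
from collections import defaultdict
from typing import Dict, List


def collect_instance_sets(instances: List[dict]) -> Dict[str, List[str]]:
    """Group instance names by the instance sets their blocks belong to."""
    sets = defaultdict(list)
    for block in instances:
        for item in block["items"]:
            # Items are either plain names or dictionaries with a name key.
            name = item["name"] if isinstance(item, dict) else item
            for set_name in block.get("set", []):
                sets[set_name].append(name)
    return dict(sets)


instances = [
    {"repo": "local", "set": ["set1"], "items": ["instance1"]},
    {"repo": "local", "set": ["set1", "set2"], "items": ["instance2"]},
    {"repo": "local", "set": ["set2"], "items": ["instance3"]},
]
print(collect_instance_sets(instances))
```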

Instance sets will also be useful when using the command line interface of simexpal and when defining the Run Matrix.

Next

To set up your automated builds, visit the Builds page. If you do not plan on using automated builds, you can visit the Experiments page to set up your experiments.