Instances
On this page we describe how to specify instances in the experiments.yml file. You can list
local instances that consist of zero or more files. Moreover, simexpal can download remote
instances from the SNAP repository, Git repositories and arbitrary URLs. It is also possible to
assign instances to instance sets, which enable a more efficient usage of the command line
interface and are useful when defining the run matrix.
Instance Directory
The instance directory is the directory that stores all the instances. The path can be set via
the instdir key:

    instdir: "<path_to_instance_directory>"

If instdir is not set, it will default to <path_to_experiments.yml_directory>/instances.
The instance directory will be created if it does not exist already.
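The default rule above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not simexpal's implementation; the helper name resolve_instdir and the example path /home/user/example are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_instdir(experiments_yml, instdir=None):
    """Return the instance directory (hypothetical helper, not simexpal's API)."""
    if instdir is not None:
        # An explicitly configured instdir is used as given.
        return PurePosixPath(instdir)
    # Otherwise fall back to <path_to_experiments.yml_directory>/instances.
    return PurePosixPath(experiments_yml).parent / "instances"


# Hypothetical project location, used only for illustration.
default_dir = resolve_instdir("/home/user/example/experiments.yml")
print(default_dir)
```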
Local Instances
To add local instances to the instances key, we add a list of dictionaries with two keys
and an optional third key to its value:
* repo: source of the instances
* subdir (optional): subdirectory of the instances in the instdir directory
* items: a list of instances
An example of how to list a local set of instances is:

    instances:
      - repo: local
        subdir: large
        items:
          - random_500.list
          - partially_sorted_500.list
      - repo: local
        subdir: small
        items:
          - random_100.list
          - partially_sorted_100.list

    instdir: "./instances"

The above setup corresponds to the following directory structure:

    example
    ├── instances
    │   ├── large
    │   │   ├── partially_sorted_500.list
    │   │   └── random_500.list
    │   └── small
    │       ├── partially_sorted_100.list
    │       └── random_100.list
    └── experiments.yml

Remote Instances
It is possible to let simexpal download instances from SNAP, a URL and a Git repository. In the
sections below, we will see how to list the different kinds of remote instances in the
experiments.yml.
After listing the instances we need to use

    $ simex instances install

to download the instances into the instance directory.
Note
1st December 2020: It is no longer possible to automatically download KONECT instances, as the
website is no longer publicly available. If you already have them saved locally, it is still
possible to list them and execute supported actions, e.g., transforming the instances to
edgelist format via

    simex instances run-transform --transform='to_edgelist'

Instances From SNAP
To list instances from the SNAP repository, set the value of repo to snap and put the file names
without the .txt.gz extension in the items list.
For instances from the KONECT repository, set the value of repo to konect and put the internal
names of the KONECT instances in the items list.

    instdir: "<path_to_instance_directory>"
    instances:
      - repo: snap
        items:
          - facebook_combined
          - wiki-Vote
      - repo: konect
        items:
          - dolphins
          - ucidata-zachary

Instances From a URL
To list instances from a URL, we use the following keys:
* method: download method
* url: URL of the instance
We set the value of the method key to 'url' and specify the URL of the instance in the url key.

    instdir: "<path_to_instance_directory>"
    instances:
      - method: url
        url: 'https://raw.githubusercontent.com/hu-macsy/simexpal/master/simexpal/schemes/@INSTANCE_FILENAME@'
        items:
          - 'experiments.json'
          - 'launchers.json'

The @-variable @INSTANCE_FILENAME@ in the URL (from the example above) resolves to the elements
of the items key. Thus, we have listed the two instances experiments.json and launchers.json,
which come from
https://raw.githubusercontent.com/hu-macsy/simexpal/master/simexpal/schemes/experiments.json and
https://raw.githubusercontent.com/hu-macsy/simexpal/master/simexpal/schemes/launchers.json respectively.
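The substitution of @INSTANCE_FILENAME@ can be pictured with the following Python sketch. It only illustrates the resolution described above, assuming a plain textual replacement; it is not simexpal's own code, and the helper name resolve_url is hypothetical.

```python
# Illustrative sketch only; this is not simexpal's implementation.
URL_TEMPLATE = (
    "https://raw.githubusercontent.com/hu-macsy/simexpal/"
    "master/simexpal/schemes/@INSTANCE_FILENAME@"
)
ITEMS = ["experiments.json", "launchers.json"]


def resolve_url(template, filename):
    """Replace the @INSTANCE_FILENAME@ placeholder (hypothetical helper)."""
    return template.replace("@INSTANCE_FILENAME@", filename)


# One download URL per entry of the items list.
resolved = [resolve_url(URL_TEMPLATE, item) for item in ITEMS]
for url in resolved:
    print(url)
```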
Instances From Git
To list instances from a Git repository, we use the following keys:
* method: download method
* git: link to the Git repository
* repo_name: name of the directory to clone into
* commit: SHA-1 hash
* git_subdir: subdirectory of the instance in the Git repository
We set the value of the method key to 'git' and specify the URL of the Git repository in the
git key. The repo_name key states the local directory name of the Git repository: when
installing the instance, the repository will be stored in <instance_dir>/<repo_name>. The commit
value specifies the version of the instance as a SHA-1 hash. It is also possible to specify other
revision parameters, e.g. ref names, but a SHA-1 hash is recommended for reproducibility. If the
instance is not located in the root directory of the Git repository, we need to specify the
subdirectory of the instance in git_subdir.

    instdir: "<path_to_instance_directory>"
    instances:
      - method: git
        git: 'https://github.com/hu-macsy/simexpal'
        repo_name: 'foo'
        commit: 'master'
        items:
          - 'setup.py'
          - 'pytest.ini'
      - method: git
        git: 'https://github.com/hu-macsy/simexpal'
        repo_name: 'foo'
        commit: 'd5e598f292b90cd7ef2e77d7a478ec52d42279df'
        git_subdir: 'simexpal/schemes/'
        items:
          - 'experiments.json'
          - 'launchers.json'

In the example above we clone the simexpal repository into <instance_dir>/foo. Then setup.py and
pytest.ini are taken from the current master branch, while simexpal/schemes/experiments.json and
simexpal/schemes/launchers.json are taken from the specified commit
d5e598f292b90cd7ef2e77d7a478ec52d42279df. All four files will be downloaded into the instance
directory.
Multiple Input Files
Until now we only considered experiments with one input file, which might not always be the case. Below we distinguish two cases:
* The input filenames only differ in the extension, e.g. foo.graph and foo.xyz.
* The input filenames are arbitrary.
Multiple Extensions
Listing instances with multiple extensions is similar to listing Local Instances. The difference is that we add the following key:
* extensions: list of extensions that the instance has

    instdir: "<path_to_instance_directory>"
    instances:
      - repo: local
        extensions:
          - graph
          - xyz
        items:
          - foo
          - bar

The experiments.yml file above will create the instance foo, which contains the files foo.graph
and foo.xyz, and the instance bar, which contains the files bar.graph and bar.xyz.
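As a rough illustration of how instance names and extensions combine into file names in the example above, consider the following Python sketch. The helper instance_files is hypothetical and only mirrors the naming scheme described in the text.

```python
def instance_files(name, extensions):
    """Return the file names of one instance (hypothetical helper)."""
    return [f"{name}.{ext}" for ext in extensions]


# Values taken from the experiments.yml example above.
extensions = ["graph", "xyz"]
files = {name: instance_files(name, extensions) for name in ["foo", "bar"]}
print(files)
```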
Arbitrary Input Files
To get an instance with arbitrary input files, we put a list of dictionaries as the value of
the items key. The dictionaries contain two keys:
* name: name of the instance
* files: list of files the instance consists of

    instdir: "<path_to_instance_directory>"
    instances:
      - repo: local
        items:
          - name: foo
            files:
              - file1
              - file2
          - name: bar
            files:
              - file3
              - file4

The experiments.yml file above will create the instance foo, which contains the files file1
and file2, and the instance bar, which contains the files file3 and file4.
Fileless Instances
There are cases where instances are not defined by a file but rather by some input parameters, e.g.
algorithms that generate their data themselves and only need input parameters like --seed 10.
Specifying fileless instances works similarly to specifying Arbitrary Input Files. The difference
is that we set files: [] to indicate that we are dealing with a fileless instance and use the
following key to specify our extra arguments:
* extra_args: list of extra arguments

    instances:
      - repo: local
        items:
          - name: foo
            files: []
            extra_args: ['--seed', '10']

Note
If you get an error message pointing out that the experiment is fileless, check whether you
forgot to remove the @INSTANCE@ variable from the experiment argument list. Since the experiment
does not take an instance as input, this variable must not be part of the argument list!
Generator Instances
It is possible to let simexpal generate instances by providing a program that writes to
/dev/stdout. In order to do so, we specify the following keys:
* generator: dictionary containing generator arguments
* items: list of instances

    instances:
      - generator:
          args: ['./generate.py', '--seed=1', '1000']
        items:
          - uniform-n1000-s1
      - generator:
          args: ['./generate.py', '--seed=2', '1000']
        items:
          - uniform-n1000-s2
      - generator:
          args: ['./generate.py', '--seed=3', '1000']
        items:
          - uniform-n1000-s3

In the example above we list the three instances uniform-n1000-s1, uniform-n1000-s2 and
uniform-n1000-s3, which will be created by the program generate.py.
It takes the following optional parameters
* -o: path of output file (default: /dev/stdout)
* --seed: seed for random generator (default: current system time)
* --range: range of integers (default: 10e6)
and the mandatory parameter
* n: number of integers to generate
as input.
The command ./generate.py --seed=2 1000 creates 1000 random numbers from the seed 2 and writes
them into /dev/stdout. Simexpal redirects the output from /dev/stdout to the file
/<instance_directory>/<instance_name>.
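The source of generate.py is not shown on this page. A minimal sketch of a generator with the command line interface described above could look as follows. Everything beyond the listed options is an assumption: the output format (one integer per line), the interpretation of --range as an exclusive upper bound starting at 0, and the function names are all hypothetical.

```python
import argparse
import random
import time


def parse_args(argv=None):
    """Parse the options described above: -o, --seed, --range and n."""
    parser = argparse.ArgumentParser(description="Generate random integers.")
    parser.add_argument("-o", dest="output", default="/dev/stdout",
                        help="path of output file")
    parser.add_argument("--seed", type=int, default=None,
                        help="seed for random generator (default: system time)")
    parser.add_argument("--range", dest="value_range", type=int,
                        default=int(10e6), help="range of integers")
    parser.add_argument("n", type=int, help="number of integers to generate")
    return parser.parse_args(argv)


def generate(n, seed, value_range):
    """Return n pseudo-random integers in [0, value_range) (assumed semantics)."""
    rng = random.Random(seed)
    return [rng.randrange(value_range) for _ in range(n)]


def main(argv=None):
    args = parse_args(argv)
    # Fall back to the current system time when no seed is given.
    seed = args.seed if args.seed is not None else int(time.time())
    numbers = generate(args.n, seed, args.value_range)
    # Assumed output format: one integer per line.
    with open(args.output, "w") as out:
        out.write("\n".join(map(str, numbers)) + "\n")
```

To use the sketch as an executable script, one would add a shebang line and call main() from the usual `if __name__ == "__main__":` guard. Because the script writes to /dev/stdout by default, simexpal can capture its output as the instance file.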
Finally, we need to use

    $ simex instances install

to generate the instances.
It is also possible to list more than one instance in the items key, e.g.

    instances:
      - generator:
          args: ['./generate.py', '1000']
        items:
          - uniform-n1000-s1
          - uniform-n1000-s2

Here we list the two instances uniform-n1000-s1 and uniform-n1000-s2. Since no seed is given,
each ./generate.py 1000 invocation will use its respective system time as seed and create 1000
random numbers from it.
Extra Arguments
We can set extra arguments for instance blocks and individual instances, which can be appended to the experiment arguments when the respective instance is used. In order to specify such instances, we use the following key:
* extra_args: list of extra arguments
Note
It is possible to have instances with common extra arguments and additional individual extra
arguments by adding the extra_args key to the respective places (see below).
In order to specify common extra arguments for instance blocks, we simply add the extra_args
key to them.

    instdir: "<path_to_instance_directory>"
    instances:
      - repo: local
        extra_args: ['some', 'extra_args']
        ...
        items:
          - instance1
          - instance2
      - repo: local
        extra_args: ['some', 'extra_args']
        ...
        items:
          - instance3
          - instance4

In order to specify individual extra arguments for Local Instances, Remote Instances and
Multiple Extensions, we need to change the items key from a list of instances to a list of
dictionaries containing the following keys:
* name: name of the instance
* extra_args: list of extra arguments
For example:

    instances:
      - repo: local    # local instances with extra arguments
        items:
          - name: inst1
            extra_args: ['some', 'extra_args']
          - name: inst2
            extra_args: ['some', 'extra_args']
      - repo: snap     # remote instances with extra arguments
        items:
          - name: facebook_combined
            extra_args: ['some', 'extra_args']
          - name: wiki-Vote
            extra_args: ['some', 'extra_args']
      - repo: local    # multiple extension instances with extra args
        items:
          - name: inst3
            extra_args: ['some', 'extra_args']
          - name: inst4
            extra_args: ['some', 'extra_args']
        extensions: [ext1, ext2]

For Arbitrary Input Files we only need to add the extra_args key to the dictionaries of the
instances, e.g.,

    instances:
      - repo: local
        items:
          - name: inst3
            files: [file1, file2]
            extra_args: ['some', 'extra', 'argument']
          - name: inst4
            files: [file1, file2]
            extra_args: ['some', 'extra', 'argument']

Instance Sets
It is possible to assign instances to instance sets. This is useful when trying to run experiments
that have common instances. Assume you want to run an experiment on the two instances instance1
and instance2 and a different experiment on instance2 and instance3. To do this, you can use the
following key:
* set: list of sets the instance belongs to

    instdir: "<path_to_instance_directory>"
    instances:
      - repo: local
        set: [set1]
        items:
          - instance1
      - repo: local
        set: [set1, set2]
        items:
          - instance2
      - repo: local
        set: [set2]
        items:
          - instance3

In this way we have created the instance set set1, which contains instance1 and instance2, and
the instance set set2, which contains instance2 and instance3.
Instance sets will also be useful when using the command line interface of simexpal and when defining the Run Matrix.
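To make the set membership from the example above concrete, the following Python sketch derives the members of each set from the instance blocks. The blocks are written as a Python literal that mirrors the YAML; the helper members_by_set is hypothetical and does not reflect how simexpal handles sets internally.

```python
from collections import defaultdict

# The `instances` list from the example above, written as a Python literal.
instances = [
    {"repo": "local", "set": ["set1"], "items": ["instance1"]},
    {"repo": "local", "set": ["set1", "set2"], "items": ["instance2"]},
    {"repo": "local", "set": ["set2"], "items": ["instance3"]},
]


def members_by_set(blocks):
    """Map each set name to the instances it contains (hypothetical helper)."""
    sets = defaultdict(list)
    for block in blocks:
        for set_name in block.get("set", []):
            for item in block["items"]:
                # Items are either plain names or dictionaries with a `name` key.
                name = item["name"] if isinstance(item, dict) else item
                sets[set_name].append(name)
    return dict(sets)


membership = members_by_set(instances)
print(membership)
```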
Next
To set up your automated builds, visit the Builds page. If you do not plan on using automated builds, you can visit the Experiments page to set up your experiments.