.. _Instances:

Instances
=========

You might want to take a look at the following pages before exploring instances:

- :ref:`QuickStart`
- :ref:`AtVariables`

On this page we describe how to specify instances in the ``experiments.yml`` file. You can list
local instances that consist of zero or more files. Moreover, simexpal can download remote
instances from the SNAP repository, Git repositories and arbitrary URLs. It is also possible to
assign instances to instance sets, which enable a more efficient usage of the command line
interface and are useful when defining the run matrix.

.. _InstanceDirectory:

Instance Directory
------------------

The instance directory is the directory that stores all the instances. The path can be set via
the ``instdir`` key:

.. code-block:: YAML
   :linenos:
   :caption: How to set the instance directory in the experiments.yml file.

   instdir: ""

If ``instdir`` is not set, it will default to ``/instances``. The instance directory will be
created if it does not exist already.

.. _LocalInstances:

Local Instances
---------------

To add local instances, we set the value of the ``instances`` key to a list of dictionaries with
two keys and an optional third key:

- ``repo``: source of the instances
- ``subdir``: subdirectory of the instances in the ``instdir`` directory
- ``items``: a list of instances.

An example of how to list a local set of instances is:

.. literalinclude:: ./experiments.yml.example
   :linenos:
   :lines: 1-12
   :language: yaml
   :caption: How to list local instances in the experiments.yml file.

The above setup resembles the following structure:

::

   example
   ├── instances
   │   ├── large
   │   │   ├── partially_sorted_500.list
   │   │   └── random_500.list
   │   └── small
   │       ├── partially_sorted_100.list
   │       └── random_100.list
   └── experiments.yml

.. _RemoteInstances:

Remote Instances
----------------

It is possible to let simexpal download instances from SNAP, a URL and a Git repository.
In the sections below, we will see how to list the different kinds of remote instances in the
``experiments.yml`` file. After listing the instances we need to use

.. code-block:: bash

   $ simex instances install

to download the instances into the instance directory.

.. note::

   1st December 2020: It is no longer possible to automatically download KONECT instances, as
   the website is no longer publicly available. If you already have them saved locally, it is
   still possible to list them and execute supported actions, e.g., transforming the instances
   to edgelist format via ``simex instances run-transform --transform='to_edgelist'``.

Instances From SNAP
^^^^^^^^^^^^^^^^^^^

To list instances from the SNAP repository, set the value of ``repo`` to ``snap`` and put
the file names without the ``.txt.gz`` extension in the ``items`` list. For instances from
the KONECT repository, set the value of ``repo`` to ``konect`` and put the internal names
of the KONECT instances in the ``items`` list.

.. code-block:: YAML
   :linenos:
   :caption: How to list instances from the SNAP and KONECT repository in the experiments.yml file.

   instdir: ""
   instances:
     - repo: snap
       items:
         - facebook_combined
         - wiki-Vote
     - repo: konect
       items:
         - dolphins
         - ucidata-zachary

Instances From a URL
^^^^^^^^^^^^^^^^^^^^

To list instances from a URL, we use the following keys:

- ``method``: download method
- ``url``: URL of the instance

We set the value of the ``method`` key to ``'url'`` and specify the URL of the instance in the
``url`` key.

.. code-block:: YAML
   :linenos:
   :caption: How to list instances from a URL in the experiments.yml file.

   instdir: ""
   instances:
     - method: url
       url: 'https://raw.githubusercontent.com/hu-macsy/simexpal/master/simexpal/schemes/@INSTANCE_FILENAME@'
       items:
         - 'experiments.json'
         - 'launchers.json'

The @-variable ``@INSTANCE_FILENAME@`` in the URL (from the example above) resolves to the
elements in the ``items`` key.
Thus, we have listed the two instances ``experiments.json`` and ``launchers.json``, which are
downloaded from the URL above with ``@INSTANCE_FILENAME@`` replaced by ``experiments.json`` and
``launchers.json`` respectively.

Instances From Git
^^^^^^^^^^^^^^^^^^

To list instances from a Git repository, we use the following keys:

- ``method``: download method
- ``git``: link to the Git repository
- ``repo_name``: name of the directory to clone into
- ``commit``: SHA-1 hash
- ``git_subdir``: subdirectory of the instance in the Git repository

We set the value of the ``method`` key to ``'git'`` and specify the Git URL of the instance
in the ``git`` key. The ``repo_name`` key states the local directory name of the Git repository.
When installing the instance, the Git repository will be stored in ``/``. The ``commit`` value
specifies the version of the instance, given as a SHA-1 hash. It is also possible to specify other
revision parameters, e.g. ref names. For reproducibility, a SHA-1 hash is recommended.
If the instance is not located in the root directory of the Git repository, we need to
specify the subdirectory of the instance in ``git_subdir``.

.. code-block:: YAML
   :linenos:
   :caption: How to list instances from a Git repository in the experiments.yml file.

   instdir: ""
   instances:
     - method: git
       git: 'https://github.com/hu-macsy/simexpal'
       repo_name: 'foo'
       commit: 'master'
       items:
         - 'setup.py'
         - 'pytest.ini'
     - method: git
       git: 'https://github.com/hu-macsy/simexpal'
       repo_name: 'foo'
       commit: 'd5e598f292b90cd7ef2e77d7a478ec52d42279df'
       git_subdir: 'simexpal/schemes/'
       items:
         - 'experiments.json'
         - 'launchers.json'

In the example above we clone the simexpal repository into ``/foo``. Then ``setup.py``
and ``pytest.ini`` of the current ``master`` branch, as well as ``simexpal/schemes/experiments.json``
and ``simexpal/schemes/launchers.json`` of the specified commit
``d5e598f292b90cd7ef2e77d7a478ec52d42279df``, will be downloaded into the instance directory.

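
Since a SHA-1 hash is recommended for reproducibility, the first block of the example could also be pinned to a commit instead of a branch name. The following is only a sketch built from the keys described above. It reuses the hash from the example, and it is an assumption that ``setup.py`` and ``pytest.ini`` exist at the repository root in that commit.

.. code-block:: YAML
   :linenos:
   :caption: Sketch of a Git instance block pinned to a commit hash (file locations are assumed).

   instances:
     - method: git
       git: 'https://github.com/hu-macsy/simexpal'
       repo_name: 'foo'
       # Assumption: both files are present at the repository root in this commit.
       commit: 'd5e598f292b90cd7ef2e77d7a478ec52d42279df'
       items:
         - 'setup.py'
         - 'pytest.ini'

Pinning to a hash means that a later change to the ``master`` branch does not silently change the instances that are installed.
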

Multiple Input Files
--------------------

Until now we only considered experiments with one input file, which might not always be the case.
Below we distinguish two cases:

1. The input filenames only differ in the extension, e.g. ``foo.graph`` and ``foo.xyz``.
2. The input filenames are arbitrary.

.. _MultipleExtensions:

Multiple Extensions
^^^^^^^^^^^^^^^^^^^

Listing instances with multiple extensions is similar to listing :ref:`LocalInstances`.
The difference is that we add the following key:

- ``extensions``: list of extensions that the instance has

.. code-block:: YAML
   :linenos:
   :caption: How to list instances with multiple extensions in the experiments.yml file.

   instdir: ""
   instances:
     - repo: local
       extensions:
         - graph
         - xyz
       items:
         - foo
         - bar

The ``experiments.yml`` file above will create the instance ``foo``, which contains the files
``foo.graph`` and ``foo.xyz``, and the instance ``bar``, which contains the files ``bar.graph``
and ``bar.xyz``.

.. _ArbitraryInputFiles:

Arbitrary Input Files
^^^^^^^^^^^^^^^^^^^^^

To get an instance with arbitrary input files, we put a list of dictionaries as the value of the
``items`` key. The dictionaries contain two keys:

- ``name``: name of the instance
- ``files``: list of files the instance consists of

.. code-block:: YAML
   :linenos:
   :caption: How to list instances with arbitrary input files in the experiments.yml file.

   instdir: ""
   instances:
     - repo: local
       items:
         - name: foo
           files:
             - file1
             - file2
         - name: bar
           files:
             - file3
             - file4

The ``experiments.yml`` file above will create the instance ``foo``, which contains the files
``file1`` and ``file2``, and the instance ``bar``, which contains the files ``file3`` and ``file4``.

Fileless Instances
------------------

There are cases where instances are not defined by a file but rather by some input parameters, e.g.
algorithms that generate their data themselves and only need input parameters like ``--seed 10``.
Specifying fileless instances works similarly to specifying :ref:`ArbitraryInputFiles`. The difference is
that we set ``files: []`` to indicate that we are dealing with a fileless instance and use the

- ``extra_args``: list of extra arguments

key to specify our extra arguments.

.. code-block:: YAML
   :linenos:
   :caption: How to list fileless instances in the experiments.yml file.

   instances:
     - repo: local
       items:
         - name: foo
           files: []
           extra_args: ['--seed', '10']

.. note::

   If you get an error message pointing out that the experiment is fileless, check whether you forgot
   to remove the ``@INSTANCE@`` variable from the experiment argument list. Since the experiment does
   not take an instance as input, this variable must not be part of the argument list!

Generator Instances
-------------------

It is possible to let simexpal generate instances by providing a program that writes to ``/dev/stdout``.
In order to do so, we specify the

- ``generator``: dictionary containing generator arguments
- ``items``: list of instances

keys.

.. code-block:: YAML
   :linenos:
   :caption: How to list generator instances in the experiments.yml file.

   instances:
     - generator:
         args: ['./generate.py', '--seed=1', '1000']
       items:
         - uniform-n1000-s1
     - generator:
         args: ['./generate.py', '--seed=2', '1000']
       items:
         - uniform-n1000-s2
     - generator:
         args: ['./generate.py', '--seed=3', '1000']
       items:
         - uniform-n1000-s3

In the example above we list the three instances ``uniform-n1000-s1``, ``uniform-n1000-s2`` and
``uniform-n1000-s3``, which will be created by the program ``generate.py``. It takes the following
optional parameters

- ``-o``: path of output file (default: ``/dev/stdout``),
- ``--seed``: seed for random generator (default: current system time),
- ``--range``: range of integers (default: ``10e6``)

and the mandatory parameter

- ``n``: number of integers to generate

as input. The command ``./generate.py --seed=2 1000`` creates 1000 random numbers from the seed ``2``
and writes them into ``/dev/stdout``.
Simexpal redirects the output from ``/dev/stdout`` to the file ``//``. Finally, we need to use

.. code-block:: bash

   $ simex instances install

to generate the instances.

It is also possible to list more than one instance in the ``items`` key, e.g.

.. code-block:: YAML
   :linenos:
   :caption: How to list generator instances with more than one item in the experiments.yml file.

   instances:
     - generator:
         args: ['./generate.py', '1000']
       items:
         - uniform-n1000-s1
         - uniform-n1000-s2

Here we list the two instances ``uniform-n1000-s1`` and ``uniform-n1000-s2``. Since no seed is given,
each of the two ``./generate.py 1000`` invocations uses the system time at which it runs as seed and
creates 1000 random numbers from it.

.. _InstanceExtraArguments:

Extra Arguments
---------------

We can set extra arguments for instance blocks and individual instances, which can be appended to the
experiment arguments when the respective instance is used. In order to specify such instances, we use the

- ``extra_args``: list of extra arguments

key.

.. note::

   It is possible to have instances with common extra arguments and additional individual extra arguments
   by adding the ``extra_args`` key to the respective places (see below).

In order to specify common extra arguments for instance blocks, we simply add the ``extra_args`` key to them.

.. code-block:: YAML
   :linenos:
   :caption: How to list instances with common extra arguments in the experiments.yml file.

   instdir: ""
   instances:
     - repo: local
       extra_args: ['some','extra_args']
       ...
       items:
         - instance1
         - instance2
     - repo: local
       extra_args: ['some','extra_args']
       ...
       items:
         - instance3
         - instance4

In order to specify individual extra arguments for :ref:`LocalInstances`, :ref:`RemoteInstances` and
:ref:`MultipleExtensions`, we need to change the ``items`` key from a list of instances to a list of
dictionaries containing the

- ``name``: name of the instance
- ``extra_args``: list of extra arguments

keys, e.g.,

.. code-block:: YAML
   :linenos:
   :caption: How to list local/remote/multiple extension instances with individual extra arguments in the experiments.yml file.

   instances:
     - repo: local    # local instances with extra arguments
       items:
         - name: inst1
           extra_args: ['some','extra_args']
         - name: inst2
           extra_args: ['some','extra_args']
     - repo: snap     # remote instances with extra arguments
       items:
         - name: facebook_combined
           extra_args: ['some', 'extra_args']
         - name: wiki-Vote
           extra_args: ['some', 'extra_args']
     - repo: local    # multiple extension instances with extra args
       items:
         - name: inst3
           extra_args: ['some', 'extra_args']
         - name: inst4
           extra_args: ['some', 'extra_args']
       extensions: [ext1, ext2]

For :ref:`ArbitraryInputFiles` we only need to add the ``extra_args`` key to the dictionaries of the
instances, e.g.,

.. code-block:: YAML
   :linenos:
   :caption: How to list arbitrary input file instances with individual extra arguments in the experiments.yml file.

   instances:
     - repo: local
       items:
         - name: inst3
           files: [file1, file2]
           extra_args: ['some', 'extra', 'argument']
         - name: inst4
           files: [file1, file2]
           extra_args: ['some', 'extra', 'argument']

.. _InstanceSets:

Instance Sets
-------------

It is possible to assign instances to instance sets. This is useful when trying to run experiments
that have common instances. Assume you want to run an experiment on the two instances ``instance1``
and ``instance2`` and a different experiment on ``instance2`` and ``instance3``. To do this, you can
use the following key:

- ``set``: list of sets the instance belongs to

.. code-block:: YAML
   :linenos:
   :caption: How to assign instances to instance sets in the experiments.yml file.

   instdir: ""
   instances:
     - repo: local
       set: [set1]
       items:
         - instance1
     - repo: local
       set: [set1, set2]
       items:
         - instance2
     - repo: local
       set: [set2]
       items:
         - instance3

In this way we have created the instance set ``set1``, which contains ``instance1`` and ``instance2``,
and the instance set ``set2``, which contains ``instance2`` and ``instance3``.
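
The ``set`` key can be written next to the other block-level keys described on this page. The following sketch assumes that ``set`` and ``extra_args`` may appear in the same instance block and that ``set`` also applies to remote instances. The set names ``small`` and ``social`` and the argument ``--quick`` are hypothetical and only serve as illustration.

.. code-block:: YAML
   :linenos:
   :caption: Sketch of instance sets combined with extra arguments (set names and arguments are hypothetical).

   instances:
     - repo: local
       set: [small]
       # Hypothetical argument that applies to every instance of this block.
       extra_args: ['--quick']
       items:
         - instance1
         - instance2
     - repo: snap
       # Assumption: sets can be assigned to remote instances as well.
       set: [social]
       items:
         - facebook_combined
         - wiki-Vote
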

Instance sets will also be useful when using the command line interface of simexpal and when
defining the :ref:`RunMatrix`.

Next
----

To set up your automated builds, visit the :ref:`Builds` page. If you do not plan on using automated
builds, you can visit the :ref:`Experiments` page to set up your experiments.