The SKIRT project
advanced radiative transfer for astrophysics
Managing benchmarks

Introduction

While functional tests (see Managing functional tests) form an excellent tool for verifying specific features and avoiding regression issues, there is also a need for validating simulation results for more realistic and complex models. Because nontrivial radiative transfer problems cannot be solved analytically or with deterministic numerical methods, the only option is comparing the results of different codes. This realization has led several authors to present well-defined benchmark problems with corresponding reference solutions produced by a number of codes participating in the benchmark.

SKIRT successfully performs the relevant benchmarks available in the literature. The geometries, source spectra and medium properties needed for these benchmarks are built into the code. For an overview, see the Benchmarks section of the SKIRT web site. It shows the results for each benchmark and offers the corresponding configuration files for download, so that any interested third party can run the benchmark simulations.

As a side benefit, benchmarks also test the SKIRT multi-threading parallelization mechanisms (because benchmarks are usually executed using a single process, the multi-processing implementation is not tested).

Running benchmarks

It is not feasible to run benchmarks in a fully automated manner for two reasons:

  • A benchmark simulation may have a runtime of many hours on a typical present-day desktop computer; in fact, some of the simulations may be more conveniently run on a remote server.
  • The benchmark results cannot be validated automatically; a visualization of the results must be compared to the corresponding reference plot(s) by a human.

Fortunately, there is no need to run the benchmarks for every update to the SKIRT code. The recommended approach is as follows:

  • After a change to the implementation of a given physics module, run the benchmarks relevant to that physics.
  • After a change to the radiative transfer mechanism or to other fundamental areas of the code, run all benchmarks.

The _ReadMe.txt text file in the Benchmark directory describes the structure of a typical benchmark directory and the Python scripts provided for running the benchmark simulations and verifying the results. Essentially, the procedure for running a benchmark has four stages: prepare, execute, visualize, and evaluate. By separating the execution stage from the other stages, it is possible to run the simulations on another computer, such as a remote server.
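
As a purely illustrative sketch of this staged approach (the script name, the stage functions, and the assumption that the skirt executable is available on the PATH are hypothetical; the actual scripts are described in _ReadMe.txt), a driver that runs one stage at a time might look as follows:

    # benchmark_driver.py -- hypothetical sketch of a staged benchmark driver.
    # The prepare/execute/visualize stages follow the procedure described above;
    # the evaluate stage is omitted because it is performed by a human comparing
    # the visualizations to the published reference plot(s).
    import argparse
    import pathlib
    import subprocess

    def prepare(benchdir: pathlib.Path) -> None:
        # Gather or generate the ski file(s) and any input data for the benchmark.
        print(f"preparing benchmark in {benchdir}")

    def execute(benchdir: pathlib.Path) -> None:
        # Run each ski file; because this stage is separate, the prepared
        # directory can first be copied to a remote server and executed there.
        for ski in sorted(benchdir.glob("*.ski")):
            subprocess.run(["skirt", ski.name], cwd=benchdir, check=True)

    def visualize(benchdir: pathlib.Path) -> None:
        # Produce plots of the simulation output for comparison with the
        # published reference solutions.
        print(f"visualizing results in {benchdir}")

    STAGES = {"prepare": prepare, "execute": execute, "visualize": visualize}

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Run one benchmark stage")
        parser.add_argument("stage", choices=STAGES)
        parser.add_argument("benchdir", type=pathlib.Path)
        args = parser.parse_args()
        STAGES[args.stage](args.benchdir)

With this kind of separation, the prepare and visualize stages can be run on a local machine while the execute stage runs on a remote server.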

Configuring a new benchmark

When a new area of physics is implemented in SKIRT, its operation is usually validated by comparison with known results, either theoretical or generated by other simulation codes. These validation tests should be added to the list of benchmark specifications in the Benchmark directory. Likewise, when a relevant new benchmark effort is published, with or without involvement of the SKIRT team, a corresponding benchmark specification should be added.

When constructing a new benchmark specification, follow the structure of the other benchmarks as closely as possible. If there is a need to deviate from this structure, document it in a ReadMe file. Also remember to add a corresponding benchmark description, ski file(s), and results to the Benchmarks section of the SKIRT web site.
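
As a purely illustrative aid (the expected contents listed below are assumptions; _ReadMe.txt and the existing benchmarks define the authoritative structure), a minimal sanity check for a new benchmark directory might look like this:

    # check_benchmark_layout.py -- hypothetical sanity check for a new benchmark
    # specification. The expected contents are assumptions; consult _ReadMe.txt
    # and the existing benchmarks for the authoritative structure.
    import pathlib
    import sys

    def check(benchdir: pathlib.Path) -> list[str]:
        problems = []
        if not any(benchdir.glob("*ReadMe*")):
            problems.append("no ReadMe file documenting the benchmark")
        if not any(benchdir.glob("*.ski")):
            problems.append("no ski configuration file(s)")
        return problems

    if __name__ == "__main__":
        for problem in check(pathlib.Path(sys.argv[1])):
            print(problem)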