Setting up and starting a retrieval

To start a retrieval calculation, the executable bear needs to be called from the command line with a folder that contains the necessary configuration files as a command line argument:

./bear path_to_retrieval_folder/

The folder typically contains all the necessary configuration files for the retrieval and observational data. Output files will also be written to this folder.

Additionally, BeAR can optionally use the following command line arguments:

  • -r - restart a retrieval that has been interrupted

  • -p - perform only the postprocessing step

The additional arguments are added after the folder path:

./bear path_to_retrieval_folder/ -p

To restart a retrieval, the folder needs to contain all necessary MultiNest files from the previous run. Likewise, for the postprocessing step, the posterior data need to be present in the folder.

The GitHub repository of BeAR contains an example for each forward model. These examples can be used to test the retrieval code or serve as templates for other retrievals. The example folders contain all files necessary to run a retrieval calculation.
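For scripted runs, for example on a cluster, the command lines shown above can be assembled programmatically. The following is a minimal Python sketch. The helper names build_bear_command and run_bear are hypothetical and not part of BeAR, and the sketch assumes that the bear executable is located in the current working directory.

```python
import subprocess

# Command line flags as described in the text above.
MODE_FLAGS = {"restart": "-r", "postprocess": "-p"}


def build_bear_command(folder, mode=None, executable="./bear"):
    """Assemble the argument list for a BeAR call (hypothetical helper).

    folder: path to the retrieval folder
    mode:   None for a normal run, "restart" or "postprocess" otherwise
    """
    command = [executable, folder]
    if mode is not None:
        # The additional argument is added after the folder path.
        command.append(MODE_FLAGS[mode])
    return command


def run_bear(folder, mode=None, executable="./bear"):
    """Run BeAR and raise an error if it exits with a non-zero status."""
    return subprocess.run(build_bear_command(folder, mode, executable), check=True)
```

Calling build_bear_command("path_to_retrieval_folder/", mode="postprocess") returns the same arguments as the second command line shown above.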

Configuration files

BeAR requires the following files in the folder the executable is called with:

  • retrieval.config - the main configuration file for the retrieval

  • forward_model.config - the configuration file of the chosen forward model

  • priors.config - the setup list for the prior distributions of the free parameters

  • observations.list - the list of observational data files that the retrieval should use

Optionally, the postprocessing step can be configured with the following file:

  • post_process.config - the configuration file for the postprocessing step

If this file is not present, BeAR will use default settings for the postprocessing.
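Before starting a long retrieval, it can be useful to verify that the folder is complete. The following Python sketch checks for the files listed above. The function name missing_required_files is hypothetical, and the check only tests whether the files exist, not whether their contents are valid.

```python
from pathlib import Path

# File names as listed in the text above.
REQUIRED_FILES = (
    "retrieval.config",
    "forward_model.config",
    "priors.config",
    "observations.list",
)
OPTIONAL_FILES = ("post_process.config",)


def missing_required_files(folder):
    """Return the names of required files that are absent from the folder."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def has_postprocess_config(folder):
    """Return True if the optional postprocess configuration is present."""
    return all((Path(folder) / name).is_file() for name in OPTIONAL_FILES)
```

An empty list returned by missing_required_files means that all four required files were found.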

Main retrieval file

The file retrieval.config contains the basic information for the retrieval setup.

################
#General config#
################
#Use GPU
Y

#OpenMP processor number (if 0, use maximum)
0

#########################
#General retrieval config#
#########################

#forward model type
secondary_eclipse

#Spectral grid parametrisation
const_wavenumber 1.0

#Opacity data folder
/media/data/opacity_data/helios-k/

#Use error inflation prior
N

#####################
#Multinest parameter#
#####################
#Importance nested sampling
Y

#Mode separation
N

#Number of live points
800

#Efficiency
0.8

#Maximum number of iterations (0 for no limit)
0

#Resume
N

#Console feedback
Y

#Print parameter values and likelihoods
Y

The following parameters need to be set:

Use GPU
Determines the use of the graphics card. Set either Y or 1 to run the calculation on the GPU. Any other input is interpreted as running purely on the CPU.
OpenMP processor number
Sets the number of processor cores used for parallel computing on the CPU. Some parts of BeAR will still run on the CPU, even if Use GPU is enabled. This is the case, for example, for the FastChem chemistry code, which doesn’t run on GPUs. BeAR will run certain calculations in parallel on the CPU as well, using OpenMP. Set this parameter to 0 if you want to use all available cores. Note that OpenMP can only use a single, multi-core processor.
forward model type
Sets the forward model that is supposed to be used. BeAR currently supports the following models:
  • transmission - Transmission spectrum

  • secondary_eclipse - Secondary eclipse / occultation spectrum

  • emission - Emission spectrum

  • flat_line - Fits a flat line to the data

Descriptions of the forward models can be found in the section on forward models.

Spectral grid parametrisation
Sets the parametrisation of the spectral grid that will be used for the computation of the high-resolution spectrum. This high-resolution grid should generally be finer than that of the observational data. The following options are available:
  • const_wavelength x - a constant step in wavelength space with a step size of x in \(\mathrm{\mu m}\)

  • const_wavenumber x - a constant step in wavenumber space with a step size of x in \(\mathrm{cm}^{-1}\)

  • const_resolution x - a constant spectral resolution \(x = \lambda/\Delta\lambda\)

Opacity data folder
Location of the folder with opacities for the gas species. Details on the required format of the opacity data can be found in the section on opacity data.
Use error inflation prior
Determines the use of the error inflation. This will artificially enlarge the error bars of the observational data and, thus, will generally make it easier for the retrieval to find a solution. The use of the error inflation acknowledges that certain physical or chemical processes are missing from the simple forward model of the retrieval. The form of the employed error inflation is described in Kitzmann et al. (2020).
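To illustrate the three spectral grid parametrisations described above, the following Python sketch builds each type of grid between two limits. This is only one possible discretisation and is not taken from the BeAR source code. All function names are hypothetical, and the grid limits are assumptions made for the example.

```python
def const_wavenumber_grid(nu_min, nu_max, step):
    """Wavenumbers in cm^-1 with a constant step (const_wavenumber)."""
    n_steps = int((nu_max - nu_min) / step + 1e-9)
    return [nu_min + i * step for i in range(n_steps + 1)]


def const_wavelength_grid(lam_min, lam_max, step):
    """Wavelengths in micron with a constant step (const_wavelength)."""
    n_steps = int((lam_max - lam_min) / step + 1e-9)
    return [lam_min + i * step for i in range(n_steps + 1)]


def const_resolution_grid(lam_min, lam_max, resolution):
    """Wavelengths in micron with constant lambda / delta lambda (const_resolution)."""
    factor = 1.0 + 1.0 / resolution  # lambda_next = lambda + lambda / resolution
    grid = [lam_min]
    while grid[-1] * factor <= lam_max:
        grid.append(grid[-1] * factor)
    return grid


def wavenumber_to_wavelength(nu):
    """Convert a wavenumber in cm^-1 to a wavelength in micron."""
    return 1.0e4 / nu


def resolution_of_wavenumber_step(nu, step):
    """Approximate resolution lambda / delta lambda of a wavenumber step at nu."""
    return nu / step
```

With const_wavenumber 1.0 as in the example file, the grid has a resolution of about 10000 at \(1\,\mathrm{\mu m}\) (\(10000\,\mathrm{cm}^{-1}\)) but only about 1000 at \(10\,\mathrm{\mu m}\) (\(1000\,\mathrm{cm}^{-1}\)). This is worth comparing against the resolution of the observational data when choosing the step size.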

The remaining parameters refer to the nested-sampling code MultiNest. For a description of the MultiNest code and its parameters, we refer to Feroz & Hobson (2009) and Feroz et al. (2008).

The parameters that can be set here include:

Importance nested sampling
Turns the use of importance nested sampling on or off. To use importance nested sampling, set this parameter to either Y or 1. Any other value is interpreted as N. Importance nested sampling requires a bit more memory but, on the other hand, also increases the convergence speed and the overall accuracy of the Bayesian evidence calculation. Unless memory is a real bottleneck, there should be no reason to deactivate importance nested sampling.
Mode separation
MultiNest has the ability to trace different modes in a posterior distribution and to save them separately. This option turns the use of mode separation on or off. To use it, set this parameter to either Y or 1. Any other value is interpreted as N. Note that this option is untested and might not work properly.
Number of live points
Sets the number of live points used by MultiNest. Generally speaking, a high-dimensional parameter space requires a larger number of live points. It is strongly recommended to perform sensitivity tests by increasing this number and checking whether the posterior distributions have converged.
Efficiency
Sets the efficiency that determines the way MultiNest draws new points from the parameter space. For more details on this parameter, check the MultiNest documentation. The authors of MultiNest suggest using an efficiency of 0.8 for parameter estimation and 0.3 when the Bayesian evidence is required at high accuracy.
Maximum number of iterations
The maximum number of iterations MultiNest will use before the nested sampling is stopped. A value of 0 indicates that MultiNest will perform the nested sampling until its convergence criteria are met.
Resume
If this is set to Y or 1, MultiNest will try to resume a previously started retrieval run. The files MultiNest needs to restart the nested sampling must all be present in the retrieval folder. This option is useful if BeAR is run on a cluster with a strict time limit. A restart can also be used if a previous MultiNest run was stopped at its maximum number of iterations.
Console feedback
MultiNest will regularly report the current total number of model evaluations and estimates for the Bayesian evidences when this option is turned on with Y or 1.
Print parameter values and likelihoods
Determines whether the parameter values and the computed likelihood values for all models should be displayed (Y or 1). If BeAR is run on a cluster and the terminal output is redirected to a file, it is usually a good idea to deactivate this option. Otherwise, the output file could become quite large.
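The layout of retrieval.config shown above, with comment lines starting with # followed by a value line, can be inspected with a few lines of Python. The sketch below is not BeAR's own parser, which may, for example, rely on the order of the entries rather than on the comment text. The function names are hypothetical, and the exact, case-sensitive matching of Y and 1 is an assumption based on the parameter descriptions.

```python
def parse_labelled_values(text):
    """Map each comment label to the value line that follows it.

    Assumes that comment lines start with '#' and that each value
    occupies a single line below its describing comment.
    """
    entries = {}
    label = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Banner lines consisting only of '#' yield an empty label
            # and are overwritten by the next comment line.
            label = stripped.strip("#").strip()
        elif label is not None:
            entries[label] = stripped
    return entries


def is_enabled(value):
    """Interpret a switch: Y or 1 means on, anything else means off."""
    return value.strip() in ("Y", "1")
```

For the example file, parse_labelled_values would map the label "Number of live points" to the string "800", which can then be converted with int().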

Forward model configuration file

The file forward_model.config contains the configuration for the forward model. Its structure depends on the chosen model and is discussed in the section on forward models.

Prior distributions file

The priors.config file contains the information on the prior distributions of the free parameters. More information on the format of the prior distributions file can be found in the section on prior distributions and in the description of each forward model.

Observational data file

The observations.list file contains a list of data files with the observational data that the retrieval should use. Its structure depends on the chosen model and is discussed in the section on the observational data.

Postprocess configuration file

During the postprocess step after a retrieval calculation has been finished, BeAR can perform additional calculations. This includes the computation of spectra for the posterior sample, writing out all temperature structures, or computing the effective temperatures for emission spectroscopy retrievals.

The configuration file for the postprocess step is called post_process.config. This file is optional and BeAR will use default settings if it is not present. The structure of the file depends on the chosen forward model and is discussed in the section on forward models.

Output files

After a retrieval calculation has been finished, the retrieval folder will contain a set of output files, either directly from the MultiNest sampler or from the postprocess step of BeAR.

The most important MultiNest files are:

  • post_equal_weights.dat - the posterior distributions of the model parameters and likelihood values

  • summary.dat and stats.dat - a basic summary and some statistics of the nested sampling results, including the Bayesian evidence

The folder will also contain additional MultiNest files that were used during the nested sampling process. More detailed descriptions of the files’ contents can be found in the MultiNest documentation in its GitHub repository.
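As an illustration of how post_equal_weights.dat can be used, the following Python sketch extracts the sample with the highest likelihood. It assumes the standard MultiNest layout, in which each row holds the parameter values of one posterior sample followed by its log-likelihood in the last column, separated by whitespace. The function names are hypothetical.

```python
def read_table(text):
    """Parse whitespace-separated numeric rows, skipping blank and '#' lines."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        rows.append([float(value) for value in stripped.split()])
    return rows


def best_fit_sample(rows):
    """Return (parameters, log_likelihood) of the highest-likelihood row."""
    best = max(rows, key=lambda row: row[-1])
    return best[:-1], best[-1]
```

The file contents can be passed in with read_table(open("post_equal_weights.dat").read()), assuming the script is run inside the retrieval folder.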

If the corresponding option in the optional post_process.config has been enabled, BeAR will delete MultiNest files that are not required for the postprocessing step.

The postprocess step will write out additional files, depending on the chosen forward model. This can include:

  • spectrum_post_XXXX.dat - the spectra for the observation/instrument XXXX for the posterior sample. Each observational data set used in the retrieval will have a separate posterior spectrum file, where XXXX is the name stated in the header of the observational data file. The spectra are binned to the wavelengths of the corresponding observational data set. The first column contains the wavelength in \(\mathrm{\mu m}\), while all other columns are the spectra for each posterior sample. Thus, there are as many spectrum columns as there are posterior samples in the posterior distribution file post_equal_weights.dat. If the original data set was either band-spectroscopy or photometry, the wavelengths in the first column refer to the centre of each spectral bin.

  • spectrum_best_fit_hr.dat - the high-resolution spectrum for the best-fit model, i.e. the model with the highest likelihood. This spectrum is saved at the same resolution as the high-resolution grid used in the retrieval. The first column contains the wavelength in \(\mathrm{\mu m}\), while the second column is the high-resolution spectrum.

  • temperature_structures.dat - the temperature structures for the posterior sample. The first column is the atmospheric pressure in bar. All other columns contain the temperatures at these pressures for each posterior sample. The number of temperature columns is equal to the number of posterior samples.

  • effective_temperatures.dat - the effective temperatures for the posterior sample. Each line contains the effective temperature for one posterior sample.

  • chem_XXX.dat - the mixing ratio of a chemical species XXX (for example H2O) for the posterior sample. The first column is the atmospheric pressure in bar. All other columns contain the mixing ratios at these pressures for each posterior sample. The number of mixing ratio columns is equal to the number of posterior samples.

  • contribution_function_XXXX.dat - the contribution functions for the observation/instrument XXXX for the best-fit model. Each observational data set used in the retrieval will have a separate contribution file, where XXXX is the name stated in the header of the observational data file. The first column contains the atmospheric pressure in bar, while all other columns contain the contribution functions for each wavelength/wavelength bin of the observational data. The number of contribution function columns is, thus, equal to the number of wavelengths/wavelength bins in the observational data. Note that the wavelengths are not saved in this file and have to be taken from either the corresponding observational data file or spectrum posterior file.
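Several of the postprocess files described above share the same layout: a first column with the coordinate (wavelength or pressure) and one further column per posterior sample. The following Python sketch computes the median spectrum from the contents of a spectrum_post_XXXX.dat file. It assumes whitespace-separated columns and that any header lines start with #, neither of which is confirmed by the description above. The function names are hypothetical.

```python
import statistics


def read_columns(text):
    """Parse whitespace-separated numeric rows, skipping blank and '#' lines."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        rows.append([float(value) for value in stripped.split()])
    return rows


def median_over_samples(rows):
    """Return (coordinates, medians) for a file with one column per sample.

    The first entry of each row is the coordinate (e.g. the wavelength
    in micron); the remaining entries are the posterior sample values.
    """
    coordinates = [row[0] for row in rows]
    medians = [statistics.median(row[1:]) for row in rows]
    return coordinates, medians


def number_of_samples(rows):
    """Number of posterior sample columns, taken from the first row."""
    return len(rows[0]) - 1
```

The value returned by number_of_samples should match the number of rows in post_equal_weights.dat. The same functions can be applied to temperature_structures.dat or chem_XXX.dat, where the first column is the pressure in bar.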