Setting up and starting a retrieval¶
To start a retrieval calculation, the executable bear
needs to be called from the command line
with a folder that contains the necessary configuration files as a command line argument:
./bear path_to_retrieval_folder/
The folder typically contains all the necessary configuration files for the retrieval and observational data. Output files will also be written to this folder.
Additionally, BeAR can optionally use the following command line arguments:
-r
- restart a retrieval that has been interrupted
-p
- perform only the postprocessing step
The additional arguments are added after the folder path:
./bear path_to_retrieval_folder/ -p
To restart a retrieval, the folder needs to contain all necessary MultiNest files from the previous run. Likewise, for the postprocess, the posterior data needs to be present in the folder. The GitHub repository of BeAR contains an example for each forward model that can be used to test the retrieval code or as templates for other retrievals. The example folders contain all necessary files to run a retrieval calculation.
Configuration files¶
BeAR requires the following files in the folder the executable is called with:
retrieval.config
- the main configuration file for the retrieval
forward_model.config
- the configuration file of chosen forward forward model
priors.config
- the setup list for the prior distributions of the free parameters
observations.list
- the list of observational data files that the retrieval should use
Optionally, the postprocessing step can be configured with the following file:
post_process.config
- the configuration file for the postprocessing step
If this file is not present, BeAR will use default settings for the postprocessing.
Main retrieval file¶
The file retrieval.config
contains the basic information for the retrieval setup.
################
#General config#
################
#Use GPU
Y
#OpenMP processor number (if 0, use maximum)
0
#########################
General retrieval config#
#########################
#forward model type
secondary_eclipse
#Spectral grid parametrisation
const_wavenumber 1.0
#Opacity data folder
/media/data/opacity_data/helios-k/
#Use error inflation prior
N
#####################
#Multinest parameter#
#####################
#Importance nested sampling
Y
#Mode separation
N
#Number of live points
800
#Efficiency
0.8
#Maximum number of iterations (0 for no limit)
0
#Resume
N
#Console feedback
Y
#Print parameter values and likelihoods
Y
The following parameters need to be set:
Use GPU
Y
or 1
to run the calculation on the GPU.
Any other input is interpreted as running purely on the CPU.OpenMP processor number
Use GPU
is enabled. This, for example, is the case for the
FastChem chemistry code that doesn’t run GPUs. BeAR will run certain calculations in parallel on the CPU as well, using
OpenMP. Set this parameter to 0
if you want to use all available cores. Note that OpenMP can only use a single,
multi-core processor.forward model type
transmission
- Transmission spectrum
secondary_eclipse
- Secondary eclipse / occultation spectrum
emission
- Emission spectrum
flat_line
- Fits a flat line to the dataDescriptions of the forward models can be found here
Spectral grid parametrisation
const_wavelength x
- a constant step in wavelength space with a step size ofx
in \(\mathrm{\mu m}\)
const_wavenumber x
- a constant step in wavenumber space with a step size ofx
\(\mathrm{cm}^{-1}\)
const_resolution x
- a constant spectral resolution \(x = \lambda/\Delta\lambda\)
Opacity data folder
Use error inflation prior
The remaining parameters refer to the nested-sampling code MultiNest. For a description of the MultiNest code and its parameters, we refer to Feroz & Hobs (2009) and Feroz et al. (2008) .
The parameters that can be set here include:
Importance nested sampling
Y
or 1
. Any other value is interpreted as N
. Importance nested sampling requires
a bit more memory but, on the other hand, also increases the convergence speed and
the overall accuracy of the Bayesian evidence calculation. Unless memory is a real
bottleneck, there should be no reason to deactivate important nested sampling.Mode separation
Y
or 1
. Any other value is interpreted as N
. Note that this option in untested and
might not work properly.Number of live points
Efficiency
0.8
for parameter estimations and 0.3
when the Bayesian evidence is
wanted at a high accuracy.Maximum number of iterations
0
indicates that MultiNest will perform the nested sampling until its convergence criteria are met.Resume
Y
or 1
, MultiNest will try to resume a previously started retrieval run. The files MultiNest
needs to restart the nested sampling must all be present in the retrieval folder. This option is useful if BeAR
is run on a cluster with a strict time limit. A restart can also be used if a previous MultiNest run was stopped
at its maximum number of iterations.Console feedback
Y
or 1
.Print parameter values and likelihoods
Y
or 1
). If BeAR is run on a cluster and the terminal output is redirected to a file,
it is usually a good idea to deactivate this option. Otherwise, the output file could become quite large.Forward model configuration file¶
The file forward_model.config
contains the configuration for the forward model.
Its structure depends on the chosen model and is discussed in the section on forward models.
Prior distributions file¶
The priors.config
file contains the information on the prior distributions of the free parameters. More information
on the format of the prior distributions file can be found in the section and in the
description of each forward model.
Observational data file¶
The observations.list
file contains a list of data files with the observational data that the retrieval should use.
Its structure depends on the chosen model and is discussed in the section on the observational data.
Postprocess configuration file¶
During the postprocess step after a retrieval calculation has been finished, BeAR can perform additional calculations. This includes the computation of spectra for the posterior sample, writing out all temperature structures, or computing the effective temperatures for emission spectroscopy retrievals.
The configuration file for the postprocess step is called
post_process.config
. This file is optional and BeAR will use default settings if it is not present. The structure of the
file depends on the chosen forward model is discussed in the section on forward models.
Output files¶
After a retrieval calculation has been finished, the retrieval folder will contain a set of output files, either directly from the MultiNest sampler or from the postprocess step of BeAR.
The most important MultiNest files are:
post_equal_weights.dat
- the posterior distributions of the model parameters and likelihood values
summary.dat
andstats.dat
- a basic summary and some statistics of the nested sampling results, including the Bayesian evidence
The folder will also contain additional MultiNest files that were used during the nested sampling process. More detailed descriptions of the files’ contents can be found in the MultiNest documentation in its GitHub repository
If the corresponding option in the optional post_process.config
has been enabled, BeAR will delete MultiNest files that are
not required for the postprocessing step.
The postprocess step will write out additional files, depending on the chosen forward model. This can include:
spectrum_post_XXXX.dat
- the spectra for the observation/instrumentXXXX
for the posterior sample. Each observational data set used in the retrieval will have a separate posterior spectrum file, whereXXXX
is the name stated in the header of the observational data file. The spectra are binned to each observational data. The first column contains the wavelength in \(\mathrm{\mu m}\), while all other columns are the spectra for each posterior sample. Thus, there are as many spectrum columns as there are posterior samples in the posterior distribution filepost_equal_weights.dat
. If the original data set was either band-spectroscopy or photometry, the wavelengths in the first column refer to the centre of each spectral bin.
spectrum_best_fit_hr.dat
- the high-resolution spectrum for the best-fit model, i.e. the model with the highest likelihood. This spectrum is saved at the same resulution as the high-resolution grid used in the retrieval. The first column contains the wavelength in \(\mathrm{\mu m}\), while the second column is the high-resolution spectrum.
temperature_structures.dat
- the temperature structures for the posterior sample. The first column is the atmospheric pressure in bar. All other columns contain the temperatures at these pressures for each posterior sample. The number of temperature columns is equal to the number of posterior samples.
effective_temperatures.dat
- the effective temperatures for the posterior sample. Each line contains the effective temperature for one posterior sample.
chem_XXX.dat
- the mixing ratio of a chemical speciesXXX
(for example H2O) for the posterior sample. The first column is the atmospheric pressure in bar. All other columns contain the mixing ratios at these pressures for each posterior sample. The number of mixing ratio columns is equal to the number of posterior samples.
contribution_function_XXXX.dat
- the contribution functions for the observation/instrumentXXXX
for the best-fit model. Each observational data set used in the retrieval will have a separate contribution file, whereXXXX
is the name stated in the header of the observational data file. The first column contains the atmospheric pressure in bar, while all other columns contain the contribution functions for each wavelength/wavelength bin of the observational data. The number of contribution function columns is, thus, equal to the number of wavelengths/wavelength bins in the observational data. Note that the wavelengths are not saved in this file and have to be taken from either the corresponding observational data file or spectrum posterior file.