Bayesian Tools - General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics

Quick start

The purpose of this first section is to give you a quick overview of the most important functions of the BayesianTools (BT) package. For a more detailed description, see the following sections.

Install, load and cite the package

If you haven’t installed the package yet, either run

install.packages("BayesianTools")

or follow the instructions on https://github.com/florianhartig/BayesianTools to install a development or an older version.

Loading and citation

library(BayesianTools)
citation("BayesianTools")
## To cite package 'BayesianTools' in publications use:
## 
##   Hartig F, Minunno F, Paul S (2023). _BayesianTools: General-Purpose
##   MCMC and SMC Samplers and Tools for Bayesian Statistics_. R package
##   version 0.1.8, https://florianhartig.github.io/BayesianTools/,
##   <https://github.com/florianhartig/BayesianTools>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {BayesianTools: General-Purpose MCMC and SMC Samplers and Tools for Bayesian
## Statistics},
##     author = {Florian Hartig and Francesco Minunno and Stefan Paul},
##     year = {2023},
##     note = {R package version 0.1.8, 
## https://florianhartig.github.io/BayesianTools/},
##     url = {https://github.com/florianhartig/BayesianTools},
##   }

Note: BayesianTools calls a number of secondary packages. Of particular importance is coda, which is used in a number of plots and summarystatistics. If you use a lot of summary statistics and diagnostic plots, it would be nice to cite coda as well!

Pro Tip: If you are running a stochastic algorithm such as MCMC, you should always set or record your random seed to make your results reproducible (otherwise the results will change slightly each time you run the code).

set.seed(123)

In a real application, recording the session info would be helpful in ensuring reproducibility.

sessionInfo()

This session information includes the version number of R and all loaded packages.

The Bayesian Setup

The central object in the BT package is the BayesianSetup. This class contains the information about the model to be fit (likelihood), and the priors for the model parameters.

The createBayesianSetup function generates a BayesianSetup object. The function requires a log-likelihood and a log-prior (optional) as arguments. It then automatically creates the posterior and various convenience functions for the samplers.

Advantages of the BayesianSetup include

  1. support for automatic parallelization
  2. functions are wrapped in try-catch statements to avoid crashes during long MCMC evaluations
  3. and the posterior checks if the parameter is outside the prior first, in which case the likelihood is not evaluated (makes the algorithms faster for slow likelihoods).

If no prior information is provided, an unbounded flat prior is generated. If no explicit prior is specified, but lower and upper values are given, a standard uniform prior with the respective bounds is created, including the option to sample from this prior, which is useful for SMC and obtaining initial values. This option is used in the following example, which creates a multivariate normal likelihood density and a uniform prior for 3 parameters.

ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))

A more detailed description of the BayesianSetup will follow below.

Hint: For an example of how to run these steps for a dynamic ecological model, see ?VSEM.

Run MCMC and SMC functions

After setup, you may want to run a calibration. The runMCMC function serves as the main wrapper for all other implemented MCMC/SMC functions. It always requires the following arguments:

  • bayesianSetup (alternatively, the log target function)
  • sampler name
  • list with settings for each sampler - if settings is not specified, the default value will be applied

The BT package provides a large class of different MCMC samplers, and it depends on the particular application which one is most suitable. For example, choosing sampler = "Metropolis" calls a versatile Metropolis-type MCMC with options for covariance adjustment, delayed rejection, tempering and Metropolis-within-Gibbs sampling. For details, see the later reference on MCMC samplers. When in doubt about the choice of MCMC sampler, we recommend using the default “DEzs”. This is also the default in the runMCMC() function.

iter = 1000
settings = list(iterations = iter, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)

The returned object is of class mcmcSampler, superclass bayesianOutput and contains the BayesianSetup, the sampler and the MCMC chain. We can continue sampling from the posterior via

out2 <- runMCMC(bayesianSetup = out)

In this case, the sampling continues with the same parameters as before, which means that we will sample for another 1000 iterations. However, you can change all parameters in settings at this point.

If you want to extract the MCMC chain from such an output, you can use the getSample() function. The getSample() function allows to specify which parameters to extract, start and end points (e.g. to discard burn-in), thinning, and can return the chain either as a matrix or as a coda object. Note that the getSample() function is also used by most downstream function and its parameters can thus be provided in ... in those downstream functions.

As an example how to use getSample, we’ll extract the first parameter of both the shorter and the longer MCMC chain and plot them

opar = par(mfrow=c(2,1))

plot(getSample(out, whichParameters = 1), type = "l", ylab = "par1", xlab = "iteration")
plot(getSample(out2, whichParameters = 1) , type = "l", ylab = "par1", xlab = "iteration")

par(opar)

Convergence checks for MCMCs

Before interpreting the results, MCMCs should be checked for convergence. We recommend the Gelman-Rubin convergence diagnostics,which are the standard used in most publications. The Gelman-Rubin diagnostics requires running multiple MCMCs (we recommend 3-5).

For all samplers, you can conveniently perform multiple runs using the nrChains argument. Alternatively, for runtime reasons, you can combine the results of three independent runMCMC() runs with nrChains = 1 using combineChains()

settings = list(iterations = iter, nrChains = 3, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
## Parameter(s) currentChain   not used in Metropolis
## 
## Parameter(s) currentChain   not used in Metropolis
## 
## Parameter(s) currentChain   not used in Metropolis

The result is an object of mcmcSamplerList, which should allow you to do everything you can do with an mcmcSampler object (sometimes with slightly different output).

Basic convergence is checked via so-called trace plots, which show the 3 MCMC chains in different colors.

tracePlot(out) # identical to plot(out)

The trace plot is examined for major problems (chains look different, systematic trends) and to decide on the burn-in, i.e. the point at which the MCMC has reached the sampling range (i.e. the chains no longer systematically go in one direction). Here, they have basically reached this point immediately, so we could set the burn-in to 0, but I choose 100, i.e. discard the first 100 samples in all further plots.

If the trace plots look good, we can look at the Gelman-Rubin convergence diagnostics. Note that you have to discard the burn-in.

gelmanDiagnostics(out, plot = T, start = 100)

## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## par 1       1.02       1.04
## par 2       1.01       1.04
## par 3       1.00       1.02
## 
## Multivariate psrf
## 
## 1.02

Usually, a value < 1.05 for each parameter and a msrf < 1.1 of 1.2 are considered sufficient for convergence.

Summarize the outputs

If we are happy with the convergence, we can plot and summarize all sampler from the console with the standard print() and summary() functions.

print(out, start = 100)
## [1] "mcmcSamplerList - you can use the following methods to summarize, plot or reduce this class:"
## [1] getSample plot      print     summary  
## see '?methods' for accessing help and source code
summary(out, start = 100)
## # # # # # # # # # # # # # # # # # # # # # # # # # 
## ## MCMC chain summary ## 
## # # # # # # # # # # # # # # # # # # # # # # # # # 
##  
## # MCMC sampler:  Metropolis 
## # Nr. Chains:  3 
## # Iterations per chain:  901 
## # Rejection rate:  0.68 
## # Effective sample size:  261 
## # Runtime:  0.74  sec. 
##  
## # Parameters
##         psf MAP   2.5% median 97.5%
## par 1 1.015   0 -1.801  0.087 2.138
## par 2 1.011   0 -2.071  0.027 2.123
## par 3 1.005   0 -2.071 -0.088 2.087
## 
## ## DIC:  12.207 
## ## Convergence 
##  Gelman Rubin multivariate psrf:  1.016 
##  
## ## Correlations 
##        par 1  par 2 par 3
## par 1  1.000 -0.041 -0.02
## par 2 -0.041  1.000 -0.06
## par 3 -0.020 -0.060  1.00

You can also use built-in plot()) functions from the package for visualization. The marginalPlot() can be either plotted as histograms with density overlay (default setting) or as a violin plot (see “?marginalPlot”).

correlationPlot(out)

marginalPlot(out, prior = TRUE)

Additional functions that may be used on all samplers are model selection scores, including the Deviance Information Criterion (DIC) and the marginal likelihood (see later section for details on calculating the Bayes factor), alongside the Maximum A Posteriori (MAP) value. A set of methods are available for calculation of marginal likelihood (refer to ?marginalLikelihood).

marginalLikelihood(out)
## $ln.ML
## [1] -8.912967
## 
## $ln.lik.star
## [1] -2.756816
## 
## $ln.pi.star
## [1] -8.987197
## 
## $ln.pi.hat
## [1] -2.831046
## 
## $method
## [1] "Chib"
DIC(out)
## $DIC
## [1] 12.20705
## 
## $IC
## [1] 15.54033
## 
## $pD
## [1] 3.333279
## 
## $pV
## [1] 3.521552
## 
## $Dbar
## [1] 8.873768
## 
## $Dhat
## [1] 5.540489
MAP(out)
## $parametersMAP
##         par 1         par 2         par 3 
##  0.0003419313  0.0002269956 -0.0002499187 
## 
## $valuesMAP
##  Lposterior Llikelihood      Lprior 
##  -11.744013   -2.756816   -8.987197

To extract part of the sampled parameter values, you can use the following process:

getSample(out, start = 100, end = NULL, thin = 5, whichParameters = 1:2)

BayesianSetup Reference

Reference on creating likelihoods

The likelihood should be provided as a log density function.

ll = logDensity(x)

See options for parallelization below. We will use a simple 3-d multivariate normal density for this demonstration.

ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))

Parallelization of the likelihood evaluations

Likelihoods are often costly to compute. If this is the case for you, you should think about parallelization possibilities. The ‘createBayesianSetup’ function has an input variable ‘parallel’, with the following options

  • F / FALSE means no parallelization should be used
  • T / TRUE means that automatic parallelization options from R are used (careful: this will not work if your likelihood writes to file, or uses global variables or functions - see general R help on parallelization)
  • “external”, assumed that the likelihood is already parallelized. In this case, the function needs to accept a matrix with parameters as columns, and rows as the different model runs you want to evaluate. This is the most likely option to use if you have a complicated setup (file I/O, HPC cluster) that cannot be treated with the standard R parallelization.

Algorithms in the BayesianTools package can make use of parallel computing if this option is specified in the BayesianSetup. Note that currently, parallelization is used by the following algorithms: SMC, DEzs and DREAMzs sampler. It can also be used through the BayesianSetup with the functions of the sensitivity package.

Here are some more details about the parallelization.

1. In-build parallelization:

In-built parallelizing is the easiest way to use parallel computing. The “parallel” argument allows you to select the number of cores to use for parallelization. Alternatively, TRUE or “auto” will use all available cores except one. Now the proposals are evaluated in parallel. Technically, the built-in parallelization uses an R cluster to evaluate the posterior density function. The input to the parallel function is a matrix where each column represents a parameter and each row represents a proposal. This allows the proposals to be evaluated in parallel. For samplers that evaluate only one proposal at a time (namely the Metropolis-based algorithms and DE/DREAM without the ‘zs’ extension), parallelization cannot be used.

2. External parallelization

The second option is to use external parallelization. Here, parallelization is attempted in the user-defined likelihood function. To use external parallelization, the likelihood function must take a matrix of proposals and return a vector of likelihood values. In the proposal matrix, each row represents a proposal, and each column represents a parameter. In addition, you will need to specify the “external” parallelization in the “parallel” argument. In simple terms, using external parallelization involves the following steps

## Definition of likelihood function
likelihood <- function(matrix){
    # Calculate likelihood in parallel
    # Return vector of likelihood values
}

## Create Bayesian Setup
BS <- createBayesianSetup(likelihood, parallel = "external", ...)

## Run MCMC
runMCMC(BS, sampler = "SMC", ...)

3. Multi-core and cluster calculations

If you want to run your calculations on a cluster there are several ways to do it.

In the first case, you want to parallelize n internal (not total chains) on n cores. The argument parallel = T in createBayesianSetup() only allows parallelization on a maximum of 3 cores for the SMC, DEzs and DreamsSamplers. But by setting parallel = n in createBayesianSetup() to n cores, the internal chains of DEzs and DREAMzs will be parallelized on n cores. This only works for DEzs and DREAMzs samplers.

## n = Number of cores
n = 2
x <- c(1:10)
likelihood <- function(param) return(sum(dnorm(x, mean = param, log = T)))
bayesianSetup <- createBayesianSetup(likelihood, parallel = n, lower = -5, upper = 5)

## give runMCMC a matrix with n rows of proposals as startValues or sample n times from the previous created sampler
out <- runMCMC(bayesianSetup, settings = list(iterations = 1000))

In the second case, you want to parallelize n internal chains on n cores with an external parallelized likelihood function. Unlike the previous case, DEzs, DREAMzs, and SMC samplers can be parallelized this way.

### Create cluster with n cores
cl <- parallel::makeCluster(n)

## Definition of the likelihood
likelihood  <- function(X) sum(dnorm(c(1:10), mean = X, log = T))

## Definition of the likelihood which will be calculated in parallel. Instead of the parApply function, we could also define a costly parallelized likelihood
pLikelihood <- function(param) parallel::parApply(cl = cl, X = param, MARGIN = 1, FUN = likelihood)

## export functions, dlls, libraries
# parallel::clusterEvalQ(cl, library(BayesianTools))
parallel::clusterExport(cl, varlist = list(likelihood))

## create BayesianSetup
bayesianSetup <- createBayesianSetup(pLikelihood, lower = -10, upper = 10, parallel = 'external')

## For this case we want to parallelize the internal chains, therefore we create a n row matrix with startValues, if you parallelize a model in the likelihood, do not set a n*row Matrix for startValue
settings = list(iterations = 100, nrChains = 1, startValue = bayesianSetup$prior$sampler(n))

## runMCMC
out <- runMCMC(bayesianSetup, settings, sampler = "DEzs")

In another case, your likelihood requires a parallelized model. Start your cluster and export your model, the required libraries, and dlls. Now you can start your calculations with the argument parallel = external in createBayesianSetup().

### Create cluster with n cores
cl <- parallel::makeCluster(n)

## export your model
# parallel::clusterExport(cl, varlist = list(complexModel))

## Definition of the likelihood
likelihood  <- function(param) {
  # ll <- complexModel(param)
  # return(ll)
} 

## create BayesianSetup and settings
bayesianSetup <- createBayesianSetup(likelihood, lower = -10, upper = 10, parallel = 'external')
settings = list(iterations = 100, nrChains = 1)

## runMCMC
out <- runMCMC(bayesianSetup, settings)

In the last case, you can parallelize over whole chain calculations. However, the likelihood itself is not parallelized. Each chain is run on one core, and the likelihood is calculated on that core.

### Definition of likelihood function
x <- c(1:10)
likelihood <- function(param) return(sum(dnorm(x, mean = param, log = T)))

## Create BayesianSetup and settings
bayesianSetup <- createBayesianSetup(likelihood, lower = -10, upper = 10, parallel = F)
settings = list(iterations = 100000)

## Start cluster with n cores for n chains and export BayesianTools library
cl <- parallel::makeCluster(n)
parallel::clusterEvalQ(cl, library(BayesianTools))

## calculate parallel n chains, for each chain the likelihood will be calculated on one core
out <- parallel::parLapply(cl, 1:n, fun = function(X, bayesianSetup, settings) runMCMC(bayesianSetup, settings, sampler = "DEzs"), bayesianSetup, settings)

## Combine the chains
out <- createMcmcSamplerList(out)

** Note: Even though parallelization can significantly reduce the computation time, it is not always useful because of the so-called communication overhead (computational time for distributing and retrieving information from the parallel cores). For models with low computational cost, this procedure may take more time than the actual evaluation of the likelihood. If in doubt, do a small run-time comparison before starting your large sampling. **

Reference on creating priors

The prior in the BayesianSetup consists of four parts

  • A log density function
  • An (optional) sampling function (must be a function without parameters, that returns a draw from the prior)
  • lower / upper boundaries
  • Additional info - best values, names of the parameters, …

This information can be passed by first creating an extra object, via createPrior, or via the createBayesianSetup function.

Creating priors

You have 5 options to create a prior

  • Do not set a prior - in this case, an infinite prior will be created
  • Set min/max values - a bounded flat prior and the corresponding sampling function will be created
  • Use one of the pre-definded priors, see ‘?createPrior’ for a list. One of the options here is to use a previous MCMC output as new prior. Pre-defined priors will usually come with a sampling function
  • Use a user-define prior, see ‘?createPrior’
  • Create a prior from a previous MCMC sample

The prior we choose depends on the prior information we have. For example, if we have no prior information, we can choose a uniform prior. The normal distribution is often used to model a wide range of phenomena in statistics, such as the distribution of heights or weights in a population. Beta distribution, on the other hand, is defined on the interval 0, 1. It is often used to model random variables that represent proportions, probabilities or other values that are constrained to lie within this interval.

$\color{darkgreen}{\text{createPrior}}$ $\color{darkgreen}{\text{createBetaPrior}}$ $\color{darkgreen}{\text{createUniformPrior}}$ $\color{darkgreen}{\text{createTruncatedNormalPrior}}$
Any density provided by the user Beta density Uniform density Normal density
createPrior(density, sampler, lower, upper, best) createBetaPrior(a, b, lower, upper) createUniformPrior(lower, upper, best) createTruncatedNormalPrior(mean, sd, lower, upper).

Creating user-defined priors

When creating a user-defined prior, the following information can/should be passed to createPrior

  • A log density function, as a function of a parameter vector x, same syntax as the likelihood
  • Additionally, you should consider providing a function that samples from the prior, because many samplers (SMC, DE, DREAM) can make use of this function for initial conditions. If you use one of the pre-defined priors, the sampling function is already implemented
  • lower / upper boundaries (can be set on top of any prior, to create truncation)
  • Additional info - best values, names of the parameters, …

Creating a prior from a previous MCMC sample

The following example from the help file illustrates the process

# Create a BayesianSetup
ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup = createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))

settings = list(iterations = 2500,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, settings = settings)


newPrior = createPriorDensity(out, method = "multivariate", eps = 1e-10, lower = rep(-10, 3), upper =  rep(10, 3), best = NULL)
bayesianSetup <- createBayesianSetup(likelihood = ll, prior = newPrior)

settings = list(iterations = 1000,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, settings = settings)

MCMC sampler reference

The different MCMC samplers

Note: Please recall, that every sampler can be accessed through the runMCMC() function. For a general introduction of how to run a MCMC sampling and how to generate the BayesianSetup-object, we refer to The Bayesian Setup and Run MCMC and SMC functions. Also please note, that every result should be checked for convergence before interpretation (see Convergence checks for MCMCs).

For simplicity, we will define a fixed number of iterations.

iter = 10000

The Metropolis MCMC class

The BayesianTools package is able to run a large number of Metropolis-Hastings (MH) based algorithms. All of these samplers can be accessed by specifying sampler = "Metropolis" in the runMCMC().

The subsequent code provides an overview of the default settings of the MH sampler.

The following code gives an overview about the default settings of the MH sampler.

applySettingsDefault(sampler = "Metropolis")
## $sampler
## [1] "Metropolis"
## 
## $startValue
## NULL
## 
## $iterations
## [1] 10000
## 
## $burnin
## [1] 0
## 
## $thin
## [1] 1
## 
## $consoleUpdates
## [1] 100
## 
## $parallel
## NULL
## 
## $message
## [1] TRUE
## 
## $optimize
## [1] TRUE
## 
## $proposalGenerator
## NULL
## 
## $adapt
## [1] FALSE
## 
## $adaptationInterval
## [1] 500
## 
## $adaptationNotBefore
## [1] 3000
## 
## $DRlevels
## [1] 1
## 
## $proposalScaling
## NULL
## 
## $adaptationDepth
## NULL
## 
## $temperingFunction
## NULL
## 
## $gibbsProbabilities
## NULL
## 
## $nrChains
## [1] 1
## 
## $runtime
## [1] 0
## 
## $sessionInfo
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BayesianTools_0.1.8 rmarkdown_2.29     
## 
## loaded via a namespace (and not attached):
##  [1] Matrix_1.7-1         jsonlite_1.8.9       compiler_4.4.2      
##  [4] Rcpp_1.0.13-1        stringr_1.5.1        DHARMa_0.4.7        
##  [7] jquerylib_0.1.4      splines_4.4.2        boot_1.3-31         
## [10] yaml_2.3.10          fastmap_1.2.0        lattice_0.22-6      
## [13] coda_0.19-4.1        R6_2.5.1             Brobdingnag_1.2-9   
## [16] knitr_1.49           MASS_7.3-61          tmvtnorm_1.6        
## [19] nloptr_2.1.1         maketools_1.3.1      minqa_1.2.8         
## [22] bslib_0.8.0          rlang_1.1.4          cachem_1.1.0        
## [25] stringi_1.8.4        xfun_0.49            IDPmisc_1.1.21      
## [28] sass_0.4.9           sys_3.4.3            cli_3.6.3           
## [31] magrittr_2.0.3       digest_0.6.37        grid_4.4.2          
## [34] gmm_1.8              mvtnorm_1.3-2        sandwich_3.1-1      
## [37] lme4_1.1-35.5        lifecycle_1.0.4      nlme_3.1-166        
## [40] evaluate_1.0.1       glue_1.8.0           numDeriv_2016.8-1.1 
## [43] zoo_1.8-12           buildtools_1.0.0     bridgesampling_1.1-2
## [46] stats4_4.4.2         tools_4.4.2          htmltools_0.5.8.1

Here are some examples of how to apply different settings. Activate individual options or combinations as demonstrated.

Standard MH MCMC

The following settings run the standard Metropolis Hastings MCMC.

Refernences: Hastings, W. K. (1970). Monte carlo sampling methods using markov chains and their applications. Biometrika 57 (1), 97-109.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953). Equation of state calculations by fast computing machines. The journal of chemical physics 21 (6), 1087 - 1092.

settings <- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = F, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out) 

Standard MH MCMC, prior optimization

Prior to the sampling process, this sampler employs an optimization step. The purpose of optimization is to improve the initial values and the covariance of the proposal distribution.

settings <- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T, message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out) 

Adaptive MCMC, prior optimization

The adaptive Metropolis sampler (AM) uses the information already obtained during the sampling process to improve (or adapt) the proposal function. The BayesianTools package adjusts the covariance of the proposal distribution by utilizing the history of the chain.

References: Haario, H., E. Saksman, and J. Tamminen (2001). An adaptive metropolis algorithm. Bernoulli , 223-242.

settings <- list(iterations = iter, adapt = T, DRlevels = 1, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out) 

Standard MCMC, prior optimization, delayed rejection

Although rejection is an essential step in an MCMC algorithm, it can also mean that the proposal distribution is (locally) poorly tuned to the target distribution. In a delayed rejection (DR) sampler, a second (or third, etc.) proposal is made before rejection. This proposal is usually drawn from a different distribution, allowing for greater flexibility of the sampler. In the BayesianTools package, the number of delayed rejection steps and the scaling of the proposals can be specified. ** Note that the current version supports only two delayed rejection steps. **

References: Green, Peter J., and Antonietta Mira. “Delayed rejection in reversible jump Metropolis-Hastings.” Biometrika (2001): 1035-1053.

settings <- list(iterations = iter, adapt = F, DRlevels = 2, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out) 

Adaptive MCMC, prior optimization, delayed rejection

The Delayed Rejection Adaptive Metropolis (DRAM) sampler is simply a combination of the two previous samplers (DR and AM).

References: Haario, Heikki, et al. “DRAM: efficient adaptive MCMC.” Statistics and Computing 16.4 (2006): 339-354.

settings <- list(iterations = iter, adapt = T, DRlevels = 2, gibbsProbabilities = NULL, temperingFunction = NULL, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out) 

Standard MCMC, prior optimization, Gibbs updating

To reduce the dimensions of the target function, a Metropolis-within-Gibbs sampler can be run with the BayesianTools package. This means that only a subset of the parameter vector is updated in each iteration. In the example below, at most two (of the three) parameters are updated at each step, and it is twice as likely to vary one as to vary two.

** Note that currently adaptive cannot be mixed with Gibbs updating! **

settings <- list(iterations = iter, adapt = T, DRlevels = 1, gibbsProbabilities = c(1,0.5,0), temperingFunction = NULL, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out) 

Standard MCMC, prior optimization, gibbs updating, tempering

Simulated tempering is closely related to simulated annealing (e.g., Bélisle, 1992) in optimization algorithms. The idea of tempering is to increase the acceptance rate during burn-in. This should lead to a faster initial scanning of the target function. To incorporate this, a tempering function must be provided by the user. The function describes how to influence the acceptance rate during burn-in. In the example below, an exponential decline approaching 1 (= no influence on the acceptance rate) is used.

References: Bélisle, C. J. (1992). Convergence theorems for a class of simulated annealing algorithms on rd. Journal of Applied Probability, 885–895.

C. J. Geyer (2011) Importance sampling, simulated tempering, and umbrella sampling, in the Handbook of Markov Chain Monte Carlo, S. P. Brooks, et al (eds), Chapman & Hall/CRC.

temperingFunction <- function(x) 5 * exp(-0.01*x) + 1
settings <- list(iterations = iter, adapt = F, DRlevels = 1, gibbsProbabilities = c(1,1,0), temperingFunction = temperingFunction, optimize = T,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Metropolis", settings = settings)
plot(out) 

Differential Evolution MCMC

The BT package implements two differential evolution MCMCs. When in doubt, use the DEzs option.

The first is the normal DE MCMC, according to Ter Braak, Cajo JF. “A Markov Chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces”. Statistics and Computing 16.3 (2006): 239-249. This sampler runs multiple chains in parallel (but not in the sense of parallel computing). The main difference to the Metropolis based algorithms is the generation of the proposal. In general, all samplers use the current position of the chain and add a step in the parameter space to generate a new proposal. While in the Metropolis based sampler this step is usually drawn from a multivariate normal distribution (but any distribution is possible), the DE sampler uses the current position of two other chains to generate the step for each chain. For successful sampling, at least 2*d chains, where d is the number of parameters, must be run in parallel.

settings <- list(iterations = iter,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DE", settings = settings)
plot(out) 

The second is the Differential Evolution MCMC with snooker update and sampling from past states, according to ter Braak, Cajo JF, and Jasper A. Vrugt. “Differential Evolution Markov Chain with Snooker Updater and Fewer Chains”. Statistics and Computing 18.4 (2008): 435-446. This extension covers two differences from the normal DE MCMC. First, it uses a snooker update based on a user-defined probability. Second, past states of other chains are taken into account when generating the proposal. These extensions allow fewer chains (i.e., 3 chains are usually sufficient for up to 200 parameters) and parallel computing, since the current position of each chain depends only on the past states of the other chains.

settings <- list(iterations = iter,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings)
plot(out) 

DREAM sampler

There are two versions of the DREAM sampler. First, the standard DREAM sampler, see Vrugt, Jasper A., et al. “Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling”. International Journal of Nonlinear Sciences and Numerical Simulation 10.3 (2009): 273-290.

This sampler is largely based on the DE sampler with some significant changes:

  1. More than two chains can be used to generate a proposal.
  2. Randomized subspace sampling can be used to improve efficiency for high-dimensional posteriors. Each dimension is updated with a crossover probability CR. To speed up the exploration of the posterior, DREAM adjusts the distribution of CR values during burn-in to favor large jumps over small ones.
  3. Outlier chains can be removed during burn-in.
settings <- list(iterations = iter,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DREAM", settings = settings)
plot(out) 

The second implementation uses the same extension as the DEzs sampler. Namely sampling from past states and a snooker update. Again, this extension allows the use of fewer chains and parallel computing.

Again, if in doubt, you should prefer “DREAMzs”.

settings <- list(iterations = iter,  message = FALSE)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DREAMzs", settings = settings)
plot(out) 

T-walk

The t-walk is an MCMC algorithm developed by Christen, J. Andrés, and Colin Fox. “A general purpose sampling algorithm for continuous distributions (the t-walk)”. Bayesian Analysis 5.2 (2010): 263-281. The sampler uses two independent points to explore the posterior space. Based on probabilities, four different moves are used to generate proposals for the two points. As with the DE sampler, this procedure does not require tuning of the proposal distribution for efficient sampling in complex posterior distributions.

settings = list(iterations = iter,  message = FALSE) 

out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "Twalk", settings = settings)

Non-MCMC sampling algorithms

MCMCs sample the posterior space by creating a chain in parameter space. While this allows for “learning” from past steps, it does not allow for running a large number of posteriors in parallel.

An alternative to MCMCs are particle filters, also known as Sequential Monte-Carlo (SMC) algorithms. See Hartig, F.; Calabrese, J. M.; Reineking, B.; Wiegand, T. & Huth, A. Statistical inference for stochastic simulation models - theory and application Ecol. Lett., 2011, 14, 816-827

Rejection samling

The simplest option is to sample a large number of parameters and accept them according to their posterior value. This option can be emulated with the implemented SMC by setting iterations to 1.

settings <- list(initialParticles = iter, iterations= 1)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "SMC", settings = settings)
plot(out) 

Sequential Monte Carlo (SMC)

The more sophisticated option is to use the implemented SMC, which is basically a particle filter that applies multiple filter steps.

settings <- list(initialParticles = iter, iterations= 10)
out <- runMCMC(bayesianSetup = bayesianSetup, sampler = "SMC", settings = settings)
plot(out) 

Note that using a number for initialParticles requires that the bayesianSetup includes the option to sample from the prior.

Bayesian model comparison and averaging

There are a number of Bayesian methods for model selection and model comparison. The BT implements three of the most common: DIC, WAIC, and the Bayes factor.

  • On the Bayes factor, see Kass, R. E. & Raftery, A. E. Bayes Factors J. Am. Stat. Assoc., Amer Statist Assn, 1995, 90, 773-795

  • For an overview of DIC and WAIC, see Gelman, A.; Hwang, J. & Vehtari, A. (2014) Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997-1016-. On DIC, see also the original reference by Spiegelhalter, D. J.; Best, N. G.; Carlin, B. P. & van der Linde, A. (2002) Bayesian measures of model complexity and fit. J. Roy. Stat. Soc. B, 64, 583-639.

The Bayes factor relies on the estimation of marginal likelihoods, which is numerically challenging. The BT package currently implements three methods

  • The recommended method is the “Chib” method (Chib and Jeliazkov, 2001), which is based on MCMC samples but performs additional calculation. Although this is the current recommendation, note that there are some numerical issues with this algorithm that may limit its reliability for larger dimensions.

  • The harmonic mean approximation, is implemented only for comparison. Note that the method is numerically unrealiable and usually should not be used.

  • The third method is simply sampling from the prior. While in principle unbiased, it converges only for a large number of samples and is therefore numerically inefficient.

Example

Data linear regression with quadratic and linear effect

sampleSize = 30
x <- (-(sampleSize-1)/2):((sampleSize-1)/2)
y <-  1 * x + 1*x^2 + rnorm(n=sampleSize,mean=0,sd=10)
plot(x,y, main="Test Data")

Likelihoods for both

likelihood1 <- function(param){
    pred = param[1] + param[2]*x + param[3] * x^2
    singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[4]^2), log = T)
    return(sum(singlelikelihoods))  
}

likelihood2 <- function(param){
    pred = param[1] + param[2]*x 
    singlelikelihoods = dnorm(y, mean = pred, sd = 1/(param[3]^2), log = T)
    return(sum(singlelikelihoods))  
}

Posterior definitions

setUp1 <- createBayesianSetup(likelihood1, lower = c(-5,-5,-5,0.01), upper = c(5,5,5,30))

setUp2 <- createBayesianSetup(likelihood2, lower = c(-5,-5,0.01), upper = c(5,5,30))

MCMC and marginal likelihood estimation

settings = list(iterations = 15000,  message = FALSE)
out1 <- runMCMC(bayesianSetup = setUp1, sampler = "Metropolis", settings = settings)
#tracePlot(out1, start = 5000)
M1 = marginalLikelihood(out1)
M1

settings = list(iterations = 15000,  message = FALSE)
out2 <- runMCMC(bayesianSetup = setUp2, sampler = "Metropolis", settings = settings)
#tracePlot(out2, start = 5000)
M2 = marginalLikelihood(out2)
M2

Model comparison via Bayes factor

Bayes factor (need to invert the log)

exp(M1$ln.ML - M2$ln.ML)
## [1] 2.613552e+21

BF > 1 means that the evidence favors M1. See Kass, R. E. & Raftery, A. E. (1995) Bayes Factors. J. Am. Stat. Assoc. Amer Statist Assn, 90, 773-795.

Assuming equal prior weights for all models, we can calculate the posterior weight of M1 as

exp(M1$ln.ML) / ( exp(M1$ln.ML) + exp(M2$ln.ML))
## [1] 1

If models have different model priors, multiply with the prior probabilities of each model.

The Deviance Information Criterion is a commonly used method to summarize the fit of an MCMC chain. It can be calculated using

DIC(out1)$DIC
## [1] 269.7765
DIC(out2)$DIC
## [1] 364.9808

Model Comparison via WAIC

The Watanabe-Akaike Information Criterion is another criterion for model comparison. To compute the WAIC, the model must implement a log-likelihood density that allows the log-likelihood to be computed pointwise (the likelihood functions require a “sum” argument that determines whether the summed log-likelihood should be returned). It can be obtained via

# This will not work, since likelihood1 has no sum argument
# WAIC(out1)

# likelihood with sum argument
likelihood3 <- function(param, sum = TRUE){
    pred <- param[1] + param[2]*x + param[3] * x^2
    singlelikelihoods <- dnorm(y, mean = pred, sd = 1/(param[4]^2), log = T)
    return(if (sum == TRUE) sum(singlelikelihoods) else singlelikelihoods)  
}
setUp3 <- createBayesianSetup(likelihood3, lower = c(-5,-5,-5,0.01), upper = c(5,5,5,30))
out3 <- runMCMC(bayesianSetup = setUp3, sampler = "Metropolis", settings = settings)

WAIC(out3)
## $WAIC1
## [1] 1104035854
## 
## $WAIC2
## [1] 1104035854
## 
## $lppd
## [1] -552017927
## 
## $pWAIC1
## [1] 1.848716e-05
## 
## $pWAIC2
## [1] 1.851485e-05