To calibrate a model with BT, you must be able to run the model with a specified set of parameters. Strictly speaking, BT does not see the model at all; it only requires a likelihood function with the interface likelihood(par), where par is a vector. Inside this function, you will typically run your model with the parameters par, possibly after transforming par into some other format, e.g. a data.frame, matrix, or list.
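As a minimal sketch (runMyModel and the data vector observed are hypothetical placeholders for your own model wrapper and calibration data), such a likelihood function could look like this:

likelihood <- function(par){
  predicted <- runMyModel(par) # run the model with the current parameters
  # example: independent normal errors, with the last parameter as the standard deviation
  sum(dnorm(observed, mean = predicted, sd = par[3], log = TRUE))
}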
What happens next depends on how your model is programmed. The following steps are arranged according to speed and convenience. If your model has not been interfaced with R before, you will most likely have to skip to the last option.
If your model is already coded in R, typically no action is required; just make sure you are able to call your model as a function of its parameters. Such a function will usually return the model outputs directly, so step 2 can be skipped.
If your model is already in the form of a dll, or if you can compile it into one, use a function such as dyn.load() to link R to your model.
If you implement this, you will usually also return the output directly via the dll rather than writing it to a file, which means that step 2 can be skipped. The problem with this approach is that you have to code the interface to your dll yourself, which in most programming languages technically means declaring your variables as external (or something similar) so that they can be accessed from outside. How exactly this works depends on the programming language being used.
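As a rough sketch only (the routine name runmodel, the argument layout, and the output length are made-up placeholders), linking to and calling a compiled model could look like this:

dyn.load("myModel.dll") # on Linux / macOS, the file would be myModel.so
runMyModel <- function(par){
  # .C passes the arguments to the compiled routine and returns them as a list;
  # here, the model is assumed to write its predictions into 'out'
  res <- .C("runmodel", par = as.double(par), out = double(100))
  res$out
}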
RCPP provides a highly flexible platform to interface between R and C/C++. If your model is coded in C/C++, RCPP is a secure and powerful way to connect it to R (much more flexible than using the command line or a dll).
Nevertheless, code adjustments may be necessary for the interface, and beginners may find it challenging to resolve any technical issues. Attempting to interface an existing C/C++ model without prior experience using RCPP or at least substantial experience with C/C++ is not advisable.
In addition, step 2 can be skipped if you implement this, as you will usually return the output directly via the dll rather than writing it to a file.
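To give a flavor of how this looks (the model body below is a trivial placeholder, not a real simulation), a small C++ function can be compiled and called from R like this:

library(Rcpp)
cppFunction('
NumericVector runMyModelCpp(NumericVector par) {
  // trivial placeholder model: returns the squared parameters
  NumericVector out(par.size());
  for (int i = 0; i < par.size(); i++) out[i] = par[i] * par[i];
  return out;
}')
runMyModelCpp(c(1, 2, 3)) # returns 1 4 9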
If your model is written in a compiled or interpreted language and accepts parameters via the command line (std I/O), wrapping it is usually nothing more than writing the system call inside an R function. For example:
runMyModel <- function(par){
  # create the string that you would use to call the model from the command line
  systemCall <- paste("model.exe", par[1], par[2])
  # intern = TRUE captures the output of the command as an R character vector
  out <- system(systemCall, intern = TRUE)
  # add code here to convert out into the appropriate R classes
}
Note: If you encounter difficulties with the system command, try system2. If the model provides its output via stdout, you can catch and convert it and skip step 2. If your model writes its output to a file, proceed to step 2.
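For example, the system call above could be rewritten with system2, which separates the command from its arguments:

out <- system2("model.exe", args = as.character(par[1:2]), stdout = TRUE)
# stdout = TRUE captures the output of the command as an R character vector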
Many models read their parameters from parameter files. In this case, you will want to do something like the following example:
runMyModel <- function(par, returnData = NULL){
  writeParameters(par) # write the parameters to the file the model reads
  system("Model.exe")  # run the model
  if(!is.null(returnData)) return(readData(returnData)) # readData is defined later
}
writeParameters <- function(par){
  # e.g.
  # read template parameter file
  # replace strings in template file
  # write parameter file
}
For some problems, it may be useful to create a setup function, as in the example below.
setUpModel <- function(parameterTemplate, site, localConditions){
# create the runModel, readData functions (see later) here
return(list(runModel, readData))
}
How you write the parameter function depends on the file format you are using for the parameters. Usually it is recommended to create a template parameter file as a starting point and then replace the parameter values in it as required.
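As a sketch of this template approach (the file names and the %PAR1% placeholder convention are made up for the example), writeParameters could look like this:

writeParameters <- function(par){
  template <- readLines("parameterTemplate.txt") # read the template parameter file
  template <- gsub("%PAR1%", par[1], template, fixed = TRUE) # replace the placeholders
  template <- gsub("%PAR2%", par[2], template, fixed = TRUE)
  writeLines(template, "parameters.txt") # write the parameter file the model reads
}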
To achieve one of the previous options, you must change your model code. If the model is in C/C++, the best alternative is to go directly to RCPP.
If your model returns its output directly (which is highly preferable), you can skip this step.
If you have a simple model, you might consider having the runMyModel function return the model output directly. This is suitable for cases a) and b), namely models that are already in R or that receive parameters via the command line.
More complex models, in contrast, generate a large number of outputs, of which you usually do not need all. It is therefore more useful to create one or several separate readData or getData functions. The only two cases I will consider here are:
Model is a dll: If the model is a dll file, it would probably be best to implement appropriate getData functions in the source code that can then be called from R. If your model is in C and in a dll, interfacing it via RCPP would probably be easier, because you can return R data.frames and other data structures directly.
Model writes file output: If the model generates file output, create a getData function that reads the model outputs and returns the data in the desired format, typically the same format you would use to represent your field data.
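As a minimal sketch (assuming the model writes its results as a plain table to a file whose name depends on the output type):

getData <- function(type){
  # 'type' selects which of the model's output files to read
  read.table(paste0(type, ".txt"), header = TRUE)
}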
From R, you should now be able to do something like the following (using the hypothetical functions sketched above):
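par <- c(1, 2)
runMyModel(par) # runs the model and writes its output to file
out <- getData(type = "output") # reads the output back into R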
Although this last step is optional, we recommend doing it from the beginning, because there is really no downside. You can work with R code in multiple files that are run manually, or you can incorporate everything directly into an R package. Creating and managing R packages is straightforward and makes it easier to share your code, as everything, including the help files, is in one package. To create an R package, please refer to the tutorial. Remember to write good documentation using Roxygen.
Runtime is often a concern when performing sensitivity analyses or calibrations. Ensure that your model has been optimized for maximum speed before parallelization.
One way to speed up the runtime of your model is to run it on multiple cores (CPUs). To do so, you have two choices: parallelize the model itself, or parallelize the model evaluations, i.e. run several instances of the model in parallel.
Which of the two makes more sense depends a lot on your problem. Parallelizing the model itself will be especially interesting for very large models that could not otherwise be calibrated with MCMCs. However, this approach typically requires writing parallel C/C++ code and advanced programming skills, which is why we will not discuss it further here.
The usual advice in parallel computing is anyway to parallelize the outer loops first to minimize the communication overhead, which would suggest starting with parallelizing the model evaluations. This is also much easier to program. Even within that, two levels of parallelization are possible: within-sampler parallelization (parallelizing the model evaluations inside a single MCMC / SMC run) and between-sampler parallelization (running several samplers in parallel). Currently, BT only supports parallelization within MCMCs / SMCs, but it is easy to implement between-sampler parallelization by hand. Both approaches are described below.
Within-sampler parallelization is particularly useful for algorithms that can use a large number of cores in parallel, such as sensitivity analyses or SMC sampling. For MCMCs, the amount of parallelization that can be used depends on the settings and the algorithm. In general, MCMCs are, as the name implies, Markovian, i.e. they set up a chain of sequential model evaluations, and these calls cannot be fully parallelized. However, a number of samplers in the BT package use algorithms that can be partly parallelized, in particular the population MCMC algorithms DE/DEzs/DREAM/DREAMzs. In all these cases, BT will automatically use parallelization if the BT setup indicates that it is implemented.
How do you do this? A first requirement is that your model is wrapped in an R function (see PREVIOUS SECTION). Once that is the case, R provides a number of ways to run functions in parallel. The easiest is to use the parallel package that ships with the R core. Other packages can be found on the Internet and in the CRAN task view on High Performance Computing.
As an example, suppose we have the following very simple model (a hypothetical stand-in that just squares its parameter; any R function would do):
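mymodel <- function(x){
  x^2 # placeholder: a real application would run a simulation here
}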
To start a parallel computation, we must first create a cluster object. Here we start a cluster with 2 CPUs.
library(parallel)
cl <- makeCluster(2)

runParallel <- function(parList){
  # distribute the parameter values over the cluster workers
  parSapply(cl, parList, mymodel)
}

runParallel(c(1, 2))

stopCluster(cl) # release the workers when you are done
You could use this principle to build your own parallelized likelihood. However, something very similar to the loop above is automated in BayesianTools: you can create a parallel model evaluation function directly with the generateParallelExecuter function, or alternatively directly in the createBayesianSetup function.
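For example, using the same test likelihood as in the example further below, a parallelized setup could be created like this (parallel = 2 requests two cores; see the createBayesianSetup help for the available options):

library(BayesianTools)
ll <- generateTestDensityMultiNormal(sigma = "no correlation")
# parallel = 2 creates an internal cluster of 2 workers for the likelihood calls
bayesianSetup <- createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3), parallel = 2)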
If your model is thread-safe, you should be able to run this out of the box. I therefore recommend hand-coded parallelization only if your model is not thread-safe.
In addition to within-chain parallelization, you can also run multiple MCMCs in parallel and later combine them into a single McmcSamplerList.
library(BayesianTools)
ll <- generateTestDensityMultiNormal(sigma = "no correlation")
bayesianSetup <- createBayesianSetup(likelihood = ll, lower = rep(-10, 3), upper = rep(10, 3))
settings = list(iterations = 200)
# run the several MCMC chains either in separate R sessions, or via R's parallel packages
out1 <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings)
## Running DEzs-MCMC, chain 1 iteration 201 of 201 . Current logp -16.84623 -12.12165 -13.12708 . Please wait!
## runMCMC terminated after 0.0510000000000002seconds
out2 <- runMCMC(bayesianSetup = bayesianSetup, sampler = "DEzs", settings = settings)
## Running DEzs-MCMC, chain 1 iteration 201 of 201 . Current logp -12.16394 -12.36114 -13.63612 . Please wait!
## runMCMC terminated after 0.0540000000000003seconds
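The independent runs can then be combined, for example with the createMcmcSamplerList function:

# combine the independent runs into a single sampler list for joint analysis
res <- createMcmcSamplerList(list(out1, out2))
summary(res)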
Thread safety generally means that you can run multiple instances of your code on your hardware without the instances interfering with each other. There are several things that can limit thread safety, such as