Fitting Hierarchical Bayes photosynthetic response curves

Introduction

This package is designed to fit the Farquhar, von Caemmerer, and Berry (1980) photosynthesis model [FvCB model] to leaf-level photosynthetic data. The statistical model generalizes the Hierarchial Bayesian approach described in Feng and Dietze (2013). The important advance that this approach provides is the ability to include covariates (e.g. leaf traits) in a mixed effects model describing key model parameters, α and Vcmax. At the moment the only supported random effect is a leaf-level effect, however fixed effects can be specified using standard R linear model notation, including interaction terms.

This package includes functions for: loading photosynthetic data in the common LI-COR text-based format, performing visual QA/QC on that data, fitting the model to data, and generating diagnostic response-curve plots with confidence and predictive intervals. The diagnostic and QA/QC functions assume that data was collected as CO2 and light response curves (A-Ci and A-Q respectively). This assumption is not required for model fitting, so alternative sampling designs are allowed, as described by Dietze (2014).

The version of the FvCB model used is described below, and at the moment does not include the TPU limitation term or temperature corrections (i.e. all data are assumed to be collected at the same leaf temperature, all parameters are specific to that temperature). It also assumes that electron transport rate, J, is a saturating function of Jmax.

$$A_j = \frac{\alpha Q}{\sqrt{1+(\alpha^2 Q^2)/(J_\textrm{max}^2)}} \frac{C_i- \Gamma}{4 C_i + 8 \Gamma}$$

$$A_c = V_{\textrm{cmax}} \frac{C_i - \Gamma}{C_i+ K_C (1+[O]/K_O) }$$

A(m) = min(Aj, Ac) − r

A(o) ∼ N(A(m), τ)

Parameter Symbol Definition
alpha0 α quantum yield (mol electrons/mole photon)
Jmax Jmax maximum electron transport
cp Γ CO2 compensation point
vmax0 Vcmax maximum Rubisco capacity (a.k.a Vcmax)
r Rd leaf respiration
tau τ residual precision
q Q PAR
pi Ci CO2 concentration

The ‘Parameter’ above refers to how the variable is referenced in the code, and thus the name that will be returned by the statistical fit.

The hierarchical version of this model is equivalent to the standard model except that α and Vcmax are mixed effect linear models of any covariates specified. These linear models assume uninformative Normal priors, while the random effects and residual errors are assumed to have Gamma priors on their precisions. All other priors are as described below in the example code. The FvCB model is fit using JAGS via the rjags package so outputs are coda mcmc.list objects that can be assessed and manipulated using standard tools in the coda package.

Install package from Github

If you have all of PEcAn installed, skip this – PEcAn.photosynthesis is already installed!

To install only the PEcAn.photosynthesis package, separately from the rest of PEcAn, run the following in R:

remotes::install_github("PecanProject/pecan/modules/photosynthesis")

You should only have to do this once.

Load library and example data

In this example we are using a set of files that are built into the package, but you could easily replace filenames with vector of your own filenames, for example using list.files to look up all the files in a directory

library(PEcAn.photosynthesis)
## Loading required package: rjags
## Loading required package: coda
## Linked to JAGS 4.3.2
## Loaded modules: basemod,bugs
## Get list of LI-COR 6400 file names (ASCII not xls)
filenames <- system.file("extdata", paste0("flux-course-",rep(1:6,each=2),c("aci","aq")), package = "PEcAn.photosynthesis")

## Load files to a list
master = lapply(filenames, read_Licor)
## [1] "flux-course-1aci"
## [1] "flux-course-1aq"
## [1] "flux-course-2aci"
## [1] "flux-course-2aq"
## [1] "flux-course-3aci"
## [1] "flux-course-3aq"
## [1] "flux-course-4aci"
## [1] "flux-course-4aq"
## [1] "flux-course-5aci"
## [1] "flux-course-5aq"
## [1] "flux-course-6aci"
## [1] "flux-course-6aq"

run QA/QC checks

The code below performs a set of interactive QA/QC checks on the LI-COR data that’s been loaded. Because it’s interactive it isn’t run when this vignette is knit.

If you want to get a feel for how the code works you’ll want to run it first on just one file, rather than looping over all the files

master[[1]] <- Licor_QC(master[[1]])

On the first screen you will be shown an A-Ci curve. Click on points that are outliers that you want to remove. Be aware that these points will not change color in THIS SCREEN, but will be updated in the next. Also be aware that if your data set is not an A-Ci curve (or contains both A-Ci and A-Q curves) there are points that may look like outliers just because they are data from the other curve. When you are done selecting outliers, click [esc] to move to the next screen.

The second screen then verifies the status of point selections from the first screen and gives you an opportunity to unselect points that had been flagged as ‘fail’.

The third and fourth screens are the equivalent plots for the A-Q (light response) curves.

Finally, this function returns a copy of the original data with a new column, “QC”, added. This column will flag all passed values with 1, all unchecked values with 0, and all failed values with -1.

The function Licor_QC has an optional arguement, curve, which can be set to either “ACi” or “AQ” if you only want to perform one of these diagnostics rather than both (which is the default).

Also, the QC code attempts to automatically separate which points are part of the ACi curve from which parts are part of the AQ curve, based on how close points are to the the variable which is supposed to be held constant. The optional variable “tol” controls the tolerance of this filter, and is expressed as a proportion of the fixed value. The default value, 0.05, corresponds to a 5% deviation. For example, in the ACi curve the light level should be held constant so the code filters the PARi variable to find the mode and then included any data that’s within 5% of the mode in the ACi curve.

Once you have a feel for the QA/QC function, you’ll want to run it for all the data you’ve loaded.

for(i in 1:length(master)){
    master[[i]] = Licor_QC(master[[i]])
}

Merge data into one data frame

Once data been’s checked we’ll merge it into one large data frame

dat <- do.call("rbind", master)

## if QC was done, remove both unchecked points and those that fail QC
if("QC" %in% colnames(dat)){
  dat <- dat[-which(dat$QC < 1),]  
} else {
  QC <- rep(1, nrow(dat))
  dat <- cbind(dat, QC)
}

Fit FvCB model to composite data

If you only want to fit the model to the data from one leaf you could do this as

fit1 <- fitA(master[[1]])

Where the one required argument is the data frame of photosynthetic data. However, this code also allows use to fit the FvCB model to all of the data at once.

if(file.exists("fit.RData")){
  load("fit.RData")
} else{
  fit <- fitA(dat)
  save(fit, file = "fit.RData")
}
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 133
##    Unobserved stochastic nodes: 139
##    Total graph size: 2117
## 
## Initializing model

Because the MCMC can take a bit of time to run, in this example the code is written to load the existing fit if it exists, which just makes knitting the vignette more efficient.

The returned object is a list with two mcmc.lists, “params” and “predict”, and the text of the JAGS model that was fit. We can look at this model below:

writeLines(fit$model)
##   
## model{
## 
## ## Priors
##   Jmax0 ~ dlnorm(4.7,2.7)             ## maximum electron transport rate prior
##   alpha0~dnorm(0.25,100)             ##quantum yield  (mol electrons/mole photon) prior
##   vmax0 ~dlnorm(4.6,2.7)             ## maximum rubisco capacity prior
## 
##   #Jmax ~ dweibull(2.0,260)          ## maximum electron transport rate prior Serbin 2012
##   #alpha0 ~ dgamma(2.0,22.0)         ## quantum yield prior Serbin 2012
##   #vmax0 ~ dweibull(1.7,80)          ## maximum rate of carboxylation prior Serbin 2012
## 
##   r0 ~ dlnorm(0.75,1.56)             ## leaf respiration prior
##   #r ~ dweibull(2.0,6.0)             ## broad leaf respiration prior for trees
##   cp0 ~ dlnorm(1.9,2.7)              ## CO2 compensation point prior
##   tau ~ dgamma(0.1,0.1)
## #TPU  tpu~ dlnorm(3,2.8)             ##tpu
## 
## ## Constants: Bernacchi et al 2001, PC&E, Table 1
## R <- 8.3144621 ## gas constant
## r.c <- 18.72
## r.H <- 46.39
## Vc.c <- 26.35
## Vc.H <- 65.33
## Vo.c <- 22.98
## Vo.H <- 60.11
## cp.c <- 19.02
## cp.H <- 37.83
## cp.ref <- 42.75
## Kc.c <- 38.05
## Kc.H <- 79.43
## Kc.ref <- 404.9
## Ko.c <- 20.30
## Ko.H <- 36.38
## Ko.ref <- 278.4
## 
## ## Constants: June et al 2004, Funct Plant Bio
## Omega <- 18
## To <- 35    ## Representative value, would benifit from spp calibration!
## 
## ## Vcmax BETAS
## 
## #RLEAF.V  tau.Vleaf~dgamma(0.1,0.1)          ## add random leaf effects
## #RLEAF.V  for(i in 1:nrep){                  
## #RLEAF.V   Vleaf[i]~dnorm(0,tau.Vleaf)
## #RLEAF.V  }
## 
## ## alpha BETAs
## 
## #RLEAF.A  tau.Aleaf~dgamma(0.1,0.1)
## #RLEAF.A  for(i in 1:nrep){                  
## #RLEAF.A   Aleaf[i]~dnorm(0,tau.Aleaf)
## #RLEAF.A  }
## 
##   for(i in 1:n) {
## 
##      r[i]  <- r0 ##B01* exp(r.c - r.H/R/T[i])
##      cp[i] <- cp0 ##B01* exp(cp.c - cp.H/R/T[i])/cp.ref
##      Kc.T[i] <- Kc ##B01* exp(Kc.c - Kc.H/R/T[i])/Kc.ref
##      Ko.T[i] <- Ko ##B01* exp(Ko.c - Ko.H/R/T[i])/Ko.ref
##      Jmax[i] <- Jmax0 ##J04 * exp(-(T[i]-To)*(T[i]-To)/(Omega*Omega))
## 
##      alpha[i] <- alpha0 #AFORMULA
##      al[i]<-(alpha[i]*q[i]/(sqrt(1+(alpha[i]*alpha[i]*q[i]*q[i])/(Jmax[i]*Jmax[i]))))*(pi[i]-cp[i])/(4*pi[i]+8*cp[i])    ## electron transport limited without covariates
## 
##      vmax.refT[i] <- vmax0 #VFORMULA
##      vmax[i] <- vmax.refT[i] ##B01* exp(Vc.c - Vc.H/R/T[i])
##      ae[i]<- vmax[i]*(pi[i]-cp[i])/(pi[i]+Kc.T[i]*(1+po/Ko.T[i]))                                                    ## maximum rubisco limited without covariates
## 
## #TPU    ap[i]<-3*tpu                      ## phosphate limited
## 
##      pmean[i]<-min(al[i], ae[i]) - r[i]      ## predicted net photosynthesis
##      an[i]~dnorm(pmean[i],tau)            ## likelihood
##      pA[i] ~ dnorm(pmean[i],tau)          ## prediction
##      }
## 
##      foo <- rep[1] + nrep + T[1]                ## prevent warnings
## }

Note that the lines beginning with # are comments. Some of these comments are followed by specific tags, such as RLEAF.V and RLEAF.A, which are bits of code that will get turned on when we specify leaf random effects (see below).

The ‘params’ mcmc.list contains the parameter estimate MCMC chains, which we can do standard MCMC diagnositics on.

(Note: It is OK to ignore the par(mfrow = ..., mar = ...) calls that appear here and elsewhere. We use them in this vignette to cram all the subplots together as compactly as possible, but for everyday use of the package is is perfectly fine to leave these parameters at their default values.)

par(mfrow = c(6, 2), mar = c(4, 2, 2, 1))
plot(fit$params, auto.layout = FALSE)    ## MCMC diagnostic plots

summary(fit$params) ## parameter estimates  
## 
## Iterations = 1:5000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##             Mean        SD  Naive SE Time-series SE
## Jmax0  1.410e+02 3.080e+01 2.515e-01      9.827e-01
## alpha0 2.743e-01 7.946e-02 6.488e-04      1.233e-03
## cp0    4.585e+01 1.273e+01 1.039e-01      4.144e-01
## r0     2.575e+00 1.574e+00 1.285e-02      3.984e-02
## tau    7.074e-03 9.008e-04 7.355e-06      9.863e-06
## vmax0  9.265e+01 6.431e+01 5.251e-01      1.522e+00
## 
## 2. Quantiles for each variable:
## 
##             2.5%       25%       50%       75%     97.5%
## Jmax0  1.019e+02 1.239e+02 1.360e+02 1.504e+02 2.241e+02
## alpha0 1.302e-01 2.181e-01 2.705e-01 3.270e-01 4.411e-01
## cp0    1.168e+01 4.008e+01 4.692e+01 5.349e+01 6.846e+01
## r0     5.035e-01 1.418e+00 2.237e+00 3.367e+00 6.556e+00
## tau    5.424e-03 6.445e-03 7.024e-03 7.666e-03 8.931e-03
## vmax0  2.706e+01 4.824e+01 7.557e+01 1.169e+02 2.635e+02
par(mfrow = c(3, 2), mar = c(4, 4, 2, 1))
coda::gelman.plot(fit$params, auto.layout = FALSE)

coda::gelman.diag(fit$params)
## Potential scale reduction factors:
## 
##        Point est. Upper C.I.
## Jmax0        1.02       1.02
## alpha0       1.00       1.00
## cp0          1.02       1.06
## r0           1.01       1.02
## tau          1.00       1.01
## vmax0        1.01       1.01
## 
## Multivariate psrf
## 
## 1.02

The ‘predict’ object can be used to perform standard predictive diagnostics and to construct CI around curves.

## predicted vs observed plot
par(mfrow = c(1, 1), mar = c(4, 4, 2, 1))
mstats <- summary(fit$predict)
pmean <- mstats$statistics[grep("pmean", rownames(mstats$statistics)), 1]
plot(pmean, dat$Photo, pch = "+", xlab = "Predicted A", ylab = "Observed A")
abline(0, 1, col = 2, lwd = 2)

par(mfrow = c(3, 4), mar = c(4, 4, 2, 1))
plot_photo(dat, fit)
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"

Refit with leaf-level random effects

Next, let’s look at how to specify leaf-level random effects in the model. To do so we’re going to add an extra arguement to the function, which is a list of model specifications.

A.model <- list(a.fixed = NULL, a.random = "leaf", 
                V.fixed = NULL, V.random = "leaf", 
                n.iter = 5000, match = "fname")

In this list a refers to α and V refers to Vcmax. Fixed refers to the specification of the fixed effects, which we’re leaving unspecified at the moment (NULL). Random refers to the specification of random effects, which we’re setting to ‘leaf’ in order allow alpha and Vcmax to vary on a leaf-by-leaf basis. Note: at the moment ‘leaf’ is the only valid random effect, though in the future we hope to allow the specification of arbitrary random effects in a covariate file (see next section for how to specify covariates). Next, n.iter refers to the number of MCMC iterations run by the model. Here we’re setting that to 5000, which is the default. You can increase this if you find the model isn’t converging. For example, if you wanted to increase the iterations with the previous default model, you would set all the fixed and random effects to NULL but increase n.iter. Finally, match is the variable used both to group records into individual leaves and to match individual leaves to covariate data. If you look at our photosynthesis dataframe, dat, you’ll see that it has a column fname that corresponds to the filename the data was read from (here we’re assuming each file contains the data for one leaf, but it’s not hard to add columns to your data if you want to group things in other ways)

Once the model is specified, the fitting and diagnostics are the same as before.

if(file.exists("fitI.RData")){
  load("fitI.RData")
} else{
  fitI <- fitA(dat,model = A.model)
  save(fitI, file="fitI.RData")
}
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 133
##    Unobserved stochastic nodes: 165
##    Total graph size: 2556
## 
## Initializing model
par(mfrow = c(8, 2), mar = c(4, 4, 2, 1))
plot(fitI$params, auto.layout = FALSE)    ## MCMC diagnostic plotssummary

par(mfrow = c(4, 2), mar = c(4, 4, 2, 1))
coda::gelman.plot(fitI$params, auto.layout = FALSE)

coda::gelman.diag(fitI$params)
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## Jmax0           1.14       1.29
## alpha0          1.06       1.19
## cp0             1.00       1.01
## r0              1.00       1.00
## tau             1.00       1.00
## tau.Aleaf       1.02       1.05
## tau.Vleaf       1.00       1.02
## vmax0           1.05       1.14
## 
## Multivariate psrf
## 
## 1.06
## predicted vs observed plot
par(mfrow = c(1, 1), mar = c(4, 4, 2, 1))
mstats <- summary(fitI$predict)
pmean <- mstats$statistics[grep("pmean",rownames(mstats$statistics)),1]
plot(pmean, dat$Photo, pch = "+", xlab = "Predicted A", ylab = "Observed A")
abline(0,1,col=2,lwd=2)

par(mfrow = c(3, 4), mar = c(4, 4, 2, 1))
plot_photo(dat,fitI)
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"

Fitting the model with covariates

Next, let’s look at how to fit a model that includes covariates. To begin with let’s load up some covariate data. In this specific case the covariate data, leaf % Nitrogen, is simulated because no covariates were actually measured for these leaves.

cov.data = read.csv(system.file("extdata", "cov.csv", package = "PEcAn.photosynthesis"))
knitr::kable(cov.data)
fname N
flux-course-1aci 0.0175286
flux-course-1aq 0.0238066
flux-course-2aci 0.0202750
flux-course-2aq 0.0258545
flux-course-3aci 0.0234969
flux-course-3aq 0.0240135
flux-course-4aci 0.0236842
flux-course-4aq 0.0238648
flux-course-5aci 0.0124040
flux-course-5aq 0.0239068
flux-course-6aci 0.0147226
flux-course-6aq 0.0239943

To define this model we’ll again define a list of model specifications. Specifically we are going to set Vcmax to be linear functions of leaf N. To do this we’ll set V.fixed “N” and we’ll set V.random to NULL to turn off the leaf random effect. For this example we’ll leave α as random.

C.model <- list(a.fixed = NULL, a.random = "leaf", 
                V.fixed= "N", V.random = NULL,
                n.iter = 5000, match = "fname")

If we had additional covariates, for example SLA and chl (leaf chlorophyll), we might write that model as “N + SLA + chl” if we just wanted the direct effects and “N + SLA + chl + NSLA + Nchl + SLA*chl” if we also wanted the pairwise interactions. That said, in this case we don’t have a large enough sample size to justify so many covariates.

Also note that the rows in the photosynthesis data are matched to rows in the covariate data using the column specified in the arguement ‘match’. The default for this, fname, is the filename for the photosynthetic data. Typically one file corresponds to one leaf so each set of traits match to each file. If you have your data organized differently you’ll want to specify a different column match.

To call the fitA function we now have to pass not just the photosynthetic data and the model, but also the covariate data. Otherwise the fit and diagnostics are identical to before.

if(file.exists("fitC.RData")){
  load("fitC.RData")
} else{
  fitC <- fitA(dat,cov.data,model = C.model)
  save(fitC,file="fitC.RData")
}
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 133
##    Unobserved stochastic nodes: 153
##    Total graph size: 2569
## 
## Initializing model
par(mfrow = c(8, 2), mar = c(4, 4, 2, 1))
plot(fitC$params,auto.layout = FALSE)    ## MCMC diagnostic plots

par(mfrow = c(4, 2), mar = c(4, 4, 2, 1))
summary(fitC$params) ## parameter estimates  
## 
## Iterations = 1:5000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                Mean       SD  Naive SE Time-series SE
## Jmax0     333.93748 13.48375 0.1100944      0.3760364
## alpha0      0.20416  0.05562 0.0004542      0.0183918
## betaVN      0.77229 31.48455 0.2570703      0.3250465
## cp0        57.49339  2.06671 0.0168746      0.0365990
## r0          0.52109  0.24847 0.0020288      0.0040528
## tau         0.08099  0.01084 0.0000885      0.0001143
## tau.Aleaf  16.75844  8.63980 0.0705436      0.2458134
## vmax0      81.19691  5.49322 0.0448519      0.1070215
## 
## 2. Quantiles for each variable:
## 
##                2.5%       25%      50%       75%    97.5%
## Jmax0     312.09149 324.68208 332.4252 341.19718 365.3705
## alpha0      0.09361   0.16277   0.2147   0.24777   0.2887
## betaVN    -60.14075 -20.53251   0.7050  21.97785  62.7717
## cp0        53.38477  56.09148  57.5105  58.90273  61.5065
## r0          0.16347   0.33950   0.4761   0.65926   1.1191
## tau         0.06125   0.07347   0.0806   0.08793   0.1033
## tau.Aleaf   4.83647  10.55004  15.1339  21.19091  37.8996
## vmax0      74.60347  77.67029  79.8960  83.23367  95.3309
coda::gelman.plot(fitC$params, auto.layout = FALSE)

coda::gelman.diag(fitC$params)
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## Jmax0           1.05       1.15
## alpha0          2.44       5.76
## betaVN          1.00       1.00
## cp0             1.00       1.00
## r0              1.00       1.01
## tau             1.00       1.01
## tau.Aleaf       1.09       1.25
## vmax0           1.00       1.00
## 
## Multivariate psrf
## 
## 1.92
## predicted vs observed plot
par(mfrow = c(1, 1),  mar = c(4, 4, 2, 1))
mstats <- summary(fitC$predict)
pmean <- mstats$statistics[grep("pmean", rownames(mstats$statistics)), 1]
plot(pmean, dat$Photo, pch = "+", xlab = "Predicted A", ylab = "Observed A")
abline(0,1,col=2,lwd=2)

par(mfrow = c(3, 4), mar = c(4, 4, 2, 1))
plot_photo(dat, fitC)
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"
## [1] "No AQ data available"
## [1] "No ACi data available"

Citations

Dietze, M.C. (2014). Gaps in knowledge and data driving uncertainty in models of photosynthesis. Photosynth. Res., 19, 3–14.

Farquhar, G., Caemmerer, S. & Berry, J.A. (1980). A biochemical model of photosynthetic CO2 assimilation in leaves of C3 species. Planta, 149, 78–90.

Feng, X. & Dietze, M.C. (2013). Scale dependence in the effects of leaf ecophysiological traits on photosynthesis: Bayesian parameterization of photosynthesis models. New Phytol., 200, 1132–1144.