Package 'PEcAnAssimSequential'

Title: PEcAn Functions Used for Ecological Forecasts and Reanalysis
Description: The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Authors: Mike Dietze
Maintainer: Mike Dietze <[email protected]>
License: BSD_3_clause + file LICENSE
Version: 1.10.0.9000
Built: 2026-06-05 14:54:39 UTC
Source: https://github.com/PecanProject/pecan

Help Index


adj.ens

Description

This functions gives weights to different ensemble members based on their likelihood during the analysis step. Then it adjusts the analysis mean estimates of state variables based on the estimated weights.

Usage

adj.ens(Pf, X, mu.f, mu.a, Pa)

Arguments

Pf

A cov matrix of forecast state variables.

X

Dataframe or matrix of forecast state variables for different ensembles.

mu.f

A vector with forecast mean estimates of state variables.

mu.a

A vector with analysis mean estimates of state variables.

Pa

The state estimate cov matrix of analysis.

Value

Returns a vector of adjusted analysis mean estimates of state variables.

Author(s)

Michael Dietze [email protected], Ann Raiho and Hamze Dokoohaki


Aggregation Function

Description

Aggregation Function

Usage

aggregate(downscale_output, polygon_data, func = "mean")

Arguments

downscale_output

Raster file output from downscale_function.R. Read file in this way if stored locally: downscale_output <- readRDS("xxx.rds")

polygon_data

A spatial polygon object (e.g., an 'sf' object) that defines the spatial units for aggregation. This data should be in a coordinate reference system compatible with the raster data (e.g., "EPSG:4326").

func

A character string specifying the aggregation function to use (e.g., 'mean', 'sum').

Details

This function will aggregate previously downscaled carbon flux amount to a spatial unit of choice

Value

It returns the 'polygon_data' with added columns for mean and sum values of the aggregated raster data for each ensemble member.

Author(s)

Harunobu Ishii

Examples

## Not run: 
       # Download a shapefile of U.S. (polygon data)
       url <- "https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_state_20m.zip"
       download.file(url, destfile = "polygon/us_states.zip")
       
       # Unzip the downloaded file and save locally
       unzip("polygon/us_states.zip", exdir = "polygon/us_states")
       us_states <- st_read("polygon/us_states/cb_2020_us_state_20m.shp")
       saveRDS(us_states, "polygon/us_states.rds")
       
       # Load the saved polygon data with Massachusetts as an example
       us_states <- readRDS("polygon/us_states.rds")
       state <- "MA"
       polygon_data <- st_transform(us_states[us_states$STUSPS == state, ], crs = "EPSG:4326")
     
       # Load the downscaled raster output
       downscale_output <- readRDS("path/to/downscale_output.rds")
       
       # Slot in as argument to the aggregate function
       result <- aggregate(downscale_output, polygon_data)
       print(result)
       
## End(Not run)

alltocs

Description

This function finds all the tic functions called before and estimates the time elapsed for each one saves/appends it to a csv file.

Usage

alltocs(fname = "tocs.csv")

Arguments

fname

string path to where the output needs to be saved as a csv file.

Value

This function writes down a csv file with three columns: 1- message sepecified in the 'tic' 2- Total elapsed time and 3- the execution time

Examples

## Not run: 
 library(tictoc)
 tic("Analysis")
 Sys.sleep(5)
 testfunc()
 tic("Adjustment")
 Sys.sleep(4)
 alltocs("timing.csv")  

## End(Not run)

Additive Log Ratio transform

Description

Additive Log Ratio transform

Usage

alr(y)

Arguments

y

state var


analysis_sda_block

Description

This function provides the block-based MCMC sampling approach.

Usage

analysis_sda_block(
  settings,
  block.list.all,
  X,
  obs.mean,
  obs.cov,
  t,
  nt,
  MCMC.args,
  block.list.all.pre = NULL
)

Arguments

settings

pecan standard multi-site settings list.

block.list.all

Lists of forecast and analysis outputs for each time point of each block. If t=1, we initialize those outputs of each block with NULL from the 'sda.enkf.multisite' function.

X

A matrix contains ensemble forecasts with the dimensions of '[ensemble number, site number * number of state variables]'. The columns are matched with the site.ids and state variable names of the inside the 'FORECAST' object in the 'sda.enkf.multisite' script.

obs.mean

Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation means for each state variables of each site for each time point.

obs.cov

Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation covariances for all state variables of each site for each time point.

t

time point in format of YYYY-MM-DD.

nt

total length of time steps, corresponding to the 'nt' variable in the 'sda.enkf.multisite' function.

MCMC.args

arguments for the MCMC sampling, details can be found in the roxygen strucutre for control list in the 'sda.enkf.multisite' function.

block.list.all.pre

pre-existed block.list.all object for passing the aqq and bqq to the current SDA run, the default is NULL. Details can be found in the roxygen structure for 'pre_enkf_params' of the 'sda.enkf.multisite' function.

Details

This function will add data and constants into each block that are needed for the MCMC sampling.

Value

It returns the 'build.block.xy' object and the analysis results.

Author(s)

Dongchen Zhang


Analysis.sda

Description

This functions uses the FUN to perform the analysis. EnKF function is developed inside the PEcAnAssimSequential package which can be sent to this function to perform the Ensemble Kalman Filter. The other option is GEF function inside the same package allowing to perform Generalized Ensemble kalman Filter.

If you're using an arbitrary function you can use the ... to send any other variables to your desired analysis function.

Usage

Analysis.sda(
  settings,
  FUN,
  Forecast = list(Pf = NULL, mu.f = NULL, Q = NULL, X = NULL),
  Observed = list(R = NULL, Y = NULL),
  H,
  extraArg,
  ...
)

Arguments

settings

pecan standard settings list.

FUN

A Function for performing the analysis step. Two available options are: 1-EnKF and 2-GEF.

Forecast

A list containing the forecasts variables including Q (process variance) and X (a dataframe of forecasts state variables for different ensemble)

Observed

A list containing the observed variables including R (cov of observed state variables) and Y (vector of estimated mean of observed state variables)

H

is a matrix of 1's and 0's specifying which observations go with which variables.

extraArg

This argument is a list containing aqq, bqq and t. The aqq and bqq are shape parameters estimated over time for the proccess covariance and t gives the time in terms of index of obs.list. See Details.

...

Extra argument sent to the analysis function. In case you're using the 'GEF' function, this function requires nt, obs.mean, obs.cov, which are the total number of steps, list of observed means and list of observed cov respectively.

Value

Returns whatever the FUN is returning. In case of EnKF and GEF, this function returns a list with estimated mean and cov matrix of forecast state variables as well as mean and cov estimated as a result of assimilation/analysis .

Author(s)

Michael Dietze [email protected], Ann Raiho and Hamze Dokoohaki


assess.params

Description

Assessing parameter estimations after mapping model output to tobit space

Usage

assessParams(dat, Xt, wts = NULL, mu_f_TRUE = NULL, P_f_TRUE = NULL)

Arguments

dat

MCMC output

Xt

ensemble output matrix

wts

ensemble weights

mu_f_TRUE

muf before tobit2space

P_f_TRUE

Pf before tobit2space

Value

make plots

Author(s)

Michael Dietze and Ann Raiho [email protected]


block_matrix

Description

This function is adopted from migest package.

Usage

block_matrix(x = NULL, b = NULL, byrow = FALSE, dimnames = NULL)

Arguments

x

Vector of numbers to identify each block.

b

Numeric value for the size of the blocks within the matrix ordered depending on byrow

byrow

logical value. If FALSE (the default) the blocks are filled by columns, otherwise the blocks in the matrix are filled by rows.

dimnames

Character string of name attribute for the basis of the block matrix. If NULL a vector of the same length of b provides the basis of row and column names.#'.

Value

Returns a matrix with block sizes determined by the b argument. Each block is filled with the same value taken from x.

Author(s)

Guy J. Abel


block.2.vector

Description

block.2.vector

Usage

block.2.vector(block.list, X, H, adjustment)

Arguments

block.list

lists of blocks generated by the 'build.block.xy' function.

X

A matrix contains ensemble forecasts.

H

H index created by the 'construct_nimble_H' function.

adjustment

logical variable determine if we want to adjust the analysis ensembles based on likelihood.

Value

It returns a list of analysis results by MCMC sampling.

Author(s)

Dongchen Zhang


build_X

Description

builds X matrix for SDA

Usage

build_X(
  out.configs,
  settings,
  new.params,
  nens,
  read_restart_times,
  outdir,
  t = 1,
  var.names,
  my.read_restart,
  restart_flag = FALSE
)

Arguments

out.configs

object created for build_X passed from sda.enkf_MultiSite

settings

settings object, passed from sda.enkf_MultiSite

new.params

object created from sda_matchparam, passed from sda.enkf_MultiSite

nens

number of ensemble members i.e. runs

read_restart_times

passed from sda.enkf_MultiSite

outdir

location of previous run output folder containing .nc files

t

Default t=1, for function to work within time loop

var.names

list of state variables taken from settings object

my.read_restart

object that points to the model restart function i.e. read_restart.SIPNET

restart_flag

flag if it's a restart stage. Default is FALSE.

Value

X ready to be passed to SDA Analysis code

Author(s)

Alexis Helgeson


build.block.xy

Description

This function split long vector and covariance matrix into blocks corresponding to the localization.

Usage

build.block.xy(settings, block.list.all, X, obs.mean, obs.cov, t)

Arguments

settings

pecan standard multi-site settings list.

block.list.all

List contains nt empty sub-elements.

X

A matrix contains ensemble forecasts.

obs.mean

List of dataframe of observation means, named with observation datetime.

obs.cov

List of covariance matrices of state variables , named with observation datetime.

t

time point.

Details

This function will add data and constants into each block that are needed for the MCMC sampling.

Author(s)

Dongchen Zhang


sda_assemble

Description

This function can help to assemble sda outputs (analysis and forecasts) from each job execution.

Usage

check.qsub.job.info(username)

Arguments

username

character: username for the HPC.

Value

list: number of jobs and vector of job ids.

Author(s)

Dongchen Zhang.


Weighted conjugate wishart

Description

Weighted conjugate wishart

Usage

conj_wt_wishart_sampler(model, mvSaved, target, control)

Arguments

model

model

mvSaved

copied to

target

thing being targetted

control

unused


Construc_H

Description

This function creates a matrix mapping obsereved data to their forecast state variable.

Usage

Construct_H(choose, Y, X)

Arguments

choose

a vector of observations indices oredered based on their appearances in the list of state variable names.

Y

vector of observations

X

Dataframe or matrix of forecast state variables for different ensembles.

Value

This returns a matrix specifying which observation go with which state variables.

Author(s)

Hamze Dokoohaki


construct_nimble_H

Description

This function is an upgrade to the Construct.H.multisite function which provides the index by different criteria.

Usage

construct_nimble_H(
  site.ids,
  var.names,
  obs.t,
  pft.path = NULL,
  by = "single",
  products = NULL
)

Arguments

site.ids

a vector name of site ids

var.names

vector names of state variable names

obs.t

list of vector of means for the time t for different sites.

pft.path

physical path to the pft.csv file.

by

criteria, it supports by variable, site, pft, all, and single Q.

products

vector names of products for each observation.

Value

Returns one vector containing index for which Q to be estimated for which variable, and the other vector gives which state variable has which observation (= element.W.Data).

Author(s)

Dongchen Zhang


Construct.H.multisite

Description

This function is makes the blocked mapping function.

Usage

Construct.H.multisite(site.ids, var.names, obs.t.mean, products = NULL)

Arguments

site.ids

a vector name of site ids

var.names

vector names of state variable names

obs.t.mean

list of vector of means for the time t for different sites.

products

vector names of products for each observation.

Value

Returns a matrix with block sizes determined by the b argument. Each block is filled with the same value taken from x.

Author(s)

Hamze


Construct.R

Description

Make sure that both lists are named with siteids.

Usage

Construct.R(site.ids, var.names, obs.t.mean, obs.t.cov)

Arguments

site.ids

a vector name of site ids

var.names

vector names of state variable names

obs.t.mean

list of vector of means for the time t for different sites.

obs.t.cov

list of list of cov for the time for different sites.

Value

This function returns a list with Y and R ready to be sent to the analysis functions.

Author(s)

Hamze Dokoohaki


Contruct.Pf

Description

The argument X needs to have an attribute pointing the state variables to their corresponding site. This attribute needs to be called 'Site'. At the moment, the cov between state variables at blocks defining the cov between two sites are assumed zero.

Usage

Contruct.Pf(
  site.ids,
  var.names,
  X,
  localization.FUN = NULL,
  t = 1,
  blocked.dis = NULL,
  ...
)

Arguments

site.ids

a vector name of site ids.

var.names

vector names of state variable names.

X

a matrix of state variables. In this matrix rows represent ensembles, while columns show the variables for different sites.

localization.FUN

This is the function that performs the localization of the Pf matrix and it returns a localized matrix with the same dimensions.

t

not used

blocked.dis

passed to 'localization.FUN'

...

passed to 'localization.FUN'

Value

It returns the var-cov matrix of state variables at multiple sites.

Author(s)

Hamze Dokoohaki


Title Identify pft for each site of a multi-site settings using NLCD and Eco-region

Description

Title Identify pft for each site of a multi-site settings using NLCD and Eco-region

Usage

Create_Site_PFT_CSV(settings, Ecoregion, NLCD, con)

Arguments

settings

a multi-site settings

Ecoregion

path of Ecoregion data (*.shp)

NLCD

path of NLCD img file

con

connection to bety

Value

pft info with each site

Examples

## Not run: 
 NLCD <- file.path(
   "/fs", "data1", "pecan.data", "input",
   "nlcd_2001_landcover_2011_edition_2014_10_10",
   "nlcd_2001_landcover_2011_edition_2014_10_10.img")
 Ecoregion <- file.path(
   "/projectnb", "dietzelab", "dongchen",
   "All_NEON_SDA", "NEON42", "eco-region", "us_eco_l3_state_boundaries.shp")
 settings <- PEcAn.settings::read.settings(
   "/projectnb/dietzelab/dongchen/All_NEON_SDA/NEON42/pecan.xml")
 con <- PEcAn.DB::db.open(settings$database$bety)
   site_pft_info <- Create_Site_PFT_CSV(settings, Ecoregion, NLCD, con)

## End(Not run)

debias.average

Description

This function helps to average the predicted residuals based on certain criteria. The default function is averaging. TODO: we will be averaging residuals based on ML precision for each residual prediction.

Usage

debias.average(residuals, by_product_accuracy = NULL)

Arguments

residuals

vector: predicted residuals across products.

by_product_accuracy

vector: the out-of-sample RMSE for each product.

Value

numeric: an averaged residual.

Author(s)

Dongchen Zhang


debias.extract.cov

Description

This function helps to extract the covariates based on date and coordinates from the settings object.

Usage

debias.extract.cov(cov.dir, date, site.info)

Arguments

cov.dir

character: physical path to the directory contains the time series covariate maps.

date

numeric or character: date that used to find the corresponding covariate file.

site.info

data.frame: data.frame that contains longitude and latitude in its first and second column.

Value

data.frame: a data frame containing M (rows) of locations across N (columns) variables.

Author(s)

Dongchen Zhang


debias.map.products

Description

This function helps to create the mapping operator that maps between product names and variable names by locations.

Usage

debias.map.products(obs.mean.t, obs_prep)

Arguments

obs.mean.t

list: lists of sites that contain observations for each variable.

obs_prep

list: lists contains basic information for each product of each observation.

Value

matrix: a matrix containing M (rows) of locations*variables across N (columns) products.

Author(s)

Dongchen Zhang


sda.bias.correction

Description

This function helps to correct the forecasts' biases based on ML (random forest) training on the previous time points.

Usage

debias.ML(pred.name, cov.names, dat.train, dat.pred, var.name, py.init)

Arguments

pred.name

character: the name for the predictive variable.

cov.names

character: the name for the covariates.

dat.train

data.frame: data frame containing associated covariates and predictive variable for ML training.

dat.pred

data.frame: data frame containing associated covariates for prediction.

var.name

character: variable name to be predicted.

py.init

R function: R function to initialize the python functions. Default is NULL. the default random forest will be used if 'py.init' is NULL.

Value

list: the ML predicted residuals and other ML outputs.

Author(s)

Dongchen Zhang


debias.res.lag.calc

Description

This function helps to calculate the lagged residual error time series.

Usage

debias.res.lag.calc(res.ts)

Arguments

res.ts

list: lists of residuals at each time point.

Value

list: lists of lagged residuals between time points.

Author(s)

Dongchen Zhang


debias.residual.calc

Description

This function helps to calculate the residual error for a certain time point and variable.

Usage

debias.residual.calc(obs.mean, all.X, t, var.name, data.source)

Arguments

obs.mean

List: lists of date times named by time points, which contains lists of sites named by site ids, which contains observation means for each state variables of each site for each time point.

all.X

list: lists of data frame of model forecast from the beginning to the current time points that has n (ensemble size) rows and n.var (number of variables) times n.site (number of locations) columns. (e.g., 100 ensembles, 4 variables, and 8,000 locations will end up with data.frame of 100 rows and 32,000 columns).

t

numeric: the current number of time points (e.g., t=1 for the beginning time point).

var.name

character: variable name to be predicted.

data.source

character: product name of the data.

Value

list: lists of residuals between forecasts and observations across at time t.

Author(s)

Dongchen Zhang


debias.train.accuracy

Description

This function helps to calculate the out-of-sample accuracy of residual predictions.

Usage

debias.train.accuracy(
  pred.name,
  cov.names,
  dat.train,
  var.name,
  py.init,
  ratio = 0.8
)

Arguments

pred.name

character: the name for the predictive variable.

cov.names

character: the name for the covariates.

dat.train

data.frame: data frame containing associated covariates and predictive variable for ML training.

var.name

character: variable name to be predicted.

py.init

R function: R function to initialize the python functions. Default is NULL. the default random forest will be used if 'py.init' is NULL.

ratio

numeric from 0 to 1: define the ratio of samples used for training. The rest samples will be used for calculating the out-of-sample accuracy. Default is 0.8.

Value

list: the variable importance of the ML and RMSE of the out-of-sample predictions.

Author(s)

Dongchen Zhang


downscale_main

Description

This is the main function to execute the machine learning training and prediction. Note it will be deployed by each node you requested if the qsub feature is enabled below.

Usage

downscale_main(
  settings,
  analysis,
  covariates.dir,
  time,
  variable,
  outdir,
  base.map.dir,
  method = "randomForest",
  cores = parallel::detectCores()
)

Arguments

settings

character: physical path that points to the pecan settings XML file.

analysis

numeric: data frame (rows: ensemble member; columns: site*state_variables) of updated ensemble analysis results from the 'sda_enkf' function.

covariates.dir

character: path to the exported covariates GeoTIFF file.

time

character: the time tag used to differentiate the outputs from others.

variable

character: name of state variable. It should match up with the column names of the analysis data frame.

outdir

character: the output directory where the downscaled maps will be stored.

base.map.dir

character: path to the GeoTIFF file within which the extents and CRS will be used to generate the ensemble maps.

method

character: machine learning method, default is randomForest (currently support randomForest and xgboost).

cores

numeric: how many CPus to be used in the calculation, the default is the total CPU number you have.

Value

paths to the ensemble downscaled maps.

Author(s)

Dongchen Zhang


downscale_qsub_main

Description

This qsub function helps to run the submitted qsub jobs for running the downscale_main function.

Usage

downscale_qsub_main(folder.path)

Arguments

folder.path

Character: physical path to which the job file is located.

Author(s)

Dongchen Zhang


weighted multivariate normal density

Description

weighted multivariate normal density

Usage

dwtmnorm(x, mean, prec, wt, log = 0)

Arguments

x

random variable

mean

mean

prec

precision

wt

weight

log

log


EnKF

Description

Given the Forecast and Observed this function performs the Ensemble Kalamn Filter.

Usage

EnKF(settings, Forecast, Observed, H, extraArg = NULL, ...)

Arguments

settings

pecan standard settings list.

Forecast

A list containing the forecasts variables including Q (process variance) and X (a dataframe of forecasts state variables for different ensemble)

Observed

A list containing the observed variables including R (cov of observed state variables) and Y (vector of estimated mean of observed state variables)

H

is a matrix of 1's and 0's specifying which observations go with which variables.

extraArg

This argument is NOT used inside this function but it is a list containing aqq, bqq and t. The aqq and bqq are shape parameters estimated over time for the proccess covariance and t gives the time in terms of index of obs.list. See Details.

...

Extra argument sent to the analysis function.

Value

It returns a list with estimated mean and cov matrix of forecast state variables as well as mean and cov estimated as a result of assimilation/analysis .

Author(s)

Michael Dietze [email protected], Ann Raiho and Hamze Dokoohaki


EnKF.MultiSite

Description

Given the Forecast and Observed this function performs the Ensemble Kalamn Filter.

Usage

EnKF.MultiSite(settings, Forecast, Observed, H, extraArg = NULL, ...)

Arguments

settings

pecan standard settings list.

Forecast

A list containing the forecasts variables including Q (process variance) and X (a dataframe of forecasts state variables for different ensemble)

Observed

A list containing the observed variables including R (cov of observed state variables) and Y (vector of estimated mean of observed state variables)

H

is a matrix of 1's and 0's specifying which observations go with which state variables.

extraArg

This argument is NOT used inside this function but it is a list containing aqq, bqq and t. The aqq and bqq are shape parameters estimated over time for the proccess covariance and t gives the time in terms of index of obs.list.

...

Extra argument sent to the analysis function.

Details

This function is different than 'EnKF' function in terms of how it creates the Pf matrix.

Value

It returns a list with estimated mean and cov matrix of forecast state variables as well as mean and cov estimated as a result of assimilation/analysis .

Author(s)

Michael Dietze [email protected], Ann Raiho and Hamze Dokoohaki


GEF

Description

Given the Forecast and Observed this function performs the Generalized Ensemble Kalamn Filter. The generalized ensemble filter follows generally the three steps of sequential state data assimilation. But, in the generalized ensemble filter we add a latent state vector that accounts for added process variance. Furthermore, instead of solving the analysis analytically like the EnKF, we have to estimate the mean analysis vector and covariance matrix with MCMC.

Usage

GEF(
  settings,
  Forecast,
  Observed,
  H,
  extraArg,
  nitr = 50000,
  nburnin = 10000,
  ...
)

GEF.MultiSite(settings, Forecast, Observed, H, extraArg, ...)

Arguments

settings

pecan standard settings list.

Forecast

A list containing the forecasts variables including Q (process variance) and X (a dataframe of forecast state variables for different ensemble)

Observed

A list containing the observed variables including R (cov of observed state variables) and Y (vector of estimated mean of observed state variables)

H

not used

extraArg

This argument is a list containing aqq, bqq and t. The aqq and bqq are shape parameters estimated over time for the process covariance and t gives the time in terms of index of obs.list. See Details.

nitr

Number of iterations to run each MCMC chain.

nburnin

Number of initial, pre-thinning, MCMC iterations to discard.

...

This function requires nt, obs.mean, obs.cov, which are the total number of steps, list of observed means and list of observed cov respectively.

Value

It returns a list with estimated mean and cov matrix of forecast state variables as well as mean and cov estimated as a result of assimilation/analysis .

Author(s)

Michael Dietze [email protected], Ann Raiho and Hamze Dokoohaki


multisite TWEnF

Description

multisite TWEnF

Usage

GEF.MultiSite.Nimble

Format

TBD


get_ensemble_weights

Description

Creates file of ensemble weights in format needed for SDA

Usage

get_ensemble_weights(settings, time_do)

Arguments

settings

PEcAn settings object

time_do

Give user specific time so you don't have to have it be annual

Value

NONE

Author(s)

Ann Raiho [email protected]


GrabFillMatrix

Description

GrabFillMatrix

Usage

GrabFillMatrix(M, ind, M1 = NULL)

Arguments

M

source matrix that will be either subtracted or filled in.

ind

vector of index that of where to be subtracted or filled in.

M1

additional matrix used to fill in the source matrix, the default it NULL.

Details

This function helps subtract or fill in a matrix given the index.

Author(s)

Dongchen Zhang


hop_test

Description

Hop test. This script tests that the model successfully reads it's own restart and can restart without loss of information.

Usage

hop_test(settings, ens.runid = NULL, nyear)

Arguments

settings

SDA PEcAn settings object

ens.runid

run id. If not provided, is looked up from [settings$outdir]/runs.txt

nyear

number of years to run hop test over

Value

NONE

Author(s)

Ann Raiho [email protected]


Internal functions for plotting SDA outputs. Interactive, post analysis time-series and bias plots in base plotting system and ggplot

Description

Internal functions for plotting SDA outputs. Interactive, post analysis time-series and bias plots in base plotting system and ggplot

Usage

interactive.plotting.sda(
  settings,
  t,
  obs.times,
  obs.mean,
  obs.cov,
  obs,
  X,
  FORECAST,
  ANALYSIS
)

postana.timeser.plotting.sda(
  settings,
  t,
  obs.times,
  obs.mean,
  obs.cov,
  obs,
  X,
  FORECAST,
  ANALYSIS
)

postana.bias.plotting.sda(
  settings,
  t,
  obs.times,
  obs.mean,
  obs.cov,
  obs,
  X,
  FORECAST,
  ANALYSIS
)

postana.bias.plotting.sda.corr(t, obs.times, X, aqq, bqq)

post.analysis.ggplot(
  settings,
  t,
  obs.times,
  obs.mean,
  obs.cov,
  obs,
  X,
  FORECAST,
  ANALYSIS,
  plot.title = NULL
)

post.analysis.ggplot.violin(
  settings,
  t,
  obs.times,
  obs.mean,
  obs.cov,
  obs,
  X,
  FORECAST,
  ANALYSIS,
  plot.title = NULL
)

post.analysis.multisite.ggplot(
  settings,
  t,
  obs.times,
  obs.mean,
  obs.cov,
  FORECAST,
  ANALYSIS,
  plot.title = NULL,
  facetg = FALSE,
  readsFF = NULL,
  Add_Map = FALSE
)

SDA_timeseries_plot(
  ANALYSIS,
  FORECAST,
  obs.mean = NULL,
  obs.cov = NULL,
  outdir,
  pft.path = NULL,
  by = "site",
  types = c("FORECAST", "ANALYSIS", "OBS"),
  CI = c(0.025, 0.975),
  unit = list(AbvGrndWood = "Mg/ha", LAI = "m2/m2", SoilMoistFrac = "", TotSoilCarb =
    "kg/m2"),
  style = list(general_color = c(FORECAST = "blue", ANALYSIS = "red", OBS = "black"),
    fill_color = c(FORECAST = "yellow", ANALYSIS = "green", OBS = "grey"), title_color =
    "red"),
  PDF_w = 20,
  PDF_h = 16,
  t.inds = NULL
)

Arguments

settings

pecan standard settings list.

t

current time - int number giving the position of the current time in obs.time.

obs.times

vector of dates of measurements

obs.mean

obs.mean

obs.cov

obs.cov

obs

list containing the mean and cov object

X

dataframe of state variables for each ensemble

FORECAST

Forecast object from the sda.output.Rdata.

ANALYSIS

Analysis object from the sda.output.Rdata.

aqq, bqq

shape parameters estimated over time for the process covariance

plot.title

character giving the title for post visualization ggplots

facetg

logical: Create a subpanel for each variable?

readsFF

optional forward forecast

Add_Map

Bool variable decide if we want to export the GIS map of Ecoregion.

outdir

physical path where the pdf will be stored.

pft.path

Physical path of pft.csv file to allow by = pft option.

by

arrange figures by var, pft, or site.

types

data types that shown in the figure.

CI

range of confidence interval.

unit

list of unit used for y axis label.

style

color option.

PDF_w

width of exported PDF file, passed on to 'base::pdf()'.

PDF_h

height of exported PDF file, passed on to 'base::pdf()'.

t.inds

index of period that will be plotted.

Author(s)

Dongchen Zhang


inverse of ALR transform

Description

inverse of ALR transform

Usage

inv.alr(alr)

Arguments

alr

state var


load_data_paleon_sda

Description

Load data function for paleon SDA data products

Usage

load_data_paleon_sda(settings)

Arguments

settings

PEcAn SDA settings object

Value

obs.mean and obs.cov for sda.enkf

Author(s)

Ann Raiho [email protected]


load_nimble

Description

This functions is internally used to register a series of nimble functions inside GEF analysis function.

Usage

y_star_create(X)

Arguments

X

state var

Author(s)

Ann Raiho, Hamze Dokoohaki


Local.support

Description

distance.mat matrix doesn't need to be just the physical distance, however it represent a measure of similarity between state variables in different sites.

Usage

Local.support(Pf, distance.mat, scalef = 1)

Arguments

Pf

Forecast error covariance matrix

distance.mat

is matrix of distances between sites.

scalef

scalef is a numeric value that requires tuning and it controls the shape of the corrolation function

Value

It returns a localized covariance matrix by taking a Schur product between Pf and a corrolation function

Author(s)

Hamze Dokoohaki


matrix_network

Description

matrix_network

Usage

matrix_network(mat)

Arguments

mat

a boolean matrix representing the interactions between any sites.

Value

It returns lists of index representing each network.

Author(s)

Dongchen Zhang


MCMC_block_function

Description

MCMC_block_function

Usage

MCMC_block_function(block)

Arguments

block

each block within the 'block.list' lists.

Value

It returns the 'block' object with analysis results filled in.

Author(s)

Dongchen Zhang


MCMC_function

Description

MCMC_function

Usage

MCMC_function(data)

Arguments

data

list containing everything needed for the MCMC sampling.

Details

This function replace the previous code where implenmented the MCMC sampling part, which allows the MCMC sampling of multiple chains under parallel mode.

Author(s)

Michael Dietze [email protected], Ann Raiho, Hamze Dokoohaki, and Dongchen Zhang.


MCMC_Init

Description

MCMC_Init

Usage

MCMC_Init(block.list, X)

Arguments

block.list

lists of blocks generated by the 'build.block.xy' function.

X

A matrix contains ensemble forecasts.

Details

This function helps create initial conditions for the MCMC sampling.

Value

It returns the 'block.list' object with initial conditions filled in.

Author(s)

Dongchen Zhang


metSplit

Description

metSplit

Usage

metSplit(
  conf.settings,
  inputs,
  settings,
  model,
  no_split = FALSE,
  obs.times,
  t,
  nens,
  restart_flag = FALSE,
  my.split_inputs
)

Arguments

conf.settings

listed multisite settings object generated by sda.enkf_MultiSite

inputs

listed object containing met ensemble members generated by input.ens.gen

settings

settings object passed to sda.enkf_MultiSite

model

model name ex. SIPNET

no_split

TRUE/FALSE if model requires met split

obs.times

matrix of dates used for assimilation

t

number of dates in obs.times

nens

number of ensemble members, taken from settings object

restart_flag

TRUE/FALSE taken from restart arguement

my.split_inputs

generated by sda.enkf_MultiSite ex. split_inputs.SIPNET

Value

input.split object with split met filepaths

Author(s)

Alexis Helgeson


convert from timestep to actual time points. supports year, month, week, and day as time unit.

Description

convert from timestep to actual time points. supports year, month, week, and day as time unit.

Usage

obs_timestep2timepoint(start.date, end.date, timestep)

Arguments

start.date

start date when the first observation was taken.

end.date

end date when the last observation was taken.

timestep

a list includes time unit and number of time unit per timestep.

Value

timepoints from start to end date given the number of time unit per timestep.

Author(s)

Dongchen Zhang


Obs.data.prepare.MultiSite

Description

Obs.data.prepare.MultiSite

Usage

Obs.data.prepare.MultiSite(obs.path, site.ids)

Arguments

obs.path

Path to the obs data which is expected to be an .Rdata.

site.ids

a character vector of site ids which need to be extracted.

Value

a list of observed mean and cov as the SDA expected it to be.


outlier.detector.boxplot

Description

This function performs a simple outlier replacement on all the columns of dataframes inside a list

Usage

outlier.detector.boxplot(X)

Arguments

X

A list of dataframes

Value

A list the same dimension as X, with each column of each dataframe modified by replacing outlier points with the column median


parallel_prediction

Description

This function helps to predict the target variable observations based on the covariates. The prediction is working in parallel across vegetated pixels.

Usage

parallel_prediction(
  base.map.dir,
  models,
  cov.vecs,
  non.na.inds,
  outdir,
  name,
  cores = parallel::detectCores()
)

Arguments

base.map.dir

character: path to the GeoTIFF file within which the extents and CRS will be used to generate the ensemble maps.

models

list: trained models across ensemble members generated by the 'parallel_train' function.

cov.vecs

numeric: data frame containing covaraites across vegetated pixels generated from the 'stack_covariates_2_df' function.

non.na.inds

numeric: the corresponding index of vegetated pixels generated from the 'stack_covariates_2_df' function.

outdir

character: the output directory where the downscaled maps will be stored.

name

list: containing the time and variable name to create the final GeoTIFF file name.

cores

numeric: how many CPus to be used in the calculation, the default is the total CPU number you have.

Value

paths to the ensemble downscaled maps.

Author(s)

Dongchen Zhang


parallel_train

Description

This function helps to train the ML model across ensemble members in parallel.

Usage

parallel_train(
  full_data,
  method = "randomForest",
  cores = parallel::detectCores()
)

Arguments

full_data

numeric: the matrix generated using the 'prepare_train_dat' function.

method

character: machine learning method (currently support randomForest and xgboost).

cores

numeric: how many CPus to be used in the calculation, the default is the total CPU number you have.

Value

list of trained models across ensemble members.

Author(s)

Dongchen Zhang


pecan_settings_2_pts

Description

convert settings to geospatial points in terra.

Usage

pecan_settings_2_pts(settings)

Arguments

settings

PEcAn settings: either a character that points to the settings or shape file or the actual pecan settings object will be accepted.

Value

terra spatial points object.

Author(s)

Dongchen Zhang


piecew.poly.local

Description

5th order piecewise polynomial adopted from Data assimilation for spatio-temporal processes - p250 - Sebastian Reich

Usage

piecew.poly.local(Pf, distance.mat, scalef = 2)

Arguments

Pf

Forecast error covariance matrix

distance.mat

is matrix of distances between sites.

scalef

scalef is a numeric value that requires tuning and it controls the shape of the corrolation function

Value

It returns a localized covariance matrix by taking a Schur product between Pf and a corrolation function

Author(s)

Hamze Dokoohaki


SDA observation preparation function for LAI and AGB

Description

SDA observation preparation function for LAI and AGB

Usage

Prep_OBS_SDA(settings, out_dir, AGB_dir, Search_Window = 30)

Arguments

settings

multi.settings objects that contains multiple sites info

out_dir

output dir

AGB_dir

AGB data dir

Search_Window

search window for locate available LAI values

Value

mean and covariance of observations


prepare_train_dat

Description

This function helps to create the training dataset of specific variable type and locations for downscaling. TODO: Add a ratio argument (training sample size/total sample size) so that we could calculate the out-of-sample accuracy.

Usage

prepare_train_dat(pts, analysis, covariates.dir, variable)

Arguments

pts

spatialpoints: spatial points returned by 'terra::vectors' function.

analysis

numeric: data frame (rows: ensemble member; columns: site*state_variables) of updated ensemble analysis results from the 'sda_enkf' function.

covariates.dir

character: path to the exported covariates GeoTIFF file.

variable

character: name of state variable. It should match up with the column names of the analysis data frame.

Value

matrix (num.sites, num.variables * num.ensemble + num.covariates) within which the first sets of columns contain values of state variables for each ensemble member of every site, and the rest columns contain the corresponding covariates.

Author(s)

Dongchen Zhang


qsub_sda

Description

This function provides means to split large SDA runs into separate 'qsub' jobs. Including job creation, submission, and assemble.

Usage

qsub_sda(
  settings,
  obs.mean,
  obs.cov,
  Q,
  pre_enkf_params,
  ensemble.samples,
  outdir,
  control,
  block.index = NULL,
  debias = list(cov.dir = NULL, start.year = NULL, t.start = NULL, residual.lag = NULL,
    fun = NULL),
  prefix = "batch"
)

Arguments

settings

PEcAn settings object

obs.mean

Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation means for each state variables of each site for each time point.

obs.cov

Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation covariances for all state variables of each site for each time point.

Q

Process covariance matrix given if there is no data to estimate it.

pre_enkf_params

Used for passing pre-existing time-series of process error into the current SDA runs to ignore the impact by the differences between process errors.

ensemble.samples

Pass ensemble.samples from outside to avoid GitHub check issues.

outdir

Physical path to the folder where the SDA outputs will be stored. The default is NULL, where we will be using outdir from the settings object.

control

List of flags controlling the behavior of the SDA. 'TimeseriesPlot' for post analysis examination; 'OutlierDetection' decide if we want to execute the outlier detection each time after the model forecasting; 'send_email' contains lists for sending email to report the SDA progress; 'keepNC' decide if we want to keep the NetCDF files inside the out directory; 'forceRun' decide if we want to proceed the Bayesian MCMC sampling without observations; 'MCMC.args' include lists for controling the MCMC sampling process (iteration, nchains, burnin, and nthin.).

block.index

list of site ids for each block, default is NULL. This is used when the localization turns on. Please keep using the default value because the localization feature is still in development.

debias

List: R list containing the covariance directory ('cov.dir'), start year ('start.year'), start time point ('t.start'), if we want to include the 'residual.lag' as additional covariate, and the ML function ('fun'). covariance directory should include GeoTIFF files named by year. start time point is numeric input which decide when to start the debiasing feature. 'residual.lag' is either TRUE or FALSE.

prefix

character: the desired folder name to store the outputs.

Value

NONE

Author(s)

Dongchen Zhang


qsub_sda_batch

Description

This function can help to execute sda function.

Usage

qsub_sda_batch(folder.path)

Arguments

folder.path

character: path where the 'configs.rds' file is stored.

Author(s)

Dongchen Zhang.


Remote_Sync_launcher

Description

Remote_Sync_launcher

Usage

Remote_Sync_launcher(settingPath, remote.path, PID)

Arguments

settingPath

Path to your local setting .

remote.path

Path generated by SDA_remote_launcher which shows the path to your remote SDA run.

PID

PID generated by SDA_remote_launcher which shows the active PID running your SDA job.


rescaling_stateVars

Description

This function uses a set of scaling factors defined in the pecan XML to scale a given matrix

Usage

rescaling_stateVars(settings, X, multiply = TRUE)

Arguments

settings

pecan xml settings where state variables have the scaling_factor tag

X

Any Matrix with column names as variable names

multiply

TRUE = multiplication, FALSE = division

Value

rescaled Matrix


random weighted multivariate normal

Description

random weighted multivariate normal

Usage

rwtmnorm(n, mean, prec, wt)

Arguments

n

sample size

mean

mean

prec

precision

wt

weight


Sample meteorological ensembles

Description

Sample meteorological ensembles

Usage

sample_met(settings, nens = 1)

Arguments

settings

PEcAn settings list

nens

number of ensemble members to be sampled


sample parameters

Description

sample parameters

Usage

sample.parameters(ne, settings, con)

Arguments

ne

number of ensemble members

settings

PEcAn settings object

con

PEcAn database connection

Value

data frame of sampled parameters from the posterior distribution

Author(s)

Michael Dietze [email protected]


sampler toggling

Description

sampler toggling

Usage

sampler_toggle(model, mvSaved, target, control)

Arguments

model

model

mvSaved

copied to

target

thing being targetted

control

unused


sda_assemble

Description

This function can help to assemble sda outputs (analysis and forecasts) from each job execution.

Usage

sda_assemble(batch.folder, outdir)

Arguments

batch.folder

character: path where the SDA batch jobs stored.

outdir

character: path where we want to store the assembled analysis and forecasts.

Author(s)

Dongchen Zhang.


SDA_control

Description

SDA_control

Usage

SDA_control(
  trace = TRUE,
  ForewardForecast = FALSE,
  interactivePlot = FALSE,
  TimeseriesPlot = FALSE,
  BiasPlot = FALSE,
  plot.title = NULL,
  facet.plots = FALSE,
  debug = FALSE,
  pause = FALSE,
  Profiling = FALSE,
  OutlierDetection = FALSE
)

Arguments

trace

Logical if code should print out the progress of SDA .

ForewardForecast

Logical if the foreward forecast estimates needs to be read and visualized in time series plots.

interactivePlot

Logical if time series plots need to be generated.

TimeseriesPlot

Logical if time series plots need to be generated.

BiasPlot

Logical if bias plots need to be generated

plot.title

Character defining the title of times series plots

facet.plots

Logical if the timeseries plots should be faceted based on state variables

debug

Logical if the code should stop at some milestones and open an interactive debugging environment

pause

Logical if code needs to be paused and wait for further instruction after the analysis step

Profiling

Logical if code should keep track of how much time each step took

OutlierDetection

Logical if TRUE then a simple method will be used to replace simulations of outside 3IQR with the median of ensembles.

Value

list of all arguments needed to setup the SDA function


SDA Downscale Function

Description

This function uses either Random Forest or Convolutional Neural Network model based on the model_type parameter.

Usage

SDA_downscale(
  preprocessed,
  date,
  carbon_pool,
  covariates,
  model_type = "rf",
  seed = NULL
)

Arguments

preprocessed

List. Preprocessed data returned as an output from the SDA_downscale_preprocess function.

date

Date. If SDA site run, format is yyyy/mm/dd; if NEON, yyyy-mm-dd. Restricted to years within file supplied to 'preprocessed' from the 'data_path'.

carbon_pool

Character. Carbon pool of interest. Name must match carbon pool name found within file supplied to 'preprocessed' from the 'data_path'.

covariates

SpatRaster stack. Used as predictors in downscaling. Layers within stack should be named. Recommended that this stack be generated using 'covariates' instructions in assim.sequential/inst folder

model_type

Character. Either "rf" for Random Forest or "cnn" for Convolutional Neural Network. Default is Random Forest.

seed

Numeric or NULL. Optional seed for random number generation. Default is NULL.

Details

This function will downscale forecast data to unmodeled locations using covariates and site locations

Value

A list containing the training and testing data sets, models, predicted maps for each ensemble member, and predictions for testing data.

Author(s)

Joshua Ploshay, Sambhav Dixit


SDA Downscale Function for Hourly Data

Description

This function uses the randomForest model to downscale forecast data (hourly) to unmodeled locations using covariates and site locations

Usage

SDA_downscale_hrly(nc_file, coords, yyyy, covariates)

Arguments

nc_file

In quotes, file path for .nc containing ensemble data.

coords

In quotes, file path for .csv file containing the site coordinates, columns named "lon" and "lat".

yyyy

In string, format is yyyy(year of interest)

covariates

SpatRaster stack, used as predictors in randomForest. Layers within stack should be named. Recommended that this stack be generated using 'covariates' instructions in assim.sequential/inst folder

Value

It returns the 'downscale_output' list containing lists for the training and testing data sets, models, and predicted maps for each ensemble member.

Author(s)

Harunobu Ishii


Calculate Metrics for Downscaling Results

Description

This function takes the output from the SDA_downscale function and computes various performance metrics for each ensemble. It provides a way to evaluate the accuracy of the downscaling results without modifying the main downscaling function.

Usage

SDA_downscale_metrics(downscale_output, carbon_pool)

Arguments

downscale_output

List. Output from the SDA_downscale function, containing data, models, maps, and predictions for each ensemble.

carbon_pool

Character. Name of the carbon pool used in the downscaling process.

Details

This function calculates performance metrics for the downscaling results. It computes Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared for each ensemble. The function uses the actual values from the testing data and the predictions generated during the downscaling process.

Value

A list of metrics for each ensemble, where each element contains MAE , MSE ,R_squared ,actual values from testing data and predicted values for the testing data

Author(s)

Sambhav Dixit


Preprocess Data for Downscaling

Description

This function reads and checks the input data, ensuring that the required date and carbon pool exist, and that the site coordinates are valid.

Usage

SDA_downscale_preprocess(data_path, coords_path, date, carbon_pool)

Arguments

data_path

Character. File path for .rds containing ensemble data.

coords_path

Character. File path for .csv file containing the site coordinates, with columns named "lon" and "lat".

date

Date. If SDA site run, format is yyyy/mm/dd; if NEON, yyyy-mm-dd. Restricted to years within the file supplied to 'data_path'.

carbon_pool

Character. Carbon pool of interest. Name must match the carbon pool name found within the file supplied to 'data_path'.

Details

This function ensures that the specified date and carbon pool are present in the input data. It also checks the validity of the site coordinates and aligns the number of rows between site coordinates and carbon data.

Value

A list containing The read .rds data , The cleaned site coordinates, and the preprocessed carbon data.

Author(s)

Sambhav Dixit


sda_matchparam

Description

sda_matchparam

Usage

sda_matchparam(settings, ensemble.samples, site.ids, nens)

Arguments

settings

settings object passed from sda.enkf_MultiSite

ensemble.samples

taken from sample.Rdata object

site.ids

character object passed from sda.enkf_MultiSite

nens

number of ensemble members in model runs, taken from restart$runids

Value

new.params object used to

Author(s)

Alexis Helgeson


Assembler for preparing obs.mean and obs.cov for the SDA workflow

Description

Assembler for preparing obs.mean and obs.cov for the SDA workflow

Usage

SDA_OBS_Assembler(settings)

Arguments

settings

the settings object followed by PEcAn.settings format.

Value

list of obs.mean and obs.cov

Author(s)

Dongchen Zhang

Examples

## Not run: 
settings_dir <- "/projectnb/dietzelab/dongchen/All_NEON_SDA/NEON42/IC/pecan.xml"
settings <- PEcAn.settings::read.settings(settings_dir)
OBS <- SDA_OBS_Assembler(settings)

## End(Not run)

SDA_remote_launcher

Description

SDA_remote_launcher

Usage

SDA_remote_launcher(settingPath, ObsPath, run.bash.args)

Arguments

settingPath

The Path to the setting that will run SDA

ObsPath

Path to the obs data which is expected to be an .Rdata.

run.bash.args

Shell commands to be run on the remote host before launching the SDA. See examples

Value

This function returns a list of two pieces of information. One the remote path that SDA is running and the PID of the active run.

Examples

## Not run: 
 # This example can be found under inst folder in the package
 library(PEcAn.all)
 library(purrr)
 
 run.bash.args <- c(
  "#$ -l h_rt=48:00:00",
  "#$ -pe omp 28 # Request a parallel environment with 4 cores",
  "#$ -l mem_per_core=1G # and 4G memory for each",
  "#$ -l buyin",
  "module load R/3.5.2",
  "module load python/2.7.13"
  )
 settingPath <-"pecan.SDA.4sites.xml"

 ObsPath <- "Obs/LandTrendr_AGB_output50s.RData"
   
 SDA_remote_launcher(settingPath, ObsPath, run.bash.args)

## End(Not run)

Calculate ensemble weights for each site at time t.

Description

Calculate ensemble weights for each site at time t.

Usage

sda_weights_site(FORECAST, ANALYSIS, t, ens)

Arguments

FORECAST

FORECAST object built within the sda.enkf_MultiSite function.

ANALYSIS

ANALYSIS object built within the Analysis_sda_multisite function.

t

exact number of t inside the sda.enkf_MultiSite function.

ens

number of ensemble members.

Value

list of weights associated with each ensemble member of each site.

Author(s)

Dongchen Zhang and Hamze Dokoohaki


sda.bias.correction

Description

This function helps to correct the forecasts' biases based on ML (random forest) training on the previous time points.

Usage

sda.bias.correction(
  settings,
  t,
  t.start,
  dates,
  all.X,
  obs.mean,
  state.interval,
  cov.dir,
  residual.lag = FALSE,
  py.init = NULL
)

Arguments

settings

PEcAn settings object.

t

numeric: the current number of time points (e.g., t=1 for the beginning time point).

t.start

numeric: the user-defined time point to avoid the initial burnin period.

dates

vector: a vector of dates used for extracting covariates through time.

all.X

list: lists of data frame of model forecast from the beginning to the current time points that has n (ensemble size) rows and n.var (number of variables) times n.site (number of locations) columns. (e.g., 100 ensembles, 4 variables, and 8,000 locations will end up with data.frame of 100 rows and 32,000 columns)

obs.mean

List: lists of date times named by time points, which contains lists of sites named by site ids, which contains observation means for each state variables of each site for each time point.

state.interval

matrix: containing the upper and lower boundaries for each state variable.

cov.dir

character: physical path to the directory contains the time series covariate maps.

residual.lag

logical: decide if we want to include the lagged residual (difference in residual between time points) in the ML process.

py.init

R function: R function to initialize the python functions. Default is NULL. the default random forest will be used if 'py.init' is NULL.

Value

list: the current X after the bias-correction; the ML outputs for each variable; predicted residuals.

Author(s)

Dongchen Zhang


State Variable Data Assimilation: Ensemble Kalman Filter and Generalized ensemble filter

Description

Restart mode: Basic idea is that during a restart (primary case envisioned as an iterative forecast), a new workflow folder is created and the previous forecast for the start_time is copied over. During restart the initial run before the loop is skipped, with the info being populated from the previous run. The function then dives right into the first Analysis, then continues on like normal.

Usage

sda.enkf(
  settings,
  obs.mean,
  obs.cov,
  Q = NULL,
  restart = NULL,
  control = list(trace = TRUE, interactivePlot = TRUE, TimeseriesPlot = TRUE, BiasPlot =
    FALSE, plot.title = NULL, debug = FALSE, pause = FALSE),
  ...
)

Arguments

settings

PEcAn settings object

obs.mean

List of dataframe of observation means, named with observation datetime.

obs.cov

List of covariance matrices of state variables , named with observation datetime.

Q

Process covariance matrix given if there is no data to estimate it.

restart

Used for iterative updating previous forecasts. When the restart is TRUE it read the object in SDA folder written from previous SDA.

control

List of flags controlling the behaviour of the SDA. trace for reporting back the SDA outcomes, interactivePlot for plotting the outcomes after each step, TimeseriesPlot for post analysis examination, BiasPlot for plotting the correlation between state variables, plot.title is the title of post analysis plots and debug mode allows for pausing the code and examining the variables inside the function.

...

Additional arguments, currently ignored

Value

NONE

Author(s)

Michael Dietze and Ann Raiho [email protected]


sda.enkf_local

Description

This function provides complete support for the multi-core and multi-node computation on the general HPC system. Thus, this script will be more computationally efficient, making it possible to run SDA over thousands of locations.

Usage

sda.enkf_local(
  settings,
  obs.mean,
  obs.cov,
  Q = NULL,
  pre_enkf_params = NULL,
  ensemble.samples = NULL,
  outdir = NULL,
  control = list(TimeseriesPlot = FALSE, OutlierDetection = FALSE, send_email = NULL,
    keepNC = TRUE, forceRun = TRUE, MCMC.args = NULL, merge_nc = TRUE),
  debias = list(cov.dir = NULL, start.year = NULL, t.start = NULL, residual.lag = NULL,
    fun = NULL)
)

Arguments

settings

PEcAn settings object

obs.mean

Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation means for each state variables of each site for each time point.

obs.cov

Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation covariances for all state variables of each site for each time point.

Q

Process covariance matrix given if there is no data to estimate it.

pre_enkf_params

Used for passing pre-existing time-series of process error into the current SDA runs to ignore the impact by the differences between process errors.

ensemble.samples

list of ensemble parameters across PFTs. Default is NULL.

outdir

physical path to the folder that stores the SDA outputs. Default is NULL.

control

List of flags controlling the behavior of the SDA. 'TimeseriesPlot' for post analysis examination; 'OutlierDetection' decide if we want to execute the outlier detection each time after the model forecasting; 'send_email' contains lists for sending email to report the SDA progress; 'keepNC' decide if we want to keep the NetCDF files inside the out directory; 'forceRun' decide if we want to proceed the Bayesian MCMC sampling without observations; 'MCMC.args' include lists for controling the MCMC sampling process (iteration, nchains, burnin, and nthin.). 'merge_nc' determine if we want to merge all netCDF files across sites and ensembles. If it's set as 'TRUE', we will then combine all netCDF files into the 'merged_nc' folder within the 'outdir'.

debias

List: R list containing the covariance directory ('cov.dir'), start year ('start.year'), start time point ('t.start'), if we want to include the 'residual.lag' as additional covariate, and the ML function ('fun'). covariance directory should include GeoTIFF files named by year. start time point is numeric input which decide when to start the debiasing feature. 'residual.lag' is either TRUE or FALSE.

Value

NONE

Author(s)

Dongchen Zhang [email protected]


State Variable Data Assimilation: Ensemble Kalman Filter and Generalized ensemble filter

Description

Check out SDA_control function for more details on the control arguments.

Usage

sda.enkf.multisite(
  settings,
  obs.mean,
  obs.cov,
  Q = NULL,
  restart = NULL,
  pre_enkf_params = NULL,
  ensemble.samples = NULL,
  control = list(TimeseriesPlot = FALSE, OutlierDetection = FALSE, send_email = NULL,
    keepNC = TRUE, forceRun = TRUE, run_parallel = TRUE, MCMC.args = NULL, merge_nc =
    TRUE, execution = "local"),
  debias = list(cov.dir = NULL, t.start = NULL, residual.lag = NULL),
  ...
)

Arguments

settings

PEcAn settings object

obs.mean

Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation means for each state variables of each site for each time point.

obs.cov

Lists of date times named by time points, which contains lists of sites named by site ids, which contains observation covariances for all state variables of each site for each time point.

Q

Process covariance matrix given if there is no data to estimate it.

restart

Used for iterative updating previous forecasts. Default NULL. List object includes file path to previous runs and start date for SDA.

pre_enkf_params

Used for passing pre-existing time-series of process error into the current SDA runs to ignore the impact by the differences between process errors.

ensemble.samples

list of ensemble parameters across PFTs. Default is NULL.

control

List of flags controlling the behavior of the SDA. 'TimeseriesPlot' for post analysis examination; 'OutlierDetection' decide if we want to execute the outlier detection each time after the model forecasting; 'send_email' contains lists for sending email to report the SDA progress; 'keepNC' decide if we want to keep the NetCDF files inside the out directory; 'forceRun' decide if we want to proceed the Bayesian MCMC sampling without observations; 'run_parallel' decide if we want to run the SDA under parallel mode for the 'future_map' function; 'MCMC.args' include lists for controling the MCMC sampling process (iteration, nchains, burnin, and nthin.). 'merge_nc' determine if we want to merge all netCDF files across sites and ensembles. If it's set as 'TRUE', we will then combine all netCDF files into the 'merged_nc' folder within the 'outdir'. 'execution' decide the way we want to execute model including 'local' ,where we execute the model locally; 'qsub', where we use the traditional 'start_model_runs' function for submission; 'qsub_parallel', where we first combine jobs and submit them into the SCC.

debias

List: R list containing the covariance directory, the start time point, and if we want to include the 'residual.lag' as additional covariate. covariance directory should include GeoTIFF files named by year. start time point is numeric input which decide when to start the debiasing feature. 'residual.lag' is either TRUE or FALSE.

...

Additional arguments, currently ignored

Details

Restart mode: Basic idea is that during a restart (primary case envisioned as an iterative forecast), a new workflow folder is created and the previous forecast for the start_time is copied over. During restart the initial run before the loop is skipped, with the info being populated from the previous run. The function then dives right into the first Analysis, then continues on like normal.

Value

NONE

Author(s)

Michael Dietze, Ann Raiho and Alexis Helgeson [email protected]


State Variable Data Assimilation: Ensemble Kalman Filter

Description

Restart mode: Basic idea is that during a restart (primary case envisioned as an iterative forecast), a new workflow folder is created and the previous forecast for the start_time is copied over. During restart the initial run before the loop is skipped, with the info being populated from the previous run. The function then dives right into the first Analysis, then continues on like normal.

Usage

sda.enkf.original(
  settings,
  obs.mean,
  obs.cov,
  IC = NULL,
  Q = NULL,
  adjustment = TRUE,
  restart = NULL
)

Arguments

settings

PEcAn settings object

obs.mean

list of observations of the means of state variable (time X nstate)

obs.cov

list of observations of covariance matrices of state variables (time X nstate X nstate)

IC

initial conditions

Q

process covariance matrix given if there is no data to estimate it

adjustment

flag for using ensemble adjustment filter or not

restart

Used for iterative updating previous forecasts. This is a list that includes ens.inputs, the list of inputs by ensemble member, params, the parameters, and old_outdir, the output directory from the previous workflow. These three things are needed to ensure that if a new workflow is started that ensemble members keep there run-specific met and params. See Details

Value

NONE

Author(s)

Michael Dietze and Ann Raiho [email protected]


sda_assemble

Description

This function can help to assemble sda outputs (analysis and forecasts) from each job execution.

Usage

sda.qsub.job.submission(
  batch.folder,
  username,
  outdir = NULL,
  past.job.ids = NULL,
  max.job = 20,
  prefix = "sda.all.forecast.analysis.Rdata",
  resources = list(hour = 24, memory = "4G", ncpu = 28)
)

Arguments

batch.folder

character: path where the SDA batch jobs stored.

username

character: username for the HPC.

outdir

character: path where to stored the assembled results. Default is NULL.

past.job.ids

vector: vector of the past submitted job ids. Default is NULL.

max.job

numeric: maximum allowance for the number of running jobs on HPC.

prefix

character: file name to be determined if a job is finished or not.

resources

list: computation resources (time, memory, cores) when used to submit jobs.

Value

vector or numeric: return -1 when all jobs are finished; return vector of submitted job ids when there is any job that is not finished.

Author(s)

Dongchen Zhang.


simple.local

Description

Adopted from Data assimilation for spatio-temporal processes - p250 - Sebastian Reich

Usage

simple.local(Pf, distance.mat, scalef = 2)

Arguments

Pf

Forecast error covariance matrix

distance.mat

is matrix of distances between sites.

scalef

scalef is a numeric value that requires tuning and it controls the shape of the corrolation function

Value

It returns a localized covariance matrix by taking a Schur product between Pf and a corrolation function

Author(s)

Hamze Dokoohaki


stack_covariates_2_df

Description

This function helps to build the data frame (pixels by data columns) for only vegetated pixels to improve the efficiency. Note that the 'LC' field using the 'MODIS land cover' observations (MCD12Q1.061) must be supplied in the covariates to make this function work.

Usage

stack_covariates_2_df(rast.dir, cores = parallel::detectCores())

Arguments

rast.dir

character: a character that points to the covariates raster file generated by the 'stack_covariates_2_geotiff' function.

cores

numeric: how many CPus to be used in the calculation, the default is the total CPU number you have.

Value

list containing the data frame of covariates for vegetated pixels and the corresponding index of the pixels.

Author(s)

Dongchen Zhang


stack_covariates_2_geotiff

Description

This function helps to stack target data layers from various GeoTIFF maps (with different extents, CRS, and resolutions) to a single map.

Usage

stack_covariates_2_geotiff(
  outdir,
  year,
  base.map.dir,
  cov.tif.file.list,
  normalize = T,
  cores = parallel::detectCores()
)

Arguments

outdir

character: the output directory where the stacked GeoTIFF file will be generated.

year

numeric: the year of when the covariates are stacked.

base.map.dir

character: path to the GeoTIFF file within which the extents and CRS will be used to generate the final map.

cov.tif.file.list

list: a list contains sub-lists with each including path to the corresponding map and the variables to be extracted (e.g., list(LC = list(dir = "path/to/landcover.tiff", var.name = "LC")).

normalize

boolean: decide if we want to normalize each data layer, the default is TRUE.

cores

numeric: how many CPus to be used in the calculation, the default is the total CPU number you have.

Value

path to the exported GeoTIFF file.

Author(s)

Dongchen Zhang


tobit_model_censored

Description

tobit_model_censored

Usage

tobit_model_censored(settings, X, var.names, mu.f, Pf, t)

Arguments

settings

(list) pecan standard settings list.

X

(numeric) A matrix contains ensemble forecasts (ensembles * variables).

var.names

(character) variable names.

mu.f

(numeric) forecast mean values.

Pf

(numeric) forecast covariance matrix.

t

(numeric) timestep. If t=1, initial values are imputed for zero values in mu.f

Value

list with updated mu.f, pf, X, and indication of which y values are censored


TWEnF

Description

TWEnF

Usage

tobit.model

Format

TBD


Fit tobit prior to ensemble members

Description

Fit tobit prior to ensemble members

Usage

tobit2space.model

Format

TBD


update_q

Description

update_q

Usage

update_q(
  block.list.all,
  t,
  nt,
  aqq.Init = NULL,
  bqq.Init = NULL,
  MCMC_dat = NULL,
  block.list.all.pre = NULL
)

Arguments

block.list.all

each block within the 'block.list' lists.

t

time point.

nt

total length of time steps.

aqq.Init

the initial values of aqq, the default is NULL.

bqq.Init

the initial values of bqq, the default is NULL.

MCMC_dat

data frame of MCMC samples, the default it NULL.

block.list.all.pre

pre-existed block.list.all object for passing the aqq and bqq to the current SDA run, the default is NULL.

Value

It returns the 'block.list.all' object with initialized/updated Q filled in.

Author(s)

Dongchen Zhang