Statistical model#

Functions for building a proxy signal inference model.

build_model

Create a proxy signal (i.e., carbon isotope) inference model.

build_prior_age_model

Create a prior age model for each section.

get_valid_initial_ages

Helper function that resets the initial sample age values (in Model.initial_point()) such that all detrital and intrusive age constraints are respected.

intermediate_detrital_potential

Helper function for enforcing detrital age constraints (maximum age constraint for all overlying samples; no constraint on underlyig samples) in the middle of a section.

intermediate_intrusive_potential

Helper function for enforcing intrusive age constraints (minimum age constraint for all underlying samples; no constraint on overlying samples) in the middle of a section.

superposition

Helper function for explicitly enforcing stratigraphic superposition (any given sample must be younger than the underlying sample) for a group of radiometric age constraints.

superposition_depositional_and_limiting_ages

Helper function for ensuring that depositional age constraints respect superposition with limiting age constraints.

untransformed_initval

Helper function to retrieve the untransformed initial values for a random variable in a pymc.model.core.Model object.

stratmc.model.build_model(sample_df, ages_df, proxies=['d13c'], proxy_sigma_default=0.1, approximate=False, hsgp_m=15, hsgp_c=1.3, ls_dist='Wald', ls_min=0, ls_mu=20, ls_lambda=50, ls_sigma=50, var_sigma=10, white_noise_sigma=0.1, gp_mean_mu=None, gp_mean_sigma=None, offset_type='section', offset_prior='Laplace', offset_mu=0, offset_b=2, noise_type='section', noise_prior='HalfCauchy', noise_beta=1, noise_sigma=None, noise_sigma_single_sample=1, noise_sigma_studentT=1, noise_nu=1, jitter=0.001, proxy_observed=True, **kwargs)[source]#

Create a proxy signal (i.e., carbon isotope) inference model.

Note that while excluded samples (Exclude? is True in sample_df) do not affect the proxy signal reconstruction, their ages are passively tracked within the inference model (if other samples from the same sections are included).

Parameters:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for all sections. Load from .csv file using load_data() in stratmc.data.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for all sections. Load from .csv file using load_data() in stratmc.data.

sections:: list(str) or numpy.array(str), optional

List of sections to include in the inference model. Defaults to all sections in sample_df.

proxies: str or list(str), optional

Column or columns containing proxy data in sample_df. Defaults to ‘d13c’.

proxy_sigma_default: float or dict{float}, optional

Measurement uncertainty (\(1\sigma\)) to use for proxy observations if not specified in proxy_std column of sample_df. To set a different value for each proxy, pass a dictionary with proxy names as keys. Defaults to 0.1.

approximate: bool, optional

Build model with an unapproximated latent GP (pymc.gp.Latent) if False, or a Hilbert space Gaussian process approximation (pymc.gp.HSGP) if True; defaults to False. If using the HSGP approximation, also pass the hsgp_m and hsgp_c parameters (the defaults are unlikely to work well for all problems). Appropriate values for m and c can be estimated using approx_hsgp_hyperparams().

hsgp_m: int or dict{int}, optional

Number of basis vectors to use in the HSGP approximation (see pymc.gp.HSGP). Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 15.

hsgp_c: float or dict{float}, optional

Proportion extension factor for the HSGP approximation (see pymc.gp.HSGP). Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 1.3.

ls_dist: str or dict{str}, optional

Prior distribution for the lengthscale hyperparameter of the exponential quadratic covariance kernel (pymc.gp.cov.ExpQuad); set to Wald (pymc.Wald) or HalfNormal (pymc.HalfNormal). Defaults to Wald with mu = 20 and lambda = 50. To change mu and lambda, pass the ls_mu and ls_lambda parameters. For HalfNormal, the variance defaults to sigma = 50; change by passing ls_sigma. Pass as a dictionary with proxy names as keys to specify a different prior distribution for each proxy.

ls_min: float or dict{float}, optional

Minimum value for the lengthscale hyperparameter of the pymc.gp.cov.ExpQuad covariance kernel; shifts the lengthscale prior by ls_min. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 0.

ls_mu: float or dict{float}, optional

Mean (mu) of the pymc.gp.cov.ExpQuad lengthscale prior if ls_dist = `Wald`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 20.

ls_lambda: float or dict{float}, optional

Relative precision (lam) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = `Wald`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 50.

ls_sigma: float or dict{float}, optional

Scale parameter (sigma) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = `HalfNormal`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 50.

var_sigma: float or dict{float}, optional

Scale parameter (‘sigma’) of the covariance kernel variance hyperparameter prior, which is a pymc.HalfNormal distribution. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 10.

white_noise_sigma: float or dict{float}, optional

Amplitude of white noise component of GP covariance function. Defaults to 0.1. Should be equal to or less than the proxy measurement uncertainty; use a smaller value for proxies with low-amplitude signals (e.g., Sr isotopes). Pass as a dictionary with proxy names as keys to specify a different value for each proxy.

gp_mean_mu: float or dict{float}, optional

Mean (mu) of the GP mean function prior, which is a pymc.Normal distribution. Defaults to the mean of the observations for each proxy. Pass as a dictionary with proxy names as keys to specify a different value for each proxy.

gp_mean_sigma: float or dict{float}, optional

Standard deviation (sigma) of the GP mean function prior, which is a pymc.Normal distribution. Defaults to twice the standard deviation of the observations for each proxy. Pass as a dictionary with proxy names as keys to specify a different value for each proxy.

offset_type: str or dict{str}, optional

Parameterize offset such that all samples from the same section have the same offset (set to section), or such that custom sample groups share an offset term (set to groups, and specify the group for each sample and proxy with a offset_group_proxy column in sample_df). To omit the offset term, set to none. Defaults to section. Pass as a dictionary with proxy names as keys to specify a different offset type for each proxy.

offset_prior: str or dict{str}, optional

Type of distribution to use for the offset prior. Pass as a string to use the same prior for all proxies, or as a dict of string (with proxy names as keys) to specify a different prior distribution for each proxy. Defaults to Laplace (pymc.Laplace) with mu = 0 and b = 2. mu and b can be changed by passing offset_mu and offset_b. To use other types of priors, pass the name of a distribution from pymc.distributions, along with parameter values in offset_params. Pass as a dictionary with proxy names as keys to specify a different prior for each proxy.

offset_params: dict{float} or dict{dict{float}}, optional

Only required if using a custom offset_prior. Pass as a dictionary to use the same parameters for all proxies, or as a dictionary of dictionaries (with proxy names as keys) to specify different parameters for each proxy. Keys are param_1_name, param_1, param_2_name, and param_2 (with parameter names corresponding to those required for the specified pymc.distributions object). If only one parameter is required, use np.nan for param_2_name and param_2.

noise_type: str or dict{str}, optional

Parameterize noise as per-section (set to section) or per-group (set to groups, and specify the group for each sample and proxy with a noise_group_proxy column in sample_df). Defaults to section. Pass as a dictionary with proxy names as keys to specify a different noise type for each proxy.

noise_prior: str or dict{str}, optional

Type of distribution to use for the noise prior. Pass as a string to use the same prior for all proxies, or as a dict of string (with proxy names as keys) to specify a different prior distribution for each proxy. Defaults to HalfCauchy (pymc.HalfCauchy) with beta = 1. beta can be changed by passing noise_beta.

Other implemented priors (note that noise must be positive-only) are``HalfStudentT`` (pymc.HalfStudentT; defaults to noise_nu = 1 and noise_sigma_studentT = 1) and HalfNormal (pymc.HalfNormal). By default, the HalfNormal prior has sigma equal to the standard deviation of the data associated with each noise term. sigma can be changed by passing noise_sigma. For sections with only 1 sample, the standard deviation of the noise prior is set by noise_sigma_single_sample (defaults to 1).

superposition_dict: dict{list(str)}, optional

Optional; dictionary specifying superposition relationships between different sections. Should only be used when superposition is not implicitly enforced by the age constraints; for example, when sections share the same minimum and maximum age constraints, but are from geological formations with a known stratigraphic relationship. Dictionary keys are section names, and the value for each key is a list of sections that must be older (stratigraphically lower).

jitter: float, optional

Value of jitter passed to pymc.gp.Latent.prior(). Defaults to 0.001.

proxy_observed: bool, optional

Whether to pass observed values to the likelihood function; defaults to True. Only set to False to generate synthetic observations from the model prior in synthetic_observations_from_prior() in stratmc.synthetics.

Returns:
model: PyMC model

pymc.model.core.Model object.

gp: pymc.gp.Latent or pymc.gp.HSGP

Gaussian process prior for the model. pymc.gp.Latent if approximate = True, or pymc.gp.HSGP if approximate = False.

stratmc.model.build_prior_age_model(sample_df, ages_df, proxies=['d13c'], **kwargs)[source]#

Create a prior age model for each section. Omits the proxy signal component of the statistical model.

Parameters:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for all sections. Load from .csv file using load_data() in stratmc.data.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for all sections. Load from .csv file using load_data() in stratmc.data.

proxies: str or list(str), optional

Column or columns containing proxy data in sample_df. Defaults to ‘d13c’.

sections:: list(str) or numpy.array(str), optional

List of sections to include in the inference model. Defaults to all sections in sample_df.

Returns:
prior_age_model: PyMC model

pymc.model.core.Model object.

stratmc.model.get_valid_initial_ages(detrital_age_dist_names, intrusive_age_dist_names, maximum_age_dist_name, minimum_age_dist_name, sample_age_dist_name, sample_heights, detrital_heights, intrusive_heights, model, interval, sf1_name, sf2_name, shared_radiometric_age_dist)[source]#

Helper function that resets the initial sample age values (in Model.initial_point()) such that all detrital and intrusive age constraints are respected.

Parameters:
detrital_age_dist_names: list(str)

List of names for detrital age constraint distributions in model.

intrusive_age_dist_names: list(str)

List of names for intrusive age constraint distributions in model.

maximum_age_dist_name: str

Name of distribution for underlying maximum age constraint in model.

minimum_age_dist_name: str

Name of distribution for overlying minimum age constraint in model.

sample_age_dist_name: str

Name of sample age distribution (unsorted and unscaled) in the pymc.Model object.

sample_heights: np.array

Array of heights for samples in the current interval.

detrital_heights: list(float)

Heights of detrital age constraints.

intrusive_heights: list(float)

Heights of intrusive age constraints.

model: pymc.Model

pymc.model.core.Model object associated with the input distributions.

section: str

Name of the current section.

interval: int

Current interval number.

sf1_name: str

Name of the distribution associated with scaling factor 1 in model.

sf2_name: str

Name of the distribution associated with scaling factor 2 in model.

shared_radiometric_age_dist: bool, optional

Whether the radiometric age distributions are part of a single object (versus initiated as separate distributions). Defaults to True.

stratmc.model.intermediate_detrital_potential(detrital_age_dist, detrital_age_dist_name, sample_age_dist_sorted, sample_heights, detrital_height, section)[source]#

Helper function for enforcing detrital age constraints (maximum age constraint for all overlying samples; no constraint on underlyig samples) in the middle of a section.

Parameters:
detrital_age_dist: pymc.distributions

Detrital age distribution.

detrital_age_dist_name: str

Name of detrital_age_dist in model.

sample_age_dist_sorted: pymc.distribtions

Sorted and scaled sample age distribitions.

sample_age_dist: pymc.distributions

Unsorted and unscaled sample age distributions.

sample_age_dist_name: str

Name of sample_age_dist in the pymc.Model object.

sample_heights: np.array

Array of heights for samples in the current interval.

detrital_height: float

Height of detrital age constraint.

section: str

Name of the current section.

stratmc.model.intermediate_intrusive_potential(intrusive_age_dist, intrusive_age_dist_name, sample_age_dist_sorted, sample_heights, intrusive_height, section)[source]#

Helper function for enforcing intrusive age constraints (minimum age constraint for all underlying samples; no constraint on overlying samples) in the middle of a section.

Parameters:
intrusive_age_dist: pymc.distributions

Intrusive age distribution.

intrusive_age_dist_name: str

Name of intrusive_age_dist in model.

sample_age_dist_sorted: pymc.distribtions

Sorted and scaled sample age distribitions.

sample_heights: np.array

Array of heights for samples in the current interval.

intrusive_height: float

Height of intrusive age constraint.

section: str

Name of the current section.

stratmc.model.superposition(age_dist, age_dist_names, model, section_age_df, section)[source]#

Helper function for explicitly enforcing stratigraphic superposition (any given sample must be younger than the underlying sample) for a group of radiometric age constraints. Each constraint must have a unique name in model.

Parameters:
age_dist: pymc.distribtions

pytensor.tensor containing radiometric age distributions (must be in stratigraphic order - lowest to highest).

age_dist_names: list(str)

Names of each entry in age_dist in model.

model: pymc.Model

pymc.model.core.Model object associated with the distributions in age_dist.

section_age_df: pandas.DataFrame

pandas.DataFrame containing age constraints for section.

section: str

Name of the section containing the radiometric age constraints.

stratmc.model.superposition_depositional_and_limiting_ages(model, depositional_age_name, detrital_age_dist_names, intrusive_age_dist_names, section, depositional_age_idx=None)[source]#

Helper function for ensuring that depositional age constraints respect superposition with limiting age constraints. For example, a depositional age most be younger than any underlying detrital age constraints, even if their ages overlap within measurement uncertainty.

Parameters:
model: pymc.Model

pymc.model.core.Model object associated with the input distributions.

depositional_age_name: str

Name of distribution for target depositional age constraint in model.

detrital_age_dist_names: list(str)

Names of underlying detrital age constraint distributions in model.

intrusive_age_dist_names: list(str)

Names of overlying intrusive age constraint distributions in model.

section: str

Name of section containing the target age constraints.

depositional_age_idx: int

Position of depositional_age in model variable depositional_age_name. Only required if depositional_age is one of multiple ages modeled using a single multidimensional distribution.

stratmc.model.untransformed_initval(var_name, model)[source]#

Helper function to retrieve the untransformed initial values for a random variable in a pymc.model.core.Model object. Applies backward transform to the transformed initial values.

Parameters:
var_name: str

Name of variable (untransformed, as specified in model).

model: PyMC model

pymc.model.core.Model object that contains the target random variable.