Statistical model#

Functions for building a proxy signal inference model.

`build_model`	Create a proxy signal (i.e., carbon isotope) inference model.
`build_prior_age_model`	Create a prior age model for each section.
`get_valid_initial_ages`	Helper function that resets the initial sample age values (in `Model.initial_point()`) such that all detrital and intrusive age constraints are respected.
`intermediate_detrital_potential`	Helper function for enforcing detrital age constraints (maximum age constraint for all overlying samples; no constraint on underlyig samples) in the middle of a section.
`intermediate_intrusive_potential`	Helper function for enforcing intrusive age constraints (minimum age constraint for all underlying samples; no constraint on overlying samples) in the middle of a section.
`superposition`	Helper function for explicitly enforcing stratigraphic superposition (any given sample must be younger than the underlying sample) for a group of radiometric age constraints.
`superposition_depositional_and_limiting_ages`	Helper function for ensuring that depositional age constraints respect superposition with limiting age constraints.
`untransformed_initval`	Helper function to retrieve the untransformed initial values for a random variable in a `pymc.model.core.Model` object.

stratmc.model.build_model(sample_df, ages_df, proxies=['d13c'], proxy_sigma_default=0.1, approximate=False, hsgp_m=15, hsgp_c=1.3, ls_dist='Wald', ls_min=0, ls_mu=20, ls_lambda=50, ls_sigma=50, var_sigma=10, white_noise_sigma=0.1, gp_mean_mu=None, gp_mean_sigma=None, offset_type='section', offset_prior='Laplace', offset_mu=0, offset_b=2, noise_type='section', noise_prior='HalfCauchy', noise_beta=1, noise_sigma=None, noise_sigma_single_sample=1, noise_sigma_studentT=1, noise_nu=1, jitter=0.001, proxy_observed=True, **kwargs)[source]#

Create a proxy signal (i.e., carbon isotope) inference model.

Note that while excluded samples (Exclude? is True in sample_df) do not affect the proxy signal reconstruction, their ages are passively tracked within the inference model (if other samples from the same sections are included).

Parameters:

sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for all sections. Load from .csv file using load_data() in stratmc.data.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for all sections. Load from .csv file using load_data() in stratmc.data.

sections:: list(str) or numpy.array(str), optional

List of sections to include in the inference model. Defaults to all sections in sample_df.

proxies: str or list(str), optional

Column or columns containing proxy data in sample_df. Defaults to ‘d13c’.

proxy_sigma_default: float or dict{float}, optional

Measurement uncertainty (\(1\sigma\)) to use for proxy observations if not specified in proxy_std column of sample_df. To set a different value for each proxy, pass a dictionary with proxy names as keys. Defaults to 0.1.

approximate: bool, optional

Build model with an unapproximated latent GP (pymc.gp.Latent) if False, or a Hilbert space Gaussian process approximation (pymc.gp.HSGP) if True; defaults to False. If using the HSGP approximation, also pass the hsgp_m and hsgp_c parameters (the defaults are unlikely to work well for all problems). Appropriate values for m and c can be estimated using approx_hsgp_hyperparams().

hsgp_m: int or dict{int}, optional

Number of basis vectors to use in the HSGP approximation (see pymc.gp.HSGP). Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 15.

hsgp_c: float or dict{float}, optional

Proportion extension factor for the HSGP approximation (see pymc.gp.HSGP). Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 1.3.

ls_dist: str or dict{str}, optional

Prior distribution for the lengthscale hyperparameter of the exponential quadratic covariance kernel (pymc.gp.cov.ExpQuad); set to Wald (pymc.Wald) or HalfNormal (pymc.HalfNormal). Defaults to Wald with mu = 20 and lambda = 50. To change mu and lambda, pass the ls_mu and ls_lambda parameters. For HalfNormal, the variance defaults to sigma = 50; change by passing ls_sigma. Pass as a dictionary with proxy names as keys to specify a different prior distribution for each proxy.

ls_min: float or dict{float}, optional

Minimum value for the lengthscale hyperparameter of the pymc.gp.cov.ExpQuad covariance kernel; shifts the lengthscale prior by ls_min. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 0.

ls_mu: float or dict{float}, optional

Mean (mu) of the pymc.gp.cov.ExpQuad lengthscale prior if ls_dist = `Wald`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 20.

ls_lambda: float or dict{float}, optional

Relative precision (lam) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = `Wald`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 50.

ls_sigma: float or dict{float}, optional

Scale parameter (sigma) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = `HalfNormal`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 50.

var_sigma: float or dict{float}, optional

Scale parameter (‘sigma’) of the covariance kernel variance hyperparameter prior, which is a pymc.HalfNormal distribution. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 10.

white_noise_sigma: float or dict{float}, optional

Amplitude of white noise component of GP covariance function. Defaults to 0.1. Should be equal to or less than the proxy measurement uncertainty; use a smaller value for proxies with low-amplitude signals (e.g., Sr isotopes). Pass as a dictionary with proxy names as keys to specify a different value for each proxy.

gp_mean_mu: float or dict{float}, optional

Mean (mu) of the GP mean function prior, which is a pymc.Normal distribution. Defaults to the mean of the observations for each proxy. Pass as a dictionary with proxy names as keys to specify a different value for each proxy.

gp_mean_sigma: float or dict{float}, optional

Standard deviation (sigma) of the GP mean function prior, which is a pymc.Normal distribution. Defaults to twice the standard deviation of the observations for each proxy. Pass as a dictionary with proxy names as keys to specify a different value for each proxy.

offset_type: str or dict{str}, optional

Parameterize offset such that all samples from the same section have the same offset (set to section), or such that custom sample groups share an offset term (set to groups, and specify the group for each sample and proxy with a offset_group_proxy column in sample_df). To omit the offset term, set to none. Defaults to section. Pass as a dictionary with proxy names as keys to specify a different offset type for each proxy.

offset_prior: str or dict{str}, optional

Type of distribution to use for the offset prior. Pass as a string to use the same prior for all proxies, or as a dict of string (with proxy names as keys) to specify a different prior distribution for each proxy. Defaults to Laplace (pymc.Laplace) with mu = 0 and b = 2. mu and b can be changed by passing offset_mu and offset_b. To use other types of priors, pass the name of a distribution from pymc.distributions, along with parameter values in offset_params. Pass as a dictionary with proxy names as keys to specify a different prior for each proxy.

offset_params: dict{float} or dict{dict{float}}, optional

Only required if using a custom offset_prior. Pass as a dictionary to use the same parameters for all proxies, or as a dictionary of dictionaries (with proxy names as keys) to specify different parameters for each proxy. Keys are param_1_name, param_1, param_2_name, and param_2 (with parameter names corresponding to those required for the specified pymc.distributions object). If only one parameter is required, use np.nan for param_2_name and param_2.

noise_type: str or dict{str}, optional

Parameterize noise as per-section (set to section) or per-group (set to groups, and specify the group for each sample and proxy with a noise_group_proxy column in sample_df). Defaults to section. Pass as a dictionary with proxy names as keys to specify a different noise type for each proxy.

noise_prior: str or dict{str}, optional

Type of distribution to use for the noise prior. Pass as a string to use the same prior for all proxies, or as a dict of string (with proxy names as keys) to specify a different prior distribution for each proxy. Defaults to HalfCauchy (pymc.HalfCauchy) with beta = 1. beta can be changed by passing noise_beta.

Other implemented priors (note that noise must be positive-only) are``HalfStudentT`` (pymc.HalfStudentT; defaults to noise_nu = 1 and noise_sigma_studentT = 1) and HalfNormal (pymc.HalfNormal). By default, the HalfNormal prior has sigma equal to the standard deviation of the data associated with each noise term. sigma can be changed by passing noise_sigma. For sections with only 1 sample, the standard deviation of the noise prior is set by noise_sigma_single_sample (defaults to 1).

superposition_dict: dict{list(str)}, optional

Optional; dictionary specifying superposition relationships between different sections. Should only be used when superposition is not implicitly enforced by the age constraints; for example, when sections share the same minimum and maximum age constraints, but are from geological formations with a known stratigraphic relationship. Dictionary keys are section names, and the value for each key is a list of sections that must be older (stratigraphically lower).

jitter: float, optional

Value of jitter passed to pymc.gp.Latent.prior(). Defaults to 0.001.

proxy_observed: bool, optional

Whether to pass observed values to the likelihood function; defaults to True. Only set to False to generate synthetic observations from the model prior in synthetic_observations_from_prior() in stratmc.synthetics.

Returns:

model: PyMC model: pymc.model.core.Model object.
gp: pymc.gp.Latent or pymc.gp.HSGP: Gaussian process prior for the model. pymc.gp.Latent if approximate = True, or pymc.gp.HSGP if approximate = False.

stratmc.model.build_prior_age_model(sample_df, ages_df, proxies=['d13c'], **kwargs)[source]#

Create a prior age model for each section. Omits the proxy signal component of the statistical model.

Parameters:

sample_df: pandas.DataFrame: pandas.DataFrame containing proxy data for all sections. Load from .csv file using load_data() in stratmc.data.
ages_df: pandas.DataFrame: pandas.DataFrame containing age constraints for all sections. Load from .csv file using load_data() in stratmc.data.
proxies: str or list(str), optional: Column or columns containing proxy data in sample_df. Defaults to ‘d13c’.
sections:: list(str) or numpy.array(str), optional: List of sections to include in the inference model. Defaults to all sections in sample_df.

Returns:

prior_age_model: PyMC model: pymc.model.core.Model object.

stratmc.model.get_valid_initial_ages(detrital_age_dist_names, intrusive_age_dist_names, maximum_age_dist_name, minimum_age_dist_name, sample_age_dist_name, sample_heights, detrital_heights, intrusive_heights, model, interval, sf1_name, sf2_name, shared_radiometric_age_dist)[source]#

Helper function that resets the initial sample age values (in Model.initial_point()) such that all detrital and intrusive age constraints are respected.

Parameters:

detrital_age_dist_names: list(str): List of names for detrital age constraint distributions in model.
intrusive_age_dist_names: list(str): List of names for intrusive age constraint distributions in model.
maximum_age_dist_name: str: Name of distribution for underlying maximum age constraint in model.
minimum_age_dist_name: str: Name of distribution for overlying minimum age constraint in model.
sample_age_dist_name: str: Name of sample age distribution (unsorted and unscaled) in the pymc.Model object.
sample_heights: np.array: Array of heights for samples in the current interval.
detrital_heights: list(float): Heights of detrital age constraints.
intrusive_heights: list(float): Heights of intrusive age constraints.
model: pymc.Model: pymc.model.core.Model object associated with the input distributions.
section: str: Name of the current section.
interval: int: Current interval number.
sf1_name: str: Name of the distribution associated with scaling factor 1 in model.
sf2_name: str: Name of the distribution associated with scaling factor 2 in model.
shared_radiometric_age_dist: bool, optional: Whether the radiometric age distributions are part of a single object (versus initiated as separate distributions). Defaults to True.

stratmc.model.intermediate_detrital_potential(detrital_age_dist, detrital_age_dist_name, sample_age_dist_sorted, sample_heights, detrital_height, section)[source]#

Helper function for enforcing detrital age constraints (maximum age constraint for all overlying samples; no constraint on underlyig samples) in the middle of a section.

Parameters:

detrital_age_dist: pymc.distributions: Detrital age distribution.
detrital_age_dist_name: str: Name of detrital_age_dist in model.
sample_age_dist_sorted: pymc.distribtions: Sorted and scaled sample age distribitions.
sample_age_dist: pymc.distributions: Unsorted and unscaled sample age distributions.
sample_age_dist_name: str: Name of sample_age_dist in the pymc.Model object.
sample_heights: np.array: Array of heights for samples in the current interval.
detrital_height: float: Height of detrital age constraint.
section: str: Name of the current section.

stratmc.model.intermediate_intrusive_potential(intrusive_age_dist, intrusive_age_dist_name, sample_age_dist_sorted, sample_heights, intrusive_height, section)[source]#

Helper function for enforcing intrusive age constraints (minimum age constraint for all underlying samples; no constraint on overlying samples) in the middle of a section.

Parameters:

intrusive_age_dist: pymc.distributions: Intrusive age distribution.
intrusive_age_dist_name: str: Name of intrusive_age_dist in model.
sample_age_dist_sorted: pymc.distribtions: Sorted and scaled sample age distribitions.
sample_heights: np.array: Array of heights for samples in the current interval.
intrusive_height: float: Height of intrusive age constraint.
section: str: Name of the current section.

stratmc.model.superposition(age_dist, age_dist_names, model, section_age_df, section)[source]#

Helper function for explicitly enforcing stratigraphic superposition (any given sample must be younger than the underlying sample) for a group of radiometric age constraints. Each constraint must have a unique name in model.

Parameters:

age_dist: pymc.distribtions: pytensor.tensor containing radiometric age distributions (must be in stratigraphic order - lowest to highest).
age_dist_names: list(str): Names of each entry in age_dist in model.
model: pymc.Model: pymc.model.core.Model object associated with the distributions in age_dist.
section_age_df: pandas.DataFrame: pandas.DataFrame containing age constraints for section.
section: str: Name of the section containing the radiometric age constraints.

stratmc.model.superposition_depositional_and_limiting_ages(model, depositional_age_name, detrital_age_dist_names, intrusive_age_dist_names, section, depositional_age_idx=None)[source]#

Helper function for ensuring that depositional age constraints respect superposition with limiting age constraints. For example, a depositional age most be younger than any underlying detrital age constraints, even if their ages overlap within measurement uncertainty.

Parameters:

model: pymc.Model: pymc.model.core.Model object associated with the input distributions.
depositional_age_name: str: Name of distribution for target depositional age constraint in model.
detrital_age_dist_names: list(str): Names of underlying detrital age constraint distributions in model.
intrusive_age_dist_names: list(str): Names of overlying intrusive age constraint distributions in model.
section: str: Name of section containing the target age constraints.
depositional_age_idx: int: Position of depositional_age in model variable depositional_age_name. Only required if depositional_age is one of multiple ages modeled using a single multidimensional distribution.

stratmc.model.untransformed_initval(var_name, model)[source]#

Helper function to retrieve the untransformed initial values for a random variable in a pymc.model.core.Model object. Applies backward transform to the transformed initial values.

Parameters:

var_name: str: Name of variable (untransformed, as specified in model).
model: PyMC model: pymc.model.core.Model object that contains the target random variable.

Statistical model#

This Page