Statistical model#
Functions for building a proxy signal inference model.
Create a proxy signal (i.e., carbon isotope) inference model. |
|
Create a prior age model for each section. |
|
Helper function that resets the initial sample age values (in |
|
Helper function for enforcing detrital age constraints (maximum age constraint for all overlying samples; no constraint on underlyig samples) in the middle of a section. |
|
Helper function for enforcing intrusive age constraints (minimum age constraint for all underlying samples; no constraint on overlying samples) in the middle of a section. |
|
Helper function for explicitly enforcing stratigraphic superposition (any given sample must be younger than the underlying sample) for a group of radiometric age constraints. |
|
Helper function for ensuring that depositional age constraints respect superposition with limiting age constraints. |
|
Helper function to retrieve the untransformed initial values for a random variable in a |
- stratmc.model.build_model(sample_df, ages_df, proxies=['d13c'], proxy_sigma_default=0.1, approximate=False, hsgp_m=15, hsgp_c=1.3, ls_dist='Wald', ls_min=0, ls_mu=20, ls_lambda=50, ls_sigma=50, var_sigma=10, white_noise_sigma=0.1, gp_mean_mu=None, gp_mean_sigma=None, offset_type='section', offset_prior='Laplace', offset_mu=0, offset_b=2, noise_type='section', noise_prior='HalfCauchy', noise_beta=1, noise_sigma=None, noise_sigma_single_sample=1, noise_sigma_studentT=1, noise_nu=1, jitter=0.001, proxy_observed=True, **kwargs)[source]#
Create a proxy signal (i.e., carbon isotope) inference model.
Note that while excluded samples (
Exclude?isTrueinsample_df) do not affect the proxy signal reconstruction, their ages are passively tracked within the inference model (if other samples from the same sections are included).- Parameters:
- sample_df: pandas.DataFrame
pandas.DataFramecontaining proxy data for all sections. Load from .csv file usingload_data()instratmc.data.- ages_df: pandas.DataFrame
pandas.DataFramecontaining age constraints for all sections. Load from .csv file usingload_data()instratmc.data.- sections:: list(str) or numpy.array(str), optional
List of sections to include in the inference model. Defaults to all sections in
sample_df.- proxies: str or list(str), optional
Column or columns containing proxy data in
sample_df. Defaults to ‘d13c’.- proxy_sigma_default: float or dict{float}, optional
Measurement uncertainty (\(1\sigma\)) to use for proxy observations if not specified in
proxy_stdcolumn ofsample_df. To set a different value for each proxy, pass a dictionary with proxy names as keys. Defaults to 0.1.- approximate: bool, optional
Build model with an unapproximated latent GP (
pymc.gp.Latent) ifFalse, or a Hilbert space Gaussian process approximation (pymc.gp.HSGP) ifTrue; defaults toFalse. If using the HSGP approximation, also pass thehsgp_mandhsgp_cparameters (the defaults are unlikely to work well for all problems). Appropriate values formandccan be estimated usingapprox_hsgp_hyperparams().- hsgp_m: int or dict{int}, optional
Number of basis vectors to use in the HSGP approximation (see
pymc.gp.HSGP). Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 15.- hsgp_c: float or dict{float}, optional
Proportion extension factor for the HSGP approximation (see
pymc.gp.HSGP). Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 1.3.- ls_dist: str or dict{str}, optional
Prior distribution for the lengthscale hyperparameter of the exponential quadratic covariance kernel (
pymc.gp.cov.ExpQuad); set toWald(pymc.Wald) orHalfNormal(pymc.HalfNormal). Defaults toWaldwithmu = 20andlambda = 50. To changemuandlambda, pass thels_muandls_lambdaparameters. ForHalfNormal, the variance defaults tosigma = 50; change by passingls_sigma. Pass as a dictionary with proxy names as keys to specify a different prior distribution for each proxy.- ls_min: float or dict{float}, optional
Minimum value for the lengthscale hyperparameter of the
pymc.gp.cov.ExpQuadcovariance kernel; shifts the lengthscale prior byls_min. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 0.- ls_mu: float or dict{float}, optional
Mean (mu) of the
pymc.gp.cov.ExpQuadlengthscale prior ifls_dist = `Wald`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 20.- ls_lambda: float or dict{float}, optional
Relative precision (lam) of the
pymc.gp.cov.ExpQuadlengthscale hyperparameter prior ifls_dist = `Wald`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 50.- ls_sigma: float or dict{float}, optional
Scale parameter (sigma) of the
pymc.gp.cov.ExpQuadlengthscale hyperparameter prior ifls_dist = `HalfNormal`. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 50.- var_sigma: float or dict{float}, optional
Scale parameter (‘sigma’) of the covariance kernel variance hyperparameter prior, which is a
pymc.HalfNormaldistribution. Pass as a dictionary with proxy names as keys to specify a different value for each proxy. Defaults to 10.- white_noise_sigma: float or dict{float}, optional
Amplitude of white noise component of GP covariance function. Defaults to 0.1. Should be equal to or less than the proxy measurement uncertainty; use a smaller value for proxies with low-amplitude signals (e.g., Sr isotopes). Pass as a dictionary with proxy names as keys to specify a different value for each proxy.
- gp_mean_mu: float or dict{float}, optional
Mean (mu) of the GP mean function prior, which is a
pymc.Normaldistribution. Defaults to the mean of the observations for each proxy. Pass as a dictionary with proxy names as keys to specify a different value for each proxy.- gp_mean_sigma: float or dict{float}, optional
Standard deviation (sigma) of the GP mean function prior, which is a
pymc.Normaldistribution. Defaults to twice the standard deviation of the observations for each proxy. Pass as a dictionary with proxy names as keys to specify a different value for each proxy.- offset_type: str or dict{str}, optional
Parameterize offset such that all samples from the same section have the same offset (set to
section), or such that custom sample groups share an offset term (set togroups, and specify the group for each sample and proxy with aoffset_group_proxycolumn insample_df). To omit the offset term, set tonone. Defaults tosection. Pass as a dictionary with proxy names as keys to specify a different offset type for each proxy.- offset_prior: str or dict{str}, optional
Type of distribution to use for the offset prior. Pass as a
stringto use the same prior for all proxies, or as adictofstring(with proxy names as keys) to specify a different prior distribution for each proxy. Defaults toLaplace(pymc.Laplace) withmu = 0andb = 2.muandbcan be changed by passingoffset_muandoffset_b. To use other types of priors, pass the name of a distribution frompymc.distributions, along with parameter values inoffset_params. Pass as a dictionary with proxy names as keys to specify a different prior for each proxy.- offset_params: dict{float} or dict{dict{float}}, optional
Only required if using a custom
offset_prior. Pass as a dictionary to use the same parameters for all proxies, or as a dictionary of dictionaries (with proxy names as keys) to specify different parameters for each proxy. Keys areparam_1_name,param_1,param_2_name, andparam_2(with parameter names corresponding to those required for the specifiedpymc.distributionsobject). If only one parameter is required, usenp.nanforparam_2_nameandparam_2.- noise_type: str or dict{str}, optional
Parameterize noise as per-section (set to
section) or per-group (set togroups, and specify the group for each sample and proxy with anoise_group_proxycolumn insample_df). Defaults tosection. Pass as a dictionary with proxy names as keys to specify a different noise type for each proxy.- noise_prior: str or dict{str}, optional
Type of distribution to use for the noise prior. Pass as a
stringto use the same prior for all proxies, or as adictofstring(with proxy names as keys) to specify a different prior distribution for each proxy. Defaults toHalfCauchy(pymc.HalfCauchy) withbeta = 1.betacan be changed by passingnoise_beta.Other implemented priors (note that noise must be positive-only) are``HalfStudentT`` (
pymc.HalfStudentT; defaults tonoise_nu = 1andnoise_sigma_studentT= 1) andHalfNormal(pymc.HalfNormal). By default, theHalfNormalprior hassigmaequal to the standard deviation of the data associated with each noise term.sigmacan be changed by passingnoise_sigma. For sections with only 1 sample, the standard deviation of the noise prior is set bynoise_sigma_single_sample(defaults to 1).- superposition_dict: dict{list(str)}, optional
Optional; dictionary specifying superposition relationships between different sections. Should only be used when superposition is not implicitly enforced by the age constraints; for example, when sections share the same minimum and maximum age constraints, but are from geological formations with a known stratigraphic relationship. Dictionary keys are section names, and the value for each key is a list of sections that must be older (stratigraphically lower).
- jitter: float, optional
Value of
jitterpassed topymc.gp.Latent.prior(). Defaults to 0.001.- proxy_observed: bool, optional
Whether to pass observed values to the likelihood function; defaults to
True. Only set toFalseto generate synthetic observations from the model prior insynthetic_observations_from_prior()instratmc.synthetics.
- Returns:
- model: PyMC model
pymc.model.core.Modelobject.- gp: pymc.gp.Latent or pymc.gp.HSGP
Gaussian process prior for the model.
pymc.gp.Latentifapproximate = True, orpymc.gp.HSGPifapproximate = False.
- stratmc.model.build_prior_age_model(sample_df, ages_df, proxies=['d13c'], **kwargs)[source]#
Create a prior age model for each section. Omits the proxy signal component of the statistical model.
- Parameters:
- sample_df: pandas.DataFrame
pandas.DataFramecontaining proxy data for all sections. Load from .csv file usingload_data()instratmc.data.- ages_df: pandas.DataFrame
pandas.DataFramecontaining age constraints for all sections. Load from .csv file usingload_data()instratmc.data.- proxies: str or list(str), optional
Column or columns containing proxy data in
sample_df. Defaults to ‘d13c’.- sections:: list(str) or numpy.array(str), optional
List of sections to include in the inference model. Defaults to all sections in
sample_df.
- Returns:
- prior_age_model: PyMC model
pymc.model.core.Modelobject.
- stratmc.model.get_valid_initial_ages(detrital_age_dist_names, intrusive_age_dist_names, maximum_age_dist_name, minimum_age_dist_name, sample_age_dist_name, sample_heights, detrital_heights, intrusive_heights, model, interval, sf1_name, sf2_name, shared_radiometric_age_dist)[source]#
Helper function that resets the initial sample age values (in
Model.initial_point()) such that all detrital and intrusive age constraints are respected.- Parameters:
- detrital_age_dist_names: list(str)
List of names for detrital age constraint distributions in
model.- intrusive_age_dist_names: list(str)
List of names for intrusive age constraint distributions in
model.- maximum_age_dist_name: str
Name of distribution for underlying maximum age constraint in
model.- minimum_age_dist_name: str
Name of distribution for overlying minimum age constraint in
model.- sample_age_dist_name: str
Name of sample age distribution (unsorted and unscaled) in the pymc.Model object.
- sample_heights: np.array
Array of heights for samples in the current interval.
- detrital_heights: list(float)
Heights of detrital age constraints.
- intrusive_heights: list(float)
Heights of intrusive age constraints.
- model: pymc.Model
pymc.model.core.Modelobject associated with the input distributions.- section: str
Name of the current section.
- interval: int
Current interval number.
- sf1_name: str
Name of the distribution associated with scaling factor 1 in
model.- sf2_name: str
Name of the distribution associated with scaling factor 2 in
model.- shared_radiometric_age_dist: bool, optional
Whether the radiometric age distributions are part of a single object (versus initiated as separate distributions). Defaults to
True.
- stratmc.model.intermediate_detrital_potential(detrital_age_dist, detrital_age_dist_name, sample_age_dist_sorted, sample_heights, detrital_height, section)[source]#
Helper function for enforcing detrital age constraints (maximum age constraint for all overlying samples; no constraint on underlyig samples) in the middle of a section.
- Parameters:
- detrital_age_dist: pymc.distributions
Detrital age distribution.
- detrital_age_dist_name: str
Name of
detrital_age_distinmodel.- sample_age_dist_sorted: pymc.distribtions
Sorted and scaled sample age distribitions.
- sample_age_dist: pymc.distributions
Unsorted and unscaled sample age distributions.
- sample_age_dist_name: str
Name of
sample_age_distin the pymc.Model object.- sample_heights: np.array
Array of heights for samples in the current interval.
- detrital_height: float
Height of detrital age constraint.
- section: str
Name of the current section.
- stratmc.model.intermediate_intrusive_potential(intrusive_age_dist, intrusive_age_dist_name, sample_age_dist_sorted, sample_heights, intrusive_height, section)[source]#
Helper function for enforcing intrusive age constraints (minimum age constraint for all underlying samples; no constraint on overlying samples) in the middle of a section.
- Parameters:
- intrusive_age_dist: pymc.distributions
Intrusive age distribution.
- intrusive_age_dist_name: str
Name of
intrusive_age_distinmodel.- sample_age_dist_sorted: pymc.distribtions
Sorted and scaled sample age distribitions.
- sample_heights: np.array
Array of heights for samples in the current interval.
- intrusive_height: float
Height of intrusive age constraint.
- section: str
Name of the current section.
- stratmc.model.superposition(age_dist, age_dist_names, model, section_age_df, section)[source]#
Helper function for explicitly enforcing stratigraphic superposition (any given sample must be younger than the underlying sample) for a group of radiometric age constraints. Each constraint must have a unique name in model.
- Parameters:
- age_dist: pymc.distribtions
pytensor.tensorcontaining radiometric age distributions (must be in stratigraphic order - lowest to highest).- age_dist_names: list(str)
Names of each entry in
age_distinmodel.- model: pymc.Model
pymc.model.core.Modelobject associated with the distributions inage_dist.- section_age_df: pandas.DataFrame
pandas.DataFramecontaining age constraints forsection.- section: str
Name of the section containing the radiometric age constraints.
- stratmc.model.superposition_depositional_and_limiting_ages(model, depositional_age_name, detrital_age_dist_names, intrusive_age_dist_names, section, depositional_age_idx=None)[source]#
Helper function for ensuring that depositional age constraints respect superposition with limiting age constraints. For example, a depositional age most be younger than any underlying detrital age constraints, even if their ages overlap within measurement uncertainty.
- Parameters:
- model: pymc.Model
pymc.model.core.Modelobject associated with the input distributions.- depositional_age_name: str
Name of distribution for target depositional age constraint in
model.- detrital_age_dist_names: list(str)
Names of underlying detrital age constraint distributions in
model.- intrusive_age_dist_names: list(str)
Names of overlying intrusive age constraint distributions in
model.- section: str
Name of section containing the target age constraints.
- depositional_age_idx: int
Position of
depositional_agein model variabledepositional_age_name. Only required ifdepositional_ageis one of multiple ages modeled using a single multidimensional distribution.
- stratmc.model.untransformed_initval(var_name, model)[source]#
Helper function to retrieve the untransformed initial values for a random variable in a
pymc.model.core.Modelobject. Applies backward transform to the transformed initial values.- Parameters:
- var_name: str
Name of variable (untransformed, as specified in
model).- model: PyMC model
pymc.model.core.Modelobject that contains the target random variable.