Tools for generating synthetic data

Functions for creating synthetic proxy signals/stratigraphic observations and evaluating model performance for synthetic tests.

make_excursion

Function for generating a synthetic proxy signal that contains a number of user-specified excursions.

synthetic_sections

Function for generating synthetic proxy observations and age constraints using a predefined proxy signal.

synthetic_observations_from_prior

Given age constraints for a set of stratigraphic sections in ages_df, generate synthetic proxy observations by sampling the model prior.

synthetic_signal_from_prior

Draws synthetic signals from the model prior, and returns the signal conditioned over the points in ages.

quantify_signal_recovery

Calculates the likelihood of the true proxy signal (for synthetic tests, where the true signal is known) conditioned on the posterior (default) or prior proxy signal inference.

sample_age_recovery

Calculates the likelihood of the true sample ages (for synthetic tests, where the true age of each sample is known) given draws from the posterior (default) or prior.

sample_age_residuals

Calculates the residual (for each draw) between the true age and the posterior (default) or prior age of each sample.

synthetic_signal_to_df

Helper function for generating artificial sample and age data using synthetic_sections().

stratmc.synthetics.make_excursion(time, amplitude, baseline=0, rising_time=None, rate_offset=True, excursion_duration=None, min_duration=1, smooth=False, smoothing_factor=10, seed=None)

Function for generating a synthetic proxy signal that contains a number of user-specified excursions.

Parameters:
time: numpy.array(float)

Time vector over which to generate proxy signal.

amplitude: float, list(float), or numpy.array(float)

Amplitude of excursion; pass a list or array to generate multiple excursions.

baseline: float, optional

Baseline proxy value. Defaults to 0.

rising_time: float, list(float), or numpy.array(float), optional

Fraction of excursion duration spent on the rising limb (linear increase/decrease toward peak). Must be between 0 and 1. If not provided, randomly generated if rate_offset is True and set to 0.5 if rate_offset is False. Pass a list to specify different rising times for each excursion.

rate_offset: bool, optional

If False, rising and falling limbs of excursion have equal duration. If True, the fraction of the excursion duration spent on the rising limb is set by rising_time. Defaults to True.

excursion_duration: float, list(float), or numpy.array(float), optional

Duration of excursion; pass a list or array to generate multiple excursions. Random if not provided.

min_duration: float, optional

Minimum excursion duration if excursion_duration is not provided. Defaults to 1.

smooth: bool, optional

Whether to smooth excursion peaks. Defaults to False.

smoothing_factor: float, optional

Smoothing factor if smooth is True; higher values produce smoother signals. Defaults to 10.

seed: int, optional

Random seed used to generate signal.

Returns:
interp_proxy: np.array

Proxy signal interpolated to the points in the time vector.
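The role of rising_time can be illustrated with a minimal NumPy sketch of a single piecewise-linear excursion. This is not the make_excursion() implementation: the helper name triangular_excursion and its start and duration arguments are hypothetical, and the random placement, random durations, and smoothing described above are left out.

```python
import numpy as np


def triangular_excursion(time, amplitude, start, duration,
                         rising_time=0.5, baseline=0.0):
    """Piecewise-linear excursion: baseline -> peak -> baseline.

    Hypothetical helper for illustration only; `start` and `duration`
    are assumptions and are not arguments of make_excursion().
    """
    if not 0 < rising_time < 1:
        raise ValueError("rising_time must be between 0 and 1")
    # The peak falls a fraction `rising_time` of the way through the excursion.
    peak_time = start + rising_time * duration
    end = start + duration
    knot_times = [start, peak_time, end]
    knot_values = [baseline, baseline + amplitude, baseline]
    # np.interp holds the end values constant outside [start, end],
    # so the signal stays at the baseline before and after the excursion.
    return np.interp(time, knot_times, knot_values)


time = np.linspace(0, 100, 101)
# Negative excursion that spends 25% of its duration on the rising limb.
signal = triangular_excursion(time, amplitude=-5.0, start=20.0,
                              duration=40.0, rising_time=0.25)
```

With rising_time=0.25 the peak sits a quarter of the way through the excursion, so the approach to the peak is three times faster than the recovery.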

stratmc.synthetics.quantify_signal_recovery(full_trace, true_signal, proxy='d13c', mode='posterior')

Calculates the likelihood of the true proxy signal (for synthetic tests, where the true signal is known) conditioned on the posterior (default) or prior proxy signal inference. The likelihood is evaluated at each age (the posterior signal and the true signal must be evaluated at the same ages). Provides a measure of signal recovery.

Parameters:
full_trace: arviz.InferenceData or list(arviz.InferenceData)

An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference. If passed as a list, the posterior draws for all traces will be combined when calculating posterior_likelihood.

true_signal: np.array

True values for the proxy signal, evaluated at the same ages as the posterior signal in full_trace.

proxy: str, optional

Proxy signal to evaluate. Defaults to ‘d13c’.

mode: str, optional

Whether to use the posterior or prior to calculate signal recovery. Defaults to ‘posterior’.

Returns:
posterior_likelihood: np.array

Array of posterior likelihoods (evaluated at each age).
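One way to evaluate a per-age likelihood of this kind is to build a kernel density estimate from the draws at each age and read off the density at the true value. The sketch below takes that approach with a Gaussian kernel and a normal-reference bandwidth. Both choices are assumptions made for illustration (the density estimator used by quantify_signal_recovery() is not stated here), and pointwise_kde_likelihood is a hypothetical helper that operates on a plain array of draws rather than an arviz.InferenceData object.

```python
import numpy as np


def pointwise_kde_likelihood(draws, true_signal):
    """Density of the true value under a Gaussian KDE of the draws, per age.

    draws: array with shape (n_ages, n_draws)
    true_signal: array with shape (n_ages,)
    Hypothetical helper; the estimator is an assumption.
    """
    draws = np.asarray(draws, dtype=float)
    true_signal = np.asarray(true_signal, dtype=float)
    n_draws = draws.shape[1]
    # Normal-reference rule-of-thumb bandwidth, computed separately per age.
    bandwidth = 1.06 * draws.std(axis=1, ddof=1) * n_draws ** (-0.2)
    z = (true_signal[:, None] - draws) / bandwidth[:, None]
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over draws and rescale by the bandwidth.
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
ages = np.linspace(0.0, 40.0, 5)
true_signal = 2.0 * np.sin(ages / 10.0)
# Stand-in for posterior draws: the true signal plus Gaussian scatter.
draws = true_signal[:, None] + rng.normal(0.0, 0.5, size=(ages.size, 2000))

likelihood_true = pointwise_kde_likelihood(draws, true_signal)
likelihood_shifted = pointwise_kde_likelihood(draws, true_signal + 3.0)
```

A signal that lies well outside the spread of the draws (here, shifted by 3) receives a much lower likelihood at every age than the signal the draws are centered on.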

stratmc.synthetics.sample_age_recovery(full_trace, sample_df, sections=None, mode='posterior')

Calculates the likelihood of the true sample ages (for synthetic tests, where the true age of each sample is known) given draws from the posterior (default) or prior. Provides a measure of age model recovery.

Parameters:
full_trace: arviz.InferenceData or list(arviz.InferenceData)

An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference. If passed as a list, the posterior draws for all traces will be combined when calculating posterior_likelihood.

sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for synthetic sections.

sections: list(str) or numpy.array(str), optional

List of sections to evaluate. Defaults to all sections in sample_df.

mode: str, optional

Whether to use the posterior or prior age models. Defaults to ‘posterior’.

Returns:
posterior_likelihood: dict{float} or np.array(float)

Posterior likelihoods for the true age of each sample. Returned as an array if only one section is evaluated, or a dictionary of arrays if multiple sections are evaluated.

stratmc.synthetics.sample_age_residuals(full_trace, sample_df, sections=None, mode='posterior')

Calculates the residual (for each draw) between the true age and the posterior (default) or prior age of each sample.

Parameters:
full_trace: arviz.InferenceData or list(arviz.InferenceData)

An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference. If passed as a list, the posterior draws for all traces will be combined when calculating age_residuals.

sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for synthetic sections.

sections: list(str) or numpy.array(str), optional

List of sections to evaluate. Defaults to all sections in sample_df.

mode: str, optional

Whether to use the posterior or prior age models. Defaults to ‘posterior’.

Returns:
age_residuals: np.array or dict{np.array}

Sample age residuals; shape is (number of samples, number of posterior draws). Returned as an array if only one section is evaluated, or a dictionary of arrays if multiple sections are evaluated.
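An array with the shape described above can be summarized per sample with a few NumPy reductions. The sketch below builds a stand-in residual array from made-up draws. The sign convention (draw minus true age) is an assumption, and none of the variable names come from the library.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true ages for three samples in one section.
true_ages = np.array([100.0, 95.0, 90.0])

# Stand-in for posterior age draws, shape (number of samples, number of draws).
age_draws = true_ages[:, None] + rng.normal(0.0, 2.0, size=(3, 500))

# Residual for every draw; assumed convention: draw minus true age.
age_residuals = age_draws - true_ages[:, None]

# Per-sample summaries across draws (axis 1).
median_residual = np.median(age_residuals, axis=1)
interval_95 = np.percentile(age_residuals, [2.5, 97.5], axis=1)
```

A median residual near zero with a 95% interval that brackets zero suggests that the age of that sample was recovered without systematic bias.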

stratmc.synthetics.synthetic_observations_from_prior(age_vector, ages_df, sample_heights=None, uniform_heights=False, samples_per_section=20, proxies=['d13c'], proxy_std=0.1, seed=None, ls_dist='Wald', ls_min=0, ls_mu=20, ls_lambda=50, ls_sigma=50, var_sigma=10, white_noise_sigma=0.1, gp_mean_mu=0, gp_mean_sigma=10, approximate=False, hsgp_m=15, hsgp_c=1.3, offset_type='section', offset_prior='Laplace', offset_alpha=0, offset_beta=1, offset_sigma=1, offset_mu=0, offset_b=2, noise_type='section', noise_prior='HalfCauchy', noise_beta=1, noise_sigma=1, noise_nu=1, jitter=0.001, **kwargs)

Given age constraints for a set of stratigraphic sections in ages_df, generate synthetic proxy observations by sampling the model prior. Accepts all arguments that can be passed to build_model() in stratmc.model.

Parameters:
age_vector: np.array(float)

Vector of ages at which to evaluate synthetic proxy signal(s).

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for synthetic sections.

sample_heights: dict{list(float) or numpy.array(float)}, optional

Sample heights for each stratigraphic section in ages_df; must be a dictionary with section names as keys. Defaults to None, which results in either uniformly spaced or randomly spaced sample heights (depending on the uniform_heights argument).

uniform_heights: bool, optional

Whether to generate uniformly spaced (True) or randomly spaced (False) sample heights if a sample_heights dictionary is not provided. Defaults to False (randomly spaced samples).

samples_per_section: int or dict(int), optional

Number of samples per section to generate if sample_heights is not provided; either an integer (if the same for all sections) or a dictionary with section names as keys. Defaults to 20.

proxies: list(str), optional

List of proxies to generate synthetic observations for. Defaults to d13c.

proxy_std: float or dict(float), optional

Measurement uncertainty for each proxy; pass a dictionary of floats with the elements of proxies as keys to use a different value for each proxy, or a single float to use the same value for all proxies. Defaults to 0.1.

seed: int, optional

Seed to use while generating synthetic observations.

Returns:
signals: dict(float)

Proxy signals drawn from the model prior (evaluated at the points in age_vector) used to generate synthetic observations; dictionary keys are proxies.

sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for synthetic stratigraphic sections.

prior: arviz.InferenceData

An arviz.InferenceData object containing the prior draw from the model used to generate synthetic observations.

model: pymc.Model

pymc.model.core.Model object used to generate synthetic observations.
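The interaction between sample_heights, uniform_heights, and samples_per_section can be pictured with a small sketch that builds a height dictionary keyed by section name. The helper make_sample_heights and its section_bounds argument (base and top height of each section) are hypothetical. How the library derives section bounds from ages_df is not described here, so they are supplied by hand.

```python
import numpy as np


def make_sample_heights(section_bounds, samples_per_section=20,
                        uniform_heights=False, seed=None):
    """Build {section name: sorted array of sample heights}.

    section_bounds: {section name: (base height, top height)} (assumed input)
    samples_per_section: int, or dict with section names as keys
    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    heights = {}
    for section, (base, top) in section_bounds.items():
        if isinstance(samples_per_section, dict):
            n = samples_per_section[section]
        else:
            n = samples_per_section
        if uniform_heights:
            # Evenly spaced samples from base to top, inclusive.
            heights[section] = np.linspace(base, top, n)
        else:
            # Randomly placed samples, sorted in stratigraphic order.
            heights[section] = np.sort(rng.uniform(base, top, size=n))
    return heights


bounds = {"A": (0.0, 50.0), "B": (10.0, 30.0)}
uniform = make_sample_heights(bounds, samples_per_section={"A": 5, "B": 8},
                              uniform_heights=True)
random_heights = make_sample_heights(bounds, seed=3)
```

A dictionary of this form, with section names as keys and height arrays as values, matches the structure described for the sample_heights argument.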

stratmc.synthetics.synthetic_sections(true_time, true_proxy, num_sections, num_samples, max_section_thickness, proxies=['d13c'], noise=False, noise_amp=0.1, min_constraints=2, max_constraints=3, seed=None, **kwargs)

Function for generating synthetic proxy observations and age constraints using a predefined proxy signal.

Parameters:
true_time: numpy.array(float)

True time vector for input signal.

true_proxy: numpy.array(float) or dict{numpy.array(float)}

True proxy vector for input signal. If generating synthetic data for multiple proxies, pass as a dictionary with proxy names as keys.

num_sections: int

Number of synthetic sections to generate.

num_samples: int

Number of samples per synthetic section.

max_section_thickness: float

Maximum thickness of synthetic sections.

proxies: str or list(str), optional

Column name(s) for synthetic proxy observations in sample_df. Defaults to ‘d13c’.

noise: bool, optional

Whether to add white noise to proxy observations. Defaults to False.

noise_amp: float or dict{float}, optional

Amplitude of white noise added to proxy observations (if noise is True). To specify a different noise amplitude for each proxy, pass as a dictionary with proxy names as keys. Defaults to 0.1.

min_constraints: int, optional

Minimum number of age constraints per synthetic section (must be at least 2). Defaults to 2.

max_constraints: int, optional

Maximum number of age constraints per synthetic section. Defaults to 3.

seed: int, optional

Random seed used to generate synthetic sections.

Returns:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for synthetic sections.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for synthetic sections.
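The core idea of sampling a predefined signal, with optional white noise, can be sketched in a few lines of NumPy. This is a simplified illustration rather than the synthetic_sections() implementation. It covers one section, places sample ages uniformly at random, and treats noise_amp as the standard deviation of Gaussian noise. All three choices are assumptions, and the height and age-constraint bookkeeping is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)

# Predefined "true" signal (hypothetical example values).
true_time = np.linspace(0, 100, 201)
true_proxy = 2.0 * np.sin(true_time / 15.0)

num_samples = 30
noise_amp = 0.1

# Random sample ages within the span of the true signal, sorted by age.
sample_ages = np.sort(
    rng.uniform(true_time.min(), true_time.max(), size=num_samples)
)

# Read the true signal at each sample age by linear interpolation.
clean_obs = np.interp(sample_ages, true_time, true_proxy)

# Optional white noise; noise_amp is used as a standard deviation here.
noisy_obs = clean_obs + rng.normal(0.0, noise_amp, size=num_samples)
```

Keeping clean_obs alongside noisy_obs makes it straightforward to check later how much of the mismatch in an inference comes from the added noise.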

stratmc.synthetics.synthetic_signal_from_prior(ages, num_signals=100, ls_dist='Wald', ls_min=0, ls_mu=20, ls_lambda=50, ls_sigma=50, var_sigma=10, gp_mean_mu=0, gp_mean_sigma=5, seed=None)

Draws synthetic signals from the model prior, and returns the signal conditioned over the points in ages. To generate both signals and synthetic stratigraphic sections, instead use synthetic_observations_from_prior().

Parameters:
ages: numpy.array(float)

Array of ages over which to condition the signal.

num_signals: int, optional

Number of signals to draw from prior. Defaults to 100.

ls_dist: str, optional

Prior distribution for the lengthscale hyperparameter of the exponentiated quadratic covariance kernel (pymc.gp.cov.ExpQuad); set to Wald (pymc.Wald) or HalfNormal (pymc.HalfNormal). Defaults to Wald with mu = 20 and lambda = 50; to change mu and lambda, pass the ls_mu and ls_lambda parameters. For HalfNormal, the scale parameter defaults to sigma = 50; change by passing ls_sigma.

ls_min: float, optional

Minimum value for the lengthscale hyperparameter of the pymc.gp.cov.ExpQuad covariance kernel; shifts the lengthscale prior by ls_min. Defaults to 0.

ls_mu: float, optional

Mean (mu) of the pymc.gp.cov.ExpQuad lengthscale prior if ls_dist = `Wald`. Defaults to 20.

ls_lambda: float, optional

Relative precision (lam) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = `Wald`. Defaults to 50.

ls_sigma: float, optional

Scale parameter (sigma) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = `HalfNormal`. Defaults to 50.

var_sigma: float, optional

Scale parameter (sigma) of the covariance kernel variance hyperparameter prior, which is a pymc.HalfNormal distribution. Defaults to 10.

gp_mean_mu: float, optional

Mean (mu) of the GP mean function prior, which is a pymc.Normal distribution. Defaults to 0.

gp_mean_sigma: float, optional

Standard deviation (sigma) of the GP mean function prior, which is a pymc.Normal distribution. Defaults to 5.

seed: int, optional

Random seed used to generate signals.

Returns:
signal: numpy.ndarray(float)

Array with shape (number of ages, number of signals) containing the num_signals synthetic signals drawn from the prior.
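The priors described above can be combined into a plain NumPy sketch of drawing Gaussian process signals, which may help in building intuition for how the hyperparameters shape the output. This is not the library's PyMC-based implementation. The helper draw_gp_prior_signals is hypothetical, only the Wald lengthscale option is shown, the HalfNormal draw is applied directly as a multiplicative variance (an assumption), and the jitter value is chosen here for numerical stability.

```python
import numpy as np


def draw_gp_prior_signals(ages, num_signals=100, ls_min=0.0, ls_mu=20.0,
                          ls_lambda=50.0, var_sigma=10.0, gp_mean_mu=0.0,
                          gp_mean_sigma=5.0, jitter=1e-6, seed=None):
    """Draw signals from a GP prior with an exponentiated quadratic kernel.

    Hypothetical helper; returns an array with shape (len(ages), num_signals).
    """
    rng = np.random.default_rng(seed)
    ages = np.asarray(ages, dtype=float)
    sq_dist = (ages[:, None] - ages[None, :]) ** 2
    signals = np.empty((ages.size, num_signals))
    for i in range(num_signals):
        # Lengthscale: Wald (inverse Gaussian) prior shifted by ls_min.
        lengthscale = ls_min + rng.wald(ls_mu, ls_lambda)
        # HalfNormal draw, used here as the kernel variance (assumption).
        variance = np.abs(rng.normal(0.0, var_sigma))
        # Constant mean function with a Normal prior.
        mean = rng.normal(gp_mean_mu, gp_mean_sigma)
        cov = variance * np.exp(-0.5 * sq_dist / lengthscale**2)
        # Small diagonal jitter keeps the Cholesky factorization stable.
        cov[np.diag_indices_from(cov)] += jitter
        chol = np.linalg.cholesky(cov)
        signals[:, i] = mean + chol @ rng.standard_normal(ages.size)
    return signals


ages = np.linspace(0, 100, 50)
signals = draw_gp_prior_signals(ages, num_signals=10, seed=0)
```

Each column is one signal evaluated at every age, so signals[:, 0] can be plotted directly against ages.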

stratmc.synthetics.synthetic_signal_to_df(proxy_vec, heights, section_ages, section_names, ages, age_std, age_heights, age_section_names, proxies=['d13c'])

Helper function for generating artificial sample and age data using synthetic_sections().

Parameters:
proxy_vec: np.array(float) or dict{np.array(float)}

Array of proxy observations. Pass as a dictionary if more than one proxy.

heights: np.array(float)

Array of heights corresponding to proxy observations in proxy_vec.

section_ages: np.array(float)

Array of ages corresponding to proxy observations in proxy_vec.

section_names: np.array(str)

Array of section names corresponding to proxy observations in proxy_vec.

ages: np.array(float)

Array of age constraints.

age_std: np.array(float)

Array of uncertainties for each age constraint in ages.

age_heights: np.array(float)

Array of heights for each age constraint in ages.

age_section_names: np.array(str)

Array of section names corresponding to age constraints in ages.

proxies: str or list(str), optional

Name(s) of proxies. Defaults to d13c.

Returns:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for synthetic sections.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for synthetic sections.