Tools for generating synthetic data
Functions for creating synthetic proxy signals/stratigraphic observations and evaluating model performance for synthetic tests.
- make_excursion(): Function for generating a synthetic proxy signal that contains a number of user-specified excursions.
- synthetic_sections(): Function for generating synthetic proxy observations and age constraints using a predefined proxy signal.
- synthetic_observations_from_prior(): Given age constraints for a set of stratigraphic sections in ages_df, generate synthetic proxy observations by sampling the model prior.
- synthetic_signal_from_prior(): Draws synthetic signals from the model prior, and returns the signal conditioned over the points in ages.
- quantify_signal_recovery(): Calculates the likelihood of the true proxy signal (for synthetic tests, where the true signal is known) conditioned on the posterior (default) or prior proxy signal inference.
- sample_age_recovery(): Calculates the likelihood of the true sample ages (for synthetic tests, where the true age of each sample is known) given draws from the posterior (default) or prior.
- sample_age_residuals(): Calculates the residual (for each draw) between the true age and the posterior (default) or prior age of each sample.
- synthetic_signal_to_df(): Helper function for generating artificial sample and age data using synthetic_sections().
- stratmc.synthetics.make_excursion(time, amplitude, baseline=0, rising_time=None, rate_offset=True, excursion_duration=None, min_duration=1, smooth=False, smoothing_factor=10, seed=None)
Function for generating a synthetic proxy signal that contains a number of user-specified excursions.
- Parameters:
- time: numpy.array(float)
Time vector over which to generate proxy signal.
- amplitude: float, list(float), or numpy.array(float)
Amplitude of excursion; pass a list or array to generate multiple excursions.
- baseline: float, optional
Baseline proxy value. Defaults to 0.
- rising_time: float, list(float), or numpy.array(float), optional
Fraction of excursion duration spent on the rising limb (linear increase/decrease toward peak). Must be between 0 and 1. If not provided, randomly generated if rate_offset is True and set to 0.5 if rate_offset is False. Pass a list to specify different rising times for each excursion.
- rate_offset: bool, optional
If False, rising and falling limbs of excursion have equal duration. If True, the fraction of the excursion duration spent on the rising limb is set by rising_time. Defaults to True.
- excursion_duration: float, list(float), or numpy.array(float), optional
Duration of excursion; pass a list or array to generate multiple excursions. Random if not provided.
- min_duration: float, optional
Minimum excursion duration if excursion_duration is not provided. Defaults to 1.
- smooth: bool, optional
Whether to smooth excursion peaks. Defaults to False.
- smoothing_factor: float, optional
Smoothing factor if smooth is True; higher values produce smoother signals. Defaults to 10.
- seed: int, optional
Random seed used to generate signal.
- Returns:
- interp_proxy: np.array
Tracer signal interpolated to points in the time vector.
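To make the roles of amplitude, baseline, rising_time, and excursion_duration concrete, here is a minimal NumPy sketch of a single piecewise-linear excursion. It does not call stratmc and is not the library's implementation; the helper name triangular_excursion and its start argument are hypothetical and exist only for this illustration.

```python
import numpy as np


def triangular_excursion(time, amplitude, start, duration,
                         rising_time=0.5, baseline=0.0):
    """Piecewise-linear excursion: baseline -> peak -> baseline (illustrative only)."""
    if not 0 < rising_time < 1:
        raise ValueError("rising_time must be between 0 and 1")
    peak = start + rising_time * duration  # time at which the peak is reached
    end = start + duration                 # time at which the signal is back at baseline
    # np.interp holds the end values outside [start, end], so the signal stays
    # at the baseline before and after the excursion.
    return np.interp(time, [start, peak, end],
                     [baseline, baseline + amplitude, baseline])


time = np.linspace(0, 100, 201)
# A negative excursion that spends 25% of its duration on the rising limb.
signal = triangular_excursion(time, amplitude=-5.0, start=20, duration=40,
                              rising_time=0.25)
```

With rising_time=0.25 the peak falls a quarter of the way through the excursion, which gives the asymmetric shape that rate_offset=True is described as allowing.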
- stratmc.synthetics.quantify_signal_recovery(full_trace, true_signal, proxy='d13c', mode='posterior')
Calculates the likelihood of the true proxy signal (for synthetic tests, where the true signal is known) conditioned on the posterior (default) or prior proxy signal inference. The likelihood is evaluated at each age (the posterior signal and the true signal must be evaluated at the same ages). Provides a measure of signal recovery.
- Parameters:
- full_trace: arviz.InferenceData or list(arviz.InferenceData)
An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference. If passed as a list, the posterior draws for all traces will be combined when calculating posterior_likelihood.
- true_signal: np.array
True values for the proxy signal, evaluated at the same ages as the posterior signal in full_trace.
- proxy: str, optional
Tracer signal to evaluate. Defaults to ‘d13c’.
- mode: str, optional
Whether to use the posterior or prior to calculate signal recovery. Defaults to ‘posterior’.
- Returns:
- posterior_likelihood: np.array
Array of posterior likelihoods (evaluated at each age).
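The idea of scoring the true signal against the inferred signal at each age can be sketched without stratmc. The example below approximates the posterior at each age with a normal distribution fitted to the draws. That approximation is an assumption made for illustration, and the density estimate used by quantify_signal_recovery() may differ. The draws array is simulated rather than taken from a real trace.

```python
import numpy as np

rng = np.random.default_rng(0)

ages = np.linspace(0, 50, 26)
true_signal = np.sin(ages / 8.0)  # known "true" proxy signal

# Simulated posterior draws with shape (number of draws, number of ages),
# scattered around the true signal.
draws = true_signal + rng.normal(0.0, 0.3, size=(2000, ages.size))

# Normal approximation to the posterior at each age (illustrative assumption).
mu = draws.mean(axis=0)
sd = draws.std(axis=0, ddof=1)

# Density of the true value under that approximation, one value per age.
likelihood = (np.exp(-0.5 * ((true_signal - mu) / sd) ** 2)
              / (sd * np.sqrt(2.0 * np.pi)))
```

Higher values indicate ages at which the inferred signal places more probability density on the true value.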
- stratmc.synthetics.sample_age_recovery(full_trace, sample_df, sections=None, mode='posterior')
Calculates the likelihood of the true sample ages (for synthetic tests, where the true age of each sample is known) given draws from the posterior (default) or prior. Provides a measure of age model recovery.
- Parameters:
- full_trace: arviz.InferenceData or list(arviz.InferenceData)
An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference. If passed as a list, the posterior draws for all traces will be combined when calculating posterior_likelihood.
- sample_df: pandas.DataFrame
pandas.DataFrame containing proxy data for synthetic sections.
- sections: list(str) or numpy.array(str), optional
List of sections to evaluate. Defaults to all sections in sample_df.
- mode: str, optional
Whether to use the posterior or prior age models. Defaults to ‘posterior’.
- Returns:
- posterior_likelihood: dict{float} or np.array(float)
Posterior likelihoods for the true age of each sample. Returned as an array if only one section is evaluated, or a dictionary of arrays if multiple sections are evaluated.
- stratmc.synthetics.sample_age_residuals(full_trace, sample_df, sections=None, mode='posterior')
Calculates the residual (for each draw) between the true age and the posterior (default) or prior age of each sample.
- Parameters:
- full_trace: arviz.InferenceData or list(arviz.InferenceData)
An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference. If passed as a list, the posterior draws for all traces will be combined when calculating age_residuals.
- sample_df: pandas.DataFrame
pandas.DataFrame containing proxy data for synthetic sections.
- sections: list(str) or numpy.array(str), optional
List of sections to evaluate. Defaults to all sections in sample_df.
- mode: str, optional
Whether to use the posterior or prior age models. Defaults to ‘posterior’.
- Returns:
- age_residuals: np.array or dict{np.array}
Sample age residuals; shape is (number of samples, number of posterior draws). Returned as an array if only one section is evaluated, or a dictionary of arrays if multiple sections are evaluated.
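As a rough illustration of the (number of samples, number of posterior draws) layout described above, the following NumPy sketch computes per-draw residuals for one section from simulated age draws. It does not use stratmc. The variable names and the sign convention (draw minus true age) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

true_ages = np.array([100.0, 102.5, 105.0, 110.0])  # one true age per sample

# Simulated age draws with shape (number of samples, number of draws).
age_draws = true_ages[:, None] + rng.normal(0.0, 1.0, size=(true_ages.size, 500))

# Residual for every sample and every draw; the sign convention
# (draw minus truth) is an assumption made for this sketch.
age_residuals = age_draws - true_ages[:, None]

# Per-sample summaries: median residual and a central 95% interval.
median_residual = np.median(age_residuals, axis=1)
lower, upper = np.percentile(age_residuals, [2.5, 97.5], axis=1)
```

Summaries such as the median and interval per sample are one convenient way to check whether an age model is biased or merely uncertain.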
- stratmc.synthetics.synthetic_observations_from_prior(age_vector, ages_df, sample_heights=None, uniform_heights=False, samples_per_section=20, proxies=['d13c'], proxy_std=0.1, seed=None, ls_dist='Wald', ls_min=0, ls_mu=20, ls_lambda=50, ls_sigma=50, var_sigma=10, white_noise_sigma=0.1, gp_mean_mu=0, gp_mean_sigma=10, approximate=False, hsgp_m=15, hsgp_c=1.3, offset_type='section', offset_prior='Laplace', offset_alpha=0, offset_beta=1, offset_sigma=1, offset_mu=0, offset_b=2, noise_type='section', noise_prior='HalfCauchy', noise_beta=1, noise_sigma=1, noise_nu=1, jitter=0.001, **kwargs)
Given age constraints for a set of stratigraphic sections in ages_df, generate synthetic proxy observations by sampling the model prior. Accepts all arguments that can be passed to build_model() in stratmc.model.
- Parameters:
- age_vector: np.array(float)
Vector of ages at which to evaluate synthetic proxy signal(s).
- ages_df: pandas.DataFrame
pandas.DataFrame containing age constraints for synthetic sections.
- sample_heights: dict{list(float) or numpy.array(float)}, optional
Sample heights for each stratigraphic section in ages_df; must be a dictionary with section names as keys. Defaults to None, which results in either uniformly spaced or randomly spaced sample heights (depending on the uniform_heights argument).
- uniform_heights: bool, optional
Whether to generate uniformly spaced (set to True) or randomly spaced (set to False) sample heights if a dictionary of sample_heights is not provided. Defaults to False (randomly spaced samples).
- samples_per_section: int or dict(int), optional
Number of samples per section to generate if sample_heights is not provided; either an integer (if the same for all sections) or a dictionary with section names as keys. Defaults to 20.
- proxies: list(str), optional
List of proxies to generate synthetic observations for. Defaults to d13c.
- proxy_std: float or dict(float), optional
Measurement uncertainty for each proxy; pass a dictionary of floats with the elements of proxies as keys to use a different value for each proxy, or a single float to use the same value for all proxies. Defaults to 0.1.
- seed: int, optional
Seed to use while generating synthetic observations.
- Returns:
- signals: dict(float)
Tracer signals drawn from the model prior (evaluated at the points in age_vector) used to generate synthetic observations; dictionary keys are the elements of proxies.
- sample_df: pandas.DataFrame
pandas.DataFrame containing proxy data for synthetic stratigraphic sections.
- prior: arviz.InferenceData
An arviz.InferenceData object containing the prior draw from the model used to generate synthetic observations.
- model: pymc.Model
pymc.model.core.Model object used to generate synthetic observations.
- stratmc.synthetics.synthetic_sections(true_time, true_proxy, num_sections, num_samples, max_section_thickness, proxies=['d13c'], noise=False, noise_amp=0.1, min_constraints=2, max_constraints=3, seed=None, **kwargs)
Function for generating synthetic proxy observations and age constraints using a predefined proxy signal.
- Parameters:
- true_time: numpy.array(float)
True time vector for input signal.
- true_proxy: numpy.array(float) or dict{numpy.array(float)}
True proxy vector for input signal. If generating synthetic data for multiple proxies, pass as a dictionary with proxy names as keys.
- num_sections: int
Number of synthetic sections to generate.
- num_samples: int
Number of samples per synthetic section.
- max_section_thickness: float
Maximum thickness of synthetic sections.
- proxies: str or list(str), optional
Column name(s) for synthetic proxy observations in sample_df. Defaults to ‘d13c’.
- noise: bool, optional
Whether to add white noise to proxy observations. Defaults to False.
- noise_amp: float or dict{float}, optional
Amplitude of white noise added to proxy observations (if noise is True). To specify a different noise amplitude for each proxy, pass as a dictionary with proxy names as keys. Defaults to 0.1.
- min_constraints: int, optional
Minimum number of age constraints per synthetic section (must be at least 2). Defaults to 2.
- max_constraints: int, optional
Maximum number of age constraints per synthetic section. Defaults to 3.
- seed: int, optional
Random seed used to generate synthetic sections.
- Returns:
- sample_df: pandas.DataFrame
pandas.DataFrame containing proxy data for synthetic sections.
- ages_df: pandas.DataFrame
pandas.DataFrame containing age constraints for synthetic sections.
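The core step of sampling a predefined signal at the ages of a synthetic section, with optional white noise, can be sketched in plain NumPy as follows. This is a simplified illustration, not the logic of synthetic_sections(). The age interval, the stand-in signal, and the convention that larger ages are older are all assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(42)

true_time = np.linspace(0, 100, 501)
true_proxy = 2.0 * np.sin(true_time / 10.0)  # stand-in for a known proxy signal

num_samples = 20
noise, noise_amp = True, 0.1

# Age interval spanned by one hypothetical section (assumes larger = older).
base_age, top_age = 70.0, 30.0

# Draw sample ages inside the interval and order them from base (oldest)
# to top (youngest), i.e. in stratigraphic order.
sample_ages = np.sort(rng.uniform(top_age, base_age, num_samples))[::-1]

# Read the true signal at the sample ages, optionally adding white noise.
observations = np.interp(sample_ages, true_time, true_proxy)
if noise:
    observations = observations + rng.normal(0.0, noise_amp, num_samples)
```

Because the true age of every sample is retained, data built this way can be compared directly against inferred ages in a synthetic test.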
- stratmc.synthetics.synthetic_signal_from_prior(ages, num_signals=100, ls_dist='Wald', ls_min=0, ls_mu=20, ls_lambda=50, ls_sigma=50, var_sigma=10, gp_mean_mu=0, gp_mean_sigma=5, seed=None)
Draws synthetic signals from the model prior, and returns the signal conditioned over the points in ages. To generate both signals and synthetic stratigraphic sections, instead use synthetic_observations_from_prior().
- Parameters:
- ages: numpy.array(float)
Array of ages over which to condition the signal.
- num_signals: int, optional
Number of signals to draw from prior. Defaults to 100.
- ls_dist: str, optional
Prior distribution for the lengthscale hyperparameter of the exponentiated quadratic covariance kernel (pymc.gp.cov.ExpQuad); set to Wald (pymc.Wald) or HalfNormal (pymc.HalfNormal). Defaults to Wald with mu = 20 and lambda = 50; to change mu and lambda, pass the ls_mu and ls_lambda parameters. For HalfNormal, the scale parameter defaults to sigma = 50; change by passing ls_sigma.
- ls_min: float, optional
Minimum value for the lengthscale hyperparameter of the pymc.gp.cov.ExpQuad covariance kernel; shifts the lengthscale prior by ls_min. Defaults to 0.
- ls_mu: float, optional
Mean (mu) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = 'Wald'. Defaults to 20.
- ls_lambda: float, optional
Relative precision (lam) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = 'Wald'. Defaults to 50.
- ls_sigma: float, optional
Scale parameter (sigma) of the pymc.gp.cov.ExpQuad lengthscale hyperparameter prior if ls_dist = 'HalfNormal'. Defaults to 50.
- var_sigma: float, optional
Scale parameter (sigma) of the covariance kernel variance hyperparameter prior, which is a pymc.HalfNormal distribution. Defaults to 10.
- gp_mean_mu: float, optional
Mean (mu) of the GP mean function prior, which is a pymc.Normal distribution. Defaults to 0.
- gp_mean_sigma: float, optional
Standard deviation (sigma) of the GP mean function prior, which is a pymc.Normal distribution. Defaults to 5.
- seed: int, optional
Random seed used to generate signals.
- Returns:
- signal: numpy.ndarray(float)
Array with shape ages x number of signals containing the n = num_signals synthetic signals drawn from the prior.
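For intuition about how the hyperparameter priors above shape the drawn signals, the sketch below draws a few Gaussian process prior samples with an exponentiated quadratic kernel using NumPy alone. It is not the PyMC model that stratmc builds. The way the variance hyperparameter scales the kernel, the jitter value, and the use of a constant mean are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

ages = np.linspace(0, 100, 60)
num_signals = 5
ls_min, ls_mu, ls_lambda = 0.0, 20.0, 50.0
var_sigma, gp_mean_mu, gp_mean_sigma = 10.0, 0.0, 5.0
jitter = 1e-6  # small diagonal term for numerical stability (assumed value)

sq_dist = (ages[:, None] - ages[None, :]) ** 2
signal = np.empty((ages.size, num_signals))

for i in range(num_signals):
    # Hyperparameter draws: shifted Wald lengthscale, half-normal variance,
    # and a normally distributed constant mean.
    ls = ls_min + rng.wald(ls_mu, ls_lambda)
    var = abs(rng.normal(0.0, var_sigma))
    mean = rng.normal(gp_mean_mu, gp_mean_sigma)

    # Exponentiated quadratic covariance with jitter on the diagonal.
    cov = var * np.exp(-0.5 * sq_dist / ls**2) + jitter * np.eye(ages.size)

    # One signal per column, evaluated at every point in ages.
    signal[:, i] = rng.multivariate_normal(np.full(ages.size, mean), cov)
```

Larger lengthscale draws give smoother columns, while larger variance draws give signals that wander further from their mean.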
- stratmc.synthetics.synthetic_signal_to_df(proxy_vec, heights, section_ages, section_names, ages, age_std, age_heights, age_section_names, proxies=['d13c'])
Helper function for generating artificial sample and age data using synthetic_sections().
- Parameters:
- proxy_vec: np.array(float) or dict{np.array(float)}
Array of proxy observations. Pass as a dictionary if more than one proxy.
- heights: np.array(float)
Array of heights corresponding to proxy observations in proxy_vec.
- section_ages: np.array(float)
Array of ages corresponding to proxy observations in proxy_vec.
- section_names: np.array(str)
Array of section names corresponding to proxy observations in proxy_vec.
- ages: np.array(float)
Array of age constraints.
- age_std: np.array(float)
Array of uncertainties for each age constraint in ages.
- age_heights: np.array(float)
Array of heights for each age constraint in ages.
- age_section_names: np.array(str)
Array of section names corresponding to age constraints in ages.
- proxies: str or list(str), optional
Name(s) of proxies. Defaults to d13c.
- Returns:
- sample_df: pandas.DataFrame
pandas.DataFrame containing proxy data for synthetic sections.
- ages_df: pandas.DataFrame
pandas.DataFrame containing age constraints for synthetic sections.