Loading and processing data

Functions for importing and processing proxy and age constraint data.

load_data

Import and pre-process proxy data and age constraints from .csv files formatted according to the Data table formatting guidelines.

combine_data

Helper function for merging pandas.DataFrame objects containing proxy observations or age constraints.

load_object

Custom load command for pickle (.pkl) object (variables can be saved as .pkl files with save_object()).

load_trace

Custom load command for NetCDF file containing a trace (arviz.InferenceData object saved with save_trace()).

save_object

Save variable as a pickle (.pkl) object.

save_trace

Save trace (arviz.InferenceData object) as a NetCDF file.

combine_traces

Helper function for combining multiple arviz.InferenceData objects (saved as NetCDF files) that contain prior and posterior samples for the same inference model (sampled with get_trace() in stratmc.inference).

drop_chains

Remove a subset of chains from an arviz.InferenceData object.

thin_trace

Remove a subset of draws from an arviz.InferenceData object.

accumulation_rate

Calculate apparent sediment accumulation rate between successive samples (if method = 'successive') or every possible sample pairing (method = 'all').

clean_data

Helper function for cleaning sample data before running an inversion.

depth_to_height

Helper function for converting depth in core to height in section.

combine_duplicates

Helper function for combining multiple proxy measurements from the same stratigraphic horizon.

stratmc.data.accumulation_rate(full_trace, sample_df, ages_df, method='all', age_model='posterior', include_age_constraints=True, **kwargs)

Calculate apparent sediment accumulation rate between successive samples (if method = 'successive') or every possible sample pairing (method = 'all').

Note that if method = 'all', rate is returned in mm/year, and duration is returned in years. If method = 'successive', rate is returned in m/Myr, and duration is returned in Myr. Input data are assumed to have units of meters and millions of years. Used as input to sadler_plot() and accumulation_rate_stratigraphy() in stratmc.plotting.

Parameters:
full_trace: arviz.InferenceData

An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference.

sample_df: pandas.DataFrame

pandas.DataFrame containing all proxy data.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints from all sections.

method: str, optional

Whether to calculate accumulation rates between every possible sample pairing ('all'), or between successive samples ('successive'); defaults to 'all'.

age_model: str, optional

Whether to calculate accumulation rates using the posterior or prior age model for each section ('posterior' or 'prior'); defaults to 'posterior'.

include_age_constraints: bool, optional

Whether to include radiometric age constraints in accumulation rate calculations; defaults to True.

sections: list(str) or numpy.array(str), optional

List of sections to include. Defaults to all sections in sample_df.

Returns:
rate_df: pandas.DataFrame

pandas.DataFrame containing sediment accumulation rates and associated durations.
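
As a rough illustration of the two methods and their units, the sketch below computes apparent accumulation rates from a single set of sample heights (m) and ages (Ma). It is not the stratmc implementation: the helper names successive_rates and all_pair_rates are hypothetical, the inputs are plain lists rather than an arviz.InferenceData object, and only one age draw is handled rather than the full prior or posterior.

```python
from itertools import combinations


def successive_rates(heights_m, ages_myr):
    """Rates between stratigraphically adjacent samples: (m/Myr, Myr)."""
    # Sort by height so each sample is paired with the next one up-section.
    pairs = sorted(zip(heights_m, ages_myr))
    out = []
    for (h0, a0), (h1, a1) in zip(pairs, pairs[1:]):
        duration_myr = a0 - a1  # ages get younger up-section
        # Note: two samples with identical ages would divide by zero here.
        out.append(((h1 - h0) / duration_myr, duration_myr))
    return out


def all_pair_rates(heights_m, ages_myr):
    """Rates for every sample pairing: (mm/year, years)."""
    pairs = sorted(zip(heights_m, ages_myr))
    out = []
    for (h0, a0), (h1, a1) in combinations(pairs, 2):
        duration_yr = (a0 - a1) * 1e6   # Myr -> years
        thickness_mm = (h1 - h0) * 1e3  # m -> mm
        out.append((thickness_mm / duration_yr, duration_yr))
    return out


# Illustrative values only (heights in m, ages in Ma).
heights = [0.0, 10.0, 30.0]
ages = [540.0, 539.0, 538.5]

succ = successive_rates(heights, ages)
every = all_pair_rates(heights, ages)
```

Because 1 m/Myr equals 0.001 mm/year, the lowest pair of samples gives 10 m/Myr from the first helper and 0.01 mm/year from the second.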

stratmc.data.clean_data(sample_df, ages_df, proxies, sections)

Helper function for cleaning sample data before running an inversion. Sets Exclude? to True for samples with no relevant proxy observations, removes sections where all samples have been excluded, and drops excluded age constraints.

Parameters:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for all sections.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for all sections.

proxies: str or list(str)

Tracers to include in the inference.

sections: list(str) or numpy.array(str)

List of sections to include in the inference (as named in sample_df and ages_df).

Returns:
sample_df: pandas.DataFrame

pandas.DataFrame containing cleaned proxy data for all sections.

ages_df: pandas.DataFrame

pandas.DataFrame containing cleaned age constraint data for all sections.
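
The three cleaning steps described above can be sketched on plain Python records. This is a minimal sketch under stated assumptions, not the library's code: rows are dictionaries instead of pandas.DataFrame rows, the helper names clean_samples and clean_ages are hypothetical, and the "age" key is illustrative. The "section" and "Exclude?" names follow the columns mentioned in this page.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing observations."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_samples(samples, proxies, sections):
    """Flag samples with no requested proxy; drop fully excluded sections."""
    kept = []
    for row in samples:
        if row["section"] not in sections:
            continue
        row = dict(row)  # copy so the caller's records are left untouched
        if all(is_missing(row.get(p)) for p in proxies):
            row["Exclude?"] = True
        kept.append(row)
    # Sections that still contain at least one non-excluded sample.
    active = {r["section"] for r in kept if not r["Exclude?"]}
    return [r for r in kept if r["section"] in active]


def clean_ages(ages, sections):
    """Drop age constraints that are excluded or outside the chosen sections."""
    return [dict(r) for r in ages
            if r["section"] in sections and not r["Exclude?"]]


# Illustrative records only.
samples = [
    {"section": "A", "height": 1.0, "d13c": 2.1, "Exclude?": False},
    {"section": "A", "height": 2.0, "d13c": float("nan"), "Exclude?": False},
    {"section": "B", "height": 1.0, "d13c": None, "Exclude?": False},
]
ages = [
    {"section": "A", "height": 0.0, "age": 540.0, "Exclude?": False},
    {"section": "A", "height": 3.0, "age": 530.0, "Exclude?": True},
]

cleaned_samples = clean_samples(samples, ["d13c"], ["A", "B"])
cleaned_ages = clean_ages(ages, ["A", "B"])
```

In this example section B disappears because its only sample has no d13c value, while the second sample of section A is kept but flagged as excluded.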

stratmc.data.combine_data(dataframes)

Helper function for merging pandas.DataFrame objects containing proxy observations or age constraints. Data are merged using the section and height columns.

Parameters:
dataframes: list(pandas.DataFrame)

List of pandas.DataFrame objects to merge.

Returns:
merged_data: pandas.DataFrame

pandas.DataFrame containing merged data.

stratmc.data.combine_duplicates(sample_df, proxies, proxy_sigma_default=0.1)

Helper function for combining multiple proxy measurements from the same stratigraphic horizon. For each horizon with multiple proxy values, replaces the proxy value with the mean, and replaces the standard deviation with the combined uncertainty (proxy_std values summed in quadrature) for all measurements. The standard deviation of the population of proxy values for each horizon is stored in the proxy_population_std column of sample_df (in build_model(), the uncertainty of each proxy observation is modeled as the proxy_std and proxy_population_std values summed in quadrature).

Parameters:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for all sections.

proxies: list(str)

List of proxies to include in the inference.

proxy_sigma_default: float or dict{float}, optional

Measurement uncertainty (\(1\sigma\)) to use for proxy observations if not specified in proxy_std column of sample_df. To set a different value for each proxy, pass a dictionary with proxy names as keys. Defaults to 0.1.

Returns:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data with duplicates combined.
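
The arithmetic described for a single horizon can be written out as follows. This is a sketch of the stated formulas only: the helper names combine_horizon and total_sigma are hypothetical, and the use of the population (rather than sample) standard deviation for proxy_population_std is an assumption that this page does not confirm.

```python
import math
import statistics


def combine_horizon(values, sigmas):
    """Combine repeated measurements from one stratigraphic horizon."""
    mean_value = statistics.fmean(values)
    # proxy_std values summed in quadrature, as described in the text.
    combined_sigma = math.sqrt(sum(s ** 2 for s in sigmas))
    # Spread of the measurements themselves (assumed: population std).
    population_sigma = statistics.pstdev(values)
    return mean_value, combined_sigma, population_sigma


def total_sigma(proxy_std, proxy_population_std):
    """Uncertainty used for an observation: the two terms in quadrature."""
    return math.hypot(proxy_std, proxy_population_std)


# Two illustrative measurements from the same horizon.
value, sigma, pop_sigma = combine_horizon([1.0, 3.0], [0.3, 0.4])
model_sigma = total_sigma(sigma, pop_sigma)
```

Here the two measurements collapse to a value of 2.0 with a combined measurement uncertainty of 0.5, and the scatter between them (1.0) is carried separately before both terms are summed in quadrature.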

stratmc.data.combine_traces(trace_list)

Helper function for combining multiple arviz.InferenceData objects (saved as NetCDF files) that contain prior and posterior samples for the same inference model (sampled with get_trace() in stratmc.inference). The arviz.InferenceData objects are concatenated along the chain dimension such that if two traces with 8 chains each are concatenated, the new combined trace will have 16 chains.

Parameters:
trace_list: list(str)

List of paths to arviz.InferenceData objects (saved as NetCDF files) to be merged.

Returns:
combined_trace: arviz.InferenceData

New arviz.InferenceData object containing the prior and posterior draws for all traces in trace_list.
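
To make the chain arithmetic concrete, the sketch below concatenates two toy traces along the chain dimension. It uses nested lists shaped [chain][draw] instead of arviz.InferenceData objects, and the helper name concat_chains is hypothetical. For real InferenceData objects, ArviZ provides arviz.concat with dim="chain"; whether combine_traces() uses it internally is not stated here.

```python
def concat_chains(traces):
    """Concatenate traces shaped [chain][draw] along the chain dimension."""
    n_draws = {len(chain) for trace in traces for chain in trace}
    if len(n_draws) > 1:
        raise ValueError("all chains must have the same number of draws")
    # Chains from later traces are appended after those from earlier traces.
    return [list(chain) for trace in traces for chain in trace]


# Two toy traces with 8 chains of 5 draws each (values are placeholders).
trace_a = [[0.1 * d for d in range(5)] for _ in range(8)]
trace_b = [[0.2 * d for d in range(5)] for _ in range(8)]

combined = concat_chains([trace_a, trace_b])
n_chains = len(combined)
```

The number of draws per chain is unchanged; only the chain count grows, from 8 per input trace to 16 in the combined result.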

stratmc.data.depth_to_height(sample_df, ages_df)

Helper function for converting depth in core to height in section.

Parameters:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for all sections.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for all sections.

Returns:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for all sections, with depth in core converted to height in section.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for all sections, with depth in core converted to height in section.
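
One common way to convert depth in core to height in section is to measure upward from the deepest recorded point. The sketch below shows that convention for a single section. The choice of datum (the maximum depth across both samples and age constraints) is an assumption, the helper name depths_to_heights is hypothetical, and the reference level used by depth_to_height() may differ.

```python
def depths_to_heights(sample_depths, age_depths):
    """Convert depths (m, increasing downward) to heights (m, increasing upward)."""
    # Assumed datum: the deepest point in either table becomes height 0.
    base = max(list(sample_depths) + list(age_depths))
    sample_heights = [base - d for d in sample_depths]
    age_heights = [base - d for d in age_depths]
    return sample_heights, age_heights


# Illustrative depths for one core (m below the core top).
sample_heights, age_heights = depths_to_heights([0.0, 5.0, 20.0], [25.0])
```

Sharing one datum between the two tables keeps samples and age constraints in the same stratigraphic order relative to each other after the conversion.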

stratmc.data.drop_chains(full_trace, chains)

Remove a subset of chains from an arviz.InferenceData object.

Parameters:
full_trace: arviz.InferenceData

An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference.

chains: list(int) or numpy.array(int)

Indices of chains to remove from full_trace.

Returns:
full_trace_clean: arviz.InferenceData

Copy of full_trace without the chains specified in chains.

stratmc.data.load_data(sample_file, ages_file, proxies=['d13c'], proxy_sigma_default=0.1, drop_excluded_samples=False, drop_excluded_ages=True)

Import and pre-process proxy data and age constraints from .csv files formatted according to the Data table formatting guidelines. To combine data from different .csv files, load each file separately and then combine the DataFrames with combine_data().

If sample_file.csv includes multiple proxy observations from the same stratigraphic horizon (for a given proxy), then all measurements marked Exclude? = False will be combined using combine_duplicates().

Parameters:
sample_file: str

Path to .csv file containing proxy data for all sections (without the '.csv' extension).

ages_file: str

Path to .csv file containing age constraints for all sections (without the '.csv' extension).

proxies: str or list(str), optional

Tracer names (must match column headers in sample_file.csv); defaults to 'd13c'.

proxy_sigma_default: float or dict{float}, optional

Measurement uncertainty (\(1\sigma\)) to use for proxy observations if not specified in proxy_std column of sample_df. To set a different value for each proxy, pass a dictionary with proxy names as keys. Defaults to 0.1.

drop_excluded_samples: bool, optional

Whether to remove samples with Exclude? = True from sample_df; defaults to False. If excluded samples are not dropped, their ages will be passively tracked within the inference model (but they will not be considered during the proxy signal reconstruction).

drop_excluded_ages: bool, optional

Whether to remove ages with Exclude? = True from ages_df; defaults to True.

Returns:
sample_df: pandas.DataFrame

pandas.DataFrame containing proxy data for all sections.

ages_df: pandas.DataFrame

pandas.DataFrame containing age constraints for all sections.

stratmc.data.load_object(path)

Custom load command for pickle (.pkl) object (variables can be saved as .pkl files with save_object()).

Parameters:
path: str

Path to saved .pkl file (without the '.pkl' extension).

Returns:
var:

Variable saved in path.

stratmc.data.load_trace(path)

Custom load command for NetCDF file containing a trace (arviz.InferenceData object saved with save_trace()).

Parameters:
path: str

Path to saved NetCDF file (without the '.nc' extension).

Returns:
trace: arviz.InferenceData

Trace saved as NetCDF file.

stratmc.data.save_object(var, path)

Save variable as a pickle (.pkl) object.

Parameters:
var:

Variable to be saved.

path: str

Location (including the file name, without the '.pkl' extension) to save var.
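
The path convention shared by save_object() and load_object() (a file name given without its extension) can be illustrated with the standard-library pickle module. The helpers save_pickle and load_pickle below are hypothetical stand-ins written for this example; they assume the '.pkl' suffix is appended internally, which matches the parameter descriptions but is not taken from the stratmc source.

```python
import os
import pickle
import tempfile


def save_pickle(var, path):
    """Write var to '<path>.pkl'; path is given without the extension."""
    with open(path + ".pkl", "wb") as f:
        pickle.dump(var, f)


def load_pickle(path):
    """Read the variable stored in '<path>.pkl'."""
    with open(path + ".pkl", "rb") as f:
        return pickle.load(f)


# Round trip in a temporary directory (file name is a placeholder).
settings = {"proxies": ["d13c"], "sections": ["A", "B"]}
with tempfile.TemporaryDirectory() as tmp:
    stem = os.path.join(tmp, "example_settings")
    save_pickle(settings, stem)
    wrote_pkl = os.path.exists(stem + ".pkl")
    restored = load_pickle(stem)
```

Passing the same extension-free path to both helpers is what makes the round trip work; including '.pkl' in the argument would produce a doubled suffix under this convention.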

stratmc.data.save_trace(trace, path)

Save trace (arviz.InferenceData object) as a NetCDF file.

Parameters:
trace: arviz.InferenceData

An arviz.InferenceData object containing the full set of prior and posterior samples from build_model() in stratmc.model (the output of get_trace() in stratmc.inference).

path: str

Location (including the file name, without the '.nc' extension) to save trace.

stratmc.data.thin_trace(full_trace, drop_freq=2)

Remove a subset of draws from an arviz.InferenceData object. Only applies to groups associated with the posterior (the prior draws will not be affected).

Parameters:
full_trace: arviz.InferenceData

An arviz.InferenceData object containing the full set of prior and posterior samples from get_trace() in stratmc.inference.

drop_freq: int, optional

Frequency of draw removal. For example, 2 will remove every other draw, while 4 will remove every fourth draw.

Returns:
thinned_trace: arviz.InferenceData

Thinned version of full_trace.
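
The drop_freq description can be read as follows: counting draws from 1, every drop_freq-th draw is removed and the rest are kept. The sketch below applies that reading to a plain list of draw indices. Both the helper name thin_draws and the choice of which draws fall on the removal positions are assumptions; thin_trace() may select a different offset.

```python
def thin_draws(draws, drop_freq=2):
    """Remove every drop_freq-th draw (counting from 1) and keep the rest."""
    return [d for i, d in enumerate(draws, start=1) if i % drop_freq != 0]


# Eight placeholder draw indices.
draws = list(range(8))

every_other_removed = thin_draws(draws, drop_freq=2)
every_fourth_removed = thin_draws(draws, drop_freq=4)
```

Under this reading a larger drop_freq removes fewer draws: drop_freq=2 keeps half of them, while drop_freq=4 keeps three quarters. That is the opposite of the common thinning convention in which only every n-th draw is retained.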