mocca.dad_data package

Submodules

mocca.dad_data.models module

Created on Tue Aug 3 13:16:51 2021

@author: haascp

class mocca.dad_data.models.CompoundData(hplc_system_tag: str, experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput'], wl_high_pass: dataclasses.InitVar[float] = None, wl_low_pass: dataclasses.InitVar[float] = None)[source]

Bases: DadData

Data container for HPLC-DAD data with peaks originating from compounds.

data: ndarray

experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput']

hplc_system_tag: str

path: str

time: ndarray

warnings: List[str]

wavelength: ndarray

class mocca.dad_data.models.DadData(hplc_system_tag: str, experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput'], wl_high_pass: dataclasses.InitVar[float] = None, wl_low_pass: dataclasses.InitVar[float] = None)[source]

Bases: object

Base class for HPLC-DAD data.

data: ndarray

experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput']

hplc_system_tag: str

path: str

time: ndarray

warnings: List[str]

wavelength: ndarray

wl_high_pass: dataclasses.InitVar[float] = None

wl_low_pass: dataclasses.InitVar[float] = None
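
The InitVar annotations in the signatures above mark init-only arguments: a dataclass passes them to __post_init__ instead of storing them as regular fields. The following is a minimal sketch of that pattern only. MiniDadData, experiment_path, the placeholder wavelength axis, and the reading of wl_high_pass / wl_low_pass as inclusive wavelength bounds are all assumptions for illustration and are not taken from mocca.

```python
from dataclasses import dataclass, field, InitVar
from typing import List, Optional


@dataclass
class MiniDadData:
    """Hypothetical stand-in, not the mocca DadData class."""
    hplc_system_tag: str
    # Init-only arguments: handed to __post_init__, not stored as fields.
    experiment_path: InitVar[str]
    wl_high_pass: InitVar[Optional[float]] = None
    wl_low_pass: InitVar[Optional[float]] = None
    # Regular fields that are populated in __post_init__.
    path: str = field(init=False)
    wavelength: List[float] = field(init=False)
    warnings: List[str] = field(init=False, default_factory=list)

    def __post_init__(self, experiment_path, wl_high_pass, wl_low_pass):
        self.path = experiment_path
        # Placeholder wavelength axis; a real loader would read it from file.
        wavelengths = [200.0, 220.0, 240.0, 260.0]
        # Assumed semantics: keep wavelengths inside the given bounds.
        if wl_high_pass is not None:
            wavelengths = [w for w in wavelengths if w >= wl_high_pass]
        if wl_low_pass is not None:
            wavelengths = [w for w in wavelengths if w <= wl_low_pass]
        self.wavelength = wavelengths


d = MiniDadData("system_a", "run_01.csv", wl_high_pass=215.0)
```

After construction, d.path and d.wavelength are ordinary attributes, while experiment_path is not stored on the instance.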

class mocca.dad_data.models.GradientData(hplc_system_tag: str, experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput'], wl_high_pass: dataclasses.InitVar[float] = None, wl_low_pass: dataclasses.InitVar[float] = None)[source]

Bases: DadData

Data container for gradient HPLC-DAD data.

original_data: ndarray

class mocca.dad_data.models.ParafacData(impure_peak: dataclasses.InitVar['mocca.peak.models.CorrectedPeak'], parafac_comp_tensor: dataclasses.InitVar[tuple], boundaries: dataclasses.InitVar[tuple], shift: dataclasses.InitVar[int], y_offset: dataclasses.InitVar[float])[source]

Bases: object

Data container for synthetic data generated from PARAFAC models.

boundaries: dataclasses.InitVar[tuple]

impure_peak: dataclasses.InitVar['mocca.peak.models.CorrectedPeak']

parafac_comp_tensor: dataclasses.InitVar[tuple]

shift: dataclasses.InitVar[int]

y_offset: dataclasses.InitVar[float]

mocca.dad_data.process_funcs module

mocca.dad_data.process_funcs.get_peak_locs(summed_data)[source]

Finds all peaks of data.

Parameters: summed_data (numpy.ndarray) – A 1D array representing the absorbances over time. Works best on data in which values below the threshold have already been zeroed (see function filter_absorbance_by_threshold).
Returns: peaks – List of all peaks as BasePeak objects
Return type: list
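
As an illustration of peak location on thresholded data, the sketch below treats every contiguous run of nonzero values as one peak and reports its maximum, left, and right indices. This is an assumption about one possible approach, not the mocca implementation; find_nonzero_runs is a hypothetical name, and plain tuples stand in for BasePeak objects.

```python
import numpy as np


def find_nonzero_runs(summed_data):
    """Return (maximum, left, right) index tuples, one per run of nonzero values."""
    summed_data = np.asarray(summed_data, dtype=float)
    nonzero = summed_data > 0
    # Pad with False so that runs touching the array edges are detected too.
    padded = np.concatenate(([False], nonzero, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)       # first index of each run
    ends = np.flatnonzero(changes == -1) - 1    # last index of each run
    peaks = []
    for left, right in zip(starts, ends):
        maximum = left + int(np.argmax(summed_data[left:right + 1]))
        peaks.append((int(maximum), int(left), int(right)))
    return peaks


signal = np.array([0, 0, 1, 3, 2, 0, 0, 4, 6, 5, 0])
peaks = find_nonzero_runs(signal)
```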

mocca.dad_data.process_funcs.merge_peaks(summed_data, peaks)[source]

Merges overlapping peaks in the data.

Parameters:

summed_data (numpy.ndarray) – A 1D array representing the absorbances over time. Works best on data in which values below the threshold have already been zeroed (see function filter_absorbance_by_threshold).
peaks (list) – List of all peaks as BasePeak objects

Returns:

new_peaks – List of all peaks in dictionary format with keys maximum, left, and right. Peaks that overlap are merged together into one BasePeak.

Return type:

list
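
The merging step can be pictured as a standard interval merge over the left and right indices. The sketch below is a hedged illustration under that assumption: merge_overlapping is a hypothetical helper, peaks are plain dictionaries with the keys maximum, left, and right, and the merged peak keeps whichever maximum has the larger summed absorbance. The mocca function may decide these details differently.

```python
def merge_overlapping(summed_data, peaks):
    """Merge peaks whose [left, right] index ranges overlap."""
    merged = []
    for peak in sorted(peaks, key=lambda p: p["left"]):
        if merged and peak["left"] <= merged[-1]["right"]:
            last = merged[-1]
            last["right"] = max(last["right"], peak["right"])
            # Keep the maximum with the larger summed absorbance.
            if summed_data[peak["maximum"]] > summed_data[last["maximum"]]:
                last["maximum"] = peak["maximum"]
        else:
            merged.append(dict(peak))  # copy so the input is not modified
    return merged


summed = [0, 1, 4, 2, 5, 1, 0, 0, 3, 0]
peaks = [
    {"maximum": 2, "left": 1, "right": 3},
    {"maximum": 4, "left": 3, "right": 5},
    {"maximum": 8, "left": 8, "right": 8},
]
merged = merge_overlapping(summed, peaks)
```

Here the first two peaks share index 3 and are combined into one peak spanning indices 1 to 5, while the third peak is left unchanged.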

mocca.dad_data.process_funcs.pick_peaks(compound_data, experiment, absorbance_threshold, peaks_high_pass, peaks_low_pass)[source]

Finds all peaks of data and returns them as a chromatogram.

Parameters:

data (numpy.ndarray) – Actual experimental data with shape [# of wavelengths] x [timepoints]. Generated from dataframe with absorbance_to_array function
absorbance_threshold (float) – The threshold below which peaks will not be counted. In other words, at least one (wavelength, timepoint) must have an absorbance greater than absorbance_threshold for a peak to be counted.
peaks_high_pass (float) – High-pass filter on retention time: only peaks with a retention time greater than this value are used for data analysis
peaks_low_pass (float) – Low-pass filter on retention time: only peaks with a retention time lower than this value are used for data analysis
expand_peaks (boolean) – If True, then peaks will be expanded to their peak boundaries. If this is set to False, then only timepoints with cumulative absorbance greater than absorbance_threshold will be counted as part of the peak.

Returns:

peaks – List of all peaks, as a list of tuples (maximum, left, right)

Return type:

list
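
The two retention-time filters can be sketched as a simple selection over already-picked peaks. The helper below is hypothetical (filter_peaks_by_time is not a mocca function) and assumes that a peak's retention time is the time value at its maximum index and that both bounds are strict, as the parameter descriptions suggest.

```python
import numpy as np


def filter_peaks_by_time(peaks, time, peaks_high_pass=None, peaks_low_pass=None):
    """Keep (maximum, left, right) peaks whose retention time lies within the bounds."""
    kept = []
    for maximum, left, right in peaks:
        retention_time = time[maximum]
        # Drop peaks at or below the high-pass limit.
        if peaks_high_pass is not None and retention_time <= peaks_high_pass:
            continue
        # Drop peaks at or above the low-pass limit.
        if peaks_low_pass is not None and retention_time >= peaks_low_pass:
            continue
        kept.append((maximum, left, right))
    return kept


time = np.linspace(0.0, 10.0, 11)  # 0, 1, ..., 10 (illustrative time axis)
peaks = [(1, 0, 2), (5, 4, 6), (9, 8, 10)]
kept = filter_peaks_by_time(peaks, time, peaks_high_pass=2.0, peaks_low_pass=8.0)
```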

mocca.dad_data.process_gradientdata module

Created on Wed Aug 4 15:44:47 2021

@author: haascp

mocca.dad_data.process_gradientdata.bsl_als(absorbance_array)[source]

Applies the ALS baseline algorithm row-wise (for every wavelength) to an absorbance array.

Parameters: absorbance_array (numpy 2D-array) – Absorbance values obtained by an HPLC run (time, wavelength dimension).
Returns: baseline_array – Baseline absorbance values
Return type: numpy 2D-array

mocca.dad_data.process_gradientdata.bsl_als_alg(y, lam=100000.0, p=0.01, niter=3)[source]

Baseline correction algorithm: optimized Python implementation of “Asymmetric Least Squares Smoothing” by P. Eilers and H. Boelens (2005), taken from the answer by Rustam Guliev at https://stackoverflow.com/questions/29156532/python-baseline-correction-library.

Parameters:

y (list) – List of absorbance values for which the baseline should be determined.
lam (numeric, optional) – Smoothness parameter. Typically 10^2 ≤ λ ≤ 10^9, but exceptions may occur. In any case one should vary λ on a grid that is approximately linear in log λ. Often visual inspection is sufficient to find good values. The default is 1e5.
p (numeric, optional) – Asymmetry parameter. 0.001 ≤ p ≤ 0.1 (for a signal with positive peaks), but exceptions may occur. Often visual inspection is sufficient to get good parameter values. The default is 0.01.
niter (integer, optional) – Number of iterations. The original documentation fixes it at 10 to emphasize the basic simplicity of the algorithm. In practical applications one should check whether the weights still change; if not, convergence has been attained. The default is 3.

Returns:

z – Simulated baseline of the given absorbance data.

Return type:

list
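
To make the roles of lam, p, and niter concrete, here is a minimal dense-matrix sketch of asymmetric least squares smoothing. It is not the optimized sparse implementation referenced above: als_baseline is a hypothetical name, the dense solve is only practical for short signals, and niter=10 is chosen here for the example rather than the documented default of 3.

```python
import numpy as np


def als_baseline(y, lam=1e5, p=0.01, niter=10):
    """Estimate a baseline with asymmetric least squares (dense sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n, n - 2).
    d = np.diff(np.eye(n), 2)
    penalty = lam * d @ d.T
    w = np.ones(n)
    for _ in range(niter):
        # Solve (W + lam * D D^T) z = W y for the current weights.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline get weight p, points below get 1 - p.
        w = p * (y > z) + (1 - p) * (y < z)
    return z


# Synthetic signal: flat baseline at 1.0 with one positive peak.
x = np.arange(200)
y = 1.0 + 5.0 * np.exp(-((x - 100) / 5.0) ** 2)
z = als_baseline(y)
corrected = y - z
```

Because p is small, points above the current estimate barely influence the fit, so the estimated baseline stays close to the flat part of the signal and passes underneath the peak.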

mocca.dad_data.utils module

Created on Fri Dec 10 13:31:37 2021

@author: haascp

mocca.dad_data.utils.absorbance_to_array(df)[source]: Generates a 2D absorbance array of the absorbance values.

mocca.dad_data.utils.apply_filter(dataframe, wl_high_pass, wl_low_pass, bandwidth=2, reference_wl=True)[source]: Filters absorbance data of tidy 3D DAD dataframes to remove noise and background systematic error.

mocca.dad_data.utils.df_to_array(df)[source]: Takes a tidy dataframe of HPLC-DAD data and returns a numpy array of absorbance values as well as a vector for the time domain and a vector for the wavelength domain.

mocca.dad_data.utils.get_reference_signal(dataframe, bandwidth=5)[source]: Returns the signal averaged over the last wavelengths, the number of which is given by bandwidth.

mocca.dad_data.utils.sum_absorbance_by_time(data)[source]

Sums the absorbances for each time point over all wavelengths.

Parameters: data (numpy.ndarray) – Actual experimental data with shape [# of wavelengths] x [timepoints], generated from a dataframe with the absorbance_to_array function
Returns: A 1D array containing the sum of absorbances over all wavelengths at each time point
Return type: numpy.ndarray
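
With the documented [# of wavelengths] x [timepoints] layout, summing over all wavelengths corresponds to a reduction along the first axis. A small sketch with made-up values, not taken from the mocca source:

```python
import numpy as np

# Two wavelengths (rows) measured at three time points (columns).
data = np.array([
    [0.1, 0.5, 0.2],
    [0.0, 1.0, 0.3],
])

# Collapse the wavelength axis, leaving one value per time point.
summed = data.sum(axis=0)
```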

mocca.dad_data.utils.trim_data(data, time, length)[source]: Trims the 2D DADData in the time dimension to the length provided.
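
Trimming along the time dimension amounts to slicing both the data array and the time vector to the same number of points. The sketch below assumes that time is the last axis of data, as in the [# of wavelengths] x [timepoints] layout described above. trim_to_length is a hypothetical helper and not the mocca function.

```python
import numpy as np


def trim_to_length(data, time, length):
    """Cut the time axis (assumed to be the last axis of data) to `length` points."""
    return data[:, :length], time[:length]


data = np.arange(12).reshape(3, 4)       # 3 wavelengths x 4 time points
time = np.array([0.0, 0.5, 1.0, 1.5])
trimmed_data, trimmed_time = trim_to_length(data, time, 2)
```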
