mocca.dad_data package

Subpackages

Submodules

mocca.dad_data.models module

Created on Tue Aug 3 13:16:51 2021

@author: haascp

class mocca.dad_data.models.CompoundData(hplc_system_tag: str, experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput'], wl_high_pass: dataclasses.InitVar[float] = None, wl_low_pass: dataclasses.InitVar[float] = None)[source]

Bases: DadData

Data container for HPLC-DAD data with peaks originating from compounds.

data: ndarray
experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput']
hplc_system_tag: str
path: str
time: ndarray
warnings: List[str]
wavelength: ndarray
class mocca.dad_data.models.DadData(hplc_system_tag: str, experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput'], wl_high_pass: dataclasses.InitVar[float] = None, wl_low_pass: dataclasses.InitVar[float] = None)[source]

Bases: object

Base class for HPLC-DAD data.

data: ndarray
experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput']
hplc_system_tag: str
path: str
time: ndarray
warnings: List[str]
wavelength: ndarray
wl_high_pass: dataclasses.InitVar[float] = None
wl_low_pass: dataclasses.InitVar[float] = None
class mocca.dad_data.models.GradientData(hplc_system_tag: str, experiment: dataclasses.InitVar['mocca.user_interaction.user_objects.HplcInput'], wl_high_pass: dataclasses.InitVar[float] = None, wl_low_pass: dataclasses.InitVar[float] = None)[source]

Bases: DadData

Data container for gradient HPLC-DAD data.

original_data: ndarray
class mocca.dad_data.models.ParafacData(impure_peak: dataclasses.InitVar['mocca.peak.models.CorrectedPeak'], parafac_comp_tensor: dataclasses.InitVar[tuple], boundaries: dataclasses.InitVar[tuple], shift: dataclasses.InitVar[int], y_offset: dataclasses.InitVar[float])[source]

Bases: object

Data container for synthetic data generated from PARAFAC models.

boundaries: dataclasses.InitVar[tuple]
impure_peak: dataclasses.InitVar['mocca.peak.models.CorrectedPeak']
parafac_comp_tensor: dataclasses.InitVar[tuple]
shift: dataclasses.InitVar[int]
y_offset: dataclasses.InitVar[float]

mocca.dad_data.process_funcs module

mocca.dad_data.process_funcs.get_peak_locs(summed_data)[source]

Finds all peaks of data.

Parameters:

summed_data (numpy.ndarray) – A 1D array representing the absorbances over time. Best used on data that already had data below threshold zeroed (see function filter_absorbance_by_threshold).

Returns:

peaks – List of all peaks as BasePeak objects

Return type:

list
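The idea of locating peaks in thresholded, summed data can be illustrated with a minimal NumPy sketch. This is not mocca's implementation: the helper name find_peak_regions is hypothetical, it returns plain (left, right, maximum) index tuples instead of BasePeak objects, and it assumes that values below the threshold have already been set to zero.

```python
import numpy as np

def find_peak_regions(summed_data):
    """Return (left, right, maximum) index triples for every contiguous
    run of positive values in a 1D array (hypothetical helper)."""
    summed_data = np.asarray(summed_data, dtype=float)
    nonzero = summed_data > 0
    # Pad with False so that runs touching either end are still detected.
    padded = np.concatenate(([False], nonzero, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.where(edges == 1)[0]       # first index of each run
    ends = np.where(edges == -1)[0] - 1    # last index of each run
    regions = []
    for left, right in zip(starts, ends):
        maximum = int(left) + int(np.argmax(summed_data[left:right + 1]))
        regions.append((int(left), int(right), maximum))
    return regions

signal = np.array([0, 0, 1, 3, 2, 0, 0, 4, 5, 0])
regions = find_peak_regions(signal)
print(regions)  # [(2, 4, 3), (7, 8, 8)]
```

Each zero-separated run is treated as one peak region, which is why zeroing sub-threshold data first matters: without it, the whole trace would form a single run.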

mocca.dad_data.process_funcs.merge_peaks(summed_data, peaks)[source]

Merges overlapping peaks in the data.

Parameters:
  • summed_data (numpy.ndarray) – A 1D array representing the absorbances over time. Best used on data that already had data below threshold zeroed (see function filter_absorbance_by_threshold).

  • peaks (list) – List of all peaks as BasePeak objects

Returns:

new_peaks – List of all peaks in dictionary format with keys maximum, left, and right. Peaks that overlap are merged together into one BasePeak.

Return type:

list
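Merging overlapping peaks is essentially an interval-merging problem. The following sketch shows the general technique on plain (left, right) index tuples; the function name merge_overlapping is hypothetical, and the actual merge_peaks works on BasePeak objects and also receives summed_data, which this simplified version ignores.

```python
def merge_overlapping(peaks):
    """Merge (left, right) index intervals that overlap or share a border."""
    merged = []
    for left, right in sorted(peaks):
        if merged and left <= merged[-1][1]:
            # Overlaps the previous interval: extend its right border.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged

merged = merge_overlapping([(10, 20), (18, 30), (40, 50)])
print(merged)  # [(10, 30), (40, 50)]
```

Sorting by the left border first means each interval only needs to be compared with the most recently merged one.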

mocca.dad_data.process_funcs.pick_peaks(compound_data, experiment, absorbance_threshold, peaks_high_pass, peaks_low_pass)[source]

Finds all peaks in the data and returns them as a chromatogram.

Parameters:
  • data (numpy.ndarray) – Actual experimental data with shape [# of wavelengths] x [timepoints]. Generated from dataframe with absorbance_to_array function

  • absorbance_threshold (float) – The threshold below which peaks are not counted. In other words, at least one (wavelength, timepoint) must have an absorbance greater than absorbance_threshold for a signal to be counted as a peak.

  • peaks_high_pass (float) – Time high-pass filter. Only peaks with a retention time greater than this value are used for data analysis.

  • peaks_low_pass (float) – Time low-pass filter. Only peaks with a retention time lower than this value are used for data analysis.

  • expand_peaks (boolean) – If True, then peaks will be expanded to their peak boundaries. If this is set to False, then only timepoints with cumulative absorbance greater than absorbance_threshold will be counted as part of the peak.

Returns:

peaks – List of all peaks, as a list of tuples (maximum, left, right)

Return type:

list
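The effect of the two retention-time filters can be sketched as follows. This is an illustration under stated assumptions, not the library's code: filter_peaks_by_time is a hypothetical helper, peaks are represented as (maximum, left, right) index tuples as in the return description above, and times is assumed to be the time vector of the run.

```python
def filter_peaks_by_time(peaks, times, peaks_high_pass=None, peaks_low_pass=None):
    """Keep peaks whose retention time (time at the maximum) lies strictly
    between peaks_high_pass and peaks_low_pass; None disables a bound."""
    kept = []
    for maximum, left, right in peaks:
        retention_time = times[maximum]
        if peaks_high_pass is not None and retention_time <= peaks_high_pass:
            continue  # elutes too early
        if peaks_low_pass is not None and retention_time >= peaks_low_pass:
            continue  # elutes too late
        kept.append((maximum, left, right))
    return kept

times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
peaks = [(1, 0, 2), (3, 2, 4), (5, 4, 5)]
kept = filter_peaks_by_time(peaks, times, peaks_high_pass=0.6, peaks_low_pass=2.0)
print(kept)  # [(3, 2, 4)]
```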

mocca.dad_data.process_gradientdata module

Created on Wed Aug 4 15:44:47 2021

@author: haascp

mocca.dad_data.process_gradientdata.bsl_als(absorbance_array)[source]

Applies the ALS baseline algorithm row-wise (for every wavelength) to an absorbance array.

Parameters:

absorbance_array (numpy 2D-array) – Absorbance values obtained by an HPLC run (time, wavelength dimension).

Returns:

baseline_array – Baseline absorbance values

Return type:

numpy 2D-array

mocca.dad_data.process_gradientdata.bsl_als_alg(y, lam=100000.0, p=0.01, niter=3)[source]

Baseline correction algorithm: Optimized Python implementation of “Asymmetric Least Squares Smoothing” by P. Eilers and H. Boelens in 2005: https://stackoverflow.com/questions/29156532/python-baseline-correction-library, answer by Rustam Guliev.

Parameters:
  • y (list) – List of absorbance values for which the baseline should be determined.

  • lam (numeric, optional) – Smoothness parameter. 10^2 ≤ λ ≤ 10^9, but exceptions may occur. In any case one should vary λ on a grid that is approximately linear for log λ. Often visual inspection is sufficient for good values. The default is 1e5.

  • p (numeric, optional) – Asymmetry parameter. 0.001 ≤ p ≤ 0.1 (for a signal with positive peaks), but exceptions may occur. Often visual inspection is sufficient to get good parameter values. The default is 0.01.

  • niter (integer, optional) – To emphasize the basic simplicity of the algorithm, the number of iterations has been fixed to 10 (original documentation). In practical applications one should check whether the weights show any change; if not, convergence has been attained. The default is 3.

Returns:

z – Simulated baseline of the given absorbance data.

Return type:

list
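For readers unfamiliar with asymmetric least squares smoothing, a compact dense NumPy sketch of the technique is given here. It is not the optimized implementation referenced above (which is usually written with sparse matrices); the function name als_baseline is hypothetical, the defaults simply mirror the signature of bsl_als_alg, and the dense solve is only reasonable for short signals.

```python
import numpy as np

def als_baseline(y, lam=1e5, p=0.01, niter=3):
    """Asymmetric least squares baseline (dense illustrative version)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference matrix D with shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(niter):
        # Solve the penalized weighted least squares problem
        # (W + lam * D^T D) z = W y for the current weights.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline (peaks) get weight p,
        # points on or below it get weight 1 - p.
        w = np.where(y > z, p, 1.0 - p)
    return z

# Synthetic chromatogram: linear drift plus one Gaussian peak.
x = np.linspace(0.0, 1.0, 200)
y = 1.0 + 2.0 * x + 5.0 * np.exp(-((x - 0.5) / 0.02) ** 2)
z = als_baseline(y)
corrected = y - z  # baseline-corrected signal
```

Because peak points receive the small weight p, the fitted curve is pulled toward the lower envelope of the signal, so subtracting it removes the drift while largely preserving the peak.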

mocca.dad_data.utils module

Created on Fri Dec 10 13:31:37 2021

@author: haascp

mocca.dad_data.utils.absorbance_to_array(df)[source]

Generates a 2D absorbance array of the absorbance values.

mocca.dad_data.utils.apply_filter(dataframe, wl_high_pass, wl_low_pass, bandwidth=2, reference_wl=True)[source]

Filters absorbance data of tidy 3D DAD dataframes to remove noise and background systematic error.

mocca.dad_data.utils.df_to_array(df)[source]

Takes a tidy dataframe of HPLC-DAD data and returns a numpy array of absorbance values as well as a vector for the time domain and a vector for the wavelength domain.

mocca.dad_data.utils.get_reference_signal(dataframe, bandwidth=5)[source]

Returns the averaged signal over the last number of wavelengths as given by the bandwidth.

mocca.dad_data.utils.sum_absorbance_by_time(data)[source]

Sums the absorbances for each time point over all wavelengths.

Parameters:

data (numpy.ndarray) – Actual experimental data with shape [# of wavelengths] x [timepoints]. Generated from dataframe with absorbance_to_array function

Returns:

A 1D array containing the sum of absorbances over all wavelengths at each time point

Return type:

numpy.ndarray
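With the documented layout of [# of wavelengths] x [timepoints], this summation amounts to a reduction over the first axis. A minimal NumPy illustration with made-up values (not a copy of the library code):

```python
import numpy as np

# 2 wavelengths x 3 time points, matching the documented data layout.
data = np.array([[1.0, 2.0, 3.0],
                 [0.5, 0.5, 0.5]])

# Summing over axis 0 collapses the wavelength dimension and leaves
# one value per time point.
summed = data.sum(axis=0)
print(summed)  # [1.5 2.5 3.5]
```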

mocca.dad_data.utils.trim_data(data, time, length)[source]

Trims the 2D DADData in the time dimension to the length provided.

Module contents