Functions

This main module contains all the functions available in spectrapepper. Please use the search function to look up specific functionalities and keywords.

functions.alsbaseline(y, lam=100, p=0.001, niter=10, remove=True)[source]

Calculates the baseline using Asymmetric Least Squares Smoothing. Depending on remove, it returns either the data with the baseline subtracted or the baseline itself. The algorithm was originally proposed by P. Eilers and H. Boelens (2005).

Parameters:
  • y (list[float]) – Spectra to calculate the baseline from.

  • lam (int) – Lambda, smoothness. The default is 100.

  • p (float) – Asymmetry. The default is 0.001.

  • niter (int) – Number of iterations. The default is 10.

  • remove (bool) – If True, calculates and returns data - baseline. If False, then it returns the baseline.

Returns:

The data with the baseline removed if remove is True; otherwise the calculated baseline.

Return type:

list[float]
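
The iteration behind asymmetric least squares can be sketched as follows. This is a minimal illustration of the Eilers and Boelens formulation, not spectrapepper's own code: the helper names are hypothetical, and a dense pure-Python solver is used only to keep the example self-contained (practical implementations use sparse matrices).

```python
def second_difference_gram(n):
    """Dense D^T D, where D is the (n-2) x n second-difference operator."""
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n - 2):
        stencil = ((i, 1.0), (i + 1, -2.0), (i + 2, 1.0))
        for a, va in stencil:
            for b, vb in stencil:
                gram[a][b] += va * vb
    return gram


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def als_baseline_sketch(y, lam=100, p=0.001, niter=10):
    """Hypothetical helper: minimize sum(w*(y-z)^2) + lam*sum((second diff of z)^2)."""
    n = len(y)
    gram = second_difference_gram(n)
    w = [1.0] * n
    z = list(y)
    for _ in range(niter):
        # Normal equations: (W + lam * D^T D) z = W y
        a = [[lam * gram[i][j] + (w[i] if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        z = solve_linear(a, [w[i] * y[i] for i in range(n)])
        # Points above the baseline get the small weight p, the rest 1 - p
        w = [p if y[i] > z[i] else 1.0 - p for i in range(n)]
    return z


# Sloped background with a three-point peak on top
y = [0.1 * i for i in range(30)]
y[14] += 2.0
y[15] += 5.0
y[16] += 2.0
baseline = als_baseline_sketch(y)
corrected = [yi - bi for yi, bi in zip(y, baseline)]
```

Larger lam values give a stiffer baseline, while smaller p values push the baseline further below the peaks.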

functions.areacalculator(y, x=None, limits=None, norm=False)[source]

Area calculator using the data (x_data) and the limits in position, not values.

Parameters:
  • y (list[float]) – Data to calculate area from.

  • x (list[float]) – X axis of the data.

  • limits (list[int]) – Limits that define the areas to be calculated. Axis value.

  • norm (bool) – If True, normalizes the area to the sum under the whole curve.

Returns:

A list of areas according to the requested limits.

Return type:

list[float]
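
As an illustration of the idea, the sketch below integrates with the trapezoidal rule. It assumes that limits is a list of [lower, upper] pairs given in axis values, which the description above does not fully settle; the function names are hypothetical and not part of spectrapepper.

```python
def area_between(y, x, lower, upper):
    """Trapezoidal area of y over the axis interval [lower, upper]."""
    total = 0.0
    for i in range(len(x) - 1):
        # Only segments that lie completely inside the interval are counted
        if x[i] >= lower and x[i + 1] <= upper:
            total += 0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
    return total


def areas_sketch(y, x, limits, norm=False):
    """Hypothetical helper: one area per [lower, upper] pair (assumed format)."""
    areas = [area_between(y, x, lo, hi) for lo, hi in limits]
    if norm:
        whole = area_between(y, x, x[0], x[-1])
        areas = [a / whole for a in areas]
    return areas


x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 4]
print(areas_sketch(y, x, [[1, 3]]))             # [4.0]
print(areas_sketch(y, x, [[1, 3]], norm=True))  # [0.5]
```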

functions.asymmetry(y, x, peak, s=5, limit=10)[source]

Compares both sides of a peak, or list of peaks, and checks how similar they are. It does this by calculating the MRSE and indicating which side is larger or smaller by area. If it is a negative (-), then left side is smaller.

Parameters:
  • y (list) – Spectroscopic data to calculate the asymmetry from. Single vector or list of vectors.

  • x (list) – Axis of the data. If None, then the axis will be 0..N, where N is the length of the spectrum or spectra.

  • peak (float, list[float]) – Approximate axis value of the position of the peak. A float for a single peak; a list of floats if several peaks are required.

  • s (int) – Shift to sides to check real peak. The default is 5.

  • limit (int) – Comparison limits to each side of the peak. Default is 10.

Returns:

R2 value comparing both sides of the peak, with a sign that tells whether the left side is smaller (negative) or the right side is smaller (positive, no sign included).

Return type:

list[float]

functions.autocorrelation(y, x, lag, lims=None, normalization=False, average=False)[source]

Calculates the correlation of a signal with a delayed copy of itself as a function of the lag. In other words, it calculates the Pearson coefficient for a vector with itself shifted by lag positions.

Parameters:
  • y (list[float]) – Data to fit. Single or list of vectors.

  • x (list[float]) – x axis.

  • lag (int) – Positions shifted to be analyzed, not in x-axis values.

  • lims (list[float, float]) – Limits of the data to autocorrelate. Default is None, which analyzes the whole vector ([0:len(y)]).

  • normalization (boolean) – divides the autocorrelation value by the product of the standard deviation of the series in t and t+lag.

  • average (boolean) – divides the autocorrelation value by the length of the signal.

Returns:

autocorrelation values, either in a list or single value.

Return type:

float, list[float]
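
The Pearson-based reading of the description above can be sketched in plain Python. This is an illustration only: the function names are hypothetical, and the lims, normalization and average options of the library function are not reproduced.

```python
import math


def pearson_coefficient(a, b):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    dev_a = math.sqrt(sum((p - mean_a) ** 2 for p in a))
    dev_b = math.sqrt(sum((q - mean_b) ** 2 for q in b))
    return cov / (dev_a * dev_b)


def lagged_correlation_sketch(y, lag):
    """Hypothetical helper: correlate y with itself shifted by `lag` positions."""
    if lag == 0:
        return pearson_coefficient(y, y)
    return pearson_coefficient(y[:-lag], y[lag:])


# A signal with a period of 4 positions
signal = [0.0, 1.0, 0.0, -1.0] * 5
in_phase = lagged_correlation_sketch(signal, 4)   # close to +1
anti_phase = lagged_correlation_sketch(signal, 2)  # close to -1
```

A lag equal to the period gives a coefficient near +1, and a lag of half a period gives a coefficient near -1.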

functions.avg(y)[source]

Calculates the average vector from a list of vectors.

Parameters:

y (list[float]) – List of vectors.

Returns:

The average of the vectors in the list.

Return type:

list[float]

functions.bincombs(n, s_min=1, s_max=0)[source]

Returns all possible unique combinations.

Parameters:
  • n (int) – Amount of positions.

  • s_min (int) – Minimum amount of 1s.

  • s_max (int) – Maximum amount of 1s.

Returns:

List of unique combinations.

Return type:

list[tuple]
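
A minimal sketch of the enumeration, assuming each combination is a tuple of 0s and 1s of length n. The function name is hypothetical, and the meaning of the library default s_max=0 is not covered here, so both bounds are passed explicitly.

```python
from itertools import product


def binary_combinations_sketch(n, s_min, s_max):
    """Hypothetical helper: all 0/1 tuples of length n with s_min..s_max ones."""
    return [c for c in product((0, 1), repeat=n) if s_min <= sum(c) <= s_max]


combos = binary_combinations_sketch(3, 1, 2)
print(len(combos))  # 6
```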

functions.bspbaseline(y, x, points, avg=5, remove=True, plot=False)[source]

Calculates the baseline using a B-spline.

Parameters:
  • y (list[float]) – Single or several spectra to remove the baseline from.

  • x (list[float]) – x axis of the data, to interpolate the baseline function.

  • points (list[float], list[[float, float]]) – Axis values of points to calculate the B-spline. Axis ranges are also accepted. In this case, the avg value will be 0.

  • avg (int) – Points to each side to make average. Default is 5. If points are axis ranges, then it is set to 0 and will not have any effect.

  • remove (bool) – If True, calculates and returns data - baseline. If False, then it returns the baseline.

  • plot (bool) – If True, plots the result. Default is False.

Returns:

The baseline.

Return type:

list[float]

functions.classify(data, gnumber=3, glimits=[], var='x')[source]

Classifies targets according to either defined limits or number of groups. The latter depends on the defined parameters.

Parameters:
  • data (list[float]) – Vector with target values to be classified.

  • gnumber (int) – Number of groups to create. The default is 3 and is the default technique.

  • glimits (list[float]) – Defined group limits. The default is [].

  • var (str) – Name of the variable that is being classified. The default is ‘x’.

Returns:

Vector with the classification from 0 to N. A list with strings with the name of the groups, useful for plotting.

Return type:

list[list,list]

functions.cmscore(x_points, y_points, target)[source]

Calculates the distance between points and the center of mass (CM) of clusters and sets the prediction to the closest CM. This score may be higher or lower than the algorithm score.

Parameters:
  • x_points (list[float]) – Coordinates of x-axis.

  • y_points (list[float]) – Coordinates of y-axis.

  • target (list[int]) – Targets of each point.

Returns:

Score by comparing CM distances. Prediction using CM distances. X-axis coordinates of the CMs. Y-axis coordinates of the CMs.

Return type:

list[float, list[int],list[float],list[float]]
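
The nearest-center rule described above can be sketched as follows. The function name is hypothetical, every point is assumed to have the same mass (so the CM is the mean coordinate), and the score is taken as the fraction of points whose nearest CM matches their target.

```python
def centroid_score_sketch(x_points, y_points, target):
    """Hypothetical helper: predict each point as the group with the closest CM."""
    groups = sorted(set(target))
    cm_x, cm_y = [], []
    for g in groups:
        xs = [x for x, t in zip(x_points, target) if t == g]
        ys = [y for y, t in zip(y_points, target) if t == g]
        cm_x.append(sum(xs) / len(xs))
        cm_y.append(sum(ys) / len(ys))
    prediction = []
    for x, y in zip(x_points, y_points):
        # Squared Euclidean distance to every center of mass
        dist = [(x - a) ** 2 + (y - b) ** 2 for a, b in zip(cm_x, cm_y)]
        prediction.append(groups[dist.index(min(dist))])
    hits = sum(p == t for p, t in zip(prediction, target))
    return hits / len(target), prediction, cm_x, cm_y


# Two clusters; the point (1, 1) is labeled 1 although it sits in cluster 0
xs = [0, 1, 0, 1, 10, 11, 10, 11]
ys = [0, 0, 1, 1, 10, 10, 11, 11]
targets = [0, 0, 0, 1, 1, 1, 1, 1]
score, prediction, cm_x, cm_y = centroid_score_sketch(xs, ys, targets)
print(score)       # 0.875
print(prediction)  # [0, 0, 0, 0, 1, 1, 1, 1]
```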

functions.confusionmatrix(tt, tp, gn=None, plot=False, save=False, title='', ndw=True, cmm='Blues', fontsize=20, figsize=(12, 15), ylabel='True', xlabel='Prediction', filename='cm.png', rotation=(45, 0))[source]

Calculates and/or plots the confusion matrix for machine learning algorithm results.

Parameters:
  • tt (list[float]) – Real targets.

  • tp (list[float]) – Predicted targets.

  • plot (boolean) – If True, plots the matrix. The default is False.

  • gn (list[str]) – Names, or labels, of the classification groups. Default is None.

  • ndw (bool) – No data warning. If True, it warns about not having data to evaluate. Default is True.

  • title (str) – Name of the matrix. Default is empty.

  • cmm (str) – Name of the colormap (matplotlib) to use for the plot. Default is ‘Blues’.

  • fontsize (int) – Font size for the labels. The default is 20.

  • figsize (Tuple) – Size of the image. The default is (12, 15).

  • ylabel (str) – Label for y axis. Default is ‘True’.

  • xlabel (str) – Label for x axis. Default is ‘Prediction’.

Returns:

The confusion matrix

Return type:

list[float]
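
How such a matrix is counted can be sketched in a few lines. This is an illustration with a hypothetical function name; rows are assumed to hold the real targets and columns the predictions, matching the default axis labels above, and plotting is left out.

```python
def confusion_matrix_sketch(true_targets, predicted_targets):
    """Hypothetical helper: matrix[i][j] counts samples of class i predicted as j."""
    labels = sorted(set(true_targets) | set(predicted_targets))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_targets, predicted_targets):
        matrix[index[t]][index[p]] += 1
    return matrix


tt = [0, 0, 1, 1, 2]
tp = [0, 1, 1, 1, 2]
print(confusion_matrix_sketch(tt, tp))  # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
```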

functions.cosmicdd(y, th=100, asy=0.6745, m=5)[source]

Identifies cosmic rays (CRs) by detrended differences, that is, the differences between a value and the next (D. A. Whitaker and K. Hayes, https://doi.org/10.1016/j.chemolab.2018.06.009).

Parameters:
  • y (list[float]) – List of spectra to remove cosmic rays from.

  • th (float) – Factor to modify the criteria to identify a cosmic ray.

  • asy (float) – Asymptotic bias correction.

  • m (float) – Number of neighbor values to use for average.

Returns:

Data with removed cosmic rays.

Return type:

list[float]

functions.cosmicmed(y, sigma=1.5)[source]

Precise cosmic ray elimination for measurements of the same point or very similar spectra.

Parameters:
  • y (list[float]) – List of spectra to remove cosmic rays from.

  • sigma (float) – Factor to modify the criteria to identify a cosmic ray. Multiplies the median of each bin.

Returns:

Data with removed cosmic rays.

Return type:

list[float]
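
The median criterion described above can be sketched as follows. The function name is hypothetical, and replacing a flagged value with the bin median is an assumption made for this example rather than documented library behavior.

```python
from statistics import median


def median_despike_sketch(spectra, sigma=1.5):
    """Hypothetical helper: replace values above sigma * (bin median) by the median."""
    n_bins = len(spectra[0])
    # Median of each bin (axis position) across all spectra
    medians = [median(s[i] for s in spectra) for i in range(n_bins)]
    return [
        [medians[i] if s[i] > sigma * medians[i] else s[i] for i in range(n_bins)]
        for s in spectra
    ]


measurements = [[1, 2, 3], [1, 2, 3], [1, 50, 3]]
print(median_despike_sketch(measurements))  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
```

Because the threshold is relative to the median of each bin, this approach only makes sense when all the spectra measure the same or a very similar point.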

functions.cosmicmp(y, alpha=1, avg=2)[source]

Identifies cosmic rays (CRs) by comparing similar spectra and pairing them in matching pairs. Uses the randomness of CRs (S. J. Barton, B. M. Hennelly, https://doi.org/10.1177/0003702819839098).

Parameters:
  • y (list[float]) – List of spectra to remove cosmic rays from.

  • alpha (float) – Factor to modify the criteria to identify a cosmic ray.

  • avg (int) – Moving average window.

Returns:

Data with removed cosmic rays.

Return type:

list[float]

functions.count(y, value=0)[source]

Counts the number of values that coincide with value.

Parameters:
  • y (list[float]) – vector or list of vectors to search for values.

  • value (float) – Value or list of values to search in y. If many values are passed, then it returns a list with the counted equalities. Default is 0.

Return type:

list[float]

Returns:

The number of elements in y equal to value. If several values are passed, a list with the count for each one.

functions.crosscorrelation(y1, y2, lag, x=None, lims=None, normalization=False, average=False)[source]

Calculates the correlation of a signal with a delayed copy of a second signal as a function of the lag. In other words, it measures the similarity between two series that are displaced relative to each other.

Parameters:
  • y1, y2 (list[float]) – Signals to correlate. Single or list of vectors.

  • x (list[float]) – x axis.

  • lag (int) – Positions shifted to be analyzed, not in x-axis values.

  • lims (list[float, float]) – Limits of the data to cross-correlate. Default is None, which analyzes the whole vector ([0:len(y)]).

  • normalization (boolean) – divides the cross-correlation value by the product of the standard deviation of the series in t and t+lag.

  • average (boolean) – divides the cross-correlation value by the length of the signals.

Returns:

Cross-correlation values, either in a list or single value.

Return type:

float, list[float]

functions.decbound(x_points, y_points, groups, limits=None, divs=0.5)[source]

Calculates the Decision Boundaries.

Parameters:
  • x_points (list[float]) – X coordinates of each point.

  • y_points (list[float]) – Y coordinates of each point.

  • groups (list[int]) – List of targets for each point.

  • limits (list[float]) – Plot and calculation limits. The default is ‘None’.

  • divs (float) – Resolution. The default is 0.5.

Returns:

The decision boundaries.

Return type:

list[float]

functions.decdensity(x, y, groups, limits=None, divs=0.5, th=2)[source]

Calculates the density decision map from a cluster mapping.

Parameters:
  • x (list[float]) – X coordinates of each point.

  • y (list[float]) – Y coordinates of each point.

  • groups (list[int]) – List of targets for each point.

  • limits (list[float]) – Plot and calculation limits. The default is ‘None’.

  • divs (float) – Resolution to calculate density. The default is 0.5.

  • th (int) – Threshold from which an area is defined as a certain group.

Returns:

The density decision map.

Return type:

list[float]

functions.deconvolution(y, x, pos, method='gauss', shift=5, look=None, pp=False)[source]
BETA: Deconvolutes a spectrum into a defined number of distributions. This will fit distributions on the declared positions pos of peaks and change their shape according to the difference between the spectrum and the sum (convolution) of the fittings.

Parameters:
  • y (list[float]) – vector to deconvolute.

  • x (list[float]) – axis.

  • pos (list[float]) – positions of peaks in x values

  • method (string) – The selected shape of the fittings. Options include: gauss, lorentz, and voigt.

  • pp (boolean) – If True, it prints parameters of the fittings. Default is False.

Return type:

list[list[float]]

Returns:

2-D list of the fittings.

functions.derivative(y, x=None, s=1, deg=1)[source]

Calculates the derivative function of a vector. It does so by calculating the slope on a point using the position of the neighboring points.

Parameters:
  • y (list[float]) – Data to fit. Single or list of vectors.

  • x (list[float]) – x axis. If None, an axis will be created from 1..N, where N is the length of y. Default is None.

  • s (int) – Size of the range to each side of the point to use to calculate the slope. Default is 1.

  • deg (int) – Degree of the derivative, that is, the number of times the vector will be differentiated. For example, deg=2 calculates the second derivative. Default is 1.

Returns:

derivative values, either in a list or single value.

Return type:

float, list[float]
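
The neighbor-based slope can be sketched as below. This is an illustration with a hypothetical function name; shrinking the range at the ends of the vector is a choice made for this example and may differ from the library's edge handling.

```python
def derivative_sketch(y, x=None, s=1, deg=1):
    """Hypothetical helper: slope between the points s positions to each side."""
    n = len(y)
    if x is None:
        x = list(range(1, n + 1))
    out = list(y)
    for _ in range(deg):
        prev = out
        out = []
        for i in range(n):
            # Neighbors are clipped to the vector ends
            lo, hi = max(i - s, 0), min(i + s, n - 1)
            out.append((prev[hi] - prev[lo]) / (x[hi] - x[lo]))
    return out


x = [0, 1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(derivative_sketch(y, x))         # [1.0, 2.0, 4.0, 6.0, 8.0, 9.0]
print(derivative_sketch(y, x, deg=2))  # interior values are 2.0
```

For y = x², the interior values of the first derivative equal 2x and the interior values of the second derivative equal 2, as expected.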

functions.evalgrau(data)[source]

This function evaluates the MSE in 3 dimensions (x,y,z) for a set of data vectors.

Parameters:

data (list[float]) – A list of lists of variables to compare.

Returns:

A list with each combination and the R2 score obtained.

Return type:

list[float]

functions.fwhm(y, x, peaks, alpha=0.5, s=10)[source]

Calculates the Full Width Half Maximum of specific peak or list of peaks for a single or multiple spectras.

Parameters:
  • y (list) – Spectroscopic data to calculate the FWHM from. Single vector or list of vectors.

  • x (list) – Axis of the data. If None, then the axis will be 0..N, where N is the length of the spectrum or spectra.

  • peaks (float or list[float]) – Approximate axis value of the position of the peak. A float for a single peak; a list of floats if several peaks are required.

  • alpha (float) – Multiplier of the maximum value used to find the width. The default of 0.5 makes this a full width at half maximum; alpha=0.25 would find the full width at quarter maximum. alpha should satisfy 0 < alpha < 1. Default is 0.5.

  • s (int) – Shift to sides to check real peak. The default is 10.

  • interpolate (boolean) – If True, will interpolate according to step and s.

Returns:

A list, or single float value, of the fwhm.

Return type:

float or list[float]
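
The width measurement can be sketched as follows. The function name is hypothetical, the peak is given by its index rather than an approximate axis value, and the crossings are located by linear interpolation between neighboring points, which is an assumption of this example.

```python
def width_at_fraction_sketch(y, x, peak_index, alpha=0.5):
    """Hypothetical helper: width of a positive peak at alpha * its maximum."""
    level = alpha * y[peak_index]
    left = peak_index
    while left > 0 and y[left] > level:
        left -= 1
    right = peak_index
    while right < len(y) - 1 and y[right] > level:
        right += 1

    def crossing(i_low, i_high):
        """Axis value where the segment from i_low up to i_high reaches the level."""
        if y[i_low] > level:  # the curve never drops to the level on this side
            return x[i_low]
        frac = (level - y[i_low]) / (y[i_high] - y[i_low])
        return x[i_low] + frac * (x[i_high] - x[i_low])

    return crossing(right, right - 1) - crossing(left, left + 1)


# Triangular peak with its maximum of 5 at x = 5
x = list(range(11))
y = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
print(width_at_fraction_sketch(y, x, 5))              # 5.0
print(width_at_fraction_sketch(y, x, 5, alpha=0.25))  # 7.5
```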

functions.gaussfit(y=[0], x=[0], pos=0, look=5, shift=2, sigma=4.4, alpha=1, manual=False, params=False)[source]

Fits a peak as an optimization problem or as a manual fit. A curve y is only mandatory if the optimization is needed (manual=False, default). If no axis ‘x’ is defined, then a default axis is generated for both options.

Parameters:
  • y (list[float]) – Data to fit. Single vector.

  • x (list[float]) – x axis.

  • pos (int) – Peak index to fit to.

  • look (int) – axis positions to look to each side in axis units. The default is 5.

  • shift (int) – Possible axis shift of the peak in axis units. The default is 2.

  • sigma (float) – Sigma value for Gaussian fit. The default is 4.4.

  • alpha (float) – Multiplier of the fitting and initial value of the optimizer when manual=False. The maximum value of the fitting is proportional to this value, but is not necessarily equal to it. The default is 1.

  • manual (boolean) – If True, 1 curve will be generated using the declared parameter sigma and perform a manual fit. Default is False.

  • params (boolean) – If True, it returns the parameters of the fit in the order of alpha, sigma and peak position. Default is False.

Returns:

Fitted curve.

Return type:

list[float]

functions.grau(data, labels=[], cm='seismic', fons=20, figs=(25, 15), tfs=25, ti='Grau (Beta)', marker='s', marks=100, plot=True)[source]

Performs Grau correlation matrix and plots it.

Parameters:
  • data (list[float]) – Data to be correlated.

  • labels (list[str]) – Labels of the data to be plotted.

  • cm (str) – Color map for the plot, from matplotlib. The default is “seismic”.

  • fons (int, optional) – Plot font size. The default is 20.

  • figs (tuple) – Figure size. The default is (25,15).

  • tfs (int) – Plot title font size. The default is 25.

  • ti (str) – Plot title. The default is “Grau (Beta)”.

  • marker (str) – Plot marker type (scatter). The default is “s”.

  • plot (bool) – If True plots the matrix. The default is True.

  • marks (int) – Marker size. The default is 100.

Returns:

Grau plot in a 2d list.

Return type:

list[float]

functions.groupscores(all_targets, used_targets, predicted_targets)[source]

Calculates the individual scores for an ML algorithm (e.g., LDA or PCA).

Parameters:
  • all_targets (list[int]) – List of all real targets (making sure all groups are here).

  • used_targets (list[int]) – Targets to score on.

  • predicted_targets (list[int]) – Prediction of used_targets.

Returns:

List of scores for each group.

Return type:

list[float]

functions.interpolation(y, x, step=1, start=0, finish=0)[source]

Interpolates data to a new axis.

Parameters:
  • y (list[float]) – List of data to interpolate.

  • x (list[list[float]]) – List of the axes of the data.

  • step (float) – New step for the new axis.

Returns:

Interpolated data and the new axis

Return type:

list[float], list[float]
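
Resampling onto an evenly spaced axis can be sketched with linear interpolation. The function name is hypothetical; the example handles a single vector with an ascending axis and ignores the start and finish arguments, so it is a simplification of the library function rather than a description of it.

```python
def interpolate_sketch(y, x, step=1.0):
    """Hypothetical helper: linearly resample y onto an axis with spacing `step`."""
    n_new = int((x[-1] - x[0]) / step) + 1
    new_x = [x[0] + k * step for k in range(n_new)]
    new_y = []
    j = 0
    for xv in new_x:
        # Advance to the original segment [x[j], x[j+1]] that contains xv
        while j < len(x) - 2 and x[j + 1] < xv:
            j += 1
        frac = (xv - x[j]) / (x[j + 1] - x[j])
        new_y.append(y[j] + frac * (y[j + 1] - y[j]))
    return new_y, new_x


new_y, new_x = interpolate_sketch([0, 4, 8], [0, 2, 4], step=1)
print(new_x)  # [0, 1, 2, 3, 4]
print(new_y)  # [0.0, 2.0, 4.0, 6.0, 8.0]
```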

functions.intersections(y1, y2)[source]

Find approximate intersection points of two curves.

Parameters:
  • y1 (list[float]) – first curve.

  • y2 (list[float]) – second curve.

Return type:

list[list[float]]

Returns:

Coordinates of the approximate intersections.

functions.isaxis(y)[source]

Detects if there is an axis in the data.

Parameters:

y (list[float]) – Data containing spectra and a possible axis.

Returns:

True if there is axis.

Return type:

bool

functions.issinglevalue(y)[source]

Checks if a vector, or a list of vectors, is composed of the same value (single value vector). Not to be confused with a single element vector, where the length of the vector is 1.

Parameters:

y (list) – Vector, or list of vectors, that needs to be checked.

Returns:

True if contains the same value. False if there are different. If y is a list of vectors then it returns a list of booleans with the respective answer.

Return type:

bool

functions.load(file, fromline=0, toline=None, line=None, transpose=False, dtype=<class 'float'>, separators=[';'], blanks=['NaN', 'nan', '--'], replacewith='0')[source]

Loads data from a text file obtained from LabSpec and other spectroscopy software. Normally, single measurements come in columns with the first one being the x-axis. When it is a mapping, the first row is the x-axis and the following are the measurements. It can also perform random access to a particular line with ‘line’, and it is possible to define a specific range to load with ‘fromline’ and ‘toline’.

Parameters:
  • file (str) – Url of data file. Must not have headers and separated by ‘spaces’ (LabSpec).

  • fromline (int) – Line of file from which to start loading data. The default is 0.

  • toline (int) – Line of file at which to stop loading data. The default is ‘None’, which indicates the full range of data.

  • line (int) – Random access to file. Loads a specific line in a file. Default is ‘None’.

  • transpose (boolean) – If True transposes the data. Default is False.

  • dtype (str) – Type of data. If it is numeric then ‘float’, if text then ‘string’. Default is ‘float’.

Returns:

List of the data.

Return type:

list[float]

functions.lorentzfit(y=[0], x=[0], pos=0, look=5, shift=2, gamma=5, alpha=1, manual=False)[source]

Fits a peak as an optimization problem or as a manual fit for a Lorentz distribution, also known as Cauchy. A curve y is only mandatory if the optimization is needed (manual=False, default). If no axis ‘x’ is defined, then a default axis is generated for both options.

Parameters:
  • y (list[float]) – Data to fit. Single vector.

  • x (list[float]) – x axis.

  • pos (int) – X axis position of the peak.

  • look (int) – axis positions to look to each side in axis units. The default is 5.

  • shift (int) – Possible axis shift of the peak in axis units. The default is 2.

  • gamma (float) – Lorentz fit parameter. The default is 5.

  • alpha (float) – Multiplier of the fitting. The maximum value of the fitting is proportional to this value, but is not necessarily equal to it. The default is 1.

  • manual (boolean) – If True, 1 curve will be generated using the declared parameter gamma and perform a manual fit. Default is False.

Returns:

Fitted curve.

Return type:

list[float]

functions.lowpass(y, cutoff=0.25, fs=30, order=2, nyq=0.75)[source]

Butterworth low-pass filter for a single spectrum or a list of them.

Parameters:
  • y (list[float]) – List of vectors in line format (each line is a vector).

  • cutoff (float) – Desired cutoff frequency of the filter. The default is 0.25.

  • fs (int) – Sample rate in Hz. The default is 30.

  • order (int) – Order of the filter. A sine wave can be approximately represented as quadratic. The default is 2.

  • nyq (float) – Nyquist frequency, 0.75*fs is a good value to start. The default is 0.75*30.

Returns:

Filtered data

Return type:

list[float]

functions.mahalanobis(v)[source]

Calculates the Mahalanobis distance for a group of vectors to the center of mass, or average coordinates.

Parameters:

v (list) – Vectors to calculate the distance.

Returns:

List of the respective distances.

Return type:

list

functions.makeaxisdivs(start, finish, divs, rounded=-1)[source]

Creates an axis, or vector, from ‘start’ to ‘finish’ with ‘divs’ divisions.

Parameters:
  • start (float) – First value of the axis.

  • finish (float) – Last value of the axis.

  • divs (int) – Number of divisions

  • rounded (int) – Number of decimals to consider. If -1 then no rounding is performed.

Returns:

Axis with the set parameters.

Return type:

list[float]

functions.makeaxisstep(start=0, step=1.0, length=1000, adjust=False, rounded=-1)[source]

Creates an axis, or vector, from ‘start’ with bins of length ‘step’ for a distance of ‘length’.

Parameters:
  • start (float) – first value of the axis. Default is 0.

  • step (float) – Step size for the axis. Default is 1.00.

  • length (int) – Length of axis. Default is 1000.

  • adjust (boolean) – If True, rounds (adjusts) the decimal places to the same number as the step has. Default is False.

  • rounded (int) – Number of decimals to consider. If -1 then no rounding is performed.

Returns:

Axis with the set parameters.

Return type:

list[float]

functions.mdscore(x_p, y_p, tar)[source]

Calculates the distance between points and the median center (MD) of clusters and sets the prediction to the closest MD. This score may be higher or lower than the algorithm score.

Parameters:
  • x_p (list[float]) – Coordinates of x-axis.

  • y_p (list[float]) – Coordinates of y-axis.

  • tar (list[int]) – Targets of each point.

Returns:

Score by comparing MD distances. Prediction using MD distances. X-axis coordinates of the MDs. Y-axis coordinates of the MDs.

Return type:

list[float, list[int],list[float],list[float]]

functions.median(y)[source]

Calculates the median vector of a list of vectors.

Parameters:

y (list[float]) – List of vectors.

Returns:

median curve

Return type:

list[float]

functions.mergedata(data)[source]

Merges data, it can merge large vectors. Useful to merge features before performing ML algorithms.

Parameters:

data (list[list[float]]) – List of arrays.

Returns:

List with the merged data.

Return type:

list[float]

functions.minmax(y)[source]

Calculates the vectors that contain the minimum and maximum values of each bin from a list of vectors.

Parameters:

y (list) – List of vectors to calculate the minimum and maximum vectors.

Returns:

minimum and maximum vectors.

Return type:

list[float]

functions.moveavg(y, move=2)[source]

Calculate the moving average of a single or multiple vectors.

Parameters:
  • y (list[float]) – Data to calculate the moving average. Single or multiple vectors.

  • move (int) – Average range to each side (total average = move + 1).

Returns:

Smoothed vector(s).

Return type:

list[float]
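
A centered moving average can be sketched as below. The function name is hypothetical; the example assumes a window of up to 2 * move + 1 points that shrinks at the ends of the vector, which may not match the library's window definition or edge handling.

```python
def moving_average_sketch(y, move=2):
    """Hypothetical helper: average each point with `move` neighbors per side."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(i - move, 0), min(i + move, n - 1)
        window = y[lo:hi + 1]
        out.append(sum(window) / len(window))
    return out


print(moving_average_sketch([0, 0, 6, 0, 0], move=1))  # [0.0, 2.0, 2.0, 2.0, 0.0]
```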

functions.normsum(y, x=None, lims=None)[source]

Normalizes the sum under the curve to 1, for single or multiple spectra.

Parameters:
  • y (list[float]) – Single spectrum or a list of them to normalize.

  • lims (list[float, float]) – Limits of the vector to normalize the sum. Default is None. For example, if lims = [N, M], then the sum under the curve between N and M is normalized to 1.

Returns:

Normalized data

Return type:

list[float]

functions.normtoglobalmax(y, globalmin=False)[source]

Normalizes a list of spectra to the global maximum.

Parameters:
  • y (list[float]) – List of spectra.

  • globalmin (bool) – If True, the global minimum is rescaled to 0. Default is False.

Returns:

Normalized data

Return type:

list[float]

functions.normtoglobalsum(y)[source]
Normalizes a list of spectra to the global maximum sum under the curve. In other words, it looks for the largest sum under the curve and sets it to 1; the other areas are then scaled in relation to that one.

Parameters:

y (list[float]) – List of spectra.

Returns:

Normalized data

Return type:

list[float]

functions.normtomax(y, to=1, zeromin=False)[source]

Normalizes spectra to the maximum value of each; in other words, the maximum value of each spectrum is set to the specified value.

Parameters:
  • y (list[float], numpy.ndarray) – Single or multiple vectors to normalize.

  • to (float) – Value to normalize to. Default is 1.

  • zeromin (boolean) – If True, the minimum value is translated to 0. Default value is False.

Returns:

Normalized data.

Return type:

list[float], numpy.ndarray
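
For a single vector, the scaling can be sketched as follows. The function name is hypothetical and the example works on plain lists only; it is an illustration, not the library code.

```python
def normalize_to_max_sketch(y, to=1.0, zeromin=False):
    """Hypothetical helper: scale y so that its maximum equals `to`."""
    low = min(y) if zeromin else 0.0
    shifted = [v - low for v in y]  # optionally move the minimum to 0
    peak = max(shifted)
    return [to * v / peak for v in shifted]


print(normalize_to_max_sketch([2, 4, 8]))          # [0.25, 0.5, 1.0]
print(normalize_to_max_sketch([2, 4, 8], to=100))  # [25.0, 50.0, 100.0]
```

With zeromin=True the vector [2, 4, 8] is first shifted to [0, 2, 6], so the result spans the full range from 0 to the chosen maximum.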

functions.normtopeak(y, x, peak, shift=10)[source]

Normalizes the spectra to a particular peak.

Parameters:
  • y (list[float]) – Data to be normalized.

  • x (list[float]) – x axis of the data

  • peak (float) – Peak position in x-axis values.

  • shift (int) – Range to look for the real peak. The default is 10.

Returns:

Normalized data.

Return type:

list[float]

functions.normtoratio(y, r1, r2, x=None)[source]

Normalizes a peak to its ratio with respect to another. That is, the peak found in the range of r1 is normalized to the ratio r1/(r1+r2).

Parameters:
  • y (list[float]) – Single spectrum or a list of them.

  • r1 (list[float, float]) – Range of the first area according to the axis.

  • r2 (list[float, float]) – Range of the second area according to the axis.

  • x (list[float]) – Axis of the data. If None then it goes from 0 to N, where N is the length of the spectra.

Returns:

Normalized data

Return type:

list[float]

functions.normtovalue(y, val)[source]

Normalizes the spectra to a set value; in other words, the defined value will be rescaled to 1 in all the spectra.

Parameters:
  • y (list[float]) – Single or multiple vectors to normalize.

  • val (float) – Value to normalize to.

Returns:

Normalized data.

Return type:

list[float]

functions.peakfinder(y, x=None, ranges=None, look=10)[source]

Finds the location of the peaks in a single vector.

Parameters:
  • y (list[float]) – Data to find a peak in. Single spectrum.

  • x (list[float]) – X axis of the data. If no axis is passed then the axis goes from 0 to N, where N is the length of the spectrum. Default is None.

  • ranges (list[[float, float]]) – Approximate ranges of known peaks, if any. If no ranges are known or defined, it will return all the peaks that comply with the look criteria. If ranges are defined, it won't use the look criteria, but will just look for the absolute maximum within each range. Default is None.

  • look (int) – Amount of position to each side to decide if it is a local maximum. The default is 10.

Returns:

A list of the index of the peaks found.

Return type:

list[int]
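
Both search modes described above can be sketched in plain Python. The function names are hypothetical; requiring a strict maximum and skipping the first and last positions are choices made for this example.

```python
def find_peaks_sketch(y, look=10):
    """Hypothetical helper: indices strictly above all neighbors within `look`."""
    y = list(y)
    peaks = []
    for i in range(1, len(y) - 1):
        lo, hi = max(i - look, 0), min(i + look + 1, len(y))
        neighbours = y[lo:i] + y[i + 1:hi]
        if all(y[i] > v for v in neighbours):
            peaks.append(i)
    return peaks


def peak_in_range_sketch(y, x, lower, upper):
    """Hypothetical helper: index of the absolute maximum with lower <= x <= upper."""
    candidates = [i for i, xv in enumerate(x) if lower <= xv <= upper]
    return max(candidates, key=lambda i: y[i])


y = [0, 1, 3, 1, 0, 2, 5, 2, 0, 1, 0]
x = [100 + i for i in range(len(y))]
print(find_peaks_sketch(y, look=2))           # [2, 6]
print(find_peaks_sketch(y, look=1))           # [2, 6, 9]
print(peak_in_range_sketch(y, x, 104, 108))   # 6
```

A larger look value suppresses small local maxima, such as the one at index 9 in this example.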

functions.peaksimilarity(y1, y2, p1, p2, n=5, x=None, plot=False, cmm='inferno', fontsize=10, title='Peak similarity')[source]

Calculates the similarity matrix as described in doi:10.1142/S021972001350011X. It quantifies the difference of the derivative function of the peaks and puts them into a matrix. This can be used for peak alignment.

Parameters:
  • y1 (list[float]) – First vector to compare.

  • y2 (list[float]) – Second vector to compare.

  • p1 (list[float]) – List of peaks to compare from y1. It can be different in positions and length than p2.

  • p2 (list[float]) – List of peaks to compare from y2. It can be different in positions and length than p1.

  • x (list[float]) – x axis. If None, an axis will be created from 1..N, where N is the length of y1. Default is None.

  • n (int) – Size of the range to each side of the point to use to calculate the slope. Default is 5.

  • plot (boolean) – If True, plots the similarity matrix. Default is False.

  • cmm (string) – Color map for the plot. Default is ‘inferno’.

  • fontsize (int) – Size of the font in the plot. Default is 10.

  • title (string) – Title of the plot. Default is ‘Peak similarity’.

Returns:

similarity matrix in the form of a 2-d list.

Return type:

list[float]

functions.pearson(data, labels=[], cm='seismic', fons=20, figs=(20, 17), tfs=25, ti='Pearson', plot=True)[source]

Calculates Pearson matrix and plots it.

Parameters:
  • data (list[float]) – Data to correlate.

  • labels (list[str]) – Labels of the data.

  • cm (str) – Color map as for matplotlib. The default is “seismic”.

  • fons (int) – Plot font size. The default is 20.

  • figs (tuple) – Plot size. The default is (20,17).

  • tfs (int) – Title font size. The default is 25.

  • plot (bool) – If True plots the matrix. The default is True.

  • ti (str) – Plot title/name. The default is “Pearson”.

Returns:

Pearson plot in a 2d list.

Return type:

list[float]

functions.plot2dml(train, test=[], names=['D1', 'D2', 'T'], train_pred=[], test_pred=[], labels=[], title='', xax='x', yax='y', fs=15, lfs=10, loc='best', size=20, xlim=[], ylim=[], plot=True)[source]

Plots 2-dimensional results from LDA, PCA, NCA, or similar machine learning algorithms where the output has 2 features per sample.

Parameters:
  • train (pandas frame) – Results for the training set. Pandas frame with the 2 dimensions and target columns.

  • test (pandas frame) – Results for the test set. Pandas frame with the 2 dimensions and target columns.

  • names (list[str]) – Names of the labels in the dataframe. For example, for LDA: D1, D2 and T.

  • train_pred (list) – Prediction of the training set.

  • test_pred (list) – Prediction of the test set.

  • labels (list) – Names for the classification groups, if any.

  • title (str) – Title of the plot.

  • xax (str) – Name of x-axis.

  • yax (str) – Name of y-axis.

  • lfs (int) – Legend font size. Default is 10.

  • loc (str) – Location of legend. Default is best.

  • size (int) – Size of the markers. Default is 20.

  • xlim (list) – Limits of the x axis.

  • ylim (list) – Limits of the y axis.

  • plot (bool) – If True, it plots. Only for test purposes.

Returns:

Plot

functions.polybaseline(y, axis, points, deg=2, avg=5, remove=True, plot=False)[source]

Calculates the baseline using a polynomial fit.

Parameters:
  • y (list[float]) – Single or several spectra to remove the baseline from.

  • axis (list[float]) – x axis of the data, to interpolate the baseline function.

  • points (list[int]) – Positions in axis of points to calculate baseline.

  • deg (int) – Polynomial degree of the fit.

  • avg (int) – Points to each side to make average.

  • remove (bool) – If True, calculates and returns (y - baseline).

  • plot (bool) – If True, plots the result.

Returns:

The baseline.

Return type:

list[float]

functions.regression(target, variable, cov=0)[source]

Performs an N dimensional regression.

Parameters:
  • target (list[float]) – Y-axis values, values to predict.

  • variable (list[float]) – X-axis values.

  • cov (int) – If 1, performs the regression with covariance, like Spearman. The default is 0.

Returns:

Prediction of the fitting and the Fitting parameters.

Return type:

list[list[float],list[float]]

functions.representative(y)[source]

Looks for the representative spectrum of a dataset. In other words, it calculates the median spectrum and looks for the one that is closest to it in relation to standard deviation. Do not confuse with a typical spectrum.

Parameters:

y (list[list[float]]) – A list of spectra.

Returns:

The representative spectrum of the set.

Return type:

list[float]
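
The selection can be sketched as follows. The function name is hypothetical, and closeness is measured here as the sum of squared differences to the median spectrum, which is an assumption of this example and may differ from the standard-deviation criterion the library uses.

```python
from statistics import median


def representative_sketch(spectra):
    """Hypothetical helper: the measured spectrum closest to the median spectrum."""
    n_bins = len(spectra[0])
    med = [median(s[i] for s in spectra) for i in range(n_bins)]

    def distance(s):
        # Sum of squared differences to the median spectrum (assumed metric)
        return sum((a - b) ** 2 for a, b in zip(s, med))

    return min(spectra, key=distance)


data = [[1, 2, 3], [2, 3, 4], [10, 10, 10]]
print(representative_sketch(data))  # [2, 3, 4]
```

Unlike the median spectrum itself, the result is always one of the measured spectra.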

functions.reverse(y)[source]

Reverses a vector or a list of vectors.

Parameters:

y (list[float]) – list of vectors to reverse. Single or multiple.

Return type:

list[float]

Returns:

Reversed data.

functions.rwm(y, ws, plot=False)[source]

Computes the median of an array in a running window defined by the user. The window size needs to be a two-element tuple or list, with the first element being the length of the spectra and the second the total width that will be taken into account for the statistics.

Parameters:
  • y (numpy array) – The spectra.

  • ws (tuple/list) – Window size parameters

  • plot (boolean) – If True, plots the new spectra.

Returns:

Array containing the computed 1D spectra.

Return type:

numpy array[float]
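
The concept of a running-window median can be shown on a single vector. The sketch below is a simplified, hypothetical helper: it takes only a total window width (not the two-element `ws` described above) and shrinks the window at the edges, which is an assumption about edge handling.

```python
import numpy as np

def running_median_sketch(y, width):
    """Median of a centered window of about `width` samples, truncated at the edges."""
    y = np.asarray(y, dtype=float)
    half = width // 2
    out = np.empty_like(y)
    for i in range(len(y)):
        lo, hi = max(i - half, 0), min(i + half + 1, len(y))
        out[i] = np.median(y[lo:hi])
    return out

# A single-sample spike is rejected by a 3-sample median window.
noisy = [1.0, 1.0, 9.0, 1.0, 1.0, 1.0, 1.0]
smoothed = running_median_sketch(noisy, width=3)
```

Unlike a running mean, the median ignores isolated outliers such as cosmic-ray spikes instead of smearing them over neighboring bins.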

functions.sdev(y)[source]

Calculates the standard deviation for each bin from a list of vectors.

Parameters:

y (list[float]) – List of vectors.

Returns:

Standard deviation curve

Return type:

list[float]
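
A per-bin standard deviation treats each column (bin) of the stacked vectors independently. The NumPy sketch below assumes the population standard deviation (`ddof=0`); which convention `sdev` uses is not stated above.

```python
import numpy as np

# Two spectra with three bins each; rows are spectra, columns are bins.
spectra = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
])

# One value per bin: the spread of that bin across all spectra.
per_bin_std = spectra.std(axis=0)  # population std, ddof=0 (assumed)
```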

functions.shiftref(ref_data, ref_axis, ref_peak=520, mode=1, plot=True)[source]

Shifts the x-axis according to a shift calculated prior.

Parameters:
  • ref_data (list[float]) – Reference measurement.

  • ref_axis (list[float]) – X-axis of the reference measurement.

  • ref_peak (float) – Where the reference peak should be in x-axis values. The default is 520 (Raman Si).

  • mode (int) – Fitting method: 1 for Lorentz, 2 for Gaussian, 3 for none. The default is 1.

  • plot (bool) – If True plots a visual aid. The default is True.

Returns:

Shift amount

Return type:

float
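
The calibration idea, without any peak fitting, can be sketched as follows. The helper is hypothetical; it takes the position of the maximum as the measured peak, and the sign convention (expected minus measured) is an assumption, not something the description above confirms.

```python
import numpy as np

def peak_shift_sketch(ref_data, ref_axis, ref_peak=520.0):
    """Shift that moves the measured maximum onto `ref_peak` (no fitting)."""
    ref_data = np.asarray(ref_data, dtype=float)
    ref_axis = np.asarray(ref_axis, dtype=float)
    measured = ref_axis[np.argmax(ref_data)]
    return ref_peak - measured  # assumed sign convention

# Synthetic silicon-like reference whose peak is measured at 521.5.
axis = np.arange(500.0, 540.0, 0.5)
signal = np.exp(-((axis - 521.5) ** 2) / 2.0)

shift = peak_shift_sketch(signal, axis)
corrected_axis = axis + shift  # the peak now sits at 520
```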

functions.shuffle(arrays, delratio=0)[source]

Merges and shuffles the data and then separates it again, so that all arrays are shuffled together.

Parameters:
  • arrays (list[list[float]]) – List of arrays of data.

  • delratio (float) – Ratio of the data to be deleted, between 0 and 1. The default is 0.

Returns:

List of the shuffled arrays.

Return type:

list[list[float]]
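
Shuffling several arrays together means applying the same permutation to each, so that row i of one array still corresponds to row i of the others. A minimal standard-library sketch follows; the `seed` argument and the way `delratio` is rounded are assumptions for illustration.

```python
import random

def shuffle_together_sketch(arrays, delratio=0.0, seed=None):
    """Apply one random permutation to every array, optionally dropping a fraction."""
    n = len(arrays[0])
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Keep only the first part of the permutation when delratio > 0.
    keep = order[: n - int(n * delratio)]
    return [[a[i] for i in keep] for a in arrays]

spectra = [[0, 0], [1, 1], [2, 2], [3, 3]]
labels = ["a", "b", "c", "d"]
shuffled_spectra, shuffled_labels = shuffle_together_sketch(
    [spectra, labels], delratio=0.25, seed=0
)
```

After the call, each remaining spectrum is still paired with its original label.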

functions.spearman(data, labels=[], cm='seismic', fons=20, figs=(20, 17), tfs=25, ti='Spearman', plot=True)[source]

Calculates the Spearman correlation matrix and plots it.

Parameters:
  • data (list[float]) – Data to correlate.

  • labels (list[str]) – Labels of the data.

  • cm (str) – Color map as for matplotlib. The default is “seismic”.

  • fons (int) – Plot font size. The default is 20.

  • figs (tuple) – Plot size. The default is (20,17).

  • tfs (int) – Title font size. The default is 25.

  • plot (bool) – If True plots the matrix. The default is True.

  • ti (str) – Plot title/name. The default is “Spearman”.

Returns:

Spearman matrix as a 2D list.

Return type:

list[float]
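
Spearman correlation is the Pearson correlation of the ranks of the data. The NumPy sketch below illustrates that definition; it assumes each row is one variable and that there are no tied values (ties would need average ranks, which this helper does not compute). It is not the spectrapepper implementation.

```python
import numpy as np

def spearman_matrix_sketch(data):
    """Pearson correlation of per-row ranks (no tie handling, illustrative only)."""
    data = np.asarray(data, dtype=float)
    # Double argsort turns the values in each row into ranks 0..n-1.
    ranks = np.argsort(np.argsort(data, axis=1), axis=1)
    return np.corrcoef(ranks)

data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 4.0, 9.0, 16.0, 25.0],  # monotonic in row 0, so rho = 1
    [5.0, 4.0, 3.0, 2.0, 1.0],    # reversed order, so rho = -1
]
rho = spearman_matrix_sketch(data)
```

Because only ranks matter, the nonlinear but monotonic second row correlates perfectly with the first.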

functions.stackplot(y, offset, order=None, xlabel='', ylabel='', title='', cmap='Spectral', figsize=(6, 9), fs=20, lw=1, xlimits=None, plot=True)[source]

Plots a stack plot of selected spectra.

Parameters:
  • y (list[float]) – Data to plot. Must be more than 1 vector.

  • offset (float) – displacement, or difference, between each curve.

  • order (list[int]) – Order of the curves in which they are plotted. If None, is the order as they appear in the list.

  • xlabel (str) – Label of axis.

  • ylabel (str) – Label of axis.

  • title (str) – Title of the plot.

  • cmap (str) – Colormap, according to matplotlib options.

  • figsize (tuple) – Size of the plot. Default is (6, 9).

  • fs (float) – Font size. Default is 20.

  • lw (float) – Linewidth of the curves.

  • xlimits (list[float]) – Plot limits for x-axis. If None it plots all.

  • plot (bool) – If True, it plots. Only for test purposes.

Returns:

plot

Return type:

bool

functions.studentfit(y=[0], x=[0], pos=0, v=0.01, alpha=1, look=5, shift=2, manual=False)[source]

Fits peak as an optimization problem.

Parameters:
  • y (list[float]) – Data to fit. Single vector.

  • x (list[float]) – x axis.

  • pos (int) – Peak position to fit, in axis values.

  • v (float) – Student fit parameter. The default is 0.01.

  • alpha (float) – Multiplier of the fitting. The maximum value of the fitting is proportional to this value, but is not necessarily its value. The default is 1.

  • look (int) – axis positions to look to each side in axis units. The default is 5.

  • shift (int) – Possible axis shift of the peak in axis units. The default is 2.

  • manual (boolean) – If True, one curve will be generated using the declared parameters, performing a manual fit. Default is False.

Returns:

Fitted curve.

Return type:

list[float]
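
For orientation, a Student's t-shaped peak can be generated directly from the density formula. The sketch below corresponds loosely to a manual curve: the helper name is hypothetical, and how `studentfit` scales or widens the curve internally is not documented above, so the unit-scale density multiplied by `alpha` is an assumption.

```python
import math
import numpy as np

def student_curve_sketch(x, pos, v=1.0, alpha=1.0):
    """alpha times the Student's t density with v degrees of freedom, centered at pos."""
    x = np.asarray(x, dtype=float)
    norm = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return alpha * norm * (1 + (x - pos) ** 2 / v) ** (-(v + 1) / 2)

x = np.linspace(510.0, 530.0, 201)
# With v = 1 the curve is a Lorentzian (Cauchy) profile.
curve = student_curve_sketch(x, pos=520.0, v=1.0, alpha=2.0)
```

This shows why the maximum of the curve is proportional to `alpha` without being equal to it: the peak height is `alpha` times the normalization constant of the distribution.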

functions.subtractref(data, ref, axis=0, alpha=0.9, sample=0, lims=[0, 0], plot=False)[source]

Subtracts a reference spectrum from the measurements.

Parameters:
  • data (list[float]) – List of or single vector.

  • ref (list[float]) – reference data to remove.

  • axis (list[float]) – Axis for both ‘data’ and ‘ref’, only for plotting purposes.

  • alpha (float) – Manual multiplier. The default is 0.9.

  • sample (int) – Sample spectra to work with. The default is 0.

  • lims (list[int]) – Limits of the plot.

  • plot (bool) – To plot or not a visual aid. The default is False.

Returns:

Data with the subtracted reference.

Return type:

list[float]
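
A scaled reference subtraction can be written in one line with NumPy broadcasting. This sketch assumes that the multiplier is applied to the reference before subtraction (data - alpha * ref); the description above only calls `alpha` a manual multiplier, so the exact formula is an assumption.

```python
import numpy as np

ref = np.array([1.0, 2.0, 1.0])       # reference spectrum, e.g. a substrate
data = np.array([
    [2.0, 5.0, 2.0],
    [1.5, 3.0, 1.5],
])                                     # two measurements on the same axis
alpha = 0.5                            # manual scaling of the reference

# Broadcasting subtracts the scaled reference from every row.
result = data - alpha * ref
```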

functions.trim(y, start=0, finish=0)[source]

Deletes columns in a list from start to finish.

Parameters:
  • y (list) – Data to be trimmed.

  • start (int, optional) – Position of the starting point. The default is 0.

  • finish (int, optional) – Position of the ending point (not included). The default is 0.

Returns:

Trimmed data.

Return type:

list[]
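
The half-open range described above (start included, finish excluded) behaves like Python slicing. A small pure-Python sketch, with a hypothetical helper name, for a single vector or a list of vectors:

```python
def trim_sketch(y, start=0, finish=0):
    """Drop positions start..finish-1 from a vector or from every row of a list."""
    if y and isinstance(y[0], (list, tuple)):
        return [list(row[:start]) + list(row[finish:]) for row in y]
    return list(y[:start]) + list(y[finish:])

single = trim_sketch([0, 1, 2, 3, 4, 5], start=1, finish=3)
several = trim_sketch([[0, 1, 2, 3], [4, 5, 6, 7]], start=0, finish=2)
```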

functions.typical(y)[source]

Looks for the typical spectrum of a dataset. In other words, it calculates the average spectrum and looks for the one that is closest to it in relation to standard deviation. Do not confuse with a representative spectrum.

Parameters:

y (list[list[float]]) – A list of spectra.

Returns:

The typical spectrum of the set.

Return type:

list[float]
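
The difference between a typical spectrum (closest to the mean) and a representative spectrum (closest to the median) shows up when the set contains an outlier. The sketch below uses the Euclidean distance as the closeness measure, which is an assumption; the functions documented here describe closeness in relation to standard deviation, and `closest_to_center` is a hypothetical helper.

```python
import numpy as np

def closest_to_center(spectra, center="mean"):
    """Index of the spectrum nearest to the mean or median spectrum (Euclidean)."""
    data = np.asarray(spectra, dtype=float)
    ref = data.mean(axis=0) if center == "mean" else np.median(data, axis=0)
    distances = np.linalg.norm(data - ref, axis=1)
    return int(np.argmin(distances))

spectra = [
    [1.0, 2.0, 1.0],
    [1.1, 2.1, 1.1],
    [1.2, 2.2, 1.2],
    [1.3, 2.3, 1.3],
    [9.0, 9.0, 9.0],  # outlier: pulls the mean, barely affects the median
]
typical_idx = closest_to_center(spectra, "mean")
representative_idx = closest_to_center(spectra, "median")
```

Here the outlier drags the mean upward, so the two criteria select different spectra.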

functions.valtoind(vals, x)[source]

Translates values in an axis to their indices in that axis. It searches for the position of each value and approximates to the closest one.

Parameters:
  • vals (list[float]) – List of values to be searched and translated.

  • x (list[float]) – Axis.

Returns:

Index, or position, in the axis of the values in vals

Return type:

list[int], int
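
A nearest-index lookup of this kind can be sketched with NumPy as follows. The helper name is hypothetical; it returns an int for a scalar input and a list of ints for a list, mirroring the return type stated above.

```python
import numpy as np

def nearest_index_sketch(vals, x):
    """Index of the axis entry closest to each requested value."""
    x = np.asarray(x, dtype=float)
    scalar = np.isscalar(vals)
    vals_arr = np.atleast_1d(np.asarray(vals, dtype=float))
    # Distance from every requested value to every axis position.
    idx = np.abs(x[None, :] - vals_arr[:, None]).argmin(axis=1)
    return int(idx[0]) if scalar else idx.tolist()

axis = [100.0, 100.5, 101.0, 101.5, 102.0]
one = nearest_index_sketch(100.6, axis)
many = nearest_index_sketch([100.1, 101.9], axis)
```

This is useful for functions that take positions as indices when the peak location is only known in axis units.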

functions.vectortoimg(y, negatives='remove', inverted=False)[source]

Transforms 1-D vectors into an image. Useful for more insightful results when performing deep learning.

Parameters:
  • y (list[float]) – vector or list of vectors.

  • negatives (str) – Specifies what to do with negative values. If remove, negative values are set to 0. If globalmin, all the data is shifted up so that the global minimum is 0. Default is remove.

  • inverted (boolean) – If True, inverts the process to take an image to a vector. Default is False.

Return type:

list[list[list[int]]]

Returns:

3-D list containing 0 or 1.
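
The two options for negative values can be illustrated in isolation. The sketch below covers only that preprocessing step, not the conversion to an image, and the helper name is hypothetical.

```python
import numpy as np

def handle_negatives_sketch(y, negatives="remove"):
    """Make a vector non-negative by clipping or by shifting (illustrative only)."""
    y = np.asarray(y, dtype=float)
    if negatives == "remove":
        return np.clip(y, 0, None)   # negative values become 0
    if negatives == "globalmin":
        return y - y.min()           # shift so that the global minimum is 0
    raise ValueError(f"unknown option: {negatives!r}")

clipped = handle_negatives_sketch([-2.0, 0.0, 3.0], "remove")
shifted = handle_negatives_sketch([-2.0, 0.0, 3.0], "globalmin")
```

Clipping discards information below zero, while shifting preserves the shape of the vector at the cost of changing its absolute values.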

functions.voigtfit(y=None, x=None, pos=0, look=5, shift=2, gamma=5, sigma=4, alpha=1, manual=False)[source]

Fits a peak as an optimization problem, or as a manual fit, for a Voigt distribution, also known as a convoluted Gaussian-Lorentz curve. A curve y is only needed for the optimization (manual=False, default). If no axis x is defined, then a default axis is generated for both options. It is recommended to normalize the data before fitting.

Parameters:
  • y (list[float]) – Data to fit. Single vector.

  • x (list[float]) – x axis.

  • pos (int) – X axis position of the peak.

  • look (int) – axis positions to look to each side in axis units. The default is 5.

  • shift (int) – Possible axis shift of the peak in axis units. The default is 2.

  • gamma (float) – Initial value of fit. The default is 5.

  • sigma (float) – Initial value of fit. The default is 4.

  • alpha (float) – Multiplier of the fitting. The maximum value of the fitting is proportional to this value, but is not necessarily its value. The default is 1.

  • manual (boolean) – If True, 1 curve will be generated using the declared parameter gamma and perform a manual fit. Default is False.

Returns:

Fitted curve.

Return type:

list[float]
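
Fitting a Voigt profile as an optimization problem can be sketched with SciPy, using `scipy.special.voigt_profile` for the curve and `scipy.optimize.curve_fit` for the optimization. This is an independent illustration, not the spectrapepper implementation: the model function, the starting values, and the bounds are all assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import voigt_profile

def voigt_curve(x, pos, sigma, gamma, alpha):
    """alpha times a Voigt profile centered at pos (sigma: Gaussian, gamma: Lorentzian)."""
    return alpha * voigt_profile(x - pos, sigma, gamma)

# Synthetic, noise-free peak standing in for measured data.
x = np.linspace(-50.0, 50.0, 501)
y = voigt_curve(x, pos=3.0, sigma=4.0, gamma=5.0, alpha=2.0)

# Initial guess and bounds keep the widths positive during the optimization.
p0 = [0.0, 3.0, 3.0, 1.0]
bounds = ([-10.0, 1e-3, 1e-3, 0.0], [10.0, 50.0, 50.0, 100.0])
popt, _ = curve_fit(voigt_curve, x, y, p0=p0, bounds=bounds)
fitted = voigt_curve(x, *popt)
```

On real data, sigma and gamma are strongly correlated, so the fitted curve is usually more reliable than the individual width values.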