Orange-Spectroscopy documentation¶
Orange-Spectroscopy is an add-on for Orange for the analysis of spectral data.
To use it, download and install the Quasar distribution of Orange, which comes with Orange-Spectroscopy pre-installed. You can also install it into your Orange data mining suite with the "Add-ons" menu (Options->Add-ons).
For an introduction to Orange and widgets from this add-on, see the following YouTube channels:
- Getting started with Orange - introduces data analysis with Orange
- Spectral Orange - tutorials that use the Spectroscopy add-on on spectral data
Widgets¶
Spectra¶
Visually explore series of spectra with no spatial information.
Inputs
- Data: input dataset
- Data Subset: subset of the data
Outputs
- Selection: selected spectra
The Spectra widget allows visual exploration of multiple spectra. To output some spectra, select them by clicking. For multiple selection, hold the modifier key (Ctrl or Cmd) or use line selection (see the plot options menu). Selected spectra will appear dashed.
- Open the plot options menu
- A spectrum
- The X and Y position of the cursor
- The legend (appears only if spectra are colored)
Navigation
- Click + drag: move the plot
- Right-click: zoom to fit
- Right-click + drag: zoom with mouse movement
- Scroll: zoom X axis
- Scroll + modifier: zoom Y axis
Plot options
- Resample curves (R): resample the displayed subset (only a subset of the spectra is displayed for performance)
- Resampling reset (Mod + R): resample to the default view
- Zoom in (Z): zoom to a region (selected afterwards)
- Zoom to fit (Backspace): return to the original plot
- Rescale Y to fit (D): rescale the Y axis to fit the screen (useful if zoomed-in)
- Show averages (A): show the average and standard deviation (per group)
- Show grid (G): show the grid for a better inspection of the plot
- Invert X (X): invert the order of the X axis
- Add Peak Label (P): add an adjustable vertical line with a label. Remove lines with a right click.
- Select (line) (S): select the spectra touching a line (draw the line with the mouse)
- Save graph (Mod + S): export the visualization to an image
- Define view range: define a specific range to display
- Color by: a categorical feature for coloring
- Title, X-axis, Y-axis: annotate the plot
Example¶
The Spectra widget is used to visualize spectral data. The X axis normally shows wavenumbers, while the Y axis shows the absorbance. We will plot the Liver spectroscopy data from the Datasets widget as an example.
We have used the Color by option to display the type of each spectrum. Alternatively, press 'C' and the plot will show colors. Colors are defined with the data; to change them, use the Color widget.
Now, let's say I am interested in the spectra that are clearly separated from the rest at a wavenumber of around 1027. I will press 'S' and drag a line. This selects the spectra that the line crosses.
I can observe the selection in another Spectra widget or use it for further analysis.
HyperSpectra¶
Plots a 2D map of hyperspectra.
Inputs
- Data: input dataset
Outputs
- Selection: spectra from selected area
- Data: dataset with information whether a spectrum was selected or not
The HyperSpectra widget plots hyperspectra that were read from a .map file. To use this widget with infrared spectral data, you need to transform it with the Reshape Map widget.
At the top, HyperSpectra shows a 2D map of a slice of the spectra. At the bottom, a spectra plot is shown with the red line indicating the wavenumber slice we are observing at the top.
- Image values: define the transformation of the spectra (usually an integral) that produces the plotted values, or choose a feature to use as the values. The transformation can be an integral from 0, integral from baseline, peak from 0, peak from baseline, closest value, X-value of maximum from 0 or X-value of maximum from baseline.
- The hyperspectral plot of the slice of the spectra.
- Zoom in (Z): zoom in to the area selected from the hyperspectral plot
- Zoom to fit (Backspace): return to the original plot
- Select (square) (S): select an area from the plot by clicking at the top left corner and then the bottom right corner of the desired selection area
- Select (polygon) (P): select an area by circumscribing a polygon
- Save graph (Mod + S): save the visualization as a .png, .svg or .pdf file.
- Axis x: define the attribute for the x axis
- Axis y: define the attribute for the y axis
- Color: select the color for the plot
- The spectral plot of the selected image region. It behaves like the Spectra widget.
- Region selectors for the chosen integration method.
- Split between image and spectral view: move it to increase the image size.
Interpolate¶
Interpolate spectra.
Inputs
- Data: input dataset
- Points: a reference data set
Outputs
- Interpolated Data: aligned dataset
The Interpolate widget enables you to align datasets with different wavenumbers. It has automatic interpolation or you can provide the reference data set to align with.
- Enable automatic interpolation: creates a new domain, which consequently enables interpolation of values on test data.
- Linear interval:
- Min: minimum cutoff
- Max: maximum cutoff
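- Delta: the step between consecutive points of the interval (for example, a delta of 10 places a point every 10 wavenumber units between Min and Max)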
- Reference data: the data is aligned to the reference data
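The Linear interval option can be pictured as linear interpolation onto an evenly spaced grid. The following is a minimal sketch in plain Python, not the widget's implementation; the function names `linear_grid` and `interpolate`, the handling of out-of-range points (NaN), and the sample values are assumptions made for illustration.

```python
from bisect import bisect_left


def linear_grid(start, stop, delta):
    """Return evenly spaced points start, start + delta, ... up to stop."""
    count = int((stop - start) / delta + 1e-9) + 1
    return [start + i * delta for i in range(count)]


def interpolate(xs, ys, new_xs):
    """Linearly interpolate (xs, ys) at new_xs; xs must be sorted ascending.

    Points outside the measured range get NaN instead of being extrapolated
    (an assumption of this sketch).
    """
    result = []
    for x in new_xs:
        if x < xs[0] or x > xs[-1]:
            result.append(float("nan"))
            continue
        j = bisect_left(xs, x)
        if xs[j] == x:
            # The new point coincides with a measured point.
            result.append(ys[j])
        else:
            x0, x1 = xs[j - 1], xs[j]
            y0, y1 = ys[j - 1], ys[j]
            result.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return result


# A spectrum measured at irregular wavenumbers (illustrative values).
wavenumbers = [1000.0, 1004.0, 1010.0, 1020.0]
absorbance = [0.10, 0.18, 0.30, 0.20]

grid = linear_grid(1000.0, 1020.0, 10.0)   # Min, Max, Delta
aligned = interpolate(wavenumbers, absorbance, grid)
```

Two data sets interpolated onto the same grid share the same wavenumbers, which is what makes them comparable column by column.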
Examples¶
The first example shows how to use Interpolate to align the train and test data set of spectral data. We will use collagen-interpolate-train.tab to train our model. Let us load the data with the File widget. Then connect it to Test & Score and add a learner, say, Logistic Regression. The scores in Test & Score look great.
Now we would like to test on a separate data set, which has different wavenumbers. We will use collagen-interpolate-test.tab for testing. If we connect this data directly to Test & Score and select the option Test on test data, our results will be horrible. What has happened?
Well, Orange couldn't find any similarity between the two datasets, since the wavenumbers differ. This is why we need to interpolate first, to align the two data sets to the same scale. I will insert the Interpolate widget between File - Test and Test & Score. I will also provide the File - Train as the reference data set and select this as an option in Interpolate. Now the results in Test & Score are much better.
The second use case is a tad more advanced. We will use Interpolate to determine how much granularity we can afford to lose in our measurement. Say we wish to perform a diagnostic much faster. Could we measure only every 10th wavenumber? Or every 50th?
We will use the Liver spectroscopy data from the Datasets widget. Connect the widget to Interpolate and use the Linear interval option. The delta is set to 10. Then observe the performance of predictive models in Test & Score. Use any classifier you want; we chose Logistic Regression and Random Forest. The AUC is quite high.
Now, set the delta to, say, 50 and observe how the AUC changes. Not much. Try setting the delta to 100 or 150. The AUC is still high, which means the classifier is stable even at such a low resolution. This is a nice way to determine how much granularity you can afford to lose to be still able to achieve a good separation between class values.
Preprocess Spectra¶
Construct a data preprocessing pipeline.
Inputs
- Data: required input data set
- Reference: optional reference data set used in some preprocessing methods
Outputs
- Preprocessed Data: transformed data set
- Preprocessor: preprocessing methods
The Preprocess Spectra widget applies a series of preprocessing methods to spectral data. You can select the preprocessing method from the list and press the triangle button on the right to visualize the result. The order of the preprocessing matters, so to change the order of the preprocessing, just drag and drop the method to its proper place.
The input data for the selected method is displayed in the top plot, while the preprocessed data is shown in the bottom plot.
You can observe each preprocessing step by pressing the triangle button on the right. To apply all of them and observe the final result plot, press Final preview. To output the data, press Commit.
The reference data set is processed along with the input data: only the first preprocessor sees the reference exactly as it arrived on the input, while each later preprocessor receives a reference that has already been processed by the preceding steps. If the reference needs to stay fixed, split your preprocessing methods among multiple Preprocess Spectra widgets and connect references accordingly.
Below is an example of the Preprocess Spectra widget in action with some explanation of the main features.
- Add a preprocessor from the dropdown menu.
- Preview plot with its editor menu like in the Spectra widget. The top plot shows the data before and the bottom after preprocessing.
- Preview a single preprocessor (the upper plot will show its input, the plot below its output).
- Observe the final result of preprocessing by clicking the Final Preview button. Change the number of spectra shown in the plot.
- Press Commit to calculate and output the preprocessed data.
Preprocessing Methods¶
- Cut (keep): Select the cutoff value of the spectral area you wish to keep.
- Cut (remove): Select the cutoff value of the spectral area you wish to discard.
- Gaussian smoothing: apply Gaussian smoothing.
- Savitzky-Golay Filter: apply Savitzky-Golay filter.
- Baseline Correction: correct the baseline
- Normalize Spectra: apply normalization.
- Vector normalization: divides each spectrum by its L2 norm
- Min-Max normalization: divides each spectrum by its Ymax - Ymin range
- Area normalization: provides several methods, also allows the selection of a specific range for the calculation
- Attribute normalization: normalize each spectrum with one of the available pre-calculated attributes
- Standard Normal Variate (SNV): \(\tilde{X}^{SNV}_i = (X_i - \tilde{X}_i) / \sigma_i\), where \(\tilde{X}_i\) is the mean and \(\sigma_i\) the standard deviation of spectrum \(i\)
- Normalize by Reference: divides each spectrum by the reference spectrum on the input
- Integrate: compute integrals of selected area. Similar to the Integrate Spectra widget.
- PCA denoising: denoise the data with PCA.
- Transmittance to Absorbance: convert from transmittance to absorbance spectra.
- Absorbance to Transmittance: convert absorbance spectra to transmittance.
- Shift spectra: shift each spectrum by a constant amount.
- EMSC (Extended Multiplicative Signal Correction): special Norwegian method.
- Spike Removal: removes spikes in spectra through a modified z-score.
- Asymmetric Least Squares Smoothing: three ALS methods which can be used for baseline subtraction.
- Atmospheric gas correction: remove H2O/CO2 contributions using a reference spectrum.
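To make two of the normalization methods listed above concrete, here is a minimal sketch of vector normalization and SNV in plain Python. It is an illustration rather than the add-on's code: the function names are hypothetical, and using the population standard deviation for SNV is an assumption.

```python
import math
import statistics


def vector_normalize(spectrum):
    """Divide a spectrum by its L2 (Euclidean) norm."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum]


def snv(spectrum):
    """Standard Normal Variate: subtract the mean, divide by the std."""
    mean = statistics.fmean(spectrum)
    std = statistics.pstdev(spectrum)  # population std (an assumption)
    return [(v - mean) / std for v in spectrum]


unit = vector_normalize([3.0, 4.0])          # illustrative values
scaled = snv([1.0, 2.0, 3.0, 4.0, 5.0])
```

After vector normalization the spectrum has unit length; after SNV it has zero mean and unit standard deviation, so both remove overall intensity differences between spectra.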
Example¶
Normally, we would use Preprocess Spectra at the beginning of the analysis. We will use the liver spectroscopy data from the Datasets widget.
In Preprocess Spectra we will select a couple of preprocessing methods and observe their output. First, let us use the Baseline Correction which removes the baseline from the spectra.
Then we will cut an area of interest with the Cut (keep) method. To set the area we wish to keep, drag the red lines left or right in the plot. You will see how the bottom plot changes as the selection changes.
To see the end result of preprocessing, press Final preview and once you are satisfied with the results, press Commit. We can observe the end result in a Spectra widget or use the preprocessed data in the downstream analysis.
Integrate Spectra¶
Integrate spectra in various ways.
Inputs
- Data: input dataset
Outputs
- Integrated Data: data with integrals appended
- Preprocessor: preprocessing method
The Integrate Spectra widget allows you to add integrals to your data by selecting regions of interest and integrating them with several methods.
- Add integral:
- Integral from 0:
- Integral from baseline:
- Peak from 0:
- Peak from baseline:
- Closest value:
- X-value of maximum from 0:
- X-value of maximum from baseline
- Toggle preview.
- Preview plot with its editor menu like in the Spectra widget.
- Show a subsample of the spectra (implemented for performance).
- Output integrals as meta attributes. Otherwise only integrals will be output. Commit to send the changes to the output.
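The difference between "Integral from 0" and "Integral from baseline" can be sketched with the trapezoid rule. This is a simplified illustration, not the widget's implementation: the function names are hypothetical, the baseline is assumed to be a straight line joining the spectrum at the two limits, and only measured points inside the limits are used (no interpolation at the limits).

```python
def trapezoid(xs, ys):
    """Area under the piecewise-linear curve through the points (xs, ys)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def region(xs, ys, low, high):
    """Return the measured points whose x lies within [low, high]."""
    points = [(x, y) for x, y in zip(xs, ys) if low <= x <= high]
    px, py = zip(*points)  # assumes at least one point inside the limits
    return px, py


def integral_from_zero(xs, ys, low, high):
    """Area between the spectrum and y = 0 inside the limits."""
    px, py = region(xs, ys, low, high)
    return trapezoid(px, py)


def integral_from_baseline(xs, ys, low, high):
    """Area between the spectrum and a line joining its edge points."""
    px, py = region(xs, ys, low, high)
    baseline_area = (px[-1] - px[0]) * (py[0] + py[-1]) / 2
    return trapezoid(px, py) - baseline_area


# A single peak on a constant offset of 1.0 (illustrative values).
xs = [1000.0, 1010.0, 1020.0, 1030.0, 1040.0]
ys = [1.0, 1.0, 3.0, 1.0, 1.0]

from_zero = integral_from_zero(xs, ys, 1010.0, 1030.0)
from_baseline = integral_from_baseline(xs, ys, 1010.0, 1030.0)
```

In this example the integral from 0 includes the constant offset, while the integral from baseline measures only the peak above it.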
Example¶
This is a simple example of how to use the Integrate Spectra widget. The widget provides many options for integrating spectral areas, and the results are appended as additional columns to the data.
We are using the liver spectroscopy data set from the Datasets widget. In Integrate Spectra we have selected integral from 0 and set the lower and upper limit with the red lines. We could also do it by setting the Low limit and High limit values on the left.
To observe the integrated area, we need to press the triangular play button next to the method. To output the data, we need to press Commit.
Finally, we can observe the additional column with the integral values of the area in a Data Table.
Multifile¶
Read data from input files and send a data table to the output.
Outputs
- Data: a data table of all the loaded files
The Multifile widget loads data from different sources and works like the Concatenate widget for spectroscopy. The widget will output a union of attributes and features, with missing values for non-matching wavenumbers. To interpolate missing data, use the Interpolate widget.
- Loaded files.
- Load local files.
- Remove the selected file.
- Clear all files.
- Label the concatenated data.
- Reload the files.
- Domain editor. Features can be edited by double-clicking on them. The user can change the attribute names, select the type of variable per each attribute (Continuous, Nominal, String, Datetime), and choose how to further define the attributes (as Features, Targets or Meta). The user can also decide to ignore an attribute.
- Add Multifile to the report. Apply to commit the changes.
Example¶
Here is a simple example of how to use the Multifile widget. We have loaded two data sets that were stored on our local machine. We used the folder icon to access the files and load them. Now our files are displayed in the top box. We have labelled the files collagen to make it clear what they are about.
We can observe the concatenated data in the Spectra widget or in a Data Table.
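The union of attributes that Multifile produces can be pictured with a small sketch. It is written in plain Python under stated assumptions: each file is represented as a hypothetical `(wavenumbers, rows)` pair, and missing values are stored as NaN.

```python
import math


def concatenate_spectra(tables):
    """Stack spectra from several tables onto the union of all wavenumbers.

    tables: list of (wavenumbers, rows) pairs, where each row holds one value
    per wavenumber. Values at wavenumbers a table lacks become NaN.
    """
    union = sorted({w for wavenumbers, _ in tables for w in wavenumbers})
    combined = []
    for wavenumbers, rows in tables:
        for row in rows:
            lookup = dict(zip(wavenumbers, row))
            combined.append([lookup.get(w, math.nan) for w in union])
    return union, combined


# Two files that share only the wavenumber 1010 (illustrative values).
file_a = ([1000.0, 1010.0], [[0.1, 0.2]])
file_b = ([1010.0, 1020.0], [[0.3, 0.4], [0.5, 0.6]])

union, combined = concatenate_spectra([file_a, file_b])
```

The NaN entries are the missing values that appear for non-matching wavenumbers; interpolating both files onto a common grid beforehand avoids them.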
Tile File¶
Read data tile-by-tile from input file(s), preprocess the spectra, and send a data table to the output.
Inputs
- Preprocessor: A preprocessor list from the Preprocess Spectra widget
Outputs
- Data: preprocessed dataset read from the input file(s)
The Tile File widget loads data from compatible mosaic spectral images and applies the supplied preprocessor(s) to the data. The preprocessing is applied one mosaic tile at a time, and the resulting processed dataset is combined into a single Data Table.
At least one of the preprocessors should reduce the dataset size (such as Cut, Integrate) to take advantage of this file loader and reduce total memory usage.
By default, the widget will not load the dataset automatically. This prevents loading a large dataset into memory before the desired preprocessor chain is configured. Press the "Reload" button to load the data.
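Why a size-reducing preprocessor matters can be shown with a minimal sketch of tile-by-tile loading. This is not the widget's code: `cut_keep` and `load_tiles` are hypothetical helpers, and a tile is assumed to be a list of spectra produced lazily by a generator.

```python
def cut_keep(wavenumbers, spectrum, low, high):
    """Keep only the points whose wavenumber lies within [low, high]."""
    kept = [(w, v) for w, v in zip(wavenumbers, spectrum) if low <= w <= high]
    return [w for w, _ in kept], [v for _, v in kept]


def load_tiles(tiles, wavenumbers, low, high):
    """Preprocess one tile at a time and collect only the reduced spectra."""
    combined = []
    kept_wavenumbers = None
    for tile in tiles:  # each tile is read, reduced, then released
        for spectrum in tile:
            kept_wavenumbers, reduced = cut_keep(wavenumbers, spectrum, low, high)
            combined.append(reduced)
    return kept_wavenumbers, combined


def read_tiles():
    """Stand-in for reading mosaic tiles from disk (illustrative values)."""
    yield [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    yield [[9.0, 10.0, 11.0, 12.0]]


wavenumbers = [1000.0, 1100.0, 1200.0, 1300.0]
kept, table = load_tiles(read_tiles(), wavenumbers, 1100.0, 1200.0)
```

Only one full-size tile is held at a time, while the combined table stores the cut spectra, so peak memory depends on the tile size and the reduced width rather than on the whole mosaic.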
- Browse through previously opened data files, or load any of the sample ones.
- Browse for a data file.
- (Re)loads currently selected data file.
- Insert data from URL addresses.
- Information on the preprocessed dataset: dataset size, number and types of data features.
- Additional information on the features in the preprocessed dataset. Features can be edited by double-clicking on them. The user can change the attribute names, select the type of variable per each attribute (Continuous, Nominal, String, Datetime), and choose how to further define the attributes (as Features, Targets or Meta). The user can also decide to ignore an attribute.
- Browse documentation datasets.
- Information on the applied preprocessor list.
- Produce a report.
Example¶
Here is a simple example of how to use the Tile File widget. We configured a preprocessor list in Preprocess Spectra and connected the Preprocessor output to the input of the Tile File widget. We have loaded a mosaic data set that was stored on our local machine. We used the folder icon to access the file and load it. We checked the preprocessor that will be applied and pressed "Reload" to load the data. Now information about the preprocessed dataset is displayed in the info box and domain editor.
We can observe the preprocessed data in the HyperSpectra widget or in a Data Table.
This example workflow can be found in Help/Example Workflows.
Average Spectra¶
Average spectra.
Inputs
- Data: input dataset
Outputs
- Averages: averaged dataset
The Average Spectra widget enables you to calculate average spectra. It can output the average of the entire dataset, or average into groups defined by a Categorical feature.
Use Group by to output averages defined by a Categorical feature.
Columns of non-Numerical data will return a value if every row in that group has the same value, otherwise it will return Unknown.
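The grouping rules above can be sketched as follows. This is a plain-Python illustration with hypothetical names; `None` stands in for Orange's Unknown value, and the `operators` column is invented for the example.

```python
from collections import defaultdict


def average_by_group(spectra, groups, labels):
    """Average spectra within each group.

    spectra: list of equal-length lists of floats
    groups:  the Group by value of each row
    labels:  a non-numeric column (one value per row)
    Returns {group: (mean_spectrum, label)}; label is the shared value, or
    None (standing in for Unknown) when the rows of a group disagree.
    """
    rows = defaultdict(list)
    for spectrum, group, label in zip(spectra, groups, labels):
        rows[group].append((spectrum, label))

    result = {}
    for group, members in rows.items():
        columns = zip(*(spectrum for spectrum, _ in members))
        mean = [sum(col) / len(members) for col in columns]
        distinct = {label for _, label in members}
        result[group] = (mean, distinct.pop() if len(distinct) == 1 else None)
    return result


spectra = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
groups = ["collagen", "collagen", "glycogen"]
operators = ["anna", "ben", "anna"]  # hypothetical text column

averages = average_by_group(spectra, groups, operators)
```

Here the two collagen rows disagree on the text column, so their average carries an unknown value, while the single glycogen row keeps its value.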
Interferogram to Spectrum¶
Performs Fast Fourier Transform on an interferogram, including zero filling, apodization and phase correction.
Inputs
- Interferogram: input interferogram
Outputs
- Spectra: dataset with spectra
- Phases: phases
Reshape Map¶
Builds or modifies the shape of the input dataset to create 2D maps from series data or change the dimensions of existing 2D datasets.
Inputs
- Data: input dataset
Outputs
- Map Data: data as a map
The Reshape Map widget transforms the input data to a map.
- Map shape:
- The X dimension.
- The Y dimension.
- Send data automatically or press Send.
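Building a map from series data amounts to assigning an (x, y) position to each spectrum. The sketch below shows one way to do this; the function name is hypothetical, and the row-by-row ordering (x varies fastest) is an assumption rather than a documented property of the widget.

```python
def map_coordinates(n_spectra, x_dim, y_dim):
    """Assign (x, y) map positions to a series of spectra, row by row."""
    if x_dim * y_dim != n_spectra:
        raise ValueError("x_dim * y_dim must equal the number of spectra")
    # Spectrum i lands in column i % x_dim of row i // x_dim.
    return [(i % x_dim, i // x_dim) for i in range(n_spectra)]


coords = map_coordinates(6, x_dim=3, y_dim=2)
```

The check reflects that the X and Y dimensions together must account for every spectrum in the series.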
PLS¶
Partial Least Squares Regression widget for multivariate data analysis.
Inputs
- Data: input dataset
- Preprocessor: preprocessing method(s)
Outputs
- Learner: PLS regression learning algorithm
- Model: trained model
- Coefficients: PLS regression coefficients
The PLS (Partial Least Squares) widget acts as a regressor for data with a numeric target variable. In its current implementation, it is the same as linear regression, but with a different kind of regularization. Here, regularization is performed through the choice of the number of components: the more components, the weaker the regularization.
The PLS widget can output coefficients, just like Linear Regression. One can observe the effect of each variable in a Data Table.
- The learner/predictor name
- Parameters:
- Components: the number of components of the model, which act as regularization (the more components, the lesser the regularization)
- Iteration limit: maximum iterations for stopping the algorithm
- Press Apply to commit changes. If Apply Automatically is ticked, changes are committed automatically.
Example¶
Below is a simple workflow with the housing dataset. We trained PLS and Linear Regression and evaluated their performance in Test & Score.
Peak Fit¶
Fit data to a composite peak model.
Inputs
- Data: Input data set
Outputs
- Fit Parameters: Best fit values for the model parameters
- Fits: Total evaluated best fit
- Residuals: Difference between Fits and Data
- Data: Input data set annotated with Fit Parameters
The Peak Fit widget computes the least-squares minimization curve fit for arbitrary, user-defined composite peak models. It outputs the best fit parameters for the defined model and the resulting total fit.
- Add a model component from the dropdown menu.
- Input model initial parameters and constraints.
- Visualize the initial peak and peak color.
- Select subsample of data for preview fit calculation.
- Preview plot of fit results for subsample. Center line of selected model is visualized along with the fit results for the selected curve:
- Black dash: selected curve
- Red line: total fit
- Colored line: individual component fit
- Colored dash: individual component initial values
- Light black: other subsample spectra which can be selected
- Commit to start fit calculation on entire dataset.
Models and Parameters¶
The Peak Fit widget uses the excellent lmfit package for model definitions and non-linear optimization calculations.
The varied model parameters are specific to the model; however, each peak-like model includes at least:
- center: the centroid x value of the peak
- amplitude: multiplicative factor for peak strength or area
- sigma: the characteristic width of the peak
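As an illustration of how these three parameters enter a line shape, the Gaussian and Lorentzian models can be written as follows, with \(A\) for amplitude, \(\mu\) for center and \(\sigma\) for sigma. The forms below follow the definitions given in the lmfit documentation as we understand them; the symbols are our notation.

\(f_{Gaussian}(x; A, \mu, \sigma) = \frac{A}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)\)

\(f_{Lorentzian}(x; A, \mu, \sigma) = \frac{A}{\pi} \cdot \frac{\sigma}{(x - \mu)^2 + \sigma^2}\)

With these normalizations, \(A\) is the area under the peak rather than its height, which is why amplitude is described as a factor for peak strength or area.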
The following models are available:
- Gaussian: A model based on a Gaussian or normal distribution function.
- Lorentzian: A model based on a Lorentzian or Cauchy-Lorentz distribution function.
- Split Lorentzian: A Lorentzian model with independent left/right width parameters.
- Voigt: A model based on a Voigt distribution function.
- pseudo-Voigt: A Voigt approximation from a weighted sum of Gaussian and Lorentzian functions.
- Moffat: A model based on the Moffat distribution function.
- Pearson VII: A model based on a Pearson VII distribution.
- Student's t: A model based on a Student’s t-distribution function.
- Breit-Wigner-Fano: A model based on a Breit-Wigner-Fano function.
- Log-normal: A model based on the Log-normal distribution function.
- Damped Harmonic Oscillator Amplitude: A model based on the Damped Harmonic Oscillator Amplitude.
- Damped Harmonic Oscillator (DAVE): A Damped Harmonic Oscillator model with the DAVE definition.
- Exponential Gaussian: A model of an Exponentially modified Gaussian distribution.
- Skewed Gaussian: A Gaussian model using a skewed normal distribution.
- Skewed Voigt: A Voigt model using a skewed normal distribution.
- Thermal Distribution: A thermal model based on one of Bose-Einstein, Maxwell-Boltzmann, or Fermi-Dirac distributions.
- Doniach Sunjic: A model of a Doniach-Sunjic asymmetric lineshape.
Some baseline models are included; however, removing baselines beforehand (in Preprocess Spectra) reduces the number of varied parameters in the model and may improve fitting performance.
Constraints¶
Each of the varied parameters can have constraints applied to improve fitting performance.
The type of constraints can be:
- fixed: The parameter is not varied
- limits: Specified minimum and maximum values
- delta: Minimum and maximum values are set to the initial value ± the delta value.
- expr: Some models default to calculating a parameter from another parameter. It is not possible to input custom expressions.
Common uses of these constraints would be:
- Limiting the center position to some range of x values
- Setting a minimum amplitude to force positive peaks
- Setting a maximum sigma to exclude unreasonably wide peaks
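The constraint types can be read as rules that turn an initial value into an allowed interval. The following sketch expresses that reading in plain Python; the function `bounds` and its arguments are hypothetical and do not correspond to the widget's or lmfit's API, and the expr type is left out because it links parameters instead of bounding one.

```python
def bounds(initial, kind, low=None, high=None, delta=None):
    """Return the (min, max) interval within which a parameter may vary."""
    if kind == "fixed":
        # A fixed parameter is not varied: the interval collapses to a point.
        return (initial, initial)
    if kind == "limits":
        return (low, high)
    if kind == "delta":
        return (initial - delta, initial + delta)
    raise ValueError(f"unknown constraint type: {kind}")


# Keep a peak center within 10 units of its initial position.
center_bounds = bounds(1650.0, "delta", delta=10.0)
# Force a positive peak by setting a minimum amplitude of 0.
amplitude_bounds = bounds(1.0, "limits", low=0.0, high=float("inf"))
sigma_bounds = bounds(5.0, "fixed")
```

Tighter intervals shrink the search space of the optimizer, which is how constraints can improve fitting performance.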
SNR¶
Signal-to-Noise Ratio (SNR)
Inputs
- Data: input dataset
Outputs
- Signal-to-noise ratio: signal-to-noise ratio dataset
- SNR = \(\frac{\overline{Spectra_{x, y}}}{\sigma _{x, y}}\)
- Averages: averaged dataset
- Averages = \(\overline{Spectra_{x, y}}\)
- Standard Deviation: standard deviation dataset
- Standard Deviation = \(\sigma _{x, y}\)
The SNR widget computes the SNR, average, or standard deviation of spectra. It can output the results of an entire dataset or by coordinates (x, y).
Use Select axis: x to select an axis that will act as the first element for your coordinate system defined by a numeric meta.
Use Select axis: y to select an axis that will act as the second element for your coordinate system defined by a numeric meta.
In the example above, the result will be:
output = Signal-to-noise ratio(column, row)
SNR = \(\frac{\overline{Spectra_{column, row}}}{\sigma _{column, row}}\)
If you want to select only one axis:
output = Average(x)
Average = \(\overline{Spectra_{column}}\)
or
output = Standard Deviation(x)
Standard Deviation = \(\sigma _{column}\)
If you want the result of the complete data set, you can just leave both as None.
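The three outputs can be sketched as a per-coordinate computation: spectra sharing the same coordinates are grouped, and the average, standard deviation and their ratio are computed at each wavenumber. This is a plain-Python illustration with hypothetical names; the use of the population standard deviation and NaN for a zero standard deviation are assumptions.

```python
import statistics
from collections import defaultdict


def snr_by_coordinate(spectra, coords):
    """Return {coordinate: (average, std, snr)}, each a per-wavenumber list.

    spectra: list of equal-length lists of floats
    coords:  one hashable key per spectrum, e.g. (column, row), a single
             value for one axis, or None for every row to pool all spectra.
    """
    groups = defaultdict(list)
    for spectrum, key in zip(spectra, coords):
        groups[key].append(spectrum)

    result = {}
    for key, members in groups.items():
        columns = list(zip(*members))
        average = [statistics.fmean(col) for col in columns]
        std = [statistics.pstdev(col) for col in columns]  # population std
        snr = [a / s if s else float("nan") for a, s in zip(average, std)]
        result[key] = (average, std, snr)
    return result


# Two repeated measurements at (0, 0) and one at (1, 0), illustrative values.
spectra = [[9.0, 4.0], [11.0, 8.0], [5.0, 5.0]]
coords = [(0, 0), (0, 0), (1, 0)]

out = snr_by_coordinate(spectra, coords)
average, std, snr = out[(0, 0)]
```

A coordinate with a single spectrum has zero standard deviation, so its SNR is undefined in this sketch.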