siMPleR: User Manual

Dennis Walvoort and Willem van Loon

2024-12-19

Introduction

The siMPle FTIR data analysis software produces files containing many microplastic records. Reporting results and running optimization experiments may require processing a substantial number of data files. To make the analysis of these result files correct, efficient, transparent and reproducible, Dutch microplastic monitoring needs a data analysis script. This manual describes the R-package siMPleR, which serves these needs.

The functional requirements and technical specifications are given in van Loon & Walvoort (2024) and are shipped with the siMPleR-package.

Prerequisites

Installing R

One of the design goals was to create a simple R-script that should be able to run in a standard R-console of the R Project for Statistical Computing.

If R is not installed on your computer, you can download it for free from the R Project website (https://www.r-project.org). R is available for MS-Windows, Linux, and macOS. Follow the instructions on that website to install R on your computer. You should install R version 4.4.0 or later.
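To confirm that an existing installation meets the version requirement, the check below can be run in the R-console. It is a minimal sketch that uses only base R; the variable names are illustrative.

```r
# Minimum R version stated in this manual
required_version <- "4.4.0"

# getRversion() returns the version of the running R session
r_ok <- getRversion() >= required_version

if (r_ok) {
  message("R ", getRversion(), " meets the requirement.")
} else {
  message("R ", getRversion(), " is older than ", required_version,
          "; please update R.")
}
```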

Installing required packages

The siMPleR software requires a limited number of packages to run. On MS-Windows, these can be installed via the menu option: Packages | Install package(s)... | Select CRAN mirror | Select packages. You should select the following packages:

dplyr, readr, purrr, stringr, yaml, tidyr, ggplot2

It is often more convenient to install the packages by running the following code in the R-console:

install.packages(c("dplyr", "readr", "purrr", "stringr", "yaml", "tidyr", "ggplot2"))

Copy and paste this line into the R-console and press the return-key. This also works on Linux and macOS.

Installing packages is only needed once.
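If you are unsure which of the required packages are already present, a sketch like the following lists the missing ones and installs only those. The helper name missing_packages is hypothetical and not part of siMPleR; the installation step is skipped in non-interactive sessions.

```r
# Packages required by siMPleR, as listed in this manual
required_pkgs <- c("dplyr", "readr", "purrr", "stringr", "yaml", "tidyr", "ggplot2")

# Hypothetical helper: return the packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required_pkgs)

# Install only what is missing, and only in an interactive session
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```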

Installing siMPleR

The siMPleR-package is provided in a zip file. You should unzip this file in a directory of choice. This will be your working directory.

In R you can select this working directory by typing, e.g.:

setwd("c:/simpler")

in the R-console, where c:/simpler is a placeholder name for your working directory. As an alternative, you can also select the menu option File | Change dir... in the R Console (MS-Windows only).

This working directory now contains the polymer-colours.yaml-file and the following subdirectories:

The siMPleR-package can be installed by typing:

install.packages("./pkg/siMPleR_1.0.0.tar.gz", repos = NULL, type = "source")

in the R console. As an alternative, the package can also be installed via the RGui menu (Packages | Install package(s) from local files…). Installing the siMPleR-package needs to be done only once.

Input file

The siMPle FTIR software processes FTIR instrument files and produces a list of MP identifications. The format of this list is:

These files are included in the input directory. On some computers, the character set used by siMPle FTIR produces garbled characters in the units of the header in the data files. The R-package siMPleR can handle these characters.

In the input directory, all the individual siMPle FTIR files are placed.

Each siMPle FTIR file must have the following filename format:

@location_Y1234_Rx_Ey_G12

The metadata are encoded in the filename: @**** gives the location code, Y*** the year, R* the replicate number, E* the extraction number, and G* the analyzed sample mass in grams. For example: @NW2_Y2023_R1_E2_G20. The order of the metadata is irrelevant. Note that the metadata are connected via underscores. Additional text is allowed in the filename. However, this text should not start with any of the reserved prefixes given above (@, Y, R, E, or G).
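The filename convention can be illustrated with a short base R sketch. This is not the code used inside siMPleR; the function name parse_metadata is hypothetical, and it assumes that a file extension (if any) consists of letters only and that a fractional sample mass (if used) is written with a decimal point.

```r
# Hypothetical helper: extract metadata from a siMPle FTIR filename
parse_metadata <- function(filename) {
  # Drop the directory and a letters-only extension such as ".csv" (assumption)
  stem <- sub("\\.[A-Za-z]+$", "", basename(filename))

  # Metadata fields are connected via underscores
  tokens <- strsplit(stem, "_", fixed = TRUE)[[1]]

  # Return the first token matching `pattern`, without its one-character prefix
  pick <- function(pattern) {
    hit <- grep(pattern, tokens, value = TRUE)
    if (length(hit) == 0) NA_character_ else sub("^.", "", hit[1])
  }

  list(
    location  = pick("^@.+"),
    year      = as.integer(pick("^Y[0-9]{4}$")),
    replicate = as.integer(pick("^R[0-9]+$")),
    extract   = as.integer(pick("^E[0-9]+$")),
    mass_g    = as.numeric(pick("^G[0-9]+(\\.[0-9]+)?$"))
  )
}

meta <- parse_metadata("@NW2_Y2023_R1_E2_G20")
str(meta)
```

Because each field is recognised by its prefix rather than by its position, the same result is obtained when the fields appear in a different order or when extra text is present, for example parse_metadata("G20_extra_R1_E2_Y2023_@NW2.csv").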

In each siMPle file, a QC column (with header QC) must be added manually. This QC column may contain the following QC-codes (added by the analyst or by the script):

The QC field may often be empty, if no QC has been performed on that record. Additional QC codes are allowed in the siMPle files, but are not processed by the script.

Additional metadata can be added to the siMPle FTIR output for reporting purposes.

An example of the desired input format is given in the input directory. This directory is provided with the siMPleR software.

Note that siMPleR is not restricted to analyzing output files from the siMPle FTIR software. As long as the file formats comply with the specifications given above, the output files of any software will do.

Running the script

First you have to select your working directory (see Section “Installing siMPleR”). In the R Console this can be done by means of the menu option: File | Change dir.... A simple browser will appear that you can use to navigate to your working directory.

The siMPleR-package can be loaded by typing library(siMPleR) in the console.

The siMPleR-software can be started by running simpler() in the console. Two questions will be asked:

  1. the minimum particle length to be analysed (default: 50 µm);
  2. if duplicates should be removed (default: yes).

Quality control

The data file will first be checked for possible errors. The following checks will be performed:

  1. Does the siMPleR-input directory exist? The script will give an error message when this directory is not found;
  2. Does the siMPleR-output directory exist? The script will give an error message when this directory is not found;
  3. Are all required column names available? The script will give an error message when a required column is missing. The required column names are ‘Coord. [um]’, ‘Group’, ‘Major dim [um]’, and ‘QC’. The first three column names are given by siMPle. The QC-column has to be added by the user;
  4. Are all required metadata available in the file names? The required metadata are ‘location’, ‘year’, ‘replicate’, ‘extract’, and ‘sample mass [g dw]’;
  5. Are the QC-codes in the QC-column valid?
  6. Are duplicates present? Records are duplicates (or triplicates etc.) if they have exactly the same values for columns ‘location’, ‘year’, ‘replicate’, ‘Coord. [um]’, and ‘Group’. If duplicates are found, only the record with the greatest ‘Major dim [um]’ will be retained.
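The column check (3) and the duplicate rule (6) can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation used in siMPleR: the helper check_required_columns is hypothetical, the toy records (including the format of the coordinate strings) are made up, and duplicates are simply dropped here rather than labelled.

```r
# Toy records; all values, including the coordinate format, are made up
records <- data.frame(
  location         = "NW2",
  year             = 2023L,
  replicate        = 1L,
  `Coord. [um]`    = c("100;200", "100;200", "300;400"),
  Group            = c("PA", "PA", "EVA"),
  `Major dim [um]` = c(80, 120, 60),
  QC               = "",
  check.names      = FALSE
)

# Check 3: stop with an error message when a required column is missing
required_cols <- c("Coord. [um]", "Group", "Major dim [um]", "QC")

check_required_columns <- function(data, required) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_required_columns(records, required_cols)

# Check 6: records sharing these columns are duplicates
key_cols <- c("location", "year", "replicate", "Coord. [um]", "Group")

# Sort so that the greatest major dimension comes first, then keep
# the first record of each key combination
sorted <- records[order(records[["Major dim [um]"]], decreasing = TRUE), ]
deduplicated <- sorted[!duplicated(sorted[key_cols]), ]

print(deduplicated)
```

In the toy data, the two ‘PA’ records share the same key, so only the one with a major dimension of 120 um is retained.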

Data analysis

Only records that have successfully passed quality control will be used in the data analysis.

The script performs several data analysis steps:

The colours of the graphs may be specified by changing the colour codes in polymer-colours.yaml. How to change colour codes is explained in the header section of this file.

Output

siMPleR produces the following outputs, each with a timestamp to uniquely identify individual runs:

Group   total   # QCs   # =   # plastic   # natural   % false pos
APU         7       6     0           6           0             0
EVA         3       3     0           3           1            25
PA          2       2     2           0           0             0

The following classification rules are applied to construct this table. The siMPle records with the QC-codes ‘ppring’, ‘< 50 um’ and ‘duplicate’ are excluded from the calculation because they fall outside the valid dataset. These records can still be found in the basic-data file for information and QC purposes. A QC action using an external database can have three outcomes:

  1. if the external database reports the same polymer, the QC result is ‘=’;
  2. if the external database reports a different polymer, but still a plastic, the result is ‘plastic’;
  3. if the external database reports a natural material (inorganic or organic material), the result is ‘natural’.

The % false positives is calculated as: [#natural / total] × 100%, in which ‘total’ is the valid total number of microplastic particles.
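The false-positive formula can be written as a small R function. The function name false_positive_pct and the counts in the example are hypothetical; the sketch only restates the formula given above.

```r
# Hypothetical helper: % false positives = [#natural / total] x 100
false_positive_pct <- function(n_natural, n_total) {
  stopifnot(n_total > 0, n_natural >= 0, n_natural <= n_total)
  100 * n_natural / n_total
}

# Made-up counts: 1 'natural' QC result among 4 valid particles
pct <- false_positive_pct(n_natural = 1, n_total = 4)
print(pct)
```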

References

Van Loon, W. and D. Walvoort. Technical Specifications of the siMPleR script: post-analysis of microplastic data. Version 13, 14-01-2025