---
layout: post
title: "Measurement errors and dimensional analysis in R"
date: "2020-11-17T18:00:00+01:00"
slug: "measurement-errors"
aliases:
- /2020/11/06/measurement-errors
bibliography:
- /media/bay/taha/sites/hugo/solarchemist/static/bib/library.bib
- /media/bay/taha/sites/hugo/solarchemist/static/bib/references.bib
csl: /media/bay/taha/sites/hugo/solarchemist/static/bib/materials-chemistry-and-physics.csl
link-citations: true
tags: ["R", "Python", "WSL"]
draft: no
comments: false
output:
  blogdown::html_page:
    toc: true
---
```{r packages, echo=F, results='hide', message=FALSE}
library(knitr)
library(errors)
library(constants)
library(units)
library(quantities)
library(common)
library(oceanoptics)
library(dplyr)
library(magrittr)
library(ggplot2)
library(ggforce)
```
```{r global_options, echo=FALSE, results='hide', message=FALSE}
options(
  digits = 7,
  width = 84,
  continue = " ",
  prompt = "> ",
  warn = 0,
  knitr.graphics.error = FALSE,
  stringsAsFactors = FALSE)
opts_chunk$set(
  dev = 'svg',
  fig.align = 'center',
  echo = TRUE,
  eval = TRUE,
  cache = FALSE,
  collapse = TRUE,
  results = 'hide',
  message = FALSE,
  warning = TRUE,
  tidy = FALSE)
```
<p>
<div style="border: 2px solid #f03838; padding-left: 10px; padding-top: 4px; background-color: #EBEBEB; color: black;">
I wrote this blog post in preparation for a hands-on web presentation I gave on Nov 13, 2020.
It's published mostly as-is in the hopes it might benefit others.
[Comments or questions on the text](https://hyp.is/LwjIgikrEeu6xX800vMY4g/solarchemist.se/about/) or
[regarding the code](https://codeberg.org/taha/2020-11-17-measurement-errors/issues) are most welcome!
</div>
</p>
Today I would like to demonstrate how to get automatic error propagation for your data analysis in R
using the [`errors` package by Iñaki Ucar et al.](https://r-quantities.github.io/errors/) @Ucar2018
I would also like to demonstrate how to combine that with dimensional analysis
(with automatic unit conversion) using the [`units` package by Pebesma et al.](https://r-quantities.github.io/units/) @Pebesma2016
This is usually known as *quantity calculus*.
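As a small taste of what this combination enables, here is a minimal sketch (it assumes the `quantities` package is already installed; we get to installation further below):

```r
# a taste of quantity calculus: units are converted and errors propagated
library(quantities)

x <- set_quantities(1.30, "m", 0.05)
y <- set_quantities(25, "cm", 1)
# addition converts cm to m automatically, and propagates the standard errors
x + y
```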
To make it easy for you, dear reader, to follow along, we will take it from the start
and assume all you have is a Windows 10 computer.
If you work on macOS[^1] or Linux, you can [skip right ahead to "Working with measurement errors"](#working-with-quantities-and-measurement-errors)
(but you may have to adapt some of the steps, as they were written with Windows 10 in mind).

[^1]: If you run macOS, you probably want to install [Homebrew, a Linux-like package manager](https://brew.sh).
Most of the fundamental tools used for reproducible scientific work,
such as [R](https://www.r-project.org/),
[knitr](http://kbroman.org/knitr_knutshell/),
[git](https://www.freecodecamp.org/news/what-is-git-and-how-to-use-it-c341b049ae61/),
[Python](https://www.python.org/about/gettingstarted/),
and not to be forgotten, the whole chain of document generation tools such as [LaTeX](https://www.latex-project.org/), [Markdown](https://www.markdownguide.org/), and [pandoc](https://pandoc.org/),
"just work" on any Linux distribution, but can be a bit of a hassle on Windows.
It is not hard to see why that is. All these tools are available for free and with libre software licences.
Windows is neither free nor libre, and has a long history of working against this academic model of software sharing.
These days, Microsoft, Google and other tech behemoths embrace open source software, [simply because it allows them to monetise work without paying for it](http://marktarver.com/thecathedralandthebizarre.html). Although, to be fair, they are also contributing back (to varying degrees) to this free/libre pool of software.
For an academic, open source tooling offers many tangible and intangible benefits, prime among them by [making the entire work reproducible](https://juretriglav.si/thoughts-on-reproducibility-of-open-scientific-software/), and effectively future-proof.
So, let's get to it.
We will make use of the fact that Microsoft has added something called the
Windows subsystem for Linux (WSL) on Windows 10,
which allows you to run a (nearly feature-complete) Linux shell.
In fact, recently Microsoft released WSL version 2 (WSL2), which attempts to bring the Linux shell natively to the Windows 10 desktop.
In order to work with R, the next sections will show you how to make use of WSL2 to install
Ubuntu 20.04, and then we will walk through installing R and RStudio Server using WSL2 Ubuntu.
You will then be able to run R (in RStudio) from a browser window on your Windows desktop.
Pretty cool, right?
With that in place, we will explore how to use R and the suite of `r-quantities` packages
to seamlessly manipulate vectors that have associated measurement uncertainties *and* units
through the steps of your data analysis, by way of an example using a subset of data from
a real experiment in the Edvinsson lab.
## Install WSL2 on Windows 10
In the near future, installing Ubuntu on Windows will be as simple as `wsl --install` (it's [in the development version of Windows 10](https://devblogs.microsoft.com/commandline/distro-installation-added-to-wsl-install-in-windows-10-insiders-preview-build-20246/)), but for now, we have to [follow the steps](https://docs.microsoft.com/en-us/windows/wsl/install-win10) below to achieve the same.
Open Powershell as administrator, then enable WSL1:
~~~powershell
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
~~~
To enable WSL2, we must first enable the Windows virtual machine (VM) feature:
~~~powershell
dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
~~~
Now you have to restart Windows.
Next, [download the WSL2 Linux kernel update package for x64](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi) (or [ARM64](https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_arm64.msi), depending on your machine), and install it.
Then open Powershell as administrator again, and set WSL2 as default for all future VMs:
~~~powershell
wsl --set-default-version 2
~~~
Now it is time to [install our Linux distribution of choice (i.e., Ubuntu) from the Microsoft Store](https://www.microsoft.com/store/apps/9n6svws3rx71). Open the Microsoft Store, find Ubuntu, and install it.
(And also, by the way, the Microsoft Store is awful. Such nasty underhanded nudging by Microsoft to create a wholly unnecessary "account").
Anyway, you just installed Ubuntu on your Windows machine, congratulations!
Now you get to create your Linux username and password: simply start the app `Ubuntu` from the Start menu.
### Parenthesis: if your Windows 10 is itself a virtual machine
This is unlikely to apply to 99% of my readers, so just skip ahead to the next section.
Who does this section apply to? Well, if you, like me, run Windows 10 as a VM on top of your regular OS,
you will need to make sure that a) your host machine (layer 0) supports nested virtualisation, and b) that
your hypervisor has nested virtualisation enabled.
Details for the KVM hypervisor on a Linux host are given in the footnote.[^2]

[^2]: Check that the bare-metal Linux host [supports "Nested virtualisation"](https://www.linux-kvm.org/page/Nested_Guests) by inspecting either one of `/etc/modprobe.d/qemu-system-x86.conf` or `/sys/module/kvm_intel/parameters/nested`.
The first file should contain `nested=1`, and the second should contain `Y` or `1`. If not, set them accordingly.
With the Windows VM shut down, edit the KVM XML config file so that the `<cpu>` tag includes `mode=host-passthrough`, like this: `<cpu mode='host-passthrough'>`. Note that these settings must be in place *before* installing WSL2 Ubuntu on Windows.
Also note that WSL2 supports nested virtualisation, but not WSL1.
### Parenthesis: managing and rebooting your WSL2 virtual machines
Closing the Ubuntu shell in Windows does not shut down the Ubuntu VM, and sometimes it can be useful to effect a reboot.
So how do we do that?
[It's not obvious](https://askubuntu.com/questions/1131122/cant-restart-shutdown-ubuntu-under-wsl),
but the `wsl` executable has two useful options,
~~~powershell
wsl --list --verbose
wsl --terminate Ubuntu-20.04
~~~
that allow you to list/view all installed VMs (running or not); you then simply `--terminate` to shut down,
and `wsl -d Ubuntu-20.04` to start again. Starting the VM can also be achieved from the app link in the Start menu,
assuming the VM has one.
## Install R and RStudio server on WSL2 Ubuntu
With a complete Ubuntu shell in place, we can now go about our business and install R.
In your Ubuntu shell, add the public key of the CRAN repository, then add the Ubuntu 20.04 CRAN repo of R version 4.x to your package manager's list of sources:
```{bash, echo=TRUE, eval=FALSE}
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
echo "deb https://cran.uni-muenster.de/bin/linux/ubuntu focal-cran40/" | sudo tee /etc/apt/sources.list.d/r-project.list
```
Note that you can change the domain to any one of the [CRAN mirrors](https://cran.r-project.org/mirmon_report.html) if you want.
At this point, remember to tell `apt` about the new source by:
```{bash, echo=TRUE, eval=FALSE}
sudo apt update
```
Install R and its dependencies using the `apt` package manager
(we will also install `libudunits2-dev`, which is [a C library and utility for handling physical units](https://github.com/Unidata/UDUNITS-2/) that the R package `units` depends on).
```{bash, echo=TRUE, eval=FALSE}
sudo apt install r-base r-base-core r-recommended r-base-dev gdebi-core build-essential libcurl4-gnutls-dev libxml2-dev libssl-dev libudunits2-dev
```
This installs about 400 MB of software in a jiffy. That's R!
Now download the latest RStudio Server package file for Ubuntu 18+ and install it:
```{bash, eval=FALSE, echo=TRUE}
wget https://rstudio.org/download/latest/stable/server/bionic/rstudio-server-latest-amd64.deb
sudo gdebi rstudio-server-latest-amd64.deb
```
Great! Now all you have to do is launch your RStudio Server:
```{bash, eval=FALSE, echo=TRUE}
sudo rstudio-server start
```
With that, you can open your browser (Chromium/Chrome recommended) at `http://localhost:8787`
and enjoy your very own instance of RStudio Server.
(You sign in with the same username and password as in your Ubuntu Linux VM).
And with that, all the preparatory steps are done.[^3]
We can now start working in our data analysis environment in R.

[^3]: We should mention here that there is a known limitation to WSL2 virtualisation regarding
having services (such as RStudio Server in this case) persist across reboots of the VM or even
across reboots of the Windows machine itself. In short, [WSL2 does not support `systemd` nor `init`](https://github.com/microsoft/WSL2-Linux-Kernel/issues/44),
so any services on the Linux VM will [not autostart when the VM is rebooted](https://stackoverflow.com/a/55582694).
Even so, the web is [full](https://github.com/DamionGans/ubuntu-wsl2-systemd-script) of [hacks](https://github.com/arkane-systems/genie) [attempting to circumvent this limitation](https://reddit.com/r/bashonubuntuonwindows/comments/ezpcyu/how_to_make_wsl2_run_services_at_startup/).
On the other hand, [this might not be such a big problem](https://dev.to/bowmanjd/you-probably-don-t-need-systemd-on-wsl-windows-subsystem-for-linux-49gn)
for the kind of experimentation that having an otherwise
complete Linux shell available on the Windows desktop allows.
## Working with quantities and measurement errors
The `quantities` package (which ties together the capabilities of
the `units` and `errors` packages) has an [excellent vignette on quantity calculus in R](https://cran.r-project.org/web/packages/quantities/vignettes/introduction.html).
The `errors` package is [available on CRAN](https://cran.r-project.org/web/packages/errors/index.html), and its [git repo is published on Github](https://github.com/r-quantities/errors).
The package is also [presented by its authors in an R Journal article from 2018](https://journal.r-project.org/archive/2018/RJ-2018-075/index.html), and there is also [an informative blog post by the main author for the very first release of the package](https://www.enchufa2.es/archives/errors-0-0-1.html).
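A minimal sketch of what `errors` provides on its own (again assuming the package is installed):

```r
# errors on its own: first-order (Taylor series) error propagation
library(errors)

x <- set_errors(2, 0.1)
y <- set_errors(3, 0.1)
z <- x * y
z          # prints as 6.0(4) in the default parenthesis notation
errors(z)  # sqrt((3 * 0.1)^2 + (2 * 0.1)^2), about 0.36
```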
The [`units` package](https://github.com/r-quantities/units) handles physical quantities
(values with associated units) in R,
and builds on UNIDATA's [`udunits-2` C library](https://github.com/Unidata/UDUNITS-2) of
over 3000 supported units, making it a very robust implementation while avoiding the sin
of re-inventing the wheel. It does lead to a dependency on the system-level package `udunits2`
however, making it very hard to install and work with `units` on Windows.
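And a corresponding sketch of what `units` provides on its own (units are derived through arithmetic, and conversions are delegated to `udunits2`):

```r
# units on its own: automatic unit derivation and conversion
library(units)

distance <- set_units(100, "km")
time <- set_units(2, "h")
speed <- distance / time  # unit km/h is derived automatically
set_units(speed, "m/s")   # explicit conversion via the udunits2 library
```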
I think that these packages have finally brought R up to par with Python with regards to
quantity calculus, which is a crucial aspect for any scientist or engineer handling
physical measurement values.
It should be noted that the [`uncertainties`](https://pythonhosted.org/uncertainties/) [package](https://github.com/lebigot/uncertainties/) in Python is more mature
(v1.0 was released in 2010, whereas the `errors` package in R has not yet reached
that milestone), but the `errors` package is already very usable; indeed, I used it for practically
all the analysis steps of the latest paper I co-authored (I even managed to cite the package). @Ahmed2020
Let's take the `errors` and `units` packages out for a spin!
Since this is a clean R installation, first we will need to install these packages and
any other packages we would like to have available.
### Install necessary R packages
Let's install all the `r-quantities` packages, and a subset of the [tidyverse packages](https://www.tidyverse.org/)
(feel free to install any other packages you may like).
```{r 'rstudio-server-install-packages', results="asis", echo=FALSE}
include_graphics("assets/rstudio-server-install-packages.jpg")
```
```{r, eval=FALSE}
install.packages(c("errors", "units", "constants", "quantities"))
install.packages(c("dplyr", "magrittr", "ggplot2", "ggforce", "tibble"))
install.packages("remotes")
remotes::install_github("solarchemist/R-common")
remotes::install_github("solarchemist/oceanoptics")
```
(This operation may take a minute).
Now let's load all the packages we will need and get to work on some real-life data.
```{r, echo=TRUE, eval=FALSE}
library(errors)
library(constants)
library(units)
library(quantities)
library(common)
library(oceanoptics)
library(dplyr)
library(magrittr)
library(ggplot2)
library(ggforce)
```
```{r units-options, echo=FALSE}
# aim: get proper (mathematically) division sign between value and unit in plot axis labels
units_options(
  # as soon as a division sign is included in sep, plot fails!
  sep = c("~", "~"),
  # but it seems we can achieve div sign between value and unit by hacking the group variable
  group = c("/", ""), # appending ~ or space to group[1] throws an error, weird
  # use negative powers, as division signs between units can be ambiguous
  negative_power = TRUE,
  # for my slash-sign hacking to work at all, parse must be FALSE, it seems
  parse = FALSE)
```
### Back to work
```{r 'create-shared-rda', eval=FALSE, echo=FALSE}
## this chunk is left here to document how the shared rda file was created
# the H02A files
this.files <-
  list.files(
    path = "/media/bay/taha/chepec/laboratory/OOHR2000/160329/H02A",
    pattern = "\\.txt$",
    full.names = TRUE)
# add the MB dye concentration for each experiment
this.conc <-
  c(100, 100, 50, 50, 20, 20, 10, 10,
    5, 5, 2, 2, 1, 1, 0.5, 0.5) # uM
# collect all experiments into a single dataframe
df.MB <- NULL
for (i in 1:length(this.files)) {
  df.MB <-
    bind_rows(
      df.MB,
      bind_cols(
        OO2df(datafile = this.files[i], version = "1"),
        conc = this.conc[i]))
}
df.MB <-
  df.MB %>%
  # H02AA experiments were bad (saturated detector), drop them from df
  filter(!sampleid %in% c("H02AA", "H02AA+EtOH")) %>%
  # the timestamp column is unnecessary, drop it
  select(-DateTime)
# save this dataframe as RDA file to disk
save(
  df.MB,
  file = here::here("assets/uvvis-abs-MB-conc-series.rda"))
```
For the sake of keeping this post short, I will spare you the legwork of reading the
individual spectra files into a dataframe. I have done that and saved the resultant dataframe
to an rda-file, which has been uploaded to a publicly accessible URL.
Let's import the data to a variable, and have a quick look at the dataframe:
```{r 'read-uvvis-MB-conc-series', echo=TRUE, results="markup"}
# read the MB abs spectra from the public URL
df.MB <-
  common::LoadRData2Variable(url = "https://public.solarchemist.se/data/uvvis-abs-MB-conc-series.rda")
df.MB %>% glimpse()
```
As you can see, it is raw data from the spectrometer, and it comes with neither
measurement errors nor units.
(Unfortunately, it is still all-too-common for lab instruments to export data without such attributes).
It is perfectly acceptable to add *estimated* standard errors to this raw data
(we will add units at the same time using the `set_quantities()` function of the `quantities` package).
```{r, echo=TRUE}
# specify unit and error for the measured concentration values
df.MB$conc <-
  set_quantities(
    df.MB$conc * 1e-6,
    "mol/L",
    # set relative error 2% (estimate)
    df.MB$conc * 1e-6 * 0.02)
# specify unit and error for wavelength values to instrumental resolution
df.MB$wavelength <-
  set_quantities(df.MB$wavelength, "nm", 1.5)
# specify error for absorbance values to 1% rel error (estimate)
df.MB$intensity <-
  # absorbance is unitless, or more precisely, has a unit of "1"
  set_quantities(df.MB$intensity, "1", df.MB$intensity * 0.01)
# add a column describing the solvent (H2O or H2O+EtOH)
df.MB$solvent <- "H2O"
df.MB$solvent[grep("\\+EtOH$", df.MB$sampleid)] <- "H2O+EtOH"
```
For plotting, we will use the package `ggplot2`, which is like a cousin to the `dplyr` data manipulation package. Both come with a sort of "coding grammar", which is very comfortable to use once you
get used to it.
In my opinion it is much easier to express chains of manipulations using this grammar
(you can find plenty of examples in the code below).
We should note that ggplotting two columns that both are `units` objects **will fail** with `Error in Ops.units(): both operands of the expression should be "units" objects`,
due to the `units` package (rightfully) determining that they don't have the same units
(or units that can be converted into each other).
The (very elegant) solution to this finicky behaviour is
[provided by the `ggforce` package](https://ggforce.data-imaginist.com/reference/scale_unit.html).
All we have to do is make sure we load `library(ggforce)`,
and then we can get back to using ggplot2 as usual (mostly as usual, you will notice
that we now use a `scale_*_unit()` function instead of the usual `scale_*_continuous()`).

(ref:uvvis-MB-conc-series) Recorded UV-Vis spectra of MB in water and MB in water with 10 vol% EtOH (with the estimated measurement error of the absorbances shown as vertical error bars in red).
```{r 'uvvis-MB-conc-series', echo=TRUE, fig.cap="(ref:uvvis-MB-conc-series)", fig.align="center"}
ggplot(
  df.MB %>%
    filter(wavelength %>% as.numeric() >= 250) %>%
    filter(wavelength %>% as.numeric() <= 850)) +
  facet_wrap(~solvent) +
  geom_errorbar(
    colour = "red",
    aes(x = wavelength,
        ymin = errors_min(intensity),
        ymax = errors_max(intensity))) +
  geom_line(aes(x = wavelength, y = intensity, group = sampleid)) +
  scale_x_unit(
    breaks = seq(300, 800, 100),
    expand = expansion(0.03, 0)) +
  scale_y_unit(expand = expansion(0.03, 0)) +
  labs(x = "Wavelength", y = "Abs") +
  theme(legend.position = "none")
```
> To check whether a unit already exists in the `udunits2` library, I like
> to use the `udunits2::ud.is.parseable()` function.
> If you know of a better way, please let me know!
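
For example, a quick sketch (this assumes you also have the `udunits2` R package installed alongside `units`):

```r
# check whether a unit string is recognised by the udunits2 library
udunits2::ud.is.parseable("mol/L")        # TRUE
udunits2::ud.is.parseable("gobbledygook") # FALSE
```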

Now, as you have surely already noticed, this dataset is simply a series of UV-Vis
absorbance spectra at varying concentrations
(of methylene blue in water and water with some ethanol, in this case).
Here's how the data looks at this point (after we set errors and/or units for some columns):
```{r echo=TRUE, results="markup"}
df.MB %>% glimpse()
```
Most columns are of type "character", but note how `$wavelength` (and two other columns)
say something else in that field. We can take a closer look at the structure of the dataframe
using the `str()` function:
```{r echo=TRUE, results="markup"}
df.MB %>% str()
```
Having thus set suitably estimated (or perhaps even properly calculated) errors for the measured
quantities *wavelength*, *absorbance*, and *concentration*, we will use this data
to calculate the spectral absorption coefficient of MB in the two solvents we used.
As an introductory exercise, let's define the wavelength of the strongest absorption band along
with its unit and a synthetic standard error value:
```{r echo=TRUE}
wl.max.abs <- set_quantities(665, "nm", 5)
```
We can change the notation of the error from `parenthesis` to `plus-minus` when printing a quantity:
```{r, echo=TRUE, results="markup"}
print(wl.max.abs, notation="plus-minus", digits=1)
```
> By default, `errors` uses *parenthesis* notation (which is more compact) and a single significant digit for errors.

We can also print more digits for the error part by setting `digits`.
Both of these settings can also be set globally (for our entire R session) by using the `options()` function, like this:
```{r, echo=TRUE}
options(errors.notation="plus-minus", errors.digits=2)
```
Before we get into calculating the spectral absorption coefficients, recall that the
Beer-Lambert law
\begin{equation}
A = \epsilon lc \;\Longleftrightarrow\; \epsilon = \frac{A}{lc}
\end{equation}
includes the optical path length of the cuvette, $l = 1\,\mathrm{cm}$ in our case.
Let's set that as a variable so we can use it when calculating $\epsilon$:
```{r, echo=TRUE}
# I don't know the tolerances of the cuvette dimensions (quite the oversight, I know),
# but since these cuvettes are mass-produced I am assuming they are pretty tight and
# so estimating the error at 1/10 mm
optical.length <- set_quantities(1, "cm", 0.01)
```
```{r 'lm-wrapper', echo=FALSE}
# wrap lm() to drop quantities and get around error when calling summary(lm())
# https://www.r-spatial.org/r/2018/08/31/quantities-final.html
qlm <- function(formula, data, ...) {
  # get units info, then drop quantities
  row <- data[1, ]
  for (var in colnames(data)) if (inherits(data[[var]], "quantities")) {
    data[[var]] <- drop_quantities(data[[var]])
  }
  # fit linear model and add units info for later use
  fit <- lm(formula, data, ...)
  fit$units <- lapply(eval(attr(fit$terms, "variables"), row), units)
  class(fit) <- c("qlm", class(fit))
  fit
}
# get quantities attributes back for the coef method
# (which is the only class we care about in this demonstration)
coef.qlm <- function(object, ...) {
  # compute coefficients' units
  coef.units <- lapply(object$units, as_units)
  for (i in seq_len(length(coef.units) - 1) + 1)
    coef.units[[i]] <- coef.units[[1]] / coef.units[[i]]
  coef.units <- lapply(coef.units, units)
  # use units above and vcov diagonal to set quantities
  coef <- mapply(set_quantities, NextMethod(), coef.units,
                 sqrt(diag(vcov(object))), mode = "symbolic", SIMPLIFY = FALSE)
  # use the rest of the vcov to set correlations
  p <- combn(names(coef), 2)
  for (i in seq_len(ncol(p)))
    covar(coef[[p[1, i]]], coef[[p[2, i]]]) <- vcov(object)[p[1, i], p[2, i]]
  coef
}
```
With a series of UV-Vis absorbance spectra at varying concentrations, we can of course
plot the linear relationship between $A \propto c$ at a particular wavelength (such
as the main absorbance peak), but
for the sake of this exercise we will take the calculation one step further and calculate
$\epsilon(\lambda, A, c)$, i.e., the absorption coefficient for all wavelengths.
It is not pretty code, but it gets the job done. Here goes:
<!-- Calculate the spectral absorption coefficient by wavelength -->
```{r 'create-shared-abscoeff', eval=FALSE, echo=TRUE}
# loop over each spectrum by solvent
solvents <- df.MB %>% pull(solvent) %>% unique()
# beware, unique() silently drops all quantity attributes
wl.steps <- df.MB %>% filter(sampleid == unique(df.MB$sampleid)[1]) %>% pull(wavelength)
# hold the results of the loop in an empty dataframe, pre-made in the right dimensions
# complication! quantities cols must use the same (or convertible) units from the start
abs.coeff <-
  data.frame(
    wavelength = rep(wl.steps, length(solvents)),
    solvent = "",
    # slope of linear fit
    k = rep(set_quantities(1, "L/mol", 1), length(solvents) * length(wl.steps)),
    # intercept of linear fit
    m = rep(set_quantities(1, "1", 1), length(solvents) * length(wl.steps)),
    # calculated abs coeff
    coeff = rep(set_quantities(1, "L*cm^-1*mol^-1", 1), length(solvents) * length(wl.steps)),
    rsq = NA,     # R-squared of linear fit
    adj.rsq = NA) # adj R-squared of linear fit
i <- 0
# Fair warning: this loop is horribly slow, and may take several minutes to complete!
for (s in 1:length(solvents)) {
  message("Solvent: ", solvents[s])
  # varying conc of a particular solvent
  for (w in 1:length(wl.steps)) {
    # to keep track of the current row in abs.coeff, we will use a counter i
    i <- i + 1
    message("[", i, "] ", solvents[s], " :: ", wl.steps[w], " nm")
    # temporary df which we will use to calculate linear fits (abs.coeff) for each wl.step
    intensity.by.wl <-
      df.MB %>%
      filter(solvent == unique(df.MB$solvent)[s]) %>%
      # even though we drop the errors when filtering, it does not affect the returned rows
      filter(as.numeric(wavelength) == unique(as.numeric(df.MB$wavelength))[w]) %>%
      select(sampleid, wavelength, intensity, conc, solvent)
    # now we can calculate abs coeff for this solvent at the current wavelength
    this.fit <-
      # now this is the really tricky one
      # https://github.com/r-quantities/quantities/issues/10
      qlm(data = intensity.by.wl, formula = intensity ~ conc)
    # save R-squared and adjusted R-squared
    abs.coeff$rsq[i] <- summary(this.fit)$r.squared
    abs.coeff$adj.rsq[i] <- summary(this.fit)$adj.r.squared
    # save coefficients of linear fit
    abs.coeff$m[i] <- coef(this.fit)[[1]]
    abs.coeff$k[i] <- coef(this.fit)[[2]]
    abs.coeff$coeff[i] <- coef(this.fit)[[2]] / optical.length
    # solvent
    abs.coeff$solvent[i] <- unique(intensity.by.wl$solvent)
  }
}
save(
  abs.coeff,
  file = here::here("assets/uvvis-abscoeff-MB-conc-series.rda"))
```
```{r 'read-abscoeff-MB-conc-series', echo=TRUE}
# read the calculated MB abs coeff from the public URL
abs.coeff <-
  LoadRData2Variable(url = "https://public.solarchemist.se/data/uvvis-abscoeff-MB-conc-series.rda")
```
Well, that was quite a workout![^4] But now that we have saved the calculated absorption coefficients
to an rda-file on disk, we won't have to repeat that calculation (for a while at least).
But the proof of the pudding is in the eating, so let's plot the absorption coefficients we got.

[^4]: Note that we had to redefine the `lm()` function using our own wrapper that takes
care of unsetting and resetting the units and errors. [See above](#wrapper-function-redefining-lm-to-support-unitserrors).
The code was copied verbatim from [Ucar's r-spatial article](https://www.r-spatial.org/r/2018/08/31/quantities-final.html).

(ref:abscoeff-MB-conc-series) Calculated spectral absorption coefficient for MB in water and MB in water with 10 vol% EtOH, with the estimated standard error due to propagated measurement errors of the underlying quantities (red errorbars).
```{r 'abscoeff-MB-conc-series', echo=TRUE, fig.cap="(ref:abscoeff-MB-conc-series)", fig.align="center"}
ggplot(
  abs.coeff %>%
    filter(wavelength %>% as.numeric() >= 250) %>%
    filter(wavelength %>% as.numeric() <= 850)) +
  geom_errorbar(
    colour = "red",
    aes(x = wavelength,
        ymin = errors_min(coeff),
        ymax = errors_max(coeff))) +
  geom_line(aes(x = wavelength, y = coeff, group = solvent)) +
  scale_x_unit(
    breaks = seq(300, 800, 100),
    expand = expansion(0.03, 0)) +
  scale_y_unit(expand = expansion(0.03, 0)) +
  labs(x = "Wavelength", y = "Abs. coeff.") +
  theme(legend.position = "none")
```
That looks good. Note how the vertical errorbars of the two curves (different solvents)
overlap completely at certain spectral regions, notably at the main absorbance peak.
And not only did our entire analysis automatically propagate the measurement errors,
we also managed to do the same for the units, performing automatic dimensional analysis
at the same time (note how the units in the plot axis labels are automatically derived from the data).
That's quantity calculus, in R, with real-life data!
I hope you learned something new, and that this will inspire you to use quantity calculus
for your future work.
## But what about Python?
I feel it would be bad form to end this post without even mentioning Python's support
for error propagation and dimensional analysis, which far predates R's.
Python has a whole array of [packages](https://pypi.org/project/units/) to
[choose](https://pypi.org/project/python-units/) from for [working with physical units](https://kitchingroup.cheme.cmu.edu/blog/2013/01/19/Using-units-in-python/),
but as far as I can tell none of them build on the UNIDATA database.
However, the [`pint` package](https://github.com/hgrecco/pint) appears to be the most
popular by far (with 1200 stars on Github, it is actually an order of magnitude more popular
than all the related R packages), and it integrates with NumPy, Pandas, and the `uncertainties`
package.
Support for errors and error propagation has been around since 2010 courtesy
of the [`uncertainties` package](https://pythonhosted.org/uncertainties) by Eric O. Lebigot.
I am not aware of any package dedicated to integrating the `uncertainties` and `pint` packages,
but perhaps that is not necessary in Python. I'm not sure on that point.
In any case, getting up and running with Python is fairly easy.
In fact, Python 3.x comes pre-installed on the Ubuntu shell we just configured.
You should be aware that you can just *write and execute* your Python code directly from
RStudio Server (or R in general) using the [`reticulate` package](https://rstudio.github.io/reticulate/) for R.
But if you prefer working with Python in its own IDE, you can actually
[install Jupyter Notebook](https://www.digitalocean.com/community/tutorials/how-to-set-up-jupyter-notebook-with-python-3-on-ubuntu-18-04)
and work from a browser, just like for RStudio Server. Just follow these steps:
In the WSL Ubuntu shell, run the following commands to install Jupyter Notebook:
~~~bash
sudo apt install python3-pip python3-dev
sudo -H pip3 install --upgrade pip
sudo -H pip3 install jupyter
~~~
then run `jupyter notebook` and copy-paste the provided URL into your browser.
> Note: for serious work, you should look into installing JupyterLab and/or JupyterHub instead of Jupyter Notebook.
[This template for a student lab report](https://nbviewer.jupyter.org/gist/solarchemist/f9eb4f93518d16e0e6d3)
I co-authored way back when uses both `uncertainties` and `numpy` for its calculations, and may be of interest.
## Wrapper function redefining `lm()` to support units/errors
```{r, ref.label='lm-wrapper', echo=TRUE, eval=FALSE}
```
## sessionInfo()
```{r results='markup', cache=F}
sessionInfo()
```
## Read more on your own...
+ [Does R understand physical quantities?, Edzer Pebesma, R-spatial (2016)](https://www.r-spatial.org/r/2016/06/10/units.html)
+ [Measurement units in R now simplify, Edzer Pebesma, R-spatial (2016)](https://www.r-spatial.org/r/2016/08/16/units2.html)
+ [Automatic units in axis labels, Edzer Pebesma, R-spatial (2016)](https://www.r-spatial.org/r/2016/09/29/plot_units.html)
+ [Linear model where the data has uncertainty using R, Gimelist, StackExchange (2016)](https://stats.stackexchange.com/questions/235693/linear-model-where-the-data-has-uncertainty-using-r)
+ [Using RStudio Server in Windows WSL2, JooYoung Seo, RStudio Support (2020)](https://support.rstudio.com/hc/en-us/articles/360049776974-Using-RStudio-Server-in-Windows-WSL2)
+ [Using RStudio Server on the Windows 10 subsystem for Linux, Axel-Cleris Gailloty (2020)](https://agailloty.github.io/Rstudio-server-windows/)
+ [Enabling virtual machine control structure shadowing on a nested virtual machine with the Intel Xeon E5-2600 v3 product family, David L. Mulnix, Intel (2014)](https://software.intel.com/content/www/us/en/develop/blogs/enabling-virtual-machine-control-structure-shadowing-on-a-nested-virtual-machine.html)
+ [Upgrading RStudio Server, Ian Pylvainen, RStudio Support (2020)](https://support.rstudio.com/hc/en-us/articles/216079967-Upgrading-RStudio-Server)
+ [R installation and administration, R Core Team, CRAN (2020)](https://cran.r-project.org/doc/manuals/r-release/R-admin.html)
## References