Ansible role that installs R itself, R packages (from CRAN, archived and remote repos) and associated tools.
Latest commit a9a7df755f by Taha Ahmed (3 months ago): Group name rstudio_server was changed in playbook hosts file.

R

What this Ansible role does:

  • builds a specified version of R from source,
  • installs the specified R packages (from CRAN, archived and remote repos),
  • installs a specified version of Hugo and its dependencies.

Note that this role installs R and all its associated tooling in a single location on the filesystem (/opt/R/ by default). I have deliberately avoided the approach taken by R from the Debian repos, which installs components in many different places across the filesystem.

By compiling R itself from source (not just its packages) one gains some flexibility, but my original motivation was only to make it possible to share a single R installation across computers by remotely mounting its directory. This worked well and, in combination with renv, made it possible to serve multiple versions of R concurrently from a single server to multiple workstations over sshfs.

Over time, however, I found myself working mostly on the server itself, either via ssh in the terminal or via RStudio Server or similar web services, which made the remotely mounted R installation less important.

Remote mounting may still be of value for the rstudio_server containers, so that any projects using renv with older R versions "simply work".

The Debian way of installing R is usually fine, but I have found it unworkable if you want to share the R installation with other computers (such as virtual machines on the same host, or even remote hosts).

Keeping the R installation centralised was the only way I found to make my approach of sharing a single R installation across several hosts work.

How to build R and install all packages (first time)

Set the following variables, either by changing them in the role or by passing them on the command line with --extra-vars:

ansible-playbook playbook-host.yml --ask-become-pass --ask-vault-pass \
--extra-vars '{"install_R": true, "R_version": "4.0.5", "hugo_version": "0.87.0"}'
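If you script these invocations, a minimal Python sketch like the one below can build the command. The helper names are hypothetical and not part of this role. Serialising the variables with json.dumps keeps version numbers as quoted strings (4.0.5 is not a valid JSON number, and an unquoted 4.10 would be read as the float 4.1), and shlex.quote makes the whole --extra-vars argument shell-safe.

```python
import json
import shlex


def extra_vars_arg(**variables):
    """Serialise role variables to a JSON string for --extra-vars.

    json.dumps turns True into true and keeps version strings quoted.
    """
    return json.dumps(variables)


def playbook_command(playbook, **variables):
    """Return a shell-quoted ansible-playbook command line (hypothetical helper)."""
    args = [
        "ansible-playbook", playbook,
        "--ask-become-pass", "--ask-vault-pass",
        "--extra-vars", extra_vars_arg(**variables),
    ]
    # Quote each argument so the JSON survives the shell intact
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(playbook_command(
        "playbook-host.yml",
        install_R=True, R_version="4.0.5", hugo_version="0.87.0",
    ))
```

The same helper covers the other modes in this README, for example playbook_command("playbook-host.yml", update_packages_R=True).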

How to upgrade R itself (and Hugo)

You need to set the following variables:

--extra-vars '{"update_R": true, "R_version": "4.0.5"}'

This installs all R packages for the new R version and also installs Hugo. Note that Hugo is not upgraded unless you also bump the hugo_version variable (for example, by adding it to --extra-vars).

How to update all installed R packages to their latest available version

Note that this does not install any packages newly added to the *_packages variables; it only updates packages already installed on the system. You need to set the following variable:

--extra-vars '{"update_packages_R": true}'

How to reinstall all R packages

This reinstalls all R packages defined in this role in place, without upgrading. This can be useful if some packages suddenly break, disappear, etc.

--extra-vars '{"reinstall_packages_R": true}'

Stop TinyTeX from auto-updating TeXLive

This role sets options(tinytex.install_packages = FALSE) to stop TinyTeX from automatically installing and updating TeX Live packages. For background, see:

https://tex.stackexchange.com/questions/575230/what-could-be-causing-texlive-packages-to-update-without-explicit-user-intervent

Expected failure modes of this role

The tasks in this role use fairly complicated logic, in particular the tasks responsible for installing R packages, which look like this (simplified code):

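- name: Install CRAN packages (simplified)
  ansible.builtin.command: >
    Rscript -e
    "if (! ('{{ item }}' %in% installed.packages()[, 'Package'])) {
      install.packages(pkgs='{{ item }}');
      print('Added {{ item }}');
    } else {
      print('Already installed {{ item }}');
    }"
  register: r_cran_package
  # install.packages() only emits a warning when a package fails to build,
  # so Rscript may still exit 0; also scan its output for the warning text
  failed_when: >
    r_cran_package.rc != 0 or
    'had non-zero exit status' in r_cran_package.stderr or
    'had non-zero exit status' in r_cran_package.stdout
  # Report "changed" only when a package was actually installed
  changed_when: "'Added' in r_cran_package.stdout"
  loop: "{{ R_CRAN_packages }}"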

It is important to realise that this loop never breaks, even if a package fails to install and is correctly reported as failed by our failed_when logic. This is simply how Ansible handles loops: they always run to the end, and if any iteration failed, the entire task is marked as failed and playbook execution is halted.
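A small Python model may make these semantics concrete. It is not Ansible's code: run_item is a hypothetical stand-in for one Rscript invocation, and the failed and changed properties mirror the failed_when and changed_when expressions above. Every item runs; the task-level failure is only computed once the whole loop is done.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ItemResult:
    item: str
    rc: int
    stdout: str
    stderr: str

    @property
    def failed(self) -> bool:
        # Mirrors the task's failed_when expression
        marker = "had non-zero exit status"
        return self.rc != 0 or marker in self.stderr or marker in self.stdout

    @property
    def changed(self) -> bool:
        # Mirrors the task's changed_when expression
        return "Added" in self.stdout


def run_loop(items: Iterable[str], run_item: Callable[[str], ItemResult]) -> dict:
    """Model of an Ansible loop: every item runs, with no early exit."""
    results = [run_item(item) for item in items]
    return {
        "results": results,
        # The task fails if any item failed, but only after all items ran
        "failed": any(r.failed for r in results),
        "changed": any(r.changed for r in results),
    }


if __name__ == "__main__":
    def fake_rscript(pkg: str) -> ItemResult:
        # Hypothetical runner: "brokenpkg" fails with a warning but rc 0
        if pkg == "brokenpkg":
            return ItemResult(pkg, 0, "",
                              "installation of package 'brokenpkg' had non-zero exit status")
        return ItemResult(pkg, 0, f'[1] "Added {pkg}"', "")

    task = run_loop(["dplyr", "brokenpkg", "ggplot2"], fake_rscript)
    print([r.item for r in task["results"]], task["failed"])
```

In this model all three packages are attempted and the task is still reported as failed, which matches the behaviour described above.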

For our purposes, this is both good and bad.

It is bad because it is not easy to identify which package failed: the output from the task is very long (thousands of lines). The terminal output at least has colour highlighting (the failed loop item is shown in red text), but the log output does not.
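One way to find the failed package without scrolling through the log is to capture machine-readable output and filter it. A sketch, assuming the run was captured with the json stdout callback (for example ANSIBLE_STDOUT_CALLBACK=json, provided by the ansible.posix collection) and that its report nests per-item loop results under plays, tasks, hosts and results; verify that layout against your Ansible version. The script name failed_items.py is hypothetical.

```python
import json
import sys


def failed_loop_items(report: dict) -> list:
    """Return (task name, host, item) for every failed loop item.

    Assumes the layout of the json stdout callback's report.
    """
    failures = []
    for play in report.get("plays", []):
        for task in play.get("tasks", []):
            name = task.get("task", {}).get("name", "")
            for host, result in task.get("hosts", {}).items():
                # Loop tasks keep one result dict per item under "results"
                for item_result in result.get("results", []):
                    if item_result.get("failed"):
                        failures.append((name, host, item_result.get("item")))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 failed_items.py run.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        for name, host, item in failed_loop_items(json.load(fh)):
            print(f"{host}: {name}: {item}")
```

For example: ANSIBLE_STDOUT_CALLBACK=json ansible-playbook playbook-host.yml --ask-become-pass --ask-vault-pass > run.json, then python3 failed_items.py run.json.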

Adding some sort of break functionality to these tasks would therefore be of great value: as soon as an item fails, stop executing the rest of the loop and mark the task as failed. Unfortunately, I have not yet been able to hack this together.
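Stated as plain Python rather than Ansible, the desired behaviour looks roughly like this. A minimal sketch, where install is a hypothetical callable returning True on success: the failing item is reported directly, at the cost of never attempting the items after it.

```python
from typing import Callable, Iterable, Optional


def run_until_first_failure(items: Iterable[str],
                            install: Callable[[str], bool]) -> dict:
    """Stop the loop at the first failing item and report which one it was."""
    attempted = []
    failed_item: Optional[str] = None
    for item in items:
        attempted.append(item)
        if not install(item):
            failed_item = item
            break  # the "break" that the Ansible loop lacks
    return {
        "failed": failed_item is not None,
        "failed_item": failed_item,
        "attempted": attempted,
    }


if __name__ == "__main__":
    outcome = run_until_first_failure(["dplyr", "brokenpkg", "ggplot2"],
                                      lambda pkg: pkg != "brokenpkg")
    print(outcome)
```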

This functionality has been requested, and a PR exists, but nothing has been merged yet:

A few posts can be found around the web, but they seem geared towards the simpler problem of skipping an item in the loop based on some condition (easily achieved with a when statement):

It is good because if only one or a few packages fail to install, all the other packages are still installed in one go, without halting the package installation process. Although the rest of the playbook won't execute until it is rerun, the rerun skips packages that are already installed, so the most time-consuming step, package installation, is not repeated.

So before you attempt to rewrite the logic of these tasks, I suggest you consider how you would like those tasks (and by extension, this role) to ideally behave.