Using R in a Space

To use the R programming language in a Space, you must:

  1. Activate a Space session via your secure desktop

  2. Select the RStudio tool icon

The primary development tool for R is RStudio, which provides the following core capabilities:

  • Interactive R development environment - including variables explorer

  • Inline, browser-based outputs - including visualisations

  • A common, Space-wide collaboration area for the management of user-defined code, scripts, notebooks and artefacts

  • Code auto-complete assistant

  • Integrated package management

Once you have clicked the RStudio icon, the RStudio workbench loads. The workbench provides tile-based access to all of the major RStudio features, including a script editing pane, an interactive R console, diagnostic outputs and R session properties. You can maximise, minimise and resize these tiles as needed.

Spaces persistent home area

RStudio sessions within Spaces are automatically configured to run against a directory called spaces_persistent_home. This directory provides a persistent store for all user-generated artefacts and is shared between all collaborators in a Space and across Space sessions.

It is recommended to use only this directory (or folders within it) when using RStudio in Spaces, so that your work persists across sessions of your Space. Additionally, this directory:

  • Provides the default location for saving a new document

  • Provides access to guidance, tutorials and use-case-specific analytical resources supplied by the platform host. These are often organised in structured folders that are accessible from the home area, and include generic guidance for tool usage as well as walk-throughs for specific data products and use cases.
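As an illustration of keeping work inside the persistent area, the following is a minimal sketch rather than a platform-provided utility. The helper name save_artefact, the project folder name and the default root path (spaces_persistent_home under the user's home directory) are all assumptions; check the actual location in your own Space and adjust root accordingly.

```r
# Hypothetical helper: save an R object as an .rds file in a project folder.
# ASSUMPTION: `root` defaults to ~/spaces_persistent_home; confirm the real
# location of the persistent home area in your Space before relying on it.
save_artefact <- function(obj, name, project = "my_project",
                          root = file.path(path.expand("~"),
                                           "spaces_persistent_home")) {
  target_dir <- file.path(root, project)
  # Create the project folder (and any parents) if it does not exist yet
  dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)
  target <- file.path(target_dir, paste0(name, ".rds"))
  saveRDS(obj, target)
  # Return the full path invisibly so callers can reload the file later
  invisible(target)
}

# Usage (writes to the assumed default location):
# path <- save_artefact(mtcars, "mtcars_snapshot")
# restored <- readRDS(path)
```

Passing root explicitly keeps the helper usable if the persistent area is mounted somewhere other than the assumed default.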

Accessing Data Products - SparkR

The SparkR package is used to access data products within RStudio. It allows R to be run against a pre-configured Spark environment, so that distributed computing can be applied to high-performance data exploration, analysis and model development workloads.

SparkR can be used within RStudio by adding the following lines to your R scripts:

    # Set Spark Configuration
    Sys.setenv(SPARK_HOME = "/usr/lib/spark")

    # Load the SparkR library
    library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

    # Start a SparkR session with an 8Gb Driver (local session)
    sparkR.session(sparkConfig = list(spark.driver.memory = "8g"))

    # Spark operations can create noisy console outputs
    # This command will reduce console output to error messages only
    setLogLevel("ERROR")

These commands will create a Spark session that exposes the Spaces Metastore so that you can interact with data product databases and tables, as well as the collaborate_db and publish_db databases using SparkR methods.
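Once a session is running, browsing the Metastore might look like the sketch below. It assumes SparkR is attached and a session has been started as described; qualified_table, preview_table and the table name example_table are hypothetical names introduced only for illustration. The SparkR calls used (sql, tableToDF, head) are standard SparkR functions, but which databases and tables are visible depends on your Space.

```r
# Hypothetical helper: build a fully qualified "database.table" name
qualified_table <- function(db, table) paste(db, table, sep = ".")

# Hypothetical helper: return the first `n` rows of a table as a local
# R data.frame.
# ASSUMPTION: SparkR is attached and a SparkR session is already active.
preview_table <- function(db, table, n = 10) {
  sdf <- tableToDF(qualified_table(db, table))  # a SparkDataFrame
  head(sdf, n)                                  # brings n rows back to R
}

# Usage, once the session is running:
# head(sql("SHOW DATABASES"))
# head(sql("SHOW TABLES IN collaborate_db"))
# preview_table("collaborate_db", "example_table")  # hypothetical table name
```

Using head keeps only a handful of rows in the local R session, which avoids pulling a full distributed table into driver memory.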

Detailed walk-throughs of these SparkR methods are provided in the in-platform guidance materials.

Installing R Packages

The R environments are provisioned with a core set of packages, which can be seen by inspecting the 'System Library' listing within the RStudio Packages pane.

Additional packages may also have been made available via a secured CRAN instance. If such an instance is available, additional packages can be installed using either:
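The same listing can be approximated from the R console. This is a small sketch that assumes the 'System Library' shown in the Packages pane corresponds to base R's default library path (.Library); the variable name system_pkgs is illustrative only.

```r
# Names of the packages installed in R's default system library.
# ASSUMPTION: this matches the 'System Library' group in the Packages pane.
system_pkgs <- rownames(installed.packages(lib.loc = .Library))
print(sort(system_pkgs))

# Check whether a specific package is provisioned before relying on it
print("stats" %in% system_pkgs)
```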

  • The RStudio Packages UI pane, by clicking the 'Install' button and providing the package name in the pop-up modal

  • The RStudio console, with the command: install.packages("<package_name>")

Some R packages require additional OS-level dependencies. If you attempt to install a package and it fails due to missing dependencies, contact your platform administrator with the details of the required package for remediation.
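Because install.packages() typically reports a failed build as a warning rather than an error, it can help to confirm the outcome explicitly. The sketch below is one possible approach; install_if_missing and its install_fn argument are hypothetical names, and it assumes your session is already pointed at the secured CRAN instance.

```r
# Hypothetical helper: install a package only if it is not already available,
# then verify the result.
# ASSUMPTION: the default repository is the secured CRAN instance.
install_if_missing <- function(pkg, install_fn = utils::install.packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return("already installed")
  }
  # try() stops an unexpected error from aborting the calling script;
  # warnings from the installer are still shown in the console.
  try(install_fn(pkg), silent = TRUE)
  # A failed build usually only raises a warning, so re-check availability
  if (requireNamespace(pkg, quietly = TRUE)) "installed" else "failed"
}

# Usage:
# status <- install_if_missing("<package_name>")
# if (status == "failed") message("Send the console output to your administrator")
```

A "failed" result alongside the console warnings gives the platform administrator the details needed to identify missing OS-level dependencies.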

