Configuration

There are several requirements for nodes within RStudio clusters:

  1. All nodes must run the same version of RStudio Workbench.

  2. Server configurations (i.e. contents of the /etc/rstudio directory) must be identical, with the exception of options related to the address of each node (www-host-name in load-balancer, www-address and www-port in rserver.conf).

  3. User accounts must be accessible from each node and usernames and user ids must be identical on all nodes. The same applies for any groups used by RStudio users, and also to the rstudio service user account.

  4. The clocks on all nodes must be synchronized.

  5. User home directories must be accessible via shared storage (e.g. all nodes mounting the same NFS volume or Amazon EFS; see Using Amazon EFS with RStudio Team).

  6. An explicit server-wide shared storage path also must be defined. See the Shared Storage section for additional details.

  7. RStudio must be configured to use a PostgreSQL database, and an empty database must be present for RStudio to write important cross-node state. If you have previously run RStudio with a SQLite database, it is strongly advised that you perform the database migration to PostgreSQL first. For more information, see Database. (A minimal database configuration sketch appears after this list.)

  8. When using Launcher Sessions, see additional requirements under Launcher Considerations.
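As a minimal sketch of requirement 7, a PostgreSQL connection is typically declared in /etc/rstudio/database.conf. The host, database name, and credentials below are placeholders; consult the Database section of this guide for the authoritative list of options.

# /etc/rstudio/database.conf
provider=postgresql
host=postgres.example.com
port=5432
database=rstudio
username=rstudio
password=<your-password>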

Defining The Cluster

Only one load balancing cluster can exist per database, and this cluster is defined by the first node that comes online within the cluster. The cluster data contains the hash of the secure cookie key and the communication protocol (http, https, or https no verify). When each node comes online, it verifies its own secure cookie key and protocol against the cluster’s data and will only come online if this data matches. There are two ways to reset the data stored in the cluster:

  1. Bring all nodes offline; then reconfigure each node. The first node that comes online will be able to update the cluster data.
  2. Manually reset the cluster by running rstudio-server reset-cluster from the command line. The next node that is started, restarted, or reloaded will update the cluster data.

To view the nodes and their current statuses in the load balancer cluster, run the command rstudio-server list-nodes. This displays all nodes in the cluster, ordered first by whether they are online and then by node ID. It shows the status, health, and last seen time for each node using the server’s local time zone. When a node is Online and healthy, it updates the last seen time every 5 minutes. A status of “Missed check-in” is displayed when a node’s status is Online but its last seen time has not been updated in 10 minutes. When a node’s status is listed as “Missed check-in”, other nodes will continue to try to send it requests. Failed requests to any node are routed to the next available one, but this might introduce delays in processing. If you see the “Missed check-in” status, check whether the node is running. If it is, check the error logs for database errors related to updating the last seen time.

Defining Nodes

To define a cluster node, two configuration files need to be provided:

/etc/rstudio/load-balancer
/etc/rstudio/secure-cookie-key

The first of these defines the load balancing strategy and the node’s public-facing address. The second defines a shared key used for signing cookies (in single-node configurations this key is generated automatically, but with multiple nodes explicit coordination is required: the same secure-cookie-key value must be used on each node).

Each setting in the load balancing configuration file has a default value, so the file may be empty, but its presence is required to activate load balancing. Most users should at a minimum set www-host-name to indicate the address at which other nodes in the cluster can reach this node. Fallback methods for determining this address exist, but they may not return the desired result for all configurations. The following ordered strategies are used to determine this address, with preference given to the first that succeeds:

  1. Use the www-host-name value provided in the configuration file.

  2. Use the www-address defined in rserver.conf in combination with www-port or the default port.

  3. Use the first non-loopback, non-multicast IP address found by resolving the system’s hostname.

  4. Use a system call to determine the machine’s IP addresses and use the last IPv4, non-loopback, non-multicast address returned.

Most users will want to configure RStudio Workbench to use one of the first two approaches.

Note

If www-host-name is provided without a port, but a port has been set with www-port in rserver.conf, www-host-name will be used with the custom www-port as the load balancing address.
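For example (the hostname and port are illustrative), the following combination results in a load balancing address of server1.example.com:8788:

# /etc/rstudio/rserver.conf
www-port=8788

# /etc/rstudio/load-balancer
www-host-name=server1.example.com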

When load balancing is configured, during startup each node will query the internal database for information about the active cluster and nodes. If the relevant data doesn’t exist for a particular node, that node will insert it. It will then alert existing nodes of its presence and configuration.

For example, to use the www-host-name option to define a cluster with two nodes that load balances based on the number of actively running R sessions, you could use the following configuration:

On the first node, which can be reached at server1.example.com:

# /etc/rstudio/load-balancer
balancer=sessions
www-host-name=server1.example.com

On the second node, which can be reached at server2.example.com:

# /etc/rstudio/load-balancer
balancer=sessions
www-host-name=server2.example.com
On both nodes, the same key in the secure cookie key file:

# /etc/rstudio/secure-cookie-key
a55e5dc0-d6ae-11e3-9334-000c29635f71

The secure cookie key file above is only an example; you need to generate your own unique key to share among the nodes in your cluster.

Note

Previous versions of RStudio Workbench required the host name of each node to be included on every active node under a [nodes] section in lieu of the www-host-name field, and a [config] section before the balancing options. This configuration will continue to work, but it is no longer the preferred configuration method. It is highly recommended that you update your configuration files accordingly.

The following table lists the various configuration options that are available to be specified in the load-balancer configuration file:

Config Option: www-host-name
Description: The hostname or IP address that other nodes in the cluster should use to communicate with this node. If not provided, the node determines its address by following steps 2-4 above.
Possible Values: A hostname or IP address
Default Value: Not set

Config Option: balancer
Description: The balancing method used by the cluster. See Balancing Methods for details.
Possible Values: sessions, system-load, user-hash, custom
Default Value: sessions

Config Option: diagnostics
Description: Enables detailed diagnostic logging for load balancing traffic and state. See Diagnostics for details.
Possible Values: tmp, stderr
Default Value: Not set; detailed diagnostics are not available.

Config Option: timeout
Description: Amount of time in seconds that a node will wait for a response from another node. See Node network instability for details.
Possible Values: A positive integer
Default Value: 10

Config Option: verify-ssl-certs
Description: Whether to verify SSL certificates when communicating with other nodes. Should never be set to 0 in production, except for troubleshooting purposes or if the connection between nodes is secured by other means. See SSL for details.
Possible Values: 0 or 1
Default Value: 1
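Putting these options together, a sketch of a complete load-balancer file follows; the hostname is illustrative, and only www-host-name typically differs between nodes:

# /etc/rstudio/load-balancer
balancer=sessions
www-host-name=server1.example.com
timeout=10
verify-ssl-certs=1
# diagnostics=tmp   (enable only while troubleshooting)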

Key File Requirements

The following are the requirements for the secure cookie key file:

  • The key value must have a minimum length of 128 bits (16 bytes/characters). RStudio won’t start if the key is too weak.
  • The key file must have restrictive permissions (i.e. 0600) to protect its contents from other users.
  • The key file must be identical on all nodes in a load-balanced cluster, so that the nodes can communicate with each other.
  • The key must have a secret value that cannot be guessed. Randomly generating the value is recommended; see below for one mechanism for doing so.

Generating a Key

You can create a secure cookie key using the uuid utility as follows:

$ sudo sh -c "echo `uuid` > /etc/rstudio/secure-cookie-key"
$ sudo chmod 0600 /etc/rstudio/secure-cookie-key

This is the recommended method, but any mechanism that generates a unique, random value will work.

You do not need to generate a secure-cookie-key file on each server; generate it once, and copy it to each node along with the rest of the /etc/rstudio directory.
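For example, assuming SSH access to a second node named server2.example.com (hostname illustrative) and a remote user able to write to /etc/rstudio, you might copy the directory with rsync:

$ sudo rsync -a /etc/rstudio/ server2.example.com:/etc/rstudio/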

This secure cookie key will also be used for encrypting and decrypting the PostgreSQL database password, if applicable. See PostgreSQL Password Encryption for more details.

Key File Location

You may optionally change the path of the secure cookie key by changing the secure-cookie-key-file setting in rserver.conf. Changing the path in this manner is only recommended in very specific circumstances, such as when running the Launcher with both RStudio Workbench and Package Manager simultaneously. For example:

# /etc/rstudio/rserver.conf
secure-cookie-key-file=/mnt/rstudio/secure-cookie-key

In addition, an explicit server-wide shared storage path must be defined (this is used for inter-node synchronization). This path is defined in the /etc/rstudio/rserver.conf file. For example:

# /etc/rstudio/rserver.conf
server-shared-storage-path=/shared/rstudio-server/shared-storage

For convenience, this path will often be located on the same volume used for shared home directory storage (e.g. at path /home/rstudio-server/shared-storage).
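For example, a sketch of creating that directory on the shared volume; the path matches the example above, and the ownership shown assumes the service account is named rstudio-server (adjust to your configuration):

$ sudo mkdir -p /shared/rstudio-server/shared-storage
$ sudo chown rstudio-server /shared/rstudio-server/shared-storage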

Launcher Considerations

If you are running RStudio Workbench load balancing in addition to using Launcher sessions, you will need to ensure that the /etc/rstudio/launcher.pub and /etc/rstudio/launcher.pem files match on all Workbench nodes in the cluster. Failure to do so will prevent users from being able to connect to their sessions from Workbench nodes other than where their sessions were initiated.

For more information, see RStudio Workbench Integration.
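One quick way to confirm the Launcher key files match is to compare checksums on every Workbench node; the command below should produce identical output on each node:

$ md5sum /etc/rstudio/launcher.pub /etc/rstudio/launcher.pem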

Local Launcher Plugin

When the Job Launcher is configured to use Local sessions, RStudio Workbench chooses the server to start the session using the configured load balancing strategy.

Note

Currently, VS Code and Jupyter sessions are not counted when balancing by sessions with Local Launcher sessions.

Follow these additional steps to configure load balancing with Local Launcher Sessions:

  1. Ensure that each cluster node has a unique hostname.

  2. Set the scratch-path attribute in launcher.conf to the path of a network shared directory that is writable by the rstudio-server user. This directory is used by different launcher instances to share cluster and job information. (A configuration sketch appears after this list.)

  3. Ensure that individual servers in the cluster can reach each other at their configured ip-addresses. You can see the list of addresses used by the Job Launcher for one cluster member to route requests to sessions running on another by looking in the Local/jobs sub-directory of the scratch-path. As each launcher instance starts, it creates a sub-directory of the jobs directory named after its hostname. Inside that directory, it creates a file called addrs that contains the list of ip-addresses found for that host. Make sure that at least one ip-address in that list can be used by other nodes in the cluster.

  4. Ensure the following ports are open to allow the exchange of session metadata, proxying requests between rservers, the launcher servers, and sessions started by the local launcher plugin:

  • The rserver port (configured in rserver.conf with www-port)
  • The launcher server port (configured in launcher.conf with the port value in the [server] section)
  • The port range defined by Linux in /proc/sys/net/ipv4/ip_local_port_range, as used by sessions started by the launcher.
  5. If you have enabled SSL for the Job Launcher with launcher-use-ssl=1 in launcher.conf, make sure the SSL certificate for each launcher node includes the cluster ip-addresses in its Subject Alternative Names field. Currently the Job Launcher only uses ip-addresses to connect to each node in the cluster, so certificate verification will not work using hostnames.

For more information, see the Local Plugin section of the Job Launcher documentation.
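As a sketch of step 2 above (the path is illustrative and must point at shared storage writable by the rstudio-server user), scratch-path is typically set in the [server] section of launcher.conf:

# /etc/rstudio/launcher.conf
[server]
scratch-path=/shared/rstudio-launcher

The addrs file described in step 3 can then be inspected from any node, for example:

$ cat /shared/rstudio-launcher/Local/jobs/$(hostname)/addrs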

File Locking

In order to synchronize the creation of sessions across multiple nodes, RStudio Workbench uses a cross-node locking scheme. This scheme relies on the clocks on all nodes being synchronized. RStudio Workbench includes a locktester utility which you can use to verify that file locking is working correctly. To use the locktester, log in (e.g. via SSH or telnet) to at least two nodes using the same user account and then invoke the utility from both sessions as follows:

$ /usr/lib/rstudio-server/bin/locktester

The first node you execute the utility from should indicate the types of locks it was able to acquire, for example:

* Acquired advisory lock
* Acquired link-based lock

After the message is printed the process will pause so that it can retain the lock (you can cause it to release the lock by interrupting it e.g. via Ctrl+C).

The second and subsequent nodes you execute the utility on will attempt to acquire the same locks. A message will be printed to the console indicating which types of locks are supported, for example:

* Acquired advisory lock
* Unable to acquire link-based lock

Your filesystem appears to support link-based locks.

In this example, advisory locks are not supported (because both nodes were able to acquire an advisory lock), but link-based locks are. See Lock Configuration for more information on configuring lock types.

If you interrupt the first node (e.g. via Ctrl+C) the lock will be released and you can then acquire it from the other nodes.

If either of the following occurs then there is an issue with file locking capabilities (or configuration) that should be addressed prior to using load balancing:

  1. All nodes successfully acquire the file lock (i.e. more than one node can hold it concurrently).
  2. No nodes are able to acquire the file lock.

If either of the above conditions holds, RStudio won’t be able to correctly synchronize the creation of R sessions throughout the cluster (potentially resulting in duplicate sessions and lost data due to sessions overwriting each other’s state).

Lock Configuration

RStudio’s file locking scheme can be configured using a file at /etc/rstudio/file-locks. Valid entries are:

  • lock-type=[linkbased|advisory]
  • refresh-rate=[seconds]
  • timeout-interval=[seconds]
  • enable-logging=[0|1]
  • log-file=[path]

The default locking scheme, linkbased, considers a lock acquired when the process successfully hardlinks a dummy file to a location within the folder RStudio uses for client state (typically ~/.local/share/rstudio). This scheme is generally more robust with older network file systems, and the locks should survive temporary filesystem mounts / unmounts.

Note

If you are using EFS, the default lock type, linkbased, will not work. Instead, use the advisory lock type.

The timeout-interval and refresh-rate options can be used to configure how often the locks generated in the linkbased locking scheme are refreshed and reaped. By default, a process refreshes any locks it owns every 20 seconds, and scans for stale locks every 30 seconds. If an rsession process crashes, it can leave behind stale lock files; those lock files will be cleaned up after they expire by any newly-launched rsession processes.

advisory can be selected to use advisory file locks (using e.g. fcntl() or flock()). These locks are robust, but are not supported by all network file systems.

If you are having issues with file locking, you can set enable-logging=1 and set the log-file option to a path where output should be written. When logging is enabled, RStudio will report its attempts to acquire and release locks to the log file specified by log-file. When log-file is unset, log entries are emitted to the system log, typically located at /var/log/messages or /var/log/syslog.
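Putting these options together, a sketch of a file-locks configuration for an EFS-backed deployment, with troubleshooting logging enabled (the log path is illustrative):

# /etc/rstudio/file-locks
lock-type=advisory
refresh-rate=20
timeout-interval=30
enable-logging=1
log-file=/var/log/rstudio-file-locking.log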

Managing Nodes

Starting Up

When configuring each node, be sure to copy all of the configuration files from /etc/rstudio/ to each node. Then add the load-balancer configuration file with the optional www-host-name option which will be unique for each node. Assuming that the server is already installed and running on each node, you can then apply the load balancing configuration by restarting the server:

$ sudo rstudio-server restart

Current Status

Endpoint Status

Once the cluster is running you can inspect its state (which sessions are running where) using the load balancing status HTTP endpoint. For example, when running the server on the default port (8787):

$ curl http://localhost:8787/load-balancer/status

Note that the status endpoint is accessed using localhost rather than an external IP address. This is because the endpoint is IP-restricted to be accessible only within the cluster, so it needs to be accessed directly from one of the nodes.

The status endpoint will return output similar to the following:

192.168.55.101:8787  Load: 0.45, 0.66, 0.32
   12108 - jdoe
   12202 - kmccurdy

192.168.55.102:8787  Load: 1, 0.75, 0.31
   3404 - bdylan

192.168.55.103:8787 (unreachable)  Load: 0, 0, 0

192.168.55.104:8787 (offline)  Load: 0.033, 0.38, 0.24

This output shows all of the nodes in the cluster. Each node is indicated by its address and an optional status indicating whether the node is unreachable or offline. If the node does not indicate a status, then it is healthy and servicing requests. Following the node address is its CPU load, shown as three decimal values for the last known 1-minute, 5-minute, and 15-minute load averages, expressed as a fraction of total CPU capacity. On subsequent output lines, each RStudio IDE session running on that particular node is listed along with its process ID and running user.

An unreachable node indicates an issue connecting to it via the network. In most cases, this means the rstudio-server service is not running on the node; troubleshoot by viewing any startup issues in the system logs for that particular node (see Diagnostics if the service is running and healthy) and by checking the node’s database status with the command rstudio-server list-nodes. An offline node is one that was explicitly put into offline mode via the command sudo rstudio-server offline, which prevents new sessions from being started on that node.
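For example, on a node reported as unreachable you might check the service and its recent logs; the commands below assume a systemd-based distribution:

$ sudo systemctl status rstudio-server
$ sudo journalctl -u rstudio-server --since "1 hour ago"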

Database Status

While the status endpoint retrieves information from each node via HTTP requests, each node also maintains its own Status field within the Postgres database. These statuses can be viewed in the output of the rstudio-server list-nodes command under the ‘Status’ column. The possible statuses are found in the following table.

Node Database Statuses

Status Description
Offline The node was properly configured and is currently Offline.
Starting The node is processing and validating its configuration; each node only spends a short amount of time in this state.
Deleting A request to delete this node is being processed, or such a request was received but could not be completed. If the request could not complete, an error is logged in the system logs with more information.
Failed to resolve The node attempted to come online, but could not resolve its host name to an IP address. To manually specify the node’s host name, specify www-host-name in the load balancer configuration file.
Invalid secure cookie key The node attempted to come online, but its secure-cookie-key does not match the existing cluster’s key. To reset the cookie key, run the command rstudio-server reset-cluster.
Invalid permissions on secure cookie key The node attempted to come online, but its secure cookie key file has invalid permissions. The permissions must be set to 0600.
Missing secure cookie key The node attempted to come online, but it did not find a secure cookie key file. The default location of the key file is /etc/rstudio/secure-cookie-key; alternatively, it can be set with the secure-cookie-key-file option in rserver.conf and launcher.conf.
Online The node is properly configured and online. A node may also have this status when it was not properly shut down. When rstudio-server list-nodes is run and the node’s Last Seen column has not been updated in the last 10 minutes, this value is displayed as “Missed check-in”.

Adding and Removing Nodes

To temporarily remove a node from the cluster you can simply stop it:

$ sudo rstudio-server stop

R sessions running on that node will be automatically moved to another active node. Note that only the session state is moved, not the running processes. The node will now appear in the list-nodes command with an offline status. To restore the node you can simply start it back up again:

$ sudo rstudio-server start

To add a new node, create the file /etc/rstudio/load-balancer. Leave it empty for default settings. When rstudio-server is restarted, it will broadcast its arrival to the other online nodes in the cluster; they do not have to be restarted or reloaded. All nodes sharing a database will be part of the same cluster.
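For example, a minimal sketch of bringing a new node into the cluster with default settings:

$ sudo touch /etc/rstudio/load-balancer
$ sudo rstudio-server restart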

Reloading the load balancer configuration will also cause the rserver-http proxy configuration to be updated, which affects RStudio’s running HTTP server. It is recommended that you do not make any other HTTP-related changes when updating the load balancer configuration unless you are aware of the potential side effects.

The rstudio-server delete-node command can be used to permanently remove nodes from the database and cluster, but data loss can occur if this command is run for a node that is actively running sessions. To prevent this, the node should first be stopped. Alternatively, you can follow the instructions in Endpoint Status to view active sessions on the node and suspend them using any of the Session Management suspend commands. The delete-node command can also be used to remove entries from the database that do not represent any physical node in the cluster. Such entries may exist with an invalid host or IP address and a status that is not “Online” if an attempt was made to bring a node online before it was properly configured.
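For example, to suspend one of the sessions listed by the status endpoint before deleting its node (the process ID below is taken from the earlier example output; substitute your own):

$ sudo rstudio-server suspend-session 12108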

The node must be deleted from an active node and requires knowing the to-be-deleted node’s ID, which can be retrieved with the list-nodes command. For example, after starting the server with a typo in www-host-name, your commands may look like the following:

$ sudo rstudio-server list-nodes
Cluster
-------
Protocol
Http

Nodes
-----
ID  Host                          IPv4            Port    Status
1   rsw-primaryyy                                 8787    Failed to resolve
2   rsw-secondary                 123.456.78.100  8787    Online
3   rsw-primary                   123.456.78.101  8787    Online                     

$ sudo rstudio-server delete-node 1
Node 1 deleted.
Note

The output from the rstudio-server list-nodes command above was shortened to improve readability.

When the command is run, the node’s database status will shortly change to ‘Deleting’ and then the node will be removed from the database. The node that the delete-node command was run from will broadcast a message to all other online nodes in the cluster that this node has been deleted. On receipt of this message, nodes will stop routing requests to the deleted node.

Troubleshooting

If users are having difficulty accessing RStudio in a load balanced configuration it’s likely due to one of the load balancing requirements not being satisfied. This section describes several scenarios where a failure due to unsatisfied requirements might occur.

Node network instability

Some scenarios may cause RStudio to wait a long time for a node to respond due to network instability. You can limit how long this waiting period lasts with the timeout option, which is set to 10 seconds by default. To disable this timeout and use the system defaults, set it to zero.

# /etc/rstudio/load-balancer
[config]

balancer=sessions
timeout=5
...

SSL

If one of the nodes is temporarily using a self-signed or otherwise functional but invalid certificate, the load balancer may fail to use that node. You can skip SSL certificate verification by disabling the option verify-ssl-certs, which is only applicable when connecting over HTTPS. For production use, you should always leave the default or have this set to true; disable it only for testing purposes.

# /etc/rstudio/load-balancer
[config]

balancer=sessions
verify-ssl-certs=0
...

User Accounts Not Synchronized

One of the load balancing requirements is that user accounts must be accessible from each node and that usernames and user ids must be identical on all nodes. If a user has the same username but different user ids on different nodes, permissions problems will result when that user attempts to access shared storage using the different user ids.

You can determine the ID for a given username via the id command. For example:

$ id -u jsmith
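To confirm that the ID matches on every node, you can compare it across hosts over SSH; the hostnames and the username jsmith are illustrative:

$ for h in server1.example.com server2.example.com; do echo -n "$h: "; ssh $h id -u jsmith; done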

NFS Volume Mounting Problems

If NFS volumes containing shared storage are unmounted during an RStudio session, that session will become unreachable. Furthermore, unmounting can cause loss or corruption of file locks (see the section below). If you are having problems related to accessing user directories, then fully resetting the connections between RStudio nodes and NFS will often resolve them. To perform a full reset:

  1. Stop RStudio on all nodes (sudo rstudio-server stop).

  2. Fully unmount the NFS volume from all nodes.

  3. Remount the NFS volume on all nodes.

  4. Restart RStudio on all nodes (sudo rstudio-server start).

File Locking Problems

Shared user storage (e.g. NFS) must support file locking so that RStudio can synchronize access to sessions across the various nodes in the cluster. File locking will not work correctly if the clocks on all nodes in the cluster are not synchronized. This condition may be surfaced as 502 HTTP errors. You can verify that file locking is working correctly by following the instructions in the File Locking section above.

Diagnostics

To troubleshoot more complicated load balancing issues, RStudio can output detailed diagnostic information about internal load balancing traffic and state. You can enable this by using the diagnostics setting as follows:

[config]
diagnostics=tmp

Set this on every server in the cluster, and restart the servers to apply the change. This will write a file /tmp/rstudio-load-balancer-diagnostics on each server containing the diagnostic information.

The value stderr can be used in place of tmp to send diagnostics from the rserver process to standard error instead of a file on disk; this is useful if your RStudio Workbench instance runs non-daemonized.