High Availability & Load Balancing¶
Multiple instances of RStudio Connect can share the same data in highly available (HA) and load-balanced configurations. We refer to these configurations as "HA" for brevity.
HA Checklist¶
Follow the checklist below to configure multiple RStudio Connect instances for HA:

- Install and configure the same version of RStudio Connect on each node.
- Migrate to a PostgreSQL database (if running SQLite). All nodes in the cluster must use the same PostgreSQL database.
- Configure each server's Server.DataDir to point to the same shared location; see Variable Data and Shared Data Directory Requirements.
- If the Database.Dir setting has been customized, ensure that it points to a consistent, shared location on each server; see Variable Data and Shared Data Directory Requirements.
- Configure each server's Server.LandingDir to point to the same shared location (if using a custom landing page); see Using a Custom Landing Page and Shared Data Directory Requirements.
- Configure each server's Metrics.DataPath directory to point to a unique-per-server location; see the Metrics configuration appendix. Alternatively, you may wish to consider using Graphite to write all metrics to a single location; see Metrics Requirements.
- Configure your load balancer to route traffic to your RStudio Connect nodes with sticky sessions; see rsconnect Cookie Support.
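To make the shared-versus-per-server distinction concrete, the fragment below sketches how these settings might look on one node. It is only an illustration: the paths /mnt/rstudio-connect and /var/lib/rstudio-connect-metrics/node1 are placeholders for your own shared mount and per-server metrics location, and the configuration appendices remain the authoritative reference.

; /etc/rstudio-connect/rstudio-connect.gcfg (on node 1)
[Server]
; Identical on every node: a shared (for example, NFS-mounted) location
DataDir = /mnt/rstudio-connect

[Metrics]
; Unique on every node: a local, per-server location
DataPath = /var/lib/rstudio-connect-metrics/node1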
HA Limitations¶
Clock Synchronization¶
All nodes in an RStudio Connect HA configuration MUST have their clocks synchronized, preferably using ntp. Failure to synchronize system clocks between nodes can lead to undefined behavior, including loss of data.
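One way to verify that a node's clock is actually being synchronized is to ask the host's time service. The commands below are a minimal sketch and assume a systemd-based host with ntpd or timesyncd available; your distribution may use chrony or another client instead.

# Reports whether the system clock is synchronized
timedatectl status

# On hosts running ntpd, ntpstat summarizes the synchronization state
ntpstat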
Consistent Users and Groups¶
RStudio Connect executes your content using one or more target accounts. The rstudio-connect user and its primary group (also named rstudio-connect) are created when RStudio Connect is installed. The rstudio-connect user serves as the default for Applications.RunAs and is used when executing content.
You must ensure that each Unix user account that may be used to execute your deployed content has the same UID and GID on every node in the cluster. The id command is one way to check the user and group identifiers for a single Unix username. Your systems administrator can probably help configure accounts in a uniform way across your cluster of hosts.
id rstudio-connect
# => uid=998(rstudio-connect) gid=998(rstudio-connect) groups=998(rstudio-connect)
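If you need to create an additional runtime account for Applications.RunAs, one approach is to create it with an explicit UID and GID so the identifiers match everywhere. This is a minimal sketch; the rstudio-runner name and the 1500 identifiers are placeholders chosen for illustration.

# Run on EVERY node so the account has identical identifiers on each host
sudo groupadd --gid 1500 rstudio-runner
sudo useradd --uid 1500 --gid 1500 --create-home rstudio-runner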
Node Management¶
RStudio Connect nodes in a HA configuration are not self-aware of HA. The load-balancing responsibility is fully assumed by your load balancer, and the load balancer is responsible for directing requests to specific nodes and checking whether nodes are available to accept requests.
Database Requirements¶
RStudio Connect only supports HA when using a PostgreSQL database. If you are using SQLite, please switch to PostgreSQL. See the Changing Database Provider section.
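For reference, a PostgreSQL-backed configuration names Postgres as the database provider and supplies a connection URL, along the lines of the sketch below. The host, database name, and user are placeholders, and the Changing Database Provider section documents the authoritative settings and migration steps.

; /etc/rstudio-connect/rstudio-connect.gcfg
[Database]
Provider = Postgres

[Postgres]
; Placeholder connection URL; supply your own host, database, and credentials.
URL = postgres://rstudio_connect@postgres.example.com/rstudio_connect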
Shared Data Directory Requirements¶
RStudio Connect manages uploaded content within the server's data directory. This data directory must be a shared location. The Server.DataDir configuration on each node must point to the same shared location. See the Variable Data section for more information on the server's data directory. We recommend and support NFS version 3 for file sharing.

If you configure Database.Dir (not required), this also must point to the same shared location.
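As an illustration, the data directory on each node could be an NFS version 3 mount of the same export. The command below is a minimal sketch; fileserver.example.com and /export/rstudio-connect are placeholders for your own file server and export path, and in practice you would persist the mount in /etc/fstab or an automounter rather than mounting by hand.

# Run on EVERY node: mount the shared export over the data directory
sudo mount -t nfs -o vers=3 fileserver.example.com:/export/rstudio-connect /var/lib/rstudio-connect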
Metrics Requirements¶
By default, RStudio Connect writes metrics to a set of RRD files. We do not support metrics aggregation, and each server must maintain a separate set of RRD files to avoid conflicts. The Connect dashboard for a specific node will only show metrics for that node. See the Metrics configuration appendix for information on configuring a unique Metrics.DataPath for each server.

RStudio Connect includes optional support for writing metrics to Graphite. If you wish to aggregate metrics, consider using Graphite or any monitoring tool compatible with the Carbon protocol. See Historical Metrics for more information.
Shiny Applications¶
Shiny applications depend on a persistent connection to a single server. Please configure your load balancer to use cookie-based sticky sessions to ensure that Shiny applications function properly when using HA.
rsconnect Cookie Support¶
For cookie-based sticky session support, you will need to ensure that your R clients (including the RStudio IDE) use rsconnect version 0.8.3 or later. Versions of rsconnect prior to 0.8.3 did not include support for cookies.
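One way to confirm the installed rsconnect version from a terminal is sketched below; it assumes Rscript is on the PATH, and you can equally run packageVersion("rsconnect") from an R console or the RStudio IDE.

# Print the installed rsconnect version
Rscript -e 'as.character(packageVersion("rsconnect"))'

# Upgrade rsconnect if the reported version is older than 0.8.3
Rscript -e 'install.packages("rsconnect", repos = "https://cran.rstudio.com")'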
Updating HA Nodes¶
When applying updates to the RStudio Connect nodes in your HA configuration, you should follow these steps to avoid errors due to an inconsistent database schema:
- Stop all RStudio Connect nodes in your cluster.
- Follow the Upgrading instructions to upgrade one RStudio Connect node. The first update will upgrade the database schema (if necessary) and start RStudio Connect on that instance.
- Upgrade the remaining nodes using the same Upgrading instructions.
If you forget to stop any RStudio Connect nodes while upgrading another node, these nodes will be using a binary that expects an earlier schema version, and will be subject to unexpected and potentially serious errors. These nodes will detect an out-of-date database schema within 30 seconds and shut down automatically.
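The outline below sketches that sequence for Ubuntu hosts. It assumes the new release has been downloaded to each node as rstudio-connect_<version>.deb and that gdebi is available; treat it as an illustration of the ordering rather than a substitute for the Upgrading instructions (RHEL/CentOS hosts would install the corresponding .rpm instead).

# 1. Stop Connect on EVERY node in the cluster
sudo systemctl stop rstudio-connect

# 2. On ONE node only: install the new package (this upgrades the schema) and start Connect
sudo gdebi rstudio-connect_<version>.deb
sudo systemctl start rstudio-connect

# 3. On each REMAINING node: install the same package, then start Connect
sudo gdebi rstudio-connect_<version>.deb
sudo systemctl start rstudio-connect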
Downgrading¶
If you wish to move from an HA environment to a single-node environment, please follow these steps:
- Stop all Connect services on all nodes.
- Reconfigure your network to route traffic directly to one of the nodes, unless you wish to continue using a load balancer.
- If you wish to move all shared file data to the node, then:
  - Configure the server's Server.DataDir to point to a location on the node, and copy all the data from the NFS share to this location; see Variable Data.
  - If using a custom landing page, configure Server.LandingDir to point to a location on the node, and copy the custom landing page data from the NFS share to this location; see Using a Custom Landing Page.
  - Configure the server's Metrics.DataPath directory to point to an appropriate location. If necessary, copy the data from the NFS share to this location; see Metrics Requirements.
- If you wish to move the database to this node, install PostgreSQL on the node and copy the data. Moving the PostgreSQL database from one server to another is beyond the scope of this guide. Please note that we do not support migrating from PostgreSQL back to SQLite.
- Start the Connect process; see Stopping and Starting.
Backups, Snapshots, and Data Integrity¶
Note
Do not attempt to restore snapshots taken while RStudio Connect is running.
Incremental snapshots such as these may be useful for security auditing purposes. Additionally, the bundles directory has compressed archives that could be redeployed from the RStudio IDE if necessary.
Quick Backup Steps¶
- Stop all nodes of the RStudio Connect server:

  # Executed on EVERY node running RStudio Connect
  sudo systemctl stop rstudio-connect

- Dump RStudio Connect's Postgres database to a file:

  # Executed on ONLY the Postgres server by a user with database access
  # Change `-U connect` to the user that owns Connect's database.
  pg_dump -U connect connect > connect-backup.sql

- Copy or archive the data directory to storage:

  # Executed on ONLY one server; assumes you have a backup server at /mnt/backup-server
  cp -rp /var/lib/rstudio-connect /mnt/backup-server/rstudio-connect-backup

  Or:

  # Executed on ONLY one server
  tar cvpzf /mnt/backup-server/rstudio-connect-backup.tar.gz /var/lib/rstudio-connect

- Restart the RStudio Connect server once the Postgres dump and data backup are complete:

  # Executed on EVERY node running RStudio Connect
  sudo systemctl start rstudio-connect
Quick Restore Steps¶
Note
Quick restore is intended for a "rollback" of an existing set of RStudio Connect servers. If you have lost your entire RStudio Connect host, you will need to follow the instructions to reinstall RStudio Connect and its dependencies before following the quick restore steps.
- Ensure that RStudio Connect is completely shut down:

  # Executed on EVERY node running RStudio Connect
  sudo systemctl stop rstudio-connect

- Restore the RStudio Connect database:

  # Executed on ONLY the Postgres server by a user with database access
  # Change `-U connect` to the user that owns Connect's database.
  psql -U connect -d connect -1 -f connect-backup.sql

- Restore the data directory files:

  # Executed on ONLY one server; assumes your backup server is at /mnt/backup-server
  # and that /var/lib/rstudio-connect does not already exist on this node
  cp -rp /mnt/backup-server/rstudio-connect-backup /var/lib/rstudio-connect

  Or:

  # Executed on ONLY one server; the archive stores paths relative to /, so extract at the root
  tar xvpzf /mnt/backup-server/rstudio-connect-backup.tar.gz -C /

- Restart the RStudio Connect server once the restores are complete:

  # Executed on EVERY node running RStudio Connect
  sudo systemctl start rstudio-connect
Detailed Explanation of Backup & Restore¶
RStudio Connect has two data sources that should be consistent with each other for normal operation: the data directory, and the database.
In a single-node configuration, the data directory is usually some directory on the host node and the database is a SQLite file also located on the host node.
In an HA, multiple-node configuration, the data directory is generally located on an NFS share, and the database is a PostgreSQL database, often hosted elsewhere.
When backing up a single node, it is sufficient to stop the node, copy the data directory and database file, and then restart the node.
For a multiple-node configuration, you can also stop all nodes, copy the data directory, dump the database using pg_dump, and then restart all the nodes again. This is the only way to guarantee that your RStudio Connect environment can be restored without any inconsistency.
In order to address concerns about day-to-day data loss, some users may wish to take incremental snapshots of the database and data directory, using the same method, while Connect is running. These are not guaranteed to be consistent and you should not attempt to restore an RStudio Connect installation with them. Notably:
- Depending on the timing between when both snapshots are taken, the database may have records that aren't reflected in your data directory snapshot, or vice versa.
- Because the networked filesystem may lock certain files that are currently in use, the snapshot process may finish much later than it began, leading to a snapshot that isn't consistent with any state.
  - Example: Imagine app X v0.1 was copied successfully, and app Y is locked on NFS. While the backup script waits for app Y to unlock, app X v0.2 is uploaded. Then app Z is created. The snapshot would reflect apps X v0.1, Y, and Z, even though the only consistent states were "app X v0.1, Y" and "app X v0.2, Y, Z".
- Copying your data directory and snapshotting your database will create contention on both the networked filesystem and the database, which could cause service slowness or interruption for users. Consider warning users and/or performing incremental snapshots during times of low user activity.
Recognizing that you will not be able to restore the entire backup, it may be useful to only snapshot those parts of the data directory or database that you want to retain. For example, you could use rsync to find new bundles added since the last backup was performed, and copy them to a snapshot directory.
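A minimal sketch of such an incremental bundle snapshot using rsync is shown below; the /mnt/backup-server/bundle-snapshots destination is a placeholder, and the bundles path assumes the default data directory location.

# Copy only bundle archives that are not already present in the snapshot directory
rsync -av --ignore-existing \
  /var/lib/rstudio-connect/bundles/ \
  /mnt/backup-server/bundle-snapshots/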
HA Details¶
Concurrent Scheduled Document Rendering¶
The Applications.ScheduleConcurrency configuration setting specifies the number of scheduled jobs that can run concurrently on a host. By default, two scheduled jobs can run simultaneously. Hosts with substantial processing power can increase this setting. This can be helpful if your environment has many long-running reports.
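For example, a host with more capacity might allow additional concurrent scheduled jobs; the value of 4 below is purely illustrative.

; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
ScheduleConcurrency = 4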
A particular host can disable processing of scheduled jobs by setting Applications.ScheduleConcurrency to zero.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
ScheduleConcurrency = 0
Note
No scheduled job will execute if every host sets Applications.ScheduleConcurrency to zero.
The Applications.ScheduleConcurrency setting does not affect ad-hoc rendering requests, hosted APIs, or Shiny applications.
Concurrent Shiny Applications and Ad-Hoc Rendering¶
Each process associated with Shiny applications, hosted APIs, ad-hoc rendering requests, and bundle deployments runs on the server where the request was initiated. We depend on your load balancer to distribute these requests to an appropriate Connect node. The minimum and maximum process limits for Shiny applications are enforced per server. For example, if a Shiny application allows a maximum of 10 processes, a maximum of 10 processes per server will be enforced. See the Scheduler configuration appendix for more information.
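As a hedged illustration of a per-server limit, the snippet below sets a server-wide default cap on the number of processes a single piece of content may use. The Scheduler.MaxProcesses name and the value shown should be checked against the Scheduler configuration appendix before use, since per-application settings can override server defaults.

; /etc/rstudio-connect/rstudio-connect.gcfg
[Scheduler]
; Default per-content process cap, enforced on each server (illustrative value)
MaxProcesses = 10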
Polling¶
RStudio Connect nodes poll the data directory for new scheduled jobs:
- Every 5 seconds, and
- After every completed scheduled job.
Abandoned Processes¶
While processing a scheduled job, the RStudio Connect node periodically updates the job's metadata in the database with a "heartbeat". If the node goes offline and the "heartbeat" ceases, another node will eventually claim the abandoned job and run it again. Hence, if a server goes offline or the Connect process gets shut down while a scheduled report is running, it is possible that the scheduled job could run twice.
Abandoned Shiny Applications¶
A Shiny application depends on a persistent connection to a single server. If the server associated with a particular Shiny application session goes down, the Shiny application will fail. However, simply refreshing the application should result in a new session on an available server, assuming your load balancer detects the failed node and points you to a working one.

Shiny applications that support client-side reconnects using the session$allowReconnect(TRUE) feature will automatically reconnect the Shiny application to a working node. See https://shiny.rstudio.com/articles/reconnecting.html for more information.