Process Management#
RStudio Connect launches R and Python to perform a variety of tasks. This includes:
- Installing R packages
- Rendering R Markdown documents
- Running Shiny, Dash, Streamlit, or Bokeh Applications
- Customizing a parameterized R Markdown document
- Running APIs using Plumber or Python (Flask)
- Running TensorFlow Model APIs
The location of R (and optionally, Python) defaults to whatever is in the path. Customize the Server.RVersion setting to use a specific R installation. See the R and Python sections for details.
Sandboxing#
The RStudio Connect process runs as the root user. It needs escalated privileges to allow binding to protected ports and to create "unshare" environments where content processes are run.
RStudio Connect runs content processes as an unprivileged user; both a system default and content-specific overrides are supported. See the User Account for Processes section for details.
The "unshare" environment created for process execution involves first establishing a number of bind mounts and then switching to the target unprivileged user. RStudio Connect uses unshare to alter the execution context available to processes. Within this newly established environment, a number of mount calls are made in order to hide or isolate parts of the filesystem.
Documentation for the system calls used to create an RStudio Connect sandbox:
- unshare: http://man7.org/linux/man-pages/man2/unshare.2.html
- mount: http://man7.org/linux/man-pages/man2/mount.2.html
Your local system man pages document the behavior of unshare and mount on that system.
Note
If you are running RStudio Connect within a Docker container, that container must be started with additional privileges. The Docker section of the Server Management chapter discusses privileged containers and the capabilities needed by RStudio Connect.
The following locations are masked during R execution:
- The Server.DataDir directory containing all variable data used by RStudio Connect.
- The SQLite.Dir directory, which can optionally be placed outside the data directory.
- Configuration directories, including /etc/rstudio-connect.
- The Server.TempDir directory contains a connect-workspaces sub-directory with per-process temporary directories.
The following information is exposed during R execution:
- Data directories containing installed packages and environments. The exact set of directories will vary depending on the type of content.
- The directory containing the deployed and unpackaged R or Python code.
- The document rendering destination directory (only for R Markdown documents and Jupyter notebooks).
- A per-process temporary directory specified in the TMPDIR environment variable of the process. This temporary directory is created under Server.TempDir/connect-workspaces.
When Applications.HomeMounting is enabled, the contents of /home are masked by an additional bind mount as follows:
- The contents of /home are masked by the home directory of the RunAs user.
- If the RunAs user does not have a home directory, an empty directory masks /home.
The path to the home directory is always available through the HOME environment variable. With Applications.HomeMounting, the mounted path to the HOME directory is subject to change. Avoid hard-coding paths to either /home or /home/username.
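As a minimal sketch of the advice above, Python content could build paths from the HOME environment variable instead of a literal /home/username prefix. The helper name home_file and the file names in the usage note are hypothetical and chosen only for illustration.

```python
import os
from pathlib import Path


def home_file(*parts: str) -> Path:
    """Build a path beneath the directory named by the HOME variable.

    HOME is read at call time, so the result follows whatever home
    directory is mounted for the process that is actually running.
    """
    return Path(os.environ["HOME"]).joinpath(*parts)
```

For example, home_file(".myapp", "settings.json") resolves against the current HOME value rather than a fixed location.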
Interactive applications like Shiny, Plumber, Dash, and Flask have write access to the directory containing the unpackaged code. This directory is the working directory when launching an application. Data written into this directory is visible to all processes associated with that application, but not to processes associated with other content. Application directory data remains available until that application is next deployed to RStudio Connect. A deployment creates a new application directory containing only the deployed content.
Note
RStudio Connect may launch multiple processes to service requests for an application. There is no coordination between these processes. Applications that write to local files could experience problems when different processes attempt to write to a single file.
For example, two different processes writing to the same file may see output incorrectly interleaved or even overwritten.
We do not recommend using the file system for data persistence.
Batch-executed content like R Markdown documents and Jupyter notebooks have write access to a directory containing the unpackaged code and a separate output directory that receives the rendered result. A new directory containing the deployed source code is created each time the content is executed. This copy of the code ensures that simultaneous rendering processes are isolated from each other and cannot overwrite each other's output files. The temporary source directory is the working directory when rendering the content. A distinct output directory is used each time the content is rendered. Data created during one rendering is not visible to another.
R Markdown multi-document sites have a slightly different rendering pipeline than standalone documents. RStudio Connect uses the rmarkdown::render_site function, which does its rendering in-place. The content from the source directory is copied into the rendering destination directory in preparation for rendering. Site rendering has write access to the destination directory. Access to the original source directory is not provided because the source content is duplicated in the destination directory.
The rmarkdown::render_site call usually places its output into a subdirectory (typically, '_site'). The contents of this output subdirectory will be moved to the root of the rendering destination directory, replacing any other content. No post-rendering file movement occurs if rmarkdown::render_site is instructed to render into the current directory instead of a subdirectory. This means that both source and output files will be available for serving.
Note
We recommend against configuring rmarkdown::render_site to write its output into the current directory. Rendering the site into a subdirectory (the default) allows RStudio Connect to remove source from the output directory.
RStudio Connect serves rendered content from the document output directory. This content remains available until a subsequent rendering is successful and activated (if requested). Neither incomplete nor unsuccessful document renderings affect the availability of previously rendered content.
Temporary Directory#
Each process started by RStudio Connect is given its own unique temporary directory. These directories are created under Server.TempDir/connect-workspaces.
The default value for Server.TempDir is taken from the TMPDIR environment variable when it contains a path, and falls back to /tmp otherwise.
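The lookup described above can be sketched in Python. This only illustrates the documented behavior and is not RStudio Connect's implementation; the function names are hypothetical, and treating an empty TMPDIR the same as an unset one is an assumption.

```python
import os
from typing import Mapping, Optional


def default_temp_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return TMPDIR when it holds a path, otherwise /tmp.

    Assumption: an empty TMPDIR value is treated like an unset one.
    """
    env = os.environ if env is None else env
    return env.get("TMPDIR") or "/tmp"


def workspaces_dir(temp_dir: str) -> str:
    """Parent directory of the per-process temporary directories."""
    return os.path.join(temp_dir, "connect-workspaces")
```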
You may wish to override Server.TempDir if the default temporary directory has too little space or is mounted with the noexec option.
Note
If you do override Server.TempDir, please ensure the location can be reached, read, and written by any user on the system. On most systems, temporary directories have permissions of 1777.
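One way to check a candidate directory for the 1777 permissions mentioned above is sketched below. The function name is hypothetical; the check compares only the permission bits (including the sticky bit) and does not verify ownership or free space.

```python
import os
import stat


def has_mode_1777(path: str) -> bool:
    """True when path is a directory whose permission bits are exactly 1777."""
    info = os.stat(path)
    # S_IMODE keeps the permission bits plus setuid, setgid, and sticky.
    return stat.S_ISDIR(info.st_mode) and stat.S_IMODE(info.st_mode) == 0o1777
```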
You can learn more about the noexec option here.
Applications & APIs#
RStudio Connect manages both batch-oriented and long-lived processes. Batch-oriented processes tend to be narrowly scoped and short-lived, while a process for an application built with a web framework such as Shiny, Plumber, Flask, Dash, or Streamlit may handle many requests for many users over its lifetime.
RStudio Connect launches a process tied to a live application when the first request arrives for that application. That process will continue to service requests until it becomes idle and is eventually terminated. If there is sufficient traffic against that application, RStudio Connect may launch additional processes to service those requests.
There are a number of configuration parameters which control the conditions under which processes for applications are launched and eventually reaped. The default values are appropriate for most applications but occasionally need customization in specialized environments. The Scheduler configuration appendix explains each of the options.
We recommend that adjustment to these runtime properties be done gradually.
TensorFlow Model APIs#
TensorFlow Model APIs have a similar lifecycle to Plumber or Flask APIs. They are live processes that handle user requests on demand. Per-process and global scheduler settings may still apply to TensorFlow Model API processes.
TensorFlow Model APIs may also be run within a supervisor script if one is provided. The API server requires access to the app's content directory and to shared object files and their dependencies.
User Account for Processes#
RStudio Connect executes your content with an unprivileged Unix account. The Applications.RunAs setting tells RStudio Connect which account to use. The rstudio-connect account is created during installation and used as the default value for Applications.RunAs.
The root account never executes deployed user code.
Administrators can configure some pieces of content to be executed by a different Unix account than the Applications.RunAs default. This setting is found in the Access tab when editing content settings. Non-administrators are prohibited from changing the RunAs setting.
Each Unix account used as a custom RunAs must be a member of the Unix group Applications.SharedRunAsUnixGroup. This group membership requirement always applies, even when Applications.RunAs does not use the default rstudio-connect user.
The rstudio-connect user has a primary group also named rstudio-connect.
Example
Let's customize the Unix RunAs user and SharedRunAsUnixGroup to allow alternate Unix accounts for specific pieces of content.
We want to use the ds-system Unix account as our default RunAs user and the data-scientists Unix group as our shared group.
The following configuration tells RStudio Connect to use ds-system:
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
RunAs = "ds-system"
SharedRunAsUnixGroup = "data-scientists"
Other Unix accounts that belong to the data-scientists group can be used as RunAs overrides. For example, the Unix account hadley must be a member of the data-scientists group before it can be used to run your Shiny application.
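Before assigning an account as a RunAs override, an administrator might confirm its group membership. The sketch below uses Python's standard pwd and grp modules; the function name is hypothetical, and counting both primary and supplementary membership is an assumption about what satisfies the requirement.

```python
import grp
import pwd


def is_group_member(user: str, group: str) -> bool:
    """True when the Unix account belongs to the named Unix group.

    Assumption: membership counts whether the group is the account's
    primary group or lists the account as a supplementary member.
    """
    group_entry = grp.getgrnam(group)  # raises KeyError for an unknown group
    user_entry = pwd.getpwnam(user)    # raises KeyError for an unknown account
    return user_entry.pw_gid == group_entry.gr_gid or user in group_entry.gr_mem
```

With the example configuration, is_group_member("hadley", "data-scientists") would need to return True before hadley could run content.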
Package installation always happens as the Applications.RunAs user. An application or R Markdown report may override its RunAs setting; this alters how the deployed code is executed and does not impact package installation.
See the Sandboxing section for more information about process sandboxing.
The RunAs Unix account does not need to be associated with an RStudio Connect user account. Most installations use a small number of shared Unix accounts. Some configurations (e.g. PAM authentication) pair RStudio Connect user accounts with Unix accounts, but this is not required.
Licensed user limits are enforced against RStudio Connect user accounts, not the Unix accounts used to run content.
Current user execution#
RStudio Connect can use a local Unix account associated with the currently logged-in user when executing applications. This works for Shiny apps, Shiny documents, and Python Dash, Streamlit, and Bokeh apps. This feature requires that user authentication use PAM.
Info
See Authentication Integration with PAM for information about using PAM for user authentication.
The Applications.RunAsCurrentUser property specifies that content can be configured to execute as the currently logged-in user.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
RunAsCurrentUser = true
Administrators can now customize the RunAs settings to permit current-user execution on a content-specific level. The Access content setting tab offers the option of executing using "The Unix account of the current user". Content accessed anonymously will execute as the specified fallback RunAs user.
Info
See the User Account for Processes section for more information about RunAs customization.
Content execution settings are not altered when Applications.RunAsCurrentUser is enabled. The Applications.RunAsCurrentUser setting permits current-user execution but by itself does not change how processes are launched. Each Shiny application or Shiny document must explicitly request current-user execution.
All Unix accounts used to execute R must be members of the Unix group defined by Applications.SharedRunAsUnixGroup.
Applications are not permitted to launch if the Unix account associated with the logged-in user does not have the proper group membership.
Note
The Applications.RunAs setting uses the rstudio-connect user by default. This user has a primary group also named rstudio-connect. Any Unix account that may be used to execute applications or R Markdown reports must be a member of the rstudio-connect group.
PAM sessions#
RStudio Connect can leverage PAM (Pluggable Authentication Modules for Linux) to establish the environment and resources available for R sessions when Authentication Integration with PAM has been configured.
PAM sessions are enabled with the PAM.UseSession setting.
; /etc/rstudio-connect/rstudio-connect.gcfg
[PAM]
UseSession = true
The default PAM service name used for PAM sessions is su. This gives RStudio Connect the ability to launch processes as the specified user without requiring a password.
You can customize the PAM service name used for PAM sessions by customizing the PAM.SessionService setting.
; /etc/rstudio-connect/rstudio-connect.gcfg
[PAM]
SessionService = rstudio-connect-session
The PAM.SessionService must contain the PAM directive that enables authentication with root privileges. Otherwise, processes will not run and will return error code 70.
# Allows root to su without passwords (required)
auth sufficient pam_rootok.so
PAM Credential Caching (Kerberos)#
Note
RStudio Connect's PAM cache is encrypted and is not stored on disk. The credentials must expire after a certain period of time.
RStudio Connect can be configured to securely cache a user's PAM credentials when they log in to RStudio Connect. This enables RStudio Connect to let users run R processes as their current UNIX account when the PAM profile requires a user's credentials, such as when using Kerberos.
The following config settings are required for credential caching to be enabled:
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
RunAsCurrentUser = true
[PAM]
; Enable PAM sessions
UseSession = true
; Forward the current user's password into the PAM session
ForwardPassword = true
; Cache passwords for 12 hours after login
PasswordLifetime = 12h
; PAM service that accepts credentials ("su" is the default)
AuthenticatedSessionService = YOUR_PAM_SERVICE_HERE
Replace 12h with the amount of time you would like credentials to be cached. Credential lifetime is counted from the moment the user logs into RStudio Connect. It is not tied to the user's web session, except that logging in again will restart the timer for that user's credentials.
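The timing rule above amounts to simple date arithmetic, sketched here in Python. The function names are hypothetical, and treating the exact expiry instant as already expired is an assumption; this is not RStudio Connect's implementation.

```python
from datetime import datetime, timedelta


def credentials_expire_at(login_time: datetime, lifetime: timedelta) -> datetime:
    """Expiry is counted from the most recent login, not from web activity."""
    return login_time + lifetime


def credentials_valid(login_time: datetime, lifetime: timedelta, now: datetime) -> bool:
    """Assumption: credentials stop being usable at the expiry instant."""
    return now < credentials_expire_at(login_time, lifetime)
```

With a 12 hour lifetime, a login at 09:00 yields credentials that expire at 21:00 the same day; logging in again at 15:00 moves the expiry to 03:00 the next day.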
The PAM.AuthenticatedSessionService setting is similar to PAM.SessionService, except that it should accept user credentials and validate them. For example, a PAM service that uses the host's Kerberos configuration to expose functionality could be:
auth required pam_krb5.so
account [default=bad success=ok user_unknown=ignore] pam_krb5.so
password sufficient pam_krb5.so use_authtok
session requisite pam_krb5.so
Some distributions (such as Red Hat Enterprise Linux 8) do not support the use of pam_krb5.so. It is recommended to use pam_sss.so instead, and configure SSSD to provide Kerberos authentication.
Path Rewriting#
The sandboxing used by RStudio Connect involves bind mounts which map physical locations on disk onto different directory structures at runtime. Paths used by your R code use these sandboxed locations. If you need to find the physical file on disk, you will need to undo the path transformation.
This section gives some examples of path rewriting and offers some ways of finding the file you need.
Let's start with an app.R file that describes a Shiny application. This file will be in the apps/XX/YY/ directory underneath the Server.DataDir location. The XX and YY path components correspond to the application ID and bundle (or deployment) ID for this version of your application. This directory is available at runtime as /opt/rstudio-connect/mnt/app/.
The directory structure of /opt/rstudio-connect/mnt/ is just a number of empty directories. The "unshare" environment created during sandboxing allows RStudio Connect to associate different application directories with these mount directories.
Here are some common path transformations that may be helpful. All of the physical paths are beneath the Server.DataDir hierarchy that defaults to /var/lib/rstudio-connect. All of the sandbox paths are beneath the mount directory /opt/rstudio-connect/mnt/. This location is not customizable.
Physical path | Sandbox path |
---|---|
DataDir/apps/XX/YY/ | MountDir/app/ (non-renders) |
DataDir/reports/v2/XX/YY/temp.render.TT | MountDir/app/ (renders) |
DataDir/reports/v2/XX/VV/RR | MountDir/report/ |
DataDir/R | MountDir/R |
DataDir/packrat | MountDir/packrat |
Here are some actual path transformations using the default Server.DataDir location:
# A source Shiny application
/var/lib/rstudio-connect/apps/4/7/app.R
=> /opt/rstudio-connect/mnt/app/app.R
# A source Plumber API
/var/lib/rstudio-connect/apps/38/10/plumber.R
=> /opt/rstudio-connect/mnt/app/plumber.R
# A source R Markdown document
/var/lib/rstudio-connect/reports/v2/8/12/temp.render.639085504/index.Rmd
=> /opt/rstudio-connect/mnt/app/index.Rmd
# An HTML document rendered from that R Markdown document
/var/lib/rstudio-connect/reports/v2/8/2/17/index.html
=> /opt/rstudio-connect/mnt/report/index.html
# A statically deployed document
/var/lib/rstudio-connect/apps/17/21/index.html
=> /opt/rstudio-connect/mnt/app/index.html
# The Shiny package inside the packrat cache
/var/lib/rstudio-connect/packrat/3.2.5/v2/library/shiny/
28d6903a44dc53bd4823fa43ccdc08e5/shiny
=> /opt/rstudio-connect/mnt/packrat/3.2.5/v2/library/shiny/
28d6903a44dc53bd4823fa43ccdc08e5/shiny
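The table and examples above can be turned into a small lookup helper. The following Python sketch maps a physical path to its sandbox path using the default Server.DataDir; the function name and rule list are hypothetical, and the patterns assume each ID occupies exactly one path component.

```python
import posixpath
import re

DATA_DIR = "/var/lib/rstudio-connect"   # default Server.DataDir
MOUNT_DIR = "/opt/rstudio-connect/mnt"  # fixed sandbox mount directory

# (pattern relative to DataDir, directory relative to MountDir).
# Order matters: the temp.render rule must be tried before the report rule.
_RULES = [
    (re.compile(r"^apps/[^/]+/[^/]+(?:/|$)"), "app"),
    (re.compile(r"^reports/v2/[^/]+/[^/]+/temp\.render\.[^/]+(?:/|$)"), "app"),
    (re.compile(r"^reports/v2/[^/]+/[^/]+/[^/]+(?:/|$)"), "report"),
    (re.compile(r"^R(?:/|$)"), "R"),
    (re.compile(r"^packrat(?:/|$)"), "packrat"),
]


def to_sandbox_path(physical, data_dir=DATA_DIR, mount_dir=MOUNT_DIR):
    """Return the sandbox path for a physical path, or None if it is not mapped."""
    prefix = data_dir.rstrip("/") + "/"
    if not physical.startswith(prefix):
        return None
    relative = physical[len(prefix):]
    for pattern, target in _RULES:
        match = pattern.match(relative)
        if match:
            # Everything after the matched prefix is kept unchanged.
            return posixpath.join(mount_dir, target, relative[match.end():])
    return None
```

Going the other way, from a sandbox path to a physical path, additionally requires knowing the application and bundle IDs, because that information is not part of the sandbox path.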
Program Supervisors#
You may need to modify the environment or resources available to processes before the processes are launched. This can be accomplished with a program supervisor, specified by the Applications.Supervisor configuration setting.
The supervisor command is provided the full target command-line, usually R or Python, which MUST be invoked by the supervisor. The process exit code from the target command MUST be returned as the exit code of the supervisor. The file descriptors for standard input, output, and error MUST NOT be intercepted by the supervisor.
The supervisor command or script must be executable by any users that may perform package installation or run content (see the next paragraph). It must not be located in a directory that will be masked as described in the Sandboxing section. (If you are unsure where to put your supervisor script, /usr/local/bin/ is a safe location.) If the command is not executable, is in a disallowed directory, or does not execute its target command-line properly, RStudio Connect will log an error and fail to start.
A supervisor is executed as the appropriate RunAs user. Package installation always uses the Applications.RunAs user. Other processes will use the content-specific RunAs account, falling back to Applications.RunAs if no override was configured. See the User Account for Processes section for details.
Supervisors run within the sandbox established for any process. See the Sandboxing section for more information about process sandboxes.
RStudio Connect configures the TMPDIR and HOME environment variables for launched processes. RStudio Connect also manages package installation and references. Avoid altering any of this behavior in program supervisors.
Important
Supervisor scripts must echo all informational messages to standard error to prevent RStudio Connect from processing them.
RSTUDIO_PANDOC#
You can customize the RSTUDIO_PANDOC environment in a supervisor script or with a content-specific environment variable.
If unset, the RSTUDIO_PANDOC environment variable is automatically configured as R starts. The rmarkdown package uses this environment variable to discover Pandoc binaries.
rmarkdown versions < 1.9 use Pandoc 1. rmarkdown versions >= 1.9 and < 2.5 use a Pandoc 2.x before 2.11. rmarkdown versions >= 2.5 use Pandoc 2.11.
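The version rules above can be written as a small lookup. This Python sketch only restates the documented thresholds; the function name and the returned labels are hypothetical, and only the major and minor components of the rmarkdown version are compared.

```python
import re


def pandoc_series(rmarkdown_version: str) -> str:
    """Name the Pandoc series used for a given rmarkdown version string."""
    # Compare only major.minor; R versions may use "." or "-" as separators.
    parts = tuple(int(p) for p in re.split(r"[.-]", rmarkdown_version)[:2])
    parts += (0,) * (2 - len(parts))  # treat "2" as "2.0"
    if parts < (1, 9):
        return "1"
    if parts < (2, 5):
        return "2.x before 2.11"
    return "2.11"
```

Comparing tuples of integers avoids the string-comparison trap where "1.10" would sort before "1.9".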
A global RSTUDIO_PANDOC setting may cause problems in some environments, as not all rmarkdown package versions are compatible with all pandoc versions.
The Applications.Pandoc1Dir and Applications.Pandoc2Dir settings offer more granular control than the RSTUDIO_PANDOC environment variable.
Example Supervisors#
Here is a configuration that uses the nice command to lower the priority of executing content. See http://linux.die.net/man/1/nice for details about nice. Because process supervisors are run as a RunAs user and not as root or another super-user, you may not be permitted to assign a negative (higher priority) niceness value.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
Supervisor = "/usr/bin/nice -n 2"
Note
The Applications.Supervisor setting must contain the absolute path to the target application or script.
Here is a configuration that uses a custom script to prepare a custom execution environment before finally running the target command.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
Supervisor = "/some/script/that/prepares/an/environment.sh"
Here is an example supervisor that echoes its arguments, sets an environment variable, then invokes whatever arguments have been passed.
#!/usr/bin/env bash
# echo informational messages to standard error to
# prevent Connect from processing them.
echo arguments: "$@" >&2
echo >&2
export COMPANY_DATA_HOME="/data/resides/here"
# Execute the target process after the environment is established.
# All customization must happen before this "exec".
exec "$@"
The argument list of the supervisor is the full command-line of the target command. The supervisor MUST invoke this target command using exec or an equivalent technique.
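As an illustration of an "equivalent technique", the same supervisor could be written in Python using os.execvpe, which replaces the current process with the target command. This is a sketch under the same assumptions as the bash example (the COMPANY_DATA_HOME variable and its value come from that example); the function names are hypothetical.

```python
import os
import sys
from typing import Dict, List, Mapping


def supervised_environment(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy the inherited environment and add the site-specific variable."""
    env = dict(base)
    env["COMPANY_DATA_HOME"] = "/data/resides/here"  # same value as the bash example
    return env


def supervise(target: List[str]) -> None:
    """Replace this process with the target command.

    os.execvpe does not return on success, so the target inherits the
    original standard input, output, and error, and its exit code is the
    one the launching process observes.
    """
    # Informational messages go to standard error, never standard output.
    print("arguments:", *target, file=sys.stderr)
    sys.stderr.flush()
    os.execvpe(target[0], target, supervised_environment(os.environ))
```

A script built on this sketch would end by calling supervise(sys.argv[1:]), with all customization done before that call.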
The following command shows how you can test a supervisor script. This example is only asking R to print its version.
/some/script/that/prepares/an/environment.sh /opt/R/3.6.3/bin/R --version
Note
Your organization may use shell initialization scripts to establish a particular environment. This environment might not be completely compatible with how RStudio Connect attempts to launch R and Python. We recommend building supervisor scripts gradually and carefully. Changes to the environment can alter how your content executes or even prevent R or Python from running correctly.
Using the config Package#
The config package makes it easy to manage environment specific configuration values in R code. For example, you might want to use one value for a variable locally, and another value when deployed on RStudio Connect. The package vignette contains more information.
The desired configuration is identified to the config package by the R_CONFIG_ACTIVE environment variable. By default, R processes launched by RStudio Connect set R_CONFIG_ACTIVE to rsconnect. The value can be changed by modifying the Applications.RConfigActive configuration setting. Note that the value of R_CONFIG_ACTIVE is not available during package installation.
Specifying Protocols#
RStudio Connect provides a wide variety of techniques to keep Shiny application data in the web browser synchronized. The preferred technique, and the one most widely used, is the use of WebSockets. If WebSockets are not supported, either by some intermediate network between the server and your client or by your client's web browser, then a fallback protocol will be used.
Note
Python applications on RStudio Connect that make use of WebSockets require WebSocket support from the network and any intermediate proxies.
In order of preference, the connection methods are:
- WebSocket
- XHR Streaming
- iframe Eventsource
- iframe HTML File
- XHR Polling
- iframe XHR Polling
- JSONP Polling
Use the Applications.DisabledProtocol setting to disable specific protocols.
Client Protocol Selection - Shiny Applications#
To change the available protocols from the client, open a Shiny application and press the keyboard shortcut: Ctrl+Alt+Shift+A (or, from a Mac: control+option+shift+A). This will open a window that will allow you to select or deselect any of the above protocols. After you confirm the changes, these settings will be saved in your browser for future visits to this server. These settings will take effect upon loading an application hosted on this domain, and will last until you explicitly change them again; they will only have an effect on the browser in which this action was performed.