12 Process Management
RStudio Connect launches R to perform a variety of tasks. These include:
- Installation of R packages
- Rendering of R Markdown documents
- Running Shiny applications
- Running a Shiny application to customize a parameterized R Markdown document
- Running APIs using Plumber
- Running TensorFlow Model APIs
The location of R defaults to whichever R is found on the PATH. Customize the Server.RVersion setting to use a specific R installation. See Chapter 14 for details.
12.1 Sandboxing
The RStudio Connect process runs as the root user. It needs elevated privileges to bind to protected ports and to create the “unshare” environments in which content processes run.
RStudio Connect runs content processes as an unprivileged user; both a system default and content-specific overrides are supported. See Section 12.5 for details.
The “unshare” environment created for R execution involves first establishing
a number of bind mounts and then switching to the target unprivileged user.
RStudio Connect uses unshare to alter the execution context available to R processes. Within this newly established environment, a number of mount calls are made in order to hide or isolate parts of the filesystem.
The man pages for unshare and mount describe both in detail; your local man pages document the behavior specific to your system.
The following locations are masked during R execution:
- The Server.DataDir directory containing all variable data used by RStudio Connect.
- The SQLite.Dir directory, which can optionally be placed outside the data directory.
- Configuration directories, including /etc/rstudio-connect.
- The Server.TempDir/connect-workspaces directory, which contains temporary directories, one per process.
The following information is exposed during R execution:
- The packrat data directory (read-only except when installing packages).
- The R data directory (only when installing packages).
- The directory containing the unpackaged R code (Shiny, Plumber, and R Markdown).
- The document rendering destination directory (only for R Markdown).
- A per-process temporary directory specified in the TMPDIR environment variable of the process. This temporary directory is created under Server.TempDir/connect-workspaces.
When Applications.HomeMounting is enabled, the contents of /home are masked by an additional bind mount as follows:
- The contents of /home are masked by the home directory of the RunAs user.
- If the RunAs user does not have a home directory, an empty directory masks /home.
The path to the home directory is always available through the HOME environment variable. With Applications.HomeMounting, the mounted path to the HOME directory is subject to change. Avoid hard-coding paths to either /home or /home/username.
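As a minimal sketch of that advice, a shell script (a program supervisor, for instance) can derive paths from HOME at run time instead of writing them out. The function name user_path and the subdirectory in the usage comment are hypothetical and not part of RStudio Connect.

```shell
#!/bin/bash
# Build a path beneath the current HOME instead of hard-coding /home/username.
# "user_path" is an illustrative helper name, not an RStudio Connect feature.
user_path() {
    if [ -z "${HOME:-}" ]; then
        echo "HOME is not set" >&2
        return 1
    fi
    printf '%s/%s\n' "$HOME" "$1"
}

# Example use (hypothetical subdirectory):
#   cache_dir="$(user_path ".cache/myapp")"
```

Because the path is resolved each time the function runs, it follows whatever home directory is mounted for the process.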
Running R applications, like Shiny apps and Plumber APIs, have write access to the directory containing the unpackaged R code. This application directory is the working directory when launching an application. Data written here is visible to all processes associated with that application but is not visible to other R processes. Application directory data remains available until that application is next deployed to RStudio Connect. A deployment creates a new application directory containing only the deployed content.
RStudio Connect may launch multiple processes to service requests for an application. There is no coordination between these processes. Applications that write to local files could experience problems when different processes attempt to write to a single file.
For example, two different processes writing to the same file may see output incorrectly interleaved or even overwritten.
We do not recommend using the file system for data persistence.
R Markdown documents have write access to the rendering destination directory
and to a directory containing the unpackaged R code. When RStudio Connect is
rendering a document, it first makes a copy of the unpackaged R code into a new,
temporary directory so that simultaneous rendering processes are isolated from
each other and cannot corrupt each other’s output files. This temporary source
directory is the working directory when calling rmarkdown::render. The
destination directory is passed as the output_dir, while a temporary directory
is passed as the intermediates_dir. The intermediate directory is transient
and not available after rendering completes. A new output directory is created
whenever the document is rendered. Data created during one rendering is not
visible to another.
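The copy-before-render isolation described above can be illustrated with a short shell sketch. It shows the general technique only and is not RStudio Connect's implementation; the function name stage_copy is hypothetical.

```shell
#!/bin/bash
# Copy a source directory into a fresh temporary directory and print the
# new directory's path. Each call yields an independent working copy.
# Illustrative only: this is not how RStudio Connect itself is implemented.
stage_copy() {
    local src="$1" work
    work="$(mktemp -d)" || return 1
    # "src/." copies the directory contents, including dotfiles.
    cp -R "$src"/. "$work"/ || return 1
    printf '%s\n' "$work"
}
```

Two calls against the same source return two different directories, so files written by one rendering are never seen by another.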
R Markdown multi-document sites have a slightly different rendering pipeline
than standalone documents. RStudio Connect uses the rmarkdown::render_site
function, which does its rendering in-place. The content from the source
directory is copied into the rendering destination directory in preparation
for rendering. Site rendering has write access to the destination directory.
Access to the original source directory is not provided because the source
content is duplicated in the destination directory.
The rmarkdown::render_site call usually places its output into a subdirectory (typically _site). The contents of this output subdirectory are moved to the root of the rendering destination directory, replacing any other content. No post-rendering file movement occurs if rmarkdown::render_site is instructed to render into the current directory instead of a subdirectory. In that case, both source and output files will be available for serving.
We recommend against configuring rmarkdown::render_site to write its output into the current directory. Rendering the site into a subdirectory (the default) allows RStudio Connect to remove source files from the output directory.
RStudio Connect serves rendered content from the document output directory. This content remains available until a subsequent rendering is successful and activated (if requested). Neither incomplete nor unsuccessful document renderings affect the availability of previously rendered content.
12.2 Temporary Directory
Each process started by RStudio Connect is given its own unique temporary directory. These directories are created under Server.TempDir/connect-workspaces.
The default value of Server.TempDir is taken from the TMPDIR environment variable if it contains a path; otherwise it falls back to /tmp.
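The default resolution can be expressed as a small shell sketch, which may help when predicting where workspaces will appear on a given host. The function names are hypothetical, and treating an empty TMPDIR the same as an unset one is an assumption of this sketch.

```shell
#!/bin/bash
# Mirror the documented default: use TMPDIR when it holds a path,
# otherwise fall back to /tmp. (An empty TMPDIR is treated as unset here,
# which is an assumption of this sketch.)
default_temp_dir() {
    printf '%s\n' "${TMPDIR:-/tmp}"
}

# Per-process temporary directories are created beneath this directory.
workspace_root() {
    printf '%s/connect-workspaces\n' "$(default_temp_dir)"
}
```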
You may wish to override Server.TempDir if the default temporary directory has too little space or is mounted with the noexec option.
Note: If you do override Server.TempDir, please ensure the location can be reached, read from, and written to by any user on the system. On most systems, temporary directories have permissions of 1777.
The mount man page describes the noexec option.
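Before pointing Server.TempDir at a new location, both conditions can be checked from a shell. The following is a sketch that assumes a Linux host with GNU stat; the function names are hypothetical.

```shell
#!/bin/bash
# Return success if the directory has mode 1777 (world-writable, sticky bit).
# Relies on GNU stat, as found on Linux.
has_tmp_mode() {
    [ "$(stat -c '%a' "$1")" = "1777" ]
}

# Return success if a comma-separated mount option string contains noexec.
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On hosts that provide findmnt (part of util-linux), the option string for the filesystem holding a directory can be obtained with findmnt -n -o OPTIONS --target followed by the directory path, and then passed to has_noexec.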
12.3 Shiny Applications & Plumber APIs
Most of the processes started by RStudio Connect are batch-oriented tasks. A process is invoked, does a narrow set of work, and then exits. Shiny applications and Plumber APIs are different and may see a process handle many requests for many users over their lifetimes. Both Shiny Applications and Plumber APIs are live applications that react to user requests on-demand.
RStudio Connect launches a process tied to a live application when the first request arrives for that application. That process will continue to service requests until it becomes idle and is eventually terminated. If there is sufficient traffic against that application, RStudio Connect may launch additional processes to service those requests.
There are a number of configuration parameters which control the conditions under which processes for applications are launched and eventually reaped. The default values are appropriate for most applications but occasionally need customization in specialized environments. Section A.20 explains each of the options.
We recommend adjusting these runtime properties gradually.
12.4 TensorFlow Model APIs
TensorFlow Model APIs have a similar lifecycle to Plumber APIs. They are live processes that handle user requests on demand. TensorFlow Model API processes do not run R. However, the same per-process and global scheduler settings may still apply to TensorFlow Model API processes.
TensorFlow Model APIs may also be run within a supervisor script if one is provided. The API server requires access to the app’s content directory and to shared object files and their dependencies.
12.5 User Account for Processes
The RStudio Connect installation creates a local rstudio-connect user account. This account runs all processes associated with deployed content; root does not invoke processes directly. If you would like a different user to run content processes, customize the Applications.RunAs property.
Administrators can customize the RunAs user on a content-specific level. This means that different applications and reports can be run using different Unix accounts. This setting can be found on the Access tab when editing content settings. Publishers and Viewers are prohibited from changing the RunAs user on a content-specific level.
If you choose to specify a custom RunAs user for content, that user must be a member of the Unix group that is the primary group of the Applications.RunAs user.
The rstudio-connect user, for example, has a primary group also named rstudio-connect. Any Unix account configured as a custom RunAs user for a Shiny application, Plumber API, or R Markdown report must be a member of the rstudio-connect group.
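Group membership can be verified from a shell before assigning an account as a custom RunAs user. This is a sketch using the standard id command; the function name in_group and the account name in the usage comment are hypothetical.

```shell
#!/bin/bash
# Return success if the given Unix user belongs to the given group.
# Note: group names containing spaces are not handled by this sketch.
in_group() {
    local user="$1" group="$2" g
    for g in $(id -nG "$user"); do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}

# Example use (hypothetical account name):
#   in_group analyst rstudio-connect || echo "analyst needs the rstudio-connect group"
```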
Installation of R packages always happens as the Applications.RunAs user. An application or R Markdown report may override its RunAs setting; this alters how the deployed code is executed and does not impact package installation. See Section 12.1 for more information about process sandboxing.
12.6 Current user execution
RStudio Connect can use a local Unix account associated with the currently logged-in user when executing Shiny applications or Shiny documents. This feature requires that user authentication use PAM.
See Section 10.8 for information about using PAM for user authentication.
The Applications.RunAsCurrentUser property specifies that content can be configured to execute as the currently logged-in user.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
RunAsCurrentUser = true
Administrators can now customize the RunAs settings to permit current-user execution on a content-specific level. The Access content setting tab offers the option of executing using “The Unix account of the current user”.
Content accessed anonymously will execute as the specified fallback RunAs user.
See Section 12.5 for more information about RunAs customization.
Content execution settings are not altered when RunAsCurrentUser is enabled.
The RunAsCurrentUser setting permits current-user execution but by itself does not change how processes are launched. Each Shiny application or Shiny document must explicitly request current-user execution.
All Unix accounts used to execute R must be members of the Unix group that is the primary group of the Applications.RunAs user. Applications are not permitted to launch if the Unix account associated with the logged-in user does not have the proper group membership.
The Applications.RunAs setting uses the rstudio-connect user by default. This user has a primary group also named rstudio-connect. Any Unix account that may be used to execute applications or R Markdown reports must be a member of the rstudio-connect group.
12.7 PAM sessions
Note: Please see the special instructions at the bottom of this section for running RStudio Connect on Ubuntu 14.04 (Trusty Tahr).
RStudio Connect can use PAM to establish the environment and resources available for R sessions.
See Section 10.8 for information about using PAM for user authentication.
PAM sessions are enabled with the PAM.UseSession setting.
; /etc/rstudio-connect/rstudio-connect.gcfg
[PAM]
UseSession = true
The default PAM service name used for PAM sessions is su. This gives RStudio Connect the ability to launch processes as the specified user without requiring a password.
You can customize the PAM service name used for PAM sessions by customizing the PAM.SessionService setting.
; /etc/rstudio-connect/rstudio-connect.gcfg
[PAM]
SessionService = rstudio-connect-session
The PAM service named by SessionService must contain the PAM directive that enables authentication with root privileges. Otherwise, processes will not run and will return error code 70.
# Allows root to su without passwords (required)
auth sufficient pam_rootok.so
Ubuntu 14.04 (Trusty Tahr) uses upstart as init by default, but also uses systemd-logind to clean up processes from closed user sessions. There is a known issue where PAM.UseSession causes this specific host configuration to rapidly terminate processes, returning error code 129.
If you enable PAM.UseSession, you also need to edit the upstart configuration file at /etc/init/rstudio-connect.conf, replacing the line beginning exec /opt/rstudio-connect/bin/connect with the following:
exec su -s /bin/sh -c 'exec "$0" "$@"' root -- /opt/rstudio-connect/bin/connect \
--config=/etc/rstudio-connect/rstudio-connect.gcfg >> /var/log/rstudio-connect.log 2>&1
After altering rstudio-connect.conf, trigger an upstart configuration reload and then restart RStudio Connect.
sudo initctl reload-configuration
sudo stop rstudio-connect
sudo start rstudio-connect
Changing the rstudio-connect.conf in this way has considerable side effects because it is the equivalent of opening a su session for root and leaving it open for the life cycle of the RStudio Connect daemon.
If this solution is unacceptable, alternative solutions may include:
- Upgrading the host to Ubuntu 16 or later
- Updating systemd-logind to be newer than v204
- Altering the init provider to use systemd instead of upstart
- Disabling systemd-logind on the host
12.7.1 PAM Credential Caching (Kerberos)
Note: RStudio Connect’s PAM cache is encrypted and is not stored on disk. The credentials must expire after a certain period of time.
RStudio Connect can be configured to securely cache a user’s PAM credentials when they log in to RStudio Connect. This enables RStudio Connect to let users run R processes as their current UNIX account when the PAM profile requires a user’s credentials, such as when using Kerberos.
The following config settings are required for credential caching to be enabled:
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
RunAsCurrentUser = true
[PAM]
UseSession = true ; Enable PAM sessions
ForwardPassword = true ; Forward the current user's password into the PAM session
PasswordLifetime = 12h ; Cache passwords for 12 hours after login
AuthenticatedSessionService = YOUR_PAM_SERVICE_HERE ; PAM service that accepts credentials
Replace 12h with the amount of time you would like credentials to be cached. Credential lifetime is counted from the moment the user logs into RStudio Connect. It is not tied to the user’s web session, except that logging in again will restart the timer for that user’s credentials.
The AuthenticatedSessionService setting is similar to SessionService, except that it should accept user credentials and validate them. For example, a PAM service that uses the host’s Kerberos configuration to expose functionality could be:
auth required pam_krb5.so
account [default=bad success=ok user_unknown=ignore] pam_krb5.so
password sufficient pam_krb5.so use_authtok
session requisite pam_krb5.so
12.8 Path Rewriting
The sandboxing used by RStudio Connect involves bind mounts which map physical locations on disk onto different directory structures at runtime. Paths used by your R code use these sandboxed locations. If you need to find the physical file on disk, you will need to undo the path transformation.
This section gives some examples of path rewriting and offers some ways of finding the file you need.
Let’s start with an app.R file that describes a Shiny application. This file will be in the apps/XX/YY/ directory underneath the Server.DataDir location. The XX and YY path components correspond to the application ID and bundle (or deployment) ID for this version of your application. This directory is available at runtime as /opt/rstudio-connect/mnt/app/.
The directory structure of /opt/rstudio-connect/mnt/ is just a number of empty directories. The “unshare” environment created during sandboxing allows RStudio Connect to associate different application directories with these mount directories.
Here are some common path transformations that may be helpful. All of the physical paths are beneath the Server.DataDir hierarchy that defaults to /var/lib/rstudio-connect. All of the sandbox paths are beneath the mount directory /opt/rstudio-connect/mnt/. This location is not customizable.
Physical path | Sandbox path |
---|---|
DataDir/apps/XX/YY/ | MountDir/app/ (non-renders) |
DataDir/reports/v2/XX/YY/temp.render.TT | MountDir/app/ (renders) |
DataDir/reports/v2/XX/VV/RR | MountDir/report/ |
DataDir/R | MountDir/R |
DataDir/packrat | MountDir/packrat |
Here are some actual path transformations using the default Server.DataDir location:
# A source Shiny application
/var/lib/rstudio-connect/apps/4/7/app.R
=> /opt/rstudio-connect/mnt/app/app.R
# A source Plumber API
/var/lib/rstudio-connect/apps/38/10/plumber.R
=> /opt/rstudio-connect/mnt/app/plumber.R
# A source R Markdown document
/var/lib/rstudio-connect/reports/v2/8/12/temp.render.639085504/index.Rmd
=> /opt/rstudio-connect/mnt/app/index.Rmd
# An HTML document rendered from that R Markdown document
/var/lib/rstudio-connect/reports/v2/8/2/17/index.html
=> /opt/rstudio-connect/mnt/report/index.html
# A statically deployed document
/var/lib/rstudio-connect/apps/17/21/index.html
=> /opt/rstudio-connect/mnt/app/index.html
# The Shiny package inside the packrat cache
/var/lib/rstudio-connect/packrat/3.2.5/v2/library/shiny/
28d6903a44dc53bd4823fa43ccdc08e5/shiny
=> /opt/rstudio-connect/mnt/packrat/3.2.5/v2/library/shiny/
28d6903a44dc53bd4823fa43ccdc08e5/shiny
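The application mapping shown above can be undone with a small shell helper. This is a sketch covering only paths under the app mount for deployed applications (not renders or reports); the function name and the DATA_DIR variable are hypothetical, and the application and bundle IDs must be supplied by the caller because they do not appear in the sandbox path.

```shell
#!/bin/bash
# Translate a sandbox path under MountDir/app/ back to the physical location
# of a deployed application bundle. Render and report paths use different
# physical directories and are not handled here.
DATA_DIR="${DATA_DIR:-/var/lib/rstudio-connect}"  # default Server.DataDir
MOUNT_DIR="/opt/rstudio-connect/mnt"

app_physical_path() {
    local app_id="$1" bundle_id="$2" sandbox_path="$3"
    local prefix="$MOUNT_DIR/app/"
    case "$sandbox_path" in
        "$prefix"*)
            # Strip the mount prefix and re-root under DataDir/apps/XX/YY/.
            printf '%s/apps/%s/%s/%s\n' "$DATA_DIR" "$app_id" "$bundle_id" \
                "${sandbox_path#"$prefix"}"
            ;;
        *)
            echo "not an application sandbox path: $sandbox_path" >&2
            return 1
            ;;
    esac
}
```

For the first example above, app_physical_path 4 7 /opt/rstudio-connect/mnt/app/app.R prints /var/lib/rstudio-connect/apps/4/7/app.R.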
12.9 Program Supervisors
You may need to modify the environment or resources available to processes before the processes are launched. This can be accomplished by specifying a program supervisor with the Applications.Supervisor configuration setting.
The supervisor command is provided the full R command-line, which MUST be invoked by the supervisor. The process exit code from R MUST be returned as the exit code of the supervisor. The file descriptors for standard input, output, and error MUST NOT be intercepted by the supervisor.
A supervisor is executed as the appropriate RunAs user. Package installation always uses the Applications.RunAs user. Other processes will use the content-specific RunAs account, falling back to Applications.RunAs if no override was configured. See Section 12.5 for details.
Supervisors run within the sandbox established for any process. See Section 12.1 for more information about process sandboxes.
RStudio Connect configures the TMPDIR, HOME, and RSTUDIO_PANDOC environment variables for launched processes. RStudio Connect also manages package installation and references. Avoid altering any of this behavior in program supervisors.
12.9.1 Example Supervisors
Here is a configuration that uses the nice command to lower the priority of executing content. See http://linux.die.net/man/1/nice for details about nice.
Because process supervisors are run as a RunAs user and not as root or another super-user, you may not be permitted to assign a negative (higher-priority) niceness value.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
Supervisor = nice -n 2
Here is a configuration that uses a custom script to prepare a custom execution environment before finally running R.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
Supervisor = /some/script/that/prepares/an/environment.sh
Here is an example supervisor that echoes its arguments, sets an environment variable, then invokes whatever arguments have been passed.
#!/bin/bash
echo arguments: "$@"
echo
export COMPANY_DATA_HOME="/data/resides/here"
# Execute the target process after the environment is established.
# All customization must happen before this "exec".
exec "$@"
The argument list of the supervisor is the full command-line of the target command. The supervisor MUST invoke this target command using exec or an equivalent technique.
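One way to sanity-check a supervisor before configuring it is to run it against a stand-in command with a known exit status and confirm that the status comes back unchanged. The sketch below is a suggested test harness rather than an RStudio Connect tool; the function name check_supervisor is hypothetical.

```shell
#!/bin/bash
# Run a supervisor (a script path, or a command with arguments) against a
# stand-in command that exits with a known status, and confirm that the
# supervisor returns that same status.
check_supervisor() {
    local expected=7 actual=0
    "$@" sh -c "exit $expected" || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: exit code $actual propagated"
    else
        echo "FAIL: expected $expected, got $actual" >&2
        return 1
    fi
}

# Example use:
#   check_supervisor nice -n 2
#   check_supervisor /some/script/that/prepares/an/environment.sh
```

A supervisor that runs the target without exec and then exits with its own status fails this check, which is the situation the requirement above is meant to prevent.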
Your organization may use shell initialization scripts to establish a particular environment. This environment might not be completely compatible with how RStudio Connect attempts to launch R.
We recommend building supervisor scripts gradually and carefully. Changes to the environment can alter how your content executes or even prevent R from running correctly.
12.10 Using the config Package
The config package makes it easy to manage environment-specific configuration values in R code. For example, you might want to use one value for a variable locally, and another value when deployed on RStudio Connect. The package vignette contains more information.
The desired configuration is identified to the config package by the R_CONFIG_ACTIVE environment variable. By default, R processes launched by RStudio Connect set R_CONFIG_ACTIVE to rsconnect. The value can be changed by modifying the Applications.RConfigActive configuration setting. Note that the value of R_CONFIG_ACTIVE is not available during package installation.