R Package Management#
Package Installation#
RStudio Connect installs the R package dependencies of Shiny applications, Plumber APIs, and R Markdown documents when that content is deployed.
The RStudio IDE uses the rsconnect and packrat R packages to identify the target source code and enumerate its dependencies. That information is bundled into an archive (.tar.gz) file and uploaded to RStudio Connect.
RStudio Connect receives a bundle archive (.tar.gz) file, unpacks it, and uses packrat to install the identified package dependencies.
Note
RStudio Connect includes and manages its own installation of the packrat package. This packrat installation is not available to user code and is used only when restoring execution environments.
The execution environment created by RStudio Connect and packrat contains the same package versions you are using in your development environment.
Package Caching#
The packrat package attempts to re-use R packages whenever possible. The shiny package, for example, is installed when the first Shiny application is deployed. That version of shiny is placed into the packrat package cache and associated with that Shiny application deployment. Other Shiny applications built with the same version of the shiny package will use that cached installation. Deployments are faster when they can take advantage of previously installed packages.
The packrat package cache allows multiple versions of a package to exist on a system. An old Shiny application built with shiny version 1.0.5 continues to use that package version even as newer deployments choose updated versions of shiny. Each Shiny application has an R environment with its expected shiny version. The different applications and shiny versions coexist.
Publish new content without worrying about package updates breaking existing, deployed content. Distinct versions of packages are kept isolated from each other.
Package Compilation#
Some packages contain C and C++ code components. That code needs to be compiled during package installation. The Server.CompilationConcurrency setting controls the number of concurrent compilation processes used by package installation.
The default value for the Server.CompilationConcurrency setting is derived from the number of available CPUs with the formula max(1, min(8, (cpus-1)/2)), where the division is rounded down to a whole number. This default makes it less likely for package installs to encounter memory capacity issues on lightweight hosts while allowing more concurrency on high-capacity servers.
CPUs | CompilationConcurrency |
---|---|
1 | 1 |
2 | 1 |
4 | 1 |
6 | 2 |
8 | 3 |
16 | 7 |
24 | 8 |
32 | 8 |
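As a rough illustration, the default can be reproduced in R. The sketch below assumes the division in the formula is integer division, which is what the table values imply. The function name default_compilation_concurrency is hypothetical and is not part of RStudio Connect.

```r
# Hypothetical helper, not part of RStudio Connect: reproduces the documented
# default, assuming (cpus - 1) / 2 is integer division (rounded down).
default_compilation_concurrency <- function(cpus) {
  stopifnot(is.numeric(cpus), all(cpus >= 1))
  half <- (as.integer(cpus) - 1L) %/% 2L
  # Clamp the result to the range 1..8.
  pmax(1L, pmin(8L, half))
}

# Compare against the CPU counts listed in the table above.
cpus <- c(1, 2, 4, 6, 8, 16, 24, 32)
data.frame(
  CPUs = cpus,
  CompilationConcurrency = default_compilation_concurrency(cpus)
)
```

The clamp to at most 8 is why hosts with 24 and 32 CPUs share the same default.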
You can customize Server.CompilationConcurrency to force a specific level of concurrency.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Server]
CompilationConcurrency = 1
External Package Installation#
Warning
Adding external packages decreases the reproducibility and isolation of content on RStudio Connect, and should only be done as a last resort.
You can indicate that a system-wide installation of a package should be used instead of one fetched by packrat. The Packages.External setting can be used to enumerate each system-provided package.
For example, rJava and ROracle are large installations, potentially with unusual dependencies, such as your choice of JDK and/or Oracle InstantClient. First, install these packages in every R installation that RStudio Connect will use. Then, configure RStudio Connect with the following parameters:
; /etc/rstudio-connect/rstudio-connect.gcfg
[Packages]
External = ROracle
External = rJava
This is the same as setting the packrat option external.packages to c("ROracle", "rJava") using packrat::set_opts. The external.packages option instructs packrat::restore to load certain packages from the user library. See the packrat documentation for more information.
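Before listing a package under Packages.External, it can help to confirm that the package is actually installed in each R installation that RStudio Connect uses. The following is a minimal sketch using base R only. The helper name missing_external_packages is hypothetical, and it would need to be run once in each R installation.

```r
# Hypothetical helper: returns the names in `pkgs` that cannot be found in
# the given library paths (by default, the current R's library paths).
missing_external_packages <- function(pkgs, lib.loc = .libPaths()) {
  found <- vapply(
    pkgs,
    function(p) length(find.package(p, lib.loc = lib.loc, quiet = TRUE)) > 0,
    logical(1),
    USE.NAMES = FALSE
  )
  pkgs[!found]
}

# Packages configured as external in the example above.
missing_external_packages(c("ROracle", "rJava"))
```

An empty result means every listed package was found in that R installation.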
Proxy Configuration#
If the http_proxy and/or https_proxy environment variables are provided to RStudio Connect when the server starts, those variables will be passed to all processes run by RStudio Connect, including the package installation process.
Configuring Packages.HTTPProxy and Packages.HTTPSProxy will provide their values as the http_proxy and https_proxy environment variables only when packages are installed during deployment. This could be useful if you have a special proxy just for downloading package dependencies. For example, such a proxy could regulate access to unapproved packages in non-CRAN repositories by rejecting certain URL patterns.
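Because server-level proxy variables are passed to all processes, one way to check what a given R process actually sees is to read them from its environment. This is a minimal sketch. The helper name proxy_settings is hypothetical, and values set only through Packages.HTTPProxy or Packages.HTTPSProxy would not be expected to appear outside of package installation.

```r
# Hypothetical helper: reports the proxy variables visible to this R process.
# Variables that are not set are returned as NA.
proxy_settings <- function() {
  Sys.getenv(
    c("http_proxy", "https_proxy"),
    unset = NA_character_,
    names = TRUE
  )
}

proxy_settings()
```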
Private Repositories#
Packrat records details about how a package was obtained in addition to information about its dependencies. Most public packages will come from a public CRAN mirror. Packrat lets RStudio Connect support alternate repositories in addition to CRAN.
Info
Learn how to create your own custom repository; the repository directory can then be shared over HTTP or through a shared filesystem.
Here are some reasons why your organization might use an alternate/private repository.
- Internally developed packages are made available through a corporate repository. This is used in combination with a public CRAN mirror.
- All packages (private and public) are approved before use and must be obtained through the corporate repository. Public CRAN mirrors are not used.
- Direct access to a public CRAN mirror is not permitted. A corporate repository is used as a proxy and caches public packages to avoid external network access.
RStudio Connect supports private repositories in these situations given that the deploying instance of R is correctly configured. No adjustment to the RStudio Connect server is needed in this case.
In the case where the deploying instance of R and RStudio Connect must have different repository URLs, the RPackageRepository configuration option allows the repository URLs set by the user to be overridden on each packrat restore.
Repository information is configured using the repos R option. Your users will need to make sure their desktop R is configured to use your corporate repository.
Note
RStudio IDE version 0.99.1285 or greater is needed when using repositories other than the public CRAN mirrors.
We recommend using an .Rprofile file to configure multiple repositories or non-public repositories.
The .Rprofile file should be created in a user's home directory.
# A sample .Rprofile file with two different package repositories.
local({
  # Start from the existing repos option so other entries are preserved.
  r <- getOption("repos")
  r["CRAN"] <- "https://cran.rstudio.com/"
  r["mycompany"] <- "http://rpackages.mycompany.com/"
  options(repos = r)
})
This .Rprofile creates a custom repos option. It instructs R to attempt package installation first from "CRAN" and then from the "mycompany" repository. R installs a package from the first repository in "repos" containing that package.
With this custom repos option, you will be able to install packages from the mycompany repository. RStudio Connect will be able to install these packages as code is deployed.
For more information about the .Rprofile file, see help(Startup) in R. For details about package installation, see help(install.packages) and help(available.packages).
Private Packages#
Packages available on CRAN, a private package repository, or a public GitHub repository are automatically downloaded and built when an application is deployed. RStudio Connect cannot automatically obtain packages from private GitHub repositories, but a workaround is available.
Note
We recommend using a private repository to host internal packages when possible. See the Private Repositories section for details.
Warning
Server.SourcePackageDir is deprecated as of RStudio Connect 1.8.6 and will be removed in a future version. We recommend using a private repository.
The configuration option Server.SourcePackageDir can reference a directory containing additional packages that Connect would not otherwise be able to retrieve. This directory and its contents must be readable by the Applications.RunAs user.
Connect will look in this directory for packages before attempting to obtain them from a remote location.
This feature has some limitations.
- The package must be tracked in a git repository so that each distinct version has a unique commit hash associated with it.
- The package must have been installed from the git repository using the devtools package so that the hash is contained in the DESCRIPTION file on the client machine.
If these conditions are met, you may place .tar.gz source packages into per-package subdirectories of SourcePackageDir. The proper layout of these files is <package-name>/<full-git-hash>.tar.gz.
For example, if Server.SourcePackageDir is defined as /opt/R-packages, source bundles for the MyPrivatePkg package are located at /opt/R-packages/MyPrivatePkg. A commit hash of 28547e90d17f44f3a2b0274a2aa1ca820fd35b80 needs its source bundle stored at the following path:
/opt/R-packages/MyPrivatePkg/28547e90d17f44f3a2b0274a2aa1ca820fd35b80.tar.gz
When private package source is arranged in this manner, users of RStudio Connect will be able to use those package versions in their deployed content.
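The layout rule can be sketched as a small R helper that computes where a bundle is expected to live. This is an illustration only. The function name source_bundle_path is hypothetical, and the 40-character check assumes a full SHA-1 git commit hash.

```r
# Hypothetical helper: builds the expected location of a source bundle as
# <source_dir>/<package-name>/<full-git-hash>.tar.gz
source_bundle_path <- function(source_dir, package, hash) {
  # Assumes a full SHA-1 git commit hash: 40 lowercase hexadecimal characters.
  if (!grepl("^[0-9a-f]{40}$", hash)) {
    stop("expected a full 40-character git commit hash")
  }
  file.path(source_dir, package, paste0(hash, ".tar.gz"))
}

path <- source_bundle_path(
  "/opt/R-packages",
  "MyPrivatePkg",
  "28547e90d17f44f3a2b0274a2aa1ca820fd35b80"
)
path
file.exists(path)  # reports whether the bundle is present at that location
```

Rejecting abbreviated hashes up front reflects the requirement that the file name use the full git hash.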
Be aware that this mechanism is specific to the commit hash, so you will either need to make many git revisions of your package available in the Server.SourcePackageDir directory hierarchy or standardize to a particular git commit of the package.