Files and Directories
Changing Ownership
Many of the files and directories mentioned in this chapter are, by default, owned by the rstudio-pm user. If you change the RunAs user for the RStudio Package Manager service, you will need to change the ownership of these files and directories. See the Changing RunAs User section in the appendix for details on changing the RStudio Package Manager service RunAs user.
Program Files
The RStudio Package Manager installers place all program files into the /opt/rstudio-pm directory.
You should not need to change any files in the /opt/rstudio-pm hierarchy. Any alterations will be overwritten by subsequent reinstalls or upgrades of RStudio Package Manager.
Configuration
The RStudio Package Manager configuration file is /etc/rstudio-pm/rstudio-pm.gcfg. This file is initially owned by rstudio-pm with permissions 0640. You will edit this file to properly configure RStudio Package Manager for your organization.
An example configuration file that includes all the available configuration settings along with their defaults is installed at /etc/rstudio-pm/rstudio-pm.gcfg.defaults.
A configuration management tool like Puppet or Chef can be used to maintain the rstudio-pm.gcfg file. We recommend that it remain owned by rstudio-pm and have permissions 0640, as your configuration may need to contain passwords and other sensitive information.
RStudio Package Manager upgrades will not overwrite customizations to the rstudio-pm.gcfg file.
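If you manage rstudio-pm.gcfg with a configuration management tool, it can help to check that the recommended owner and 0640 mode survive each deployment. The following is a minimal Python sketch, not a tool shipped with RStudio Package Manager; the function name check_config_permissions is hypothetical, and it only compares the file's permission bits and owning user against the values recommended above.

```python
import os
import pwd
import stat


def check_config_permissions(path, expected_mode=0o640, expected_owner="rstudio-pm"):
    """Return a list of problems with the file's mode and owner; empty means OK."""
    st = os.stat(path)
    problems = []
    mode = stat.S_IMODE(st.st_mode)  # permission bits only, e.g. 0o640
    if mode != expected_mode:
        problems.append(f"mode is {mode:04o}, expected {expected_mode:04o}")
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid with no matching account on this machine
        owner = str(st.st_uid)
    if owner != expected_owner:
        problems.append(f"owner is {owner}, expected {expected_owner}")
    return problems


if __name__ == "__main__":
    config_path = "/etc/rstudio-pm/rstudio-pm.gcfg"
    if os.path.exists(config_path):
        for problem in check_config_permissions(config_path):
            print("WARNING:", problem)
```

An empty list means the file matches the recommendation. A configuration management tool can usually enforce the same owner and mode declaratively, so a check like this is mainly useful for auditing hosts it does not manage.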
Server Log
The RStudio Package Manager server log is located at /var/log/rstudio-pm.log. This file is owned by rstudio-pm with permissions 0600.
If logrotate is available when RStudio Package Manager is installed, a logrotate configuration will be installed. The default configuration rotates the log file daily. Old log files are stored alongside the original with a numeric extension: .1, .2, etc. Rotated log files are compressed after one day, so the .1 log file is retained uncompressed while older logs are compressed. Most systems use gzip for compression, giving log files with extensions like .2.gz and .3.gz. Logs are maintained for 30 days.
The manual for logrotate has more information.
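Because the rotation scheme splits the history across the live log, an uncompressed .1 file, and gzip-compressed .N.gz files, reading all retained lines means visiting every file in order. Here is a minimal Python sketch that assumes the naming scheme described above (the base path plus .N or .N.gz). The helper names rotated_log_files and read_log_lines are hypothetical.

```python
import glob
import gzip
import os
import re


def rotated_log_files(base="/var/log/rstudio-pm.log"):
    """Return the rotated copies (.1, .2.gz, ...) oldest first, then the live log."""
    pattern = re.compile(re.escape(base) + r"\.(\d+)(\.gz)?$")
    rotated = []
    for path in glob.glob(glob.escape(base) + ".*"):
        match = pattern.match(path)
        if match:  # skip unrelated files such as backups
            rotated.append((int(match.group(1)), path))
    # Higher rotation numbers are older, so sort descending.
    ordered = [path for _, path in sorted(rotated, reverse=True)]
    if os.path.exists(base):
        ordered.append(base)
    return ordered


def read_log_lines(base="/var/log/rstudio-pm.log"):
    """Yield every line from the rotated and live logs, decompressing .gz files."""
    for path in rotated_log_files(base):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            yield from handle
```

For example, `sum(1 for _ in read_log_lines())` counts every retained server log line. Reading the files requires the same access as the log itself (owned by rstudio-pm, mode 0600).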
Service Log
RStudio Package Manager can log information about serving source and binary packages. To enable the service log, define the Server.ServiceLog configuration property:
```ini
[Server]
ServiceLog = "/var/log/rstudio-pm.service.log"
```
The service log file is owned by rstudio-pm with permissions 0600. The log file is stored in JSON format, with each row being a valid JSON object.
Service Log Properties
Each row of the service log is a valid JSON object. The following properties may be included.
time
- A timestamp that identifies when the request was initiated.
- Type: string
- Appears: all logs
name
- The name of the package that was requested.
- Type: string
- Appears: all logs
version
- The version of the package that was requested.
- Type: string
- Appears: all logs
class
- Set to one of the following:
  - current: A current source package request.
  - archived: An archived source package request.
  - binary_win: A Windows binary package request.
- Type: string
- Appears: all logs
source
- Set to the source name for the current package request. This is useful for determining which source a package originates from, particularly when a repo contains multiple sources.
- Type: string
- Appears: all logs
type
- Indicates one of the following request types:
  - source_archived: The request is for an archived package source.
  - source_current: The request is for a current package source.
  - binary_archived: The request is for an archived package binary.
  - binary_current: The request is for a current package binary.
- Type: string
- Appears: all logs
distro
- For package binary requests, indicates the distro requested. When R is configured to use Linux binaries, the distro is the URL segment that directly follows __linux__/. On Windows, the distro indicates the R version, e.g., 3.5-win.
- Type: string
- Appears: binary requests only
r_version
- For package binary requests, indicates the R version requested. The R version is indicated in the User-Agent header value.
- Type: string
- Appears: binary requests only
match_type
- For package binary requests, indicates one of the following match types:
  - exact: The User-Agent R version matches the distro's default R version exactly.
  - best: A best-matching R version was calculated based on the User-Agent header value.
  - forced: The User-Agent R version's minor version matches the minor version of the distro's R version, and the distro was configured to force using its R version.
- Type: string
- Appears: binary requests only
failed_service
- The presence of this property indicates a service failure. If the log line includes a message property of ok, then a failed_service property indicates that a package binary request was unsuccessful, but the service was able to successfully fall back to serving the package source. However, if the log line's message property includes an error message, then the request failed.
- Service failure values include:
  - service_error_ash: Failure while calculating the unique signature (ASH) for a package binary.
  - service_error_fetch: Failure while fetching package binary or source files from storage.
- Type: string
- Appears: failed and source fallback requests (service failure related)
failed_ua
- For package binary requests, the presence of this property indicates a User-Agent failure. User-Agent failures include:
  - ua_error_none: No User-Agent header was found.
  - ua_error_unsupported_os: The User-Agent header value indicates an OS that is not supported.
  - ua_error_no_r: The User-Agent header value does not indicate the R version.
  - ua_error_distro_mismatch: The OS indicated by the User-Agent header value does not match the OS for the distro.
- Type: string
- Appears: User-Agent-related source fallback requests
user_agent
- For User-Agent failures, the User-Agent header value is included for reference.
- Type: string
- Appears: User-Agent-related source fallback requests
failed_match
- For package binary requests, the presence of this property indicates a failure related to calculating a best-matching R version. Values include:
  - match_error_no_best_match: No matching R version was found.
  - match_error_force_mismatch: The R version is being forced by the distro, but the distro's minor version does not match the minor version indicated by the User-Agent header.
- Type: string
- Appears: match-related source fallback requests
error
- For service failures that result in falling back to serving source, an error property may be included to indicate an error message associated with the service failure.
- Type: string
- Appears: source fallback requests (service failure related)
message
- For requests ending in failure, the error message is indicated by the message property. All other requests, including requests for binaries that successfully fall back to source due to one of the above failure modes, have a message value of ok.
- Type: string
- Appears: all logs
Service Log Examples
Below are examples of typical service logs.
- Serving a current package binary where the R version indicated by the User-Agent header is an exact match for the distro identifier's default R version:
```json
{ "time": 1563907491, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "r_version": "3.5.2", "match_type": "exact", "message": "ok" }
```
- Serving a current package binary where the R version indicated by the User-Agent header is a "best" match for the distro identifier's default R version:
```json
{ "time": 1563907529, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "r_version": "3.5.3", "match_type": "best", "message": "ok" }
```
- A current package binary was requested but was not available to download. RStudio Package Manager automatically fell back to serving the package source, which succeeded:
```json
{ "time": 1563907544, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "r_version": "3.6.3", "match_type": "best", "error": "DownloadBinaryEtagRunner: file not found: https://rspm-sync.rstudio.com/bin/3.6-bionic/e3f8ab6d0bd9f83cb787b4f7472d60d98f247f64ea2c8a32aff68be6abbde5cf.tar.gz", "failed_service": "service_error_fetch", "message": "ok" }
```
- A current package binary was requested, but no best-matching binary was available for R 3.7.3. RStudio Package Manager automatically fell back to serving the package source, which succeeded:
```json
{ "time": 1563907562, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "r_version": "3.7.3", "failed_match": "match_error_no_best_match", "message": "ok" }
```
- A current package binary was requested, but the User-Agent header did not indicate an R version. RStudio Package Manager automatically fell back to serving the package source, which succeeded. Note that the user_agent property is included in this case to aid in debugging:
```json
{ "time": 1563907598, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "failed_ua": "ua_error_no_r", "user_agent": "Q (3.5.3 x86_64-pc-linux-gnu x86_64 linux-gnu)", "message": "ok" }
```
- A current package binary was requested, but the User-Agent header indicated an unsupported OS. RStudio Package Manager automatically fell back to serving the package source, which succeeded. Note that the user_agent property is included in this case to aid in debugging:
```json
{ "time": 1564757046, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "xenial", "failed_ua": "ua_error_unsupported_os", "user_agent": "python-requests/2.9.1", "message": "ok" }
```
- Serving a current package source:
```json
{ "time": 1563907590, "name": "plumber", "version": "0.4.6", "class": "current", "type": "source_current", "message": "ok" }
```
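Since every row is a standalone JSON object, the service log is easy to aggregate. One question it can answer is how often binary requests end up falling back to source. The minimal Python sketch below classifies rows using only the rules stated in the property list: a message other than ok is a failure, and an ok row that carries failed_service, failed_ua, or failed_match is a source fallback. The function name summarize_service_log and the counter keys it produces are hypothetical.

```python
import json
import os
from collections import Counter

# Properties whose presence marks a binary request that hit a failure mode.
FAILURE_KEYS = ("failed_service", "failed_ua", "failed_match")


def summarize_service_log(lines):
    """Tally service-log rows into served, fallback-to-source, and failed requests."""
    summary = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        reasons = [entry[key] for key in FAILURE_KEYS if key in entry]
        if entry.get("message") != "ok":
            summary["failed"] += 1  # the message holds the error text
        elif reasons:
            summary["fallback_to_source"] += 1  # binary requested, source served
        else:
            summary["served_" + entry.get("type", "unknown")] += 1
        for reason in reasons:
            summary[reason] += 1  # e.g. service_error_fetch, ua_error_no_r
    return summary


if __name__ == "__main__":
    log_path = "/var/log/rstudio-pm.service.log"  # the Server.ServiceLog value shown above
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as handle:
            for key, count in summarize_service_log(handle).most_common():
                print(f"{key}: {count}")
```

A high ua_error_* count usually points at clients that are not sending an R User-Agent header. A high match_error_* count suggests R versions for which no matching binaries exist.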
Access Logs
The RStudio Package Manager HTTP access logs are located at /var/log/rstudio-pm.access.log. This file is owned by rstudio-pm with permissions 0600. Log files are stored in Apache Combined Log Format. See http://httpd.apache.org/docs/2.2/logs.html#combined for a description of this format.
If logrotate is available when RStudio Package Manager is installed, a logrotate configuration will be installed. The default configuration rotates the log file daily. The old log file will be compressed and stored alongside the original log file with a .1.gz extension (then .2.gz, etc.). Logs will be maintained for 30 days.
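Combined Log Format has a fixed field order: `%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`. Because of that, a single regular expression can split each access log line into fields. The minimal Python sketch below assumes that standard layout. The names COMBINED and parse_combined are hypothetical, and lines that do not match are returned as None rather than guessed at.

```python
import re
from datetime import datetime


def _quoted(name):
    # A double-quoted field that may contain backslash-escaped characters.
    return rf'"(?P<{name}>(?:[^"\\]|\\.)*)"'


# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r"^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] "
    + _quoted("request")
    + r" (?P<status>\d{3}) (?P<size>\d+|-) "
    + _quoted("referer")
    + " "
    + _quoted("agent")
    + r"$"
)


def parse_combined(line):
    """Parse one Combined Log Format line into a dict, or return None if it does not match."""
    match = COMBINED.match(line.rstrip("\n"))
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])  # "-" means no body
    return entry
```

The agent field carries the same User-Agent header that the service log uses for binary matching. Grouping access log entries by agent is therefore a quick way to see which R versions your clients report.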
Variable Data
RStudio Package Manager manages R packages and repositories. All package source bundles are stored in the server's data directory. RStudio Package Manager handles incoming requests for packages across repositories, and only a single copy of each package source is stored, even if the package is referenced in multiple repositories.
The RStudio Package Manager data directory also contains information used by the server to manage repositories, including the RStudio Package Manager SQLite databases and encryption key if SQLite is used.
The default location for the RStudio Package Manager data directory is /var/lib/rstudio-pm. This can be customized by specifying an alternate DataDir in the Server section of your configuration file.
```ini
; /etc/rstudio-pm/rstudio-pm.gcfg
[Server]
DataDir = /mnt/rstudio-pm
```
If you customize the RStudio Package Manager data directory, make sure that the rstudio-pm user has permission to read, write, and create directories in the data directory.
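The most direct way to confirm the read, write, and create-directory requirement is to try each operation as the service account. The minimal Python sketch below lists the directory, creates a scratch subdirectory, writes a file into it, and then removes both. The function name data_dir_is_usable and the `.permcheck-` prefix are hypothetical. Run it as the rstudio-pm user (for example with `sudo -u rstudio-pm python3 ...`) so it reflects that account's permissions rather than yours.

```python
import os
import tempfile


def data_dir_is_usable(data_dir="/var/lib/rstudio-pm"):
    """Return True if the current user can list, create, write, and remove in data_dir."""
    try:
        os.listdir(data_dir)  # read access
        probe = tempfile.mkdtemp(prefix=".permcheck-", dir=data_dir)  # create a directory
        probe_file = os.path.join(probe, "probe.txt")
        try:
            with open(probe_file, "w") as handle:
                handle.write("ok")  # write a file inside the new directory
        finally:
            # Leave the data directory as we found it.
            if os.path.exists(probe_file):
                os.remove(probe_file)
            os.rmdir(probe)
    except OSError as exc:
        print(f"{data_dir}: {exc}")
        return False
    return True
```

Pass your customized DataDir path, for example `data_dir_is_usable("/mnt/rstudio-pm")`. The same check applies to any per-class storage location you configure.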
The RStudio Package Manager SQLite databases must exist on local storage. If the location for DataDir is not local storage but a networked location over NFS, configure the Dir setting in the SQLite section of your server configuration file to point to a directory on local storage:
```ini
; /etc/rstudio-pm/rstudio-pm.gcfg
[Server]
DataDir = /mnt/rstudio-pm

[SQLite]
Dir = /var/lib/rstudio-pm/db
```
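To tell whether DataDir (or the SQLite Dir) sits on network storage, you can look up the filesystem type of the mount that contains it. The minimal Python sketch below reads the Linux-specific /proc/mounts file and picks the longest mount point that contains the path. The function name filesystem_type and the NETWORK_FS_TYPES set are hypothetical, and the set is illustrative rather than exhaustive.

```python
import os

# Filesystem types treated as network storage here; illustrative, not exhaustive.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "fuse.sshfs"}


def filesystem_type(path, mounts_text=None):
    """Return the type of the filesystem mounted at or above `path` (Linux only).

    `path` is normalized but symlinks are not resolved; pass os.path.realpath(path)
    if the directory may be reached through a symlink.
    """
    if mounts_text is None:
        with open("/proc/mounts") as handle:
            mounts_text = handle.read()
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Fields: device, mount point, type, ...; spaces in mount points appear as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # ">=" lets a later mount on the same point (which hides the earlier one) win.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

For example, if `filesystem_type(os.path.realpath("/mnt/rstudio-pm")) in NETWORK_FS_TYPES`, the SQLite databases should be moved to local storage with the Dir setting shown above.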
Permissions
/var/lib/rstudio-pm is owned by rstudio-pm with permissions 0700.
Variable Data Classes
All variable data storage locations default to subdirectories of the Server.DataDir setting. There are six classes of variable data, listed below:
- Cache - Stores data to increase performance for computationally intensive operations. Certain operations, such as Git package building, also temporarily cache data here. Defaults to <DataDir>/cache.
- Launcher - Stores data for Job Launcher operations. This location currently stores the stdout and stderr data associated with each Git package builder operation. Defaults to <DataDir>/launcher.
- Metrics - Contains aggregated metrics data to improve Usage Stats performance. Defaults to <DataDir>/metrics.
- Packages - Package tarballs and README files are stored here. Defaults to <DataDir>/packages. This includes:
  - Package tarballs and README files for local packages.
  - Package tarballs and README files for Git packages.
- CRAN - Package tarballs and README files for CRAN are stored here. Defaults to <DataDir>/cran. This includes:
  - Package tarballs for CRAN packages that have been downloaded.
  - README files for CRAN packages.
- Binaries - Pre-compiled R package binaries are stored here. Defaults to <DataDir>/binaries.
You can customize the storage directory for each storage class. For example:
```ini
; /etc/rstudio-pm/rstudio-pm.gcfg
[FileStorage "cache"]
Location = /mnt/rstudio-pm-cache

[FileStorage "launcher"]
Location = /mnt/rstudio-pm-launcher

[FileStorage "metrics"]
Location = /mnt/rstudio-pm-metrics

[FileStorage "packages"]
Location = /mnt/rstudio-pm-packages

[FileStorage "cran"]
Location = /mnt/rstudio-pm-cran

[FileStorage "binaries"]
Location = /mnt/rstudio-pm-binaries
```
Again, if you customize any of the RStudio Package Manager storage directories, make sure that the rstudio-pm user has permission to read, write, and create directories in each data directory.
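When some classes are overridden and others are not, it can be useful to list the directory each class will actually use on disk. For example, you may need that list before checking permissions or planning a migration. The minimal Python sketch below applies the rules described above: use [FileStorage "<class>"] Location when set, otherwise <DataDir>/<class>. It uses Python's configparser as an approximation of the gcfg format, which is an assumption. Quoted section names are read literally, section names are matched case-sensitively, and the function name storage_locations is hypothetical.

```python
import configparser
import posixpath

STORAGE_CLASSES = ("cache", "launcher", "metrics", "packages", "cran", "binaries")


def storage_locations(config_text, default_data_dir="/var/lib/rstudio-pm"):
    """Map each variable data class to the directory it uses when stored on disk."""
    # configparser only approximates gcfg; [FileStorage "cache"] becomes a
    # section literally named 'FileStorage "cache"'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    data_dir = parser.get("Server", "DataDir", fallback=default_data_dir).strip('"')
    locations = {}
    for name in STORAGE_CLASSES:
        override = parser.get(f'FileStorage "{name}"', "Location", fallback=None)
        if override:
            locations[name] = override.strip('"')
        else:
            locations[name] = posixpath.join(data_dir, name)  # <DataDir>/<class>
    return locations
```

For example, with `DataDir = /mnt/rstudio-pm` and only the cache class overridden, every class except cache resolves to a subdirectory of /mnt/rstudio-pm. This only describes on-disk storage; classes configured for S3 do not use these directories.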
Destinations
The six variable storage classes (see the Variable Data Classes section above) default to storing data on disk. Each storage class can optionally be configured to store data on S3. For example, to configure all six variable data storage classes for S3, use the following configuration:
```ini
; /etc/rstudio-pm/rstudio-pm.gcfg
[Storage]
Cache = s3
Launcher = s3
Metrics = s3
Packages = s3
CRAN = s3
Binaries = s3

; Default S3 settings. This is the minimum-required setting for using S3.
[S3Storage]
Bucket = your-s3-bucket

; Override default S3 settings for the "packages" class. This demonstrates
; all the available S3 configuration settings.
[S3Storage "packages"]
Bucket = another-s3-bucket
Prefix = rspm-packages
Profile = dev-rspm
Region = us-west-1
EnableSharedConfig = true
```
RStudio Package Manager's AWS S3 support utilizes the AWS S3 SDK, which documents configuration and credential standards for interacting with S3 services.
See the S3 Configuration chapter for information on configuring your system to use AWS S3.
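When reviewing a configuration like the one above, it can help to see which bucket and prefix each S3-backed class ends up with. The minimal Python sketch below rests on two assumptions. First, it uses configparser as an approximation of gcfg. Second, and more significant, it assumes a per-class [S3Storage "<class>"] section overrides individual keys from the default [S3Storage] section while inheriting the rest. That merge behavior is not stated in this chapter, so verify it against your version. The function name s3_settings is hypothetical.

```python
import configparser

STORAGE_CLASSES = ("Cache", "Launcher", "Metrics", "Packages", "CRAN", "Binaries")


def s3_settings(config_text):
    """Return the S3 settings each S3-backed storage class would use.

    ASSUMPTION: a per-class [S3Storage "<class>"] section overrides individual
    keys from the default [S3Storage] section, and other keys are inherited.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    result = {}
    for name in STORAGE_CLASSES:
        # Classes not set to s3 in [Storage] keep storing data on disk.
        if parser.get("Storage", name, fallback="").strip().lower() != "s3":
            continue
        settings = dict(parser["S3Storage"]) if parser.has_section("S3Storage") else {}
        override = f'S3Storage "{name.lower()}"'
        if parser.has_section(override):
            settings.update(parser[override])  # per-class keys win
        result[name.lower()] = settings
    return result
```

configparser lowercases option names, so the returned keys read `bucket`, `prefix`, and so on. Credentials are deliberately not handled here; they come from the AWS SDK's standard credential sources.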
Server Migrations
You may need to migrate your RStudio Package Manager installation:
- When moving from one environment to another (e.g., physical to virtual or on-prem to cloud)
- To facilitate an HA setup with more than one node
Several factors must be considered before migrating your RStudio Package Manager installation from one server to another. We recommend that you don't make any configuration changes (or as few as possible) during the initial migration. If, for instance, you will be migrating to a new server and upgrading to a new default version of R, complete the migration first. Then upgrade R in subsequent steps.
Before you migrate the server, you need to perform a backup to obtain a consistent copy of the data in the necessary directories. These directories can then be copied to the new server.
- Install RStudio Package Manager on the new server, then stop the service.
- Mirror the Unix accounts used by RStudio Package Manager on the existing server to the new server. See the Account for Processes section. If you are using the default rstudio-pm account and group, then you will only need to consider the user account that needs permission to use the CLI.
- Copy the config and data directories while preserving the permissions and file ownership. Not all file transfer clients can preserve these attributes, so consider using rsync with the -a flag to copy the data.
- Update your /etc/rstudio-pm/rstudio-pm.gcfg file if you've changed settings like the path to your data directory.
- Install the same version(s) of R on the new server to mimic existing behavior. If you must install a different version of R, RStudio Package Manager will still function correctly, but certain functions (like building Git packages) may be affected.
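Before starting the service on the new server, you can confirm that the copy really preserved permissions and ownership. This matters because rsync preserves file ownership only when it runs as root. The minimal Python sketch below builds a manifest of mode, owner name, and group name for every path under a directory. You run it on each server (for example, saving the result with `json.dump`) and then compare the two manifests. Owners are compared by name rather than numeric ID, which is an assumption chosen to match rsync's default name mapping. The function names manifest and diff_manifests are hypothetical.

```python
import grp
import os
import pwd
import stat


def _name(lookup, ident):
    try:
        return lookup(ident)[0]  # pw_name / gr_name
    except KeyError:
        return str(ident)  # no matching account on this machine


def manifest(top):
    """Map each path under `top` (relative) to its permission bits, owner, and group."""
    entries = {}
    for root, dirs, files in os.walk(top):
        for name in [*dirs, *files]:
            path = os.path.join(root, name)
            st = os.lstat(path)  # do not follow symlinks
            entries[os.path.relpath(path, top)] = {
                "mode": f"{stat.S_IMODE(st.st_mode):04o}",
                "owner": _name(pwd.getpwuid, st.st_uid),
                "group": _name(grp.getgrgid, st.st_gid),
            }
    return entries


def diff_manifests(old, new):
    """Return human-readable differences between the old and new server's manifests."""
    problems = []
    for rel, attrs in sorted(old.items()):
        if rel not in new:
            problems.append(f"missing on new server: {rel}")
        elif new[rel] != attrs:
            problems.append(f"attributes differ for {rel}: {attrs} -> {new[rel]}")
    return problems
```

An empty result from `diff_manifests` means every path from the old data directory exists on the new server with the same mode, owner, and group.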
If you are also migrating to a different database provider, see the section on Changing the Database Provider.