7 Files and Directories
7.1 Changing Ownership
Many of the files and directories mentioned in this chapter are, by default,
owned by the rstudio-pm user. If you change the RunAs user for the
RStudio Package Manager service, you will need to change ownership of these files
and directories. See D for details on changing the
RStudio Package Manager service RunAs user.
7.2 Program Files
The RStudio Package Manager installers place all program files into the
/opt/rstudio-pm directory.
You should not need to change any files in the /opt/rstudio-pm hierarchy. Any
alterations will be overwritten by subsequent re-installs or upgrades of
RStudio Package Manager.
7.3 Configuration
The RStudio Package Manager configuration file is
/etc/rstudio-pm/rstudio-pm.gcfg. This file is initially owned by rstudio-pm
with permissions 0640. You will edit this file to properly configure
RStudio Package Manager for your organization.
A configuration management tool like Puppet or Chef can be used to maintain the
rstudio-pm.gcfg file. We recommend that it remain owned by rstudio-pm and
have permissions 0640, as your configuration may need to contain passwords
and other sensitive information.
RStudio Package Manager upgrades will not overwrite customizations to the
rstudio-pm.gcfg file.
7.4 Server Log
The RStudio Package Manager server log is located at /var/log/rstudio-pm.log.
This file is owned by rstudio-pm with permissions 0600.
If logrotate is available when RStudio Package Manager is installed, a logrotate
configuration will be installed. The default configuration is to rotate the
log file daily. The old log file will be stored alongside the original with a
numeric extension, .1, .2, etc. Rotated log files are compressed after
one day: the .1 log file is retained uncompressed, and older logs are
compressed. Most systems use gzip for compression, giving log files with
extensions like .2.gz, .3.gz. Logs will be maintained for 30 days.
The manual for logrotate has more information.
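Because rotation leaves a mix of plain and gzip-compressed files, reading the full 30-day history takes a little care. The following is a minimal sketch, assuming the naming scheme described above (.1 uncompressed, .2.gz and later compressed). The helper names rotation_index and read_rotated_logs are illustrative and are not part of RStudio Package Manager.

```python
import gzip
import re
from pathlib import Path


def rotation_index(path, base_name):
    """Return 0 for the live log, N for '<base>.N' or '<base>.N.gz', else None."""
    suffix = path.name[len(base_name):]
    if suffix == "":
        return 0
    match = re.fullmatch(r"\.(\d+)(?:\.gz)?", suffix)
    return int(match.group(1)) if match else None


def read_rotated_logs(log_path):
    """Yield log lines from the oldest rotated file through the live log."""
    log_path = Path(log_path)
    name = log_path.name
    indexed = [(rotation_index(p, name), p) for p in log_path.parent.glob(name + "*")]
    # The highest rotation number is the oldest file, so read in descending order.
    for _, path in sorted(((i, p) for i, p in indexed if i is not None), reverse=True):
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            yield from handle
```

For example, list(read_rotated_logs("/var/log/rstudio-pm.log")) would collect every retained line in chronological order, provided the calling user is allowed to read the 0600 log files.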
7.5 Service Log
RStudio Package Manager can log information about serving source and binary packages.
To enable the service log, define the Server.ServiceLog configuration property.
[Server]
ServiceLog = "/var/log/rstudio-pm.service.log"
The service log file is owned by rstudio-pm with permissions 0600. The log
file is stored in JSON format, with each row being a valid JSON object.
7.5.1 Service Log Properties
Each row of the service log is a valid JSON object. The following properties may be included.
time
A timestamp that identifies when the request was initiated.
Type: string
Appears: all logs
name
The name of the package that was requested.
Type: string
Appears: all logs
version
The version of the package that was requested.
Type: string
Appears: all logs
class
Set to either current for current package requests or archived for archived package requests.
Type: string
Appears: all logs
type
Indicates one of the following request types:
- source_archived - The request is for an archived package source.
- source_current - The request is for a current package source.
- binary_archived - The request is for an archived package binary.
- binary_current - The request is for a current package binary.
Type: string
Appears: all logs
distro
For package binary requests, indicates the distro requested. The distro is the URL segment that directly follows __linux__/.
Type: string
Appears: binary requests only
r_version
For package binary requests, indicates the R version requested. The R version is indicated in the User-Agent header value.
Type: string
Appears: binary requests only
match_type
For package binary requests, indicates one of the following match types:
- exact - The User-Agent R version matches the distro’s default R version exactly.
- best - A best-matching R version was calculated based on the User-Agent header value.
- forced - The User-Agent R version’s minor version matches the minor version of the distro’s R version, and the distro was configured to force using its R version.
Type: string
Appears: binary requests only
failed_service
The presence of this property indicates a service failure. If the log line includes a message property of ok, then a failed_service property indicates that a package binary request was unsuccessful, but the service was able to successfully fall back to serving the package source. However, if the log line’s message property includes an error message, then the request failed.
Service failure values include:
- service_error_ash - Failure while calculating the unique signature (ASH) for a package binary.
- service_error_fetch - Failure while fetching package binary or source files from storage.
Type: string
Appears: failed and source fallback requests (service failure related)
failed_ua
For package binary requests, the presence of this property indicates a User-Agent failure. User-Agent failures include:
- ua_error_none - No User-Agent header was found.
- ua_error_unsupported_os - The User-Agent header value indicates an OS that is not supported.
- ua_error_no_r - The User-Agent header value does not indicate the R version.
- ua_error_distro_mismatch - The OS indicated by the User-Agent header value does not match the OS for the distro.
Type: string
Appears: User-Agent-related source fallback requests
ua
For User-Agent failures, the User-Agent header value is included for reference.
Type: string
Appears: User-Agent-related source fallback requests
failed_match
For package binary requests, the presence of this property indicates a failure related to calculating a best-matching R version. Values include:
- match_error_no_best_match - No matching R version was found.
- match_error_force_mismatch - The R version is being forced by the distro, but the distro’s minor version does not match the minor version indicated by the User-Agent header.
Type: string
Appears: Match-related source fallback requests
error
For service failures that result in falling back to serving source, an error property may be included to indicate an error message associated with the service failure.
Type: string
Appears: source fallback requests (service failure related)
message
For requests ending in failure, the error message is indicated by the message property. All other requests, including requests for binaries that successfully fall back to source due to one of the above failure modes, will return a value of ok.
Type: string
Appears: all logs
7.5.2 Service Log Examples
Below are examples of typical service logs.
Serving a current package binary where the R version indicated by the User-Agent header is an exact match for the distro identifier’s default R version.
{ "time": 1563907491, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "r_version": "3.5.2", "match_type": "exact", "message": "ok" }
Serving a current package binary where the R version indicated by the User-Agent header is a “best” match for the distro identifier’s default R version.
{ "time": 1563907529, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "r_version": "3.5.3", "match_type": "best", "message": "ok" }
A current package binary was requested, but was not available to download. RStudio Package Manager automatically fell back to serving the package source, which succeeded.
{ "time": 1563907544, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "r_version": "3.6.3", "match_type": "best", "error": "DownloadBinaryEtagRunner: file not found: https://rspm-sync.rstudio.com/bin/3.6-bionic/e3f8ab6d0bd9f83cb787b4f7472d60d98f247f64ea2c8a32aff68be6abbde5cf.tar.gz", "failed_service": "service_error_fetch", "message": "ok" }
A current package binary was requested, but no best-matching binary was available for R 3.7.3. RStudio Package Manager automatically fell back to serving the package source, which succeeded.
{ "time": 1563907562, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "r_version": "3.7.3", "failed_match": "match_error_no_best_match", "message": "ok" }
A current package binary was requested, but the User-Agent header did not indicate an R version. RStudio Package Manager automatically fell back to serving the package source, which succeeded. Note that the user_agent property is included in this case to aid in debugging.
{ "time": 1563907598, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "bionic", "failed_ua": "ua_error_no_r", "user_agent": "Q (3.5.3 x86_64-pc-linux-gnu x86_64 linux-gnu)", "message": "ok" }
A current package binary was requested, but the User-Agent header indicated an unsupported OS. RStudio Package Manager automatically fell back to serving the package source, which succeeded. Note that the user_agent property is included in this case to aid in debugging.
{ "time": 1564757046, "name": "plumber", "version": "0.4.6", "class": "current", "type": "binary_current", "distro": "xenial", "failed_ua": "ua_error_unsupported_os", "user_agent": "python-requests/2.9.1", "message": "ok" }
Serving a current package source.
{ "time": 1563907590, "name": "plumber", "version": "0.4.6", "class": "current", "type": "source_current", "message": "ok" }
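Since each row of the service log is a standalone JSON object, the log can be summarized with a few lines of scripting. The sketch below is one possible approach, not a tool shipped with RStudio Package Manager. It assumes only the properties documented above, and the function name summarize_service_log is made up for illustration. It counts requests by type, counts source fallbacks by failure value, and counts requests whose message is not ok.

```python
import json
from collections import Counter

# Properties whose presence signals a failure, as described in section 7.5.1.
FAILURE_KEYS = ("failed_service", "failed_ua", "failed_match")


def summarize_service_log(lines):
    """Summarize service log rows: request types, fallbacks, and failed requests."""
    by_type = Counter()
    fallbacks = Counter()
    errors = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank rows
        entry = json.loads(line)
        by_type[entry.get("type", "unknown")] += 1
        if entry.get("message") != "ok":
            errors += 1  # the request itself failed
            continue
        # message is "ok": any failure property means a fallback to source.
        for key in FAILURE_KEYS:
            if key in entry:
                fallbacks[entry[key]] += 1
    return {"by_type": by_type, "fallbacks": fallbacks, "errors": errors}
```

A typical use would be to open the file configured in Server.ServiceLog and pass the file handle directly, for example summarize_service_log(open("/var/log/rstudio-pm.service.log")), run as a user that can read the 0600 log file.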
7.6 Access Logs
The RStudio Package Manager HTTP access logs are located at
/var/log/rstudio-pm.access.log. This file is owned by rstudio-pm with
permissions 0600. Log files are stored in Apache Combined Log Format. See
http://httpd.apache.org/docs/2.2/logs.html#combined for a description of this
format.
If logrotate is available when RStudio Package Manager is installed, a logrotate
configuration will be installed. The default configuration is to rotate the
log file daily. The old log file will be compressed and stored alongside the
original log file with a .1.gz extension (then .2.gz, etc.). Logs will be
maintained for 30 days.
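Access log lines in Apache Combined Log Format can be split into fields with a regular expression. The following is a minimal sketch that assumes the standard Combined layout (host, identity, user, timestamp, request line, status, size, referer, user agent). The sample line in the usage note is made up for illustration and is not real output from RStudio Package Manager. The pattern does not handle escaped quotes inside quoted fields.

```python
import re
from datetime import datetime

# One Combined Log Format line: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A size of "-" means no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    # Month abbreviations are parsed using the current locale (English by default).
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields
```

For instance, parsing the hypothetical line `10.0.0.5 - - [23/Jul/2019:18:44:51 +0000] "GET /cran/latest/src/contrib/PACKAGES HTTP/1.1" 200 1024 "-" "R (3.5.2 x86_64-pc-linux-gnu x86_64 linux-gnu)"` yields a status of 200, a size of 1024, and a timezone-aware timestamp.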
7.7 Variable Data
RStudio Package Manager manages R packages and repositories. All package source bundles are stored in the server’s data directory. The RStudio Package Manager handles incoming requests for packages across repositories. Only a single copy of each package source is stored, even if the package is referenced in multiple repositories.
The RStudio Package Manager data directory also contains information used by the
server to manage repositories, including the RStudio Package Manager SQLite
databases and encryption key if SQLite is used.
The default location for the RStudio Package Manager data directory is
/var/lib/rstudio-pm. This can be customized by specifying an alternate
DataDir in the Server section of your configuration file.
; /etc/rstudio-pm/rstudio-pm.gcfg
[Server]
DataDir = /mnt/rstudio-pm
If you customize the RStudio Package Manager data directory, make sure that
the rstudio-pm user has permission to read, write, and create directories in
the data directory.
The RStudio Package Manager SQLite databases must exist on local storage. If
the location for DataDir is not local storage but a networked location over
NFS, configure the Dir setting in the SQLite section of your server
configuration file.
; /etc/rstudio-pm/rstudio-pm.gcfg
[Server]
DataDir = /mnt/rstudio-pm
[SQLite]
Dir = /var/lib/rstudio-pm/db
7.7.1 Permissions
/var/lib/rstudio-pm is owned by rstudio-pm with permissions 0700.
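The permissions documented in this chapter can be checked with a short script. The following is a minimal sketch that compares the mode bits of the default paths against the values stated above. It does not check ownership, the function names check_mode and audit are illustrative, and the paths assume an installation that has not customized DataDir or the log locations.

```python
import os
import stat

# Default paths and modes as documented in this chapter.
EXPECTED_MODES = {
    "/etc/rstudio-pm/rstudio-pm.gcfg": 0o640,
    "/var/log/rstudio-pm.log": 0o600,
    "/var/log/rstudio-pm.access.log": 0o600,
    "/var/lib/rstudio-pm": 0o700,
}


def check_mode(path, expected_mode):
    """Return None if the mode matches, else a short description of the problem."""
    try:
        actual = stat.S_IMODE(os.stat(path).st_mode)
    except FileNotFoundError:
        return f"{path}: missing"
    if actual != expected_mode:
        return f"{path}: mode {actual:04o}, expected {expected_mode:04o}"
    return None


def audit(expected=EXPECTED_MODES):
    """Return a list of problems found; an empty list means every mode matched."""
    problems = (check_mode(path, mode) for path, mode in expected.items())
    return [problem for problem in problems if problem is not None]
```

Running audit() as root or as the rstudio-pm user and printing each returned string gives a quick report of any path whose mode has drifted from the documented default.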
7.8 Variable Data Classes
All variable data storage locations default to subdirectories of the
Server.DataDir setting. There are five classes of variable data, listed below:
- Cache - Stores data to increase performance for computationally intensive
  operations. Certain operations, such as Git package building, also temporarily
  cache data here. Defaults to <DataDir>/cache.
- Launcher - Stores data for Job Launcher operations. This location currently
  stores the stdout and stderr data associated with each Git package builder
  operation. Defaults to <DataDir>/launcher.
- Metrics - This directory contains aggregated metrics data to improve Usage
  Stats performance. Defaults to <DataDir>/metrics.
- Packages - Package tarballs and README files are stored here. Defaults to
  <DataDir>/packages. This includes:
  - Package tarballs and README files for local packages,
  - Package tarballs and README files for Git packages,
  - Package tarballs for CRAN packages that have been downloaded lazily or eagerly, and
  - README files for CRAN packages (downloaded either lazily or eagerly).
- Binaries - Pre-compiled R package binaries are stored here. Defaults to
  <DataDir>/binaries.
You can customize the storage directory for each storage class. For example:
; /etc/rstudio-pm/rstudio-pm.gcfg
[FileStorage "cache"]
Location = /mnt/rstudio-pm-cache
[FileStorage "launcher"]
Location = /mnt/rstudio-pm-launcher
[FileStorage "metrics"]
Location = /mnt/rstudio-pm-metrics
[FileStorage "packages"]
Location = /mnt/rstudio-pm-packages
[FileStorage "binaries"]
Location = /mnt/rstudio-pm-binaries
Again, if you customize any of the RStudio Package Manager storage directories,
make sure that the rstudio-pm user has permission to read, write, and create
directories in each data directory.
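The defaulting rule described in this section (each class lives under DataDir unless a Location is set for it) can be illustrated with a small sketch. This is only a model of the documented behavior for planning disk layout, not RStudio Package Manager's own code. The function name resolve_locations and the overrides argument are hypothetical.

```python
from pathlib import PurePosixPath

# The five variable data classes listed in this section.
STORAGE_CLASSES = ("cache", "launcher", "metrics", "packages", "binaries")


def resolve_locations(data_dir="/var/lib/rstudio-pm", overrides=None):
    """Map each storage class to its directory, applying per-class overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(STORAGE_CLASSES)
    if unknown:
        raise ValueError(f"unknown storage classes: {sorted(unknown)}")
    return {
        # Fall back to <DataDir>/<class> when no Location is given for the class.
        name: str(overrides.get(name, PurePosixPath(data_dir) / name))
        for name in STORAGE_CLASSES
    }
```

For example, resolve_locations("/mnt/rstudio-pm", {"packages": "/mnt/rstudio-pm-packages"}) places packages in /mnt/rstudio-pm-packages and leaves the other four classes under /mnt/rstudio-pm.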
7.9 Destinations
The five variable storage classes (see section 7.8, above) default to storing data on disk. Each storage class can optionally be configured to store data on S3. For example, to configure all five variable data storage classes for S3, use the following configuration:
; /etc/rstudio-pm/rstudio-pm.gcfg
[Storage]
Cache = s3
Launcher = s3
Metrics = s3
Packages = s3
Binaries = s3
; Default S3 settings. This is the minimum-required setting for using S3.
[S3Storage]
Bucket = your-s3-bucket
; Override default S3 settings for the "packages" class. This demonstrates
; all the available S3 configuration settings.
[S3Storage "packages"]
Bucket = another-s3-bucket
Prefix = rspm-packages
Profile = dev-rspm
Region = us-west-1
EnableSharedConfig = true
RStudio Package Manager’s AWS S3 support utilizes the AWS S3 SDK, which documents configuration and credential standards for interacting with S3 services.
See chapter 8 for information on configuring your system to use AWS S3.