Files and Directories

Changing Ownership

Many of the files and directories mentioned in this chapter are, by default, owned by the rstudio-pm user. If you change the RunAs user for the RStudio Package Manager service, you will need to change ownership of these files and directories. See the Changing RunAs User section in the appendix for details on changing the RStudio Package Manager service RunAs user.
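
Before or after changing the RunAs user, it can help to audit which paths are still owned by a different account. The following is a minimal sketch, not part of RStudio Package Manager; the function name paths_not_owned_by is hypothetical, and it only reports mismatches without changing anything.

```python
import os
from pathlib import Path


def paths_not_owned_by(root, uid):
    """Return every path under root (including root) not owned by uid, sorted."""
    root = Path(root)
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(Path(dirpath) / name for name in dirnames + filenames)
    mismatched = []
    for path in candidates:
        # lstat inspects a symlink itself rather than the file it points to.
        if path.lstat().st_uid != uid:
            mismatched.append(str(path))
    return sorted(mismatched)
```

For example, paths_not_owned_by("/var/lib/rstudio-pm", pwd.getpwnam("rstudio-pm").pw_uid) would list anything in the default data directory that the rstudio-pm user does not own (this assumes the pwd module is imported and that you have permission to read the directory tree).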

Program Files

The RStudio Package Manager installers place all program files into the /opt/rstudio-pm directory.

You should not need to change any files in the /opt/rstudio-pm hierarchy. Any alterations will be overwritten by subsequent re-installs or upgrades of RStudio Package Manager.

Configuration

The RStudio Package Manager configuration file is /etc/rstudio-pm/rstudio-pm.gcfg. This file is initially owned by rstudio-pm with permissions 0640. You will edit this file to properly configure RStudio Package Manager for your organization.

An example configuration file that includes all the available configuration settings along with their defaults is installed at /etc/rstudio-pm/rstudio-pm.gcfg.defaults.

A configuration management tool like Puppet or Chef can be used to maintain the rstudio-pm.gcfg file. We recommend that it remain owned by rstudio-pm and have permissions 0640, as your configuration may need to contain passwords and other sensitive information.
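
If you maintain the file with a configuration management tool, a quick check that the permissions are still 0640 can catch accidental changes. This is a minimal sketch using only the Python standard library; the helper names are hypothetical.

```python
import os
import stat


def file_mode(path):
    """Return the permission bits of path as an integer, e.g. 0o640."""
    return stat.S_IMODE(os.stat(path).st_mode)


def has_expected_mode(path, expected=0o640):
    """True when path has exactly the expected permission bits."""
    return file_mode(path) == expected
```

Calling has_expected_mode("/etc/rstudio-pm/rstudio-pm.gcfg") returns False if the file has been made group-writable or world-readable, for instance.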

RStudio Package Manager upgrades will not overwrite customizations to the rstudio-pm.gcfg file.

Server Log

The RStudio Package Manager server log is located at /var/log/rstudio-pm.log. This file is owned by rstudio-pm with permissions 0600.

If logrotate is available when RStudio Package Manager is installed, a logrotate configuration will be installed. The default configuration rotates the log file daily. Rotated log files are stored alongside the original with a numeric extension (.1, .2, etc.). The most recent rotated file, .1, is kept uncompressed; older files are compressed. Most systems use gzip for compression, producing extensions like .2.gz and .3.gz. Logs are retained for 30 days.

The manual for logrotate has more information.
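
When searching the server log, you may want to read the live file together with its rotated copies. The sketch below assumes the naming scheme described above (a numeric extension, optionally followed by .gz) and gzip compression; the function names are hypothetical.

```python
import gzip
from pathlib import Path


def rotated_log_paths(log_path):
    """Return the live log followed by rotated copies, most recent file first."""
    log_path = Path(log_path)
    rotated = []
    for candidate in log_path.parent.glob(log_path.name + ".*"):
        suffix = candidate.name[len(log_path.name) + 1:]
        number = suffix[:-3] if suffix.endswith(".gz") else suffix
        if number.isdigit():
            rotated.append((int(number), candidate))
    ordered = [path for _, path in sorted(rotated)]
    return ([log_path] if log_path.exists() else []) + ordered


def read_log_lines(log_path):
    """Yield lines from each file in turn, decompressing .gz files on the fly."""
    for path in rotated_log_paths(log_path):
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

Files are visited from most recent to oldest, while lines within each file keep their original order, so the combined output is not strictly chronological.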

Service Log

RStudio Package Manager can log information about serving source and binary packages. To enable the service log, define the Server.ServiceLog configuration property.

[Server]
ServiceLog = "/var/log/rstudio-pm.service.log"

The service log file is owned by rstudio-pm with permissions 0600. The log file is stored in JSON format, with each row being a valid JSON object.
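
Because each row is a standalone JSON object, the log can be processed one line at a time. The following is a minimal sketch; parse_service_log is a hypothetical helper, not part of RStudio Package Manager.

```python
import json


def parse_service_log(lines):
    """Yield one dict per non-empty line; each line holds one JSON object."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
        yield entry
```

Any iterable of strings works as input, so an open file handle for the path configured in Server.ServiceLog can be passed directly.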

Service Log Properties

Each row of the service log is a valid JSON object. The following properties may be included.

time

A timestamp that identifies when the request was initiated.

Type: string

Appears: all logs

name

The name of the package that was requested.

Type: string

Appears: all logs

version

The version of the package that was requested.

Type: string

Appears: all logs

class

Set to one of the following:

  • current - A current source package request.
  • archived - An archived source package request.
  • binary_win - A Windows binary package request.

Type: string

Appears: all logs

source

Set to the source name for the current package request. This is useful for determining which source a package originates from, particularly when a repo contains multiple sources.

Type: string

Appears: all logs

type

Indicates one of the following request types:

  • source_archived - The request is for an archived package source.
  • source_current - The request is for a current package source.
  • binary_archived - The request is for an archived package binary.
  • binary_current - The request is for a current package binary.

Type: string

Appears: all logs

distro

For package binary requests, indicates the distro requested. When R is configured to use Linux binaries, the distro is the URL segment that directly follows __linux__/. On Windows, the distro indicates the R version, e.g., 3.5-win.

Type: string

Appears: binary requests only
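
To illustrate where the Linux distro comes from, the sketch below extracts the path segment that directly follows __linux__/ from a repository URL. The host name and URL layout in the example are assumptions made for illustration, and distro_from_url is a hypothetical helper rather than the server's own logic.

```python
from urllib.parse import urlsplit

LINUX_MARKER = "__linux__"


def distro_from_url(url):
    """Return the path segment directly after __linux__/, or None if absent."""
    segments = [segment for segment in urlsplit(url).path.split("/") if segment]
    if LINUX_MARKER in segments:
        index = segments.index(LINUX_MARKER)
        if index + 1 < len(segments):
            return segments[index + 1]
    return None
```

With a hypothetical URL such as https://rspm.example.com/cran/__linux__/bionic/latest, the function returns bionic.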

r_version

For package binary requests, indicates the R version requested. The R version is indicated in the User-Agent header value.

Type: string

Appears: binary requests only
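
As a rough illustration of reading an R version from a User-Agent value, the sketch below assumes the header looks like "R (3.5.3 x86_64-pc-linux-gnu x86_64 linux-gnu)". That format and the r_version_from_user_agent helper are assumptions for this example; the server's actual parsing rules may differ.

```python
import re

# Assumed shape: the letter R, a space, then "(<major>.<minor>.<patch> ...".
R_VERSION_PATTERN = re.compile(r"\bR \((\d+\.\d+\.\d+)\s")


def r_version_from_user_agent(user_agent):
    """Return the R version found in user_agent, or None when there is none."""
    match = R_VERSION_PATTERN.search(user_agent or "")
    return match.group(1) if match else None
```

A value such as python-requests/2.9.1 yields None under this sketch, because it carries no R version.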

match_type

For package binary requests, indicates one of the following match types.

  • exact - The User-Agent R version matches the distro's default R version exactly.
  • best - A best-matching R version was calculated based on the User-Agent header value.
  • forced - The User-Agent R version's minor version matches the minor version of the distro's R version, and the distro was configured to force using its R version.

Type: string

Appears: binary requests only

failed_service

The presence of this property indicates a service failure. If the log line includes a message property of ok, then a failed_service property indicates that a package binary request was unsuccessful, but the service was able to successfully fall back to serving the package source. However, if the log line's message property includes an error message, then the request failed.

Service failure values include:

  • service_error_ash - Failure while calculating the unique signature (ASH) for a package binary.
  • service_error_fetch - Failure while fetching package binary or source files from storage.

Type: string

Appears: failed and source fallback requests (service failure related)

failed_ua

For package binary requests, the presence of this property indicates a User-Agent failure. User-Agent failures include:

  • ua_error_none - No User-Agent header was found.
  • ua_error_unsupported_os - The User-Agent header value indicates an OS that is not supported.
  • ua_error_no_r - The User-Agent header value does not indicate the R version.
  • ua_error_distro_mismatch - The OS indicated by the User-Agent header value does not match the OS for the distro.

Type: string

Appears: User-Agent-related source fallback requests

user_agent

For User-Agent failures, the User-Agent header value is included for reference.

Type: string

Appears: User-Agent-related source fallback requests

failed_match

For package binary requests, the presence of this property indicates a failure related to calculating a best-matching R version. Values include:

  • match_error_no_best_match - No matching R version was found.
  • match_error_force_mismatch - The R version is being forced by the distro, but the distro's minor version does not match the minor version indicated by the User-Agent header.

Type: string

Appears: Match-related source fallback requests

error

For service failures that result in falling back to serving source, an error property may be included to indicate an error message associated with the service failure.

Type: string

Appears: source fallback requests (service failure related)

message

For requests ending in failure, the error message is indicated by the message property. All other requests, including requests for binaries that successfully fall back to source due to one of the above failure modes, will return a value of ok.

Type: string

Appears: all logs
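
The rules above for message, failed_service, failed_ua, and failed_match can be combined to label each log entry. This is a minimal sketch; the classify function and the labels it returns ("ok", "source_fallback", "failed") are hypothetical names chosen for this example, not values written by RStudio Package Manager.

```python
FAILURE_KEYS = ("failed_service", "failed_ua", "failed_match")


def classify(entry):
    """Label one parsed service log entry (a dict) by its outcome."""
    if entry.get("message") != "ok":
        # Any message other than ok carries the error for a failed request.
        return "failed"
    if any(key in entry for key in FAILURE_KEYS):
        # The request succeeded, but only by falling back to the package source.
        return "source_fallback"
    return "ok"
```

Counting the labels over a day of entries gives a quick view of how often binary requests fall back to source.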

Service Log Examples

Below are examples of typical service logs.

  • Serving a current package binary where the R version indicated by the User-Agent header is an exact match for the distro identifier's default R version.

    {
        "time": 1563907491,
        "name": "plumber",
        "version": "0.4.6",
        "class": "current",
        "type": "binary_current",
        "distro": "bionic",
        "r_version": "3.5.2",
        "match_type": "exact",
        "message": "ok"
    }
    
  • Serving a current package binary where the R version indicated by the User-Agent header is a "best" match for the distro identifier's default R version.

    {
        "time": 1563907529,
        "name": "plumber",
        "version": "0.4.6",
        "class": "current",
        "type": "binary_current",
        "distro": "bionic",
        "r_version": "3.5.3",
        "match_type": "best",
        "message": "ok"
    }
    
  • A current package binary was requested, but was not available to download. RStudio Package Manager automatically fell back to serving the package source, which succeeded.

    {
        "time": 1563907544,
        "name": "plumber",
        "version": "0.4.6",
        "class": "current",
        "type": "binary_current",
        "distro": "bionic",
        "r_version": "3.6.3",
        "match_type": "best",
        "error": "DownloadBinaryEtagRunner: file not found: https://rspm-sync.rstudio.com/bin/3.6-bionic/e3f8ab6d0bd9f83cb787b4f7472d60d98f247f64ea2c8a32aff68be6abbde5cf.tar.gz",
        "failed_service": "service_error_fetch",
        "message": "ok"
    }
    
  • A current package binary was requested, but no best-matching binary was available for R 3.7.3. RStudio Package Manager automatically fell back to serving the package source, which succeeded.

    {
        "time": 1563907562,
        "name": "plumber",
        "version": "0.4.6",
        "class": "current",
        "type": "binary_current",
        "distro": "bionic",
        "r_version": "3.7.3",
        "failed_match": "match_error_no_best_match",
        "message": "ok"
    }
    
  • A current package binary was requested, but the User-Agent header did not indicate an R version. RStudio Package Manager automatically fell back to serving the package source, which succeeded. Note that the user_agent property is included in this case to aid in debugging.

    {
        "time": 1563907598,
        "name": "plumber",
        "version": "0.4.6",
        "class": "current",
        "type": "binary_current",
        "distro": "bionic",
        "failed_ua": "ua_error_no_r",
        "user_agent": "Q (3.5.3 x86_64-pc-linux-gnu x86_64 linux-gnu)",
        "message": "ok"
    }
    
  • A current package binary was requested, but the User-Agent header indicated an unsupported OS. RStudio Package Manager automatically fell back to serving the package source, which succeeded. Note that the user_agent property is included in this case to aid in debugging.

    {
        "time": 1564757046,
        "name": "plumber",
        "version": "0.4.6",
        "class": "current",
        "type": "binary_current",
        "distro": "xenial",
        "failed_ua": "ua_error_unsupported_os",
        "user_agent": "python-requests/2.9.1",
        "message": "ok"
    }
    
  • Serving a current package source.

    {
        "time": 1563907590,
        "name": "plumber",
        "version": "0.4.6",
        "class": "current",
        "type": "source_current",
        "message": "ok"
    }
    

Access Logs

The RStudio Package Manager HTTP access logs are located at /var/log/rstudio-pm.access.log. This file is owned by rstudio-pm with permissions 0600. Log files are stored in Apache Combined Log Format. See http://httpd.apache.org/docs/2.2/logs.html#combined for a description of this format.
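
A line in Combined Log Format can be split into its fields with a regular expression. The sketch below is a simplified parser written for illustration; it does not handle escaped quotes inside quoted fields, and parse_access_line is a hypothetical helper.

```python
import re

COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = COMBINED_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # The size field is "-" when no response body was sent.
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields
```

The user_agent field recovered this way can be compared with the user_agent property in the service log when debugging User-Agent failures.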

If logrotate is available when RStudio Package Manager is installed, a logrotate configuration will be installed. The default configuration rotates the log file daily. The old log file will be compressed and stored alongside the original log file with a .1.gz extension (then .2.gz, etc.). Logs will be maintained for 30 days.

Variable Data

RStudio Package Manager manages R packages and repositories. All package source bundles are stored in the server's data directory. RStudio Package Manager handles incoming requests for packages across repositories. Only a single copy of each package source is stored, even if the package is referenced in multiple repositories.

The RStudio Package Manager data directory also contains information used by the server to manage repositories, including the RStudio Package Manager SQLite databases and encryption key if SQLite is used.

The default location for the RStudio Package Manager data directory is /var/lib/rstudio-pm. This can be customized by specifying an alternate DataDir in the Server section of your configuration file.

; /etc/rstudio-pm/rstudio-pm.gcfg

[Server]
DataDir = /mnt/rstudio-pm

If you customize the RStudio Package Manager data directory, make sure that the rstudio-pm user has permission to read, write, and create directories in the data directory.

The RStudio Package Manager SQLite databases must exist on local storage. If DataDir is not on local storage (for example, a networked location mounted over NFS), set the Dir setting in the SQLite section of your server configuration file to a directory on local storage.

; /etc/rstudio-pm/rstudio-pm.gcfg

[Server]
DataDir = /mnt/rstudio-pm

[SQLite]
Dir = /var/lib/rstudio-pm/db

Permissions

/var/lib/rstudio-pm is owned by rstudio-pm with permissions 0700.

Variable Data Classes

All variable data storage locations default to subdirectories of the Server.DataDir setting. There are six classes of variable data, listed below:

  • Cache - Stores data to increase performance for computationally intensive operations. Certain operations, such as Git package building, also temporarily cache data here. Defaults to <DataDir>/cache.
  • Launcher - Stores data for Job Launcher operations. This location currently stores the stdout and stderr data associated with each Git package builder operation. Defaults to <DataDir>/launcher.
  • Metrics - This directory contains aggregated metrics data to improve Usage Stats performance. Defaults to <DataDir>/metrics.
  • Packages - Package tarballs and README files are stored here. Defaults to <DataDir>/packages. This includes:
      • Package tarballs and README files for local packages.
      • Package tarballs and README files for Git packages.
  • CRAN - Package tarballs and README files for CRAN are stored here. Defaults to <DataDir>/cran. This includes:
      • Package tarballs for CRAN packages that have been downloaded.
      • README files for CRAN packages.
  • Binaries - Pre-compiled R package binaries are stored here. Defaults to <DataDir>/binaries.
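
The default location of each storage class follows directly from the data directory. As a small sketch of that mapping (the default_locations helper is hypothetical and only mirrors the defaults listed above):

```python
from pathlib import PurePosixPath

STORAGE_CLASSES = ("cache", "launcher", "metrics", "packages", "cran", "binaries")


def default_locations(data_dir="/var/lib/rstudio-pm"):
    """Map each storage class to its default directory under data_dir."""
    base = PurePosixPath(data_dir)
    return {name: str(base / name) for name in STORAGE_CLASSES}
```

With DataDir set to /mnt/rstudio-pm, for example, the cran class defaults to /mnt/rstudio-pm/cran.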

You can customize the storage directory for each storage class. For example:

; /etc/rstudio-pm/rstudio-pm.gcfg

[FileStorage "cache"]
Location = /mnt/rstudio-pm-cache

[FileStorage "launcher"]
Location = /mnt/rstudio-pm-launcher

[FileStorage "metrics"]
Location = /mnt/rstudio-pm-metrics

[FileStorage "packages"]
Location = /mnt/rstudio-pm-packages

[FileStorage "cran"]
Location = /mnt/rstudio-pm-cran

[FileStorage "binaries"]
Location = /mnt/rstudio-pm-binaries

Again, if you customize any of the RStudio Package Manager storage directories, make sure that the rstudio-pm user has permission to read, write, and create directories in each data directory.

Destinations

The six variable storage classes (see the Variable Data Classes section above) default to storing data on disk. Each storage class can optionally be configured to store data on S3. For example, to configure all six variable data storage classes for S3, use the following configuration:

; /etc/rstudio-pm/rstudio-pm.gcfg

[Storage]
Cache = s3
Launcher = s3
Metrics = s3
Packages = s3
CRAN = s3
Binaries = s3

; Default S3 settings. This is the minimum-required setting for using S3.
[S3Storage]
Bucket = your-s3-bucket

; Override default S3 settings for the "packages" class. This demonstrates
; all the available S3 configuration settings.
[S3Storage "packages"]
Bucket = another-s3-bucket
Prefix = rspm-packages
Profile = dev-rspm
Region = us-west-1
EnableSharedConfig = true

RStudio Package Manager's AWS S3 support utilizes the AWS S3 SDK, which documents configuration and credential standards for interacting with S3 services.

See the S3 Configuration chapter for information on configuring your system to use AWS S3.

Server Migrations

You may need to migrate your RStudio Package Manager installation:

  • When moving from one environment to another (e.g., physical to virtual or on-prem to cloud)
  • To facilitate HA setup with more than one node

Several factors must be considered before migrating your RStudio Package Manager installation from one server to another. We recommend that you don't make any configuration changes (or as few as possible) during the initial migration. If, for instance, you will be migrating to a new server and upgrading to a new default version of R, complete the migration first. Then upgrade R in subsequent steps.

Before you migrate the server, you need to perform a backup to obtain a consistent copy of the data in the necessary directories. These directories can then be copied to the new server.

  1. Install RStudio Package Manager on the new server, then stop the service.
  2. Mirror the Unix accounts used by RStudio Package Manager on the existing server to the new server. See the Account for Processes section. If you are using the default rstudio-pm account and group, then you will only need to consider the user account that needs permission to use the CLI.
  3. Copy the config and data directories while preserving the permissions and file ownership. Not all file transfer clients can preserve these attributes, so consider using rsync with the -a flag to copy the data.
  4. Update your /etc/rstudio-pm/rstudio-pm.gcfg file if you've changed settings like the path to your data directory.
  5. Install the same version(s) of R on the new server to mimic existing behavior. If you must install a different version of R, RStudio Package Manager will still function correctly, but certain functions (like building Git packages) may be affected.
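
The copy in step 3 can be scripted. The sketch below only builds the rsync commands so that you can review them before running anything. It assumes the default config and data directories from this chapter, and the host name new-server.example.com is a placeholder. Note that rsync generally needs to run with sufficient privileges on the receiving side to preserve file ownership.

```python
import shlex

# Default paths from this chapter; adjust if DataDir was customized.
CONFIG_DIR = "/etc/rstudio-pm"
DATA_DIR = "/var/lib/rstudio-pm"


def rsync_commands(new_host, directories=(CONFIG_DIR, DATA_DIR)):
    """Build one `rsync -a` command per directory, targeting the same path on new_host."""
    commands = []
    for directory in directories:
        # A trailing slash tells rsync to copy the directory's contents
        # into the destination directory instead of nesting it.
        source = directory.rstrip("/") + "/"
        commands.append(["rsync", "-a", source, f"{new_host}:{source}"])
    return commands


if __name__ == "__main__":
    # Print the commands for review rather than executing them.
    for command in rsync_commands("new-server.example.com"):
        print(shlex.join(command))
```

If any storage class uses a custom Location outside the data directory, pass those directories as well so they are included in the copy.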

If you are also migrating to a different database provider, see the section on Changing the Database Provider.