16 Historical Information

This section describes the configuration and management of historical information, of which there are two types. Metrics about resource usage over time (trends) are enabled with the Metrics.Enabled setting. Discrete usage events are enabled with the Metrics.Instrumentation setting. Both settings default to true; to disable either one, set it to false in your configuration file.
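To check which kinds of historical information a server will collect, you can read the configuration file and apply the defaults described above. The following Python sketch is hypothetical: the function name is illustrative, and it assumes your configuration file uses only simple [Section] and Key = value lines that Python's configparser can read (configparser does not understand every gcfg feature).

```python
import configparser


def historical_settings(config_text: str) -> dict:
    """Return the effective Metrics.Enabled and Metrics.Instrumentation values.

    Hypothetical helper: both settings default to true when they are absent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback=True applies the documented default when the section or key is missing.
    return {
        "Enabled": parser.getboolean("Metrics", "Enabled", fallback=True),
        "Instrumentation": parser.getboolean("Metrics", "Instrumentation", fallback=True),
    }


example = """
[Metrics]
Enabled = false
"""
print(historical_settings(example))
```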

16.1 Historical Metrics

RStudio Connect uses a separate rserver-monitor process to record resource usage (CPU, memory, etc.) over time. This process runs only when historical metrics are enabled, and the settings described in the rest of this section have no effect when Metrics.Enabled is false.

16.1.1 Historical Metrics Settings

By default, metrics data is written to a set of RRD (round-robin database) files stored in /var/lib/rstudio-connect/metrics. You can specify an alternate data path with the DataPath setting described in Section A.22.

By default, the rserver-monitor process runs as the same user account that Connect uses to run deployed content. By default, that account is rstudio-connect (see the RunAs setting in Section A.17). To run rserver-monitor as a different account, change the User setting; see Section A.22 for details.
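The following sketch shows how the effective data path and account for rserver-monitor fall back to their defaults. It assumes the settings are named Metrics.DataPath, Metrics.User, and Applications.RunAs (check the exact names in Sections A.17 and A.22), and the function name is hypothetical.

```python
import configparser
from typing import Tuple

# Defaults taken from the text above.
DEFAULT_DATA_PATH = "/var/lib/rstudio-connect/metrics"
DEFAULT_RUN_AS = "rstudio-connect"


def monitor_settings(config_text: str) -> Tuple[str, str]:
    """Return the (data_path, user) that rserver-monitor would use (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    data_path = parser.get("Metrics", "DataPath", fallback=DEFAULT_DATA_PATH)
    # Without Metrics.User, the monitor runs as the account used for deployed content.
    run_as = parser.get("Applications", "RunAs", fallback=DEFAULT_RUN_AS)
    user = parser.get("Metrics", "User", fallback=run_as)
    return data_path, user
```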

RStudio Connect can also log metrics to Graphite, and its default logging to RRD can be disabled. See Section A.22 for all options for configuring historical metrics in Connect.

16.1.2 Historical Metrics Process Management

Connect automatically spawns a process (rserver-monitor) to help maintain historical data. If this process exits, Connect restarts it so that as much historical information as possible is recorded. If Connect observes rapid, repeated failures, it delays restarting rserver-monitor.
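This guide does not specify the exact delay Connect applies. As an illustration only, the sketch below models one common policy, exponential backoff: an isolated exit is restarted immediately, while each exit that follows the previous one within a short window doubles the wait, up to a cap. The class name and its parameters are hypothetical.

```python
class RestartBackoff:
    """Hypothetical restart policy: no delay after an isolated exit,
    doubling delays while exits keep arriving in quick succession."""

    def __init__(self, window: float = 60.0, base: float = 1.0, cap: float = 300.0):
        self.window = window  # seconds; exits closer together than this count as "rapid"
        self.base = base      # delay after the first rapid failure
        self.cap = cap        # upper bound on any delay
        self.last_exit = None
        self.rapid = 0        # number of consecutive rapid failures

    def next_delay(self, now: float) -> float:
        """Record an exit at time `now` (seconds) and return the wait before restarting."""
        if self.last_exit is not None and now - self.last_exit < self.window:
            self.rapid += 1
        else:
            self.rapid = 0
        self.last_exit = now
        if self.rapid == 0:
            return 0.0
        return min(self.cap, self.base * 2 ** (self.rapid - 1))


policy = RestartBackoff()
for t in (0, 10, 20, 200):
    print(t, policy.next_delay(t))
```

Resetting the counter once exits are spaced farther apart than the window means a process that fails only occasionally is always restarted promptly.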

Because rserver-monitor needs permission to write to the metrics data directory, Connect attempts to ensure the necessary permissions at startup: it grants ownership of the metrics data directory to the user account that will be used to start rserver-monitor.
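If rserver-monitor cannot write its data, one quick check is whether the metrics data directory is owned by the account the monitor runs as. A minimal Python sketch (Unix only; the helper name is hypothetical):

```python
import os
import pwd


def owned_by(path: str, username: str) -> bool:
    """Return True if `path` is owned by `username` (Unix only)."""
    uid = os.stat(path).st_uid
    try:
        owner = pwd.getpwuid(uid).pw_name
    except KeyError:
        # The owning UID has no passwd entry, so it cannot match a named account.
        return False
    return owner == username
```

For example, on a default installation you would expect owned_by("/var/lib/rstudio-connect/metrics", "rstudio-connect") to return True once Connect has started.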

16.1.3 Historical Metrics Process Logging

The rserver-monitor process logs its output to syslog. If the process is unable to run, check the system log (e.g., /var/log/messages or /var/log/syslog, depending on your distribution) for related messages.
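To pull recent monitor messages out of a large system log, a short filter is often enough. This sketch assumes the syslog entries contain the string rserver-monitor (typically as the program name); adjust the tag if your entries differ. The function name is hypothetical.

```python
from collections import deque


def recent_monitor_messages(lines, tag="rserver-monitor", limit=20):
    """Return the last `limit` lines that contain `tag`, without trailing newlines."""
    recent = deque(maxlen=limit)  # keeps only the newest `limit` matches
    for line in lines:
        if tag in line:
            recent.append(line.rstrip("\n"))
    return list(recent)
```

Because the function accepts any iterable of lines, you can pass an open file directly, for example a file object from open("/var/log/syslog", errors="replace"). Reading the system log usually requires root privileges.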

16.2 Historical Events

RStudio Connect can record event-style usage information intended to answer questions like, “Who used my Shiny app and for how long?” This information is stored in dedicated tables in the database. With SQLite, this is handled automatically: Connect creates a second database file whose name is the value of SQLite.Name with -instrumentation appended. With PostgreSQL, you can provide a second, full database URL in the Postgres.InstrumentationURL setting; if it is not specified, it defaults to the value of Postgres.URL. This lets you store the event data in the same place as the rest of the Connect information, in a different schema, or in a different database altogether, whichever best meets your needs. See Section 9.2 for more details about using PostgreSQL.
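The rules for locating the event data can be summarized as a small function. This is a hypothetical helper for illustration: it takes settings keyed by their Section.Key names, and for SQLite it returns only the database name, since the file extension and directory are not covered here.

```python
def instrumentation_store(provider: str, settings: dict) -> str:
    """Return where event data is stored, following the rules described above."""
    if provider == "sqlite":
        # A second database named after SQLite.Name with "-instrumentation" appended.
        return settings["SQLite.Name"] + "-instrumentation"
    if provider == "postgres":
        # Use the dedicated URL if set; otherwise fall back to the main Postgres.URL.
        return settings.get("Postgres.InstrumentationURL") or settings["Postgres.URL"]
    raise ValueError(f"unknown database provider: {provider}")
```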

Note: There is currently no data retention policy, so all event data is kept indefinitely. Data retention controls will be added in a future release.

Note: This data is not migrated by the migrate tool (see Section 9.3).

16.2.1 Shiny Application Events

When a user opens a Shiny application, an event containing their user information and the length of their session will be logged to the instrumentation database.

These events can be retrieved with the “Get Shiny App Usage” API. The API returns information in pages, and each response includes URLs that can be used as-is to request the next or previous page. To retrieve all data, first call the endpoint without next or previous parameters to get the first page of results, then repeatedly follow the “next” link in each response until that link is null.
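The loop below sketches this “follow next until null” pattern. It assumes each response is JSON with a results list and a paging object whose next field holds the next page URL or null, and that requests authenticate with an Authorization: Key header; confirm these details against the API documentation. The function names are hypothetical, and the HTTP call is passed in as a function so the paging logic can be exercised without a server.

```python
import json
import urllib.request
from typing import Callable, Iterator


def iter_usage(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every record by following each page's "next" link until it is null."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        # JSON null becomes None, which ends the loop.
        url = page.get("paging", {}).get("next")


def http_fetch(api_key: str) -> Callable[[str], dict]:
    """Build a fetch function that sends an API key (header format is an assumption)."""
    def fetch(url: str) -> dict:
        request = urllib.request.Request(url, headers={"Authorization": f"Key {api_key}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch
```

Usage might look like: for record in iter_usage(first_url, http_fetch(api_key)), where first_url is the endpoint URL given in the API documentation, with no next or previous parameters.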

Only administrators and publishers can use the API. Publishers can retrieve information only about the Shiny applications they own.

Optional filters in the request limit which usage records each response returns. Filters are “ANDed” together: every returned record satisfies all of the supplied filters.

Application GUIDs can be provided to limit responses to particular applications. Publishers are implicitly limited to the applications they own. If a publisher requests information about content owned by someone else, the response omits data for that application, and no error is reported.

Timestamps can be provided to narrow usage information to a time window of interest. The from and to filters, used independently or together, limit the results to applications that were being accessed within that window. Note that an app's usage is included if any portion of a user's session with it falls within the window, even if the session began before from or ended after to.
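The filter rules above can be modeled as a predicate over a single usage record. This is a client-side illustration of the semantics, not the server's code: the record fields content_guid, started, and ended are hypothetical names, window bounds are treated as inclusive, and a session with no end time is treated as still in progress.

```python
from datetime import datetime
from typing import Iterable, Optional


def overlaps(started: datetime, ended: Optional[datetime],
             start: Optional[datetime], end: Optional[datetime]) -> bool:
    """True if any part of the session falls in the window; a missing bound is open."""
    if start is not None and ended is not None and ended < start:
        return False  # session finished before the window opened
    if end is not None and started > end:
        return False  # session began after the window closed
    return True


def matches(record: dict, guids: Optional[Iterable[str]] = None,
            start: Optional[datetime] = None, end: Optional[datetime] = None) -> bool:
    """A record is returned only if every supplied filter holds (filters are ANDed)."""
    if guids is not None and record["content_guid"] not in set(guids):
        return False
    return overlaps(record["started"], record["ended"], start, end)
```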

More details for using the API may be found in the “Instrumentation” section of the API documentation for Connect.

16.2.2 RMD and Static Report Events

When a user visits an RMD or report page (such as a plot), an event containing their user information and information about the content visited will be logged to the instrumentation database.

This event information is not currently presented in the dashboard or via an API.

16.2.4 Server Node Session Events

Note: Node session events may be erroneous if you have multiple nodes with the same hostname and do not reconfigure your node name, as explained below.

When a node starts, an event is logged to the database containing the node name, the server start time, and a periodically updated heartbeat timestamp that indicates how long the node's session has been running. A node that exits cleanly writes true to the exited_cleanly column for its session.

If exited_cleanly is false, one of the following applies:

The node is still running. In this case, the heartbeat will continue to update.
The node’s rstudio-connect process was terminated with SIGKILL, or the system lost power while the process was running. Confirm this by cross-referencing the node’s log file with the row for the session at issue. The log for that session will end abruptly in this case.
The node was terminated with SIGTERM or SIGINT, but did not successfully write to the database before being terminated with SIGKILL or before the system lost power. Confirm this by cross-referencing the node’s log file, looking specifically for the line beginning with Caught SIGINT/SIGTERM. The log for that session will end abruptly after that line in this case.
The node was terminated with SIGTERM or SIGINT, but couldn’t write to the database for some other reason. Confirm this by cross-referencing the node’s log file, looking specifically for Error storing server exit time. The log for that session will contain that line in this case.
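When reviewing sessions with exited_cleanly set to false, the checks above can be applied in order. The sketch below is a hypothetical triage helper: it assumes you have already determined whether the session's heartbeat is still being updated and have extracted that session's log text, and it matches only the two messages quoted in this guide.

```python
def explain_unclean_exit(heartbeat_current: bool, session_log: str) -> str:
    """Map a session with exited_cleanly = false to the most likely documented cause."""
    if heartbeat_current:
        return "still running"
    if "Error storing server exit time" in session_log:
        # Shutdown began, but the database write failed for some other reason.
        return "signal caught, but the exit time could not be written"
    if "Caught SIGINT/SIGTERM" in session_log:
        # Shutdown began, and the log ends before the exit time was stored.
        return "signal caught, then SIGKILL or power loss before the write"
    # No shutdown message at all: the log simply stops.
    return "SIGKILL or power loss with no shutdown attempt"
```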

The node name defaults to the node’s hostname, but can be changed using the Server.NodeName configuration setting or the RSTUDIO_CONNECT_NODE_NAME environment variable. The node name MUST be unique for every node in your cluster. RStudio Connect cannot detect duplicate node names at this time, including the situation where multiple nodes have the same hostname.
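Because Connect cannot detect duplicate node names, it can be worth checking them yourself. Assuming you can collect each node's effective name (its Server.NodeName, RSTUDIO_CONNECT_NODE_NAME, or hostname) through your own cluster tooling, this hypothetical helper reports names that appear more than once. It compares names exactly, including case.

```python
from collections import Counter


def duplicate_node_names(names):
    """Return, sorted, every node name used by more than one node."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```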

The heartbeat timestamp is updated every 30m by default; to change this interval, set Metrics.InstrumentationServerHeartbeat to another duration.
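To decide whether a session's heartbeat is still being updated, compare its latest heartbeat with the configured interval. In this sketch the 30-minute default comes from the text above, while the tolerance of two intervals is an arbitrary, hypothetical margin that allows for a late update. Both datetimes must be either timezone-aware or naive.

```python
from datetime import datetime, timedelta


def heartbeat_is_current(last_heartbeat: datetime, now: datetime,
                         interval: timedelta = timedelta(minutes=30),
                         tolerance: float = 2.0) -> bool:
    """True if the heartbeat was updated within `tolerance` intervals before `now`."""
    return now - last_heartbeat <= interval * tolerance
```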