Historical Information
This section describes the configuration and management of historical information, of which there are two types. Metrics about resource usage over time (trends) are enabled with the `Metrics.Enabled` setting. More discrete usage events are enabled with the `Metrics.Instrumentation` setting. Both settings are `true` by default; to disable either one, set it to `false` in your configuration file.
Historical Metrics
RStudio Connect uses a separate `rserver-monitor` process to record resource (CPU, memory, etc.) usage over time. This process is only active when historical metrics are enabled. The customization settings described in the remainder of this section have no effect when `Metrics.Enabled` is `false`.
Historical Metrics Settings
By default, metrics data is written to a set of RRD files stored at `/var/lib/rstudio-connect/metrics`. You can specify an alternate data path with the `Metrics.DataPath` setting.
By default, the `rserver-monitor` process runs as the same user account that Connect uses to run the processes associated with deployed content. This account is `rstudio-connect` by default (see the `Applications.RunAs` setting). You can specify an alternate user account for the `rserver-monitor` process with the `Metrics.User` setting.
RStudio Connect can also write metrics to Graphite, and its default behavior of writing metrics to RRD files can be disabled. See the Metrics configuration appendix for more options for configuring historical metrics in Connect.
Historical Metrics Process Management
Connect automatically spawns a process (`rserver-monitor`) to help maintain historical data. If this process exits, Connect will restart it in an attempt to record as much historical information as possible. Connect will delay restarting `rserver-monitor` if it observes rapid, repeated failures.
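Connect's exact restart policy is not documented here. The following is a generic sketch of the technique described above (restart immediately, but back off when failures come in quick succession). The class name, thresholds, and exponential backoff are hypothetical choices for illustration, not Connect's implementation.

```python
from collections import deque


class RestartThrottle:
    """Hypothetical restart policy: restart immediately unless the process has
    failed `max_failures` times within `window` seconds; after that, wait with
    exponential backoff capped at `max_delay` seconds."""

    def __init__(self, max_failures=3, window=60.0, base_delay=5.0, max_delay=300.0):
        self.max_failures = max_failures
        self.window = window
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.failures = deque()  # timestamps (seconds) of recent exits
        self.backoff_level = 0

    def record_failure(self, now):
        """Record an exit at time `now` and return the delay before restarting."""
        self.failures.append(now)
        # Forget exits that fall outside the sliding window.
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) < self.max_failures:
            self.backoff_level = 0  # failures are not "rapid": restart right away
            return 0.0
        delay = min(self.base_delay * (2 ** self.backoff_level), self.max_delay)
        self.backoff_level += 1
        return delay
```

Resetting the backoff once failures spread out lets a process that crashes only occasionally come back immediately, while a crash loop is slowed down.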
Because `rserver-monitor` needs permission to write to the metrics data directory, Connect attempts to ensure the necessary permissions at startup: it grants ownership of the metrics data directory to the user account that will be used to start `rserver-monitor`.
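If `rserver-monitor` cannot write its data, one thing to check is who owns the metrics directory. The sketch below assumes the default path and the default `rstudio-connect` account. If you changed `Metrics.DataPath`, `Metrics.User`, or `Applications.RunAs`, pass the values you configured instead. The helper names are hypothetical.

```python
import os
import pwd


def owner_of(path):
    """Return the name of the account that owns `path` (or the numeric uid
    as a string if that uid has no passwd entry)."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def metrics_dir_ok(expected_user="rstudio-connect",
                   path="/var/lib/rstudio-connect/metrics"):
    """True if the metrics directory exists and is owned by `expected_user`."""
    return os.path.isdir(path) and owner_of(path) == expected_user
```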
Historical Metrics Process Logging
The `rserver-monitor` process logs its output to syslog. If the process is unable to run, check the system log (e.g., `/var/log/messages` or `/var/log/syslog`) for messages.
Historical Events
RStudio Connect can record event-style usage information intended to answer questions like "Who used my Shiny app, and for how long?" This information is stored in dedicated tables in the database. When using SQLite, this is handled automatically by creating a second database file named `connect-instrumentation`. For PostgreSQL, a second, full database URL can be provided in the `Postgres.InstrumentationURL` setting. If it is not specified, it defaults to the value of `Postgres.URL`. This allows you to store the event data in the same place as the rest of the Connect information, in a different schema, or even in a different database, whichever best meets your needs. See the PostgreSQL section for more details about using Postgres.
Note
There is currently no data retention policy, so all data is kept indefinitely. Data retention controls will be added in a future release.
Note
This data is not migrated by the `migrate` tool (see Changing Database Provider).
Shiny Application Events
When a user opens a Shiny application, an event containing their user information and the length of their session is logged to the instrumentation database. Note that some configuration settings affect how the end time of a session is recorded:
- The ended time always includes the duration configured in the `Client.ReconnectTimeout` property, which is 15 seconds by default.
- The `Scheduler.ConnectionTimeout` and `Scheduler.ReadTimeout` values control when RStudio Connect terminates an idle session. As a side effect, this also sets the ended time for the session.
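Because every ended time includes the reconnect timeout, a session-length analysis may want to subtract it. The sketch below assumes ISO 8601 timestamps and the default 15-second `Client.ReconnectTimeout`. The helper names are hypothetical. It does not correct for sessions that were ended by the idle timeouts, whose ended time also includes idle time.

```python
from datetime import datetime, timedelta

# Default value of Client.ReconnectTimeout; use your configured value if different.
DEFAULT_RECONNECT_TIMEOUT = timedelta(seconds=15)


def parse_ts(value):
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def estimated_active_seconds(started, ended, reconnect_timeout=DEFAULT_RECONNECT_TIMEOUT):
    """Session length minus the reconnect timeout, never below zero."""
    span = parse_ts(ended) - parse_ts(started)
    return max(span - reconnect_timeout, timedelta(0)).total_seconds()
```

For example, a session recorded from `00:00:00` to `00:01:15` is estimated at 60 seconds of use.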
These events can be retrieved with the "Get Shiny Application Usage" API. The API returns information in pages, and each response provides URLs that can be used as-is to request the next or previous page of information. To retrieve all data, first invoke the endpoint without `next` or `previous` parameters to return the first page of results, then repeatedly follow the `next` link in each response until that link becomes `null`.
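The paging procedure above can be sketched as a loop that follows `next` links. This assumes each response is JSON with records under `results` and the next-page URL under `paging.next`, and that requests authenticate with an `Authorization: Key <api key>` header. Check the Connect Server API Reference for the exact endpoint and response shape. The fetch function is passed in so the paging logic stays independent of HTTP details.

```python
import json
import urllib.request


def make_fetch(api_key):
    """Return a fetch(url) -> dict callable that authenticates with an API key."""
    def fetch(url):
        req = urllib.request.Request(url, headers={"Authorization": f"Key {api_key}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def iter_records(first_url, fetch):
    """Yield every record, following the `next` link until it is null."""
    url = first_url
    while url:  # a JSON null arrives as None and ends the loop
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("paging", {}).get("next")
```

Usage might look like `list(iter_records(first_page_url, make_fetch(api_key)))`, where `first_page_url` is the usage endpoint on your server with no `next` or `previous` parameter.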
The API may only be used by administrators and publishers. Additionally, publishers may only retrieve information about the Shiny applications they own.
Optional filters in the request can limit which usage records are returned with each response. Filters are "ANDed" together (i.e., the data returned satisfies all filters).
Application GUIDs can be provided to limit responses to particular applications. A publisher is implicitly limited to applications they own. If a publisher requests information about content owned by someone else, the result will not contain data for that application, and this is not reported as an error.
Timestamps can be provided to limit usage information to a narrower time window of interest. When the `from` and `to` filters are used, either independently or together, the information returned is limited to applications that were being accessed within that window of time. Information for an app is included if any portion of a user's session falls within the specified window.
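The "any portion of the session" rule is an interval-overlap test. The sketch below mirrors that rule client-side, for example to re-filter results you have already downloaded. It assumes both window bounds are inclusive, which the text above does not specify. The function name is hypothetical.

```python
from datetime import datetime


def session_in_window(started, ended, window_from=None, window_to=None):
    """True if any part of [started, ended] falls inside [window_from, window_to].
    A bound of None leaves that side of the window open, like omitting `from` or `to`."""
    if window_from is not None and ended < window_from:
        return False  # the session finished before the window opened
    if window_to is not None and started > window_to:
        return False  # the session began after the window closed
    return True
```

A session from 10:00 to 11:00 therefore matches a window from 10:30 to 12:00, and also a window from 10:15 to 10:45 that lies entirely inside it.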
The data returned for a Shiny application session includes a version number. As the Connect software has evolved, issues have been identified with how data is recorded. The version number indicates any known issues, which are described below.
Version | Issue |
---|---|
`0` | Extraneous records were recorded under some conditions, notably when protocols other than websockets were used. These can be identified by the same value for `started` and `ended`. This may adversely affect analyses involving counts or session lengths. |
`1` | No known issues. |
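If you retrieve version `0` data, the identifying trait in the table above can be used to set aside likely extraneous records before counting sessions. This sketch assumes each record is a dict with `data_version`, `started`, and `ended` fields. The helper name is hypothetical.

```python
def split_suspect_records(records):
    """Separate version-0 records whose started equals ended (likely extraneous)."""
    kept, suspect = [], []
    for rec in records:
        if rec.get("data_version") == 0 and rec.get("started") == rec.get("ended"):
            suspect.append(rec)
        else:
            kept.append(rec)
    return kept, suspect
```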
The `min_data_version` filter controls which data is returned. The default minimum data version is `1`.
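A first-page request combining these filters can be built with a standard query string. The filter names `from`, `to`, and `min_data_version` come from this section. The endpoint path and the ISO 8601 timestamp format are assumptions to verify against the API reference. Because `from` is a Python keyword, the filters are passed as a dict rather than as keyword arguments.

```python
from urllib.parse import urlencode

# Assumed path of the Shiny usage endpoint; confirm in the API reference.
SHINY_USAGE_PATH = "/__api__/v1/instrumentation/shiny/usage"


def usage_url(server, filters=None, path=SHINY_USAGE_PATH):
    """Build a first-page URL. Filters whose value is None are omitted."""
    params = {k: v for k, v in (filters or {}).items() if v is not None}
    query = urlencode(params)  # percent-encodes values such as ":" in timestamps
    base = server.rstrip("/") + path
    return f"{base}?{query}" if query else base
```

Passing `{"min_data_version": 0}` requests version `0` data as well, which can then be screened as described in the table above.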
Code examples showing how to access this data are in the User Activity API Cookbook recipes. The Connect Server API Reference documents each of the Instrumentation APIs used by the recipes.
Content Visit Events
When a user visits a document, plot, or application other than a Shiny application, an event will be logged to the instrumentation database noting the time of the visit, the visitor and the content visited.
These events can be retrieved with the "Get Content Visits" API. The API returns information in pages, and each response provides URLs that can be used as-is to request the next or previous page of information. To retrieve all data, first invoke the endpoint without `next` or `previous` parameters to return the first page of results, then repeatedly follow the `next` link in each response until that link becomes `null`.
The API may only be used by administrators and publishers. Additionally, publishers may only retrieve information about the content that they own.
Optional filters in the request can limit which visit records are returned with each response. Filters are "ANDed" together (i.e., the data returned satisfies all filters).
Content GUIDs can be provided to limit responses to particular content. A publisher is implicitly limited to content they own. If a publisher requests information about content owned by someone else, the result will not contain data for that content, and this is not reported as an error.
Timestamps can be provided to limit visit information to a narrower time window of interest. When the `from` and `to` filters are used, either independently or together, the information returned is limited to content that was visited within that window of time.
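Unlike a Shiny session, a visit is a single point in time, so the window test reduces to a range check on the visit time. This client-side sketch assumes inclusive bounds, which the text does not specify. The function name is hypothetical.

```python
from datetime import datetime


def visit_in_window(visited, window_from=None, window_to=None):
    """True if the visit time lies within [window_from, window_to].
    A bound of None leaves that side of the window open."""
    if window_from is not None and visited < window_from:
        return False
    if window_to is not None and visited > window_to:
        return False
    return True
```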
The data returned for a visit to content includes a version number. As the Connect software has evolved, issues have been identified with how data is recorded. The version number indicates any known issues, which are described below.
Version | Issue |
---|---|
`0` | Extraneous records were recorded under some conditions, notably when content is not rendered to a self-contained page _and_ refers to images, CSS, JavaScript, and the like as files external to the page but within the content. |
`1` | No known issues. |
The `min_data_version` filter controls which data is returned. The default minimum data version is `1`.
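A common first analysis is a visit count per piece of content. This sketch assumes each visit record carries `content_guid` and `data_version` fields. It applies the same minimum-version rule client-side, so records downloaded with a lower `min_data_version` can still be excluded. The helper name is hypothetical.

```python
from collections import Counter


def visits_per_content(records, min_data_version=1):
    """Count visits per content GUID, skipping records below min_data_version."""
    return Counter(
        rec["content_guid"]
        for rec in records
        if rec.get("data_version", 0) >= min_data_version
    )
```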
Code examples showing how to access this data are in the User Activity API Cookbook recipes. The Connect Server API Reference documents each of the Instrumentation APIs used by the recipes.
User Login Events
When a user logs in to the Connect dashboard, their user information is logged to the instrumentation database.
This event information is not currently presented in the dashboard or via an API.
Server Node Session Events
Note
Node session events may be erroneous if you have multiple nodes with the same hostname and do not reconfigure your node name, as explained below.
When a node starts, an event is logged to the database containing a node name, the server start time, and a periodically updated heartbeat timestamp that indicates the length of the node's running session. A node that exits cleanly logs `true` to the `exited_cleanly` column for its session.
If `exited_cleanly` is `false`, one of the following applies:
- The node is still running. In this case, the heartbeat will continue to update.
- The node's `rstudio-connect` process was terminated with `SIGKILL`, or the system lost power while the process was running. Confirm this by cross-referencing the node's log file with the row for the session in question. In this case, the log for that session will end abruptly.
- The node was terminated with `SIGTERM` or `SIGINT`, but did not successfully write to the database before being terminated with `SIGKILL` or before the system lost power. Confirm this by cross-referencing the node's log file, looking specifically for the line beginning with `Caught SIGINT/SIGTERM`. In this case, the log for that session will end abruptly after that line.
- The node was terminated with `SIGTERM` or `SIGINT`, but couldn't write to the database for some other reason. Confirm this by cross-referencing the node's log file, looking specifically for `Error storing server exit time`. In this case, the log for that session will contain that line.
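The log checks in the list above can be sketched as a small classifier. It assumes you have already extracted the log lines for the session in question. It matches the two messages anywhere in a line, in case your log lines carry a timestamp prefix. It cannot detect the still-running case, which depends on the heartbeat rather than the log. The function name and returned descriptions are hypothetical.

```python
def diagnose_unclean_exit(log_lines):
    """Suggest a likely cause for a session with exited_cleanly = false,
    based on the log lines recorded for that session."""
    caught = any("Caught SIGINT/SIGTERM" in line for line in log_lines)
    store_error = any("Error storing server exit time" in line for line in log_lines)
    if store_error:
        return "signal received, but the exit time could not be stored"
    if caught:
        return "signal received, then killed or power lost before writing"
    return "SIGKILL or power loss (log ends abruptly)"
```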
The node name defaults to the node's hostname, but can be changed using the `Server.NodeName` configuration setting or the `RSTUDIO_CONNECT_NODE_NAME` environment variable. The node name MUST be unique for every node in your cluster. RStudio Connect cannot currently detect duplicate node names, including the situation where multiple nodes have the same hostname.
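Because Connect does not detect duplicates, you may want to check node names yourself before starting a cluster. This sketch assumes you have gathered each host's effective node name (its `Server.NodeName` or `RSTUDIO_CONNECT_NODE_NAME` value, or its hostname if neither is set) into a dict. The helper name is hypothetical.

```python
from collections import defaultdict


def duplicate_node_names(nodes):
    """Given {host: node_name}, return {node_name: [hosts]} for every name
    used by more than one host."""
    by_name = defaultdict(list)
    for host, name in nodes.items():
        by_name[name].append(host)
    return {name: sorted(hosts) for name, hosts in by_name.items() if len(hosts) > 1}
```

An empty result means every node name in the mapping is unique.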
The heartbeat interval can be changed from its default of `30m` by setting `Metrics.InstrumentationServerHeartbeat` to another duration.
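To tell a still-running node apart from one that stopped uncleanly, you can compare its last heartbeat with the current time. This sketch assumes the setting above is the interval between heartbeat updates, and treats a heartbeat older than two intervals as stale. The grace factor and function name are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def probably_running(last_heartbeat, interval=timedelta(minutes=30), now=None, grace=2):
    """Guess whether a node whose session has exited_cleanly = false is still
    running: its heartbeat should be no older than `grace` intervals."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_heartbeat <= grace * interval
```

If this returns `False`, the log checks described above are the next step.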
This event information is not currently presented in the dashboard or via an API.