Python Package Management

Package Installation

RStudio Connect installs the Python package dependencies of Python-based content when that content is deployed. This includes FastAPI and Flask APIs, Jupyter notebooks, apps built using Dash, Bokeh, or Streamlit, and R projects that include Python.

Package dependencies are captured in one of two ways:

  1. If a requirements.txt file exists in the directory containing the project being deployed, then the contents of that file specify the dependencies. See the pip documentation for details. If you provide a requirements.txt file, you must ensure that the listed dependencies are correct for the content you are deploying.

  2. Otherwise, the pip freeze command is used to produce a full specification of the current Python environment including all installed packages and their version numbers. In a Jupyter notebook, pip freeze will be run with the version of Python being used by the active Jupyter notebook kernel. In the RStudio IDE, the environment variable RETICULATE_PYTHON will be used to determine which Python environment to inspect.
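To get a feel for what the automatic case captures, the following sketch lists the distributions visible to the interpreter that runs it, in the name==version form that pip freeze emits. It is an approximation built on the standard library's importlib.metadata, not a replacement for pip freeze: pip's handling of editable installs, direct URLs, and the packages it skips by default is not reproduced. Like pip freeze, it reflects whichever Python environment executes it, which is why the active kernel or RETICULATE_PYTHON matters.

```python
"""Sketch: approximate `pip freeze` output with the standard library."""
from __future__ import annotations

from importlib import metadata


def freeze_approximation() -> list[str]:
    """Return 'name==version' lines for distributions visible to this interpreter.

    Assumption: only the basic shape of pip freeze output is mimicked; pip's
    handling of editable installs, direct URLs, and packages it skips by
    default is not reproduced.
    """
    pins: dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        key = name.lower() if name else None
        if key and key not in pins:
            # Keep the first distribution found for each case-insensitive name.
            pins[key] = f"{name}=={dist.version}"
    return [pins[key] for key in sorted(pins)]


if __name__ == "__main__":
    # Hypothetical usage: python freeze_sketch.py > requirements.txt
    print("\n".join(freeze_approximation()))
```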

The resulting package list is included in the bundle archive file, which is uploaded to RStudio Connect. RStudio Connect receives the bundle archive file, unpacks it, and uses venv and pip to install the identified package dependencies. In addition to the specified dependencies, RStudio Connect will also install packages that it uses to deploy and render your content.

When package versions are pinned, as they are in pip freeze output, the execution environment created by RStudio Connect contains the same package versions you are using in your development environment. The use of venv isolates environments from one another to avoid package version conflicts.
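The restore step can be pictured with the standard library's venv module plus pip. The sketch below is a hypothetical illustration, not RStudio Connect's implementation: build_environment and env_python are invented names, and the real process also installs the packages Connect itself needs and applies any configured pip options.

```python
"""Sketch: create an isolated environment and install a requirements file."""
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Location of the interpreter inside a virtual environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def build_environment(env_dir: Path, requirements: Path, install: bool = True) -> Path:
    """Create env_dir with venv, then pip-install requirements into it.

    Hypothetical helper. With install=False only the bare venv is created
    (no pip bootstrap, no network access), which is handy for a dry run.
    """
    # clear=True discards previous contents; with_pip bootstraps pip via ensurepip.
    venv.create(env_dir, clear=True, with_pip=install)
    python = env_python(env_dir)
    if install:
        # Run pip with the venv's own interpreter so packages land inside env_dir.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

Because every environment gets its own directory and interpreter, installing a different numpy version into one environment leaves all other environments untouched.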

Requirements Files

A requirements.txt file lists the packages that are required by a piece of content and (optionally) their versions. The pip package manager allows additional options in requirements files, giving authors more flexibility and control over package installation. See the pip documentation for details.
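As an illustration of the format, the sketch below embeds a small requirements file that mixes pinned and unpinned packages with two pip options, then separates the options from the package requirements. The package list and the index URL are placeholders, and split_requirements is a deliberately simplified, hypothetical parser: pip itself also handles line continuations, environment markers, hashes, and more.

```python
"""Sketch: separate pip options from package requirements in requirements.txt."""
from __future__ import annotations

EXAMPLE = """\
# Placeholder example; the index URL is illustrative only.
--extra-index-url https://packages.example.internal/simple
numpy==1.18.1
pandas>=1.0
requests  # unpinned: pip picks the newest compatible release
-r common-requirements.txt
"""


def split_requirements(text: str) -> tuple[list[str], list[str]]:
    """Return (pip_options, requirements), skipping blank lines and comments."""
    options: list[str] = []
    requirements: list[str] = []
    for raw in text.splitlines():
        # A '#' at line start or after whitespace begins a comment.
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            options.append(line)
        else:
            requirements.append(line)
    return options, requirements
```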

Environment Caching

RStudio Connect maintains a cache of Python environments to enable faster deployments. New environments are created as needed, based on the list of package dependencies received in the bundle and the Python version in use.

Subsequent deployments that have the same list of dependencies will reuse the previously-built environment. If any dependencies are different, a new environment will be created. This enables published content to make use of different versions of dependent packages without conflict.
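One way to picture this caching rule is as a lookup keyed on the Python version plus the dependency list. The sketch below is hypothetical; the actual key RStudio Connect uses is not described here, and the choices made (SHA-256, lowercased entries, order-insensitive comparison, an in-memory dictionary, a made-up directory layout) are assumptions for illustration only.

```python
"""Sketch: derive an environment-cache key from Python version + dependencies."""
from __future__ import annotations

import hashlib


def environment_key(python_version: str, requirements: list[str]) -> str:
    """Hypothetical cache key: identical inputs map to the same environment."""
    # Assumption: the order and letter case of entries do not matter.
    normalized = sorted(r.strip().lower() for r in requirements if r.strip())
    payload = "\n".join([python_version, *normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


cache: dict[str, str] = {}  # key -> environment directory (illustrative)


def environment_for(python_version: str, requirements: list[str]) -> tuple[str, bool]:
    """Return (environment directory, reused?), adding a cache entry if needed."""
    key = environment_key(python_version, requirements)
    reused = key in cache
    if not reused:
        cache[key] = f"/hypothetical/python-environments/{key[:12]}"
    return cache[key], reused
```

Because the key covers every pinned version, changing a single pin produces a new environment, which is what lets two pieces of content use different versions of the same package side by side.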

Providing a requirements.txt file that is the same across multiple projects is one way to facilitate environment reuse and enable faster deployments. A similar benefit is achieved in the automatic (pip freeze) case if the Python environment on the publishing computer remains the same between deployments.

Additionally, environments in RStudio Connect always inherit packages from the system-wide environment configured in RStudio Connect. Providing an empty requirements.txt file (i.e., a placeholder requirements.txt file without content) allows you to use system-wide packages without adding any to the base set. This is useful in situations where packages are centrally managed (e.g., internal company packages). To set up system-wide packages, see the External Package Installation section below.

RStudio Connect will periodically delete Python virtual environments that are no longer in use by any deployed content. The setting Application.PythonEnvironmentReapFrequency can be used to control how often this occurs.

External Package Installation

Warning

Adding external packages decreases the reproducibility and isolation of content on RStudio Connect, and should only be done as a last resort.

You can indicate that a system-wide installation of a package should be used instead of one fetched by pip. The Python.External configuration setting can be used to enumerate each system-provided package.

For example, to make numpy and scipy external packages:

Install these packages in every Python installation that RStudio Connect will be using, e.g.:

/opt/Python/3.8.1/bin/python -m pip install numpy scipy

Then configure RStudio Connect to treat those packages as external:

; /etc/rstudio-connect/rstudio-connect.gcfg 
[Python] 
External = numpy
External = scipy

If any configured Python installation is missing one of the external packages, RStudio Connect treats this as an error at startup. If this situation is intentional (for example, because an external package is not compatible with one of the installed Python versions), you can set Python.ExternalsCheckIsFatal to false.
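Before restarting RStudio Connect, you can approximate this startup check yourself by asking each interpreter whether it can locate each external package. In the sketch below, missing_externals is a hypothetical helper, the interpreter paths are supplied by the caller, and it assumes each package's import name matches its distribution name (true for numpy and scipy, but not for every package).

```python
"""Sketch: report external packages missing from each Python installation."""
from __future__ import annotations

import subprocess

# Runs inside the target interpreter and prints the names it cannot find.
CHECK = (
    "import importlib.util, sys\n"
    "missing = [n for n in sys.argv[1:] if importlib.util.find_spec(n) is None]\n"
    "print('\\n'.join(missing))\n"
)


def missing_externals(pythons: list[str], externals: list[str]) -> dict[str, list[str]]:
    """Map each interpreter path to the external packages it lacks."""
    report: dict[str, list[str]] = {}
    for python in pythons:
        result = subprocess.run(
            [python, "-c", CHECK, *externals],
            capture_output=True,
            text=True,
            check=True,
        )
        missing = result.stdout.split()
        if missing:
            report[python] = missing
    return report
```

For the example above, missing_externals(["/opt/Python/3.8.1/bin/python"], ["numpy", "scipy"]) would return an empty dictionary once both packages are installed in that interpreter.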

External Package Version Matching

By default, RStudio Connect attempts to match external packages by name. In the example above, if an uploaded bundle requests numpy==1.15 and the Python installation in RStudio Connect has numpy 1.18.1 installed, the external version will be used even though the version number does not match. This is similar to how external R packages are handled. Effectively, the version number specified in the incoming requirements.txt file is ignored for external packages.

To require strict version matching, honoring exactly what is specified in the bundle's requirements.txt file, set Python.ExternalVersionMatching to true. In this case, the version requested in the bundle's requirements.txt file is installed if needed, and the external version is used only if it matches the requested version.
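The two matching modes can be summarized as a small decision function. This is a simplified, hypothetical model of the behavior described above, not RStudio Connect's code: choose_source is an invented name, only exact == pins are modeled, versions are compared as plain strings, and the handling of unpinned requests in strict mode is an assumption.

```python
"""Sketch: model external-package version matching for one requirement."""
from __future__ import annotations


def choose_source(requirement: str, externals: dict[str, str],
                  strict: bool = False) -> str:
    """Return "external" or "pip" for a single requirement line.

    requirement: a line such as "numpy==1.15" (only "==" pins are modeled).
    externals:   lowercase external package name -> installed version.
    strict:      stands in for Python.ExternalVersionMatching = true.
    """
    name, _, wanted = requirement.partition("==")
    name, wanted = name.strip().lower(), wanted.strip()
    if name not in externals:
        return "pip"  # not configured as external: installed normally
    if not strict:
        return "external"  # default: matched by name, version ignored
    if not wanted:
        return "external"  # assumption: an unpinned request accepts any version
    # Strict mode; exact string comparison simplifies real version equality.
    return "external" if externals[name] == wanted else "pip"
```

With numpy 1.18.1 installed externally, a bundle requesting numpy==1.15 gets "external" by default and "pip" in strict mode.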

Excluding Python Packages

To exclude a certain Python package from being used, set Python.ProhibitedPackage to the name of the package. Specify this property once for each Python package that is to be excluded.

This may be used, for example, when deploying Python-enabled content that uses OS-specific packages that are unavailable on the operating system RStudio Connect runs on.

RStudio Connect excludes certain Python packages by default. For a list of these packages please see the Python configuration appendix.
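Conceptually, prohibiting a package means dropping it from the dependency list before pip runs. The sketch below illustrates that idea under stated assumptions: drop_prohibited and requirement_name are hypothetical helpers, names are compared after PEP 503 normalization (lowercase, with runs of -, _, and . collapsed to -), and pywin32 appears only as an example of a Windows-only package, not as an entry from RStudio Connect's default list.

```python
"""Sketch: remove prohibited packages from a list of requirement lines."""
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> str:
    """Extract the project name from a simple requirement line."""
    # Assumption: the name ends at the first whitespace, extras, or version character.
    return re.split(r"[\s\[<>=!~;@]", line.strip(), maxsplit=1)[0]


def drop_prohibited(requirements: list[str], prohibited: list[str]) -> list[str]:
    """Keep only requirement lines whose project is not prohibited."""
    blocked = {normalize(p) for p in prohibited}
    return [r for r in requirements if normalize(requirement_name(r)) not in blocked]
```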

Configuring pip

Since RStudio Connect uses pip to install Python packages, you can set package installation options by creating or modifying the pip.conf file.

The global pip configuration file is /etc/pip.conf. Alternatively, since all Python environment restore processes are run under the user account specified in the Applications.RunAs configuration, you can configure options in the default RunAs user's pip.conf file. For example, if the default RunAs user is rstudio-connect, the configuration file might be at /home/rstudio-connect/.config/pip/pip.conf.

For more information about configuring pip, refer to the pip user guide.

Specifying a Package Repository

If you have a Python package repository for your own Python packages, or have a PyPI mirror inside your firewall, you can configure RStudio Connect to use that package repository when installing packages.

For example, to configure a private package repository with a timeout of 60 seconds, add the following to pip.conf:

[global]
timeout = 60
index-url = https://my-python-package-repo.internal.com

Note that setting index-url replaces pip's default repository (PyPI). To add a repository while keeping PyPI available, use the extra-index-url setting instead.
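If you manage the RunAs user's pip configuration with a script, Python's configparser can write the same INI layout. The sketch below adds a repository through extra-index-url so that PyPI remains available; write_pip_conf is a hypothetical helper, and the URL and 60-second timeout are the placeholders from the example above. Note that configparser does not preserve comments when it rewrites an existing file.

```python
"""Sketch: write a pip.conf that keeps PyPI and adds an internal index."""
from __future__ import annotations

import configparser
from pathlib import Path


def write_pip_conf(path: Path, extra_index_url: str, timeout: int = 60) -> None:
    """Create or update the [global] section of a pip configuration file."""
    # interpolation=None keeps '%' characters in URLs from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently skipped
    if not config.has_section("global"):
        config.add_section("global")
    config["global"]["timeout"] = str(timeout)
    # extra-index-url adds a repository; index-url would replace PyPI instead.
    config["global"]["extra-index-url"] = extra_index_url
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        config.write(fh)
```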