19 Python Package Management
19.1 Package Installation
RStudio Connect installs the Python package dependencies of Python-based content when that content is deployed. This includes Jupyter notebooks, as well as R projects that include Python.
Package dependencies are captured in one of two ways:
If a requirements.txt file exists in the directory containing the Jupyter notebook or R project being deployed, then the contents of that file specify the dependencies. See the pip documentation for details. If you provide a requirements.txt file, you must ensure that the listed dependencies are correct for the content you are deploying.
Otherwise, the pip freeze command is used to produce a full specification of the current Python environment, including all installed packages and their version numbers. In a Jupyter notebook, pip freeze will be run with the version of Python being used by the active Jupyter notebook kernel. In the RStudio IDE, the environment variable RETICULATE_PYTHON will be used to determine which Python environment to inspect.
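The automatic case can be approximated by hand to preview what would be captured. The following is a minimal sketch, not the code the publishing tools run: the helper names parse_freeze and freeze are hypothetical, and the parser only handles lines pinned with ==.

```python
import subprocess
import sys


def parse_freeze(text):
    """Split pip freeze output into {name: version} for lines pinned with '=='."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, editable installs, and anything not pinned with '=='.
        if not line or line.startswith("#") or line.startswith("-e ") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def freeze(python=sys.executable):
    """Run `pip freeze` with the given interpreter and return its raw output."""
    result = subprocess.run(
        [python, "-m", "pip", "freeze"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the path of the notebook kernel's interpreter (or the value of RETICULATE_PYTHON) as the python argument inspects that environment rather than the one running the script.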
The resulting package list is included in the bundle archive file, which
is uploaded to RStudio Connect. RStudio Connect receives the bundle
archive file, unpacks it, and uses virtualenv and pip to install the
identified package dependencies. In addition to the specified dependencies,
RStudio Connect will also install packages that it uses to deploy and render
your content.
The execution environment created by RStudio Connect contains the same
package versions you are using in your development environment. The use
of virtualenv isolates environments from one another to avoid package
version conflicts.
19.2 Environment Caching
RStudio Connect maintains a cache of Python environments to enable faster deployments. New environments are created as needed, based on the list of package dependencies received in the bundle and the Python version in use.
Subsequent deployments that have the same list of dependencies will reuse the previously-built environment. If any dependencies are different, a new environment will be created. This enables published content to make use of different versions of dependent packages without conflict.
Providing a requirements.txt file which is the same across multiple
projects is one way to facilitate environment reuse and enable faster
deployments. A similar benefit is achieved in the automatic (pip freeze) case if the Python environment on the publishing computer
remains the same between deployments.
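One way to picture the reuse rule is as a lookup key built from the Python version and the dependency list. The sketch below is illustrative only: the function environment_key is hypothetical, and the normalization it applies (ignoring comments, blank lines, and ordering) is an assumption, not a description of how RStudio Connect compares dependency lists.

```python
import hashlib


def environment_key(python_version, requirements):
    """Derive a hypothetical cache key from a Python version and a dependency list."""
    # Assumed normalization: strip whitespace, drop blanks and comments, ignore order.
    lines = sorted(
        line.strip()
        for line in requirements.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )
    payload = "\n".join([python_version] + lines)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this model, two bundles with the same pinned packages and Python version map to the same key, while changing a single version number or the Python version produces a different key and therefore a new environment.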
19.3 External Package Installation
Warning: Adding external packages decreases the reproducibility and isolation of content on RStudio Connect, and should only be done as a last resort.
You can indicate that a system-wide installation of a package should be
used instead of one fetched by pip. The
Python.External configuration setting can be used to
enumerate each system-provided package.
For example, to make numpy and scipy external packages:
Install these packages in every Python installation that RStudio Connect will be using, e.g.:
/opt/python/3.7.1/bin/python -m pip install numpy scipy
Then configure RStudio Connect to treat those packages as external:
; /etc/rstudio-connect/rstudio-connect.gcfg
[Python]
External = numpy
External = scipy
If any configured Python installation is missing one of the external
packages, RStudio Connect will treat this as an error at startup. If
the gap is intentional (for example, because an external
package is not compatible with one of the installed Python versions),
you can set Python.ExternalsCheckIsFatal to false.
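Before restarting the server, it can help to check each Python installation yourself. The sketch below is a hedged example, not the check RStudio Connect performs: missing_externals is a hypothetical helper, and it assumes that each package's import name matches the name listed under Python.External, which is not true for every package.

```python
import subprocess
import sys


def missing_externals(python, packages):
    """Return the names in `packages` that the interpreter `python` cannot import."""
    # The probe runs inside the target interpreter and prints unimportable names.
    probe = (
        "import importlib.util, sys; "
        "print('\\n'.join(m for m in sys.argv[1:] "
        "if importlib.util.find_spec(m) is None))"
    )
    result = subprocess.run(
        [python, "-c", probe] + list(packages),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()
```

For the example above, calling missing_externals("/opt/python/3.7.1/bin/python", ["numpy", "scipy"]) for every configured installation should return an empty list in each case.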
19.3.1 External Package Version Matching
By default, RStudio Connect attempts to match external packages by name.
In the example above, if an uploaded bundle requests numpy==1.15 and
the Python installation in RStudio Connect has numpy 1.16 installed,
the external version will be used even though the version number does
not match. This is similar to how external R packages are handled.
Effectively, the version number specified in the incoming
requirements.txt file is ignored for external packages.
To require strict version matching, honoring exactly what is specified
in the bundle’s requirements.txt file, you can set
Python.ExternalVersionMatching to true. In this case, the
version requested in the bundle’s requirements.txt file will be
installed if needed, and the external version will only be used if it’s
the correct version.
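The two matching modes can be summarized as a small decision function. This is a sketch of the rule as described here, not RStudio Connect's implementation: use_external and its strict parameter are hypothetical stand-ins, and versions are compared as plain strings for simplicity.

```python
def use_external(requested, installed, strict=False):
    """Decide whether the system-wide package satisfies a bundle's requirement.

    requested: version pinned in the bundle's requirements.txt, e.g. "1.15".
    installed: version of the external (system-wide) package.
    strict: stand-in for Python.ExternalVersionMatching (hypothetical parameter).
    """
    if not strict:
        # Default: match by name only, so the requested version is ignored.
        return True
    # Strict: the external package is used only when the versions agree exactly.
    return requested == installed
```

With the numbers from the example, use_external("1.15", "1.16") is True, while use_external("1.15", "1.16", strict=True) is False, in which case the requested version would be installed instead.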
19.4 Excluding Python Packages
To exclude a Python package from being used, set
Python.ProhibitedPackage to the name of the package.
Specify this property once for each Python package that is to be
excluded.
This may be used, for example, when deploying Python-enabled content that utilizes OS-specific packages which are unavailable on the OS that RStudio Connect runs on.
RStudio Connect excludes certain Python packages by default. For a list of these packages please see section A.26.
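The effect of an exclusion can be pictured as filtering the bundle's dependency list before installation. The sketch below is only an illustration under stated assumptions: drop_prohibited is a hypothetical helper, the name normalization it uses follows the general Python packaging convention (PEP 503) and may differ from how RStudio Connect compares names, and pywin32 is used purely as an example of an OS-specific package.

```python
import re


def drop_prohibited(requirements, prohibited):
    """Remove requirement lines whose package name appears in `prohibited`."""

    def normalize(name):
        # Compare case-insensitively, treating '-', '_' and '.' as equivalent.
        return re.sub(r"[-_.]+", "-", name).lower()

    blocked = {normalize(name) for name in prohibited}
    kept = []
    for line in requirements.splitlines():
        # The name is everything before the first specifier, extra, marker, or space.
        name = re.split(r"[=<>!~\[;@\s]", line.strip(), maxsplit=1)[0]
        if name and normalize(name) in blocked:
            continue
        kept.append(line)
    return kept
```

For instance, filtering a list that pins numpy, pywin32, and scipy with prohibited=["pywin32"] keeps only the numpy and scipy lines.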
19.5 Configuring pip
Configuration options for pip may be set in /etc/pip.conf. These
options will be read by pip when installing Python packages. This may
be used, for example, to add additional package indexes or to control
connection timeouts.
Refer to the pip documentation for further information.
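As an illustration of the two uses mentioned above, the following sketch renders pip configuration text with an additional package index and a connection timeout. The helper render_pip_conf and the index URL are hypothetical placeholders. The [global] section and the extra-index-url and timeout options are standard pip settings, and the rendered text would still need to be saved as /etc/pip.conf by an administrator.

```python
import configparser
import io


def render_pip_conf(extra_index_url, timeout_seconds):
    """Render pip.conf text with an extra package index and a connection timeout."""
    # Interpolation is disabled so that '%' characters in URLs are kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {
        "extra-index-url": extra_index_url,
        "timeout": str(timeout_seconds),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

Calling render_pip_conf("https://pypi.example.invalid/simple", 60) produces a [global] section containing those two options, which can be reviewed before it is installed on the server.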