Package Ecosystem¶
The R package ecosystem has a few key components: packages, repositories, and libraries.
Packages¶
Packages are the primary extension mechanism for R. They can be used to share functions, datasets, and documentation. An R package can exist in a few states:
Source¶
An R package is composed of a series of directories and files. The source of an R package is just a top-level directory containing the components of the package. Package authors work with source packages during development. Git repositories, including those hosted on GitHub, typically store source packages.
Bundle¶
A bundled package is a source package that has been compressed into a single file. By convention, package bundles in R use the extension .tar.gz.
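To make the source and bundle states concrete, the sketch below creates a tiny source package in a temporary directory and bundles it with R CMD build. The package name demopkg, its single function, and every DESCRIPTION value are placeholders invented for this illustration.

```r
# A minimal source package: a top-level directory holding a DESCRIPTION
# file, a NAMESPACE file, and an R/ directory with the code.
work <- file.path(tempdir(), "bundle-demo")
src  <- file.path(work, "demopkg")   # hypothetical package name
dir.create(file.path(src, "R"), recursive = TRUE, showWarnings = FALSE)

writeLines(c(
  "Package: demopkg",
  "Version: 0.1.0",
  "Title: Demonstration Package",
  "Description: A placeholder package used to illustrate bundling.",
  "Author: Jane Doe",
  "Maintainer: Jane Doe <jane@example.com>",
  "License: GPL-3"
), file.path(src, "DESCRIPTION"))
writeLines("export(hello)", file.path(src, "NAMESPACE"))
writeLines('hello <- function() "hello"', file.path(src, "R", "hello.R"))

# Bundle the source directory. R CMD build writes the bundle into the
# current working directory, so switch there temporarily.
old_wd <- setwd(work)
system2(file.path(R.home("bin"), "R"), c("CMD", "build", "demopkg"))
setwd(old_wd)

# The bundle is named <package>_<version>.tar.gz.
bundle <- file.path(work, "demopkg_0.1.0.tar.gz")
contents <- untar(bundle, list = TRUE)   # list files without extracting
print(contents)
```

Listing the archive shows that the bundle is the same directory tree as the source package, packed into one file.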
Binary¶
A binary package is the result of building a source package for a specific operating system. Binary packages are single files that are ready for installation on their specific operating systems.
Installed¶
An installed package is a binary package that has been decompressed into a package library and is ready for use by R.
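One way to look at an installed package is to inspect its directory and metadata from within R. The sketch below uses stats only because it ships with every R installation; any installed package would do.

```r
# Locate the directory of an installed package.
pkg_dir <- find.package("stats")
print(pkg_dir)

# An installed package is a plain directory inside a library.
print(list.files(pkg_dir))

# The "Built" field is added to DESCRIPTION when the package is built.
# It records the R version, platform, and build date.
built <- packageDescription("stats", fields = "Built")
print(built)
```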
Repositories¶
Repositories organize R packages for distribution to end users. Repositories
contain package bundles and binaries that are organized in a specific way so that
users can install packages from the repository using R's install.packages
function. CRAN and Bioconductor are examples of R repositories.
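The repository layout can be explored without downloading anything. The sketch below asks R where it would look for source packages in a repository; the mirror URL is just one CRAN mirror chosen for illustration, and the package name in the final comment is a placeholder.

```r
# Repositories configured for the current session (may be unset).
print(getOption("repos"))

# Assumed repository for this example: a CRAN mirror.
repo <- "https://cloud.r-project.org"

# contrib.url() maps a repository root to the directory that holds
# packages of a given type. Source bundles live under src/contrib.
src_url <- contrib.url(repo, type = "source")
print(src_url)

# Each such directory carries a PACKAGES index that install.packages()
# reads to learn which packages and versions are available.
index_url <- paste0(src_url, "/PACKAGES")
print(index_url)

# Installing from this repository would look like this (needs network):
# install.packages("somepackage", repos = repo)
```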
Git(hub)¶
Many R package sources are stored in version-controlled directories. A popular
version control tool is Git. GitHub, a hosting service for Git repositories, houses many package
sources. The devtools R package includes convenience functions for installing
packages from source stored in a Git repository, including
one hosted on GitHub. Used in this manner, Git repositories and GitHub are one way to
distribute R packages, but they are not R package
repositories.
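As a rough sketch of installing from a Git host, the helper below wraps devtools::install_github. The helper name install_from_github is hypothetical, and the repository in the final comment is only an example; the call is left commented out because it needs network access.

```r
# Hypothetical helper: install a package from its source on GitHub.
# `repo` has the form "owner/repository".
install_from_github <- function(repo) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("devtools is required; install it from a repository first.")
  }
  # Downloads the source, builds it, and installs it into the first
  # library on the library search path.
  devtools::install_github(repo)
}

# Not run here (requires network access):
# install_from_github("r-lib/devtools")
```

Because the package is built from source on the user's machine, any compilers or system libraries the package needs must already be available there.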
Libraries¶
End users of R typically interact with installed packages that live in libraries. Package libraries are just directories containing installed packages. When a package is requested by R, R searches the different library directories to find the installed package.
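The library search described above can be inspected directly. A minimal sketch using only base R functions:

```r
# Library directories R searches, in order.
libs <- .libPaths()
print(libs)

# Rough count of entries in each library; each installed package is
# one subdirectory.
entry_counts <- vapply(libs, function(lib) length(list.files(lib)), integer(1))
print(entry_counts)

# Ask R which directory a package would be found in when searching
# these libraries. 'utils' ships with every R installation.
utils_dir <- find.package("utils", lib.loc = libs)
print(utils_dir)
```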
R libraries are very flexible. In the past, R users have set up libraries for specific projects or set up a system-wide library used across multiple projects. On multi-tenant servers it has been common to have both a system library shared by all users and user-specific libraries.
A best practice is to set up per-project libraries alongside a package cache.
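A per-project library can be approximated with base R alone, as in the sketch below. The project path is a placeholder under a temporary directory, and the package cache is not covered here; dedicated tools usually handle both.

```r
# Hypothetical project with its own library directory.
project_dir <- file.path(tempdir(), "my-project")
project_lib <- file.path(project_dir, "library")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

# Put the project library first on the search path. R still appends
# the system library, so base packages remain available.
old_paths <- .libPaths()
.libPaths(c(project_lib, old_paths))
active <- .libPaths()
print(active)

# install.packages() installs into the first library by default, so
# packages installed now would land in project_lib.

# Restore the previous library paths.
.libPaths(old_paths)
```

Placing the .libPaths() call in a project-level .Rprofile is one way to apply it automatically whenever R starts in that project.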