Packages in R
What is an R package?
R packages are collections of functions and tools that you can use in your research project. You can view a package as an extension that expands the functionality of R. You will need functions that are not included in the default R installation, and there are over 15,000 R packages containing such functions. Most packages install within a few seconds. Packages are written by members of the R community, and packages published on CRAN must pass automated checks before they are accepted, although the depth of expert review varies between packages. Every package on CRAN comes with documentation for its functions.
Some examples of popular R packages for research:
- ggplot2: package for visualizing data in R.
- dplyr: package for data wrangling in R.
- tidyr: package for tidying data in R.
- purrr: package for writing functions effectively in R.
- survival: package for survival analysis in R.
- lme4: package for creating mixed (random) effects models.
- caret: package for pre-processing of data, elaborating and evaluating prediction models across all commonly used frameworks.
- mlr3: another package for pre-processing of data, elaborating and evaluating prediction models across all commonly used frameworks.
In summary: there is most likely an R package for whatever you want to do.
How to install an R package
Before using any R package you need to install it and then load it, so that you can use it in your current R session (a session is launched every time you start R). R comes with a few packages pre-installed. Such packages contain core functionality, e.g. basic mathematical operations, handling of data frames, etc. However, the vast majority of packages are not installed by default. To use them you need to download and install them. Note the following:
- You only install a package once.
- You must load a package every session you want to use it.
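To see which packages ship with R itself, you can query the table of installed packages. This is a minimal sketch; the variable name base_pkgs is chosen only for illustration.

```r
# List the packages that are part of the core R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# stats and utils are among them, so they never need install.packages()
print(c("stats", "utils") %in% base_pkgs)
```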
Here is how you install the dplyr package:
install.packages("dplyr")
Note the quotation marks above.
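Because a package only needs to be installed once, scripts often check whether it is already present before installing. The sketch below shows one possible way to do this; install_if_missing is a hypothetical helper name, not a function provided by R.

```r
# Hypothetical helper: install a package only when it is not yet available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# stats ships with R, so this call downloads nothing
ok <- install_if_missing("stats")
print(ok)
```

Calling install_if_missing("dplyr") would download dplyr only on the first run and do nothing on later runs.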
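How to update an R package
You may occasionally need to update an R package. To update a single package, pass its name to the oldPkgs argument (the first argument of update.packages() is a library path, not a package name):
update.packages(oldPkgs = "dplyr")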
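Before updating, it can help to know which version is currently installed. A small sketch using packageVersion(); the stats package is used here only because it ships with every R installation, and the version string in the comparison is an arbitrary example.

```r
# Version of an installed package
installed_version <- packageVersion("stats")
print(installed_version)

# Versions can be compared directly with a version string
print(installed_version >= "3.0.0")
```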
Installing packages in R using the click interface
You can also install packages in R without writing code. We do not recommend this, because it is preferable to have all operations documented in code. To install the rms package, which contains many useful functions for regression modeling, do the following:
- In the Files pane of RStudio:
- Click on the “Packages” tab
- Click on “Install”
- Type the name of the package under “Packages (separate multiple with space or comma):”. In this case, type rms
- Click “Install”

However, the best way is to install the package by writing install.packages("rms") in the source/script pane.
Let’s install some packages. Since we will be installing multiple packages, we will create a vector that includes the names of all desired packages. We’ll do this in two different ways, which yield identical results.
Method 1
# create an object called new_packages,
# which is a vector containing names of desired packages
new_packages <- c("dplyr", "ggplot2", "rms", "survival")
# install these packages
install.packages(new_packages)
Method 2
install.packages(c("dplyr", "ggplot2", "rms", "survival"))
Method 2 uses fewer lines and performs the same task. This is one of the beauties of R: you can nest function calls inside other function calls. In this case, the call to c() is nested inside the call to install.packages().
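The same nesting idea can be used to work out which of the desired packages are not yet installed. This is a sketch under the assumption that you only want to install what is missing; the names wanted and missing_packages are illustrative.

```r
# Packages we would like to have
wanted <- c("dplyr", "ggplot2", "rms", "survival")

# installed.packages() is nested inside rownames(), which is nested inside setdiff()
missing_packages <- setdiff(wanted, rownames(installed.packages()))
print(missing_packages)

# Uncomment to install only the packages that are missing
# if (length(missing_packages) > 0) install.packages(missing_packages)
```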
6.1.4 Loading packages in R
Every time you launch R, a new session is started. This means that your working environment starts out empty.

6.1.5 How to load a package in R
When you start RStudio, some packages are loaded by default. These are the basic packages (often called base R), i.e. packages containing fundamental functions. You will soon have hundreds of other packages installed, but they will not be loaded automatically when you start a new session. You need to load these packages manually every time you start a new session in RStudio. Loading packages is done using the library() command.
For example, to load the ggplot2 and dplyr packages, run the following commands in the Console pane:
library(ggplot2)
library(dplyr)
Note that quotation marks are not required when loading packages.
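When package names are stored as strings, for example in a vector, library() needs the character.only argument. The sketch below uses two packages that ship with R so that it runs on any installation; packages_to_load is an illustrative name.

```r
# Package names stored as strings
packages_to_load <- c("stats", "utils")

# character.only = TRUE tells library() that pkg holds the name as a string
for (pkg in packages_to_load) {
  library(pkg, character.only = TRUE)
}

# search() lists everything that is currently attached to the session
attached <- search()
print(attached)
```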
6.1.5.1 Errors when loading packages in R
R will return an error if you attempt to load a package which is not installed. We will now try to load a package called polish which is not installed:
library(polish)
Error in library(polish): there is no package called 'polish'
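If a script should keep running when a package is missing, the error can be caught instead. A minimal sketch, assuming the made-up package name below is not installed on your system:

```r
pkg <- "notARealPackage123"  # hypothetical name, assumed not to be installed

# tryCatch() intercepts the error that library() raises
outcome <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) paste("Could not load:", conditionMessage(e))
)
print(outcome)

# requireNamespace() returns FALSE instead of raising an error
available <- requireNamespace(pkg, quietly = TRUE)
print(available)
```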
6.1.5.2 Successful loading of a package in R
R often prints a message when a package is successfully loaded, although some packages load silently. This message could include a note from the package author or other important information. Loading the package dplyr results in the following message:
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
There are important notes in the message above. You will see similar messages often and you have to pay attention to them. The message says the following:
- Loading dplyr resulted in masking of some functions in other packages.
- Specifically, loading dplyr resulted in masking of the functions filter and lag, which are contained in the stats package, as well as the functions intersect, setdiff, setequal and union, which are contained in the base package.
- For example, if you use the filter() function now, R will apply the filter() function of dplyr, and not the filter() function of stats. You can, however, force R to use the filter() function of stats; to do so, you declare explicitly that you want the function from the stats package, as follows: stats::filter().
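As an illustration of the package::function() notation, the sketch below calls the stats version of filter() explicitly, so it behaves the same whether or not dplyr is loaded. The input vector and the three-point window are arbitrary example values.

```r
x <- c(1, 2, 3, 4, 5)

# stats::filter() applies a linear filter; equal weights give a moving average.
# The first and last values are NA because the window is incomplete there.
smoothed <- stats::filter(x, rep(1 / 3, 3))
print(smoothed)
```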
6.1.6 Tidyverse – The Revolution
R has traditionally been considered a difficult language to learn. Consider the situation where you wish to filter your data frame in order to keep a subset of your original observations. You may, for example, have a data frame with measurements on men and women and now wish to keep only the men for further analyses. In the old days, we used to write:
my_data_frame[my_data_frame$Sex == 'Males',]
- my_data_frame is the name of the data frame.
- Sex is a variable in that data frame.
- The $ symbol is used to refer to a variable (column) in my_data_frame. my_data_frame$Sex means that we wish to access Sex in my_data_frame.
- We use brackets ([]) to subset the data frame, and we apply the condition that Sex should equal “Males”.
- We write a comma (,) after “Males”. The condition before the comma selects rows; leaving the position after the comma empty keeps all columns.
This is just one of many examples of R code that many users find difficult to write and read. The tidyverse contains numerous functions that simplify working in R and make code easier to read and write. Below follows the same code using dplyr, which is one of the packages included in the tidyverse:
filter(my_data_frame, Sex=="Males")
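To compare the two styles on actual data, here is a small sketch with a made-up data frame; the values and the Age column are invented for illustration. The dplyr part runs only if dplyr is installed.

```r
# Made-up example data
my_data_frame <- data.frame(
  Sex = c("Males", "Females", "Males"),
  Age = c(34, 29, 51)
)

# Base R: the condition before the comma selects rows
base_result <- my_data_frame[my_data_frame$Sex == "Males", ]
print(base_result)

# dplyr: same selection, written with filter()
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_result <- dplyr::filter(my_data_frame, Sex == "Males")
  # Both approaches keep the same observations
  print(identical(base_result$Age, dplyr_result$Age))
}
```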
To install the tidyverse, enter the following command:
install.packages("tidyverse")
This will download and install the tidyverse from CRAN. The tidyverse package actually bundles several packages, including:
- ggplot2 for creating graphics
- tibble for handling data frames
- tidyr for tidying data frames
- readr for importing data into R
- purrr for applying functions in various ways
- dplyr for manipulating data frames
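After installation, you may want to confirm that the packages listed above are available. One possible check is sketched below; core_packages and is_available are illustrative names, and the result depends on what is installed on your machine.

```r
# The packages named in the list above
core_packages <- c("ggplot2", "tibble", "tidyr", "readr", "purrr", "dplyr")

# For each name, TRUE if the package is installed, FALSE otherwise
is_available <- vapply(core_packages, requireNamespace, logical(1), quietly = TRUE)
print(is_available)
```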
6.1.7 Other R packages
You can view all R packages on CRAN: