Getting Started

You can never look at the data too much. – Mark Engerman

The world confronts us. Make decisions we must.

Installing R and RStudio

We use R via RStudio. R is to RStudio as a car’s engine is to its dashboard.

More precisely, R is a programming language that runs computations, while RStudio is an integrated development environment (IDE) that provides an interface with many convenient features. Just as having access to a speedometer, rearview mirrors, and a navigation system makes driving much easier, using RStudio’s interface makes using R much easier.

Download and install R and RStudio (Desktop version) on your computer.

  1. Download and install R. If you are using a Mac, make sure to use the correct installation, depending on whether you are using newer Apple silicon “M1-3 Macs” (first option) or older “Intel Macs” (second option). You can look up the chip used in your Mac by checking “About This Mac” under the Apple menu.

  2. Download and install RStudio Desktop (the free version).

If you are using Windows, download and install R Tools. Again, this is only for Windows users. The installer associated with “R-release” is what you want.

If you are using a Chromebook, follow this advice.

If you are having trouble, just sign up for a free account on Posit Cloud. (See here for our instructions.) This will give you 25 hours of usage, which should be enough to get started.

Using R and RStudio

Icons of R versus RStudio on your computer.

Much as we don’t drive a car by interacting directly with the engine but rather by interacting with elements on the car’s dashboard, we won’t be using R directly. Instead, we will use RStudio’s interface. After you install R and RStudio on your computer, you’ll have two new programs (also called applications) you can open. Always work in RStudio and not directly in the R application.

Open up RStudio. You should see three panes dividing the screen: the Console pane, the Environment pane, and the Output pane.

This is your workspace. Start with the big pane on the left:

There are three tabs in the Console pane. We’ll be focusing on the Console and Terminal tabs. When you first start R, the Console gives you some information about your version of R. The Console is where you can type and run R code. For example, if you type 1 + 1 and hit Enter, the Console returns 2.

We often use the phrase “run the following code.” This means that you should type or copy-and-paste the code into the Console and then hit the Enter key. (Note that Return is the name of a Mac key and Enter the name of the corresponding Windows key. We use Enter throughout the Primer.)

Switch to the top right of your workspace:

In the Environment pane, the two main tabs we will be using are Environment and Git (which is not yet visible). The Environment tab shows you the data sets and variables you currently have loaded into R. In this case, we loaded in a data set with 3407 rows and 5 columns and a variable x equal to 5. For you, the Environment should be empty. Let’s change that. Go to your Console and type:

x <- 5

This code assigns the value 5 to an object, x. <- is the operator used to assign values to objects in R. Now, hit Enter and you should see a variable x equal to 5 in your Environment tab. You must always hit Enter after typing a command, otherwise RStudio will not realize that you want R to execute the command.

Note that a “pane,” like the Environment pane, refers to an entire window of RStudio while a “tab,” like the Environment tab, refers to a single tab within the overall pane.

Switch to the Output pane on the bottom right:

The Files tab, within the Output pane, displays your computer’s file system. When you create a project later, this tab will automatically show the contents of your project’s folder.

Package installation

R packages, also known as libraries, extend the power of R by providing additional functions and data.

R is to R packages as a new phone is to apps for that phone.

R is like a new phone. While it has a certain amount of features when you use it for the first time, it doesn’t have everything. R packages are like the apps you download onto your phone.

Consider an analogy to Instagram. If you have a new phone and want to share a photo with friends. You need to:

  1. Install the app: Since your phone is new and does not include the Instagram app, you need to download the app. You do this only once. (You might need to do this again in the future when there is an update to the app.)
  2. Open the app: After you’ve installed Instagram, you need to open it. You need to do this every time you use the app.

The process is very similar for an R package. You need to:

Installing versus loading an R package
  1. Install the package: This is like installing an app on your phone. Most packages are not installed by default when you put R and RStudio on your computer. Thus if you want to use a package for the first time, you need to install it. Once you’ve installed a package, you likely won’t install it again unless you want to update it to a newer version.

  2. Load the package: “Loading” a package is like opening an app on your phone. Packages are not loaded by default when you start RStudio. You need to load each package you want to use every time you restart RStudio.

Just for this chapter, execute this command in your R Console, unless you are working on Posit Cloud.

options("pkgType" = "binary")

This helps to ensure that package installation will go more smoothly. Again, it is not necessary on Posit Cloud.

Let’s install a useful package. At the Console pane within RStudio, type:

install.packages("remotes")

And press Enter on your keyboard. You must include the quotation marks around the name of the package. A package can depend on other packages, which will be automatically installed if needed.

One tricky aspect of this process is that R will occasionally ask you:

Do you want to install from sources the packages which 
need compilation? (Yes/no/cancel)

Unless you have a good reason not to, always answer “no” to this question.

R packages generally live in one of two places:

  • CRAN (rhymes with “clan”) for more mature, popular packages. Use install.packages(), as you did above.

  • Github for more experimental, less stable packages. Use remotes::install_github().

Package loading

After you install a package, you need to “load” it by using the library() command. To load the remotes package, run the following code in the Console.

library(remotes)

After running this code, a blinking cursor should appear next to the > symbol. (The > is the “prompt.”) This means you were successful and the remotes package is now loaded and ready to use. However, you might get a red “error message” which reads:

Error in library(remotes) : there is no package called ‘remotes’

This error message means that you haven’t successfully installed the package. If you get this error message, make sure to install the remotes package before proceeding.

For historical reasons, packages are also known as libraries, which is why the relevant command for loading them is library().

R will occasionally ask you if you want to install some packages. You almost always want to, otherwise R would not be asking you.

Let’s install another package:

install.packages("tutorial.helpers")

This package provides infrastructure for doing tutorials, a common approach for learning R.

Let’s install another package:

install.packages("tidyverse")

This might take awhile since the “Tidyverse” includes so many useful packages.

Package use

You have to load each package you want to use every time you start RStudio. If you don’t load a package before attempting to use one of its features, you will see an error message like:

Error: could not find function

This is a different error message than the one you just saw about a package not having been installed yet. R is telling you that you are trying to use a function in a package that has not yet been loaded. R doesn’t know where to “find” the function you want to use.

Let’s install two other packages: r4ds.tutorials and primer.tutorials. Copy and paste the following to the R Console:

library(remotes)
install_github("PPBDS/r4ds.tutorials")

This should install easily. There is a version of r4ds.tutorials available on CRAN, but we want to get the latest version, often called the “development” version.

Next, copy and paste the following to the R Console:

remotes::install_github("PPBDS/primer.tutorials")

Instead of loading the remotes package explicitly, we used the double colon notation — :: — to refer directly to the install_github() function which resides in the remotes package.

If the download fails, try

options("timeout" = 600)

This gives the download more time to complete. Try it again.

Some other new packages will be installed as well, most importantly primer.data, which includes most of the data we use in the Primer. It may take a few minutes. If something gets messed up, it is often useful to read the error messages and see if a specific package is at fault. If so, use the remove.packages() function to remove the problematic package and then install it again.

If primer.tutorials does not install correctly, don’t worry about it. This package is much less important than r4ds.tutorials.

Tutorials

There are many tutorials available in the r4ds.tutorials and primer.tutorials packages. In order to access these tutorials, you should run library(r4ds.tutorials) or library(primer.tutorials) in the R Console.

Access the tutorials via the Tutorial tab in the Environment pane in RStudio. If you don’t see any tutorials, try clicking the “Home” icon – the little house symbol with the thin red roof. You may need to restart your R session. Click on the “Session” menu and select “Restart R”.

In order to expand the window, you can drag and enlarge the Tutorial tab inside RStudio. In order to open a pop-up window, click the “Show in New Window” icon next to the Home icon.

You may notice that the Background Jobs tab in the Console pane will create output as the tutorial is starting up. This is because RStudio is running the code to create the tutorial.

Your work will automatically be saved between RStudio sessions. You can complete the tutorial in multiple sittings.

There are two ways you can close out of a tutorial safely before quitting your RStudio session:

  • If you clicked “Show in new window” and were working on the tutorial in a pop-up window, simply close the pop-up window.

  • If you were working on the tutorial inside the Tutorial pane of RStudio, simply press the red stop sign icon.

Basic Commands

There are a few terms, tips, and tricks that you should know before getting started with R.

  • Functions: these perform tasks by taking an input called an argument and returning an output.
sqrt(64)
[1] 8

sqrt() is a function that gives us the square root of the argument. 64 is the argument. Therefore, the output should be 8. Try it for yourself in the Console!

  • Help files: these provide documentation for packages, functions and data sets. You can bring up help files by adding a ? before the name of the object in the Console. Run ?sqrt.

  • Errors, warnings, and messages: these are important communications from R to you. When there is an error, the code will not run. Read (and/or Google) the message and try to fix it. Warnings don’t prevent code from completing. For example, if you create a scatterplot based on data with two missing values, you will see this warning:

    Warning message: Removed 2 rows containing missing values (geom_point).

Messages are similar. In both cases, you should fix the underlying issue until the error/warning/message goes away.

Next Steps

Complete the “Getting Started with Tutorials” tutorial from the tutorial.helpers package. This will ensure that your RStudio settings are sensible. Note that, in the Tutorial pane, tutorials are in alphabetical order by package name. So the “Getting Started with Tutorials” tutorial from the tutorial.helpers package will be near the bottom of the Tutorial pane.

Once you have done so, or if you already know R, you can start to learn about data science.

Summary

You should have done the following:

  • Installed the latest versions of R and RStudio.

  • Installed, from CRAN, the remotes, tutorial.helpers and tidyverse packages:

options("pkgType" = "binary")
install.packages("remotes")
install.packages("tutorial.helpers")
install.packages("tidyverse")
  • Installed, from Github, the r4ds.tutorials and primer.tutorials packages:
remotes::install_github("PPBDS/r4ds.tutorials")
remotes::install_github("PPBDS/primer.tutorials")
  • Completed the Getting Started tutorial from the tutorial.helpers package.

Let’s get started.