Getting Started

The world confronts us. Make decisions we must.

Installing R and RStudio

Analogy of difference between R and RStudio.

FIGURE 0.1: Analogy of difference between R and RStudio.

Throughout this book, we assume that you are using R via RStudio. R is to RStudio as a car’s engine is to its dashboard.

More precisely, R is a programming language that runs computations, while RStudio is an integrated development environment (IDE) that provides an interface with many convenient features. Just as having access to a speedometer, rearview mirrors, and a navigation system makes driving much easier, using RStudio’s interface makes using R much easier.

Download and install both R and RStudio (Desktop version) on your computer.

  1. Do this first: Download and install R.

  2. Do this second: Download and install RStudio Desktop (the free version).

If you want a slower introduction than we are providing here, check out the short book, Getting Used to R, RStudio, and R Markdown. Ismay and Kennedy (2016) include screencast recordings that you can follow along and pause as you learn. They include an introduction to R Markdown, a tool used for reproducible research in R.

Using R via RStudio

Icons of R versus RStudio on your computer.

FIGURE 0.2: Icons of R versus RStudio on your computer.

Much as we don’t drive a car by interacting directly with the engine but rather by interacting with elements on the car’s dashboard, we won’t be using R directly but rather we will use RStudio’s interface. After you install R and RStudio on your computer, you’ll have two new programs (also called applications) you can open. Always work in RStudio and not directly in the R application.

Let’s begin with getting familiar with RStudio. Open up RStudio. You should see three panes, or panels, dividing the screen: the Console pane, the Files pane, and the Environment pane.

This is your workspace. Start with the big pane on the left:

There are three panels (or tabs) in this window, we’ll be focusing on the Console and Terminal. When you first start R, the Console gives you some information about your version of R. The Console is where you can type and run R code. For example, if you type 1 + 1 and hit return, the Console returns 2.

Look at the top right:

The main two tabs you’ll be using are Environment and Git (which is not yet visible). The Environment tab shows you the datasets and variables you currently have loaded into R. In this case, we loaded in a dataset with 3407 rows and 5 columns and a variable x equal to 5. For you, the Environment should be empty. Let’s change that. Go to your Console and type:

x <- 5

This assigned the value 5 to an object, x. <- is the operator used to assign values to objecjts in R. Now, hit return/enter and you should see a variable x equal to 5 in your Environment tab. You must always hit return/enter after typing a command, otherwise RStudio will not realize that you want R to execute the command. Look at the bottom right window:

The Files tab displays your computer’s file system. When you create a project later, this tab will automatically show the contents of your project’s folder. The Plots tab will show you a preview of any plots you make in RStudio. Help will be discussed later.

Package installation

R packages, also known as libraries, extend the power of R by providing additional functions and data.

Analogy of R versus R packages.

FIGURE 0.3: Analogy of R versus R packages.

R is like a new mobile phone: while it has a certain amount of features when you use it for the first time, it doesn’t have everything. R packages are like the apps you can download onto your phone.

Consider an analogy to Instagram. If you have a new phone and want to share a photo with friends. You need to:

  1. Install the app: Since your phone is new and does not include the Instagram app, you need to download the app. You do this only once. (You might need to do this again in the future when there is an update to the app.)
  2. Open the app: After you’ve installed Instagram, you need to open it. You need to do this every time you use the app.

The process is very similar for an R package. You need to:

Installing versus loading an R package

FIGURE 0.4: Installing versus loading an R package

  1. Install the package: This is like installing an app on your phone. Most packages are not installed by default when you install R and RStudio. Thus if you want to use a package for the first time, you need to install it. Once you’ve installed a package, you likely won’t install it again unless you want to update it to a newer version.
  2. “Load” the package: “Loading” a package is like opening an app on your phone. Packages are not “loaded” by default when you start RStudio. You need to “load” each package you want to use every time you restart RStudio.

Let’s install three of the packages which we will need in the Primer. At the Console pane within RStudio, type:

install.packages(c("remotes", "tidyverse", "usethis"))

And press Return/Enter on your keyboard. Note you must include the quotation marks around the names of the packages. Note that these packages depend on other packages, which will be automatically installed.

One tricky aspect of this process is that R will occasionally ask you:

Do you want to install from sources the packages which 
need compilation? (Yes/no/cancel)

Unless you have a good reason not to, always answer “no” to this question.

R packages generally live in one of two places:

  • CRAN (rhymes with “clan”) for more mature, popular packages. Use install.packages(), as you did above.

  • Github for more experimental, less stable packages. Use remotes::install_github(). At the end of this section, we will install one package from Github.

Package loading

Recall that after you’ve installed a package, you need to “load” it. We do this by using the library() command.

For example, to load the remotes package, run the following code in the Console. What do we mean by “run the following code?” Either type or copy-and-paste the code into the Console and then hit the enter/return key.

After running the earlier code, a blinking cursor should appear next to the > symbol. (The > is generally referred to as the “prompt.”) This means you were successful and the remotes package is now loaded and ready to use. If, however, you get a red “error message” that reads:

Error in library(remotes) : there is no package called ‘remotes’

then you haven’t successfully installed the package. If you get this error message, make sure to install the remotes package before proceeding.

For historical reasons packages are also known as libraries, which is why the relevant command for loading them is library().

Note that R will occasionally ask you if you want to install some packages. Almost all the time you do want to, otherwise R would not be asking you.

Package use

You have to load each package you want to use every time you start RStudio. If you don’t load a package before attempting to use one of its features, you will see an error message like:

Error: could not find function

This is a different error message than the one you just saw about a package not having been installed yet. R is telling you that you are trying to use a function in a package that has not yet been loaded. R doesn’t know where to “find” the function you want to use.

Let’s install a package that is not available from CRAN: primer.tutorials. Copy and paste the following to the R Console:

library(remotes)
remotes::install_github("PPBDS/primer.tutorials")

Depending on your computer/browser/locale, this might fail, especially if the quotation marks you paste overturn “curly.” In that case, type the commands yourself.

Many other new packages will be installed, including primer.data, which provides the data sets we use in the Primer. It may take a few minutes. If something gets messed up, it is often useful to use the remove.packages() command to remove the problematic package and then install it again.

RStudio set up

You will thank us later if you run these commands now:

This changes the default value of RStudio to start each session afresh, with no junk leftover from your last session. This makes it much easier to create a reproducible analysis.

Additionally, go to the Files window, select More, and click “Show Hidden Files.”

This will show all of the hidden files that R has (in this case, they’re stored in R.proj.user). This becomes useful later on when we want to see all of the files that a folder has without hiding anything.

Creating R projects

Let’s go over the basics of R coding.

R is a programming language commonly used to analyze data. We can code in R in 2 ways:

  • By running R commands in the Console.

  • By creating an R project and running our R commands off of that.

So far, we’ve been using R commands without creating a project since we only needed to run a couple of R commands at a time. However, this isn’t very practical for more complicated projects that might need hundreds of lines of code. This is where we use R projects.

First off, create a new folder to put your projects in. Feel free to name it what you want, but make sure that there are no spaces or special characters. This creates problems down the line when we’re working with files.

Additionally, do NOT put the folder in OneDrive or a cloud file system. This prevents you from accessing certain files easily. Make sure to save the folder to your local drive.

Here's how to create a new projects folder on Windows.

FIGURE 0.5: Here’s how to create a new projects folder on Windows.

We can create an R Project by going to File -> New Project and following the dialogues. Make sure to keep your project in the folder that you just created.

For now, we won't be connecting the projects to GitHub, but you will need to do this for future projects.

FIGURE 0.6: For now, we won’t be connecting the projects to GitHub, but you will need to do this for future projects.

0.0.1 R Markdown Files

Now, we often use R to make graphs and other visualizations in order to find patterns within the data. However, it’s not very useful to just create a graph and then not explain what it’s actually showing. This is where R Markdown files come in.

R Markdown files, or .Rmd files, are a type of file that can generate reports from R code. Basically, you can write both the graph and the explanation about the graph within the same file. You can then knit the file in order to render the text and make it all look nice. Most of the time, we’ll be knitting the file to create HTML files that we can then use to create websites that contain our code.

You can learn more about how to write R Markdown files here.

First, create an R Markdown file by going to File -> New File -> New R Markdown. Feel free to title it whatever you wish and set the output format to “HTML.” This creates a new R Markdown file with some filler information.

This may look like a lot, but if you look closely you can see that it’s mostly just plain text but with a few special markings in between. These markings designate what the text actually does and what section it’s in. Here’s a basic breakdown of an R Markdown file.

Each section is important.

  • The YAML header is the basic information about the document, like your name, the date, and what it’s supposed to output.

  • The Setup code chunk is a section of R code that sets certain options about the file, such as whether it should show code chunks or not.

  • The R Markdown Header makes the text bigger and look like a title.

  • The R Markdown Text is the basic text information that you want in your report.

  • And finally, the R code chunk allows you to put R code into your report.

All of these sections combine to create a nice looking R report. Just knit your document by pressing the blue “Knit” button in the top left.

This creates a cool looking webpage that has all of the things your R Markdown file had, but rendered so that it looks nicer.

You can also add formatting to your text in R Markdown files by bolding (**), italicizing (*), or striking through (~) the text. You can find a full R Markdown cheatsheet here.

Finally, you can code in R Markdown by adding a code chunk. This allows you to put R code into your R Markdown file, which then runs like normal R code but displays in the R Markdown report. That means that you can create graphs and then add your own explanations to them within the same file.

You can see an example of a code chunk in the default R Markdown file.
It looks like this:

```{r pressure, echo=FALSE}
plot(pressure)
```

And knits like this:

Overall, R Markdown files allow you to generate cool looking reports with special formatting, letting you put both your analysis and your graph together.

Tutorials

For each chapter of the textbook, there are one or more tutorials available in the primer.tutorials package. In order to access these tutorials, you should run library(primer.tutorials) in the R Console.

You can access the tutorials via the Tutorial pane in the top right tab in RStudio. Click “Start tutorial” for the “Getting Started” tutorial. If you don’t see any tutorials, try clicking the “Home” button – the little house symbol with the thin red roof in the upper right. You may need to restart your R session. Click on the “Session” menu and select “Restart R.”

In order to expand the window, you can drag and enlarge the tutorial pane inside RStudio. In order to open a pop-up window, click the “Show in New Window” icon next to the home icon.

You may notice that the Jobs tab in the lower left will create output as the tutorial is starting up. This is because RStudio is running the code to create the tutorial. If you accidentally clicked “Start Tutorial” and would like to stop the job from running, you can click the back arrow in the Jobs tab, and then press the red stop sign icon.

Your work will be saved between RStudio sessions. You can complete the tutorial in multiple sittings. Once you have completed the tutorial, follow the instructions on the tutorial Submit panel and upload the resulting file to Canvas.

Tutorials are graded pass/fail. It is very hard to fail. As long as you make an honest attempt, you will pass easily. All of the tutorials associated with the book have prefix numbers in their identification strings: primer.tutorials: 000-getting-started, primer.tutorials: 011-rstudio-and-friends, primer.tutorials: 022-wrangling-overview, and so on. The initial two numbers in the string — 01, 02, et cetera — indicate the chapter in the Primer with which that tutorial is associated.

Now what? There are a few ways you can close out of a tutorial safely so you can quit your RStudio session.

  • If you clicked “Show in new window” and were working on the tutorial in a pop-up window, simply X out of the pop-up window.
  • If you were working on the tutorial inside the Tutorial pane of RStudio, simply press the red stop sign icon.

0.0.2 Basic Commands

Before we get started, there are a few terms, tips, and tricks that you should know before getting started with R.

  • Functions: these perform tasks by taking an input called an argument and returning an output. Take a look at the example below.
sqrt(64)
## [1] 8

sqrt() is a function that gives us the square root of the argument. 64 is the argument. Therefore, the output should be 8. Try it for yourself in the console!

  • Help files: these provide documentation for functions and datasets. You can bring up help files by adding a ? before the name of the object then run this in the console. The ggplot2 package is where a lot of the functions we will use for graphing will be located. Run ?ggplot2 in the console and see what pops up.

  • Code comments: these are text placed after a # symbol. Nothing will be run after a # symbol, which is useful when you include comments in your code, as you always should. The image below shows what a code comment looks like.

knitr::include_graphics("00-getting-started/images/code-comment.png")
  • Errors, warnings, and messages: these generally reported in a red font. When there is an error, the code will not run. Read (and/or Google) the message and try fix it. Warnings don’t prevent code from completing. For example, if you create a scatterplot based on data with two missing values, you will see this warning:

    Warning message:
    Removed 2 rows containing missing values (geom_point).  

    Messages are similar. In both cases, you should fix the underlying issue until the warning/message goes away.

Summary

You should have done the following:

  • Installed the latest versions of R and RStudio.

  • Installed, from CRAN, these three packages:

install.packages(c("remotes", "tidyverse", "usethis"))
  • Installed, from Github, this package:
remotes::install_github("PPBDS/primer.tutorials")
  • Set up your RStudio preferences:
usethis::use_blank_slate()
  • Created a project folder to place your R projects in.

  • Created a new R project and learned how to use R Markdown files.

  • Learned some basic terminology that will help you when you read the rest of the Primer.

Let’s get started.