# Tools

The chapter will be broken into the following sections. Read whichever ones are relevant.

## Working with the terminal

The Terminal is a very powerful window because it allows you to interact with your computer’s filesystem directly. Let’s open up the Terminal tab on the left window and start learning how to use the Terminal.

### pwd: Working directory

The first question you may have about working in the Terminal might be: If I can’t see the folders, how do I know where I am? Well that’s a great place to start learning the Terminal. To see what our current folder is, we type pwd (print working directory):

We are currently in a directory (or folder) called Yao, which is itself in a directory named /Users. The forward slash in front of “Users” tells us that this directory sits directly under the root, the top level of the filesystem.
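As a minimal sketch of pwd in action, the commands below create a throwaway directory, move into it, and print its full path. The scratch directory made with mktemp is an assumption so that the example runs anywhere; on your own machine pwd will print your own path, such as /Users/Yao.

```shell
# Make a throwaway directory so the example is safe to run anywhere.
# (Assumption: this scratch path stands in for /Users/Yao from the text.)
scratch=$(mktemp -d)
cd "$scratch"

# pwd prints the absolute path of the directory we are in.
# Absolute paths always begin with "/", the root of the filesystem.
here=$(pwd)
echo "$here"
```

Whatever the path looks like, it starts with a forward slash and ends with the name of the folder you are currently in.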

### ls: Seeing items in the directory

To see the items in our current folder, we use the command ls (list). Type ls in the terminal and hit return/enter. You should see something like this:

Notice that this lists exactly the same items as the bottom right window in RStudio. The Terminal is just another way to interact with your computer’s filesystem. Anything you can do normally with your mouse/trackpad, like opening a folder, you can also do in the Terminal.
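Here is a small, hedged sketch of ls and two of its common options. The folder and file names (Desktop, Documents, notes.txt) are made up for illustration and live in a scratch directory, so nothing on your real computer is touched.

```shell
# Build a small scratch folder with a few hypothetical items in it.
scratch=$(mktemp -d)
cd "$scratch"
mkdir Desktop Documents
touch notes.txt

# Plain ls shows the visible items in the current directory.
ls

# -l adds details such as size and modification time;
# -a also shows hidden items whose names start with a dot.
ls -l
ls -a

# Keep the plain listing in a variable so we can inspect it later.
listing=$(ls)
```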

### cd: Changing directories

To move from one directory to another, we use cd (change directory). We’ll be using cd to change into the Desktop folder.

To change into the Desktop directory, we type cd Desktop/. A helpful hint: after you type the first few letters of a folder or file name, you can hit tab and the computer will autocomplete the name. Try it! Type cd Desk and then hit tab to autocomplete the name!

If you type ls again, you can see all the items on your Desktop listed.

To go back to the previous folder (that is, the directory above), we can type cd .. where the two periods represent one level above. You can see the same hierarchy when you browse folders on a Mac.
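The round trip can be sketched as follows. The Desktop folder here is created inside a scratch directory, which is an assumption made so the example does not depend on your real Desktop.

```shell
# Set up a scratch directory containing a folder named Desktop.
scratch=$(mktemp -d)
mkdir "$scratch/Desktop"
cd "$scratch"

# Move down into Desktop, then check where we are.
cd Desktop/
inside=$(pwd)
echo "$inside"

# Two periods mean "the directory one level above".
cd ..
above=$(pwd)
echo "$above"
```

The second path printed is the first one with the trailing Desktop removed, which is exactly what moving up one level means.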

### mkdir and rmdir: Make and remove a directory

Now that we’re in the Desktop folder, let’s get set up to stay organized for Gov 1005. Staying organized is critical when working with many data projects. So, using mkdir Gov-1005 (make directory), we can create a folder exclusively for Gov 1005.

Now, when we type ls, we can see our new folder created. Note that we used a hyphen between Gov and 1005. This is because the Terminal can’t recognize spaces unless you put \ before it, like so: mkdir Gov\ 1005. Never use spaces or other weird characters in file or directory names.

To remove a folder, use rmdir (remove directory). We won’t be using this right now because we don’t need to remove anything.
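A short sketch of both commands, run inside a scratch directory rather than your real Desktop. The folder name old-project is hypothetical and exists only to give rmdir something to remove.

```shell
# Work in a scratch directory so nothing real is created or removed.
scratch=$(mktemp -d)
cd "$scratch"

# Create the course folder, then confirm it exists.
mkdir Gov-1005
ls

# rmdir only removes EMPTY directories, which makes it a safe default.
mkdir old-project
rmdir old-project
ls
```

If a directory still has files in it, rmdir refuses and prints an error instead of deleting anything.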

### touch: Creating files

In order to experiment with the next few commands in the Terminal, we’ll need a test file. Type touch test.txt to create a test file.

And, of course, we can see that the test.txt file has been created using ls.

### mv: Moving files

Oh no! We created our test.txt file, but it should be in our Gov-1005 folder. Right now it’s on the desktop. This happened because, while we created the Gov-1005 folder using mkdir, we forgot to move into it by using cd Gov-1005/. But no worries, we can move the file to that folder using mv.

When using mv, the first thing you type after mv is the file you want to move. The next thing is the location where you want to move it to. In our case we want to move test.txt to Gov-1005/, so we type mv test.txt Gov-1005/. After we do this, we can use cd to enter the Gov-1005 folder and then use ls to see that our test.txt file was successfully moved into the Gov-1005 directory.
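The whole sequence, from creating the file to moving it, might look like this. It runs in a scratch directory (an assumption for safety) instead of on your Desktop.

```shell
# Recreate the situation from the text inside a scratch directory.
scratch=$(mktemp -d)
cd "$scratch"
mkdir Gov-1005

# Create an empty file in the wrong place...
touch test.txt

# ...then move it: first the file to move, then the destination.
mv test.txt Gov-1005/

# The file is gone from here and now lives in Gov-1005.
ls
ls Gov-1005
```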

### cp: Copying files

Copying files is very similar to moving files in the Terminal. Using the previous example, if we wanted to copy test.txt into the Gov-1005 folder but not delete the original test.txt file, we just replace mv with cp (copy):

cp test.txt Gov-1005/
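To see that cp leaves the original in place, here is a small sketch in a scratch directory (again an assumption, so your real files are untouched).

```shell
# Set up a scratch directory with a folder and a file to copy.
scratch=$(mktemp -d)
cd "$scratch"
mkdir Gov-1005
touch test.txt

# Copy: first the file to copy, then the destination.
cp test.txt Gov-1005/

# Unlike mv, the original is still here, and a copy is in Gov-1005.
ls
ls Gov-1005
```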

### rm: Removing files

Ok, we are at the last Terminal command that this book will be teaching you. So, we’re done with this test.txt file. Let’s remove it with rm (remove):

Make sure you are in the Gov-1005 folder before you type rm test.txt! Using ls, we can see that our test file is now gone.
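A cautious way to use rm is to check your location and the folder contents first. The sketch below does this in a scratch directory, which is an assumption made so that nothing of yours is deleted.

```shell
# Recreate Gov-1005 with a test file inside a scratch directory.
scratch=$(mktemp -d)
mkdir "$scratch/Gov-1005"
touch "$scratch/Gov-1005/test.txt"
cd "$scratch/Gov-1005"

# Confirm where we are and what is here BEFORE deleting anything.
pwd
ls

# rm deletes immediately; the file does not go to the Trash.
rm test.txt

# Nothing should be left in the folder now.
remaining=$(ls)
echo "$remaining"
```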

Congrats! You are now able to do most basic tasks with the Terminal! If you want to learn more Terminal commands, check out Sean Kross’s The Unix Workbench.

## Git, GitHub, and RStudio

This next section focuses on connecting GitHub with RStudio using Git. Why do we care about GitHub? Think of it as a Google Drive for all your R code and projects. If your computer blows up, GitHub will save all your R work just as Google Drive saves your Expos paper.

### Installing Git

The first step to using GitHub is installing Git on your computer. You may already have it. To check, go to your Terminal and type git --version. If Git is installed, this command returns the version you have. If you get an error, you can download and install Git here.
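If you would rather get a clear yes-or-no answer than read an error message, the check can be sketched like this. The wording of the two messages is our own; only git --version comes from the text above.

```shell
# command -v succeeds only if a program named git is on the PATH.
if command -v git >/dev/null 2>&1; then
  git_status="Git is installed: $(git --version)"
else
  git_status="Git is not installed yet"
fi
echo "$git_status"
```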

### GitHub accounts

After installing Git, you’ll need a GitHub account. This is like a Google account. However, the one difference with GitHub is that your account is visible to the public. You want to pick a name carefully. It should be professional since you will be sending potential employers a link to your GitHub account in the near future. Check out some former Gov 1005 students’ GitHub profiles for inspiration:

Once you have a GitHub account, you are ready to connect Git and RStudio to this account. Type the following two commands in your Terminal pane. Replace Your Name with your name and your@email.com with the email you used to sign up for GitHub.

    git config --global user.name "Your Name"
    git config --global user.email "your@email.com"

### GitHub repositories

You are now ready to create a GitHub repository (repo). The GitHub repo is similar to a Google Drive folder. To make your first repo, make sure you are signed in and then go to the GitHub homepage and click on the green new button on the left.

You will then want to choose a good name for the repo and add a brief description. Here we will use productivity. You can choose to make the repo public or private, but we recommend that you only make a repo public if it is important for the world to see. This keeps your public GitHub profile clean and professional. This repo should probably be private. Let’s also add a README file for our repo. This is a document where you can add more information.

You now have your first repo on GitHub. The next step will be to download it to your computer — a process often known as “cloning” — and start editing and syncing using Git. To do this, we’ll need to copy the link to the repo and then use it in RStudio. Once again, the green button is your friend. Click on it and then copy the link shown. You can use the clipboard button on the right to automatically copy it.

### Connecting GitHub to RStudio

We are now ready to connect your productivity repo to RStudio. With the link to the productivity repo copied, we can go back to RStudio and begin with a new project. Go to File, then New Project:

Next, you’ll need to go through these steps to create the project: choose Version Control, then Git, then paste your link from GitHub and click Create Project.

Congrats! You’ve linked your productivity repo to RStudio. Note that RStudio will ask you for a location in which to place this and other projects. We recommend creating a folder on your desktop called “projects” and placing all your RStudio projects there. Don’t just scatter them across your computer in a mess. There will be dozens of them. Be organized!

### Updating .gitignore

The first thing you should always do when working with a new repo is update the .gitignore file. You can open this file from the bottom right window under the Files tab. This file lists all the files that you don’t want uploaded to GitHub. This can come in handy when you are working with big datasets or files with private information. In our case, we want to add the productivity.Rproj file to the .gitignore list.

This file is your private project file, and usually you don’t want it uploaded to GitHub. So, in .gitignore, you’ll want to add *.Rproj on its own line. The * tells your computer that we want to prevent all files ending in .Rproj from being uploaded. We could also just add productivity.Rproj.
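The same change can also be made from the Terminal. This is an alternative sketch, not the RStudio workflow described here; the scratch directory and the empty productivity.Rproj file are stand-ins for your real project folder.

```shell
# Stand-in for a project folder that contains an .Rproj file.
scratch=$(mktemp -d)
cd "$scratch"
touch productivity.Rproj

# Append the pattern as its own line; >> adds to the end of the file
# instead of overwriting it (and creates the file if it is missing).
echo "*.Rproj" >> .gitignore

# Show the result: one pattern per line.
cat .gitignore
```

The quotes around the pattern matter: without them, the shell would replace the * with matching file names before writing the line.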

Save the .gitignore file and you should see the productivity.Rproj file disappear from your Git tab in the top right window. If you don’t see any changes, click on the refresh button in the upper left.

The symbols in the Git tab are part of a “conversation” between you and Git. The “?” is Git’s way of saying: “There is a new file here. What do you want to do with it?” Adding a line to the .gitignore is your way of replying “Ignore that file.”

### Commit and Push

Now that we’ve updated our .gitignore file, we want to upload this new version to GitHub. To do this, first select the .gitignore file and then click on the Commit button in the Git window:

This will open a new window where you will write a commit message. This message is a very short note on what you’re adding/changing in the repo. In our case, we’ve updated the .gitignore so let’s write just that:

Press Commit. This is your way of telling Git “Yes, these are the files I want to upload. I’m committed.” Next, press Push. This pushes, or uploads, the files to GitHub. (You can probably guess what pull does, but we won’t be using that yet.)

Now, if you go to your GitHub repo and refresh the page, you can see that the .gitignore file has been uploaded with your commit message:

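To keep GitHub from asking for your login and password every time you push, you can store a personal access token. Start by issuing this R command:

usethis::create_github_token()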

This will, after you log in, bring you back to GitHub. Accept the defaults and press the Generate token button at the bottom. (You may need to change the Note if you have generated tokens before.) Copy the token which has been created. It will look something like:

8be3e800891425f8462c4491d9a4dbb5b1c1f35c

Then, issue this R command:

gitcreds::gitcreds_set()

Provide your token. After you start a new RStudio instance, GitHub should not ask you for your login/password again. Or it might just ask one more time. Seek help if this does not work.

Happy Git and GitHub for the useR is the best source for when Git or GitHub problems arise.

Before we leave the setup of Git and GitHub, you should be aware of updates scheduled for the coming months that will affect how you access your GitHub account. Currently, you can establish a connection between your account and your R session with your username, password, and email. The primary change will be the imposition of two-factor authentication (2FA) on top of the current setup. You can already edit your GitHub account settings, but we recommend that you familiarize yourself with the process without fully enabling 2FA if you wish to work on the cloud during this Spring 2021 iteration of the course. The 2FA sign-in requirement often disrupts the connection with the virtual R session and won’t allow you to save your work efficiently. Should you continue to learn and work with R and RStudio (we certainly encourage you to do so!), you should be aware of the “brownouts” in June and July of this year, designed to remind users to switch over to 2FA. In summary, none of this information invalidates the setup guides above. It serves as a memo should your future self struggle with complications related to these matters.

## PDF

Generating PDF files from RStudio is both easy and hard. It is easy because R Markdown is designed to produce files in a variety of output formats, including PDF. It is hard because, for RStudio to make PDF files, your computer must have a LaTeX installation. You have four options:

1. Making PDF files may just “work,” especially if you are using a Mac. Give it a try!

2. If it doesn’t just work, we strongly recommend using the tinytex R package. First, install the R package.

install.packages('tinytex')

Second, use the R package to install the underlying LaTeX distribution.

tinytex::install_tinytex()

Depending on your operating system, this may not work. But there should be an error message providing further instructions. Follow those instructions.

Restart R and everything should just work.

3. You can just generate an html file, open it in Chrome, select Print . . . from the drop-down menu. You will get a pop-up window. Click the down arrow to the right of Destination and choose Save as PDF in the drop-down menu. You’ll see a preview. Choose the Save as PDF option. This is not a convenient workflow but, if disaster strikes and the problem set is due in 10 minutes, it is a reasonable option.

4. You can install a full LaTeX installation yourself. Good luck! Don’t come to us for help.

## Style guide

Much of this material comes from the Tidyverse Style Guide. We will take off points on work submitted which violates these guidelines. In extremis, you may go against this advice, if you add a code comment in your work explaining your decision to do so.

Include comments in your code. Easy-to-understand chunks of code should not have comments. The code is the comment. But other code will merit many, many lines of comments, more lines than the code itself. In a given file, you should have about as many total lines of comments as you have lines of code.

Make your comments meaningful. They should not be a simple description of what your code does. The best comments are descriptions about why you did what you did and which other approaches you tried or considered. (The code already tells us what you did.) Good comments often have a “Dear Diary” quality: “I did this. Then I tried that. I finally chose this other thing because of reasons X, Y and Z. If I work on this again, I should look into this other approach.” Because of this, the structure is often a paragraph of comments followed by several lines of code.

Each line of a comment should begin with the comment symbol (a “hash”) followed by a single space: #. Code comments must be separated from code by one empty line on both sides. Format your code comments neatly. Ctrl-Shift-/ is the easiest way to do that. Name your R code chunks, without using weird characters or spaces. download_data is a good R code chunk name. Plot #1 is not.

Spelling matters. Comments should be constructed as sentences, with appropriate capitalization and punctuation.

### Graphics

Use captions, titles, axis labels and so on to make it clear what your tables and graphics mean.

Anytime you make a graphic without a title (explaining what the graphic is), a subtitle (highlighting a key conclusion to draw), a caption (with some information about the source of the data) and axis labels (with information about your variables), you should justify that decision in a code comment. We (try to) always include these items but there are situations in which that makes less sense. Ultimately, these decisions are yours, but we need to understand your reasoning.

Use your best judgment. For example, sometimes axis labels are unnecessary. Read Data Visualization: A practical introduction by Kieran Healy for guidance on making high quality graphics.

### Formatting

#### Long Lines

Limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. When calling functions, you can omit the argument names for very common arguments (i.e. for arguments that are used in almost every invocation of the function). Short unnamed arguments can also go on the same line as the function name, even if the whole function call spans multiple lines.

#### Whitespace

%>% should always have a space before it, and should usually be followed by a new line. After the first step in the pipe, each line should be indented by two spaces. This structure makes it easier to add new steps (or rearrange existing steps) and harder to overlook a step.

    # Good

    iris %>%
      group_by(Species) %>%
      summarize_if(is.numeric, mean) %>%
      ungroup() %>%
      gather(measure, value, -Species) %>%
      arrange(value)

    # Bad

    iris %>% group_by(Species) %>% summarize_all(mean) %>%
    ungroup %>% gather(measure, value, -Species) %>%
    arrange(value)

ggplot2 code is handled in a similar fashion. All commands after the initial invocation of ggplot() are indented.

    # Good

    diamonds %>%
      ggplot(aes(x = depth)) +
        geom_histogram(bins = 100) +
        labs(title = "Distribution of Depth",
             x = "Depth",
             y = "Count")

    # Bad

    diamonds %>%
    ggplot(aes(x = depth)) +
    geom_histogram(bins = 100) + labs(title = "Distribution of Depth",
    x = "Depth",
    y = "Count")

#### Commas

Always put a space after a comma, never before, just like in regular English.

    # Good

    x[, 1]

    # Bad

    x[,1]
    x[ ,1]
    x[ , 1]

#### Parentheses

Do not put spaces inside or outside parentheses for regular function calls.

    # Good

    mean(x, na.rm = TRUE)

    # Bad

    mean (x, na.rm = TRUE)
    mean( x, na.rm = TRUE )

#### Infix operators

Most infix operators (=, ==, +, -, <-, et cetera) should be surrounded by one space.

    # Good

    height <- (feet * 12) + inches
    mean(x, na.rm = TRUE)

    # Bad

    height<-feet*12+inches
    mean(x, na.rm=TRUE)

Other operators, like ::, :::, $, @, [, [[, ^, and :, should never be surrounded by spaces. The formula operator ~ is the exception: in a two-sided formula like y ~ a + b, it gets a space on each side.

    # Good

    sqrt(x^2 + y^2)
    df$z
    x <- 1:10
    y ~ a + b

    # Bad

    sqrt(x ^ 2 + y ^ 2)
    df $ z
    x <- 1 : 10
    y~a + b

You may add extra spaces if it improves alignment of = or <-.

    list(total = a + b + c,
         mean  = (a + b + c) / n)

Do not add extra spaces to places where space is not usually allowed.

### Messages/Warnings/Errors

R messages/warnings/errors should never appear in a submitted document. The right way to deal with these issues is to find out their cause and then fix the underlying problem. Students sometimes use “hacks” to make these messages/warnings/errors disappear. The most common hacks involve using code chunk options like message = FALSE, warning = FALSE, results = "hide", include = FALSE and others. Don’t do this, in general. A message/warning/error is worth understanding and then fixing. Don’t close your eyes (metaphorically) and pretend that the problem doesn’t exist.

There are some situations, however, in which, no matter what you try, you can’t fix the problem. In those few cases, you can use one of these hacks, but you must make a code comment directly below it, explaining the situation. The only exception is the “setup” chunk (included by default in every new Rmd) which comes with include = FALSE. In that chunk, no explanation is necessary, by convention.

## How to use RPubs

RPubs provides a free hosting service for your R work. To use it:

1. Begin by creating a new repository on GitHub. Then clone it to your computer. We are calling the repository “rpubs_example.” As before, put *.Rproj in your .gitignore file. This is to prevent your private project file from being uploaded to GitHub.

2. Start a new R Markdown file. Go to File -> New File -> R Markdown. For simplicity, leave the name “Untitled” and hit “OK.”

3. Save this file, again, as “Untitled” in your project directory.

4. Knit. You should see the knitted document.

5. Notice a blue icon in the upper right-hand corner that reads “Publish.” Click it.

6. You will be asked whether you want to publish to RPubs or RStudio Connect. Choose RPubs. You will get a reminder that all documents you publish on RPubs are publicly visible. Click “Publish.”

7. This will take you to the RPubs website. You will need to create an account. Follow the steps as prompted.

8. Add document details. Name your document. Add a meaningful slug. Otherwise you will end up with an ugly, long address you didn’t choose and can’t remember. You can leave the Description blank for simplicity of the exercise.

9. Hit “Continue,” et voilà! You have published your first document to RPubs!

10. There is one more important step. “rsconnect” contains files specific to your computer that you do not want to push to GitHub. Therefore, as with .Rproj files before, we want to add the rsconnect folder to the .gitignore file. Click on .gitignore, add it there and hit “Save.” You will see it disappear from your Git tab in the top right window. If you don’t see any changes, hit the Refresh button in the top right corner. Since you’ve updated your .gitignore file, now is a good time to commit and push your changes to your GitHub repository.

## How to get help

The best data science superpower is knowing how to ask a question. – Mara Averick

### Searching for Help with R

Google is your best friend. If you have a question about doing something in R, someone has probably had the same question and someone else has probably answered it online. Stack Overflow and RStudio Community are two of the best forums for finding and asking questions/solutions. Adding “R” and/or “tidyverse” to your keyword search helps find results more relevant to your exact question. Be as specific as possible in wording your question!

### Reproducible Examples

If you don’t find an answer to your question and you’re still stuck, then you should ask your own question on these forums! To get the best response, share a reproducible example with the community, which allows others to easily start where you left off. The reprex package allows you to do this easily.
First, install the reprex package. Then, load the package.

    install.packages("reprex")
    library(reprex)

Let’s now look at some problematic code. The data set “murders” from the package “dslabs” provides murder statistics, as well as population counts, for all states. Suppose we want to calculate the rate of murders in each state per 100k residents (number of murders/population * 10^5). However, suppose we have misspelled “population” in our code, which results in an error.

To use reprex, highlight the code and the necessary packages. Then copy the highlighted code by pressing Ctrl/Cmd + c. The code is now in the clipboard. In the Console, type reprex() and hit Enter/Return. reprex then automatically creates a reproducible example and places it in your clipboard.

Now all you have to do is go to your favorite R forum and paste your reproducible example into your question! You can post it into Slack or to RStudio Community in the same way.

## How to make a table

gt is an R package for creating elegant tables. First, we’ll create a gt summary table of some observations from the data. Second, we’ll run a regression and display the outcome using gtsummary, a companion package to gt which specializes in presenting the results of statistical models.

If you want to learn more about gt check this fantastic guide. Go here for the official gt package website. See here for an extensive guide to gtsummary.

Load the necessary libraries.

    library(tidyverse)
    library(primer.data)
    library(gt)

We set message=FALSE in the above code chunk to avoid showing all the ugly notes when these libraries are loaded.
Let’s pull some data which we will use in our table:

    x <- trains %>%
      select(gender, income, att_end) %>%
      slice(1:5)
    x

    ## # A tibble: 5 x 3
    ##   gender income att_end
    ##   <chr>   <dbl>   <dbl>
    ## 1 Female 135000      11
    ## 2 Female 105000      10
    ## 3 Male   135000       5
    ## 4 Male   300000      11
    ## 5 Male   135000       5

Create the simplest table with gt(), the key command:

    x %>%
      gt()

| gender | income | att_end |
|--------|--------|---------|
| Female | 135000 | 11 |
| Female | 105000 | 10 |
| Male | 135000 | 5 |
| Male | 300000 | 11 |
| Male | 135000 | 5 |

Now let’s make this more professional. gt offers a variety of functions to add features to a table.

You can add a title and subtitle using tab_header():

    x %>%
      gt() %>%
      tab_header(title = "Enos Data Observations",
                 subtitle = "Gender, Income, and End Attitude from the Trains Data")

Enos Data Observations

Gender, Income, and End Attitude from the Trains Data

| gender | income | att_end |
|--------|--------|---------|
| Female | 135000 | 11 |
| Female | 105000 | 10 |
| Male | 135000 | 5 |
| Male | 300000 | 11 |
| Male | 135000 | 5 |

By default, titles and other text cannot be formatted. If you want formatting, you must wrap the character string in a call to md(), where md stands for (M)ark(d)own. For example, here is a bolded title.

    x %>%
      gt() %>%
      tab_header(title = md("**Enos Data Observations**"),
                 subtitle = "Gender, Income, and End Attitude from the Trains Data")

**Enos Data Observations**

Gender, Income, and End Attitude from the Trains Data

| gender | income | att_end |
|--------|--------|---------|
| Female | 135000 | 11 |
| Female | 105000 | 10 |
| Male | 135000 | 5 |
| Male | 300000 | 11 |
| Male | 135000 | 5 |

We can use tab_spanner() to add spanner columns. The vars() argument takes in the variables that the spanner column will cover.

    x %>%
      gt() %>%
      tab_header(title = md("**Enos Data Observations**"),
                 subtitle = "Gender, Income, and End Attitude from the Trains Data") %>%
      tab_spanner(label = "Name of Spanner Column Here",
                  vars(gender, income))

**Enos Data Observations**

Gender, Income, and End Attitude from the Trains Data

| Name of Spanner Column Here | | att_end |
|--------|--------|---------|
| gender | income | |
| Female | 135000 | 11 |
| Female | 105000 | 10 |
| Male | 135000 | 5 |
| Male | 300000 | 11 |
| Male | 135000 | 5 |

From here on, our current table will not include a spanner column. If you wish to see more examples of spanner columns, go to Chapter 4.

You can change the column names using cols_label():

    x %>%
      gt() %>%
      tab_header(title = md("**Enos Data Observations**"),
                 subtitle = "Gender, Income, and End Attitude from the Trains Data") %>%
      cols_label(gender = "Gender",
                 income = "Income",
                 att_end = "End Attitude")

**Enos Data Observations**

Gender, Income, and End Attitude from the Trains Data

| Gender | Income | End Attitude |
|--------|--------|--------------|
| Female | 135000 | 11 |
| Female | 105000 | 10 |
| Male | 135000 | 5 |
| Male | 300000 | 11 |
| Male | 135000 | 5 |

Use tab_source_note() to cite the source of the data or to create a caption. This function is not exclusively for providing a source (though it’s a handy way to do so) and can be used to display any text you’d like:

    x %>%
      gt() %>%
      tab_header(title = md("**Enos Data Observations**"),
                 subtitle = "Gender, Income, and End Attitude from the Trains Data") %>%
      cols_label(gender = "Gender",
                 income = "Income",
                 att_end = "End Attitude") %>%
      tab_source_note("Source: Ryan Enos")

**Enos Data Observations**

Gender, Income, and End Attitude from the Trains Data

| Gender | Income | End Attitude |
|--------|--------|--------------|
| Female | 135000 | 11 |
| Female | 105000 | 10 |
| Male | 135000 | 5 |
| Male | 300000 | 11 |
| Male | 135000 | 5 |

Source: Ryan Enos

Using md() again, we can italicize the name of the Enos study in the caption:

    x %>%
      gt() %>%
      tab_header(title = md("**Enos Data Observations**"),
                 subtitle = "Gender, Income, and End Attitude from the Trains Data") %>%
      cols_label(gender = "Gender",
                 income = "Income",
                 att_end = "End Attitude") %>%
      tab_source_note(md("Source: Ryan Enos, *Causal Effect of Intergroup Contact on Exclusionary Attitudes*"))

**Enos Data Observations**

Gender, Income, and End Attitude from the Trains Data

| Gender | Income | End Attitude |
|--------|--------|--------------|
| Female | 135000 | 11 |
| Female | 105000 | 10 |
| Male | 135000 | 5 |
| Male | 300000 | 11 |
| Male | 135000 | 5 |

Source: Ryan Enos, *Causal Effect of Intergroup Contact on Exclusionary Attitudes*

Now that the table structure looks good, we want to format the numbers themselves. Let’s add some dollar signs to the income column using fmt_currency(). This function also adds commas (if you want commas without dollar signs use fmt_number()). The vars() within fmt_currency() denotes the variable being formatted as a currency:

    x %>%
      gt() %>%
      tab_header(title = md("**Enos Data Observations**"),
                 subtitle = "Gender, Income, and End Attitude from the Trains Data") %>%
      cols_label(gender = "Gender",
                 income = "Income",
                 att_end = "End Attitude") %>%
      tab_source_note(md("Source: Ryan Enos, *Causal Effect of Intergroup
                          Contact on Exclusionary Attitudes*")) %>%
      fmt_currency(columns = vars(income),
                   decimals = 0)

**Enos Data Observations**

Gender, Income, and End Attitude from the Trains Data

| Gender | Income | End Attitude |
|--------|--------|--------------|
| Female | \$135,000 | 11 |
| Female | \$105,000 | 10 |
| Male | \$135,000 | 5 |
| Male | \$300,000 | 11 |
| Male | \$135,000 | 5 |

Source: Ryan Enos, *Causal Effect of Intergroup Contact on Exclusionary Attitudes*

Note that the line break between “Intergroup” and “Contact” in the source note does not affect or break up the text displayed by md().

### Regression tables

We can make a gt table with a stan_glm() regression object. Key to this is the gtsummary package and its tbl_regression() function.

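    library(rstanarm)
    library(broom.mixed)
    library(gtsummary)

    # Model end attitude as a function of party. refresh = 0 suppresses the
    # sampler's progress output.
    fit2 <- stan_glm(att_end ~ party, data = trains, refresh = 0)

    # Build the regression table, then convert it with as_gt() so that gt
    # functions like tab_header() and tab_source_note() can be applied.
    tbl_regression(fit2,
                   intercept = TRUE,
                   estimate_fun = function(x) style_sigfig(x, digits = 2)) %>%
      as_gt() %>%
      tab_header(title = "The Effect of Party on End Attitude") %>%
      tab_source_note(md("Source: Ryan Enos, *Causal Effect of Intergroup
                          Contact on Exclusionary Attitudes*"))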
The Effect of Party on End Attitude

| Characteristic | Beta | 95% CI¹ |
|----------------|------|---------|
| (Intercept) | 8.8 | 8.2, 9.3 |
| party | | |
| Democrat | | |
| Republican | 2.2 | 0.84, 3.5 |

Source: Ryan Enos, *Causal Effect of Intergroup Contact on Exclusionary Attitudes*

¹ CI = Confidence Interval