Style Guide

Much of this material comes from the Tidyverse Style Guide. In extremis, you might go against this advice, if you should add a code comment in your work explaining your decision to do so.

Comments

Include comments in your code. Easy-to-understand chunks of code should not have comments. The code is the comment. But other code will merit many, many lines of comments, more lines than the code itself. In a given file, you should have about as many total lines of comments as you have lines of code.

Make your comments meaningful. They should not be a simple description of what your code does. The best comments are descriptions about why you did what you did and which other approaches you tried or considered. (The code already tells us what you did.) Good comments often have a “Dear Diary” quality: “I did this. Then I tried that. I finally chose this other thing because of reasons X, Y and Z. If I work on this again, I should look into this other approach.” Because of this, the structure is often a paragraph of comments followed by several lines of code.

Each line of a comment should begin with the comment symbol (a “hash”) followed by a single space: #. Code comments must be separated from code by one empty line on both sides. Format your code comments neatly. Place your cursor in the comment block and hit Ctrl-Shift-/ to format the comment automatically. label your R code chunks, without using weird characters or spaces. download_data is a good R code chunk label. Plot #1 is not.

Spelling matters. Comments should be constructed as sentences, with appropriate capitalization and punctuation.

Graphics

Use titles, subtitles, axis labels, captions and so on to make it clear what your graphics mean.

Anytime you make a graphic without a title (explaining what the graphic is), a subtitle (highlighting a key conclusion to draw), a caption (with some information about the source of the data) and axis labels (with information about your variables), you should justify that decision in a code comment. We (try to) always include these items but there are situations in which doing so makes less sense. Ultimately, these decisions are yours, but the readers of your code (including future-you) should understand your reasoning.

Use your best judgment. For example, sometimes axis labels are unnecessary. Read Data Visualization: A practical introduction by Kieran Healy for guidance on making high quality graphics.

Formating

Long Lines

Limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. When calling functions, you can omit the argument names for very common arguments (i.e. for arguments that are used in almost every invocation of the function). Short unnamed arguments can also go on the same line as the function name, even if the whole function call spans multiple lines.

Whitespace

|> should always have a space before it, and should usually be followed by a new line. After the first step in the pipe, each line should be indented by two spaces. This structure makes it easier to add new steps (or rearrange existing steps) and harder to overlook a step.

Show the code

# Good

iris |>
  select(Species, Sepal.Length) |> 
  summarise(avg = mean(Sepal.Length),
            .by = Species) |> 
  arrange(avg)

# Bad

iris |> select(Species, Sepal.Length) |> summarise(avg = mean(Sepal.Length),
            .by = Species) |> arrange(avg)

ggplot2 code is handled in a similar fashion. All commands after the initial invocation of ggplot() are indented.

Show the code

# Good

diamonds |> 
  ggplot(aes(x = depth)) +
    geom_histogram(bins = 100) +
    labs(title = "Distribution of Depth",
         x = "Depth",
         y = "Count")

# Bad

diamonds |> 
ggplot(aes(x = depth)) +
geom_histogram(bins = 100) + labs(title = "Distribution of Depth",
         x = "Depth",
         y = "Count")

Commas

Always put a space after a comma, never before, just like in regular English.

Show the code

# Good

x[, 1]

# Bad

x[,1]
x[ ,1]
x[ , 1]

Parentheses

Do not put spaces inside or outside parentheses for regular function calls.

Show the code

# Good

mean(x, na.rm = TRUE)

# Bad

mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )

Infix operators

Most infix operators (=, ==, +, -, <-, ~, et cetera) should be surrounded by one space.

Show the code

# Good

height <- (feet * 12) + inches
mean(x, na.rm = TRUE)
y ~ a + b


# Bad

height<-feet*12+inches
mean(x, na.rm=TRUE)
y~a + b

Other operators — like ::, :::, $, @, [, [[, ^, and : — should never be surrounded by spaces.

Show the code

# Good

sqrt(x^2 + y^2)
df$z
x <- 1:10

# Bad

sqrt(x ^ 2 + y ^ 2)
df $ z
x <- 1 : 10

You may add extra spaces if it improves alignment of = or <-.

Show the code

list(total = a + b + c,
     mean = (a + b + c) / n)

Do not add extra spaces to places where space is not usually allowed.

Messages/Warnings/Errors

R messages/warnings/errors should never appear in a submitted document. The right way to deal with these issues is to find out their cause and then fix the underlying problem. Students sometimes use “hacks” to make these messages/warnings/errors disappear. The most common hacks involve using code chunk options like message = FALSE, warning = FALSE, results = "hide", include = FALSE and others. Don’t do this, in general. A message/warning/error is worth understanding and then fixing. Don’t close your eyes (metaphorically) and pretend that the problem doesn’t exist. There are some situations, however, in which, no matter what you try, you can’t fix the problem. In those few cases, you can use one of these hacks, but you must make a code comment directly below it, explaining the situation. The only exception is the “setup” chunk (included by default in every new Rmd) which comes with include = FALSE. In that chunk, no explanation is necessary, by convention.