There are no questions here. There are only simple instructions.
Tutorials are not challenging. They are confidence-building.
Create the shallowest possible learning curve.
Every word matters. Never waste a student’s time.
Drop some knowledge with each exercise.
Introduction
This document describes the best way to write R tutorials using the learnr package. The most common use case is a tutorial which covers the material in an assigned textbook, as with the r4ds.tutorials package for R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund.
Instructors assign textbook readings to students. We want our students to read and, perhaps more importantly, go through the associated code, typing it in and confirming what it does. Sadly, students almost never do so. Fortunately, the tutorial.helpers package provides tools for ensuring that students type in all the assigned code.
Imagine the shallowest possible learning curve. Almost every student should be able to answer almost every exercise, albeit perhaps with the help of a hint. There are no hard questions. In fact, there really aren’t any questions at all. Instead, there are instructions: Do one thing, then the next, and then the next.
Almost all exercises feature a knowledge drop, a bit of information, rarely more than two sentences, provided after the student has answered the question.
Assume that you are giving the student a private lesson. You ask them a question. They give you an answer. What would you say to them next? What do you want to teach them, given that context?
We are building a “pit of success.” Generally, students don’t do the assigned reading, at least in a large class. However, they will complete required work. They will do the assigned tutorials. Our promise: If you complete the tutorials, you will learn the material. There is simply no way not to.
Set Up
Tutorials themselves live in a directory within `inst/tutorials` in whichever package you are working on. We recommend that this directory name be a combination of a prefix number (which indicates the week/chapter with which a tutorial is associated and/or the order in which to do it) and a name, which corresponds to the `id` of the tutorial. Within each directory is an R Markdown file and, sometimes, other material like an `images` or `data` directory. The prefix number determines the order in which tutorials appear in the Tutorial tab. By default, we name the R Markdown file `tutorial.Rmd`, but any name works as long as the file has the appropriate YAML header.
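As a rough sketch of the layout described above, a package might be organized as follows. The package name and the two tutorial directories are hypothetical, chosen only for illustration.

```text
your.package/
├── DESCRIPTION
└── inst/
    └── tutorials/
        ├── 01-getting-started/
        │   ├── tutorial.Rmd
        │   ├── data/
        │   └── images/
        └── 02-data-files/
            └── tutorial.Rmd
```

Under this naming scheme, the `id` values of the two tutorials would be `getting-started` and `data-files`.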
To create a new tutorial, use `File -> New File -> R Markdown...`. Choose the “From Template” option and then select “Helpers Tutorial” from the tutorial.helpers package. Follow the instructions.
The `id` value is important. It should be the same as the directory in which the tutorial is located, but with any leading numbers removed. It is used for the name of the answer file which students save at the end of the tutorial.
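For orientation, here is a sketch of a typical learnr YAML header for a tutorial stored in a directory named `031-data-files`. The title, author, and description are placeholders, and the header produced by the “Helpers Tutorial” template may differ in its details.

```yaml
---
title: "Data files"
author: "Your Name"
tutorial:
  id: "data-files"            # directory name with the leading number removed
output:
  learnr::tutorial:
    progressive: true
    allow_skip: true
runtime: shiny_prerendered
description: "Tutorial covering the data files chapter."
---
```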
Note that tutorials must be R Markdown documents, meaning that their suffix is `.Rmd`. You cannot (yet) use Quarto documents with tutorials. Fortunately, most of what you need which works in Quarto also works in R Markdown. The main difference is that code chunk options appear within the `{}`. Don’t worry about this detail. Just use the provided RStudio Addin functions.
There is a `setup` code chunk at the top of a tutorial. You must have `library(learnr)` and, if you use our tools, `library(tutorial.helpers)` in this chunk. Our template also provides useful settings for various options.
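A minimal sketch of such a setup chunk is shown here. The particular option values are assumptions for illustration, not a copy of the template; `tutorial.exercise.timelimit` is a learnr option and `out.width` is a knitr chunk option.

````markdown
```{r setup, include = FALSE}
library(learnr)
library(tutorial.helpers)
library(tidyverse)

# Hide the code of ordinary (non-exercise) chunks by default.
knitr::opts_chunk$set(echo = FALSE)

# Scale figures and images relative to the width of the page.
knitr::opts_chunk$set(out.width = '90%')

# Stop any single exercise from running for more than 60 seconds.
options(tutorial.exercise.timelimit = 60)
```
````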
Warning: You must ensure that any library used in the tutorial is explicitly loaded in this setup chunk. Almost every tutorial makes use of functions from the tidyverse package, so be sure to load this. Unfortunately, nothing in our test suite captures the common error of using library X in the tutorial code and forgetting to load it in the setup chunk.
We recommend always including a question which requires students to load any library used in the tutorial, other than learnr and tutorial.helpers. This is good for several reasons. First, students are always forgetting to load libraries. More practice helps. Second, a load-library question provides a good occasion for a knowledge drop. Third, a load-library question should include a test case code chunk which loads the library. This test will only pass if the library is loaded by us in the `setup` chunk.
If your tutorials are part of an R package, then you should ensure that tutorial.helpers is included under Imports and that any library loaded in a tutorial is, at least, included under Suggests.
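As an illustration, the relevant fields of a DESCRIPTION file might look like this. The packages listed under Suggests are assumptions; list whichever libraries your own tutorials load.

```text
Imports:
    tutorial.helpers
Suggests:
    knitr,
    learnr,
    tidyverse
```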
Structure
The beginning of every tutorial includes the `copy-code-chunk` and the `info-section` code chunks. The tutorial is then divided into different topics that appear as side panels. The first topic is the “Introduction” and the last is the “Summary”.
Within the topics, other than the Introduction and Summary, there is a series of exercises which can include writing code or writing text. At the end of the tutorial, there is a `download-answers` code chunk which provides students with instructions on how to download a copy of their answers.
The Introduction portion is two to four sentences about the main topics covered in the tutorial. Why are we here? What will students get out of giving you 90 minutes of their lives? What functions/techniques will they learn?
The Summary portion is two to four sentences which bring the lessons of the tutorial together for the student. What do they know now that they did not know before? What are the most important functions/techniques covered? This should be very similar to the Introduction. You made a promise as to what they would learn. You kept that promise.
If there are one or two other key resources about the topic of the tutorial, then those resources should be mentioned somewhere in the tutorial and also in the Summary.
Anything typed at the keyboard belongs in `backticks` (not “quotation marks”), except for package names, which are always bolded. Function names always include the parentheses: `read_csv()`, not `read_csv`. Example: the `+` sign is used to connect `ggplot()` components when using the ggplot2 library.
Ensure that the first few questions always require students to load any libraries which are used in the tutorial. That is, look at all the libraries you load in the setup chunk. (Try not to have too many of them.) All of them, except for learnr and tutorial.helpers, merit an exercise which requires the student to type `library(package.name)`. This ensures that students get in the practice of loading libraries. It also provides an occasion to drop some knowledge. Don’t forget that all libraries you load should be included in the DESCRIPTION file (if the tutorial is part of a package), probably under Suggests.
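A load-library exercise along these lines might be sketched as follows. The chunk labels and the wording of the instruction are illustrative, not taken from an existing tutorial.

````markdown
### Exercise 1

Load the **tidyverse** package with `library()`.

```{r load-1, exercise = TRUE}
```

```{r load-1-hint-1, eval = FALSE}
library(...)
```

```{r load-1-test, include = FALSE}
library(tidyverse)
```

###

Loading **tidyverse** attaches several core packages at once, including **ggplot2** and **dplyr**.
````

The test chunk only passes when the package is actually installed and loadable, which is the check described above.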
Topics
Tutorials are divided into topics that appear on the side panel. To create these topics, we include a double hash (##) before the text for it to show up as a side panel. This is also called the topic title. Use sentence case. On the line after the topic title, put three hashes. This ensures that students will see the introductory text before they see the first exercise.
The one topic which only has the double hash (##) and not a triple hash (###) on the next line is the Summary topic, since there are no exercises which follow the summary text.
Each topic begins with a sentence or two about what this group of exercises is trying to accomplish. Example:
## Interacting with sites with `GET()`
###
In order to get data from an API, we use the **httr** package. The package is designed to imitate standard HTTP in R. Read more about HTTP [here](https://www.jmarshall.com/easy/http/).
### Exercise 1
The link will be formatted correctly once the tutorial is knitted. Topic introductions will sometimes have two parts: the introductory text as above and a plot which will be replicated in this portion of the tutorial. Those two parts are generally separated by a triple hash.
After the last exercise in a topic, you should put a triple hash and then give a two sentence summary about what this topic accomplished. A topic is a 20 minute transfer of knowledge from you to the student. At the beginning, you mentioned its purpose. Conclude by tying things back to that original purpose. Often, these “purposes” will be fairly trivial: You promised to go through an example of a scatter plot and, in fact, you did. And that is OK! We are not writing poetry. Not every topic leads to salvation.
One or two high quality links, specifically relevant to this topic, should be included/explained at either the beginning or end of a topic, unless the topic is very short.
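Putting these pieces together, the overall shape of a topic can be sketched as below. The topic title is hypothetical and the `...` lines stand in for the text and code chunks of each exercise.

````markdown
## Scatter plots

###

A sentence or two about what this group of exercises will accomplish.

### Exercise 1

...

### Exercise 2

...

###

A two sentence summary which ties the exercises back to the purpose stated at the start of the topic.
````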
Exercises
Each topic is composed of a series of numbered exercises.
Flow
Each exercise should have a flow which requires that students hit the “Continue” button at least once.
Begin with a Start which is a sentence or two of knowledge and/or the question itself. If the length of the Start text is longer than one or two lines, then do not place the question code chunk in the same part. Instead, the Start includes a triple hash, thereby creating the Continue button. If the length of the text is short enough that students are willing to read it (at most two sentences), you can include the exercise code chunk in the same part.
Most of the time there is no need for a triple hash before the exercise code chunk.
Do not expect students to read more than two sentences of text at a time. After two sentences, you almost always want to use a triple hash in order to create a Continue button so that students have a break. They won’t read more than two sentences without a break.
There is a danger that students will just click the Continue button until they see a question and, only then, start reading. There is little we can do about that. However, we can take advantage of students’ tendency to read the sentence or two which precedes the question fairly closely. This is a great place for teaching because students can’t skip it: they don’t know whether it provides necessary context for answering the question.
After the Start come three code chunks:
- The question code chunk is the location in which students will place their answers.
- The hint code chunk includes any hints for the students. Hint code chunks are only available for code exercises. The code chunk name for the hint is always exactly the same as the one for the exercise code chunk, except with `-hint-n` attached at the end. The `n` is replaced by the number of the hint. Almost always, there is only one hint, so the suffix is `-hint-1`. We always set `eval = FALSE` in the hint code chunk since, often, the hint will not be legal R code.
- The test code chunk has exactly the same label as the one for the question code chunk, except with `-test` attached at the end. Test code chunks, like hint code chunks, are only used for code exercises. It always includes the `include = FALSE` code chunk option because we never want to show the code or the results to students. Instead, the purpose of the test code chunk is to ensure that the correct answer (that is, the code we want students to enter into the exercise code chunk) works.
The three code chunks are always followed by a triple hash. We want students to pause after they have submitted their answers so that they are more likely to consider the output from their submission before moving on.
The last part of an exercise is the end, our main opportunity to drop some knowledge. See Knowledge Drops for extensive discussion.
The last part of the topic is another knowledge drop. It is not another exercise. It is just a knowledge drop after the last exercise which tries to take a broader overview. It is often separated from that last exercise by a simple `###`. Again, this can’t be more than a sentence or two. But it should be more substantive than a simple “Good job.” Recall the 10,000 items which we want to mention. For example, if the topic has involved creating a scatter plot, then the last exercise will be putting the final touches on that scatter plot. The knowledge drop should be something about scatter plots in general, not a minor point about the particular scatter plot which the student has just created.
Question types
To create the exercise headers, you use three hashes. Make sure you number your exercises: `### Exercise 1`, `### Exercise 2`, and so on.
There are two main types of questions. First, we have normal coding questions. Students write code and press the Run Code button. Second, we have text questions which require students to either write prose or to copy/paste the results of running specific commands. Prose is needed for questions like “Explain the meaning of potential outcomes.” Copy/paste situations arise when students are instructed to do something like connect to GitHub or edit a qmd. We confirm that the students have completed these questions by having them issue a command like `list.files()` and then copy/pasting the command and the output. We often abbreviate those instructions using CP/CR, which stands for copy and paste the command and the result.
Code questions
Here is an example code question:
### Exercise 2
Start your code with `cces`. Use the pipe operator `|>` to add the
function `filter()`, selecting the rows where `state` is equal to
"Massachusetts". To set something equal to a value in `filter()`
use two equal signs: `==`.
```{r filter-2, exercise = TRUE}
```
```{r filter-2-hint-1, eval = FALSE}
cces |>
  filter(state == "...")
```
```{r filter-2-test, include = FALSE}
cces |>
  filter(state == "Massachusetts")
```
###
`==` is used because it is **checking** whether the value of the
variable on the left is equal to the value on the right. See
[here](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Comparison.html)
for discussion of other relational operators in R.
A single equals sign, `=`, is used to set something equal to
something else.
First, the Start of the exercise sets the stage. It sometimes teaches something new, connects to a previous exercise, provides a useful link, whatever. If it is long enough, it is followed by a triple hash. If not, the text continues to the instruction. Most of the time, as above, there is only the instruction, telling the student, step-by-step, what to do.
Second, the instruction requires that students write some code. Good instructions generate results when the student presses Run Code. Tutorial answers should require the smallest incremental number of characters, relative to the last question, for students to type. That is one way you know that your learning curve is shallow. If an exercise code chunk requires the students to type a lot of characters, you should split up the exercise into multiple separate exercises.
Third, any exercise which requires the copying of code from the prior exercise should place the Copy previous code button below the exercise code chunk.
Fourth, tutorials should be so simple that 95% of the students can answer 95% of the questions easily. One way to ensure that is to add a hint to almost every coding question.
Hints must always have the same code chunk name as the exercise chunk for which they are the hint, with a “-hint-n” added at the end. So, if an exercise code chunk is named “ex-1”, then the hint associated with that exercise is named “ex-1-hint-1”. A second hint for that same question would be named “ex-1-hint-2”, and so on.
When you create a hint, always use `eval = FALSE` within the braces of the code chunk header. This is because hints will often include “…” and other symbols which do not run as correct R code. So, we need to tell R not to run it or an error will occur during `R CMD check`. Example:
```{r ex-1-hint-1, eval = FALSE}
This is an example hint. Normally sentences like these
would cause an error in R because it is not proper code.
However, since we include eval = FALSE in the r-chunk,
R ignores all errors!
```
You need to wrap the text in a long hint by hand, inserting the carriage returns yourself. R will not wrap the text automatically.
Often, hints look like this:
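For instance, a hint for the `filter()` exercise shown earlier might be written as follows. The chunk label and the choice of which argument to mask are illustrative.

````markdown
```{r filter-2-hint-1, eval = FALSE}
cces |>
  filter(state == "...")
```
````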
The “…” indicates places where the student needs to insert some code, a value or a function name. The code in hints should be formatted correctly.
Students cannot see the first hint after clicking through to the next hint. So, make sure the last hint is the one you most want them to have access to, i.e., the one which provides the key information. If students can see the last hint, they should have no reason to consult any earlier hints. We rarely provide more than one hint.
Hints are only allowed for coding questions, not for text questions.
Fifth, the third code chunk, after the exercise and hint code chunks, is the test code chunk. Place the answer (the code which you want students to enter into the exercise code chunk) into the test code chunk. Since the test code chunk will be evaluated when the tutorial is knitted (which also happens during testing), this guarantees that the correct answer will work for students.
Indeed, the workflow for writing an exercise often begins by, first, entering the code which we want students to provide into the test code chunk. We then copy/paste that same code into the hint code chunk, replacing some of the functions and/or arguments with `...` as appropriate. We then ask the question which, we hope, will cause students to answer with the same code as we have in the test code chunk.
There are some instances in which we can’t test code which we want students to use. The most common case is code which requires the web, generally for downloading data. In that case, we delete the test code chunk.
Sixth, we separate the code chunks from the end of the exercise by using `###` again.
For simple questions which result in a display of some data, one approach is to write “You should see that the value of height in row 1 is 23.” This allows the students to know that they are on the right track. Never hard-code a number. Use R to inline calculate it, even though this can be a bother.
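One way to do this is with inline R code, which knitr evaluates when the tutorial is knitted. The sketch below assumes a hypothetical tibble `x`, with a `height` column, which was created in the setup chunk.

```markdown
You should see that the value of `height` in row 1 is `r x$height[1]`.
```

If the underlying data changes, the sentence updates itself the next time the tutorial is knitted.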
Much more commonly, however, we use the end to drop some knowledge, especially about a function which was used in the answer to this exercise or to one of the previous exercises in this topic.
Note that we provide as many exercises as possible. For example, every tutorial features a question for each package which must be loaded. We require students to type in items like `library(tidyverse)` even though they have done so many times in the past. Every exercise is another opportunity to make the learning curve as shallow as possible and to drop some knowledge. More questions are better than fewer.
One way to measure the shallowness of the learning curve is to examine how many new characters each exercise requires for its answer. Better three exercises, each of which loads a different package, than one exercise which loads three packages at once.
Text questions
There are two types of text questions: 1) those that provide the students with the correct answer, after they have submitted their own answer, and, 2) those that do not provide an answer. Examples:
### Exercise 6
Explain potential outcomes in about two sentences.
```{r definitions-6}
question_text(NULL,
  message = "This is where we place the correct answer. It will appear only after students have submitted their own answers. Note that we do not need to wrap the answer text by hand.",
  answer(NULL, correct = TRUE),
  allow_retry = FALSE,
  incorrect = NULL,
  rows = 6)
```
For the `message` argument, you should provide an excellent answer. We want to allow students to check for themselves that they got, more or less, the correct answer. Note how we set `allow_retry` to `FALSE`. This means that, after they see our answer, students can’t modify their answer. The `rows` argument decides how many rows the empty text input will have.
Always specify (approximately) how much you want students to write. Reasonable units are: one sentence, two sentences and a paragraph. Pick one of these three unless you have a good reason not to.
For paragraph questions, you should mention specific words or phrases which the students should include in their answers. If your suggested answer includes the word “validity,” for example, then tell the students to include (and define) validity as part of their answer.
However, for many written questions, we don’t provide an answer, so we don’t mind if students resubmit. In that case, we use:
### Exercise 7
From the Console, run `list.files()`. CP/CR.
```{r file-creation-7}
question_text(NULL,
  answer(NULL, correct = TRUE),
  allow_retry = TRUE,
  try_again_button = "Edit Answer",
  incorrect = NULL,
  rows = 3)
```
This format is most commonly used for “process” questions in which we have told students to do something and then confirm that they have done it by copying/pasting the result from a command.
`tutorial.helpers::show_file()` is a handy function for confirming that students have modified text files as instructed. For example, after telling students to edit the `_quarto.yml` file, we can check that they did so with:
In the Console, run:
```
tutorial.helpers::show_file("_quarto.yml")
```
CP/CR.
`show_file()` provides a variety of arguments which cause it to return only selected lines rather than the entire file.
Keep in mind that `show_file()` will not be available to students in their Console by default. We can either always call it with `tutorial.helpers::show_file()`, as above, or always (and after each restart of the R Session!) have the student type `library(tutorial.helpers)` at the Console by hand.
Maybe the instructions should always use the double colon, but remind students the first time in each tutorial that they could just run `library(tutorial.helpers)`. Or, don’t use the double colon and then, the first time it is used, remind students that, when they see the “No function found” error, they need to run `library(tutorial.helpers)` at the Console.
Tips
Each coding exercise should always spit out something. Interactivity is good! Students should always look at what their code is producing. There are some situations in which students need to make assignments and which, because of this, will result in no output when the Run Code button is pressed. But:
- Do this as little as possible. Why not just make a pipe?
- When you do this, you will need to create the permanent object yourself because student work in an exercise code chunk has no lasting effects. Doing so is annoying and error prone.
- You can have the student not only do the assignment but also, as part of this same exercise, print out the object. This works well.
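As a sketch of that last option, the answer to such an exercise can end with the name of the new object, so that pressing Run Code still produces output. The example uses the built-in `mtcars` data; the object name `four_cyl` is made up for illustration.

```r
# Assign the filtered data to a new object...
four_cyl <- subset(mtcars, cyl == 4)

# ...and print it in the same exercise so that Run Code shows a result.
four_cyl
```

Remember that `four_cyl` would still not exist in later exercises unless it is also created in the setup chunk.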
Follow a coding Style Guide, especially spaces around operators like `=`. Use only one command per line in pipes and graphics, with proper indentation. Indent plotting commands after the call to `ggplot()`.
Do not create an object in one question and then assume that it will be available in subsequent questions. It won’t be! Each question is independent of every other question. They live in separate R instances. The only exception (which we make use of) is that objects created in the initial `setup` chunk for the entire tutorial are available in all later questions, just the way that `library()` commands executed there do not need to be executed again.
Tutorials are knitted/run from the directory in which they are located. So, if you want to read in a file from a `data/` directory, you write:
x <- read_csv("data/myfile.csv")
in an R code chunk, presumably in the global `setup` chunk. But, if you try to execute that line of code with `Command + Return`, it will fail because, by default, you are located in the main directory of your.package when you start working on your tutorials. Using `setwd()` will solve this problem.
> getwd()
[1] "/Users/davidkane/Desktop/projects/r4ds.tutorials"
> setwd("inst/tutorials/031-data-files/")
> getwd()
[1] "/Users/davidkane/Desktop/projects/r4ds.tutorials/inst/tutorials/031-data-files"
>
`Command + Return` will now work because your R session is “located” in the same location as that from which the tutorial will be run when it is knitted.
Pipes
The most common type of code question involves the step-by-step process of building a pipe, the final output of which is a nice looking graphic.
You want to first show the graphic that you will create by the end of the topic. You show it once at the start of the topic and once before the last exercise, as a reminder of what the graphic should look like so students do not need to scroll all the way back up.
You should put the code for the graph in the code chunk at the start of the topic. Save the code to an object. The name of the object should have a “_p” suffix, where the “p” stands for “plot.” This way, you only have to put the object name in the code chunk at the end of the topic rather than copying the code.
You then build up the graphic, line by line, over a series of exercises, providing hints along the way.
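The pattern can be sketched as follows, assuming that ggplot2 is loaded in the setup chunk. The plot itself, built from the `mpg` data which ships with ggplot2, and the object name `scat_p` are illustrative.

````markdown
```{r}
scat_p <- ggplot(data = mpg,
                 mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Engine size and highway mileage")

scat_p
```

... exercises which build the plot, line by line ...

```{r}
scat_p
```
````

Both chunks are ordinary (non-exercise) chunks, so the object created in the first is still available to the second when the tutorial is knitted.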
Knowledge Drops
The most difficult part of tutorial creation is writing the “knowledge drops,” the snippets of wisdom (and the associated links) which are used at the end of each exercise. These generally come in two categories: details about R functions/packages/websites and background information about the substantive data science problem at hand.
Do not expect this to be easy! Good knowledge drops are hard. Make them short. Students will not read more than a sentence or two.
Perhaps the best place for a knowledge drop, especially for written questions, is at the start of the exercise. That is, instead of just asking the question immediately, provide a sentence or two of knowledge drops, even if this information is not really needed to answer the question. Students tend to read those sentences closely since they might be relevant to the question they need to answer.
Rhetorical questions (almost) always work poorly for knowledge drops.
A knowledge drop should not be a road sign. Example: “In the next section we will explore the data further.” Don’t waste time telling students what you expect to do next, or what you have just completed doing. Teach them something real!
Advice for R Details Knowledge Drops
- Not sure what knowledge to drop? Look up the help page for one of the functions used in the topic. Ideally, this will be the function which was used in this exercise, but it can be one used a few lines above. The help page will have two areas of interest:
  - Arguments for the function, some of which will not have been used in the topic but which are still worth mentioning. A good knowledge drop will mention this argument and describe when it might be useful.
  - Related functions. For example, use the “Useful filter functions” section of the `?dplyr::filter` help page. All of these make for good knowledge drops! Another example is the “See also” section of the `?arrange` help page. We can’t possibly use every R function in an exercise, but we can at least describe some of them in knowledge drops.
- Use high quality links.
  - The second edition of R for Data Science is amazing. Link to it as often as you can.
  - Anything associated with Posit (formerly named RStudio) is high quality. Examples include sites associated with Tidyverse and Tidymodels.
  - Avoid links to low quality sites like https://rdrr.io/, https://www.rdocumentation.org/, and https://www.datamentor.io/. Any site which uses ads should be avoided.
- Links to function documentation are useful. Indeed, we should link at least once to the documentation for every function which we use. Two categories:
  - Links to the help pages for base functions are tricky. It is not clear to me which location is the best to use for this purpose. My suggestion is to use links involving the `stat.ethz.ch` server, which is the home base (I think) for the core R developers. Consider examples for `cut()` and `print()`.
  - Links to functions from the Tidyverse should use the docs maintained by Posit. Consider these examples for `arrange()` and `aes()`.
One common approach is like this:
See `?readr::locale` for [details](https://readr.tidyverse.org/articles/locales.html).
Note how we concisely provide both the command which brings up the help page and a link to the help page itself.
Think back to our initial discussion of the 10,000 pieces of information which we want to mention at some point. Among these are many pages linked to from the help pages for the core Tidyverse functions, like `arrange()` and `aes()`.
Advice for Background Information Knowledge Drops
The second category of knowledge drop involves background information about the current problem. See the primer.tutorials package for some good examples. In this situation, there is a large amount of information, generally from a textbook, which we want students to read. But we can’t just copy/paste entire paragraphs from the book as knowledge drops because students won’t read paragraphs. At best, they will read two sentences. So, we need to decide on the 10 or so more important sentences from the source material, copy/pasting them if that is allowed, rewording them if necessary. Then, we need to find a portion of the tutorial with 5 or so exercises in a row with no knowledge drops. Then we add two sentences from the material to each of the knowledge drops for those 5 exercises.
A knowledge drop does not need to be relevant to the question which was asked in that exercise as long as it is relevant to the knowledge drops which are contained in the exercises which come before and/or after.
Addins
The tutorial.helpers package provides a collection of RStudio Addins which facilitate the creation and testing of tutorial packages. Read about them in the “RStudio Addins” vignette. Three of the addins create new exercises. A fourth, “Format Tutorial Chunk Labels,” renumbers all the exercises in a tutorial if you add or remove an exercise. It also ensures consistency in code chunk labels. Highly recommended!
Inputs
In addition to `tutorial.Rmd`, a tutorial will often use other inputs. The two most common locations for storing these inputs are `data` and `images` directories at the same level as the `tutorial.Rmd` file. Any file in `data` or `images` will be available at run time. (Note that the directories must have these names. Something like `my_data` will not work.)
Data
If you need an R object to be accessible in an exercise code chunk, create it in the initial global `setup` code chunk at the top of the tutorial.
Be wary of code which downloads data from the web. This won’t work if the student does not have an internet connection when she creates the tutorial. Instead, keep the code which downloaded the data and place the resulting object in an RDS file in the `data` directory. Here is an example from the “Wrangling Census data with tidyverse tools” tutorial from the tidycensus.tutorials package.
median_age <- get_acs(geography = "county",
                      variables = "B01002_001",
                      year = 2020)
write_rds(median_age, "data/median_age.rds")
median_age <- read_rds("data/median_age.rds")
The first two commands download data and save it to an RDS file in the `data` directory. This code assumes that you are located in the same directory as the `tutorial.Rmd` file. You only run those commands once, and then you comment them out because you don’t want them re-run each time the tutorial is created. The `read_rds()` call is never commented out because we always need the `median_age` object.
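After that one-time run, the setup chunk might look like the sketch below. The set of `library()` calls is an assumption; the three data commands are the ones shown above.

````markdown
```{r setup, include = FALSE}
library(learnr)
library(tutorial.helpers)
library(tidyverse)
library(tidycensus)

# Run once by hand, from the tutorial directory, then comment out:
# median_age <- get_acs(geography = "county",
#                       variables = "B01002_001",
#                       year = 2020)
# write_rds(median_age, "data/median_age.rds")

# Always run, so that median_age is available in every exercise.
median_age <- read_rds("data/median_age.rds")
```
````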
When designing tutorials which use objects like `median_age`, we generally write two exercise code chunks. The first has the student run the same code as that which we used to create the object ourselves. This won’t work if the student is not connected to the web but, with luck, in that case they will get a sensible error message. The second question informs the students that we have, behind the scenes, assigned the result of the function to an R object. We then ask the student to just print out that object. We don’t have them do the assignment themselves, not least because we don’t like questions which don’t generate any output.
We use a similar approach with models which can take a while to fit. Example:
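# Run once to fit and save the model, then comment out these two commands.
fit_gauss <- brm(formula = biden ~ 1,
                 data = poll_data,
                 refresh = 0,
                 silent = 2,
                 seed = 9)
write_rds(fit_gauss, "data/fit_gauss.rds")

# Always run: load the saved model.
fit_gauss <- read_rds("data/fit_gauss.rds")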
Again, this code only works if you are in the tutorial directory, not in the higher directory of the R project itself. Also, the first two commands are commented out, unless you are running them by hand to create the object.
What happens if the data is too large? See the “Arrow” tutorial in
the r4ds.tutorials
package for an example. First, we generally switch away from code exercises and
use written exercises. Students run the required commands and then
copy/paste the command/response. Big downloads don’t work well in
exercise code chunks. Second, we create small versions of this big data
in the global setup
chunk. This allows us to create test
code chunks for most of the exercises which follow. These tests will run
much more quickly with this smaller data. Also, for any package on CRAN,
we need to keep the overall size of the package as small as
possible.
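A minimal sketch of the "small version" idea, assuming a simulated data frame in place of the real big data. The names big_data and small_data are hypothetical, and taking the first 100 rows is only one of several reasonable ways to shrink the data.

```r
# Hypothetical stand-in for a data set too large to ship with a package.
set.seed(9)
big_data <- data.frame(id = seq_len(1e5), value = rnorm(1e5))

# In the global setup chunk, keep only a small slice for test code chunks.
small_data <- head(big_data, 100)

# Compare the memory footprint of the two objects.
print(object.size(big_data), units = "Kb")
print(object.size(small_data), units = "Kb")
```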
Images
To add images to a tutorial, first make a directory called
images
in the folder that contains
tutorial.Rmd
. Store all images for that tutorial there. You
can work with those files in all the usual ways.
Use include_graphics()
to add the image into the
document. Include this code in its own chunk, in the place where you
want the image to appear in the tutorial.
include_graphics()
is part of the knitr
package, so you need library(knitr)
in the setup code
chunk. Note that you do not need to name these code chunks.
Because students will complete the tutorials using screens of very
different widths, it is a good idea to put
knitr::opts_chunk$set(out.width = '90%')
in your
setup
code chunk. In this way, images will appear as a
sensible size regardless of whether they are using a phone screen or a
big monitor.
Files
Other files — most commonly text files or RDS files — are handled in
a similar fashion to images. Create a directory called data
in the same directory as the tutorial.Rmd
, just as we do
with images
.
There are two main uses for files in data
. First, they
can be used at “compile time” (when the tutorial.Rmd
is
knit to html) for making plots or doing anything else. Second, and more
importantly, they are available to students in the exercise code blocks
during “run time” (when students are doing the tutorial).
Complex text
You sometimes want to include “complex” text in a tutorial. This is
most common when trying to teach students how to use R code chunks and
other strings which Rmarkdown wants to process in certain ways. You can
sometimes get away by placing such text in environments surrounded by
three, or even four, backticks. This works often, but not always.
Comment characters like #
are especially problematic. We
also use the parsermd
package behind the scenes. It does not work as well as one might
like.
The “file” trick solves this problem. Create a txt file,
example.txt
, with the text which you want to appear in the
tutorial. You can either leave it in the same directory as the
tutorial.Rmd
file or, probably better, place it in either
the data
or images
directory. You then add
this code chunk to your tutorial.
The file
code chunk option loads the specified file. The
other code chunk options ensure that the text is echoed but not
evaluated.
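A minimal sketch of such a chunk, assuming example.txt was placed in the images directory. The option values are inferred from the description above (knitr's file chunk option reads the chunk's content from an external file), so treat the exact form as an assumption.

````markdown
```{r file = "images/example.txt", echo = TRUE, eval = FALSE}
```
````

The chunk body is left empty because its content comes from the file.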
Processing submissions
Instructors have different needs and priorities when it comes to
processing student answers. For now, we provide one function,
tutorial.helpers::process_submissions()
to help them. Read
the help page: ?process_submissions
. There are many
arguments and options.
Checks
The simplest way to test the tutorial.Rmd
that you
are working on is to hit the “Run Document” button. This is the same
thing as running rmarkdown::render()
on that file:
rmarkdown::render("inst/tutorials/02-terminal/tutorial.Rmd")
This assumes that you are located in the main directory of
your.package, as you normally would be. I am not sure
if this will catch all potential errors, but it will catch many issues,
and it is very quick. Replace 02-terminal
with the
appropriate directory.
Test before submitting a PR
Once you are done editing your tutorial, you need to make sure it works before you submit a pull request.
Click “Install and Restart” from the Build tab. Then, hit “Start Tutorial” in the Tutorial tab. This mimics the experience that users will have. This will catch some common errors, like having two code chunks with the same name. (I am not sure if this does more or less than the simple test above.)
Do a full test, which means running
R CMD check
. Go to the top right window of RStudio. Click the Build pane and hit the “Check” button (with the green check mark). You will then see a bunch of code and tests running. Make sure it says “OK” next to “testthat”. You should always run this before submitting a pull request.
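One of the common errors mentioned above, two code chunks with the same name, can also be looked for directly. The following is a rough base R sketch, not part of tutorial.helpers. The chunk headers are hypothetical, and in practice rmd_lines would come from readLines() on your tutorial.Rmd.

```r
# Hypothetical tutorial lines containing a repeated chunk label.
rmd_lines <- c(
  "```{r setup, include = FALSE}",
  "```",
  "```{r ex-1, exercise = TRUE}",
  "```",
  "```{r ex-1, exercise = TRUE}",
  "```"
)

# Keep only the chunk headers, then pull out the label that follows "{r".
headers <- grep("^```\\{r[ ,}]", rmd_lines, value = TRUE)
labels <- trimws(sub("^```\\{r\\s*([^,}]*).*$", "\\1", headers))

# Drop unlabelled chunks and chunks whose first entry is an option (has "=").
labels <- labels[nzchar(labels) & !grepl("=", labels)]

dup_labels <- unique(labels[duplicated(labels)])
print(dup_labels)
```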
What to do if R CMD check fails
Read the error message at the bottom of the Build pane. You want to see “R CMD check succeeded.” If not, there is a problem. The error message will often provide a clue as to where in your code the error occurred.
If that error message is not detailed enough, go to the your.package.Rcheck folder, which should be located in the same directory as your.package on your computer. This is a folder created by the R CMD check process, and it will be automatically deleted if the check process succeeds. If the process fails, the your.package.Rcheck folder stays around so that you can examine it. The key file is testthat.Rout.fail, which should be in the tests directory. It has more details on what went wrong.
The most common source of errors is something wrong with the hint
code chunks, which are not evaluated when you just Run Document. Make
sure the eval = FALSE
argument is set in the code chunk for
all hints.
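A quick way to spot hint chunks that lack eval = FALSE is to scan the chunk headers. This base R sketch assumes hint chunks carry "-hint" in their labels, following the learnr naming convention. The headers shown are hypothetical.

```r
# Hypothetical chunk headers; the second hint is missing eval = FALSE.
rmd_lines <- c(
  "```{r ex-1, exercise = TRUE}",
  "```{r ex-1-hint-1, eval = FALSE}",
  "```{r ex-2, exercise = TRUE}",
  "```{r ex-2-hint-1}"
)

# Assumption: hint chunks have "-hint" in their label.
is_hint <- grepl("^```\\{r[^,}]*-hint", rmd_lines)
has_eval_false <- grepl("eval\\s*=\\s*FALSE", rmd_lines)

# Hint headers which still need eval = FALSE.
bad_hints <- rmd_lines[is_hint & !has_eval_false]
print(bad_hints)
```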
Difficult bugs
The most common issue is that an author will use a package like ggthemes in their tutorial but then forget to include library(ggthemes) in the setup code chunk. This will not cause any error in checking because ggthemes is not actually used in the tutorial itself (unless it is included in a test code chunk, as it should be), so the tutorial knits without a problem. But then the student tries to type library(ggthemes) as an answer to an exercise and gets an error because the package has not been installed.
Note that R CMD check does not seem to catch cases in which you library() a package in a tutorial but that package is not in DESCRIPTION. But such a discrepancy will cause an error on GitHub Actions because, there, you only have access to packages that have been installed as part of that test.
Be careful of the way that GitHub is sloppy in how it deals with capitalization changes, especially when you change the name of a file. For example, you might first commit a file named Rproj.png. Later, you decide to change all file names for images to all lower case. So, you change the name of the file to rproj.png. Commit and push. Everything is great, right? No! Even if GitHub shows you the new file name, it might still have that file as Rproj.png internally, with the capital “R”. This will cause errors when you run your checks on GitHub:
Error: Cannot find the file(s): "images/rproj.png"
But the file is there! You can see it! The tests work on your local machine. The easiest solution is to delete the file and commit that change. Then add the file back under a different name and use that name in the tutorial.
R CMD check will test that all tutorials have the default code chunks exactly as they are in the Helper Tutorial template. So, use the template. If either the “Information” or “Download answers” chunks are missing, R CMD check will return something like “Missing a component part from file /path/to/your/Rmd/file.Rmd”.
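Since R CMD check does not seem to flag packages which are attached in a tutorial but absent from DESCRIPTION, a small local check can help. The sketch below is an illustration in base R, not a function from tutorial.helpers. The tutorial lines and the DESCRIPTION contents are hypothetical.

```r
# Hypothetical tutorial lines and DESCRIPTION contents.
rmd_lines <- c("library(learnr)", "library(tidyverse)", "library(ggthemes)")
desc_file <- tempfile()
writeLines(c("Package: your.package",
             "Imports: learnr, tidyverse"), desc_file)

# Packages attached with library() in the tutorial.
hits <- regexpr("(?<=library\\()[A-Za-z0-9.]+", rmd_lines, perl = TRUE)
used <- unique(regmatches(rmd_lines, hits))

# Packages declared in DESCRIPTION, with any version constraints removed.
desc <- read.dcf(desc_file, fields = c("Depends", "Imports", "Suggests"))
declared <- unlist(strsplit(desc[!is.na(desc)], ",\\s*"))
declared <- trimws(sub("\\(.*\\)", "", declared))

# Packages used in the tutorial but not declared anywhere.
missing_pkgs <- setdiff(used, declared)
print(missing_pkgs)
```

Any package reported here is a candidate for adding to DESCRIPTION before the checks run on GitHub Actions.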