A Beginner's Guide to Troubleshooting R Code

Learning Objectives

In this coding club, we will:

Discuss commonly observed errors.
Learn how to approach and debug R code.
Learn how to find help.

Debugging is the process of identifying and fixing errors or potential errors in your code. Many functions and packages are available for debugging R code; for example, see the debugging chapter of Advanced R.

Keep in mind that not every problem in your code produces an error message. The code may run without complaint but return unexpected results, or it may seem to run without end.

This session of the BTEP Coding Club will focus on ways to approach (and avoid) errors from a beginner's perspective. We will NOT cover dedicated debugging tools.

Approaching Common Errors in R.

Identifying errors in your script.

First, recognize that RStudio is extremely helpful for identifying mistakes: the editor pane flags potential problems with code diagnostics in the margin, so you can easily skim your script for typos and other mistakes.

Common Errors

It is impossible to review every error message, but some error and warning messages are more common than others and worth reviewing.

Note

Most errors center around a common theme: R is looking for something and can't find it.

Syntax errors

Errors related to typos and missing punctuation (quotes, parentheses, braces, etc.).

Unmatched parentheses, curly braces, square brackets or quotes.

# unmatched parentheses
res <- sum(1,2

Error: <text>:3:0: unexpected end of input
1: # unmatched parentheses
2: res <- sum(1,2
  ^

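When you run incomplete code like this in the R console, R does not return an error; instead, it shows a + continuation prompt and waits for you to complete the expression. The easiest way to spot and fix these mistakes is to open your R script in the RStudio editor and check for diagnostic flags in the margin.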
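Because parse() reads code without running it, it can serve as a quick syntax check. The helper below, check_syntax(), is a hypothetical name used for illustration and is not part of base R. A whole script could be checked the same way with parse(file = "your_script.R"), where the file name is a placeholder.

```r
# check_syntax() is a hypothetical helper: it returns TRUE if `code` parses
# and FALSE (printing the parser's message) if it does not.
# The code is only parsed, never evaluated.
check_syntax <- function(code) {
  tryCatch(
    {
      parse(text = code)
      TRUE
    },
    error = function(e) {
      message("Syntax problem: ", conditionMessage(e))
      FALSE
    }
  )
}

check_syntax("res <- sum(1, 2")   # FALSE: the closing parenthesis is missing
check_syntax("res <- sum(1, 2)")  # TRUE
```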

Typos in the names of functions, variables, data sets, objects, packages, etc.

# misspelled object; should be mtcars
mean(mcars$wt)

Error in mean(mcars$wt): object 'mcars' not found

Object not found errors can be fixed by checking that the object name is typed correctly and that the object actually exists. You can list the objects you have created with ls() or by checking your global environment (the Environment pane in RStudio).
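A minimal sketch of both checks: ls() lists what you have created, and exists() tests a single name. The object wt_values is made up for illustration.

```r
# ls() lists the objects you have created in the global environment
wt_values <- mtcars$wt
ls()

# exists() checks one name across the search path, which includes
# built-in datasets such as mtcars
exists("mtcars")  # TRUE
exists("mcars")   # FALSE: the typo from the example above
```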

#misspelled function
# looking for mean()
men(mtcars$wt)

Error in men(mtcars$wt): could not find function "men"

Could not find function errors typically result from a typo in the function name, or from the package that supplies the function not being installed and/or loaded.

Note

install.packages() requires the package name in single or double quotes, while library(), which loads an installed package, does not.
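To check where a function comes from, or whether the package that provides it is installed, you can use find() and requireNamespace(). This is a sketch; ggplot2 is used only as an example package name.

```r
# find() reports which attached package provides a function
find("mean")   # "package:base"
find("men")    # character(0): no attached package defines "men"

# requireNamespace() returns TRUE/FALSE instead of throwing an error,
# so it can check whether a package is installed before you load it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)   # bare name: no quotes needed
} else {
  message('ggplot2 is not installed; try install.packages("ggplot2")')
}
```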

Errors in function arguments

These include missing arguments, arguments supplied with incorrect data types, and typos.

Before using any function, make sure you know what it actually does and that you are supplying all required arguments.

# No warning, but an unexpected result: na.RM is not an argument of mean(), so it is silently ignored
mean(c(1:10, NA), na.RM = TRUE)

[1] NA

# corrected
mean(c(1:10, NA), na.rm = TRUE)

[1] 5.5

#wrong data type supplied to argument x; results in a warning and NA output
mean(letters[1:5])

Warning in mean.default(letters[1:5]): argument is not numeric or logical:
returning NA

[1] NA

#no argument throws an error
mean()

Error in mean.default(): argument "x" is missing, with no default

Tip

Always, always, always read the documentation. Use ?function_name (for example, ?mean).
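The na.RM typo above produced no warning because mean() passes arguments it does not recognize on to its ... argument. Comparing your spelling against the function's formal arguments is one way to catch this. For numeric input, na.rm is defined on the method mean.default(), so that is the function inspected in this sketch.

```r
# Print the arguments and defaults of the method that handles numeric vectors
args(mean.default)

# Argument names are case-sensitive, so na.RM does not match na.rm
arg_names <- names(formals(mean.default))
arg_names                # "x" "trim" "na.rm" "..."
"na.RM" %in% arg_names   # FALSE
"na.rm" %in% arg_names   # TRUE
```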

Misusing operators

Operators can be misused in many different ways, but the most common mistake seems to involve ==, which tests for equality.

For example,

x = 5

assigns the value 5 to the object x.

BUT

#returns TRUE or FALSE
x == 5

[1] TRUE

paste("x is assigned to the value", x)

[1] "x is assigned to the value 5"

x == 6

[1] FALSE

paste("x is assigned to the value", x)

[1] "x is assigned to the value 5"

#reassigns the object x to 8  
x = 8 
paste("x is assigned to the value", x)

[1] "x is assigned to the value 8"

== tests for equality, while = assigns a value. If we use = when we mean ==, we could accidentally overwrite objects.

In certain contexts, this type of mistake will result in the following error:

x <- c(5,1,2,3)

for(i in seq_along(x)) {
    if(x[i] = 5) { 
      cat(i, "\n")  }
}

Error: <text>:4:13: unexpected '='
3: for(i in seq_along(x)) {
4:     if(x[i] =
               ^

Another common error begins with Error in if ..., which usually means the condition of an if statement did not evaluate to a single TRUE or FALSE value. This is often caused by NAs in the data:

x <- c(NA,5,1,2,3)

for(i in seq_along(x)) {
    if(x[i] == 5) { 
      cat(i, "\n")  }
}

Error in if (x[i] == 5) {: missing value where TRUE/FALSE needed
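A few ways to make the condition NA-safe are sketched below. Each prints the position of the 5 without the error.

```r
x <- c(NA, 5, 1, 2, 3)

# Option 1: test for NA first; && only evaluates x[i] == 5 when x[i] is not NA
for (i in seq_along(x)) {
  if (!is.na(x[i]) && x[i] == 5) {
    cat(i, "\n")
  }
}

# Option 2: isTRUE() turns NA into FALSE, so if() always gets TRUE or FALSE
for (i in seq_along(x)) {
  if (isTRUE(x[i] == 5)) {
    cat(i, "\n")
  }
}

# Option 3: skip the loop entirely; which() ignores NA comparisons
which(x == 5)   # 2
```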

Calling functions with the wrong type of brackets

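If you call a function with square brackets rather than parentheses, R tries to subset the function itself. Functions such as mean() are objects of type "closure", which cannot be subset: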

mean[1:10]

Error in mean[1:10]: object of type 'closure' is not subsettable
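A common way to hit this error by accident is to refer to a data frame that was never created, under a name that also belongs to a function. For example, df is the F distribution density function in the stats package. The sketch captures the error with tryCatch() so the script keeps running; the column name value is hypothetical.

```r
typeof(mean)   # "closure": mean is a function
mean(1:10)     # parentheses call the function: 5.5

# If no data frame called df exists, df refers to stats::df (a function),
# so df$value tries to subset the function and fails
msg <- tryCatch(df$value, error = function(e) conditionMessage(e))
msg            # "object of type 'closure' is not subsettable"
```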

ggplot2 syntax errors

library(ggplot2)
ggplot(mtcars) +
  geom_point(aes(wt, mpg, color = factor(cyl)
                 shape = factor(cyl)))

Error: <text>:4:18: unexpected symbol
3:   geom_point(aes(wt, mpg, color = factor(cyl)
4:                  shape
                    ^

Unexpected symbol errors generally mean there is a punctuation mistake, such as a missing comma. This error is not specific to ggplot2, but it easily creeps in when plotting because ggplot2 code tends to contain many nested function calls.

library(ggplot2)
ggplot(mtcars) +
  geom_point(aes(wt, mpg, color = factor(cyl),
                 shape = factor(cyl)))

+ scale_color_manual(values = c("blue","yellow","red"))

Error:
! Cannot use `+` with a single argument
ℹ Did you accidentally put `+` on a new line?

In ggplot2, the + that adds each layer must be at the end of a line (the right-hand side of the expression), not at the start of the next one.

Similarly, the native R pipe (|>) and the magrittr pipe (%>%) are continuation operators that must be placed at the end of a line, and the next function call must follow them.
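The line-ending rule is not specific to ggplot2, so this sketch demonstrates it with plain arithmetic and the native pipe, which needs no extra packages (the native pipe requires R 4.1 or later). With plain numbers, a leading + is valid code, so R silently gives the wrong answer instead of an error.

```r
# A trailing + tells R the expression continues on the next line
total <- 1 +
  2
total    # 3

# A leading + starts a new expression: total2 is assigned 1, and
# "+ 2" is evaluated on its own (it just prints 2)
total2 <- 1
+ 2
total2   # 1

# Pipes follow the same rule: end each line with |> and put the next call after it
piped <- c(1, 4, 9) |>
  sqrt() |>
  sum()
piped    # 6
```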

Object naming errors

Incorrectly naming objects results in errors

Names should:

Avoid spaces and special characters EXCEPT '_' and '.'
Not begin with a number or a symbol.
Not be a reserved word (see ?Reserved), and preferably not be the name of an existing function (RStudio's autocomplete will suggest these).

2 <- seq(100,200,25)

Error in 2 <- seq(100, 200, 25): invalid (do_set) left-hand side to assignment
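Base R's make.names() converts any character vector into syntactically valid names. Comparing a name with its make.names() version is therefore a quick validity check. The candidate names below are made up for illustration.

```r
candidates <- c("1_Cell1_Rep1", "sample one", "my_data", ".hidden", "if")

# make.names() prepends "X" where needed, replaces invalid characters with ".",
# and appends "." to reserved words
fixed <- make.names(candidates)
fixed                 # "X1_Cell1_Rep1" "sample.one" "my_data" ".hidden" "if."

# A name is already valid if make.names() leaves it unchanged
candidates == fixed   # FALSE FALSE TRUE TRUE FALSE
```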

Syntactically invalid names generally creep in when loading data into R. By default, base R import functions automatically fix invalid column names, which may lead to unexpected results. For example, the sample names (column names) in the file below start with a number, which violates R's naming rules, so read.delim() adds an X to the start of each.

data <- read.delim("../data/S5_CommonErrors/SF_example_RNASeq_1.txt")
head(data)

                     gene_id X1_Cell1_Rep1 X2_Cell1_Rep2 X3_Cell1_Rep3
1 ENSG00000001630.11_CYP51A1       6877.07       6614.00       7057.98
2   ENSG00000002016.12_RAD52        282.99        286.62        286.52
3      ENSG00000002330.9_BAD       1946.00       1662.00       2121.00
4   ENSG00000002834.13_LASP1      17636.00      19333.00      18917.00
5     ENSG00000003056.3_M6PR       3874.00       4107.00       4005.00
6    ENSG00000003393.10_ALS2       2041.00       2150.00       2141.00
  X4_Cell2_Rep1 X5_Cell2_Rep2 X6_Cell2_Rep3
1      11305.33      10760.54      10047.00
2        265.41        235.00        254.24
3        608.00        711.00        576.00
4       4583.00       4464.00       3892.00
5       5741.00       5703.00       4978.00
6       1687.00       1624.00       1426.00

You can turn this behavior off with check.names = FALSE, but invalid names will likely cause issues downstream.
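The sketch below writes a tiny made-up file to a temporary location, so you can compare both settings of check.names without the course data.

```r
# Write a small tab-delimited file whose sample names start with numbers
path <- tempfile(fileext = ".txt")
writeLines(c("gene_id\t1_Cell1_Rep1\t2_Cell1_Rep2",
             "geneA\t10\t12",
             "geneB\t3\t4"), path)

# Default (check.names = TRUE): names are repaired with make.names()
default_names <- names(read.delim(path))
default_names   # "gene_id" "X1_Cell1_Rep1" "X2_Cell1_Rep2"

# check.names = FALSE keeps the original, syntactically invalid names
raw_names <- names(read.delim(path, check.names = FALSE))
raw_names       # "gene_id" "1_Cell1_Rep1" "2_Cell1_Rep2"

unlink(path)    # remove the temporary file
```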

In contrast, readr functions do not make names syntactically valid unless you change the default of the name_repair argument (e.g., name_repair = "universal"). For example,

data1 <- readr::read_delim("../data/S5_CommonErrors/SF_example_RNASeq_1.txt")

Rows: 10000 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): gene_id
dbl (6): 1_Cell1_Rep1, 2_Cell1_Rep2, 3_Cell1_Rep3, 4_Cell2_Rep1, 5_Cell2_Rep...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(data1)

# A tibble: 6 × 7
  gene_id            `1_Cell1_Rep1` `2_Cell1_Rep2` `3_Cell1_Rep3` `4_Cell2_Rep1`
  <chr>                       <dbl>          <dbl>          <dbl>          <dbl>
1 ENSG00000001630.1…          6877.          6614           7058.         11305.
2 ENSG00000002016.1…           283.           287.           287.           265.
3 ENSG00000002330.9…          1946           1662           2121            608 
4 ENSG00000002834.1…         17636          19333          18917           4583 
5 ENSG00000003056.3…          3874           4107           4005           5741 
6 ENSG00000003393.1…          2041           2150           2141           1687 
# ℹ 2 more variables: `5_Cell2_Rep2` <dbl>, `6_Cell2_Rep3` <dbl>

Referring to one of these columns with $ then results in an error:

data1$1_Cell1_Rep1

Error: <text>:2:7: unexpected numeric constant
1: 
2: data1$1
         ^

# use backticks to get around this
head(data1$`1_Cell1_Rep1`)

[1]  6877.07   282.99  1946.00 17636.00  3874.00  2041.00

Warning

This issue is likely to come up with bioinformatics data, so consider renaming sample names early in your data analysis workflow if required.
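One way to rename samples in base R is to add a prefix to any name that starts with a digit. The small data frame and the "S" prefix in this sketch are illustrative choices.

```r
# A made-up data frame with invalid sample names (kept via check.names = FALSE)
counts <- data.frame(gene_id = c("geneA", "geneB"),
                     `1_Cell1_Rep1` = c(10, 3),
                     `2_Cell1_Rep2` = c(12, 4),
                     check.names = FALSE)

# Prefix "S" to any name that starts with a digit
names(counts) <- sub("^([0-9])", "S\\1", names(counts))
names(counts)          # "gene_id" "S1_Cell1_Rep1" "S2_Cell1_Rep2"

# The columns can now be used with $ without backticks
counts$S1_Cell1_Rep1   # 10 3
```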

Import errors

Though somewhat unrelated to this section, import errors are also frequent among beginners, usually because of misunderstandings about the working directory. Always know which directory you are working in (getwd()) and where the files you want to work with are located relative to that directory. Also check for typos in the file name and file extension, and make sure you used / rather than \ in the path. Although Windows uses \ in file paths, \ is an escape character in R strings, so write paths with / (or a doubled \\) instead.

data2 <- read.delim("../S5_CommonErrors/SF_example_RNASeq_1.txt")

Warning in file(file, "rt"): cannot open file
'../S5_CommonErrors/SF_example_RNASeq_1.txt': No such file or directory

Error in file(file, "rt"): cannot open the connection
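Before calling an import function, you can check that R can see the file relative to the current working directory. The path below is the one from the earlier read.delim() example; file.path() builds it with / separators on every operating system. This is a sketch; in most setups outside the course materials, the file will not be found and the else branch will run.

```r
# Where R resolves relative paths from
getwd()

# Build the path with file.path() instead of typing separators by hand
path <- file.path("..", "data", "S5_CommonErrors", "SF_example_RNASeq_1.txt")
path   # "../data/S5_CommonErrors/SF_example_RNASeq_1.txt"

# Read the file only if it exists; otherwise report where R looked
if (file.exists(path)) {
  data <- read.delim(path)
} else {
  message("File not found: ", normalizePath(path, mustWork = FALSE))
  print(list.files(".."))   # what is actually one directory up
}
```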

Overwriting objects leads to unexpected results

x <- 1:3
x <- 1:10
mean(x)

[1] 5.5

Tip

Check your environment pane for created objects. Be careful, as it is easy to overwrite existing objects.

Indexing errors

Indexing in R starts at 1. Incorrect indexing of data structures such as vectors or data frames does not necessarily produce an error, but it often leads to unexpected results. A subscript out of bounds error generally means you tried to access an element of a vector, list, or other data structure that isn't there.

Going beyond the range of vec1 below results in NA.

vec1 <- c(1:10)
vec1[11]

[1] NA

and subsetting outside the range of a data frame results in NAs or NULL.

mtcars[1:10, 12]

NULL

tail(mtcars[1:40, ])

     mpg cyl disp hp drat wt qsec vs am gear carb
NA.2  NA  NA   NA NA   NA NA   NA NA NA   NA   NA
NA.3  NA  NA   NA NA   NA NA   NA NA NA   NA   NA
NA.4  NA  NA   NA NA   NA NA   NA NA NA   NA   NA
NA.5  NA  NA   NA NA   NA NA   NA NA NA   NA   NA
NA.6  NA  NA   NA NA   NA NA   NA NA NA   NA   NA
NA.7  NA  NA   NA NA   NA NA   NA NA NA   NA   NA

Indexing outside the range of a matrix results in an error.

m <- matrix(1:6, nrow=2)
m[3, 3]

Error in m[3, 3]: subscript out of bounds

To avoid unexpected results or errors associated with incorrect indexing, know the structure of your data. Use functions such as str(), dim(), nrow(), and ncol().

Other errors from subsetting outside of the bounds of a data frame include:

mtcars[,13]

Error in `[.data.frame`(mtcars, , 13): undefined columns selected

and

mtcars[[13]]

Error in .subset2(x, i, exact = exact): subscript out of bounds
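One way to avoid subscript out of bounds errors is to check an index against nrow() and ncol() before using it. safe_get() below is a hypothetical helper written for illustration, not a base R function. It returns NA with a message instead of stopping.

```r
m <- matrix(1:6, nrow = 2)
dim(m)   # 2 3: two rows, three columns
str(m)   # int [1:2, 1:3] 1 2 3 4 5 6

# Hypothetical helper: return m[i, j] only when both indices are in range
safe_get <- function(m, i, j) {
  if (i >= 1 && i <= nrow(m) && j >= 1 && j <= ncol(m)) {
    m[i, j]
  } else {
    message("Index [", i, ", ", j, "] is outside a ",
            nrow(m), " x ", ncol(m), " matrix")
    NA
  }
}

safe_get(m, 2, 3)  # 6
safe_get(m, 3, 3)  # NA, with a message instead of an error
```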

Package errors

Errors regarding packages usually revolve around:

Function masking (e.g., dplyr::select() vs. MASS::select()).
- Call the function with its package prefix, as in dplyr::select().
Dependency loading failures (e.g., Error: package or namespace load failed for ‘PACKAGE.NAME.HERE’).
- Install the missing dependencies.
Compilation failures during installation.
- Reinstall the package, choosing a pre-built binary rather than compiling from source.
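A few base R tools help diagnose masking. find() lists every attached package that defines a name, package::function() calls a specific version regardless of what is attached, and conflicts() lists names defined in more than one attached package. The sketch uses stats::filter() because the stats package is attached by default.

```r
# find() lists every attached package that defines a name;
# after library(dplyr), this would also include "package:dplyr"
find("filter")

# The :: operator calls a specific package's function, whatever is masked
ma <- stats::filter(1:5, rep(1 / 3, 3))   # 3-point moving average
ma                                        # NA 2 3 4 NA

# conflicts() lists names defined in more than one attached package
conflicts(detail = TRUE)
```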

Data type coercion errors

Converting between data types can produce unexpected results. The following warning is common when you try to coerce values to a data type that cannot represent them.

as.numeric(c("1","two"))

Warning: NAs introduced by coercion

[1]  1 NA

Again, it is important to know the data type of an R object (use typeof(), mode(), and class()).
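The sketch below shows how to find which values failed to convert; suppressWarnings() silences the expected coercion warning. It also covers factors, a classic coercion trap where as.numeric() returns the internal level codes rather than the printed values.

```r
vals <- c("1", "two", "3")
typeof(vals)   # "character"

# Convert, then find the values that became NA during coercion
num <- suppressWarnings(as.numeric(vals))
vals[is.na(num) & !is.na(vals)]   # "two"

# Factors: as.numeric() returns the level codes, not the values
f <- factor(c("10", "20", "10"))
class(f)                      # "factor"
typeof(f)                     # "integer"
as.numeric(f)                 # 1 2 1, not 10 20 10
as.numeric(as.character(f))   # 10 20 10
```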

Improper handling of missing data (NAs).

vec1 <- c(1,2,NA,4)

sum(vec1)

[1] NA
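Most summary functions return NA when any input is NA, as sum() does above. Here is a minimal sketch of checking for and handling missing values:

```r
vec1 <- c(1, 2, NA, 4)

anyNA(vec1)               # TRUE: at least one value is missing
sum(is.na(vec1))          # 1: how many values are missing
which(is.na(vec1))        # 3: where they are

sum(vec1, na.rm = TRUE)   # 7: drop NAs before summing
vec1[!is.na(vec1)]        # 1 2 4: keep only the observed values
```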

Memory management errors

Error: vector memory exhausted (limit reached?) occurs when R runs out of available memory (RAM). You can track memory usage in RStudio and remove objects you no longer need with rm(). You could also move your analysis to the NIH HPC system, Biowulf.

You may also find this guide helpful.
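A minimal sketch of checking an object's size and freeing the memory it uses, with base R's object.size(), rm(), and gc(). The object big is made up for illustration, and the reported size is approximate.

```r
# Create a large object: 10 million doubles at 8 bytes each
big <- numeric(1e7)
big_size <- object.size(big)
print(big_size, units = "Mb")   # roughly 76 Mb

# Remove it when you no longer need it, then let R release the memory
rm(big)
invisible(gc())
exists("big")   # FALSE
```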

Troubleshooting

General steps to debug R code:

Read the message carefully.
Check for typos and missing punctuation.
Check your global environment for created objects (or use ls()).
Check object attributes (use str(), dim(), typeof(), class(), etc.).

Step through the code line by line (see the example below).

library(nycflights13)
library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

flights |>
  filter(dest == "IAH") |> 
  group_by(year, month, day) |> 
  summarize(
    arr_delay = mean(arr_dely, na.rm = TRUE)
  )

Error in `summarize()`:
ℹ In argument: `arr_delay = mean(arr_dely, na.rm = TRUE)`.
ℹ In group 1: `year = 2013`, `month = 1`, `day = 1`.
Caused by error in `mean()`:
! object 'arr_dely' not found

flights |>
  filter(dest == "IAH")

# A tibble: 7,198 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      623            627        -4      933            932
4  2013     1     1      728            732        -4     1041           1038
5  2013     1     1      739            739         0     1104           1038
6  2013     1     1      908            908         0     1228           1219
7  2013     1     1     1028           1026         2     1350           1339
8  2013     1     1     1044           1045        -1     1352           1351
9  2013     1     1     1114            900       134     1447           1222
10  2013     1     1     1205           1200         5     1503           1505
# ℹ 7,188 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

flights |>
  filter(dest == "IAH") |> 
  group_by(year, month, day)

# A tibble: 7,198 × 19
# Groups:   year, month, day [365]
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
  <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
1  2013     1     1      517            515         2      830            819
2  2013     1     1      533            529         4      850            830
3  2013     1     1      623            627        -4      933            932
4  2013     1     1      728            732        -4     1041           1038
5  2013     1     1      739            739         0     1104           1038
6  2013     1     1      908            908         0     1228           1219
7  2013     1     1     1028           1026         2     1350           1339
8  2013     1     1     1044           1045        -1     1352           1351
9  2013     1     1     1114            900       134     1447           1222
10  2013     1     1     1205           1200         5     1503           1505
# ℹ 7,188 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

flights |>
  filter(dest == "IAH") |> 
  group_by(year, month, day) |> 
  summarize(
    arr_delay = mean(arr_dely, na.rm = TRUE)
  )

Error in `summarize()`:
ℹ In argument: `arr_delay = mean(arr_dely, na.rm = TRUE)`.
ℹ In group 1: `year = 2013`, `month = 1`, `day = 1`.
Caused by error in `mean()`:
! object 'arr_dely' not found

Google the error message if necessary.
- Remove information specific to your data (e.g., object names) or files from the message.
- Be sure to include important search keywords. For example, if you know which function or data type the error is associated with, include it in your search. Always include R as a keyword.
Post to a forum.
Take a break.
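In the flights example, stepping through shows that the error appears only at summarize(). The message points to arr_dely, while the column listed in the earlier output is arr_delay. To confirm this kind of typo, compare the name you used with names() of the data at that step. The sketch below uses mtcars so it runs without nycflights13 or dplyr; hp_ is a hypothetical typo, and adist() (base R) is one way to suggest the closest match.

```r
# Hypothetical typo: "hp_" instead of the mtcars column "hp"
wanted <- c("mpg", "wt", "hp_")

wanted %in% names(mtcars)        # TRUE TRUE FALSE
setdiff(wanted, names(mtcars))   # "hp_": the name R cannot find

# Suggest the closest existing column name by edit distance
names(mtcars)[which.min(adist("hp_", names(mtcars)))]   # "hp"
```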

Tip

At times, updating packages or simply restarting the R session can fix particularly troublesome errors.

Tip

You can also use common debugging tools (e.g., traceback(), debug()). These will likely be more useful for intermediate R users.

Getting help.

ALWAYS, ALWAYS, ALWAYS read the documentation.

Use the Help pane in the lower right of RStudio, or the functions help() and help.search() (or their shortcuts ? and ??).
Check out package vignettes (vignette()).
Check the package's GitHub repository, if one is available. Known problems may already be listed under the Issues tab.
Google for help!
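A few base R helpers complement the Help pane. This is a sketch; note that example() runs the help-page examples in your workspace and may create objects there.

```r
# apropos() searches attached objects by (part of) a name; it ignores case by default
apropos("mean")   # includes "mean", "colMeans", "weighted.mean", ...

# args() shows a function's arguments and defaults without opening the help page
args(read.delim)

# example() runs the examples from the bottom of a help page
example(mean)

# For full documentation, use ?read.delim or help("read.delim");
# ??"regular expression" searches all installed help pages
```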

Use Google to troubleshoot error messages or simply to find help with a specific task, but make sure your search is precise and informative. Stack Overflow is a particularly great resource. A good Google search includes 3 elements:
1. The specific action (e.g., how to rename a column).
2. The programming language (e.g., R statistics).
3. The specific style/technique for coding (e.g., dplyr or tidyverse package).
  Example: "How to rename a column in R with dplyr/tidyverse". --- https://crd230.github.io/tips.html.

Post a Question to a forum.

If you can't find help, you may need to post to a forum like Posit Community, Stack Overflow, or Bioconductor.

To post a question, you should include at minimum the following:

a descriptive and informative title with keywords (related to packages, functions, methods, or errors)
a description of the problem
the output of sessionInfo()
readable, well-formatted code
a reproducible example (reprex) including a minimal dataset

See tips here. Do not include screenshots of your code or console, as they cannot be easily copied and reproduced.

Note

Stack Overflow has a help page that includes guidelines for asking good questions.

How to make a reprex?

The package reprex is used to

Prepare reprexes for posting to GitHub issues, StackOverflow, in Slack messages or snippets, or even to paste into PowerPoint or Keynote slides. ---https://reprex.tidyverse.org/index.html

Creating a minimal dataset.

This can usually be achieved by using built-in datasets (data()) or by creating a small example dataset with data.frame() or a related function. The examples in function help pages often include a minimal dataset. If for some reason you need to use your own data, check out the datapasta package.
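Base R's dput() is another option: it prints code that recreates an object, which you can paste into a question. This sketch uses an arbitrary subset of mtcars and checks that the printed code rebuilds the same data frame.

```r
# A small, arbitrary subset of a built-in dataset
small <- head(mtcars[, c("mpg", "cyl", "wt")], 3)

# Print code that recreates `small`; paste this output into your question
dput(small)

# Check the round trip: evaluate the dput() output and compare
txt <- paste(capture.output(dput(small)), collapse = "\n")
recreated <- eval(parse(text = txt))
all.equal(small, recreated)   # TRUE
```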

To use reprex, copy the code you want to share on GitHub, Stack Overflow, or another forum, and then call reprex(). By default, it renders the code and its output as Markdown ready to paste into GitHub, Stack Overflow, or Discourse.

For example, using datapasta with reprex to create an example dataset with our own data:

library(reprex)
library(datapasta)

data <- read.delim("../data/S5_CommonErrors/SF_example_RNASeq_1.txt",check.names = FALSE)
data <- head(data)
dpasta(data)

data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
           gene_id = c("ENSG00000001630.11_CYP51A1",
                       "ENSG00000002016.12_RAD52","ENSG00000002330.9_BAD",
                       "ENSG00000002834.13_LASP1","ENSG00000003056.3_M6PR",
                       "ENSG00000003393.10_ALS2"),
    `1_Cell1_Rep1` = c(6877.07, 282.99, 1946, 17636, 3874, 2041),
    `2_Cell1_Rep2` = c(6614, 286.62, 1662, 19333, 4107, 2150),
    `3_Cell1_Rep3` = c(7057.98, 286.52, 2121, 18917, 4005, 2141),
    `4_Cell2_Rep1` = c(11305.33, 265.41, 608, 4583, 5741, 1687),
    `5_Cell2_Rep2` = c(10760.54, 235, 711, 4464, 5703, 1624),
    `6_Cell2_Rep3` = c(10047, 254.24, 576, 3892, 4978, 1426)
)

# copy to clipboard and run reprex
# reprex runs the code in a fresh R session, so load every package it needs
library(dplyr)
data <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
           gene_id = c("ENSG00000001630.11_CYP51A1",
                       "ENSG00000002016.12_RAD52","ENSG00000002330.9_BAD",
                       "ENSG00000002834.13_LASP1","ENSG00000003056.3_M6PR",
                       "ENSG00000003393.10_ALS2"),
    `1_Cell1_Rep1` = c(6877.07, 282.99, 1946, 17636, 3874, 2041),
    `2_Cell1_Rep2` = c(6614, 286.62, 1662, 19333, 4107, 2150),
    `3_Cell1_Rep3` = c(7057.98, 286.52, 2121, 18917, 4005, 2141),
    `4_Cell2_Rep1` = c(11305.33, 265.41, 608, 4583, 5741, 1687),
    `5_Cell2_Rep2` = c(10760.54, 235, 711, 4464, 5703, 1624),
    `6_Cell2_Rep3` = c(10047, 254.24, 576, 3892, 4978, 1426)
)

#also include code to reproduce
(data2 <- rename_with(data,~ paste0("S",.x),contains("Cell")))

Run reprex after copying to clipboard.

reprex(session_info=TRUE,style=TRUE)

Tips for keeping your code organized and avoiding errors

Read the documentation!
- Know what the functions you use actually do.
- Read package vignettes.
Keep your code neat and clean. You can do this by following a style guide, for example, the tidyverse style guide.
- If you use RStudio, you can press Ctrl + Shift + A (Cmd + Shift + A on macOS) to reformat code.
- See this resource on Coding Etiquette for styling tips.
- Organize your script into sections with section headers (# Name of Section ----).
Implement good data management practices.
- Use R projects to organize data, code, outputs, and other related files in one place.
- Use relative file paths so your code does not break unexpectedly when it is moved or shared, and understand your directory tree.
Know the structure of your data.
- Keep track of created R objects (See the global environment; use ls()).
- Use glimpse() (from dplyr), str(), dim(), class(), and related functions.
