Lesson 2: Basics of R Programming: R Objects and Data Types

Objectives

To understand some of the most basic features of the R language including:

  • Creating and manipulating R objects.
  • Understanding object types and classes.
  • Using mathematical operations.

To get started with this lesson, you will first need to connect to RStudio on Biowulf. To connect to NIH HPC Open OnDemand, you must be on the NIH network. Use the following website to connect: https://hpcondemand.nih.gov/. Then follow the instructions outlined here.

R objects

Objects (and functions) are key to understanding and using R programming.

Everything assigned a value in R is technically an object. We mostly think of R objects as things that a method (or function) can act on; however, R functions are themselves R objects. R objects are stored in memory and have a specific type or class. Objects include things like vectors, lists, matrices, arrays, factors, and data frames. Don't get too bogged down by terminology. Many of these terms will become clear as we begin to use them in our code. To be stored in memory, an R object must first be created.

Creating and deleting objects

To create an R object, you need a name, a value, and an assignment operator (e.g., <- or =). R is case sensitive, so an object with the name "FOO" is not the same as "foo".

Note

In RStudio, you can use alt + - on a PC or option + - on a Mac to generate the <- operator.

Using = for assignment?

To improve the readability of your code, you should use the <- operator to assign values to objects rather than =. The = operator has other purposes, such as setting function arguments.
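As a rough illustration of the difference, the sketch below assigns with <- and uses = only to name a function argument. The object name od_reading is made up for this example, and the last check assumes that no object called digits has already been created in your session.

```r
# `<-` creates an object in the current environment
od_reading <- 5.65

# Inside a function call, `=` only matches a value to the argument `digits`;
# it does not create an object called `digits`
rounded <- round(od_reading, digits = 1)

# Check the current environment only (inherits = FALSE)
exists("od_reading", inherits = FALSE)   # TRUE
exists("digits", inherits = FALSE)       # FALSE in a fresh session
```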

Let's create a simple object and run our code. There are a few methods to run code:

  • The run button
  • Key shortcuts (Windows: ctrl+Enter, Mac: Command+Return)
  • Type directly into the console.

Use comments (#) to annotate your code for better reproducibility.

#Create an object called "a" assigned to a value of 1.  
a <- 1  

#Simply call the name of the object to print the value to the screen
a 
[1] 1

In this example, "a" is the name of the object, 1 is the value, and <- is the assignment operator.

Now, if we use a in our code, R will replace it with its value during execution. Try the following:

a + 5
[1] 6
5 - a
[1] 4
a^2
[1] 1
a + a
[1] 2

Naming conventions and reproducibility

There are rules regarding the naming of objects.

  1. Avoid spaces or special characters EXCEPT '_' and '.'
  2. No numbers or underscores at the beginning of an object name.

    For example:

    1a<-"apples" # this will throw an error
    1a
    
    Error in parse(text = input): <text>:1:2: unexpected symbol
    1: 1a
        ^
    

    Note

    It is generally a good habit to not begin sample names with a number.

    In contrast:

    a<-"apples" #this works fine
    a
    
    [1] "apples"
    

    What do you think would have happened if we didn't put 'apples' in quotes?

    Strings

    R recognizes different types of data (see below). We have used numbers above, but we can also use characters or strings. A string is a sequence of characters. It can contain letters, numbers, symbols, and spaces, but to be recognized as a string it must be wrapped in quotes (either single or double). If a sequence of characters is not wrapped in quotes, R will try to interpret it as something other than a string, such as the name of an R object.

  3. Avoid common names with special meanings (See ?Reserved) or assigned to existing functions (These will auto complete).

See the tidyverse style guide for more information on naming conventions.
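One way to check a candidate name against these rules is base R's make.names(), which rewrites a string into a syntactically valid name. The candidate names below are made up for illustration; a name that comes back unchanged was already valid.

```r
# Hypothetical candidate names
candidates <- c("1a", "my gene", "gene_name", "od.600")

# make.names() repairs invalid names: it prepends "X" when a name starts
# with a digit and replaces invalid characters (such as spaces) with "."
fixed <- make.names(candidates)
fixed                 # "X1a" "my.gene" "gene_name" "od.600"

# A name is syntactically valid if it comes back unchanged
candidates == fixed   # FALSE FALSE TRUE TRUE
```

Note that make.names() only checks syntax. It will not tell you whether a name is informative or whether it clashes with an existing function.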

How do I know what objects have been created?

To view a list of the objects you have created, use ls() or look at your global environment pane.
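For example, after creating two objects (the names here are made up), ls() returns their names as a character vector. A minimal sketch:

```r
sample_count <- 12
tissue_type <- "liver"

# ls() returns the names of the objects in the current environment
ls()

# Check whether a particular name is among them
"sample_count" %in% ls()   # TRUE
```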

Object names should be short but informative. If you use a, b, c, you will likely forget what those object names represent. However, something like This_is_my_scientific_data_from_blah_experiment is far too long. Strike a nice balance.

Reassigning objects

To reassign an object, simply overwrite the object.

#Create an object with gene named 'tp53'
gene_name<-"tp53"
gene_name
[1] "tp53"
#Re-assign gene_name to a different gene
gene_name<-"GH1"
gene_name
[1] "GH1"

Warning

R will not warn you when objects are being overwritten, so use caution.

Deleting objects

# delete the object 'gene_name'
rm(gene_name)
#the object no longer exists, so calling it will result in an error
gene_name
Error: object 'gene_name' not found
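If you are unsure whether an object exists, exists() lets you check without triggering an error. A small sketch, reusing the gene_name object from above and assuming nothing else in your session goes by that name:

```r
gene_name <- "GH1"
exists("gene_name")   # TRUE

rm(gene_name)
exists("gene_name")   # FALSE
```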

Object data types

Data types are familiar in many programming languages, but also in natural language where we refer to them as the parts of speech, e.g. nouns, verbs, adverbs, etc. Once you know if a word - perhaps an unfamiliar one - is a noun, you can probably guess you can count it and make it plural if there is more than one (e.g. 1 Tuatara, or 2 Tuataras). If something is an adjective, you can usually change it into an adverb by adding “-ly” (e.g. jejune vs. jejunely). Depending on the context, you may need to decide if a word is in one category or another (e.g. “cut” may be a noun when it’s on your finger, or a verb when you are preparing vegetables). These concepts have important analogies when working with R objects.
--- datacarpentry.org

The type and class of an R object affects how that object can be used or will behave. Examples of base R data types include double, integer, complex, character, and logical.

R objects can also have certain assigned attributes like class (e.g., data frame, factor, date), and these attributes will be important for how they interact with certain methods / functions. Ultimately, understanding the type and class of an object will be important for how an object can be used in R. When the type (mode) of an object is changed, we call this "coercion". You may see a coercion warning pop up when working with objects in the future.

The type of an object can be examined using typeof(), while the class of an object can be viewed using class(). typeof() returns the internal storage type of any object. Here, I am using mode and type interchangeably, but they do differ. To find out more, check out the help docs: ?mode or ?typeof.
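To see how type, mode, and class can differ, compare an integer with a double. This is a minimal sketch with made-up object names; the L suffix is how R marks a number as an integer.

```r
whole <- 5L        # the L suffix creates an integer
decimal <- 5.5     # numbers are stored as doubles by default

typeof(whole)      # "integer"
mode(whole)        # "numeric"
class(whole)       # "integer"

typeof(decimal)    # "double"
mode(decimal)      # "numeric"
class(decimal)     # "numeric"
```

Both objects share the mode "numeric", even though their types differ.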

We now know what data types are, but what is a class?

'class' is a property assigned to an object that determines how generic functions operate with it. It is not a mutually exclusive classification. If an object has no specific class assigned to it, such as a simple numeric vector, its class is usually the same as its mode, by convention. ---stackexchange

To find out more about an object, it is often most useful to use class(), typeof(), or str() (more on this function later).

Let's create some objects and determine their types and classes.

chromosome_name <- 'chr02'
typeof(chromosome_name)
## [1] "character"
class(chromosome_name)
## [1] "character"

od_600_value <- 0.47
typeof(od_600_value)
## [1] "double"
class(od_600_value)
## [1] "numeric"

df<-head(iris)
typeof(df)
## [1] "list"
class(df)
## [1] "data.frame"


chr_position <- '1001701bp'
typeof(chr_position)
## [1] "character"
class(chr_position)
## [1] "character"

spock <- TRUE
typeof(spock)
## [1] "logical"
class(spock)
## [1] "logical"

There are also functions that can gauge types directly, for example, is.numeric(), is.character(), is.logical(). And, there are functions for explicit coercion from one type to another: as.double(), as.integer(), as.factor(), as.character(), etc.
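The sketch below shows both kinds of coercion, using values similar to those above. Implicit coercion happens when R converts values on its own (here, inside c()), while explicit coercion uses the as.*() functions.

```r
# Implicit coercion: c() converts its elements to a common type
mixed <- c(1, "a", TRUE)
mixed                      # "1" "a" "TRUE"
typeof(mixed)              # "character"

# Explicit coercion
as.integer("42")           # 42
as.character(0.47)         # "0.47"

# Values that cannot be converted become NA, and R issues a coercion warning
as.numeric("1001701bp")    # NA

# Type checks return TRUE or FALSE
is.numeric(0.47)           # TRUE
is.character("chr02")      # TRUE
```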

If an object has a class attribute, there is likely an associated "constructor function", or function used to build an object of that class. For example, ?data.frame(), ?factor(). We will discuss both data frames and factors in a later lesson.

Special null-able values

There are also special use, null-able values that you should be aware of. Read more to learn about NULL, NA, NaN, and Inf.
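As a brief sketch of how these values arise and behave (the measurements vector is made up for illustration):

```r
1 / 0                 # Inf: a number too large to represent
0 / 0                 # NaN: "not a number", an undefined result
length(NULL)          # 0: NULL is the empty object

# NA marks a missing value
measurements <- c(1.2, NA, 3.4)
is.na(measurements)                  # FALSE TRUE FALSE

# Missing values propagate unless they are explicitly removed
mean(measurements)                   # NA
mean(measurements, na.rm = TRUE)     # 2.3
```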

Mathematical operations

As mentioned, an object's type/mode can be used to understand the methods that can be applied to it. Objects of mode numeric can be treated as such, meaning mathematical operators can be used directly with those objects.

datacarpentry.org provides a chart of many of the mathematical operators used in R.
Parentheses, (), are additionally used to establish the order of operations.
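The most common arithmetic operators, and the effect of parentheses, can be sketched as follows (the numbers are arbitrary):

```r
7 + 2      # addition: 9
7 - 2      # subtraction: 5
7 * 2      # multiplication: 14
7 / 2      # division: 3.5
7 ^ 2      # exponentiation: 49
7 %% 2     # modulo (remainder after division): 1
7 %/% 2    # integer division: 3

# Multiplication is evaluated before addition unless parentheses say otherwise
2 + 3 * 4      # 14
(2 + 3) * 4    # 20
```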

Let's see this in practice.

#create an object storing the number of human chromosomes (haploid)
human_chr_number<-23
#let's check the type of this object
typeof(human_chr_number)
[1] "double"
#Now, let's get the total number of human chromosomes (diploid)
human_chr_number * 2 #The output is 46! 
[1] 46

Moreover, we do not need an object to perform mathematical computations. R can be used like a calculator.

For example,

(1 + (5 ^ 0.5))/2
[1] 1.618034

A function is an object.

R functions are saved as objects, and if we type the name of the function, we can see the value of the object (i.e., the code underlying the function). Functions are important to R programming, as anything that happens in R is due to the use of a function.
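To make this concrete, the sketch below defines a small function (fold_change is a made-up name for illustration) and then treats it like any other object.

```r
# A small, made-up function for illustration
fold_change <- function(treated, control) {
  treated / control
}

# Typing the name without parentheses prints the code behind the function
fold_change

# Functions have a class, like other objects
class(fold_change)     # "function"
is.function(round)     # TRUE

# Adding parentheses and arguments calls the function
fold_change(10, 4)     # 2.5
```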

Looking up Compiled Code

When looking at R source code, sometimes calls to one of the following functions show up: .C(), .Call(), .Fortran(), .External(), or .Internal() and .Primitive(). These functions are calling entry points in compiled code such as shared objects, static libraries or dynamic link libraries. Therefore, it is necessary to look into the sources of the compiled code, if complete understanding of the code is required. --- RNews 2006

We have used some R functions in Lesson 1 (e.g. getwd() and setwd())! Let's look at another example using the round() function.
round() "rounds the values in its first argument to the specified number of decimal places (default 0)" --- R help.

Consider

round(5.65) #can provide a single number
[1] 6
round(c(5.65,7.68,8.23)) #can provide a vector
[1] 6 8 8

In this example, we provided only the required argument, which can be any numeric or complex vector. RStudio's contextual help, which generally appears as you type the name of a function, shows that round() accepts two arguments. The optional second argument (digits) indicates the number of decimal places to round to.

#provide an additional argument rounding to the tenths place
round(5.65,digits=1) 
[1] 5.7

At times a function may be masked by another function. This can happen when two loaded packages contain functions with the same name (e.g., dplyr::filter() vs stats::filter()). We can get around this by explicitly calling a function from the correct package using the following syntax: package::function().
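Masking can be imitated without installing any packages. In the sketch below, we deliberately define our own function named median (purely for illustration; this is not something to do in real code), which hides the version from the stats package until we remove it.

```r
# Define our own function that happens to share a name with stats::median()
median <- function(x) "this is not the median"

median(c(1, 3, 5))          # our version masks the one from stats
stats::median(c(1, 3, 5))   # 3: the package prefix reaches the original

# Remove our version so that median() refers to stats::median() again
rm(median)
median(c(1, 3, 5))          # 3
```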

The pipe (|>, %>%).

Functions can be chained together using a pipe (|>, %>%). The pipe improves the readability of the code by minimizing nesting.

For example,

ex<- -5.679

ex |> round() |> abs()
[1] 6

We will talk about the pipe more in parts 2 and 3 of this series. For now, it is helpful to know that it exists and what it is doing.
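To see what "minimizing nesting" means, compare the same computation written both ways. This sketch uses made-up values and the native pipe, which assumes R version 4.1 or later.

```r
values <- c(-5.679, 2.345, -1.2)

# Nested calls are read from the inside out
nested <- sum(abs(round(values)))

# Piped calls are read left to right: round, then abs, then sum
piped <- values |> round() |> abs() |> sum()

identical(nested, piped)   # TRUE
piped                      # 9
```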

Differences between |> and %>%

There are some crucial differences between the native pipe (|>) and the magrittr pipe (%>%). Check out this blog for details.

Pre-defined objects

Base R comes with a number of built-in functions, vectors, data frames, and other objects. You can list them all with the builtins() function. If you are interested in built-in datasets, check out help(package="datasets").
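A few of these pre-defined objects are shown below as a quick sketch. All of them should be available in a standard R session without loading additional packages.

```r
pi                # 3.141593
letters[1:3]      # "a" "b" "c"
month.abb[1:2]    # "Jan" "Feb"

# iris is a built-in data frame from the datasets package
dim(iris)         # 150 5

# builtins() returns the names of built-in objects as a character vector
head(builtins())
```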

Acknowledgments

Material from this lesson was either taken directly or adapted from the Intro to R and RStudio for Genomics lesson provided by datacarpentry.org.