ncibtep@nih.gov

Bioinformatics Training and Education Program

DATA WRANGLING IN R

 When: Feb. 9th, 2022 1:00 pm - 2:15 pm

This class has ended.
To Know
  • Where: Online Webinar
  • Organized By: NIH Training Library

About this Class

Data Wrangling in R is the third class in the NIH Library Introduction to R Series. A basic understanding of R and R data types is expected. This class provides a basic overview of manipulating, analyzing, and exporting data with the R tidyverse.

R is a programming language and open-source environment for statistical computing and graphics. The R class series is a comprehensive collection of training sessions offered by the NIH Library Data Services Program. It is designed to teach non-programmers how to write modular code and to introduce best practices for using R for data analysis and data visualization. Each class uses both evidence-based best practices for programming and practical hands-on lessons.

By the end of this class, students should be able to:
  • describe the purpose of tidyverse packages
  • select certain columns or rows in a data frame
  • describe the function of the pipe operator
  • add new columns to a data frame that are functions of existing columns
  • use the split-apply-combine concept for data analysis
  • use summarize, group_by, and count to split a data frame into groups of observations, apply summary statistics for each group, and then combine the results
  • describe the concept of a wide and a long table format and the purposes for which each format is useful
  • describe the function of key-value pairs
  • reshape a data frame using the gather commands from the tidyr package
  • export a data frame to a .csv file

Students are encouraged to install R and RStudio and to download the class data before the class so that they can follow along with the instructor.
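As a rough illustration of the learning objectives above, the following is a minimal sketch of a tidyverse workflow. It is not the class material: the `visits` data frame, its column names, and the output file name are hypothetical and invented for this example, and it assumes the dplyr, tidyr, and readr packages are installed.

```r
# Minimal sketch only; `visits` is hypothetical data, not the class data set.
library(dplyr)
library(tidyr)
library(readr)

visits <- tibble(
  patient   = c("a", "b", "c", "d"),
  site      = c("north", "north", "south", "south"),
  weight_kg = c(60, 80, 70, 90),
  height_m  = c(1.6, 1.8, 1.7, 1.9)
)

# Select columns, keep a subset of rows, and add a derived column.
# The pipe operator (%>%) passes each result on to the next function.
with_bmi <- visits %>%
  select(patient, site, weight_kg, height_m) %>%
  filter(weight_kg > 50) %>%
  mutate(bmi = weight_kg / height_m^2)

# Split-apply-combine: split by site, summarize each group, combine the results.
by_site <- with_bmi %>%
  group_by(site) %>%
  summarize(mean_weight_kg = mean(weight_kg), n = n())

# count() is a shortcut for grouping and tallying rows.
site_counts <- visits %>%
  count(site)

# Wide to long: gather() collapses the measurement columns into
# key-value pairs (the column name becomes the key, the cell the value).
long <- visits %>%
  gather(key = "measure", value = "value", weight_kg, height_m)

# Export the summary to a .csv file (written to a temporary directory here).
out_path <- file.path(tempdir(), "by_site.csv")
write_csv(by_site, out_path)
```

In current versions of tidyr, `gather()` still works but is superseded by `pivot_longer()`, which performs the same wide-to-long reshaping.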