ncibtep@nih.gov

Bioinformatics Training and Education Program

Introduction to Data Wrangling Using Python: Part 1 of 2

When: Mar. 10th, 2026, 10:00 am - 11:00 am

Learning Level: Intermediate

To Know

Where: Online
Organizer: NIH Library
Presented By: Cindy Sheffield (NIH Library)

About this Class

This one-hour online training is the first of a two-part series. It introduces participants to cleaning and exploring a patient health dataset using Python and pandas. Attendees will load tabular data, inspect its structure and data types, summarize columns, and identify common data quality problems such as missing values, inconsistent formats, and duplicate records. They will then apply practical fixes, including standardizing height and weight units, parsing and normalizing dates of birth, splitting combined fields, and using Boolean masks to flag or correct implausible values.
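
As a rough preview of the load, inspect, and flag workflow described above, here is a minimal sketch. The sample data, column names (patient_id, dob, weight_kg), and plausibility thresholds are hypothetical; the class's actual script and dataset are not shown in this announcement and may differ.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = """patient_id,dob,weight_kg
1,1980-01-15,70.5
2,1975-07-04,
3,1990-12-31,820.0
"""

# Load tabular data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Inspect structure, data types, and basic statistics.
print(df.dtypes)
print(df.describe())

# Count missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Parse date-of-birth strings; values that cannot be parsed become NaT.
df["dob"] = pd.to_datetime(df["dob"], errors="coerce")

# Boolean mask flagging implausible weights.
# The 2 kg and 500 kg limits are illustrative assumptions, not clinical guidance.
# Missing weights compare as False, so they are not flagged here.
implausible = (df["weight_kg"] < 2) | (df["weight_kg"] > 500)
df["weight_flag"] = implausible
print(df[df["weight_flag"]])
```

Flagging a value in a new column, rather than overwriting it, keeps the original data available for review before any correction is made.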

By the end of this session, students will be able to:

  • Import CSV data into pandas DataFrames and quickly understand column types, basic statistics, and overall data quality.
  • Identify duplicate or repeated patient records and decide whether to keep, correct, or remove them.
  • Detect and handle missing or inconsistent values using methods such as isna, fillna, filtering, and conditional replacement.
  • Standardize mixed formats (for example, heights with and without units, date strings in different formats, and numeric values embedded in text).
  • Create derived columns such as systolic and diastolic blood pressure, and use logical conditions to flag questionable or out-of-range values.
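
Several of the objectives above (duplicates, missing values, mixed height formats, and derived blood pressure columns) can be sketched together as follows. All column names, sample values, and thresholds are hypothetical, and the rule that a height without a unit is in centimeters is an assumption made only for this illustration.

```python
import pandas as pd

# Hypothetical records; the second and third rows are exact duplicates.
records = pd.DataFrame({
    "patient_id": [101, 102, 102, 103, 104],
    "height": ["170 cm", "165", "165", "1.80 m", "172cm"],
    "blood_pressure": ["120/80", "135/85", "135/85", None, "400/90"],
})

# Remove exact duplicate rows and renumber the index.
deduped = records.drop_duplicates().reset_index(drop=True)

# Detect missing values, and build a display label with fillna.
deduped["bp_missing"] = deduped["blood_pressure"].isna()
bp_label = deduped["blood_pressure"].fillna("not recorded")

# Standardize heights: pull out the number and the optional unit.
parts = deduped["height"].str.extract(
    r"^\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>cm|m)?\s*$"
)
value = pd.to_numeric(parts["value"], errors="coerce")
# Assumption: a bare number with no unit is already in centimeters.
is_meters = parts["unit"].fillna("cm") == "m"
deduped["height_cm"] = value.where(~is_meters, value * 100)

# Split the combined blood pressure field into two numeric columns.
bp = deduped["blood_pressure"].str.extract(
    r"^(?P<systolic>\d+)/(?P<diastolic>\d+)$"
)
deduped["systolic"] = pd.to_numeric(bp["systolic"], errors="coerce")
deduped["diastolic"] = pd.to_numeric(bp["diastolic"], errors="coerce")

# Flag out-of-range systolic values (illustrative limits only).
# Missing readings compare as False; bp_missing tracks them separately.
deduped["bp_flag"] = (deduped["systolic"] < 70) | (deduped["systolic"] > 250)

print(deduped)
```

Whether a repeated record should be dropped, corrected, or kept depends on the dataset; drop_duplicates removes only rows that match in every column, so near-duplicates still need manual review.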

Attendees are expected to have:

  • Basic Python coding knowledge
  • Familiarity with an IDE or notebook environment (for example, Colab or Jupyter Notebooks), including how to load script and data files into it

Requirements: 

  • Participants will receive a script file and data files prior to the training. These should be loaded and ready to use before the training session begins. 

You can register for Part 2 in this series via the link below: 

https://www.nihlibrary.nih.gov/training/introduction-data-wrangling-using-python-part-2-2