ncibtep@nih.gov

Bioinformatics Training and Education Program

Clustering with R and RStudio

Coding Club Seminar Series

When: Mar. 12, 2025, 11:00 am - 12:00 pm

Seminar Series Details:

Presented By: Brian Luke (Advanced Biomedical Computational Science, ABCS)
Where: Online Webinar
Organized By: BTEP

About Brian Luke (Advanced Biomedical Computational Science, ABCS)

Brian Luke, Ph.D., is a Senior Principal Computational Scientist with the Advanced Biomedical Computational Science (ABCS) group.  

About this Class

Clustering is one of the fundamental families of unsupervised machine learning methods. It is often used to group quantitative proteomic or RNA-seq expression data to suggest subtypes of a particular cancer. This presentation covers building a distance/dissimilarity matrix, agglomerative and divisive hierarchical clustering with their associated dendrograms, and K-means clustering with principal component and nonlinear projections of the resulting clusters. It also shows how to compare different clusterings using the silhouette width. Though non-biological, the R dataset “swiss” will be used to demonstrate basic clustering techniques that can be applied to datasets of importance in cancer.

Additionally, this event is part 2 to a complementary event, Introduction to Clustering, in the Statistics for Lunch Series, sponsored by the Advanced Biomedical Computational Science group at the Frederick National Laboratory for Cancer Research. Attending that complementary event is recommended but not required, as it will provide a theoretical introduction to clustering as a statistical methodology.
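
For attendees who want a preview, the workflow named in the class description might look roughly like the R sketch below. It is not the presenter's material. The choice of three clusters, Euclidean distance, average linkage, and standardizing the variables are all assumptions made for illustration, and the nonlinear projection is left out. It uses only base R and the recommended "cluster" package.

```r
# Illustrative sketch only; k = 3, Euclidean distance, and average linkage
# are assumptions, not choices taken from the seminar.
library(cluster)

data(swiss)                          # built-in dataset: 47 provinces, 6 numeric variables
x <- scale(swiss)                    # standardize so no single variable dominates

# Distance/dissimilarity matrix
d <- dist(x, method = "euclidean")

# Agglomerative and divisive hierarchical clustering
hc_agg <- hclust(d, method = "average")
hc_div <- diana(d)                   # divisive clustering from the cluster package
dend   <- as.dendrogram(hc_agg)      # dendrogram object, viewable with plot(dend)

# K-means clustering
k <- 3                               # assumed number of clusters
set.seed(1)                          # K-means starts from random centers
km <- kmeans(x, centers = k, nstart = 25)

# Principal component projection of the samples (first two components)
pc   <- prcomp(x)
proj <- pc$x[, 1:2]

# Compare clusterings by average silhouette width (higher is better)
sil_km  <- silhouette(km$cluster, d)
sil_agg <- silhouette(cutree(hc_agg, k = k), d)
sil_div <- silhouette(cutree(as.hclust(hc_div), k = k), d)
avg_width <- c(kmeans       = mean(sil_km[, "sil_width"]),
               agglomerative = mean(sil_agg[, "sil_width"]),
               divisive      = mean(sil_div[, "sil_width"]))
print(avg_width)
```

Calling plot(dend) draws the dendrogram, and plot(proj, col = km$cluster) shows the K-means clusters in the plane of the first two principal components.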