Lesson 3 Exercise Questions: Base R data frame manipulation and factors
The filtlowabund_scaledcounts_airways.txt file includes normalized and non-normalized transcript count data from an RNAseq experiment. You can read more about the experiment here.
We are going to use the filtlowabund_scaledcounts_airways.txt file for this exercise. Get the data here.
Putting what we have learned to the test:
The following questions synthesize several of the skills you have learned thus far. It may not be immediately apparent how to answer them. Remember, the R community is expansive, and there are many ways to get help, including but not limited to a Google search. These questions have multiple solutions, but you should try to stick to the tools you have learned so far.
-
Import filtlowabund_scaledcounts_airways.txt into R and save it to an R object named transcript_counts. Try not to use the dropdown menu for loading the data.
Solution
transcript_counts <- read.delim("../data/filtlowabund_scaledcounts_airways.txt")
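To see what read.delim() does without needing the real file, here is a minimal sketch that writes a tiny tab-delimited file to a temporary location and reads it back. The file contents and the name toy_counts are invented for illustration; only the read.delim() defaults (tab separator, first line used as the header) carry over to the exercise.

```r
# Write a small tab-delimited file to a temporary path (toy data, not the airway file)
tmp <- tempfile(fileext = ".txt")
writeLines(c("sample\tcounts", "s1\t10", "s2\t0"), tmp)

# read.delim() assumes sep = "\t" and header = TRUE by default
toy_counts <- read.delim(tmp)
str(toy_counts)

# Remove the temporary file when done
unlink(tmp)
```

Running str() right after an import is a quick way to confirm that the header was picked up and that numeric columns were not read in as text.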
-
What are the dimensions of transcript_counts?
Solution
dim(transcript_counts)
-
What are the column names?
Solution
colnames(transcript_counts)
-
Is there a difference in the number of transcripts with greater than 0 normalized counts (counts_scaled) per sample? What commands did you use to answer this question?
Solution
table(transcript_counts[transcript_counts$counts_scaled > 0, ]$sample)
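The one-liner above combines two steps: a logical row filter and a tally. A small sketch on made-up data may make each step easier to follow. The toy data frame and its values are assumptions for illustration, not values from the airway file.

```r
# Toy data, invented for illustration
toy <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2"),
  transcript = c("A", "B", "C", "A", "B", "C"),
  counts_scaled = c(5, 0, 2, 0, 0, 7)
)

# Step 1: keep only rows where the scaled count is positive
expressed <- toy[toy$counts_scaled > 0, ]

# Step 2: count how many of the remaining rows belong to each sample
per_sample <- table(expressed$sample)
per_sample
```

In this toy case s1 keeps two rows and s2 keeps one, so the samples differ in their number of transcripts with positive counts.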
-
How many categories of transcripts are there? Think about what you know regarding factors. Why is this number much smaller than the results of question 4?
Solution
nlevels(factor(transcript_counts$transcript, exclude = NULL))
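A short sketch of why the number of factor levels can be far smaller than the number of rows, assuming the table is in long format with one row per sample and transcript pair. The toy data below is hypothetical.

```r
# Toy long-format data: every transcript is repeated once per sample
toy <- data.frame(
  sample = rep(c("s1", "s2", "s3"), each = 2),
  transcript = rep(c("geneA", "geneB"), times = 3)
)

n_rows <- nrow(toy)                          # one row per sample-transcript pair
n_levels <- nlevels(factor(toy$transcript))  # distinct transcripts only
n_unique <- length(unique(toy$transcript))   # same count without building a factor
```

Here there are six rows but only two levels, because a factor stores each distinct value once no matter how often it is repeated.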
-
Subset transcript_counts to only include the following columns: sample, cell, dex, transcript, avgLength, counts_scaled. Save this new data frame to a new object called transc_df.
Solution
transc_df <- transcript_counts[c("sample","cell","dex", "transcript","avgLength", "counts_scaled")]
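The solution indexes the data frame with a single set of brackets and a character vector of names. As a brief sketch on hypothetical data, this form always returns a data frame, whereas the row-comma-column form can collapse a single column to a vector.

```r
# Toy data, invented for illustration
toy <- data.frame(
  sample = c("s1", "s2"),
  cell = c("c1", "c2"),
  dex = c("trt", "untrt")
)

two_cols <- toy[c("sample", "dex")]  # data frame with two columns
one_col_df <- toy["dex"]             # still a data frame, with one column
one_col_vec <- toy[, "dex"]          # a plain vector, no longer a data frame
```

Sticking to the single-bracket form keeps the result a data frame even if the column list is later reduced to one name.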
-
Using your new data frame from question six (transc_df), rename the column "sample" to "Sample".
Solution
colnames(transc_df)[1] <- "Sample"
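The positional rename above relies on "sample" being the first column. One possible alternative, sketched here on a hypothetical toy data frame, matches the column by name so the rename still works if the column order changes.

```r
# Toy data, invented for illustration
toy <- data.frame(sample = c("s1", "s2"), cell = c("c1", "c2"))

# Find the column called "sample" by name and replace only that name
names(toy)[names(toy) == "sample"] <- "Sample"
names(toy)
```

If no column is called "sample", the logical index selects nothing and the names are left unchanged.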
-
What are the mean and standard deviation of "avgLength" across the entire transc_df data frame? Hint: Read the help documentation for mean() and sd().
Solution
mean_avgLength <- mean(transc_df$avgLength)
sd_avgLength <- sd(transc_df$avgLength)
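The help pages for mean() and sd() both describe an na.rm argument. Whether avgLength contains any missing values is not stated here, so the following is only a sketch on a made-up vector showing what the argument changes.

```r
# Toy vector with one missing value (hypothetical, not from the airway data)
lengths_with_gap <- c(100, 200, NA, 300)

# With the default na.rm = FALSE, a single NA makes the result NA
mean_default <- mean(lengths_with_gap)

# With na.rm = TRUE, missing values are dropped before computing
mean_len <- mean(lengths_with_gap, na.rm = TRUE)
sd_len <- sd(lengths_with_gap, na.rm = TRUE)
```

If the real column has no missing values, the calls with and without na.rm = TRUE give the same result.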
-
Make a data frame with the column names "Mean" and "Standard_Dev" that holds the values from question 8. Hint: check out the function data.frame().
Solution
data.frame(Mean=mean_avgLength, Standard_Dev=sd_avgLength)