Lesson 3 practice questions

Question 1

Import hcc1395_chr22_rna_seq_counts.csv and store it as hcc1395_chr22_counts.

Solution

import pandas

hcc1395_chr22_counts=pandas.read_csv("./hcc1395_chr22_rna_seq_counts.csv")

How many rows and columns are in hcc1395_chr22_counts?

Solution

hcc1395_chr22_counts.shape

(1335, 7)
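Because .shape is a (rows, columns) tuple, the two numbers can be unpacked into separate variables. The sketch below uses a small made-up table (toy_counts and its gene labels are hypothetical, not values from the real file) so that it runs without the CSV.

```python
import pandas

# Hypothetical toy table standing in for the real counts file
toy_counts = pandas.DataFrame({
    "Geneid": ["Cabc1", "Xyz2", "Cdef3"],
    "sample_1": [15, 0, 3],
    "sample_2": [20, 1, 0],
})

# .shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = toy_counts.shape
print(f"{n_rows} rows, {n_cols} columns")
```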

What are the column names in hcc1395_chr22_counts, and how can you view the first 10 rows of this data set?

Solution

hcc1395_chr22_counts.head(10)

Alternatively, use hcc1395_chr22_counts.columns to get the column headings for this data frame.
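The .columns attribute returns a pandas Index object; wrapping it in list() gives a plain Python list of the headings. A minimal sketch on a made-up table (toy_counts and its column names are hypothetical):

```python
import pandas

# Hypothetical toy table; the column names are made up for illustration
toy_counts = pandas.DataFrame({
    "Geneid": ["Cabc1", "Xyz2"],
    "sample_1": [15, 0],
    "sample_2": [20, 1],
})

# Convert the Index of column labels to a regular list
column_names = list(toy_counts.columns)
print(column_names)

# head(n) returns at most n rows; here the table only has 2
first_rows = toy_counts.head(10)
```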

How many genes start with the letter "C" in hcc1395_chr22_counts?

Solution

hcc1395_chr22_counts.loc[hcc1395_chr22_counts.loc[:,'Geneid'].str.startswith("C")].shape[0]
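Since the question asks for a number, another option is to sum the Boolean mask itself, because each True counts as 1. The sketch below assumes a made-up table (toy_counts and its gene labels are hypothetical) rather than the real counts file.

```python
import pandas

# Hypothetical toy table; gene labels are made up for illustration
toy_counts = pandas.DataFrame({
    "Geneid": ["Cabc1", "Xyz2", "Cdef3", "Abc4"],
    "sample_1": [15, 0, 3, 7],
})

# Boolean Series: True where the gene label starts with an uppercase "C"
starts_with_c = toy_counts["Geneid"].str.startswith("C")

# True counts as 1 and False as 0, so the sum is the number of matches
n_c_genes = int(starts_with_c.sum())
print(n_c_genes)
```

Note that str.startswith is case sensitive, so a label beginning with a lowercase "c" would not be counted.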

Import hcc1395_deg_chr22.csv and store it as hcc1395_deg_chr22.

Solution

hcc1395_deg_chr22=pandas.read_csv("./hcc1395_deg_chr22.csv")

Remove ".bam" from the column headers of hcc1395_deg_chr22.

Solution

hcc1395_deg_chr22.columns=hcc1395_deg_chr22.columns.str.replace(".bam", "", regex=False)
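One thing to watch for with str.replace is whether the pattern is treated as a regular expression. In a regular expression, "." matches any character, so ".bam" can match more than the literal file extension. The sketch below uses made-up column names (toy_columns is hypothetical) and sets the regex argument explicitly both ways to show the difference.

```python
import pandas

# Hypothetical column headers; "embam" is made up to show the pitfall
toy_columns = pandas.Index(["rep1.bam", "rep2.bam", "embam"])

# Literal match: only the exact text ".bam" is removed
cleaned_literal = toy_columns.str.replace(".bam", "", regex=False)

# Regex match: "." matches any character, so "mbam" in "embam" is removed too
cleaned_regex = toy_columns.str.replace(".bam", "", regex=True)

print(list(cleaned_literal))
print(list(cleaned_regex))
```

Passing regex=False explicitly keeps the result the same regardless of which default the installed pandas version uses.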

Subset the name, log2FoldChange, and PAdj columns from hcc1395_deg_chr22 and store the result as hcc1395_deg_chr22_1.

Solution

hcc1395_deg_chr22_1=hcc1395_deg_chr22.loc[:,["name", "log2FoldChange", "PAdj"]].copy()

Use the .head method to check whether the subsetting was done correctly.

hcc1395_deg_chr22_1.head()

Add a column to hcc1395_deg_chr22_1 that contains the negative log10 of the PAdj value.

Solution

import numpy

hcc1395_deg_chr22_1["-log10PAdj"]=numpy.negative(numpy.log10(hcc1395_deg_chr22_1.loc[:,"PAdj"]))
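To see what this transformation does, it can help to try it on values where the answer is easy to check by hand. The sketch below uses a made-up table (toy_deg and its values are hypothetical, not results from the real file) and the unary minus operator, which is equivalent to numpy.negative here.

```python
import numpy
import pandas

# Hypothetical toy table with adjusted p-values chosen as powers of 10
toy_deg = pandas.DataFrame({
    "name": ["gene_a", "gene_b", "gene_c"],
    "PAdj": [0.1, 0.01, 0.001],
})

# log10(0.1) = -1, log10(0.01) = -2, log10(0.001) = -3;
# negating turns smaller p-values into larger positive numbers
toy_deg["-log10PAdj"] = -numpy.log10(toy_deg["PAdj"])
print(toy_deg)
```

Smaller adjusted p-values map to larger values in the new column, which is why this transformation is commonly used on the y-axis of volcano plots.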