C6 Exercises

c6e1: Processing a csv gene expression file

Gene expression levels from different species are stored in the input file (c6e1_input_data.csv). Each line in the input file contains the following information, separated by commas: species, sequence, gene name and expression level.

The file extension, csv, indicates that this is a comma-separated values file. CSV files are a standard way to store data arranged in columns as plain text. The file works like a table: each line is a row, and the columns are delimited by commas. Files with the *.tsv extension (tab-separated values) are another standard plain-text data format. The difference is that they delimit the columns with tabs instead of commas, which often (though not always) makes the data easier for a human to read.
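As a minimal sketch of how one such line can be split into its four columns, the snippet below uses Python's built-in csv module. The sample row is rebuilt from the sample output further down (gene hdt739). It is an assumption that the real file has no header row, and io.StringIO only stands in for the opened file so the sketch runs on its own.

```python
import csv
import io

# One line in the format described above:
# species, sequence, gene name, expression level.
# Rebuilt from the sample output (gene hdt739); assumes the file has no header row.
sample_text = "Drosophila yakuba,cgcgcgctcgcgcatacggcctaatgcgcgcgctagcgatgc,hdt739,85\n"

# io.StringIO stands in for open("c6e1_input_data.csv") so no file is needed here.
with io.StringIO(sample_text) as handle:
    rows = list(csv.reader(handle))  # each row is a list of str fields

# Unpack the four columns of the first (and only) row.
species, sequence, gene_name, expression = rows[0]
print(species, gene_name, len(sequence), expression)
```

With the real file, replacing the io.StringIO call with open("c6e1_input_data.csv", newline="") should be enough. Every field comes back as a str, including the expression level.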

a. Print all the gene names of Drosophila ananassae and Drosophila yakuba

b. Print all the gene names with sequence lengths in the range [90, 100] nt. Also print their sequence lengths.

c. Print all the gene names with GC content (as a percentage) greater than 50% and expression level lower than 200. For those genes, also print the sequence, the GC content percentage and the expression level. Note: be careful. Even if it is clear to you that the input file contains a number, Python may still read it as a str.

d. Print all the gene names starting with “k” or “h”, except those of Drosophila melanogaster. Print the gene name and the species.

e. Print each gene name, indicating whether the AT content (percent) of its sequence is high (greater than or equal to 65%), medium (in the range [45%, 65%)), or low otherwise. Also print the calculated AT content (percent) with 1 decimal.
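The building blocks behind exercises c and e can be sketched as follows: converting a value read as str into a number, computing a base-content percentage, and classifying it against the thresholds given above. The helper names base_percent and classify_at are hypothetical, chosen only for this illustration, and the sequence and expression level are taken from the sample output for gene hdt739. This is one possible approach under those assumptions, not the reference solution.

```python
def base_percent(sequence, bases):
    """Return the percentage of characters in `sequence` found in `bases`.

    Hypothetical helper; assumes a non-empty sequence.
    """
    sequence = sequence.lower()
    count = sum(1 for base in sequence if base in bases)
    return 100 * count / len(sequence)


def classify_at(at_percent):
    """Label AT content: high (>= 65), medium ([45, 65)), otherwise low."""
    if at_percent >= 65:
        return "high"
    if at_percent >= 45:
        return "medium"
    return "low"


# Values as they would arrive from the file: both are str.
sequence = "cgcgcgctcgcgcatacggcctaatgcgcgcgctagcgatgc"
expression_text = "85"

# Convert before comparing; "85" < 200 would raise a TypeError.
expression = float(expression_text)

gc = base_percent(sequence, "gc")
at = base_percent(sequence, "at")

if gc > 50 and expression < 200:
    print(f"{gc:.2f}")  # two decimals, as in sample output c

print(classify_at(at), f"{at:.1f}")  # one decimal, as in sample output e
```

The format specifiers .2f and .1f only round the value for printing. The comparisons are done on the unrounded numbers.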

Sample

Input a, b, c, d, e:

c6e1_input_data.csv

Output a:

hdu045
teg436
hdt739

Output b:

teg436 98
kdy533 90

Output c:

hdt739 cgcgcgctcgcgcatacggcctaatgcgcgcgctagcgatgc 71.43 85

Output d:

hdu045 Drosophila ananassae
hdt739 Drosophila yakuba
kdy533 Drosophila simulans

Output e:

kdy647 high 72.5
jdg766 medium 56.4
hdu045 medium 53.0
teg436 medium 45.9
hdt739 low 28.6
kdy533 medium 53.3