On-site exercises
Chapter 1 (aka C1)
C1.1 Browsing NCBI annotations
a. Using your web browser, search NCBI for the annotation of the Homo sapiens gene APP
b. Find the sequence of the first (canonical) mRNA that corresponds to the gene
c. Find the sequence of the protein (amino acids) that corresponds to the first (canonical) mRNA of the gene
Chapter 2 (aka C2)
C2.1 Operations on numbers: arithmetic priority/print()
For each expression below, write the value you expect in a comment. Then:
- For expressions involving more than one operator, add a comment in your code stating which operator has priority
- Finally, check your answers in the cell below using print()
# For instance,
# what is your expected result? which operator has priority?
# (8 + 2) / 2
print((8 + 2) / 2) # 5.0; priority for + because of the parentheses
5.0
Expressions, think first and then check:
2 - 10
3 * 5
9 / 2
9 // 2
9.0 / 2
5 - 3 * 2 # priority for - or *?
(5 - 3) * 2 # priority for - or *?
2 ** 4 # what does ** mean?
8 / 2 ** 2 # priority for / or **?
(8 / 2) ** 2 # priority for / or **?
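As a quick illustration of how Python orders operators, here is a minimal sketch on values that are not part of the exercise (all numbers and variable names are hypothetical). Exponentiation binds tighter than multiplication and division, which bind tighter than addition and subtraction; parentheses override all of them.

```python
# Hypothetical values, chosen so they do not give away the exercise answers
a = 1 + 2 * 3      # * is evaluated before +      -> 1 + 6
b = (1 + 2) * 3    # parentheses are evaluated first -> 3 * 3
c = 2 * 3 ** 2     # ** is evaluated before *     -> 2 * 9
d = 7 / 2          # true division always returns a float
e = 7 // 2         # floor division drops the fractional part
print(a, b, c, d, e)  # 7 9 18 3.5 3
```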
C2.3 Print escape characters/newlines/tabs
Print the following text, including the tabs, using a single call to the print() function together with the newline and tab escape codes
1st residue: A
2nd residue: C
3rd residue: G
4th residue: T
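The escape codes involved can be sketched on a different, hypothetical text: inside a string, \n starts a new line and \t inserts a tab, so one call to print() can produce several aligned lines.

```python
# Hypothetical text (not the exercise text)
text = "codon 1:\tATG\ncodon 2:\tTAA"
print(text)  # two lines, each with a tab between the label and the codon
```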
C2.4 Text and numbers/print variables
Define 4 variables, that represent the number of the different residues of a DNA sequence, as follows:
num_of_A = 24 # adenines
num_of_T = 25
num_of_C = 21
num_of_G = 29
Use the variables to print the following output:
Number of A = 24
Number of T = 25
Number of C = 21
Number of G = 29
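One possible way to mix text and numbers, sketched with a hypothetical variable num_of_N that is not part of the exercise: print() accepts several arguments and separates them with a single space, whereas building the line with + requires converting the number with str() first.

```python
num_of_N = 3  # hypothetical count of ambiguous residues (N)

# print() separates its arguments with a single space by default
print("Number of N =", num_of_N)

# Building the same line by hand requires an explicit str() conversion
line = "Number of N = " + str(num_of_N)
print(line)
```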
C2.5 Numbers and variables: from kilograms to grams/variables
Here is a simple program that converts the mass of a wandering albatross (Diomedea exulans) from kilograms to grams and then prints out the resulting value. Type it yourself rather than copy/pasting (you may skip the comments) and run it:
# Convert from kilograms to grams
mass_kg = 11.937 # descriptive variable name (never x = 11.937)
mass_g = mass_kg * 1000
print(mass_kg, "kg =", mass_g, "g") # nicely formatted output
Expected result:
11.937 kg = 11937.0 g
C2.6 Numbers and variables: start coding yourself
From pounds to kilograms/variables.
Similarly to the code above:
- Create a variable that stores a body mass in pounds (lb) and assign it a value of 3.5 (Desert Cottontail Rabbit)
- Convert this value to kilograms (1 lb is equal to 0.45359237 kg) and store this value into a new variable (use a nice name)
- Print the expected result, passing the previous variables as arguments to print()
Expected result:
3.5 lb = 1.587573295 kg
C2.7 Calculate the total biomass in grams for 3 White-throated Woodrats (Neotoma albigula) and then convert it to kilograms. The total biomass is simply the sum of the biomass of all individuals, but in this case we only know that the average mass of a single individual is 250 grams/variables.
a. Follow the procedure below and note how the variable names describe their content:
- Create a string variable species and assign it to "White-throated Woodrats"
- Create a variable mass_gr and assign it the mass of a single Neotoma albigula
- Create a variable number_of_individuals and assign it the number of individuals
- Create a variable biomass_gr and assign it a value by multiplying the two variables above together
- Create a variable g2kg and assign it the conversion factor (1 gram expressed in kilograms)
- Convert the value of biomass into kilograms (using the variable g2kg) and assign it to a new variable (biomass_kg)
- Print the expected result below, using as many variables as possible as arguments
Expected result:
3 White-throated Woodrats are about 0.75 kg
b. Now you are going to practice changing the values of some variables
Duplicate the code (of C2.7.a in C2.7.b); if you execute it, you will see the previous expected result twice. Next, perform a similar calculation for 5 Morning Doves; the average mass of a Morning Dove is 128 grams.
Then, only in the duplicated code:
- Update the string variable species and assign it to "Morning Doves"
- Update the variable mass_gr to the mass of a single Morning Dove
- Update the variable number_of_individuals to 5
- Run your code
Expected result:
3 White-throated Woodrats are about 0.75 kg
5 Morning Doves are about 0.64 kg
C2.8 String operators: + and */also +=
Build and print a DNA sequence using the following string operators: + (string concatenation) and * (string replication). That is, do not type A 21 times and so on. Follow these instructions:
- Create a variable seq with an empty string
- Update the value of seq by concatenating A 21 times (use * for the number of times you want to repeat the A; learn how using Google)
- Update seq by concatenating C 19 times (use += and *). Note that += is also new for you
- Update seq by concatenating G 16 times and T 23 times
- Display the value of seq
- Display the value of seq
Expected result:
AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTTTTT
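The three operators can be sketched on a much shorter, hypothetical sequence (the counts below are not those of the exercise): * repeats a string, + joins two strings, and += joins and reassigns in one step.

```python
short_seq = ""                       # start from an empty string
short_seq = short_seq + "A" * 3      # "A" * 3 is "AAA"
short_seq += "C" * 2                 # same as short_seq = short_seq + "C" * 2
print(short_seq)                     # AAACC
```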
C2.9 Functions/Type conversions
Built-in functions
Use the built-in functions abs(), round(), int(), float(), and str() to print out the answers to the questions below. A built-in function is one that you don’t need to import a module to use. Use another function, help(), to learn how to use any of the functions that you don’t know how to use appropriately. help() needs the name of the function you want information about (e.g. help(abs)).
- The absolute value of -15.5
- print the help of function round() using function help()
- print 3.8 rounded to the nearest integer using standard rounding
- print 4.483847 rounded to one decimal place
- convert 3.8 to an integer format using int(), assign the value to a variable, and print it
- convert the answer to the previous question to a string and assign it back to the same variable name, print out the value
- convert the answer to the previous question to a float and assign it back to the same variable name, print out the value
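A minimal sketch of these built-in functions on hypothetical values (none of them taken from the questions above). One detail worth knowing: Python's round() sends exact halves to the nearest even integer, so it can differ from the rounding rule taught at school.

```python
value = -7.25                 # hypothetical number
print(abs(value))             # 7.25
print(round(6.2831, 1))       # 6.3
print(round(2.5))             # 2, because exact halves go to the even neighbour

x = int(9.99)                 # int() truncates, it does not round -> 9
print(x)
x = str(x)                    # now the text "9"
print(x)
x = float(x)                  # now the number 9.0
print(x)
```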
C2.10 Math module/import a module (new concept)
Use the sqrt() and log() functions from the math module, along with the built-in round() function to print the answers to the following questions to the screen.
- How long is one side of a square plant census quadrat that has an area of 10 $km^2$? Round the answer to two decimal places
- The number of species in a region can be estimated from the area (in $km^{2}$) of that region using the following equation: number of species = 3.5 + 0.25 * log(area). Here log is the natural logarithm, which is not a built-in function. For an area of 10 $km^2$, what is the estimated number of species? Report the integer part of the calculated number
Expected solution:
One side of the square plant(quadrat) is 3.16 km
Estimated number of species in the region = 4
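Importing and calling functions from the math module can be sketched as follows, using a hypothetical area of 25 km^2 rather than the exercise value. With a single argument, math.log() returns the natural logarithm.

```python
import math  # sqrt() and log() live in the math module, not among the built-ins

area = 25.0                    # hypothetical area in km^2
side = math.sqrt(area)         # side length of a square with that area
ln_area = math.log(area)       # one-argument log() is the natural logarithm

print("side =", round(side, 2), "km")
print("ln(area) =", round(ln_area, 2))
print(int(4.9))                # int() keeps only the integer part: 4
```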
C2.11 Gene expression/import math module
Consider the following results of a gene expression analysis of a given gene in cells from a patient and a control:
patient = 42.55 # expression value
control = 10.12 # expression value
pvalue = 1e-04 # p-value for significant over-expression
In the literature, the gene is considered significantly over-expressed with a log transformed fold change (base-2 logarithm) greater than 2 and a log transformed p-value (negative of the base-10 logarithm) greater than 3. Calculate these values for comparison as follows:
- Import the functions log2 and log10 from the math module, or the whole math module
- Assign the variables from above (patient, control, pvalue)
- Compute the fold change (ratio between patient and control), store the value in variable fc and print it
- log transform the fold change (base-2 log), store the value in variable log2fc and print it
- log10 transform the P-value, multiply it by -1, store the value in variable significance_pval and print it
Expected results:
Fold change = 4.204545454545455
Log2 fold change = 2.071949841879015
-1 * Log10(P-value) = 4.0
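The two transformations can be sketched on hypothetical numbers chosen to give round results (the variable names treated, untreated and p are illustrative, not part of the exercise).

```python
from math import log2, log10

# Hypothetical expression values and p-value, not those of the exercise
treated = 80.0
untreated = 10.0
p = 1e-3

ratio = treated / untreated      # fold change: 8.0
log2_ratio = log2(ratio)         # 3.0, since 2 ** 3 == 8
neg_log10_p = -log10(p)          # about 3, since p is 10 ** -3
print(ratio, log2_ratio, neg_log10_p)
```

With the thresholds quoted above (log2 fold change greater than 2, negative log10 p-value greater than 3), this hypothetical gene would pass the first criterion but not the second, because 3 is not strictly greater than 3.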
C2.12 Calculating the AT content of a DNA sequence/length/count
Let us define the AT content of a sequence as the percentage of occurrences of A and T (each A and each T, not only the dinucleotide AT) in the total length of the sequence. Then, calculate and print the AT content of the DNA sequence defined below (be careful while typing it). You have to use the function len(), the string method str.count(), and some arithmetic operators (such as + and /).
seq = "TATAGATTACAGGG"
Expected solution:
seq = TATAGATTACAGGG
AT content of TATAGATTACAGGG = 64.28571428571429 %
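The same combination of len() and str.count() can be sketched for a related quantity, the GC content, on a short hypothetical sequence (neither the sequence nor the variable names come from the exercise).

```python
dna = "GGCA"                                  # hypothetical sequence
gc_count = dna.count("G") + dna.count("C")    # 2 + 1 = 3
gc_content = gc_count / len(dna) * 100        # 3 / 4 * 100
print("GC content of", dna, "=", gc_content, "%")  # GC content of GGCA = 75.0 %
```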
C2.13 A longer DNA sequence/replace
Follow this procedure:
- Copy/paste your code from above (C2.12)
- Replace the seq variable assignment with the following:
seq = """TATA GATTACA
GGG""" # string on 2 lines
- Run it. The result is different from that of C2.12, because the variable seq is different
- Now, in the same program duplicate the code (just copy it again)
- Only in the duplicated code, replace the seq assignment with new statements. In those statements, you have to replace, within the variable seq, the space and newline characters with an empty string; for that, use the string method str.replace()
- Finally, run the program again and you should obtain the same result as in C2.12
Expected solution:
seq = TATA GATTACA
GGG
AT content of TATA GATTACA
GGG = 56.25 %
seq = TATAGATTACAGGG
AT content of TATAGATTACAGGG = 64.28571428571429 %
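A minimal sketch of cleaning whitespace with str.replace(), on a hypothetical two-line string. Strings are immutable, so replace() returns a new string that has to be assigned to a variable; calls can be chained.

```python
raw = """ACG T
AC"""                                    # hypothetical string on 2 lines
clean = raw.replace(" ", "").replace("\n", "")
print(len(raw), len(clean))              # 8 6
print(clean)                             # ACGTAC
```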
C2.14 Complementing DNA/replace method
Calculate and print the complement of a sequence defined as follows using the replace() method:
seq = "GATTACAGGGTATA"
Extra task: Get the reverse complement
Expected solution:
GATTACAGGGTATA (Seq)
CTAATGTCCCATAT (Complement seq using str.replace())
TATACCCTGTAATC (Extra task: reverse complement)
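A pitfall worth seeing before solving this: replacing A with T and then T with A undoes the first step. One possible workaround, sketched on a hypothetical sequence, is to write the replacements in lowercase and convert to uppercase at the end; this is only one approach among several.

```python
dna = "AACGT"  # hypothetical sequence

# Naive chaining: the second replace also converts the T's created by the first
wrong = dna.replace("A", "T").replace("T", "A")
print(wrong)   # AACGA

# Lowercase placeholders keep already-converted bases out of later replacements
comp = (dna.replace("A", "t").replace("T", "a")
           .replace("C", "g").replace("G", "c").upper())
print(comp)        # TTGCA
print(comp[::-1])  # ACGTT, the reverse complement (slice with step -1)
```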
C2.15 Complementing DNA/maketrans and translate/reverse complement
str.translate() method
Calculate and print the complement of a sequence defined as follows using the maketrans() and translate() methods:
seq = "TATAGATTACAGGG"
Finally, get the reverse complement of the sequence
Expected solution:
TATAGATTACAGGG (seq)
ATATCTAATGTCCC (Complement seq using str.maketrans() and str.translate())
CCCTGTAATCTATA (Reverse complement seq)
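The two methods work as a pair: str.maketrans() builds a character-to-character table and str.translate() applies it in a single pass, so no base is converted twice. A minimal sketch on a hypothetical sequence:

```python
dna = "AACGT"                              # hypothetical sequence
table = str.maketrans("ACGT", "TGCA")      # A->T, C->G, G->C, T->A
comp = dna.translate(table)
rev_comp = comp[::-1]                      # a slice with step -1 reverses the string
print(comp)      # TTGCA
print(rev_comp)  # ACGTT
```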
C2.16 Restriction fragment lengths/str.find()/str.split()
The recognition site (motif) of a bacterial restriction enzyme is CCAGG. Consider the following sequence:
seq = "TATACCAGGGATTACAGGG"
The cut is just before the recognition site. That is: 5'--->TATA*CCAGGGAT--->3'.
Then, follow the next steps:
- Display the sequence and its length
- Print the index (position) of the motif in the sequence using the method str.find()
- Create and print a list containing the two cut fragments. In this case, for the sake of simplicity, exclude the motif from the second fragment and just use the method str.split()
- Print the length of each fragment (this time counting the motif as part of the second fragment)
- Print the addition of the lengths of the two fragments: it should be equal to the length of the original DNA sequence.
Expected solution:
TATACCAGGGATTACAGGG length: 19
Index of the motif in the sequence (array coord.): 4
List of 2 fragments (excluding the motif in the 2nd): ['TATA', 'GATTACAGGG']
Size of 1st fragment: 4
Size of 2nd fragment (complete): 15
Checking the length: 19
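The behaviour of str.find() and str.split() can be sketched on a hypothetical sequence and motif (not those of the exercise), applying the same simplified rule that the cut falls just before the motif. Note that find() returns -1 when the motif is absent, and that split() removes the motif from its output.

```python
dna = "AACATGTTT"      # hypothetical sequence
motif = "CATG"         # hypothetical recognition site

pos = dna.find(motif)            # 0-based index of the first occurrence: 2
fragments = dna.split(motif)     # the motif itself is dropped: ['AA', 'TTT']

len_first = len(fragments[0])
len_second = len(fragments[1]) + len(motif)   # add the motif back to the 2nd fragment
print(pos, fragments, len_first, len_second)
print(len_first + len_second == len(dna))     # True
print(dna.find("GGGG"))                       # -1, motif not found
```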
Sources
- https://docs.python.org/3.11/
- partly from www.programmingforbiologists.org