# On-site exercises

**Chapter 1** (aka C1)

**C1.1** Browsing NCBI annotations

**a.** Using your web browser, search for the annotation of the **APP** gene of *Homo sapiens* at **NCBI**

**b.** Find the sequence of the first (canonical) mRNA that corresponds to the gene

**c.** Find the sequence of the protein (amino acids) that corresponds to the first (canonical) mRNA of the gene

Solution:

https://www.ncbi.nlm.nih.gov/gene/351

**C1.1.b**

Solution:

- Go to the reference sequences https://www.ncbi.nlm.nih.gov/gene/351#reference-sequences
- Select the first mRNA https://www.ncbi.nlm.nih.gov/nuccore/NM_000484.4 and scroll down to see the mRNA sequence (given as the corresponding DNA)
- Another way: in the top-left menu of https://www.ncbi.nlm.nih.gov/nuccore/NM_000484.4 select FASTA instead of GenBank https://www.ncbi.nlm.nih.gov/nuccore/NM_000484.4?report=fasta&log$=seqview
- **Copy and paste** the sequence into a **plain text editor**

**C1.1.c**

Solution:

- Go to the reference sequences https://www.ncbi.nlm.nih.gov/gene/351#reference-sequences
- Select the protein that corresponds to the first mRNA (NP_000475) https://www.ncbi.nlm.nih.gov/protein/NP_000475.1 and scroll down to see the amino acid sequence
- Another way: in the top-left menu of https://www.ncbi.nlm.nih.gov/protein/NP_000475.1 select FASTA instead of GenPept https://www.ncbi.nlm.nih.gov/protein/NP_000475.1?report=fasta&log$=seqview
- **Copy and paste** the sequence into a **plain text editor**

**Chapter 2** (aka C2)

**C2.1** Operations on numbers: arithmetic priority/print()

Write in a comment the value you expect for each expression. Then:

- For expressions involving two or more operators, add a comment in your code. Which operator has priority?
- Finally, check the answers in the cell below using **print()**

```
# For instance,
# what is your expected result? which operator has priority?
# (8 + 2) / 2
print((8 + 2) / 2) # 5.0; priority for + because of the parentheses
```

5.0

Expressions, think first and then check:

```
2 - 10
3 * 5
9 / 2
9 // 2
9.0 / 2
5 - 3 * 2 # priority for - or *?
(5 - 3) * 2 # priority for - or *?
2 ** 4 # what does ** mean?
8 / 2 ** 2 # priority for / or **?
(8 / 2) ** 2 # priority for / or **?
```

```
print((8 + 2) / 2) # Just an example: priority for + because of the parentheses
```

5.0

```
print(2 - 10) # -8
print(3 * 5) # 15
print(9 / 2) # 4.5
print(9 // 2) # 4
print(9.0 / 2) # 4.5
print(5 - 3 * 2) # -1; priority for *
print((5 - 3) * 2) # 4; priority for - because of the parentheses
print(2 ** 4) # 16; to the power of
print(8 / 2 ** 2) # 2.0; priority for **
print((8 / 2) ** 2) # 16.0; priority for / because of the parentheses
```

-8 15 4.5 4 4.5 -1 4 16 2.0 16.0
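Two further precedence cases often surprise and are easy to check in the same way: `**` is applied before a unary minus written on its left, and chained `**` groups from right to left. The sketch below is an additional illustration, not part of the original exercise; the variable names are only for readability.

```python
# ** is applied before a unary minus written on its left
neg_square = -2 ** 2        # read as -(2 ** 2)
square_of_neg = (-2) ** 2
print(neg_square, square_of_neg)    # -4 4

# chained ** groups from right to left
right_grouped = 2 ** 3 ** 2     # read as 2 ** (3 ** 2), that is 2 ** 9
left_grouped = (2 ** 3) ** 2    # 8 ** 2
print(right_grouped, left_grouped)  # 512 64
```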

```
print("1st residue:\tA")
print("2nd residue:\tC")
print("3rd residue:\tG")
print("4th residue:\tT")
```

1st residue: A 2nd residue: C 3rd residue: G 4th residue: T

**C2.3** Print escape characters/newlines/tabs

Print the same text including tabulations using a single call to the **print()** function and newline special codes

```
1st residue: A
2nd residue: C
3rd residue: G
4th residue: T
```

#### Solution

**C2.3**

```
print("1st residue:\tA\n2nd residue:\tC\n3rd residue:\tG\n4th residue:\tT")
```

1st residue: A 2nd residue: C 3rd residue: G 4th residue: T

**C2.4** Text and numbers/print variables

Define 4 variables, that represent the number of the different residues of a DNA sequence, as follows:

```
num_of_A = 24 # adenines
num_of_T = 25
num_of_C = 21
num_of_G = 29
```

Use the variables to print the following output:

```
Number of A = 24
Number of T = 25
Number of C = 21
Number of G = 29
```

#### Solution

**C2.4**

```
num_of_A = 24 # adenines
num_of_T = 25
num_of_C = 21
num_of_G = 29
print("Number of A =", num_of_A)
print("Number of T =", num_of_T)
print("Number of C =", num_of_C)
print("Number of G =", num_of_G) # alt: call print() one time instead of four
```

Number of A = 24 Number of T = 25 Number of C = 21 Number of G = 29
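The alternative mentioned in the last comment of the solution (one call to `print()` instead of four) could look like the sketch below. It assumes f-strings, which the exercises have not introduced at this point, and the name `report` is only illustrative.

```python
num_of_A = 24  # adenines
num_of_T = 25
num_of_C = 21
num_of_G = 29

# One string, one print() call: "\n" starts a new line,
# and {name} inside an f-string is replaced by the variable's value
report = (
    f"Number of A = {num_of_A}\n"
    f"Number of T = {num_of_T}\n"
    f"Number of C = {num_of_C}\n"
    f"Number of G = {num_of_G}"
)
print(report)
```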

**C2.5** Numbers and variables: from kilograms to grams/variables

Here is a simple program that converts the mass of a wandering albatross (*Diomedea exulans*) from kilograms to grams and then prints out the resulting value. Copy the following code (**type it yourself**, except the comments) and run it:

```
# Convert from kilograms to grams
mass_kg = 11.937 # descriptive variable name (never x = 11.937)
mass_g = mass_kg * 1000
print(mass_kg, "kg =", mass_g, "g") # nicely formatted output
```

Expected result:

```
11.937 kg = 11937.0 g
```

```
# Convert from kilograms to grams
mass_kg = 11.937 # descriptive variable name (never x = 11.937)
mass_g = mass_kg * 1000
print(mass_kg, "kg =", mass_g, "g") # nicely formatted output
```

11.937 kg = 11937.0 g

**C2.6** Numbers and variables: start coding yourself

#### From pounds to kilograms/variables

Similarly to the code above:

- Create a variable that stores a **body mass in pounds (lb)** and assign it a value of **3.5** (Desert Cottontail Rabbit)
- Convert this value to kilograms (1 lb is equal to 0.45359237 kg) and store this value in a new variable (use a descriptive name)
- Print the expected result, also using the previous variables as arguments of print

Expected result:

```
3.5 lb = 1.587573295 kg
```

```
# Convert 3.5 pounds into kilograms
mass_lb = 3.5
lb2kg = 0.45359237 # 1 lb in kg; not really needed: a bit verbose
mass_kg = mass_lb * lb2kg
print(mass_lb, "lb =", mass_kg, "kg")
```

3.5 lb = 1.587573295 kg
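The converted value carries more decimals than a body mass usually needs. As a minimal sketch, assuming two decimals are enough, the result can be shortened either as a number with `round()` or as text with a format specification inside an f-string (f-strings are not introduced in these exercises; `rounded_kg` and `label` are illustrative names).

```python
mass_lb = 3.5
lb2kg = 0.45359237  # 1 lb in kg
mass_kg = mass_lb * lb2kg

# round() returns a number with at most 2 decimals
rounded_kg = round(mass_kg, 2)
print(rounded_kg)  # 1.59

# :.2f returns text with exactly 2 decimals; mass_kg itself is unchanged
label = f"{mass_lb} lb = {mass_kg:.2f} kg"
print(label)  # 3.5 lb = 1.59 kg
```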

**C2.7** Total biomass of several individuals/variables

Calculate the total biomass in grams for 3 White-throated Woodrats (*Neotoma albigula*) and then convert it to kilograms. The total biomass is simply the sum of the biomass of all individuals, but in this case we only know that the average mass of a single individual is 250 grams.

**a.** Follow the procedure below and also observe how the names of the variables make sense:

- Create a string variable **species** and assign it to "White-throated Woodrats"
- Create a variable **mass_gr** and assign it the mass of a single *Neotoma albigula*
- Create a variable **number_of_individuals** and assign it the number of individuals
- Create a variable **biomass_gr** and assign it a value by multiplying the two variables above together
- Create a variable **g2kg** and assign it the value of 1 gram in kilograms
- Convert the value of biomass into kilograms (using the variable g2kg) and assign it to a new variable (**biomass_kg**)
- Print the following expected result using as many variables as possible as arguments

Expected result:

```
3 White-throated Woodrats are about 0.75 kg
```

**b.** Now you are going to practice changing the values of some variables

Duplicate the code (of C2.7.a in C2.7.b); if you execute it, you will see the previous expected result twice. Now, you are going to perform a similar calculation for 5 Morning Doves; the average mass of a Morning Dove is 128 grams.

Then, **only in the duplicated code**:

- Update the string variable **species** and assign it to "Morning Doves"
- Update the variable **mass_gr** to the mass of a single Morning Dove
- Update the variable **number_of_individuals** to 5
- Run your code

Expected result:

```
3 White-throated Woodrats are about 0.75 kg
5 Morning Doves are about 0.64 kg
```

```
species = "White-throated Woodrats"
mass_gr = 250
number_of_individuals = 3
biomass_gr = number_of_individuals * mass_gr
g2kg = 10 ** -3 # or 0.001
biomass_kg = biomass_gr * g2kg
print(number_of_individuals, species, "are about", biomass_kg, "kg")
```

3 White-throated Woodrats are about 0.75 kg

**C2.7.b**

Solution:

```
# #############
# Previous code
# #############
species = "White-throated Woodrats"
mass_gr = 250
number_of_individuals = 3
biomass_gr = number_of_individuals * mass_gr
g2kg = 10 ** -3 # or 0.001
biomass_kg = biomass_gr * g2kg
print(number_of_individuals, species, "are about", biomass_kg, "kg")
# ################################
# Duplication of the previous code
# (with updates)
# ################################
species = "Morning Doves"
mass_gr = 128 # now for a single Dove in gr
number_of_individuals = 5
biomass_gr = number_of_individuals * mass_gr
g2kg = 10 ** -3 # or 0.001
biomass_kg = biomass_gr * g2kg
print(number_of_individuals, species, "are about", biomass_kg, "kg")
# Note: a cleverer solution would be to write your own function, but we are not there yet
# ------------------------------------------------------------------------------------
```

3 White-throated Woodrats are about 0.75 kg 5 Morning Doves are about 0.64 kg
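The note at the end of the solution hints that a function would avoid duplicating the code. A minimal sketch of that idea follows; functions are beyond this chapter, and the name `biomass_in_kg` and its parameters are hypothetical. It divides by 1000 instead of multiplying by `g2kg`, which gives the same values for these inputs.

```python
def biomass_in_kg(number_of_individuals, mass_gr):
    """Return the total biomass in kg of individuals weighing mass_gr grams each."""
    grams_per_kg = 1000
    return number_of_individuals * mass_gr / grams_per_kg


# The calculation is written once and reused for each species
print(3, "White-throated Woodrats", "are about", biomass_in_kg(3, 250), "kg")
print(5, "Morning Doves", "are about", biomass_in_kg(5, 128), "kg")
```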

**C2.8** String operators: + and */also +=

Build and print a DNA sequence using the following string operators: + (string concatenation) and * (string replication). **That is, do not type A 21 times** and so on. Follow these instructions:

- Create a variable **seq** with an empty string
- Update the value of **seq** by concatenating A 21 times (**use * for the number of times you want to repeat the A**; learn how using Google)
- Update **seq** by concatenating C 19 times (use += and *). Note that += is also new for you
- Update **seq** by concatenating G 16 times and T 23 times
- Display the value of **seq**

Expected result:

```
AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTTTTT
```

#### Solution

**C2.8**

```
seq = ""
seq = seq + "A"*21
seq += "C"*19 # += is new for you
seq += "G"*16 + "T"*23 #
print(seq) # Now you can easily change the num of times any residue is repeated
```

AAAAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTTTTT
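Counting 79 letters by eye is error-prone. As a sketch, the composition of the sequence can be checked with `len()` and `str.count()`, which are used again in C2.12; here the sequence is built in a single expression for brevity.

```python
seq = "A" * 21 + "C" * 19 + "G" * 16 + "T" * 23

print(len(seq))        # 79, that is 21 + 19 + 16 + 23
print(seq.count("A"))  # 21
print(seq.count("T"))  # 23
```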

**C2.9** Functions/Type conversions

### Built-in functions

Use the built-in functions **abs()**, **round()**, **int()**, **float()**, and **str()** to print out the answers to the questions below. A built-in function is one that you don’t need to import a module to use. Use another function, **help()**, to learn how to use any of the functions that you don’t know how to use appropriately. **help()** needs the name of the function you want information about (e.g. **help(abs)**).

- print the absolute value of -15.5
- print the help of function **round()** using function **help()**
- print 3.8 rounded to the nearest integer using standard rounding
- print 4.483847 rounded to one decimal place
- convert 3.8 to an integer format using **int()**, assign the value to a variable, and print it
- convert the answer to the previous question to a string and assign it back to the same variable name, print out the value
- convert the answer to the previous question to a float and assign it back to the same variable name, print out the value

#### Solutions:

**C2.9**

```
abs(-15.5)
```

15.5

```
help(round)
```

Help on built-in function round in module builtins: round(number, ndigits=None) Round a number to a given precision in decimal digits. The return value is an integer if ndigits is omitted or None. Otherwise the return value has the same type as the number. ndigits may be negative.

```
round(3.8)
```

4

```
round(4.483847, 1)
```

4.5

```
# cast
num = int(3.8)
print(num)
type(num)
```

3

int

```
num = int(3.8)
num = str(num)
print(num)
type(num)
```

3

str

```
num = int(3.8)
num = str(num)
num = float(num)
print(num)
type(num)
```

3.0

float
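Two details of these functions are worth checking with a few extra values (chosen here for illustration, not taken from the exercise): `round()` sends exact ties to the nearest even integer, and `int()` truncates toward zero instead of rounding.

```python
half_down = round(2.5)  # tie: goes to the even neighbour, 2
half_up = round(3.5)    # tie: goes to the even neighbour, 4
print(half_down, half_up)  # 2 4

truncated_pos = int(3.8)   # drops the fractional part
truncated_neg = int(-3.8)  # truncates toward zero, does not round down
print(truncated_pos, truncated_neg)  # 3 -3
```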

**C2.10** Math module/**import** a module (new concept)

Use the **sqrt()** and **log()** functions from the **math** module, along with the built-in **round()** function, to print the answers to the following questions to the screen.

- How long is one side of a square plant census quadrat that has an area of 10 $km^2$? Round it to two decimal places
- The number of species in a region can be estimated based on the area (in $km^{2}$) of that region using the following equation: **number of species = 3.5 + 0.25 * log(area)**. As said before, log is the natural logarithm and it is not a built-in function. For an area of 10 $km^2$, what is the estimated number of species? Give the integer part of the calculated number

Expected solution:

```
One side of the square plant(quadrat) is 3.16 km
Estimated number of species in the region = 4
```

#### Solution

**C2.10**

```
import math
area = 10 # km**2
print("One side of the square plant(quadrat) is", round(math.sqrt(area), 2), "km")
print("Estimated number of species in the region =", int(3.5 + 0.25 * math.log(area)))
```

One side of the square plant(quadrat) is 3.16 km Estimated number of species in the region = 4
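The same calculation can also be written by importing only the two functions that are needed, so that they are called without the `math.` prefix. This is a sketch of that alternative import form; the names `side_km` and `estimated_species` are illustrative.

```python
from math import log, sqrt  # import only the names that are used

area = 10  # km**2

side_km = round(sqrt(area), 2)
# log() called with a single argument is the natural logarithm
estimated_species = int(3.5 + 0.25 * log(area))

print("One side of the square plant(quadrat) is", side_km, "km")
print("Estimated number of species in the region =", estimated_species)
```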

**C2.11** Gene expression/import math module

Let us consider the following results of a gene expression analysis of a given gene in cells from a patient and a control:

```
patient = 42.55 # expression value
control = 10.12 # expression value
pvalue = 1e-04 # p-value for significant over expression
```

In the literature, the gene is considered significantly over-expressed with a log transformed fold change (base-2 logarithm) greater than 2 and a log transformed p-value (negative of the base-10 logarithm) greater than 3. Calculate these values for comparison as follows:

- Import the functions **log2** and **log10** from the **math** module, or the whole math module
- Assign the variables from above (patient, control, pvalue)
- Compute the fold change (ratio between patient and control), store the value in variable **fc** and print it
- log transform the fold change (base-2 log), store the value in variable **log2fc** and print it
- log10 transform the P-value, multiply it by -1, store the value in variable **significance_pval** and print it

Expected results:

```
Fold change = 4.204545454545455
Log2 fold change = 2.071949841879015
-1 * Log10(P-value) = 4.0
```

#### Solution

**C2.11**

```
import math
patient = 42.55 # expression value
control = 10.12 # expression value
pvalue = 1e-04 # p-value for significant over expression
fc = patient / control
print("Fold change =", fc)
log2fc = math.log2(fc)
print("Log2 fold change =", log2fc) # This is 2.07, > 2
significance_pval = (-1) * math.log10(pvalue) # This is 4; that is > 3
print("-1 * Log10(P-value) =", significance_pval)
```

Fold change = 4.204545454545455 Log2 fold change = 2.071949841879015 -1 * Log10(P-value) = 4.0
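The comparison with the two literature thresholds is done by eye in the solution. It could also be expressed in code, as in the sketch below; comparison operators and `and` are not covered in this chapter, and the name `is_overexpressed` is illustrative.

```python
from math import log2, log10

patient = 42.55  # expression value
control = 10.12  # expression value
pvalue = 1e-04   # p-value for significant over expression

log2fc = log2(patient / control)
significance_pval = -log10(pvalue)

# Both thresholds from the exercise text must hold at the same time
is_overexpressed = log2fc > 2 and significance_pval > 3
print("Significantly over-expressed:", is_overexpressed)
```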

**C2.12** Calculating the AT content of a DNA sequence/length/count

Let us define the **AT content** of a sequence as the percentage of A and T residues (every A and every T, not only the dinucleotide AT) in the total length of the sequence. Then, calculate and print the AT content of the DNA sequence defined below (be careful while typing it). You have to use the function **len()**, the string method **str.count()**, and some arithmetic operators (such as + and /).

```
seq = "TATAGATTACAGGG"
```

Expected solution:

```
seq = TATAGATTACAGGG
AT content of TATAGATTACAGGG = 64.28571428571429 %
```

#### Solution

**C2.12**

```
seq = "TATAGATTACAGGG"
print("seq =", seq)
print("AT content of", seq, "=", 100 * (seq.count('A') + seq.count('T')) / len(seq), "%")
```

seq = TATAGATTACAGGG AT content of TATAGATTACAGGG = 64.28571428571429 %
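Since the same expression is reused in the next exercise, it could be wrapped in a small function. This is a sketch only: the name `at_content` is hypothetical, and the call to `upper()` is an added assumption so that lowercase input is counted too. An empty string would raise `ZeroDivisionError`.

```python
def at_content(seq):
    """Return the percentage of A and T residues in seq (case-insensitive)."""
    seq = seq.upper()
    return 100 * (seq.count("A") + seq.count("T")) / len(seq)


print(round(at_content("TATAGATTACAGGG"), 2))  # 64.29
print(round(at_content("tataGATTACAggg"), 2))  # 64.29
```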

**C2.13** A longer DNA sequence/replace

Follow this procedure:

- Copy/paste your code from above (C2.12)
- Replace the seq variable assignment with the following:

```
seq = """TATA GATTACA
GGG""" # string on 2 lines
```

- Run it. The result differs from that of C2.12 because seq now contains a space and a newline, which count toward its length
- Now, in the same program, duplicate the code (just copy it again)
- Only in the duplicated code, replace the seq assignment with new statements. In those statements, replace the space and newline characters within the variable seq with an empty string; for that, use the string method **str.replace()**
- Finally, run the program again and you should obtain the same result as in C2.12

Expected solution:

```
seq = TATA GATTACA
GGG
AT content of TATA GATTACA
GGG = 56.25 %
seq = TATAGATTACAGGG
AT content of TATAGATTACAGGG = 64.28571428571429 %
```

#### Solution

**C2.13**

```
seq = """TATA GATTACA
GGG"""
print("seq =", seq)
print("AT content of", seq, "=", 100 * (seq.count('A') + seq.count('T')) / len(seq), "%\n")
# Code duplication
seq = seq.replace(" ", "") # all the occurrences of " " in the str
seq = seq.replace("\n", "") # in the next exercise we will see an alternative to replace
# you can also replace with regular expressions (more advanced)
print("seq =", seq)
print("AT content of", seq, "=", 100 * (seq.count('A') + seq.count('T')) / len(seq), "%")
```

seq = TATA GATTACA GGG AT content of TATA GATTACA GGG = 56.25 % seq = TATAGATTACAGGG AT content of TATAGATTACAGGG = 64.28571428571429 %
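Another way to remove whitespace, shown here only as a sketch, combines `str.split()` and `str.join()`. It handles spaces, tabs and newlines in one step, so no separate call is needed per character. The name `clean_seq` is illustrative.

```python
seq = """TATA GATTACA
GGG"""

# split() with no argument cuts at any run of whitespace;
# "".join() glues the pieces back together with nothing in between
clean_seq = "".join(seq.split())
print(clean_seq)  # TATAGATTACAGGG
```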

**C2.14** Complementing DNA/replace method

Calculate and print the complement of a sequence defined as follows using the **replace()** method:

```
seq = "GATTACAGGGTATA"
```

Extra task: Get the reverse complement

Expected solution:

```
GATTACAGGGTATA (Seq)
CTAATGTCCCATAT (Complement seq using str.replace())
TATACCCTGTAATC (Extra task: reverse complement)
```

#### Solution

**C2.14**

```
seq = "GATTACAGGGTATA"
print(seq, "\t(Seq)")
seq = seq.replace("C", "g")
seq = seq.replace("T", "a")
seq = seq.replace("G", "c")
seq = seq.replace("A", "t")
seq = seq.upper()
print(seq, "\t(Complement seq using str.replace())")
print(seq[::-1], "\t(Extra task: reverse complement)")
```

GATTACAGGGTATA (Seq) CTAATGTCCCATAT (Complement seq using str.replace()) TATACCCTGTAATC (Extra task: reverse complement)
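The solution replaces each residue with a lowercase letter and only converts to uppercase at the end. The sketch below shows why, using just the A/T pair: with uppercase replacements, the second call also changes the residues that the first call has just produced. The names `naive` and `safe` are illustrative.

```python
seq = "GATTACAGGGTATA"

# Naive chain: after A -> T, the next call turns every T (old and new) into A
naive = seq.replace("A", "T").replace("T", "A")
print(naive)  # GAAAACAGGGAAAA (every A and T has become A)

# Lowercase placeholders keep converted residues out of later replacements
safe = seq.replace("A", "t").replace("T", "a").upper()
print(safe)   # GTAATCTGGGATAT (A and T correctly swapped)
```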

**C2.15** Complementing DNA/maketrans and translate/reverse complement

### str.translate() method

Calculate and print the complement of a sequence defined as follows using the **maketrans()** and **translate()** methods:

```
seq = "TATAGATTACAGGG"
```

Finally, get the reverse complement of the sequence

Expected solution:

```
TATAGATTACAGGG (Seq)
ATATCTAATGTCCC (Complement seq using str.maketrans() and str.translate())
CCCTGTAATCTATA (Reverse complement seq)
```

#### Solution

**C2.15**

```
seq = "TATAGATTACAGGG"
translation_table = str.maketrans("CGTA", "GCAT") # maketrans() is called on the str type; it does not depend on seq
# print(translation_table)
print(seq, "\t(Seq)")
seq = seq.translate(translation_table)
print(seq, "\t(Complement seq using str.maketrans() and str.translate())")
print(seq[::-1], "\t(Reverse complement seq)")
```

TATAGATTACAGGG (Seq) ATATCTAATGTCCC (Complement seq using str.maketrans() and str.translate()) CCCTGTAATCTATA (Reverse complement seq)
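The two steps (complement, then reverse) can be combined into one reusable function. This is a sketch under assumptions: the names `COMPLEMENT_TABLE` and `reverse_complement` are hypothetical, and lowercase residues are added to the table so that mixed-case input also works.

```python
# Built once and reused; maps each residue to its complement, in both cases
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


print(reverse_complement("TATAGATTACAGGG"))  # CCCTGTAATCTATA
```

Applying the function twice returns the original sequence, which is a quick sanity check.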

**C2.16** Restriction fragment lengths/str.find()/str.split()

The recognition site (motif) of a bacterial restriction enzyme is **CCAGG**. For the next sequence:

```
seq = "TATACCAGGGATTACAGGG"
```

The cut is just before the recognition site. That is: 5'--->TATA*CCAGGGAT--->3'.

Then, follow the next steps:

- Display the sequence and its length
- Print the index (position) of the motif in the sequence using the method **str.find()**
- Create and print a list containing the two cut fragments. In this case, for the sake of simplicity, exclude the motif from the second fragment and just use the method **str.split()**
- Print the length of each fragment (counting the motif as part of the second fragment)
- Print the sum of the lengths of the two fragments: it should be equal to the length of the original DNA sequence.

Expected solution:

```
TATACCAGGGATTACAGGG length: 19
Index of the motif in the sequence (array coord.): 4
List of 2 fragments (excluding the motif in the 2nd): ['TATA', 'GATTACAGGG']
Size of 1st fragment: 4
Size of 2nd fragment (complete): 15
Checking the length: 19
```

#### Solution

**C2.16**

```
seq = "TATACCAGGGATTACAGGG"
print(seq, "\tlength:", len(seq))
motif = "CCAGG"
motif_pos = seq.find(motif)
seq_l = seq.split(motif)
print("Index of the motif in the sequence (array coord.):", motif_pos)
print("List of 2 fragments (excluding the motif in the 2nd):", seq_l)
print("Size of 1st fragment:" , len(seq_l[0]))
print("Size of 2nd fragment (complete):", len(seq_l[1]) + len(motif)) # add the motif back
print("Checking the length:", len(seq_l[0]) + len(seq_l[1]) + len(motif))
```

TATACCAGGGATTACAGGG length: 19 Index of the motif in the sequence (array coord.): 4 List of 2 fragments (excluding the motif in the 2nd): ['TATA', 'GATTACAGGG'] Size of 1st fragment: 4 Size of 2nd fragment (complete): 15 Checking the length: 19
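If the motif should stay in the second fragment, slicing at the index returned by `str.find()` is an alternative to `str.split()`. The sketch below also handles the case where the motif is absent, since `find()` then returns -1. It uses `if`/`else` and a list comprehension, which are beyond this chapter; `cut_pos` and `fragments` are illustrative names.

```python
seq = "TATACCAGGGATTACAGGG"
motif = "CCAGG"

cut_pos = seq.find(motif)  # index of the first occurrence, or -1 if absent
if cut_pos == -1:
    fragments = [seq]  # no recognition site: the sequence stays in one piece
else:
    fragments = [seq[:cut_pos], seq[cut_pos:]]  # cut just before the motif

print(fragments)                    # ['TATA', 'CCAGGGATTACAGGG']
print([len(f) for f in fragments])  # [4, 15]
```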

# Sources

- https://docs.python.org/3.11/
- partly from www.programmingforbiologists.org
