C3 Exercises

In the previous chapter we had:

c2e6: splicing structure (simplified for c3e1)

A DNA sequence (from the + strand) has the following structure:

Exon1-Intron-Exon2

The first exon runs from the start of the DNA sequence (biological coordinates, not array coordinates) up to position 63.

  • Notes:

    • Be careful with your problem formulation. In this case, position 63 is given as the end, but it lies within the intron, so the last base of the exon is position 62.

    • Do not mix up biological coordinates and array coordinates.

The second exon starts at position 91 (biological coordinates) and runs to the end of the sequence. Assume that both exons are entirely protein-coding (CDS: coding DNA sequence).

a. Print the exon sequences and their lengths, one line per sequence.

Sample

Input:

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT

Output a:

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCG 62
ATCATCGATCGATATCGATGCATCGACTACTAT 33
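
The coordinate notes above can be sketched in Python as follows. This is only one possible approach, not the official solution: the helper names `bio_to_slice` and `extract_exons` are hypothetical, and the toy sequence is made up for illustration (it is not the sample input). The sketch assumes biological coordinates are 1-based and inclusive, while Python slices are 0-based with an exclusive stop.

```python
def bio_to_slice(start, end=None):
    """Convert 1-based inclusive biological coordinates to a Python slice.

    end=None means "up to the end of the sequence".
    """
    # Biological position 1 is array index 0, so the start shifts by one.
    # A slice stop is exclusive, so the inclusive biological end can be
    # used as the stop without any shift.
    return slice(start - 1, end)


def extract_exons(dna):
    # Exon 1: biological positions 1..62 (position 63 is already intron).
    exon1 = dna[bio_to_slice(1, 62)]
    # Exon 2: biological position 91 up to the end of the sequence.
    exon2 = dna[bio_to_slice(91)]
    return exon1, exon2


# Toy sequence: 62 exon bases, 28 intron bases (lower case), 33 exon bases.
toy = "A" * 62 + "c" * 28 + "G" * 33
for exon in extract_exons(toy):
    print(exon, len(exon))
```

Keeping the conversion in one small function means the "minus one" adjustment is written once, which makes it harder to mix up the two coordinate systems.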

c3e1: write splicing structure

Using the results of problem c2e6, write each exon to a separate file, in lower case.
File names:

  • “c3e1_output_exon1.txt”

  • “c3e1_output_exon2.txt”

Report on the standard output (the screen) that the files have been written.

Sample

Input:

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT

Output:

...already written the file: c3e1_output_exon1.txt
...already written the file: c3e1_output_exon2.txt

"c3e1_output_exon1.txt" must contain:
atcgatcgatcgatcgactgactagtcatagctatgcatgtagctactcgatcgatcgatcg

"c3e1_output_exon2.txt" must contain:
atcatcgatcgatatcgatgcatcgactactat
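
A minimal sketch of the file-writing step, assuming the exons are already available as a list of strings. The function name `write_exons` and its `out_dir` parameter are hypothetical, and ending each file with a newline is an assumption (the exercise does not say whether one is required).

```python
from pathlib import Path


def write_exons(exons, out_dir="."):
    """Write each exon, in lower case, to its own numbered text file."""
    written = []
    for number, exon in enumerate(exons, start=1):
        path = Path(out_dir) / f"c3e1_output_exon{number}.txt"
        # Mode "w" creates the file or overwrites an existing one;
        # the with-block closes the file even if writing fails.
        with open(path, "w") as handle:
            handle.write(exon.lower() + "\n")  # trailing newline: assumption
        print(f"...already written the file: {path.name}")
        written.append(path)
    return written
```

Calling `write_exons([exon1, exon2])` with the two exon strings from c2e6 would create both files in the current directory and print one confirmation line per file.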

c3e2: write some fasta files and a multifasta

a. Write four FASTA files, each containing a different single-base repeat sequence of length 10.
File names:

  • “c3e2_output_polyA.fa”

  • “c3e2_output_polyT.fa”

  • “c3e2_output_polyG.fa”

  • “c3e2_output_polyC.fa”

Report on the standard output (the screen) that the files have been written.

b. Write the same four sequences to a single multi-FASTA file.
File name:

  • “c3e2_output_multifasta.fasta”

Report on the standard output (the screen) that the file has been written.

Sample

Output a:

"c3e2_output_polyA.fa" must contain:  
>poly A  
AAAAAAAAAA  

"c3e2_output_polyT.fa" must contain:  
>poly T  
TTTTTTTTTT  

"c3e2_output_polyG.fa" must contain:  
>poly G  
GGGGGGGGGG  

"c3e2_output_polyC.fa" must contain:  
>poly C  
CCCCCCCCCC  

...already written the file: c3e2_output_polyA.fa
...already written the file: c3e2_output_polyT.fa
...already written the file: c3e2_output_polyG.fa
...already written the file: c3e2_output_polyC.fa

Output b:

"c3e2_output_multifasta.fasta" must contain:  
>poly A  
AAAAAAAAAA  
>poly T  
TTTTTTTTTT  
>poly G  
GGGGGGGGGG  
>poly C  
CCCCCCCCCC  

...already written the file: c3e2_output_multifasta.fasta  

Note:

  • Both “fa” and “fasta” are valid extensions for FASTA files.

Tip:

  • Remember to end each line, header lines included, with the end-of-line character: “\n”.
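
The tip about line endings can be illustrated with a short sketch. It is one possible layout rather than the expected solution: `fasta_record` and `write_fasta` are hypothetical helper names, and the files are written to the current directory. Each FASTA record is a header line starting with ">" followed by the sequence line, and both lines end with "\n".

```python
def fasta_record(header, sequence):
    """Format one FASTA record; both lines end with a newline."""
    return f">{header}\n{sequence}\n"


def write_fasta(path, records):
    """Write one or more (header, sequence) pairs to a FASTA file."""
    with open(path, "w") as handle:
        for header, sequence in records:
            handle.write(fasta_record(header, sequence))
    print(f"...already written the file: {path}")


bases = "ATGC"

# Part a: one file per repeat sequence.
for base in bases:
    write_fasta(f"c3e2_output_poly{base}.fa", [(f"poly {base}", base * 10)])

# Part b: the same four records in a single multi-FASTA file.
write_fasta(
    "c3e2_output_multifasta.fasta",
    [(f"poly {base}", base * 10) for base in bases],
)
```

Without the final "\n" in each record, the next header in the multi-FASTA file would be glued to the end of the previous sequence line.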