C4 Exercises
c4e1: Processing DNA in a file#
The input file (“c4e1_input_seqs.txt”) contains 5 different DNA sequences, one per line. Every sequence starts with the same 14 nt fragment (from a sequencing adapter) and ends with a 20 nt repeat (poly_ATGC, that is, the ATGC repeat 5 times).
Write an “alignment” to an output file (“c4e1_output.fasta”). For each input sequence, the output should contain:
One line with the original sequence followed by its length.
Another line with the “shifted” sequence (without the 14 nt fragment and without the 20 nt poly_ATGC) followed by its length (not counting the starting fragment or the ending repeats).
Between those two lines, a line showing the alignment (with “|” characters).
See the sample for the sake of clarity.
Sample#
Input:
"c4e1_input_seqs.txt" contains:
ATTCGATTATAAGCTCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCATGCATGCATGCATGC
ATTCGATTATAAGCACTGATCGATCGATCGATCGATCGATGCTATCGTCGTATGCATGCATGCATGCATGC
...and so on
Output:
"c4e1_output.fasta" contains, for 2 of the 5 DNA input sequences:
ATTCGATTATAAGCTCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCATGCATGCATGCATGC 76
||||||||||||||||||||||||||||||||||||||||||
TCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC 42
ATTCGATTATAAGCACTGATCGATCGATCGATCGATCGATGCTATCGTCGTATGCATGCATGCATGCATGC 71
|||||||||||||||||||||||||||||||||||||
ACTGATCGATCGATCGATCGATCGATGCTATCGTCGT 37
...and so on
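One possible way to approach c4e1 is sketched below in Python. The function names (`alignment_block`, `process_file`) and the `indent` option are hypothetical, not part of the exercise. Padding the second and third lines with 14 spaces is an assumption based on the word “shifted”; pass `indent=False` to get the lines exactly as they appear in the sample above.

```python
ADAPTER_LEN = 14  # leading adapter fragment, as stated in the exercise
REPEAT_LEN = 20   # trailing poly_ATGC repeat (5 x "ATGC")


def alignment_block(seq, indent=True):
    """Return the three output lines for one input sequence."""
    seq = seq.strip()
    # Drop the adapter at the start and the repeat at the end.
    core = seq[ADAPTER_LEN:len(seq) - REPEAT_LEN]
    # Assumption: "shifted" means padded so the core sits under its
    # position in the original sequence.
    pad = " " * ADAPTER_LEN if indent else ""
    return [
        f"{seq} {len(seq)}",
        pad + "|" * len(core),
        f"{pad}{core} {len(core)}",
    ]


def process_file(in_path="c4e1_input_seqs.txt",
                 out_path="c4e1_output.fasta"):
    """Read one sequence per line and write its three-line block."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                fout.write("\n".join(alignment_block(line)) + "\n")
```

Keeping the trimming in a function that works on a plain string makes it easy to check against the sample lengths (76 and 42, 71 and 37) before touching any file.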
c4e2: Multiple exons from genomic DNA#
A DNA sequence is stored on a single line in the file “c4e2_input_genomic_dna.txt”. The file “c4e2_input_exons.txt” contains the positions of 4 exons within that sequence: one exon per line, given as start and end separated by a comma.
Write the exons to a file (“c4e2_output.txt”), concatenated with a human-readable spacer (“<--->”).
Sample#
Input:
"c4e2_input_genomic_dna.txt" contains:
TCGATCGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCA...
"c4e2_input_exons.txt" contains:
5,58
72,133
...
Output:
"c4e2_output.txt" contains:
CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCG<--->CGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGA<--->CGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTA<--->CGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCG
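A minimal Python sketch of one way to solve c4e2 follows. The helper names (`parse_exons`, `join_exons`, `main`) are hypothetical. It treats each pair as a 0-based start and an exclusive end, as the exercise specifies, so every exon is a plain slice of the sequence.

```python
SPACER = "<--->"  # spacer as shown in the sample output


def parse_exons(text):
    """Parse 'start,end' lines into a list of (start, end) integer pairs."""
    coords = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            start, end = line.split(",")
            coords.append((int(start), int(end)))
    return coords


def join_exons(dna, coords, spacer=SPACER):
    """Slice each exon out of dna and join the pieces with the spacer."""
    # start is inclusive and end is exclusive, so dna[start:end] is the exon.
    return spacer.join(dna[start:end] for start, end in coords)


def main(dna_path="c4e2_input_genomic_dna.txt",
         exons_path="c4e2_input_exons.txt",
         out_path="c4e2_output.txt"):
    with open(dna_path) as f:
        dna = f.read().strip()  # the sequence is on a single line
    with open(exons_path) as f:
        coords = parse_exons(f.read())
    with open(out_path, "w") as f:
        f.write(join_exons(dna, coords) + "\n")
```

Using `str.join` places the spacer only between exons, so no trailing spacer has to be removed afterwards.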
Note:#
The exon positions (start, end) are given in array coordinates [0,…], not in biological coordinates [1,…]. That is, counting starts at 0, start is inclusive and end is exclusive.
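To make the coordinate convention concrete, here is a small Python illustration that uses only the visible start of the sample genomic sequence. The pair `5, 15` and the two conversion helpers are hypothetical, chosen for illustration; they are not part of the exercise input.

```python
# Visible start of the sample genomic sequence (the rest is elided).
prefix = "TCGATCGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCA"

start, end = 5, 15            # hypothetical 0-based, end-exclusive pair
fragment = prefix[start:end]  # covers indices 5 to 14
print(fragment, len(fragment))  # CGTACCGTCG 10


def to_one_based(start, end):
    """Convert 0-based half-open (start, end) to 1-based inclusive."""
    return start + 1, end


def from_one_based(first, last):
    """Convert 1-based inclusive (first, last) to 0-based half-open."""
    return first - 1, last
```

With this convention the length of a fragment is simply `end - start`, and the first exon of the sample (`5,58`) corresponds to biological positions 6 to 58.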