## Proteins in bioinformatics

---

### Protein coding genes in genomic sequences and annotation of proteins

- DNA, proteins: length scale
  - protein length
  - protein weight
- bioinformatics works with life, humans and computers
- storing information:
  - strings of bits (computers)
  - DNA sequences
  - protein sequences
- applications in biology:
  - a DNA storage device
- FASTA file format, multi-FASTA file format
- UniProt
- UCSC Genome Browser
- BLAT

---

### Homology

- introduction to homology: homology, similarity, analogy (examples)
- proteins: relation between sequence, structure and function
- the Rost curve
- gene duplication, speciation
- paralogy (gene duplication), orthology (speciation)
- convergent evolution
- horizontal gene transfer
- BLAST, BLASTP
- first search of a protein sequence against a database (Russell F. Doolittle)

---

### Multiple sequence alignment (MSA)

- introduction to alignment
- classical alignment representation: `*` (identity), `:` (conserved), `.` (semi-conserved), blank (mismatch)
- an application of sequence alignment to epigenetics: bisulphite sequencing
- introduction to sequence alignment
- pairwise alignment vs. multiple sequence alignment: why is it good to perform an MSA?
- pairwise alignment: a brute force algorithm
- basic metrics: Hamming, Levenshtein
- scoring schemes, substitution matrices (Dayhoff PAM, BLOSUM)
- gaps (indels)
- classification of algorithms: global alignments (Needleman and Wunsch) vs. local alignments (Smith and Waterman)
- multiple alignment implies pairwise alignment
- pairwise alignment does not imply multiple alignment
- pairwise alignment, different methods and applications
- dynamic programming: from the Manhattan graph problem to sequence alignment
  - combinatorial optimization: Seven Bridges of Königsberg (graphs), Travelling Salesman Problem (TSP), Manhattan tourist problem
- computational complexity of dynamic programming and the necessity of using heuristics. What is a heuristic?
- word methods for pairwise alignment: BLAST, FASTA
- sequence alignment profiles, sequence logos
- can we align a profile against a sequence? Can we align a profile against a profile?
- multiple sequence alignment, the algorithm

---

### Protein structure

- central dogma of molecular biology
- amino acids, polypeptides
- chemical properties of the amino acids, classification
- protein structure (different chemical interactions, dissociation energies)
- protein structure: primary, secondary, tertiary and quaternary structures
- protein folding problem:
  1. Levinthal's paradox
  2. Anfinsen's dogma
- relation between protein sequence, protein structure and protein function

---

### PDB (Protein Data Bank)

- introduction to the data bank
- evolution of the database
- current state, contributions of the different experimental methods
- computed structural models (AlphaFold)
- X-ray crystallography
- nuclear magnetic resonance
- electron microscopy
- RCSB PDB: what can I retrieve from a PDB ID?
- PDB text file description
- graphical tools for visualisation: Chimera
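
As a companion to the "PDB text file description" topic, the sketch below shows one possible way to read coordinate records from a PDB text file. It assumes the legacy fixed-column layout of `ATOM`/`HETATM` records (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54, element in 77-78). The names `Atom`, `parse_atom_line` and `read_atoms` are hypothetical and chosen only for illustration; they do not belong to any library.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Minimal subset of the fields of a PDB ATOM/HETATM record."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM line by slicing its fixed-width columns.

    Python slices are 0-based and end-exclusive, so PDB columns 7-11
    become line[6:11], and so on.
    """
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        element=line[76:78].strip(),
    )


def read_atoms(pdb_text: str) -> list[Atom]:
    """Return every ATOM/HETATM record found in the text of a PDB file."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]
```

Slicing by column rather than splitting on whitespace is deliberate: adjacent fields in PDB files can touch each other (for example, large coordinates or four-character atom names), so whitespace splitting is not reliable. For real work, an established parser is likely a better choice than this sketch.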