C1. Why Python?
Contents
C1. Why Python?#
Introduction#
The evolution of technology in society#
In this case, a pair of images are worth thousand words
Before |
After 2013 |
---|---|
Photos by: (Luca Bruno/AP) and (Michael Sohn/AP)
The evolution of technology in biology#
The day before yesterday |
Yesterday |
Today |
---|---|---|
Recommended bibliography#
On the role of experiments in scientific development:
Galison, Peter. Image and Logic. Chicago: University of Chicago Press, 1997
In contraposition, on the role of theory in scientific progress:
Kuhn, Thomas S. The Structure of Scientific Revolutions (1962; fourth edition 2012)
What kind of language is Python?#
Some properties:
It is a high-level language
It is interpreted
It is a multi-paradigm language
Some advantages of Python:#
General purpose language
Lots of python programmers all over the world and in the scientific community
Resources, learning material, courses
Help
Libraries
Easier to learn with it because of its syntax
Consistent and very readable; therefore also easy to maintain
Most popular technologies#
Including programming languages
2022 Survey: stackoverflow.com
Companies coding in Python:#
Youtube
Google
Spotify
Instagram
Dropbox
Netflix (Machine learning)
Pinterest
Quora
Reddit
Yahoo!
And most of the Biotechnology companies
Scientific articles on Python#
The digital toolbox (Nature 2015)
Programming: Pick up Python (Nature 2015)
Array programming with NumPy (Nature 2020)
SciPy 1.0: fundamental algorithms for scientific computing in Python (Nature Methods 2020)
Note that Python is used within the methods and/or figures of the scientific articles
Present and Past in Bioinformatics#
Web-query against Pubmed:#
Python [Title] Again, be aware that usually Python is not the main subject of the article (title)
Conclusion: Perl was used in the past and Python nowadays
Numbers vs. text in Biology#
Biologists need to deal with sequences: DNA, RNA, proteins.
Bioinformaticians use the Fasta format to represent any sequence
NCBI#
Fasta format#
PTEN.fa or PTEN.fasta
>AAD13528.1 PTEN [Homo sapiens]
MTAIIKEIVSRNKRRYQEDGFDLDLTYIYPNIIAMGFPAERLEGVYRNNIDDVVRFLDSKHKNHYKIYNL
CAERHYDTAKFNCRVAQYPFEDHNPPQLELIKPFCEDLDQWLSEDDNHVAAIHCKAGKGRTGVMICAYLL
HRGKFLKAQEALDFYGEVRTRDKKGVTIPSQRRYVYYYSYLLKNHLDYRPVALLFHKMMFETIPMFSGGT
CNPQFVVCQLKVKIYSSNSGPTRREDKFMYFEFPQPLPVCGDIKVEFFHKQNKMLKKDKMFHFWVNTFFI
PGPEETSEKVENGSLCDQEIDSICSIERADNDKEYLVLTLTKNDLDKANKDKANRYFSPNFKVKLYFTKT
VEEPSNPEASSSTSVTPDVSDNEPDHYRYSDTTDSDPENEPFDEDQHTQITKV
Multifasta#
sars2.fa or sars2.fasta
>sp|P0DTC2|SPIKE_SARS2 Spike glycoprotein OS=Severe acute respiratory syndrome coronavirus 2 OX=2697049 GN=S PE=1 SV=1
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFS
NVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIV
NNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLE
GKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT
LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK
CTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISN
CVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIAD
YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC
NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVN
FNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP
GTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSY
ECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
SVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQE
VFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDC
LGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM
QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN
TLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRA
SANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPA
ICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDP
LQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL
QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDD
SEPVLKGVKLHYT
>sp|P0DTC4|VEMP_SARS2 Envelope small membrane protein OS=Severe acute respiratory syndrome coronavirus 2 OX=2697049 GN=E PE=1 SV=1
MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYS
RVKNLNSSRVPDLLV
>sp|P0DTC5|VME1_SARS2 Membrane protein OS=Severe acute respiratory syndrome coronavirus 2 OX=2697049 GN=M PE=1 SV=1
MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAYANRNRFLYIIKLIFLWLLWPV
TLACFVLAAVYRINWITGGIAIAMACLVGLMWLSYFIASFRLFARTRSMWSFNPETNILL
NVPLHGTILTRPLLESELVIGAVILRGHLRIAGHHLGRCDIKDLPKEITVATSRTLSYYK
LGASQRVAGDSGFAAYSRYRIGNYKLNTDHSSSSDNIALLVQ
Last but not least#
We will use only modern Python. That is, only Python 3 and not the old Python 2