Lecture notes on network biology

Gregorio Alanis-Lobato, PhD

Postdoctoral Fellow

Computational Biology and Data Mining Lab

Johannes Gutenberg University, Mainz

Powered by p5.js, Cytoscape.js and plotly.js.

Tested on Chrome ≥ v55 and Firefox ≥ 51.

For information and feedback, contact me.

Complex systems

Complex system: A group of heterogeneous entities whose interaction leads to collective behaviours and emergent properties that the individual parts do not display.

Emergent properties: Properties that a complex system displays, but that none of its constituents have. They are usually the result of simple rules of interaction between system components.

Fallacy of division: Failure to realise that a property is emergent can lead to the fallacy of division, i.e. assuming that what is true of the whole must also be true of its parts. The taste of saltiness, for example, is a property of salt, but neither sodium nor chlorine is salty.

Complex systems (cont.)

Flocking is a very common example of an emergent property. It represents the emergence of self-organisation in systems composed of many autonomous agents, such as birds or insects.

Being an emergent property, flocking behaviour arises from the interaction of individual agents adhering to a set of simple rules. As a result, flocking can be easily simulated on a computer. In 1986, Craig Reynolds developed an artificial life program in which boids (bird-oid objects) observe the following rules:

- Separation: Steer to avoid colliding with nearby flockmates.
- Alignment: Steer towards the average heading of nearby flockmates.
- Cohesion: Steer to move towards the average position of nearby flockmates.
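The three rules above can be sketched in a few lines of Python. This is only a minimal illustration, not Reynolds' original program: the neighbourhood radius, the separation distance, the rule weights and the speed cap are all assumed values chosen for the example.

```python
import math
import random

def step(boids, radius=50.0, sep_dist=15.0,
         w_sep=0.05, w_ali=0.05, w_coh=0.005, max_speed=4.0):
    """Advance all boids by one time step (all parameter values are assumptions).

    Each boid is a dict with position (x, y) and velocity (vx, vy).
    """
    updated = []
    for b in boids:
        # Flockmates within the neighbourhood radius of this boid.
        near = [o for o in boids if o is not b
                and math.hypot(o["x"] - b["x"], o["y"] - b["y"]) < radius]
        vx, vy = b["vx"], b["vy"]
        if near:
            n = len(near)
            # Cohesion: steer towards the average position of nearby flockmates.
            vx += w_coh * (sum(o["x"] for o in near) / n - b["x"])
            vy += w_coh * (sum(o["y"] for o in near) / n - b["y"])
            # Alignment: steer towards the average velocity of nearby flockmates.
            vx += w_ali * (sum(o["vx"] for o in near) / n - b["vx"])
            vy += w_ali * (sum(o["vy"] for o in near) / n - b["vy"])
            # Separation: steer away from flockmates that are too close.
            for o in near:
                if math.hypot(o["x"] - b["x"], o["y"] - b["y"]) < sep_dist:
                    vx += w_sep * (b["x"] - o["x"])
                    vy += w_sep * (b["y"] - o["y"])
        # Cap the speed so that the flock does not accelerate without bound.
        speed = math.hypot(vx, vy)
        if speed > max_speed:
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        updated.append({"x": b["x"] + vx, "y": b["y"] + vy, "vx": vx, "vy": vy})
    return updated

# Usage: 50 boids with random positions and velocities, simulated for 100 steps.
rng = random.Random(42)
flock = [{"x": rng.uniform(0, 200), "y": rng.uniform(0, 200),
          "vx": rng.uniform(-1, 1), "vy": rng.uniform(-1, 1)} for _ in range(50)]
for _ in range(100):
    flock = step(flock)
```

Note that no rule mentions the flock as a whole: each boid only reacts to its nearby flockmates, and any collective motion is a consequence of these local interactions.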


Check out Conway's Game of Life for another example of complexity emerging from simple rules.

Complex networks

To facilitate the analysis of complex systems, they can be represented as networks/graphs, where system components are abstracted by nodes/vertices and their interactions by edges/links/ties connecting them. Measurement of node and edge features allows for the topological characterisation of the network.
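As a small illustration of this abstraction, the sketch below stores an undirected, unweighted network as a dictionary that maps each node to the set of its neighbours and measures one simple node feature, the degree. The node labels and the edge list are a made-up toy example, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_network(edges):
    """Build an undirected network (node -> set of neighbours) from an edge list."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)

def degree(adjacency, node):
    """Degree of a node: the number of edges attached to it."""
    return len(adjacency[node])

def degree_distribution(adjacency):
    """Map each observed degree to the number of nodes that have it."""
    return dict(Counter(len(nbrs) for nbrs in adjacency.values()))

# Toy network with five components (A-E) and five interactions.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
network = build_network(edges)
print(degree(network, "C"))
print(degree_distribution(network))
```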

Complex networks (cont.)

In order to figure out the origin of the topological characteristics of complex systems, we can model the formation of their network representations by imposing connectivity rules on nodes. The following simulation allows you to explore two network formation scenarios:

⚫ Random:
- A link between two nodes is formed with probability p.
⚫ Duplication-divergence:
- We start with a network of two connected nodes.
- A node is chosen at random and duplicated with all its links.
- Each link of the new node is retained with probability p.
- If no link is retained, the duplicate is discarded.
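Both formation scenarios can be sketched as follows. This is a minimal reading of the rules listed above, not the code behind the simulation: the function names are hypothetical, and growing the duplication-divergence network until it reaches n nodes is an assumed stopping criterion.

```python
import random
from itertools import combinations

def random_network(n, p, rng):
    """Random scenario: every pair of nodes is linked with probability p."""
    adjacency = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency

def duplication_divergence(n, p, rng):
    """Duplication-divergence scenario, grown until the network has n nodes."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1], otherwise no duplicate is ever kept")
    adjacency = {0: {1}, 1: {0}}  # start with two connected nodes
    while len(adjacency) < n:
        # Choose a node at random and duplicate it with all its links...
        template = rng.choice(list(adjacency))
        # ...then retain each link of the duplicate with probability p.
        kept = [nb for nb in adjacency[template] if rng.random() < p]
        if not kept:
            continue  # no link retained: the duplicate is discarded
        new_node = len(adjacency)
        adjacency[new_node] = set(kept)
        for nb in kept:
            adjacency[nb].add(new_node)
    return adjacency

# Usage: compare the largest degree produced by each scenario.
rng = random.Random(1)
rand_net = random_network(100, 0.05, rng)
dd_net = duplication_divergence(100, 0.5, rng)
print(max(len(nbrs) for nbrs in rand_net.values()))
print(max(len(nbrs) for nbrs in dd_net.values()))
```

In the duplication-divergence scenario, a node with many links is more likely to be a neighbour of the duplicated node and therefore to gain a new link, which is one intuition for why this scenario tends to produce hubs.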


Complex networks (cont.)

The network representations of many complex systems, like the Internet or social networks, have the following three important topological properties:

⚫ Scale free: In complex networks there are many nodes of low degree and only a few highly connected ones (known as hubs). As a result, no single typical degree characterises the nodes, i.e. the degree distribution has no characteristic scale (hence the name scale-free).
⚫ Small world: The average shortest path length in the network grows logarithmically with the number of nodes. This means that with a small number of hops it is possible to go from one node to any other.
⚫ Strongly clustered: The average clustering coefficient in complex networks is very high when compared to random networks. As a result, nodes tend to create tightly knit groups with a relatively high density of links. The clustering coefficient of a node is defined as the number of observed links between its direct neighbours, divided by the total number of links they could form.
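The quantities behind the small world and strong clustering properties can be computed with a short sketch like the one below. It assumes an undirected, unweighted network stored as a dictionary from each node to the set of its neighbours; the toy network and function names are made up for illustration. Two conventions are assumed here: nodes with fewer than two neighbours get a clustering coefficient of 0, and the average shortest path length is taken over pairs of nodes that can reach each other.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Observed links between a node's neighbours divided by the possible ones."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no pair of neighbours, no clustering
    observed = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return observed / (k * (k - 1) / 2)

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance

def average_shortest_path_length(adjacency):
    """Mean hop count over all pairs of distinct, mutually reachable nodes."""
    total = pairs = 0
    for source in adjacency:
        for target, d in shortest_path_lengths(adjacency, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Toy network: a triangle (A, B, C) with a tail (C-D-E).
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E"}, "E": {"D"}}
print(clustering_coefficient(toy, "A"))
print(clustering_coefficient(toy, "C"))
print(average_shortest_path_length(toy))
```

In the toy network, the two neighbours of A are linked to each other, so A has the maximum clustering coefficient, whereas only one of the three possible links exists between the neighbours of C.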

Protein interaction networks

Most functions within the cell emerge thanks to a complex network of protein interactions. Failure of the control mechanisms behind these delicate relationships can lead to complex human disorders.

Being a complex network, the human protein interactome is scale free, small world and strongly clustered. These topological features have a strong impact on the function and dynamics of the network. For example, hubs have been shown to be highly conserved proteins with essential functions; strong clustering indicates the presence of groups of proteins and complexes involved in similar processes; and the small world property has an important effect on the way that signals (hormones, ligands, energy, infections, etc.) spread throughout the system.

Although the emergence of strong clustering in complex networks is still a very active subject of research, the most probable reason for the presence of a scale free degree distribution in the human protein interactome is the duplication of genes and their subsequent functional divergence due to mutations.

High-throughput measurement of PPIs

Of the many methodologies that can measure protein-protein interactions (PPIs), two are currently in wide use for large-scale mapping: the yeast two-hybrid (Y2H) system and affinity- or immunopurification followed by some form of mass spectrometry (AP/MS).

Yeast two-hybrid screening:

Affinity purification:

Co-Immunoprecipitation:

Protein interaction databases

When research groups screen for protein interactions for a certain project, they usually accompany the associated publications with a list of such interactions or they deposit them in specialised repositories where expert curators analyse the data and make it available in standardised formats.

Examples of such repositories are BioGRID, IntAct and MINT.

There are also resources that integrate data from the above repositories and facilitate the construction and analysis of high-quality protein networks. These databases report interactions together with a confidence score that reflects how reliable they are. Some representative examples are STRING, GeneMANIA and HIPPIE.

Exercise

Consider the following hypothetical scenario:

The University Medical Centre at JGU-Mainz is carrying out a study on children with Progeria, an abnormal congenital condition characterised by premature aging. Progeria's most evident manifestations are premature greying, hair and hearing loss, cataracts, arthritis, wrinkles and loose skin. The latter are the result of muscle and skin cell senescence.

Based on different screenings involving sick and healthy children, researchers at UniMedizin have identified 44 genes associated with Progeria and have asked you to analyse them in order to better understand the molecular basis of this disease and pinpoint potential drug targets.

Armed with the Network Biology tools that you've just learnt, you are set to help them out.

1. Use HIPPIE's Network Query to visualise high-confidence interactions within the set of Progeria genes from UniMedizin. Verify that there is indeed a significant association between these genes and Progeria.

         FBLN5
         LZTS2
         AP2B1
         FN1
         TRAF2
         VIM
         ACTN1
         PKM2
         ARHGEF10L
         MAPK7
         TES
         BAG6
         EFEMP2
         TRAF3IP3
         FBLN2
         PLSCR1
         UBE2I
         RBPMS
         PRMT1
         RBCK1
         TUBA1A
         TRAF1
         LMNA
         TMPO
         LMNB1
         GAPDH
         NTN4
         NFKB1
         PPP1CA
         RNF31
         C1QBP
         KPNA2
         SRI
         ZMYND11
         CBX3
         DYNLL1
         GCA
         NR4A1
         RPL4
         ZYX
         ACTN4
         LGALS3BP
         MAPRE1
         SP100

2. Identify the top-3 hubs in the Progeria subnetwork. Study their function in UniProt. Are these proteins functionally related? What could their role be in Progeria?

3. Identify the node with the highest clustering coefficient in the Progeria subnetwork. Study its function and the function of its direct partners in UniProt. Can you deduce the function of this protein complex? (hint: you don't really have to calculate clustering coefficients here; instead, look for closed-triangle motifs in the network)

4. The UniMedizin researchers tell you that they're particularly interested in the Zyxin protein (ZYX), because its function is not well-known. Can you infer it by analysing the function of its partners in the Progeria subnetwork?

5. What's the size of the largest connected component (LCC) of the Progeria subnetwork? Taking random sets of 44 proteins from the human protein interactome results in LCCs with an average size of 3. What does this difference tell you about the inter-connectedness of the Progeria subnetwork?

6. The average shortest path length of the human protein interactome is approximately 5. The average shortest path length of the Progeria subnetwork is slightly larger than 3. What can you deduce about the importance of this difference?

7. Previous research has pointed at protein LMNA as one of the most important factors behind Progeria. This protein is crucial in nuclear assembly, nuclear membrane formation and telomere dynamics. Study the function of the direct partners of LMNA in UniProt. Do these interactions support the importance of LMNA in this disease? Why?

8. The role of LMNA in telomere dynamics intrigues you. You know that at every cellular division telomeres get shorter, up to the point where they can't shrink anymore and cells enter a state called cellular senescence. You suspect that this might be connected to the molecular basis of Progeria. Are there any differences in telomere length between Progeria patients and age-matched children? (hint: do a Google search for progeria and telomere length).

9. You now have three important pieces of information for your final report for UniMedizin: the function of topologically important nodes in the Progeria subnetwork, your analysis of the inter-connectedness of the subnetwork and the involvement of telomere length in aging. Can you put these three pieces together and speculate about the biological processes that are affected by mutations in the 44 genes identified by your colleagues at UniMedizin?

10. What kind of molecular functions could be the target of a new therapy against Progeria, based on your network analysis?
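The comparison in step 5 relies on the largest connected component of the subnetwork induced by a set of proteins, and on repeating that measurement for random protein sets of the same size. The sketch below illustrates the procedure on a made-up toy network (a chain of 20 nodes), not on the real interactome; the function names and the number of random trials are assumptions.

```python
import random

def largest_component_size(adjacency, nodes):
    """Size of the largest connected component of the subnetwork induced by nodes."""
    nodes = set(nodes)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal restricted to the selected nodes.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adjacency[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def mean_random_lcc(adjacency, k, trials, rng):
    """Average LCC size over random node sets of size k (the baseline)."""
    population = sorted(adjacency)
    sizes = [largest_component_size(adjacency, rng.sample(population, k))
             for _ in range(trials)]
    return sum(sizes) / trials

# Toy "interactome": a chain 0-1-2-...-19 (purely illustrative).
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j < 20} for i in range(20)}
selected = [0, 1, 2, 5, 6]  # hypothetical set of nodes of interest
print(largest_component_size(chain, selected))
print(mean_random_lcc(chain, k=5, trials=1000, rng=random.Random(0)))
```

An LCC that is much larger than the random baseline suggests that the selected nodes are more inter-connected than expected by chance.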

Conclusions

A wide range of network-based approaches have been and are being developed to address problems with relevance to biology and human health. Some of the scenarios where network analysis is playing an important role are:

⚫ Gene function prediction: Since there is a large number of genes with no known function, looking at the function of the network partners of a gene product can help determine its biological role.
⚫ Detection of protein complexes and other modular structures: Given the evidence that biological networks show modularity and principles of higher-order organisation, we can exploit this to identify protein complexes and network motifs that have an impact on biological pathways and processes.
⚫ Prediction of new interactions: The identification of special connectivity features of interacting proteins and genes can help in the prediction of new interactions.
⚫ Analysis of disease modules: Human diseases are rarely the result of mutations in a single gene. This has resulted in a new approach to medicine and pharmaceutics, where protein subnetworks are the target of therapy rather than individual genes.
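As an illustration of the first scenario, gene function prediction from network partners can be sketched as a simple majority vote among annotated neighbours. This is just one of many possible strategies; the protein names, the annotations and the alphabetical tie-breaking rule below are hypothetical.

```python
from collections import Counter

def predict_function(adjacency, annotations, node):
    """Return the most common function among the annotated partners of node.

    Returns None if no partner is annotated. Ties are broken alphabetically
    (an arbitrary choice made only to keep the example deterministic).
    """
    votes = Counter(annotations[nb] for nb in adjacency[node] if nb in annotations)
    if not votes:
        return None
    return max(sorted(votes), key=votes.get)

# Hypothetical toy data: protein X has no known function.
interactions = {"X": ["P1", "P2", "P3", "P4"]}
known = {"P1": "transport", "P2": "transport", "P3": "signalling"}
print(predict_function(interactions, known, "X"))
```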

Questions

1. Give a definition for complex system and for emergent properties.

2. Give an example of the fallacy of division.

3. What's the degree of the white node in the following network? What's the length of its shortest path to the black node? What's the clustering coefficient of the striped node? What's the size of the network's largest connected component?

4. Consider the following two networks and their degree distributions. Which one is more likely to represent a complex system and why?

5. List the topological properties that are common to most network representations of complex systems, like the human protein interactome.

6. What are two of the ingredients that might be responsible for the scale free degree distribution of protein interaction networks?

7. Mention two high-throughput experimental techniques to measure protein-protein interactions.

8. Mention two protein-protein interaction databases from which you can construct high-quality protein networks.

9. Mention three problems with relevance to biology and human health, where network analysis is playing an important role.