A series of recently introduced algorithms and models advocates for the existence of a hidden metric space underlying the structure of the network representation of complex systems. Interestingly, it has been shown that if the geometry of this metric space is hyperbolic, it is possible to accurately describe the formation and dynamics of real networks, like the Internet or the international trade system.
Here, we present evidence that this also holds true in a biological context. The network above is a high-quality human protein interactome (hPIN) contained in the native representation of the hyperbolic plane (i.e. a circle of radius R ∼ ln(N), where N is the number of network nodes). We constructed this network based on a stringent subset of the Human Integrated Protein-Protein Interaction rEference (HIPPIE) v2.0 and embedded it into the hyperbolic plane using LaBNE+HM, an approach that combines manifold learning and maximum likelihood estimation for fast yet accurate embeddings. The resulting hPIN contains 10824 proteins and 66154 interactions between them (PPIs).
Once mapped to hyperbolic space, each hPIN protein i lies at polar coordinates (ri, θi). Inferred radial coordinates r, associated with node popularity or seniority, agree with actual protein birth-times. This means that proteins that appeared early in evolution tend to be closer to the centre of the hyperbolic plane, while younger, species-specific proteins lie on its periphery. To determine the birth-time of the hPIN nodes, proteins from the manually curated database SwissProt were grouped based on near full-length similarity and/or a high threshold of sequence identity using FastaHerder2. If proteins from two evolutionarily distant organisms are part of the same group, this suggests that the protein family is ancient. If proteins are only found in the species of interest, then they are species-specific and young.
On the other hand, grouping proteins by their inferred angular coordinates θ (a variable abstracting the characteristics that make a node similar to others) reveals the functional and spatial organisation of the cell. This is supported by the three gene ontologies (biological process or BP, cellular component or CC and molecular function or MF) and by KEGG pathways.
Using both dimensions, we can compute hyperbolic distances between proteins, which hint at their likelihood of interaction. In addition, the underlying metric space of the hPIN can be used to study how biological signals efficiently navigate the network without knowledge of its global structure. To do so, the inferred hyperbolic coordinates of proteins are used as addresses to get a signal closer and closer to a target via greedy routing. This means that the source checks which one of its direct neighbours is hyperbolically closest to the target and sends the signal there. The new node checks amongst its direct partners for the one closest to the target, and so on, until the signal reaches its destination.
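The distance computation and the greedy routing procedure described above can be illustrated with a minimal Python sketch. It is not the code behind this webtool. It assumes the standard hyperbolic law of cosines for curvature -1 as the distance formula, angles given in radians, and a stopping rule in which routing fails as soon as the signal would return to a protein it has already visited. The function names, the toy network and its coordinates are hypothetical.

```python
import math

def hyperbolic_distance(a, b):
    """Distance between two points given as (r, theta) in the native disc.

    Assumes curvature -1 and the hyperbolic law of cosines.
    """
    r1, t1 = a
    r2, t2 = b
    # Smallest angular separation between the two points, in [0, pi].
    d = abs(t1 - t2) % (2 * math.pi)
    dtheta = min(d, 2 * math.pi - d)
    if dtheta == 0:
        # Points on the same ray: the distance is the radial difference.
        return abs(r1 - r2)
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    # Guard against rounding slightly below 1 before taking acosh.
    return math.acosh(max(1.0, x))

def greedy_route(graph, coords, source, target):
    """Forward a signal to the neighbour hyperbolically closest to the target.

    graph maps each protein to a list of its direct neighbours and coords
    maps each protein to (r, theta). Returns (path, success).
    Assumed failure rule: stop when the next hop was already visited.
    """
    path = [source]
    visited = {source}
    current = source
    while current != target:
        neighbours = graph.get(current, [])
        if not neighbours:
            return path, False
        nxt = min(neighbours,
                  key=lambda n: hyperbolic_distance(coords[n], coords[target]))
        if nxt in visited:
            return path, False
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path, True

# Hypothetical toy network: A - B - C, with B close to the centre.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
toy_coords = {"A": (3.0, 0.0), "B": (0.5, 1.0), "C": (3.0, 2.0)}
print(greedy_route(toy_graph, toy_coords, "A", "C"))  # (['A', 'B', 'C'], True)
```

In this sketch a failed route returns the partial path, so the last entry corresponds to the protein at which the signal got stuck.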
The RESET button will bring the webtool to its default settings.
Proteins panel: You can colour proteins according to similarity-based clusters, age group and specific protein classes (transcription factors, receptors, transporters, RNA-binding proteins, constituents of the cytoskeleton or proteins involved in proteolysis) using the dropdown menu. Searching for a particular protein in the corresponding textbox (UniProt ID, Gene Symbol and Entrez ID are accepted) will show a list of suggestions (a minimum of 3 characters is needed for the list of suggestions to appear). Selecting one will show the position and gene symbol of the selected protein in hyperbolic space, coloured by similarity-based cluster. If name labels are too obstructive, they can be deactivated for future searches by unticking the corresponding checkbox. Hovering the mouse over this protein will show a tooltip with a link to its UniProt entry, a link for its in-depth analysis in HIPPIE, the protein's exact hyperbolic coordinates and its assigned age group. The link to HIPPIE allows a detailed analysis of the subnetwork that includes the protein of interest and all proteins no further than a hyperbolic distance of 20 from it (if the number of closest proteins is too large, only the 25 closest are considered).
Protein interactions and greedy routing panel: In order to check whether two proteins interact or to greedy route a signal between them, type a protein name (UniProt ID, Gene Symbol and Entrez ID are accepted) in the From... and To... textboxes and select the proteins from the corresponding lists of suggestions (a minimum of 3 characters is needed for the list of suggestions to appear). Clicking the CHECK PPI/ROUTE SIGNAL button will then show the position of the selected proteins in hyperbolic space and the corresponding link or greedy path between them. Source and target are shown in green and red, respectively. Intermediate proteins in greedy paths are shown in black. If it was not possible to find a greedy path between the selected proteins, you will be notified and the last protein in the path will be shown in grey.
Cluster information panel: Hovering over the hPIN proteins will show specific similarity-based cluster information in this panel, regardless of the colour mode chosen from the dropdown menu of the Proteins panel.
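The selection of proteins passed to HIPPIE from the Proteins panel (a hyperbolic distance cut-off of 20, limited to the 25 closest proteins) can be sketched as follows. This is only an illustration, not the webtool's code: it assumes that distances from the protein of interest have already been computed, and that "too large" means more than 25 proteins fall within the cut-off. The function name and the example values are hypothetical.

```python
def select_neighbourhood(distances, max_distance=20.0, max_count=25):
    """Pick the proteins analysed together with the protein of interest.

    distances maps each other protein to its hyperbolic distance from the
    protein of interest. Proteins beyond max_distance are dropped, and at
    most max_count of the closest remaining ones are kept (assumed rule).
    """
    within = [(d, protein) for protein, d in distances.items()
              if d <= max_distance]
    within.sort()  # closest first; ties are broken by protein name
    return [protein for _, protein in within[:max_count]]

# Hypothetical distances from a protein of interest.
example = {"P1": 3.0, "P2": 25.0, "P3": 1.0}
print(select_neighbourhood(example))  # ['P3', 'P1']
```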
Protein age determination based on sequence identity is error-prone due to non-continuous evolution, extinction and sequencing biases. The labels of similarity-based clusters represent the most over-represented BPs, CCs, MFs and KEGG pathways associated with their constituents. This means that some cluster members may not be involved in or be part of such biological functions, cell compartments or pathways.
If you find this web-tool useful, please cite the following publications: