• What is ProteinPathTracker?
    ProteinPathTracker is an easy-to-use web tool for tracking the evolutionary history of a query protein.
  • For what should it be used?
    To study the path of a query protein through evolution. ProteinPathTracker tracks the homologs of a query protein along one of the six available paths and provides their GO terms. Studying the homologs' annotations may help to understand how the protein evolved in the selected proteomes.
  • For which type of user is it prepared?
    ProteinPathTracker is designed to be as simple as possible, so that any user can run it. More advanced features, such as locking a region, are optional.
  • Which datasets are used?
    193 complete reference proteomes were downloaded from the UniProt Proteomes section on 27th April 2017. These proteomes belong to different taxonomic groups. GO terms for each protein were obtained from their individual UniProt entries. The proteomes are distributed in six evolutionary paths:
    1. cellular organisms → Homo
    2. Primates → Homo
    3. Viridiplantae → Arabidopsis
    4. Fungi → Schizosaccharomyces
    5. Bacteria → Escherichia
    6. Arthropoda → Drosophila

  • How should you use ProteinPathTracker?
    ProteinPathTracker only needs a protein sequence in FASTA format to run, either pasted or uploaded. All other parameters are optional. After providing the protein sequence, click on "Track ProteinPathTracker, track!" to start the execution. Results will appear in the same window in less than one minute.
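    As an illustration of the expected input, the following minimal sketch checks that a pasted text is a single FASTA record before submission. The function name `read_single_fasta` is hypothetical, and the assumption that exactly one sequence is accepted per run is ours; the tool's own validation is not described on this page.

```python
def read_single_fasta(text):
    """Return (header, sequence) from a FASTA text holding exactly one record.

    Assumption: one protein per run. This is an illustrative check only,
    not the validation performed by ProteinPathTracker itself.
    """
    # Drop empty lines and surrounding whitespace.
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one sequence")
    header = lines[0][1:]
    # Sequence lines may be wrapped; join them into one string.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the FASTA record contains no sequence")
    return header, sequence
```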
  • What does the "evolutionary path" mean?
    Because the web tool looks for homologs through evolution, the evolutionary path lists the proteomes in which homologs are searched. The preselected proteomes are used by default, but for some taxa the user may choose a different proteome.
  • Which parameters can be modified to start the execution of ProteinPathTracker?
    The E-value cutoff used to assess the significance of the BLAST searches can be modified (default E-value = 0.1).
    By default, the lock region strategy is off, but it can be turned on by entering the coordinates of the region to follow in the homologs. The minimum length of a locked region is 10 amino acids. If the user locks a region, the minimum % coverage of the locked region in the results can also be specified. By default, at least 75% of the initial locked region must be covered for a region to be considered positive.
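    The optional parameters and their limits can be summarised in a small sketch. The names `TrackerParams` and `validate` are hypothetical, the coordinates are assumed to be 1-based and inclusive, and the requirement of a positive E-value cutoff is our assumption. The defaults (0.1 and 75%) and the length limits (10 and 100 amino acids) are the ones stated on this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_LOCK_LENGTH = 10    # minimum locked-region length (amino acids)
MAX_LOCK_LENGTH = 100   # maximum initial locked-region length (amino acids)


@dataclass
class TrackerParams:
    """Hypothetical container for the optional parameters."""
    evalue_cutoff: float = 0.1                      # default E-value cutoff
    lock_region: Optional[Tuple[int, int]] = None   # (start, end); None = lock off
    min_coverage: float = 75.0                      # minimum % coverage


def validate(params: TrackerParams, query_length: int) -> List[str]:
    """Return a list of problems; an empty list means the values are usable."""
    problems = []
    if params.evalue_cutoff <= 0:
        problems.append("E-value cutoff must be positive")
    if params.lock_region is not None:
        start, end = params.lock_region
        length = end - start + 1  # assumes 1-based inclusive coordinates
        if start < 1 or end > query_length or start > end:
            problems.append("lock region lies outside the query sequence")
        elif length < MIN_LOCK_LENGTH:
            problems.append("lock region is shorter than 10 amino acids")
        elif length > MAX_LOCK_LENGTH:
            problems.append("lock region is longer than 100 amino acids")
        if not 0 < params.min_coverage <= 100:
            problems.append("minimum coverage must be in (0, 100]")
    return problems
```

    With these assumptions, a lock region from 20 to 40 on a 300-residue query passes the checks, whereas a region from 20 to 25 is rejected as too short.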

  • What does the functionality "lock region" mean?
    Locking a region means that a particular subregion of the query protein is followed along the evolutionary path by mapping it onto each of the homologs.
  • When should you lock a region?
    When you are interested in a specific motif or domain in the query protein. It allows the tracing of a domain in, for example, multidomain proteins. Locking an annotated region with a particular function may help in assessing whether the homologs share it or not, and even when it appeared in evolution.
  • Why did you not get any locked regions?
    If you did not get any locked region even though you selected one, try selecting a different region or a different query (from among the orthologs found), or lowering the minimum locked region coverage (75% by default).


  • How does ProteinPathTracker work?
    It looks for homologs in the selected proteomes following a step-by-step strategy. First, it looks for the protein in our database that is most similar to the query protein; that database was generated by joining together all the default proteomes from the selected path. Using that protein as the entry point into the proteome path (taxon X), it looks for its ortholog in the previous proteome (taxon X-1). Then, the ortholog identified in proteome X-1 is used to look for the ortholog in the next proteome, X-2, and so on. The same strategy is followed from proteome X to proteome X+1, then X+2, ...
    If the tool is not sure that a found homolog is the ortholog, it is annotated as homolog; it may be the ortholog, but we cannot be sure of it. In that case, the last ortholog found is again used to look for new orthologs in the next proteome. The complete evolutionary path is covered using this strategy.
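    The step-by-step strategy described above can be sketched as follows. This is only an illustration under stated assumptions: `walk_path` and the `search` callable are hypothetical names, and `search(query_id, proteome)` stands in for whatever similarity search and ortholog test the tool applies, returning either `None` or a pair `(hit_id, confirmed_ortholog)`.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical search signature: (query protein, proteome) -> (hit, is_ortholog) or None
SearchFn = Callable[[str, str], Optional[Tuple[str, bool]]]


def walk_path(path: List[str], entry_index: int, entry_protein: str,
              search: SearchFn) -> Dict[str, Tuple[str, str]]:
    """Walk from the entry proteome (taxon X) towards both ends of the path."""
    results = {path[entry_index]: (entry_protein, "entry")}
    for step in (-1, +1):              # X-1, X-2, ... and then X+1, X+2, ...
        last_ortholog = entry_protein  # protein used as query for the next search
        i = entry_index + step
        while 0 <= i < len(path):
            hit = search(last_ortholog, path[i])
            if hit is not None:
                hit_id, confirmed = hit
                label = "ortholog" if confirmed else "homolog"
                results[path[i]] = (hit_id, label)
                if confirmed:
                    # Only a confirmed ortholog becomes the next query;
                    # after a plain homolog the last ortholog is reused.
                    last_ortholog = hit_id
            i += step
    return results
```

    In this sketch a proteome with no hit is simply skipped and the walk continues with the last ortholog found; how the tool handles that case is not stated here.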

  • How does the "lock region" functionality work?
    When an ortholog sequence is found, ProteinPathTracker tries to map the locked region onto it. For the mapping to be accepted, it has to cover at least the minimum % coverage of the locked region (75% by default). To map the region onto a sequence, ProteinPathTracker first uses the coordinates of the last mapped locked region. If the region cannot be found that way, the mapping is repeated using the initial coordinates from the query protein. This strategy can "rescue" a region that was lost in one of the orthologs; for example, if one protein is a fragment and lacks the locked region, the region can still be mapped in the following orthologs because the query protein is used instead.
    The maximum length of the initial locked region is 100 amino acids.
    As the locked region is ultimately mapped using the coordinates in the query, ProteinPathTracker yields different results depending on the query sequence used for each execution. The locked region may be changed iteratively after an execution by selecting a query amongst the list of orthologs and a new set of coordinates in it.
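    The mapping and the "rescue" step can be illustrated with a minimal sketch. It assumes that the mapping is read from a gapped pairwise alignment, that coordinates are 1-based and inclusive, and that coverage is the percentage of locked positions aligned to a residue in the target. These are our assumptions; `map_region` and `map_with_rescue` are hypothetical names, not the tool's code.

```python
from typing import Optional, Tuple

Region = Tuple[int, int]                         # (start, end), 1-based inclusive
Mapping = Tuple[float, Optional[Region]]         # (% coverage, mapped region)


def map_region(aligned_src: str, aligned_tgt: str,
               src_start: int, tgt_start: int, region: Region) -> Mapping:
    """Project `region` of the source onto the target through an alignment.

    aligned_src / aligned_tgt are equal-length alignment rows with '-' gaps;
    src_start / tgt_start are the positions of their first residues.
    """
    start, end = region
    src_pos, tgt_pos = src_start - 1, tgt_start - 1
    mapped = []
    for s, t in zip(aligned_src, aligned_tgt):
        if s != "-":
            src_pos += 1
        if t != "-":
            tgt_pos += 1
        # A locked position counts as covered when it faces a target residue.
        if s != "-" and t != "-" and start <= src_pos <= end:
            mapped.append(tgt_pos)
    coverage = 100.0 * len(mapped) / (end - start + 1)
    return coverage, ((mapped[0], mapped[-1]) if mapped else None)


def map_with_rescue(from_last: Optional[Mapping], from_query: Optional[Mapping],
                    min_coverage: float = 75.0):
    """Prefer the mapping from the last mapped region; fall back to the query."""
    for source, result in (("last mapped region", from_last),
                           ("query", from_query)):
        if result is not None and result[1] is not None \
                and result[0] >= min_coverage:
            return source, result[1]
    return None  # the region could not be mapped in this ortholog
```

    For simplicity both candidate mappings are passed in already computed; a real implementation would more likely compute the query-based mapping only when the first attempt fails.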

Output: example

  • Overview

Query → UniProt:F1QYJ3.
Path 1 (default).
Lock region → from 20 to 40.

  • Results


  • In this example, the region between positions 20 and 40 of the SMN protein from D. rerio (UniProt:F1QYJ3) was locked. It includes a binding site for GEMIN2 ("WDD..L" motif). Using the lock region strategy, it can easily be seen that all orthologous proteins contain this motif. To ease the interpretation of the results of the locked region mapping, a logo built from these regions is provided (using the source code from WebLogo version 2.8.2 [http://weblogo.berkeley.edu/ || Crooks et al., 2004]).

  • Below the logo of the locked region (shown only if a region was locked), a table gives an overview of the GO terms that appear in at least one protein. The columns of the table represent the different proteins (hover over a column to see its taxon, organism and ID, whether the entry is reviewed, and whether it is a homolog or an ortholog). The GO terms are ordered by category: molecular function, cellular component and biological process. A filled cell means that the GO term is present in the protein.
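
    The layout of this table can be sketched as a presence matrix. The function name `go_presence_table` and the input structure are hypothetical, and the alphabetical ordering of terms within a category is an assumption; only the category order and the "present in at least one protein" rule come from the description above.

```python
CATEGORY_ORDER = ["molecular function", "cellular component", "biological process"]


def go_presence_table(annotations):
    """Build the rows and cells of a GO term presence table.

    annotations maps a protein ID to a set of (category, term) pairs.
    Returns (proteins, rows, matrix), where matrix[r][c] is True when
    the term of row r is annotated in the protein of column c.
    """
    proteins = list(annotations)  # one column per protein, in input order
    # Every term that appears in at least one protein gets a row.
    terms = set().union(*annotations.values())
    rows = sorted(terms, key=lambda ct: (CATEGORY_ORDER.index(ct[0]), ct[1]))
    matrix = [[row in annotations[p] for p in proteins] for row in rows]
    return proteins, rows, matrix
```

    A `True` entry corresponds to a filled cell in the table.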