PolyX2 is a web tool that searches for homorepeats (polyX) in a given protein dataset. It processes ~500 proteins per second. File uploads are limited to 50 MB (~2 minutes of running time). For larger datasets, we recommend using the standalone version of the script (see DOWNLOAD below).

Mier P and Andrade-Navarro MA. PolyX2: fast detection of homorepeats in large protein datasets. Genes 13(2022), 758. PMID:35627143.


EXECUTION

Upload a file with one or more protein sequences in FASTA format,

or paste the sequences here: [example 1: HD_HUMAN] [example 2: SARS-CoV-2 complete proteome]



Minimum number of identical residues [X] in a window of [Y] amino acids.

Search for homorepeats of the following amino acids:


PRECOMPUTED

Homorepeats precomputed with the default parameters are available for the following datasets:

  • Drosophila melanogaster proteome
  • Homo sapiens proteome
  • Isoforms (UniProt v2020_06)
  • SwissProt (UniProt v2020_06)


DOWNLOAD

Alternatively, you can download the source code and run it locally. By default, the standalone version searches for all possible polyX.


HELP

Input

A FASTA file with one or more protein sequences.
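For the standalone workflow, reading such a file can be sketched as below. This is an illustration only: `read_fasta` is a hypothetical helper, not part of PolyX2, and it assumes that the protein ID is the first whitespace-separated token of each header line (which is what the ID column in the HD_HUMAN example, `sp|P42858|HD_HUMAN`, suggests).

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted lines.

    Assumption: the ID is the first whitespace-separated token of the header,
    e.g. '>sp|P42858|HD_HUMAN Huntingtin ...' -> 'sp|P42858|HD_HUMAN'.
    """
    seq_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)  # emit the previous record
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line.upper())  # sequence line, upper-cased
    if seq_id is not None:
        yield seq_id, "".join(chunks)  # emit the last record


example = ">demo1 toy protein\nMKSF\nqqqqq\n\n>demo2\nPPPP\n"
print(list(read_fasta(example.splitlines())))
# [('demo1', 'MKSFQQQQQ'), ('demo2', 'PPPP')]
```

Sequences split over several lines are joined, and any text before the first header is ignored.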

Thresholds

  • [X]: minimum number of identical residues in the polyX. Must be greater than half of the window length [Y]. Default = 8.
  • [Y]: window length, i.e. the minimum length of the polyX. Default = 10.

    Together, these thresholds define the parameter k = [Y] - [X], the maximum number of guest amino acids allowed in a window. k must be smaller than half of the window length (which is equivalent to requiring [X] > [Y]/2); otherwise, execution halts with an error message.
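The rule above can be written as a small check. This is a sketch, not the script's own validation code: `guest_allowance` is a hypothetical name, and the extra check that X lies between 1 and Y is an assumption added here, not something stated in this help text.

```python
def guest_allowance(x: int, y: int) -> int:
    """Return k = Y - X, the maximum number of guest residues in a window.

    Raises ValueError when k is not smaller than half of the window length,
    i.e. when X is not greater than Y / 2.
    """
    if x < 1 or x > y:
        # Extra sanity check of this sketch (not stated in the help text).
        raise ValueError("X must be between 1 and the window length Y")
    k = y - x
    if not 2 * k < y:  # integer form of k < Y / 2
        raise ValueError(f"k = {k} must be smaller than half of the window length ({y})")
    return k


print(guest_allowance(8, 10))  # default settings -> 2
print(guest_allowance(4, 6))   # short-polyX settings -> 2
```

With [X = 5] and [Y = 10], `guest_allowance(5, 10)` raises `ValueError`, because k = 5 is exactly half of the window length rather than smaller than it.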

    Depending on the selected thresholds, the script will locate different polyX. However, because the thresholds only set the minimum number of identical residues and the minimum window length, long pure polyX are found irrespective of the thresholds.

    The choice of thresholds is left to the user. For example, choosing [X = 6] and [Y = 10] can lead to 'SESRSDVSSS' being detected as a polyS region, although it does not look like a genuine polyS. We recommend one of the following settings:

  • [X = 8] and [Y = 10] (default), to look for long polyX.
  • [X = 4] and [Y = 6], to look for short polyX. Long polyX will also be found.
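The effect of the two thresholds can be illustrated with a simplified sliding-window scan. This is a sketch under stated assumptions, not the PolyX2 source: the function `find_polyx` is hypothetical, it merges overlapping qualifying windows of the same residue, and it trims guest residues from both ends so that each region starts and ends with the most prevalent amino acid. The HD_HUMAN example rows in the Output section happen to follow that pattern, but the actual script may delimit regions differently.

```python
from collections import Counter
from typing import List, Tuple


def find_polyx(seq: str, x: int = 8, y: int = 10) -> List[Tuple[int, int, str, str]]:
    """Simplified sliding-window homorepeat scan (illustrative, not PolyX2's code).

    Returns (start, end, aa, polyx) tuples with 1-based inclusive coordinates.
    """
    if not (2 * x > y and x <= y):
        raise ValueError("thresholds must satisfy Y/2 < X <= Y")
    regions: List[list] = []  # [start, end, aa] with 0-based inclusive coordinates
    for i in range(len(seq) - y + 1):
        # Since X > Y/2, at most one residue can reach X in a window, so the
        # top entry of most_common() is unambiguous whenever the window qualifies.
        aa, count = Counter(seq[i:i + y]).most_common(1)[0]
        if count < x:
            continue
        end = i + y - 1
        if regions and regions[-1][2] == aa and i <= regions[-1][1]:
            regions[-1][1] = end  # overlaps the previous region of the same residue
        else:
            regions.append([i, end, aa])
    hits = []
    for start, end, aa in regions:
        # Assumption of this sketch: trim guest residues from both ends.
        while seq[start] != aa:
            start += 1
        while seq[end] != aa:
            end -= 1
        hits.append((start + 1, end + 1, aa, seq[start:end + 1]))
    return hits


print(find_polyx("SESRSDVSSS", x=6, y=10))  # [(1, 10, 'S', 'SESRSDVSSS')]
print(find_polyx("SESRSDVSSS"))             # [] with the defaults X = 8, Y = 10
```

The second call shows why the default setting rejects the 'SESRSDVSSS' example: the window contains only 6 serines, fewer than [X = 8]. Note also that the requirement [X] > [Y]/2 guarantees that at most one amino acid can qualify in any window, which keeps the scan unambiguous.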

Output

    A file listing the polyX regions found in the input sequences with the selected thresholds. Example for the protein HD_HUMAN (example 1) with default parameters:

    Start  End   Aa  +Aa  Aa/len  ID                  polyX
    18     38    Q   -    21/21   sp|P42858|HD_HUMAN  QQQQQQQQQQQQQQQQQQQQQ
    39     52    P   LQ   12/14   sp|P42858|HD_HUMAN  PPPPPPPPPPPQLP
    63     78    P   GQ   13/16   sp|P42858|HD_HUMAN  PQPQPPPPPPPPPPGP
    2633   2643  E   WD   9/11    sp|P42858|HD_HUMAN  EEEWDEEEEEE

    Columns in the output file:

  • Start: starting coordinate of the polyX.
  • End: finishing coordinate of the polyX.
  • Aa: most prevalent amino acid in the polyX.
  • +Aa: other amino acids present in the polyX besides the most prevalent one ('-' if none).
  • Aa/len: number of residues of the most prevalent amino acid versus polyX length.
  • ID: protein ID.
  • polyX: sequence of the polyX.
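The column definitions imply a few consistency checks that are useful when post-processing results. The sketch below assumes the columns are tab-separated and appear in the order listed above; the delimiter is an assumption, and `PolyXRow` and `parse_row` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PolyXRow:
    start: int
    end: int
    aa: str
    others: str  # '' when the +Aa column is '-'
    n_aa: int
    length: int
    protein_id: str
    polyx: str


def parse_row(line: str) -> PolyXRow:
    """Parse one result row, assuming tab-separated columns in the order above."""
    start, end, aa, others, ratio, protein_id, polyx = line.rstrip("\n").split("\t")
    n_aa, length = (int(v) for v in ratio.split("/"))
    row = PolyXRow(int(start), int(end), aa, "" if others == "-" else others,
                   n_aa, length, protein_id, polyx)
    # Consistency checks that follow from the column definitions.
    if not (row.end - row.start + 1 == row.length == len(row.polyx)):
        raise ValueError(f"length mismatch in row: {line!r}")
    if row.polyx.count(row.aa) != row.n_aa:
        raise ValueError(f"Aa count mismatch in row: {line!r}")
    if set(row.polyx) - {row.aa} != set(row.others):
        raise ValueError(f"+Aa mismatch in row: {line!r}")
    return row


row = parse_row("39\t52\tP\tLQ\t12/14\tsp|P42858|HD_HUMAN\tPPPPPPPPPPPQLP")
print(row.aa, row.n_aa, row.length)  # P 12 14
```

All four HD_HUMAN example rows pass these checks: for instance, End - Start + 1 = 52 - 39 + 1 = 14, which matches both the len part of Aa/len and the length of the polyX sequence.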

    The results page also includes an overview table with the number of homorepeats found per amino acid.
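A similar per-amino-acid overview can be rebuilt from a downloaded output file by tallying the Aa column. This is a sketch assuming tab-separated rows (as above); `count_per_residue` is a hypothetical helper, and lines whose first field is not a number, such as a header line, are skipped.

```python
from collections import Counter
from typing import Iterable


def count_per_residue(lines: Iterable[str]) -> Counter:
    """Count homorepeats per most prevalent amino acid (the Aa column).

    Assumes tab-separated rows; lines whose first field is not a number
    (e.g. a header line or blank lines) are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        counts[fields[2]] += 1
    return counts


rows = [
    "Start\tEnd\tAa\t+Aa\tAa/len\tID\tpolyX",
    "18\t38\tQ\t-\t21/21\tsp|P42858|HD_HUMAN\t" + "Q" * 21,
    "39\t52\tP\tLQ\t12/14\tsp|P42858|HD_HUMAN\tPPPPPPPPPPPQLP",
    "63\t78\tP\tGQ\t13/16\tsp|P42858|HD_HUMAN\tPQPQPPPPPPPPPPGP",
    "2633\t2643\tE\tWD\t9/11\tsp|P42858|HD_HUMAN\tEEEWDEEEEEE",
]
print(count_per_residue(rows))  # Counter({'P': 2, 'Q': 1, 'E': 1})
```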