Input
A FASTA file containing one or more protein sequences.
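The following sketch shows what a multi-sequence FASTA input looks like once it is read into memory. It assumes that the protein ID is the first whitespace-separated token after `>`, which is what the ID column of the example output suggests. `read_fasta` is a hypothetical helper and is not part of the tool.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {ID: sequence}.

    Assumption: the ID is the first whitespace-separated token after '>'.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>toy1 example protein
MKAFQQQQ
QQQQGSTW
>toy2
SESRSDVSSS
"""
print(read_fasta(example))
```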
Thresholds
[X]: minimum number of identical residues in the polyX. It must be greater than half of the window length [Y]. Default = 8.
[Y]: window length, i.e. the minimum length of the polyX. Default = 10.
Together, these thresholds define the parameter k, the maximum number of guest amino acids (residues other than the repeated one) allowed in a window: k = [Y] - [X]. Because [X] must exceed half of [Y], k must be smaller than half of the window length. Otherwise, execution halts with an error message.
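The threshold rule can be written as a small validation step. This is a sketch of the rule as stated above and not the tool's own code. The function name `guest_allowance` is hypothetical.

```python
def guest_allowance(x: int, y: int) -> int:
    """Return k = Y - X, the maximum number of guest residues per window.

    Raises ValueError when the thresholds break the documented rule that
    k must be smaller than half of the window length Y.
    """
    if x <= 0 or y <= 0:
        raise ValueError("thresholds must be positive integers")
    if x > y:
        raise ValueError("X cannot exceed the window length Y")
    k = y - x
    # k < Y/2 is equivalent to X > Y/2; compare integers to avoid floats.
    if 2 * k >= y:
        raise ValueError(f"invalid thresholds X={x}, Y={y}: k={k} is not smaller than Y/2")
    return k


print(guest_allowance(8, 10))  # default settings: k = 2
print(guest_allowance(4, 6))   # short-polyX settings: k = 2
```

For example, [X = 5] and [Y = 10] give k = 5, which is not smaller than 10 / 2, so the check raises an error.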
Depending on the selected thresholds, the script will locate different polyX. However, the thresholds only set the minimum number of identical residues and the minimum window length. As a result, long pure polyX are found regardless of the settings.
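The hypothetical sliding-window scan below helps explain why long pure polyX pass any valid setting. It is a simplified sketch and not the tool's algorithm. It rests on three assumptions:

- A window of length [Y] qualifies when its most frequent residue occurs at least [X] times.
- Overlapping qualifying windows for the same residue are merged.
- Each region is trimmed so that it begins and ends with the dominant residue.

```python
from collections import Counter


def find_polyx(seq: str, x: int = 8, y: int = 10) -> list[tuple[int, int, str]]:
    """Hypothetical sliding-window polyX scan (a sketch, not the tool's code).

    Returns (start, end, residue) tuples with 1-based, inclusive coordinates.
    """
    if 2 * x <= y or x > y:
        raise ValueError("X must exceed half of Y and must not exceed Y")
    regions = []  # each entry: [start, end, residue]
    for i in range(len(seq) - y + 1):
        aa, count = Counter(seq[i:i + y]).most_common(1)[0]
        if count < x:
            continue
        start, end = i + 1, i + y  # 1-based, inclusive window bounds
        if regions and regions[-1][2] == aa and start <= regions[-1][1] + 1:
            regions[-1][1] = end  # overlapping window: extend the current region
        else:
            regions.append([start, end, aa])
    trimmed = []
    for start, end, aa in regions:
        # Drop guest residues at the edges so each region starts and ends with aa.
        while seq[start - 1] != aa:
            start += 1
        while seq[end - 1] != aa:
            end -= 1
        trimmed.append((start, end, aa))
    return trimmed


protein = "MKAF" + "Q" * 21 + "GSTW"  # toy sequence with a pure polyQ
print(find_polyx(protein, 8, 10))  # [(5, 25, 'Q')]
print(find_polyx(protein, 4, 6))   # [(5, 25, 'Q')]
```

In this sketch, the run of 21 glutamines gets the same boundaries under both recommended settings. Any window lying inside the run contains [Y] identical residues, so it satisfies every valid [X].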
The choice of thresholds is left to the user. For example, choosing [X = 6] and [Y = 10] can lead to detecting 'SESRSDVSSS' as a polyS region, even though it does not look like a genuine polyS. We recommend one of the following settings:
[X = 8] and [Y = 10] (default), to look for long polyX.
[X = 4] and [Y = 6], to look for short polyX. Long polyX will also be found.
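You can check the 'SESRSDVSSS' example by hand. It contains 6 serines in a 10-residue window, so it passes [X = 6] but fails the default [X = 8]. The hypothetical helper below performs that single-window test. It assumes a window qualifies when its most frequent residue reaches [X], which simplifies the tool's behaviour.

```python
from collections import Counter


def window_is_polyx(window: str, x: int) -> bool:
    """True if the most frequent residue in the window occurs at least x times."""
    _, count = Counter(window).most_common(1)[0]
    return count >= x


example = "SESRSDVSSS"  # 6 serines and 4 guest residues in a 10-residue window
print(window_is_polyx(example, 6))  # X = 6, Y = 10 -> True
print(window_is_polyx(example, 8))  # X = 8, Y = 10 (default) -> False
```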
Output
A file listing the polyX regions found in the input sequences with the selected thresholds. Example output for the protein HD_HUMAN (example 1) with default parameters:
| Start | End | Aa | +Aa | Aa/len | ID | polyX |
|---|---|---|---|---|---|---|
| 18 | 38 | Q | - | 21/21 | sp\|P42858\|HD_HUMAN | QQQQQQQQQQQQQQQQQQQQQ |
| 39 | 52 | P | LQ | 12/14 | sp\|P42858\|HD_HUMAN | PPPPPPPPPPPQLP |
| 63 | 78 | P | GQ | 13/16 | sp\|P42858\|HD_HUMAN | PQPQPPPPPPPPPPGP |
| 2633 | 2643 | E | WD | 9/11 | sp\|P42858\|HD_HUMAN | EEEWDEEEEEE |
Columns in the output file:
Start: position of the first residue of the polyX in the protein (1-based).
End: position of the last residue of the polyX (inclusive, so length = End - Start + 1).
Aa: most prevalent amino acid in the polyX.
+Aa: other amino acids present in the polyX besides the most prevalent one ("-" if there are none).
Aa/len: number of occurrences of the most prevalent amino acid relative to the polyX length.
ID: protein ID.
polyX: sequence of the polyX.
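The column definitions can be reproduced from a sequence and a pair of coordinates. The sketch below does this for a toy protein, assuming 1-based inclusive coordinates as implied by the example table. `describe_polyx` is hypothetical, and its alphabetical ordering of the +Aa residues is an assumption.

```python
from collections import Counter


def describe_polyx(protein: str, start: int, end: int) -> dict:
    """Build output-style columns for the polyX at 1-based, inclusive [start, end]."""
    region = protein[start - 1:end]
    aa, n = Counter(region).most_common(1)[0]
    guests = sorted(set(region) - {aa})  # alphabetical order is an assumption
    return {
        "Start": start,
        "End": end,
        "Aa": aa,
        "+Aa": "".join(guests) or "-",
        "Aa/len": f"{n}/{len(region)}",
        "polyX": region,
    }


toy = "MA" + "PPPPPPPPPPPQLP" + "GS"  # toy protein, not the real HD_HUMAN sequence
print(describe_polyx(toy, 3, 16))
```

For the toy protein this gives Aa = P, +Aa = LQ and Aa/len = 12/14, matching the second row of the example table. Alphabetical sorting would give "DW" for the last row, where the example shows "WD", so the tool evidently orders guest residues differently.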
The results page also includes an overview table with the number of homorepeats found per amino acid.
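The overview table amounts to counting polyX regions by their most prevalent amino acid. Here is a minimal sketch using `collections.Counter`, with rows copied from the HD_HUMAN example table. It does not reproduce the layout of the tool's own overview table, and `polyx_per_amino_acid` is a hypothetical name.

```python
from collections import Counter

# (Aa, polyX) pairs copied from the HD_HUMAN example output.
rows = [
    ("Q", "QQQQQQQQQQQQQQQQQQQQQ"),
    ("P", "PPPPPPPPPPPQLP"),
    ("P", "PQPQPPPPPPPPPPGP"),
    ("E", "EEEWDEEEEEE"),
]


def polyx_per_amino_acid(rows: list[tuple[str, str]]) -> Counter:
    """Count how many polyX regions each amino acid dominates."""
    return Counter(aa for aa, _ in rows)


for aa, n in sorted(polyx_per_amino_acid(rows).items()):
    print(aa, n)  # E 1, then P 2, then Q 1
```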