This page contains basic information about FastaHerder2 and its four usage modes.
FastaHerder2 is a web resource based on the clustering of similar sequences (in length and in %identity). It allows the user to...
...cluster a set of sequences (mode 1).
...co-cluster a sequence to a previously-clustered database (mode 2). Examples: ABC2_SCHPO, CYSJ_ECOLI, DCR1_SCHPO.
...find a sequence and the clusters it belongs to using a UniProt AC or ID (mode 3). Examples: P53_HUMAN, Q9BZZ5, ADA_BOVIN.
...search clusters using a combination of selected annotations (mode 4). Examples: example 1, example 2.
The pre-clustering step reduces the complexity of the protein database, easing the interpretation of the results of a sequence similarity search.
|MODE 1: CLUSTER|
To use FastaHerder2 mode 1: CLUSTER, upload a file of protein sequences in FASTA format, or paste the sequences into the available text area. More than one sequence must be submitted for FastaHerder2 to run. The input file can be up to 2 Mb; a larger file produces an error. To cluster larger files, please contact us and we will provide you with a solution.
Once the sequences have been provided, FastaHerder2 runs when you click the GO button.
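The input rules above (FASTA format, more than one sequence, a size limit) can be checked locally before submitting. The following is only a minimal sketch, not FastaHerder2's own validation: the function names are hypothetical, and it assumes that "2 Mb" means 2 × 1024 × 1024 bytes.

```python
# Assumption: the "2 Mb" limit is read as 2 * 1024 * 1024 bytes.
MAX_INPUT_BYTES = 2 * 1024 * 1024


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_mode1_input(text):
    """Raise ValueError if the text breaks the mode 1 input rules (hypothetical helper)."""
    if len(text.encode("utf-8")) > MAX_INPUT_BYTES:
        raise ValueError("input is larger than the assumed size limit")
    records = parse_fasta(text)
    if len(records) < 2:
        raise ValueError("more than one sequence is required")
    return records
```

Calling `check_mode1_input` on the pasted text or on the contents of the file returns the parsed records, or raises an error that names the rule that was broken.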
The threshold tolerance parameter controls the stringency of the clustering. FastaHerder2 clusters near-full-length homologs, allowing for lower sequence identity thresholds. Longer sequences can be clustered together with larger differences in length. This has the effect of increasing the compression. The parameter's value depends on the query's length:
FastaHerder2 will cluster the input sequences to produce three different output files, which appear on the results page:
The results also report the number of initial sequences, clusters and reclusters, and the % compression of the initial set of sequences.
|MODE 2: CO-CLUSTER|
To use FastaHerder2 mode 2: CO-CLUSTER, upload a file with one sequence in FASTA format, or paste it into the available text area. Once the sequence has been provided, FastaHerder2 runs when you click the GO button.
FastaHerder2 will co-cluster the input sequence to a previously-clustered database. It finds the most appropriate cluster for the input sequence in each of the available clustered databases. As of July 2015, a clustered version of SwissProt (release 2015_05) is available, along with 50 complete reference proteomes.
First, to summarize the results, the page presents a heatmap of all positional domain annotations of the leaders of all the clusters found. It therefore shows the domain architecture of a protein across different taxonomic groups. Next to each leader is the organism it belongs to, colored according to its taxonomic group:
The results page then shows the information drawn from each cluster the input sequence can belong to. When a cluster is expanded (by clicking on it), the following information is shown:
|MODE 3: FIND SEQUENCE IN CLUSTERS|
To use FastaHerder2 mode 3: FIND SEQUENCE IN CLUSTERS, enter any UniProt AC or ID. FastaHerder2 runs when you click the GO button. It will search the clustered databases (see mode 2) using the query identifier.
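A query can be checked locally to see whether it looks like a UniProt AC or an ID before it is submitted. The sketch below is an illustration only and says nothing about how FastaHerder2 itself parses the query: the accession pattern is the regular expression published in the UniProt help pages, while the entry-name pattern (`MNEMONIC_SPECIES`) and the function name are assumptions made here.

```python
import re

# Accession (AC) pattern as published in the UniProt help pages.
AC_PATTERN = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Assumption: entry names (IDs) have the form MNEMONIC_SPECIES, with up to
# 10 alphanumeric characters before the underscore and up to 5 after it.
ID_PATTERN = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify_identifier(query):
    """Return "AC", "ID" or "unknown" for a query string (hypothetical helper)."""
    token = query.strip()
    if AC_PATTERN.fullmatch(token):
        return "AC"
    if ID_PATTERN.fullmatch(token):
        return "ID"
    return "unknown"


for example in ("P53_HUMAN", "Q9BZZ5", "ADA_BOVIN"):
    print(example, classify_identifier(example))
```

A query classified as "unknown" is most likely mistyped and worth checking before submission.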
The results page shows the information drawn from each cluster the input sequence belongs to. When a database is expanded (by clicking on it), the following information is shown:
|MODE 4: SEARCH CLUSTERS|
To use FastaHerder2 mode 4: SEARCH CLUSTERS, you must select at least one of the available annotations to restrict the search. The default selection for each annotation is DM ("doesn't mind"), but it can be set to YES (at least one sequence from the cluster must have that annotation) or NO (the cluster must not have any sequence with that annotation).
Clusters are built from SwissProt release 2015_05 (see mode 2). The restricted search allows the user to locate the complete set of clusters that match the selected restrictions. The annotations available for selection are:
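The YES / NO / DM semantics described above can be written as a small filter. This is a sketch of the logic only, under assumed data structures: each cluster is represented as a list of annotation sets (one set per sequence), and the function names and the annotation names in the example are hypothetical, not taken from FastaHerder2.

```python
VALID_SETTINGS = ("YES", "NO", "DM")


def cluster_matches(cluster, restrictions):
    """Check one cluster against the restrictions.

    cluster: list of sets, one set of annotation names per sequence.
    restrictions: dict mapping annotation name -> "YES", "NO" or "DM".
    """
    for annotation, setting in restrictions.items():
        if setting not in VALID_SETTINGS:
            raise ValueError(f"unknown setting {setting!r} for {annotation!r}")
        present = any(annotation in annotations for annotations in cluster)
        if setting == "YES" and not present:
            return False  # no sequence in the cluster carries the annotation
        if setting == "NO" and present:
            return False  # at least one sequence carries a forbidden annotation
        # "DM" places no constraint on this annotation
    return True


def search_clusters(clusters, restrictions):
    """Return the names of the clusters that satisfy every restriction."""
    if all(setting == "DM" for setting in restrictions.values()):
        raise ValueError("select YES or NO for at least one annotation")
    return [name for name, cluster in clusters.items()
            if cluster_matches(cluster, restrictions)]


# Hypothetical clusters and annotation names, for illustration only.
clusters = {
    "cluster_A": [{"domain"}, set()],
    "cluster_B": [{"signal"}],
    "cluster_C": [set()],
}
print(search_clusters(clusters, {"domain": "YES", "signal": "NO"}))
```

Note that YES is satisfied by a single annotated sequence, whereas NO excludes a cluster as soon as any one of its sequences has the annotation.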
The first section of the results shows the search settings, that is, the selected restrictions. It is followed by the information drawn from each cluster that matches the restrictions. When a cluster is expanded (by clicking on the cluster's leader), the information shown is the same as in mode 3. The user can also display the whole cluster. If there are more than 200 results, they are not displayed; the user should then restrict the search further to obtain fewer results.