Motifs from Annotated Groups in Alignments

MAGA is a new, simpler way to extract information from sequence alignments. It automates a procedure that experts usually perform by eye: looking for group-conserved residues in an alignment. Given an alignment in FASTA format and a way to group its sequences, MAGA aims to expose a largely unexplored layer of information that is hard to appreciate visually when many sequences are aligned, even for a trained bioinformatician.

Make Alignments Great Again!


Input alignment. To run MAGA, all you need is a sequence alignment in FASTA format. Paste it or upload the file.
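A FASTA alignment is plain text: each sequence starts with a `>` header line followed by one or more lines of residues and gaps, and all aligned sequences have the same length. As a minimal sketch (not MAGA's own code), the hypothetical `parse_fasta_alignment` helper below reads such text into a dictionary and rejects input whose sequences differ in length. Taking the first word of the header as the identifier is an assumption.

```python
from __future__ import annotations


def parse_fasta_alignment(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: aligned sequence}.

    Hypothetical helper for illustration; not MAGA's own parser.
    """
    parts: dict[str, list[str]] = {}
    current: str | None = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if not header:
                raise ValueError("FASTA header without an identifier")
            current = header.split()[0]  # assumption: identifier = first word
            if current in parts:
                raise ValueError(f"duplicate identifier: {current}")
            parts[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            parts[current].append(line)  # a sequence may span several lines

    alignment = {name: "".join(chunks).upper() for name, chunks in parts.items()}
    # In an alignment, every sequence has one character (residue or gap) per column.
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("sequences have different lengths; input is not an alignment")
    return alignment


example = """>seq1 first sequence
MKV-N
>seq2
MRV
AN
"""
print(parse_fasta_alignment(example))  # {'seq1': 'MKV-N', 'seq2': 'MRVAN'}
```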






In a first step, MAGA processes the alignment and shows a profile of the amino acids shared by all sequences. You can then iteratively group the sequences as you wish. For example, in a first iteration you can group the sequences by taxonomy (group 1 = primates, group 2 = rodents, ...); in a second iteration you can group them by a phenotypic trait that arose through convergent evolution (group 1 = sequences from organisms with wings, group 2 = sequences from organisms without wings). You decide the reason for the grouping! In each case, only group-conserved amino acids are highlighted. Results from previous iterations are kept on the results page so you can compare them.
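One way to picture this workflow, assuming a grouping is simply a mapping from sequence identifier to group number, is to split every alignment column by group and rerun the analysis whenever the mapping changes. The sketch below is illustrative only: `columns_by_group`, `MAX_GROUPS` and the dictionary layout are hypothetical names, not MAGA's interface. The six-group limit is the one stated further down this page, and the initial profile corresponds to placing every sequence in a single group.

```python
from __future__ import annotations

MAX_GROUPS = 6  # this page states sequences can be clustered into up to six groups


def columns_by_group(alignment: dict[str, str],
                     grouping: dict[str, int]) -> list[dict[int, list[str]]]:
    """For each alignment column, collect the characters of each group.

    `alignment` maps identifiers to aligned sequences; `grouping` maps the
    same identifiers to a group number (both layouts are hypothetical).
    """
    if set(grouping) != set(alignment):
        raise ValueError("grouping must assign a group to every sequence and no others")
    if len(set(grouping.values())) > MAX_GROUPS:
        raise ValueError(f"at most {MAX_GROUPS} groups are allowed")
    length = len(next(iter(alignment.values()), ""))
    columns: list[dict[int, list[str]]] = []
    for pos in range(length):
        per_group: dict[int, list[str]] = {}
        for name, seq in alignment.items():
            per_group.setdefault(grouping[name], []).append(seq[pos])
        columns.append(per_group)
    return columns


aln = {"s1": "MK", "s2": "MR", "s3": "M-"}
# First iteration: every sequence in one group (the initial shared profile).
print(columns_by_group(aln, {"s1": 1, "s2": 1, "s3": 1}))
# [{1: ['M', 'M', 'M']}, {1: ['K', 'R', '-']}]
# A later iteration: regroup the same alignment and split it again.
print(columns_by_group(aln, {"s1": 1, "s2": 2, "s3": 2}))
# [{1: ['M'], 2: ['M', 'M']}, {1: ['K'], 2: ['R', '-']}]
```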

Now, click the next button to start analyzing the alignment.


HOW DOES IT WORK?

This example illustrates how MAGA is used and the criteria it applies to decide whether a residue is conserved. The input query is a multiple alignment of six short sequences with seven alignment positions. In the first iteration, MAGA produces a profile with the residues shared by all input sequences ("Initial output").

Position 1: All residues are the same (“M”) → conserved residue in all groups (in green).
Position 4: Where present, all residues are the same (“N”), and the number of “N” is greater than the number of gaps → conserved residue in all groups (in green).
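The two criteria for a residue shared by all sequences can be written as a small check on one alignment column. This is a minimal sketch: the function name `shared_by_all` and the gap symbol `-` are assumptions, and the example columns are made up for illustration rather than copied from the figure.

```python
from __future__ import annotations

GAP = "-"  # assumption: gaps are written as '-'


def shared_by_all(column: list[str]) -> str | None:
    """Return the residue shared by all sequences at this column, or None.

    Criteria from the example: every non-gap character is the same residue,
    and that residue outnumbers the gaps.
    """
    residues = [c for c in column if c != GAP]
    gaps = len(column) - len(residues)
    if not residues or len(set(residues)) != 1:
        return None  # empty column, or more than one residue type
    if len(residues) <= gaps:
        return None  # the residue must be more frequent than the gaps
    return residues[0]


print(shared_by_all(list("MMMMMM")))  # M     (position-1-style column)
print(shared_by_all(list("NNNN--")))  # N     (position-4-style column)
print(shared_by_all(list("NN----")))  # None  (gaps outnumber N)
print(shared_by_all(list("MMMMML")))  # None  (two residue types)
```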

Next, we split the sequences into two groups five times in a row, each time with a different membership (groupings 1 to 5). Sequences can be clustered into up to six groups. In each iteration, MAGA produced a profile showing both the residues shared by all input sequences (in green) and the group-conserved residues (colored by group).

A residue at a given position is considered conserved in a group when all of the following hold:
  • One amino acid must be more prevalent than every other amino acid in that group at that position. Compare grouping 3, position 7, group 1 (2C vs 1L: conserved) with grouping 2, position 7, group 1 (1C vs 1L: not conserved).
  • The most prevalent amino acid must be more prevalent than the gaps in that group at that position. Compare grouping 3, position 5, group 2 (2R vs 1 gap: conserved) with grouping 4, position 5, group 2 (1R vs 1 gap: not conserved).
  • The most prevalent amino acid must account for more than 50% of all amino acids in that group at that position. Depending on its percentage of conservation in the group, the residue is shown in the profile in one of three ways:
    • 50% < conservation ≤ 75% → colored. See grouping 4, position 7, group 1 (3C vs 1L).
    • 75% < conservation < 100% → colored and italic. See grouping 5, position 7, group 1 (4C vs 1L).
    • Conservation = 100% → colored and bold. See grouping 1, position 7, group 1 (1C).
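The three rules and the display tiers above translate into a single check on the characters of one group at one position. This is a hedged sketch, not MAGA's implementation: `group_conservation`, the style strings and the `-` gap symbol are hypothetical, and the denominator of the >50% rule is assumed to count amino acids only (gaps excluded), following the wording "of all amino acids". The example columns are rebuilt from the residue counts quoted in the list; the order of residues within a group does not matter.

```python
from __future__ import annotations

from collections import Counter

GAP = "-"  # assumption: gaps are written as '-'


def group_conservation(residues: list[str]) -> tuple[str, str] | None:
    """Apply the three group-conservation rules to one group's column.

    Returns (residue, display style), or None if the column is not conserved.
    Assumption: the >50% rule divides by amino acids only (gaps excluded).
    """
    counts = Counter(r for r in residues if r != GAP)
    gaps = len(residues) - sum(counts.values())
    if not counts:
        return None
    ranked = counts.most_common(2)
    top, top_n = ranked[0]
    # Rule 1: strictly more prevalent than any other amino acid (no ties).
    if len(ranked) > 1 and ranked[1][1] == top_n:
        return None
    # Rule 2: more prevalent than gaps.
    if top_n <= gaps:
        return None
    # Rule 3: more than 50% of the amino acids. Integer comparisons avoid
    # floating-point edge cases at the 50%, 75% and 100% thresholds.
    total = sum(counts.values())
    if 2 * top_n <= total:
        return None
    if top_n == total:
        return top, "colored + bold"      # conservation = 100%
    if 4 * top_n > 3 * total:
        return top, "colored + italic"    # 75% < conservation < 100%
    return top, "colored"                 # 50% < conservation <= 75%


print(group_conservation(list("CCL")))    # ('C', 'colored')           2C vs 1L
print(group_conservation(list("CL")))     # None                       1C vs 1L
print(group_conservation(list("R-")))     # None                       1R vs 1 gap
print(group_conservation(list("CCCL")))   # ('C', 'colored')           3C vs 1L
print(group_conservation(list("CCCCL")))  # ('C', 'colored + italic')  4C vs 1L
print(group_conservation(list("C")))      # ('C', 'colored + bold')    1C
```

With gaps excluded from the denominator, rule 3 already implies rule 1; the tie check is kept only to mirror the list above one rule at a time.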
In this toy case, MAGA's results are easy to reproduce by eye, so try a more complex query. MAGA is intended for multiple alignments with more than 20 sequences and more than 200 aligned residues. Best of all, in a following iteration you can group the sequences differently, and new group-conserved residues may appear. Try it!



EXAMPLE ALIGNMENT

As a case study, we prepared a multiple sequence alignment of 34 sequences from the Argonaute protein family, covering three subfamilies: AGO, PIWI and CE. CE is a Caenorhabditis elegans-specific Argonaute subfamily that is equally distant from the AGO and PIWI subfamilies. We suggest checking which residues are shared by the pairs [CE]+[AGO], [PIWI]+[AGO] and [CE]+[PIWI], and by all three together. To do so, we recommend the following steps:

  1. Paste the alignment (or upload the file) into the "input alignment" textbox above. Then click "Start execution".
  2. In the results, all sequences are treated as a single group. Very few residues are shared by all sequences (shown in green).
  3. Split the sequences into three groups, one per subfamily: sequences whose names start with "CE" go in group 1, those starting with "PIWI" in group 2, and those starting with "AGO" in group 3. Label the groups "CE", "PIWI" and "AGO", respectively. Then click "Execute again!".
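Step 3 amounts to assigning each sequence a group number from its name prefix. A minimal sketch, assuming the identifiers literally begin with "CE", "PIWI" or "AGO"; the function `group_by_prefix` and the sequence names in the usage lines are hypothetical placeholders, not entries from the example alignment. The colors are those in which each group's conserved residues are highlighted, per the color code given just below the steps.

```python
from __future__ import annotations

# Prefixes, group numbers and colors taken from the steps and color code on this page.
SUBFAMILY_GROUPS = [("CE", 1), ("PIWI", 2), ("AGO", 3)]
GROUP_COLORS = {1: "red", 2: "blue", 3: "purple"}


def group_by_prefix(names: list[str]) -> dict[str, int]:
    """Assign each sequence to a group according to its name prefix."""
    grouping: dict[str, int] = {}
    for name in names:
        for prefix, group in SUBFAMILY_GROUPS:
            if name.startswith(prefix):
                grouping[name] = group
                break
        else:  # no prefix matched this name
            raise ValueError(f"{name!r} does not start with CE, PIWI or AGO")
    return grouping


names = ["CE_seqA", "PIWI_seqB", "AGO_seqC"]  # hypothetical identifiers
groups = group_by_prefix(names)
print(groups)  # {'CE_seqA': 1, 'PIWI_seqB': 2, 'AGO_seqC': 3}
print({n: GROUP_COLORS[g] for n, g in groups.items()})
# {'CE_seqA': 'red', 'PIWI_seqB': 'blue', 'AGO_seqC': 'purple'}
```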

Now you can analyze the results using the color code: red for residues conserved in group 1, blue for group 2 and purple for group 3.
You can, of course, regroup the sequences however you like, for example by organism: all C. elegans proteins in one group, all Drosophila melanogaster proteins in another, and so on.