To facilitate in-depth analysis of the orthologous relationships within each group of proteins, we provide precomputed high-quality multiple sequence alignments (MSAs) and maximum-likelihood trees via the web interface. |
MSAs are built from the full-length sequences of all proteins in each orthologous group. To deliver the highest-quality MSAs, we computed the alignments with the AQUA software [PMID: 19926669], set up with Muscle (v3.7) [PMID: 15034147] and MAFFT (6.611) [PMID: 18372315] as aligners, RASCAL (1.34b) [PMID: 12801878] to refine the alignment, and norMD (1.3) [PMID: 11734009] to assess MSA quality. Gblocks was then used to remove poorly aligned regions, with the default settings except for the following: Minimum Number Of Sequences For A Flanking Position = half of the total number of sequences in the MSA; Minimum Length Of A Block = 2; Allowed Gap Positions = all. Finally, PAUP was used to remove the uninformative sites. All of these steps aim to reduce the complexity of the MSA while keeping the essential phylogenetic signal. |
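As an illustration of the last filtering step, the sketch below drops uninformative columns from an alignment. It is not the PAUP procedure itself: it assumes that "uninformative" means parsimony-uninformative (a site is kept only if at least two different residues each occur in at least two sequences), and it assumes that gaps and unknown residues (`-`, `?`, `X`) are ignored when counting. The function names are hypothetical.

```python
from collections import Counter

# Assumption: gap and unknown-residue symbols do not count as character states.
IGNORED = set("-?X")


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(c for c in column.upper() if c not in IGNORED)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def drop_uninformative_sites(alignment):
    """Keep only parsimony-informative columns.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i for i, column in enumerate(zip(*seqs))
        if is_parsimony_informative("".join(column))
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in zip(names, seqs)}


if __name__ == "__main__":
    toy = {"a": "MKAL", "b": "MKAV", "c": "MRGL", "d": "MRG-"}
    print(drop_uninformative_sites(toy))
```

In the toy alignment, the first column is constant and the last one has only a single residue shared by two sequences, so only the two middle columns survive.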
A phylogenetic tree was then computed with the PhyML (v3) program [PMID: 20525638]. The parameters were set to compute 100 bootstrap replicates and to optimize topology, branch lengths and rate parameters. Sequence format conversion (e.g. an MSA in FASTA format to PHYLIP format), needed to pass data between these programs, was generally done with the ReadSeq program [PMID: 18428689]. MSAs can be viewed in the web interface with the Jalview applet [PMID: 19151095], and the phylogenies can be visualized with the iTOL web server [PMID: 17050570].
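The conversion and tree-building steps can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the FASTA-to-PHYLIP conversion is a plain Python stand-in for ReadSeq, writing a relaxed sequential PHYLIP layout, and the executable name `phyml`, the file name, and the function names are assumptions. The PhyML options shown (`-i`, `-d aa`, `-b 100`, `-o tlr`) correspond to the settings described above; the substitution model is not stated in the text, so it is left at the program default.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (name = first header word)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            name = fields[0]
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_phylip(alignment):
    """Write an alignment as relaxed sequential PHYLIP text."""
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = max(len(name) for name in alignment)
    lines = [f" {len(alignment)} {lengths.pop()}"]
    for name, seq in alignment.items():
        lines.append(f"{name.ljust(width)}  {seq}")
    return "\n".join(lines) + "\n"


def phyml_command(phylip_path, bootstraps=100):
    """Build a PhyML argument list; 'phyml' as executable name is an assumption."""
    return [
        "phyml",
        "-i", phylip_path,      # input alignment in PHYLIP format
        "-d", "aa",             # amino-acid data
        "-b", str(bootstraps),  # number of bootstrap replicates
        "-o", "tlr",            # optimize topology, branch lengths, rate parameters
    ]


if __name__ == "__main__":
    fasta = ">a first protein\nMK\nAL\n>b\nMKAV\n"
    print(to_phylip(parse_fasta(fasta)), end="")
    print(" ".join(phyml_command("group.phy")))
```

The argument list can be passed to `subprocess.run` once PhyML is installed; it is only built here, not executed.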
|
|