HELP Area
eggNOG uses a relational database system (PostgreSQL) to store primary data and precomputed orthologous groups.
General Search
Search types: Protein names, COG names, Protein sequences (BLAST) and Full text
Search format: Full text, protein name and COG name searches can be mixed, with multiple search terms. Each search term must be on its own line within the search box, which expands up to 20 lines. A sequence search can be either a single sequence spanning multiple lines or multiple sequences in FASTA format.
Comments: Please note that multiple searches can take time. Sequence searches use our BLAST server and can take up to 10 minutes, depending on the load and sequence size. Non-sequence searches use human as the default species; this can be changed on the result page.
File Search
Search types: Protein names, COG names, Protein sequences (BLAST) and Full text
Search format: Full text, protein name and COG name searches can be mixed, with multiple search terms and taxonomic ids. Each line of the uploaded file must contain exactly one search term and taxonomic id pair, with the search term and the taxonomic id separated by a TAB. Sequence searches must be in FASTA format and are not taxon-specific.
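As an illustration of the upload format described above, the following minimal Python sketch writes one search term and taxonomic id pair per line, separated by a TAB. The function name `build_query_file`, the example terms, and the use of NCBI taxonomy identifiers are assumptions made for this example and are not part of eggNOG itself.

```python
def build_query_file(pairs):
    """Return upload-file text with one 'term<TAB>taxid' pair per line.

    `pairs` is an iterable of (search_term, taxonomic_id) tuples.
    This helper is hypothetical; it only mirrors the format described above.
    """
    lines = []
    for term, taxid in pairs:
        term = str(term).strip()
        # A term containing a TAB or newline would break the one-pair-per-line layout.
        if not term or "\t" in term or "\n" in term:
            raise ValueError(f"invalid search term: {term!r}")
        lines.append(f"{term}\t{int(taxid)}")
    return "\n".join(lines) + "\n"


# Example queries (assumed): 9606 and 10090 are the NCBI taxonomy ids
# for human and mouse, respectively.
queries = [("COG0001", 9606), ("TP53", 10090)]
text = build_query_file(queries)
```

The resulting text can be saved to a plain-text file and uploaded through the File Search form.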
Comments: Please note that multiple searches can take time. Sequence searches use our BLAST server and can take up to 10 minutes, depending on the load and sequence size. Please do not upload FASTA files with more than 30 records.
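Before uploading a sequence file, the record count can be checked locally. The sketch below is one possible way to do this in Python; it counts header lines starting with ">" and compares the total with the 30-record limit mentioned above. The names `count_fasta_records` and `check_upload` are hypothetical helpers, not part of any eggNOG tool.

```python
MAX_RECORDS = 30  # limit stated in the comments above


def count_fasta_records(text):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def check_upload(text):
    """Return the number of records, or raise if the file is empty or too large."""
    n = count_fasta_records(text)
    if n == 0:
        raise ValueError("no FASTA records found")
    if n > MAX_RECORDS:
        raise ValueError(f"{n} records found; the limit is {MAX_RECORDS}")
    return n


# Made-up two-record example; sequences may span several lines.
example = ">seq1\nMKV\nLLA\n>seq2\nMTT\n"
n_records = check_upload(example)
```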
Multiple Sequence Alignment and Tree Method
To facilitate the in-depth analysis of the orthologous relationships within the groups of proteins, we provide precomputed high-quality Multiple Sequence Alignments (MSA) and maximum-likelihood trees via the web interface.
Multiple Sequence Alignments (MSA) are built using the full-length sequence of each protein in each orthologous group. To deliver the highest quality MSA, we computed the alignments using the AQUA software [PMID: 19926669], set up with Muscle (v3.7) [PMID: 15034147] and MAFFT (6.611) [PMID: 18372315] as aligners, RASCAL (1.34b) [PMID: 12801878] to refine the alignment and norMD (1.3) [PMID: 11734009] to assess the MSA quality. Gblocks was used to remove the badly aligned regions (using the default settings, except for the following: Minimum Number Of Sequences For A Flanking Position = half of the total number of sequences in the MSA; Minimum Length Of A Block = 2; Allowed Gap Positions = all). PAUP was then used to remove the uninformative sites. All of these steps aim to reduce the MSA complexity while keeping the essential phylogenetic signal.
A phylogenetic tree was then computed from the filtered MSA using the PhyML (v3) program [PMID: 20525638]. The parameters were set to compute 100 bootstrap replicates and to optimize topology, branch lengths and rate parameters. In general, sequence format conversion (e.g. from an MSA in FASTA format to PHYLIP format), needed to pass data between these programs, was done using the ReadSeq program [PMID: 18428689]. The MSA can be visualized in the web interface with the Jalview applet [PMID: 19151095], and the phylogeny can be visualized with the iTOL [PMID: 17050570] webserver.
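To make the format conversion step more concrete, here is a minimal Python sketch that turns an aligned FASTA file into sequential PHYLIP text. It is only an illustration and is not the ReadSeq program used for eggNOG; the function names are hypothetical, and truncating names to 10 characters (the strict PHYLIP convention) is an assumption that can produce duplicate names for long identifiers.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header without a name")
            # Keep only the first word of the header as the sequence name.
            name, chunks = parts[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def fasta_to_phylip(text):
    """Convert an aligned FASTA string to sequential PHYLIP text."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences have different lengths; input is not an alignment")
    # Header line: number of sequences and alignment length.
    out = [f"{len(records)} {length}"]
    for name, seq in records:
        # Names are truncated or padded to exactly 10 characters.
        out.append(f"{name[:10]:<10}{seq}")
    return "\n".join(out) + "\n"


# Made-up two-sequence alignment for demonstration.
phylip_text = fasta_to_phylip(">a\nAC-G\n>b\nACTG\n")
```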