Set up parameters or use the defaults
The key step of phylostratigraphic analysis is the search for orthologous genes. Orthoweb implements two methods:
1) KEGG Orthology (KO) groups. These are lists of orthologous genes constructed by KEGG researchers. You do not have to set any sequence similarity thresholds.
2) Best Similarity Table (BST). This is a list of genes with similarity parameters such as the Smith-Waterman score and sequence identity. You must set threshold values for these parameters to filter orthologous genes.
KO groups contain information about both orthologous and paralogous genes. The option "All genes" makes Orthoweb include every organism from the KO group in the analysis. This means that if a KO group contains, for example, 3 paralogous genes of one organism, their phylostratigraphic age will be the same. With the option "Only same label", Orthoweb counts an organism only if it has a gene in the KO group with exactly the same label as the analysed gene. The phylostratigraphic ages of paralogous genes can then differ, but many genes may be missed, since not all of them have a label in the KO group.
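The difference between the two options can be illustrated with a minimal Python sketch. It is not Orthoweb's actual code: the `KoEntry` type, the `organisms_for_gene` function, and all gene labels and organism codes below are hypothetical, and exact, case-sensitive label matching is an assumption.

```python
from typing import NamedTuple


class KoEntry(NamedTuple):
    organism: str  # organism code (hypothetical example values)
    label: str     # gene label inside the KO group; "" if the gene has no label


def organisms_for_gene(ko_group, gene_label, only_same_label=False):
    """Return the set of organisms counted for one analysed gene."""
    if only_same_label:
        # "Only same label": keep an organism only if one of its genes in the
        # KO group carries exactly the same label as the analysed gene.
        return {e.organism for e in ko_group if e.label == gene_label}
    # "All genes": every organism present in the KO group is counted.
    return {e.organism for e in ko_group}


# Hypothetical KO group with two paralogs of one organism ("hsa").
ko_group = [
    KoEntry("hsa", "GENE1"),
    KoEntry("hsa", "GENE2"),
    KoEntry("ptr", "GENE1"),
    KoEntry("mmu", "Gene1"),  # differently written label: not an exact match
    KoEntry("dre", ""),       # gene without a label
]

all_genes = sorted(organisms_for_gene(ko_group, "GENE1"))
same_gene1 = sorted(organisms_for_gene(ko_group, "GENE1", only_same_label=True))
same_gene2 = sorted(organisms_for_gene(ko_group, "GENE2", only_same_label=True))

print(all_genes)   # ['dre', 'hsa', 'mmu', 'ptr']
print(same_gene1)  # ['hsa', 'ptr']
print(same_gene2)  # ['hsa']
```

In this toy group the two paralogs get the same organism set with "All genes" but different sets with "Only same label", and the unlabelled and differently labelled genes are dropped.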
If you use the BST method to detect orthologous genes, you have to set the values of two parameters:
1) The identity between the amino acid sequences of the proteins encoded by the studied gene and the potential orthologous gene. Only genes with an identity value higher than the threshold count as orthologous genes.
2) The Smith-Waterman score from comparing the sequences of the studied gene and the potential orthologous gene. Only genes with an SW-score higher than the threshold count as orthologous genes.
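The two thresholds can be sketched as a simple filter. This is only an illustration: the `BstHit` record, the `filter_orthologs` function, the gene identifiers, and all numbers are hypothetical. It also assumes that identity is given as a fraction and that a gene must pass both thresholds at once.

```python
from dataclasses import dataclass


@dataclass
class BstHit:
    organism: str    # organism code of the candidate gene
    gene: str        # hypothetical gene identifier
    identity: float  # amino acid identity, assumed to be a fraction in [0, 1]
    sw_score: int    # Smith-Waterman score


def filter_orthologs(hits, min_identity, min_sw_score):
    """Keep hits whose identity AND SW-score are strictly above the thresholds."""
    return [
        h for h in hits
        if h.identity > min_identity and h.sw_score > min_sw_score
    ]


# Hypothetical similarity table for one studied gene.
hits = [
    BstHit("ptr", "ptr:0001", 0.99, 2100),
    BstHit("mmu", "mmu:0002", 0.78, 1500),
    BstHit("dme", "dme:0003", 0.35, 400),   # fails the identity threshold
    BstHit("dre", "dre:0004", 0.62, 150),   # fails the SW-score threshold
]

orthologs = filter_orthologs(hits, min_identity=0.5, min_sw_score=200)
print([h.organism for h in orthologs])  # ['ptr', 'mmu']
```

Stricter thresholds leave fewer, more reliable orthologs, while looser thresholds include more distant candidates.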
Here you can set two parameters:
1) The first one is the taxonomic distance for the dN/dS analysis. This type of analysis is mostly used only for comparing the sequences of closely related organisms.
A distance of 1 includes in the dN/dS analysis only organisms of the same genus.
For example, if we analyse human genes, the value 2 means that we will check the dN/dS values against other organisms from the Hominidae family.
2) The second field lets you enter the codes of specific species. For example, if you want to compare the sequences of the studied human gene not with all hominids but only with gorilla, enter the code "ggo" in the field.
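How the two fields could interact is shown in the following minimal sketch. It is not Orthoweb's implementation: the `LINEAGES` table is simplified and hypothetical (index 0 is the species, 1 the genus, 2 the family, 3 the order), the `dnds_partners` function is invented for illustration, and the sketch assumes that the species codes narrow down the organisms already selected by distance.

```python
# Simplified, hypothetical lineages keyed by organism code.
# Index 0 = species, 1 = genus, 2 = family, 3 = order.
LINEAGES = {
    "hsa": ["Homo sapiens", "Homo", "Hominidae", "Primates"],
    "ptr": ["Pan troglodytes", "Pan", "Hominidae", "Primates"],
    "ggo": ["Gorilla gorilla", "Gorilla", "Hominidae", "Primates"],
    "mcc": ["Macaca mulatta", "Macaca", "Cercopithecidae", "Primates"],
}


def dnds_partners(target, distance, lineages, species_codes=None):
    """Return organisms that share the target's taxon at the given distance."""
    taxon = lineages[target][distance]
    partners = [
        code for code, lineage in lineages.items()
        if code != target and lineage[distance] == taxon
    ]
    if species_codes:
        # Assumption: listed codes restrict the distance-based selection.
        partners = [code for code in partners if code in species_codes]
    return partners


same_genus = dnds_partners("hsa", 1, LINEAGES)
same_family = dnds_partners("hsa", 2, LINEAGES)
gorilla_only = dnds_partners("hsa", 2, LINEAGES, species_codes={"ggo"})

print(same_genus)    # []
print(same_family)   # ['ptr', 'ggo']
print(gorilla_only)  # ['ggo']
```

With this toy table, distance 1 finds no partner for human because no other species of the genus Homo is listed, while distance 2 selects the other Hominidae.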
You can enable the following options:
1) DI analysis. This is a macroevolution analysis based on dN/dS.
Note that BST thresholds are required for the DI analysis even if you use KO groups to detect the phylostratigraphic age.
2) GO analysis. Orthoweb will download the Gene Ontology associations of the studied genes from http://geneontology.org/.
3) SNP analysis. Orthoweb will download the number of SNPs in the sequences of the studied genes from NCBI dbSNP.
4) Online database. By default, the data used in the analysis are stored in a local database, which significantly decreases the analysis time.
With this option you can download the newest data from the online database.
There are three ways to input genes into Orthoweb:
1) You can type the genes directly into the form, using ' ; ' as the separator.
2) You can upload a '.txt' file with genes. The first line must be a header (with any text);
the other lines must contain genes, one gene per line.
3) You can upload a '.txt' or '.tsv' file for network visualization. The first line must contain
the column names. There must be "node1" (or "#node1") and "node2" columns with gene labels
and an edge column whose name contains the substring "score", holding the edge weight. Other columns are ignored.
Every other line must contain gene labels (first two columns) and a number in the interval [0, 1]
for the "score" column. All fields must be separated by tabs.
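To check a file before uploading it, the three input formats can be read with a short Python sketch like the one below. The function names are hypothetical and this is not Orthoweb's own parser. It assumes that blank lines are skipped and that, if several column names contain "score", the first one is used.

```python
def parse_gene_string(text):
    """Split genes typed into the form, separated by ';'."""
    return [g.strip() for g in text.split(";") if g.strip()]


def parse_gene_file(text):
    """Skip the header line, then read one gene per line."""
    return [ln.strip() for ln in text.splitlines()[1:] if ln.strip()]


def parse_network_file(text):
    """Return (node1, node2, score) tuples from tab-separated text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = [h.strip() for h in lines[0].split("\t")]
    try:
        i1 = next(i for i, n in enumerate(names) if n in ("node1", "#node1"))
        i2 = names.index("node2")
        i_score = next(i for i, n in enumerate(names) if "score" in n)
    except (StopIteration, ValueError):
        raise ValueError(
            "header needs node1 (or #node1), node2 and a 'score' column"
        ) from None
    edges = []
    for ln in lines[1:]:
        fields = ln.split("\t")
        score = float(fields[i_score])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score outside [0, 1]: {score}")
        edges.append((fields[i1].strip(), fields[i2].strip(), score))
    return edges


# Hypothetical inputs for each of the three formats.
form_genes = parse_gene_string("TP53 ; BRCA1 ; MDM2")
file_genes = parse_gene_file("my genes\nTP53\nBRCA1\n")
network = parse_network_file(
    "#node1\tnode2\tcomment\tcombined_score\n"
    "TP53\tMDM2\tx\t0.999\n"
    "BRCA1\tBARD1\ty\t0.5\n"
)

print(form_genes)  # ['TP53', 'BRCA1', 'MDM2']
print(file_genes)  # ['TP53', 'BRCA1']
print(network)     # [('TP53', 'MDM2', 0.999), ('BRCA1', 'BARD1', 0.5)]
```

The extra "comment" column in the network example is ignored, as described for columns other than the two node columns and the score column.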
You can also upload a file with expression data. Use the link above to see examples, or check the guide section.
The link "Get network visualization" visualizes the data before the analysis.