The Gene Ontology (GO) project was established to provide a common language to describe aspects of a gene product's biology. A gene product's biology is represented by three independent structured, controlled vocabularies: molecular function, biological process and cellular component. For more information on GO, see the SGD GO Help page or the GO consortium home page.

To provide the most detailed information available, gene products are annotated to the most granular GO term(s) possible. For example, if a gene product is localized to the perinuclear space, it will be annotated to that specific term only and not the parent term nucleus. In this example the term perinuclear space is a child of nucleus. However, for many purposes, such as analyzing the results of microarray expression data, it is very useful to "calculate" on GO, moving up the GO tree from the specific terms used to annotate the genes in a list to find GO parent terms that the genes may have in common.
This GO Term Finder tool does exactly that: it finds significant GO terms shared among a list of genes from your organism of choice, helping you discover what these genes may have in common (example results for SGD and a simple query list). To map the granular GO annotations for genes in a list to more general terms, binning them into broad categories, please use the GO Term Mapper tool.
multitest/NODE1X.list
multitest/NODE2X.txt
multitest/run2/NODE1X.list

By default all files will be processed. If the archive contains other files, specify the file name extension of the gene list files (for example 'txt' or 'list') in the advanced options section.
To create an archive using tar (most commonly found on UNIX or MacOS X), you could do something like this:
tar -cf myList.tar orfs1.txt orfs2.txt ...
or
tar -cf myList.tar *.txt
or
tar -cf myList.tar cluster_orfs_dir
On Windows, use an archive utility such as WinZip to create a .zip or .tar file. Create a new archive file and drag the files or directories you wish to submit into it.
Once you have created the .tar or .zip file, simply hit "Browse" and select it as the file to upload. Note that the extension (.tar, .zip, etc.) must correctly match the file type in order for the server to properly process the file.
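If you prefer to script the archive step, the same kind of .tar file can be built programmatically. The following is a minimal sketch using Python's standard tarfile module. The helper name make_gene_list_archive is hypothetical, and the file names are just the placeholders from the tar commands above. It stores each file under its base name, which is a simplifying assumption; the tar commands above keep relative paths.

```python
import tarfile
import tempfile
from pathlib import Path


def make_gene_list_archive(archive_path, gene_list_files):
    """Write an uncompressed .tar archive holding the given gene list files."""
    # Mode "w" writes a plain (uncompressed) tar, comparable to `tar -cf`.
    with tarfile.open(archive_path, "w") as tar:
        for path in gene_list_files:
            # Assumption: store each file under its base name only.
            tar.add(path, arcname=Path(path).name)
    return Path(archive_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lists = []
        for name in ("orfs1.txt", "orfs2.txt"):
            gene_file = Path(tmp) / name
            gene_file.write_text("YPL250C\nYJL166W\n")
            lists.append(gene_file)
        archive = make_gene_list_archive(Path(tmp) / "myList.tar", lists)
        with tarfile.open(archive) as tar:
            print(tar.getnames())
```

Keeping the .tar extension on the output file matters here, because the server relies on the extension to decide how to unpack the upload.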
The table below lists the types of identifiers in the gene association files that the GO Term Finder program currently accepts as gene names. It also provides links to tools that convert from one identifier system to another, so that you can convert your identifiers, if necessary, into a type used in the gene association files.
For example, if your file of gene identifiers uses a type not listed in the table, you can use a tool provided by UniProt that converts several types of identifiers into UniProt_IDs/Accessions. You can then select a gene association file that accepts UniProt_IDs/Accessions and submit the converted identifiers as your list of gene names.
GO Term Finder looks for significant GO terms shared among groups of genes in your list of input genes (see table below). To determine the statistical significance of a particular GO term associated with a group of genes in the list, GO Term Finder calculates a p-value: the probability of seeing at least x genes out of the total n genes in the list annotated to a particular GO term, given that y genes out of the total N genes in the genome are known to have that GO term annotation (i.e. given the background distribution). The closer the p-value is to zero, the more significant the association of the GO term with the group of genes (i.e. the less likely it is that the observed annotation of the GO term to the group of genes occurred by chance).
Terms from the Function Ontology for Different Mouse Gene Numbers with P-value Cutoff of 0.01

Row | Gene Ontology Term | Cluster Frequency | Genome Frequency of Use | P-value | Genes Annotated to the Term |
---|---|---|---|---|---|
1 | calcium-transporting ATPase activity | 3 out of 9 genes (33.3%) | 5 out of 33884 genes (0.0%) | 2.46e-09 | MGI:105368, MGI:1347353, MGI:1889008 |
2 | ATPase activity | 3 out of 9 genes (33.3%) | 237 out of 33884 genes (0.7%) | 0.00052 | MGI:105368, MGI:1347353, MGI:1889008 |
3 | carrier activity | 3 out of 9 genes (33.3%) | 410 out of 33884 genes (1.2%) | 0.00265 | MGI:105368, MGI:1347353, MGI:1889008 |
4 | calcium-transporting ATPase activity | 3 out of 9 genes (33.3%) | 5 out of 15000 genes (0.0%) | 2.83e-08 | MGI:105368, MGI:1347353, MGI:1889008 |
5 | ATPase activity | 3 out of 9 genes (33.3%) | 237 out of 15000 genes (1.6%) | 0.00579 | MGI:105368, MGI:1347353, MGI:1889008 |
6 | carrier activity | - | - | - | - |
The p-value of a GO term associated with a group of genes in your gene list is affected by the total number of genes estimated for an organism. The higher the total number of genes estimated for the organism, the closer the p-value is to zero and the more significant the GO term annotation of the group of genes in the list (see the table above; compare rows 1, 2 and 3 with rows 4, 5 and 6, respectively). For example, when searching the function ontology with a p-value cutoff of 0.01, the 'carrier activity' GO term was not reported as significant for the list of 9 mouse genes when the total was set to 15,000 mouse genes (row 6), because its p-value was above the cutoff. With the estimated total of 33,884 mouse genes (row 3), the 3 of 9 list genes annotated to 'carrier activity' give a p-value of 0.00265, which is below the cutoff. Thus, although the same number of mouse genes (410) in the genome are annotated to the 'carrier activity' GO term in both cases, the higher total number of genes (33,884 versus 15,000) lowers the frequency with which the term is used across the entire mouse genome and thereby yields a lower p-value for the group of genes in the list annotated to that term.
The p-value of a GO term associated with a group of genes in your gene list is also affected by the number of genes within the organism that have that GO term annotation. The higher the number of genes in the organism annotated to a GO term shared by a group of genes in the list, the further the p-value is from zero and the less significant the association of that GO term with the group. For example, as shown in the table above, although the same 3 mouse genes in the list are annotated to both the 'calcium-transporting ATPase activity' (row 1) and 'carrier activity' (row 3) GO terms, the 'calcium-transporting ATPase activity' term is more significant (i.e. has a lower p-value) than the 'carrier activity' term, because more genes in the mouse genome are annotated to 'carrier activity'.
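The tail probability described above (at least x of n list genes annotated, given y of N genome genes annotated) can be written as a hypergeometric tail sum. The sketch below is an illustration of that definition, not the tool's own implementation; the function name go_term_p_value is hypothetical. The p-values reported by the tool may include a multiple-hypothesis correction, so the raw values computed here are not expected to match the table above. The two trends discussed in the text still hold for them.

```python
from math import comb


def go_term_p_value(x, n, y, N):
    """Probability that at least x of n list genes carry a term,
    given that y of the N genes in the background carry it."""
    total = comb(N, n)  # number of ways to draw n genes from N
    # Sum the ways to draw k annotated and n - k unannotated genes, k = x..min(n, y).
    hits = sum(comb(y, k) * comb(N - y, n - k) for k in range(x, min(n, y) + 1))
    return hits / total


if __name__ == "__main__":
    # Counts taken from the table above: 3 of 9 list genes annotated.
    for label, y in (("calcium-transporting ATPase activity", 5),
                     ("ATPase activity", 237),
                     ("carrier activity", 410)):
        for N in (33884, 15000):
            print(f"{label:40s} N={N:6d} raw p={go_term_p_value(3, 9, y, N):.3g}")
```

For a fixed y, raising N lowers the raw p-value, and for a fixed N, raising y raises it, which is the behaviour the two paragraphs above describe.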
For more information on how GO Term Finder determines the statistical significance of GO term annotations, please see the Description of GO Term Finder Algorithm at SGD or How GO Term Finder Calculates P-values (also available in PDF).
The Gene Association File Table lists the total annotated gene products and total estimated gene products for each organism. If the total estimated gene number of an organism is known, the GO Term Finder program uses it as the default total gene number for that organism. If not, the program uses the total number of annotated genes present in the organism's gene association file as the default total gene number.
If you prefer to use a different total gene number for an organism in the background distribution calculation of GO terms, you can type the number of gene products you estimate for the organism in the provided text box to override the program's default. However, if the number you enter is smaller than the total number of annotated genes present in the organism's gene association file, the GO Term Finder program ignores it and uses the default total gene number for the organism instead.
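The rules for choosing the total gene number can be summarized in a few lines. This is a sketch of the logic as described above, not the program's actual code; the function and parameter names are hypothetical, and the yeast counts in the demo come from the Gene Association File Table.

```python
def effective_total_genes(annotated_in_file, estimated_total=None, user_total=None):
    """Pick the total gene number used for the background distribution."""
    # Default: the organism's estimated total if known, otherwise the number
    # of annotated genes in its gene association file.
    default = estimated_total if estimated_total is not None else annotated_in_file
    # A user-supplied number is used unless it is smaller than the number of
    # annotated genes in the association file.
    if user_total is not None and user_total >= annotated_in_file:
        return user_total
    return default


if __name__ == "__main__":
    # S. cerevisiae: 6915 annotated, 7166 estimated gene products.
    print(effective_total_genes(6915, 7166))        # default
    print(effective_total_genes(6915, 7166, 8000))  # user override accepted
    print(effective_total_genes(6915, 7166, 5000))  # override too small, default used
```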
Instead of providing a number, you can upload a background
distribution of gene products. In this case, the total number of genes
will be the number in the background distribution. This option is
useful if you need to find significant terms within a smaller context
than that of the entire population of genes with annotation.
The FDR is calculated by running 50 simulations with random genes and counting the average number of times a p-value as good as or better than a given p-value from the real data is seen. This average is the numerator. The denominator is the number of p-values in the real data that are as good as or better than that p-value.
Thus, instead of setting your cutoff based on p-value, the FDR allows you to choose a cutoff that
has an acceptable level of false discovery. The FDR reported for a GO term is the percentage of the GO
terms with p-values as good as or better than that term's p-value that would be expected to be
false positives. The false positives value is the expected number of false positives if that particular
GO term is chosen as the cutoff.
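The ratio described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function name is hypothetical, "as good as or better" is read as "less than or equal to", and the simulated p-values are assumed to be available as one list per random gene list (50 lists in the tool's case; the demo uses 2 made-up lists for brevity).

```python
def false_discovery_rate(p_value, real_p_values, simulated_runs):
    """Estimate the FDR at the cutoff `p_value`.

    real_p_values:  p-values obtained from the real gene list.
    simulated_runs: list of lists, the p-values from each random-gene simulation.
    """
    # Numerator: average number of simulated p-values per run at or below the cutoff.
    per_run = [sum(1 for p in run if p <= p_value) for run in simulated_runs]
    expected_false_positives = sum(per_run) / len(simulated_runs)
    # Denominator: number of real p-values at or below the cutoff.
    observed = sum(1 for p in real_p_values if p <= p_value)
    if observed == 0:
        raise ValueError("cutoff must be at least as large as one real p-value")
    return expected_false_positives / observed


if __name__ == "__main__":
    real = [0.001, 0.01, 0.2]              # toy values, not real results
    sims = [[0.005, 0.5], [0.3, 0.6]]      # toy values, not real results
    for p in real:
        print(p, false_discovery_rate(p, real, sims))
```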
With this option checked, terms that are related by regulation (and
possibly in no other way) are also included in the search, in just
the same way as the traditional links:
is_a:
relationship: part_of
These links are always followed.
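To make the effect of this option concrete, here is a small sketch of collecting a term's ancestors while following only selected link types. It is not the tool's implementation. The graph is a toy example with made-up term names, the function name is hypothetical, and the set of regulation link names (regulates, positively_regulates, negatively_regulates) is an assumption about which relationships the option covers.

```python
from collections import deque

ALWAYS_FOLLOWED = {"is_a", "part_of"}
# Assumed set of regulation relationships enabled by the option.
REGULATION_LINKS = {"regulates", "positively_regulates", "negatively_regulates"}


def ancestors(term, parents, include_regulation=False):
    """Return every term reachable from `term` via the allowed link types.

    parents maps a term to a list of (relationship, parent_term) pairs.
    """
    allowed = ALWAYS_FOLLOWED | (REGULATION_LINKS if include_regulation else set())
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relationship, parent in parents.get(current, []):
            if relationship in allowed and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Toy graph with hypothetical term names.
    toy = {
        "regulation_of_X": [("regulates", "X"), ("is_a", "regulation")],
        "X": [("part_of", "Y")],
    }
    print(sorted(ancestors("regulation_of_X", toy)))
    print(sorted(ancestors("regulation_of_X", toy, include_regulation=True)))
```

With the option off, a gene annotated to the toy term regulation_of_X counts only toward regulation; with it on, the gene also counts toward X and Y.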
The Gene Association File Table lists the default gene URL that the GO Term Finder program uses for each organism.
For example, 'https://www.yeastgenome.org/locus/xxxx' is the GO Term Finder program's default gene URL for Saccharomyces cerevisiae, where xxxx is an SGD_ID, SGD gene name, or SGD systematic ORF name (e.g. https://www.yeastgenome.org/locus/YPL250C).
If you know of a gene or protein URL that displays information you prefer over what the GO Term Finder
program's default URL displays, you can enter it in the provided text box. For example,
'http://www.ensembl.org/Homo_sapiens/geneview?gene=xxxx' is the GO Term Finder program's
default gene URL for human proteins, where xxxx is a UniProt_ID or UniProt_Accession (e.g.
http://www.ensembl.org/Homo_sapiens/geneview?gene=SX30_HUMAN). If you prefer to use the UniProt (the Universal
Protein Resource) URL 'http://www.uniprot.org/cgi-bin/upEntry?id=' to display information about a protein
(e.g. http://www.uniprot.org/cgi-bin/upEntry?id=SX30_HUMAN), you can type the UniProt URL in the provided text box
to override the program's default URL.
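Judging from the examples above, a gene link is formed by appending the gene identifier to the URL template. The sketch below illustrates that reading; the function name is hypothetical, and percent-encoding the identifier is an added assumption rather than documented behaviour of the tool.

```python
from urllib.parse import quote


def gene_url(template, identifier):
    """Append a gene or protein identifier to a URL template."""
    # Assumption: percent-encode the identifier so it is safe inside a URL.
    return template + quote(identifier, safe="")


if __name__ == "__main__":
    print(gene_url("https://www.yeastgenome.org/locus/", "YPL250C"))
    print(gene_url("http://www.uniprot.org/cgi-bin/upEntry?id=", "SX30_HUMAN"))
```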
multitest/NODE1X.list
multitest/NODE1X.pcl
multitest/NODE2X.txt
multitest/run2/NODE1X.list
multitest/run2/NODE1X.pcl

Then entering 'list' will cause only those files to be processed, ignoring the 'pcl' and 'txt' files.
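The extension filter can be pictured as a simple selection over the archive's member names. The sketch below is an illustration only; the function name is hypothetical, and exact, case-sensitive matching of the extension is an assumption.

```python
from pathlib import PurePosixPath


def select_by_extension(member_names, extension=None):
    """Return the archive members to process.

    With no extension given, every member is kept (the default behaviour);
    otherwise only members whose file extension matches are kept.
    """
    if not extension:
        return list(member_names)
    wanted = "." + extension.lstrip(".")
    return [name for name in member_names if PurePosixPath(name).suffix == wanted]


if __name__ == "__main__":
    members = [
        "multitest/NODE1X.list",
        "multitest/NODE1X.pcl",
        "multitest/NODE2X.txt",
        "multitest/run2/NODE1X.list",
        "multitest/run2/NODE1X.pcl",
    ]
    print(select_by_extension(members, "list"))
```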
In general, the ontology and gene association files are downloaded nightly from the GO FTP site. Occasionally, a problem with a particular file delays its update. For example, sometimes an association file does not conform exactly to our understanding of the specification. In that case, the file is removed from the annotation selection pop-up menu, and a notice is printed below the pop-up menu, until the situation is resolved. There may be other reasons for a delay in updating a particular file.
The tables below show the version, GOC validation dates (where available and applicable), and other information for files that are currently in use.
GO Term Finder Ontology Files | Version |
---|---|
go-basic.obo | releases/2025-02-06 |

GO Term Mapper Ontology Files | Version |
---|---|
goslim_generic.obo | go/2025-02-06/subsets/goslim_generic.owl |
goslim_yeast.obo | go/2025-02-06/subsets/goslim_yeast.owl |
goslim_pombe.obo | |
goslim_plant.obo | |
goslim_chembl.obo | |
goslim_goa.obo | 1.854 |
Organism, Gene Associations, and Authority | Total Annotated Gene Products | Total Estimated Gene Products | Identifiers | Example IDs | Identifier Conversion Tool(s) | Evidence Code Counts |
---|---|---|---|---|---|---|
Skin parasite - Leishmania major L. major GeneDB gene_association.GeneDB_Lmajor README | 4132 | | Systematic_ID Systematic_ID | L302.10 L2256.04 LM5.39 sample list | | EXP(41) IDA(216) IPI(45) IMP(104) IGI(26) IEP(2) ISS(159) ISO(8697) ISA(183) ISM(184) IGC(1) RCA(46) TAS(8) IC(4) IEA(4) |
Trypanosome - Trypanosoma brucei T. brucei GeneDB gene_association.GeneDB_Tbrucei README | 6301 | | Systematic Name Gene Name Gene Synonym | Tb927.7.4670 RRP4 TB927.7.4670 sample list | | EXP(124) IDA(10768) IPI(539) IMP(1061) IGI(47) IEP(16) ISS(460) ISO(465) ISA(955) ISM(3598) RCA(1132) TAS(581) NAS(4) IC(47) |
Default URL template: http://www.genedb.org/genedb/Search?organism=tryp&name= | | | | | | |
Candida - Candida albicans CGD gene_association.cgd README | 60627 | | CGD_ID Standard Name Systematic name | CAL0004982 CaO19.6783 CA5922 Contig4-2621_0008 orf6.8848 sample list | | IDA(3520) IPI(77) IMP(6549) IGI(975) IEP(48) ISS(1880) ISO(343) ISA(164) ISM(1328) TAS(47) NAS(169) IC(23) ND(16441) IEA(314263) |
Default URL template: http://www.candidagenome.org/cgi-bin/locus.pl?locus= | | | | | | |
Slime mold - Dictyostelium discoideum DictyBase gene_association.dictyBase | 9687 | 12098 | DictyBase_ID Gene Name Alias | DdP2X DDB_G0272004 p2xA sample list | | IDA(4365) IPI(1244) IMP(3277) IGI(572) IEP(230) ISS(3462) IGC(79) TAS(449) NAS(6) IC(145) ND(6242) IEA(37715) |
Default URL template: http://dictybase.org/db/cgi-bin/dictyBase/locus.pl?locus= | | | | | | |
Fruit fly - Drosophila melanogaster FlyBase gene_association.fb README | 14789 | 16085 | FlyBase_ID Gene Symbol Gene Synonym | FBGN0031491 alpha4GT1 4-N-acetylgalactosaminyltransferase-1 CG17223 alpha1 sample list | | EXP(4) IDA(21114) IPI(3977) IMP(25359) IGI(4226) IEP(793) ISS(12234) ISO(3) ISA(34) ISM(3193) IGC(29) TAS(2563) NAS(1512) IC(1504) ND(7543) IEA(19063) |
Default URL template: http://flybase.bio.indiana.edu/.bin/fbidq.html? | | | | | | |
Chicken - Gallus gallus GOA @EBI gene_association.goa_chicken README | 18363 | 30837 | UniProt_Accession (or Ensembl_ID) UniProt_ID (or Ensembl_ID) International Protein Index | FGB IPI00588322 FIBB_CHICK Q02020 sample list | | EXP(17) IDA(1891) IPI(459) IMP(769) IGI(21) IEP(172) ISS(6523) ISO(70) ISA(326) ISM(23) RCA(3) TAS(637) NAS(123) IC(18) ND(30) IEA(105033) |
Cow - Bos taurus GOA @EBI gene_association.goa_cow README | 25962 | 37225 | UniProt_Accession (or Ensembl_ID) UniProt_ID (or Ensembl_ID) International Protein Index | FGG P12799 IPI00699860 FIBG_BOVIN sample list | | EXP(22) IDA(1982) IPI(654) IMP(277) IGI(9) IEP(6) ISS(21927) ISO(18) ISA(138) RCA(2) TAS(774) NAS(89) IC(83) ND(46) IEA(133928) |
Human - Homo sapiens GOA @EBI gene_association.goa_human README | 44695 | | UniProt_Accession (or Ensembl_ID) UniProt_ID (or Ensembl_ID) International Protein Index | TGFR1_HUMAN IPI00005733 P36897 TGFBR1 sample list | | EXP(1101) IDA(129053) IPI(264407) IMP(27326) IGI(2540) IEP(925) ISS(32035) ISO(519) ISA(1439) ISM(3) RCA(471) TAS(99400) NAS(18577) IC(1520) ND(310) IEA(129878) |
Default URL template: http://www.ensembl.org/Homo_sapiens/geneview?gene= | | | | | | |
Human - Homo sapiens GOA @EBI + Ensembl gene_association.goa_human_ensembl README | 19817 | | UniProt_Accession (or Ensembl_ID) UniProt_ID (or Ensembl_ID) International Protein Index with additional crossreferenced gene symbols | FZD6 B4DRN0_HUMAN ENSG00000164930 B4DRN0 sample list | | EXP(613) IDA(88710) IPI(213005) IMP(25206) IGI(2047) IEP(933) ISS(28265) ISO(11) ISA(1439) ISM(3) RCA(470) TAS(95949) NAS(7152) IC(6837) ND(1706) IEA(73570) |
Default URL template: http://www.ensembl.org/Homo_sapiens/geneview?gene= | | | | | | |
Human - Homo sapiens GOA @EBI + XREFs gene_association.goa_human_hgnc README | 19817 | | UniProt_Accession (or Ensembl_ID) UniProt_ID (or Ensembl_ID) International Protein Index with additional crossreferenced gene symbols | HGNC:4854 FZD6 O60353 HGNC:4044 4044 FZD6_HUMAN sample list | | EXP(613) IDA(88710) IPI(213005) IMP(25206) IGI(2047) IEP(933) ISS(28265) ISO(11) ISA(1439) ISM(3) RCA(470) TAS(95949) NAS(7152) IC(6837) ND(1706) IEA(73570) |
Default URL template: http://www.genenames.org/data/hgnc_data.php?hgnc_id= | | | | | | |
Mouse - Mus musculus MGI gene_association.mgi README | 31815 | | MGI_ID Gene Symbol Gene_Symbol (old) | P2ry12 MGI:1918089 P2Y12 sample list | | EXP(1191) IDA(78074) IPI(24701) IMP(65903) IGI(15785) IEP(2337) ISS(37419) ISO(151269) ISA(4764) ISM(32) RCA(292) TAS(7964) NAS(4687) IC(648) ND(45145) IEA(84700) |
Default URL template: http://www.informatics.jax.org/searches/accession_report.cgi?id= | | | | | | |
Yeast - Schizosaccharomyces pombe PomBase gene_association.pombase README | 5439 | | Systematic Name Gene Name Gene Synonym | SPCC191.07 cyc1 sample list | | EXP(2480) IDA(8910) IPI(2700) IMP(4844) IGI(698) IEP(7) ISS(861) ISO(2741) ISM(1104) TAS(394) NAS(460) IC(1618) ND(1799) IEA(2142) |
Default URL template: http://www.pombase.org/gene/ | | | | | | |
Pseudomonas - Pseudomonas aeruginosa PAO1 PseudoCAP gene_association.pseudocap | 1503 | | PA# Gene Name Alt. Gene Name (opt.) | fliD PA1094 hook-associated protein sample list | | EXP(40) IDA(864) IPI(42) IMP(1136) IGI(61) IEP(12) ISS(1225) ISO(13) ISA(10) IGC(48) TAS(11) NAS(18) |
Default URL template: http://www.pseudomonas.com/AnnotationByPAU.asp?PA= | | | | | | |
Rat - Rattus norvegicus RGD gene_association.rgd README | 23016 | | RGD_ID (or Ensembl Id, or UniProt accession) Gene Symbol (or UniProt Entry Name) if GOA-provided, an International Protein Index identifier | Fgb D3Z8Y5_RAT D3Z8Y5 IPI00948614 sample list | | EXP(496) IDA(28929) IPI(8138) IMP(10901) IGI(338) IEP(10574) ISS(28636) ISO(214846) RCA(5) TAS(3231) NAS(748) IC(163) ND(1917) IEA(105147) |
Default URL template: http://rgd.mcw.edu/tools/genes/genes_view.cgi?id= | | | | | | |
Yeast - Saccharomyces cerevisiae SGD gene_association.sgd README | 6915 | 7166 | SGD_ID Gene Name Systematic ORF Name | YJL166W S000003702 COR5 QCR8 sample list | | EXP(11) IDA(18589) IPI(3082) IMP(14294) IGI(5342) IEP(23) ISS(939) ISO(15) ISA(251) ISM(438) RCA(580) TAS(252) NAS(82) IC(1025) ND(3538) IEA(49652) |
Default URL template: http://www.yeastgenome.org/locus/ | | | | | | |
Common wallcress - Arabidopsis thaliana TAIR gene_association.tair README | 43761 | | TAIR Accession Gene Name Gene Alias | AT4G31210 AT4G31210.1 LOCUS:2128101 F8F16.30 F8F16_30 sample list | | IDA(22775) IPI(25163) IMP(17992) IGI(4224) IEP(4910) ISS(8218) ISM(37753) RCA(868) TAS(6654) NAS(710) IC(184) ND(26567) IEA(11904) |
Default URL template: http://www.arabidopsis.org/servlets/Search?type=general&search_action=detail&method=1&show_obsolete=F&sub_type=gene&SEARCH_EXACT=4&SEARCH_CONTAINS=1&name= | | | | | | |
Worm - Caenorhabditis elegans WormBase gene_association.wb README | 14688 | 22246 | Protein Name Gene Name Gene Symbol | casy-1 B0034.3 cdh-11 WBGENE00000403 sample list | | EXP(3) IDA(7741) IPI(4273) IMP(10389) IGI(4616) IEP(169) ISS(1928) ISO(1) ISM(9) RCA(13) TAS(174) NAS(180) IC(120) ND(429) IEA(60744) |
Default URL template: http://www.wormbase.org/db/gene/gene?name= | | | | | | |
Zebrafish - Danio rerio ZFIN gene_association.zfin README | 24126 | 25849 | ZFIN_ID Gene Symbol | ZDB-GENE-030131-6506 mobkl1b sample list | | IDA(4508) IPI(1232) IMP(20553) IGI(5555) IEP(154) ISS(8206) ISO(13) ISM(1) TAS(111) NAS(127) IC(111) ND(7286) IEA(141537) |
Default URL template: http://zfin.org/cgi-bin/webdriver?MIval=aa-markerview.apg&OID= | | | | | | |
Included in the above table are augmented association files which contain additional synonyms from other sources. These files and their sources are:
Augmented Association File | Source of Additional Synonyms |
---|---|
goa_human_hgnc | Human Xrefs from EBI, Integr8 from EBI |
Please note that the additional synonyms may
result in greater ambiguity of terms.
For background and description of GO-TermFinder, please see Boyle et al, Bioinformatics (2004)
Please cite the original manuscript for GO-TermFinder (the Perl module providing the core analysis methods used by this tool):
"GO::TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes." Boyle et al, Bioinformatics (2004)
which can be accessed at PubMed here.

Please also include the URL for GOTermFinder or the Princeton GO home page: