Downloads

The gene-disease association data used by Gene Set to Diseases (GS2D) for enrichment analysis can be downloaded free of charge for non-commercial use. If you use the data, please cite it in related publications as "Fontaine JF and Andrade-Navarro MA (2016). Gene Set to Diseases (GS2D): Disease Enrichment Analysis on Human Gene Sets with Literature Data. Genomics And Computational Biology, 2(1):e33. doi:10.18547/gcb.2016.vol2.iss1.e33" (other citation formats are available on the journal web page). Available files:
  • Data computed on 2015-11-16 (63K significant gene-disease associations, filtered by at least 3 co-occurrences and FDR<0.05)
  • Data computed on 2016-04-06 (66K significant gene-disease associations, filtered by at least 3 co-occurrences and FDR<0.05)
  • Data computed on 2017-03-01 (269,150 gene-disease associations, not filtered for significance but requiring at least 2 co-occurrences)
To get a download link, contact:
  • Fontaine (at) uni-mainz.de
The files are in tab-separated values (TSV) format with the following columns:
  • gene_id: Entrez Gene ID of the gene
  • symbol: official gene symbol linked to Entrez Gene
  • name: disease term from the MeSH vocabulary
  • count_name_in_gene_set: number of gene-related citations associated with the disease in the literature
  • count_gene_set: number of gene-related citations
  • count_name_in_pubmed_set: number of disease-related citations
  • count_pubmed_set: total number of citations
  • mesh_tree: comma-separated list of MeSH tree numbers associated with the disease
  • pmids: comma-separated list of PMIDs of relevant citations in the literature
  • fold_change: (number of gene-related citations associated with the disease in the literature / number of gene-related citations) / (number of disease-related citations / total number of citations), i.e. (count_name_in_gene_set / count_gene_set) / (count_name_in_pubmed_set / count_pubmed_set)
  • p_value: P-value computed by a Fisher's exact test
  • fdr: False Discovery Rate computed by the Benjamini-Hochberg method
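
As an illustration of the column layout above, the following Python sketch reads such a file, recomputes fold_change from the four count columns, and applies the same kind of filter described for the 2015 and 2016 files (at least 3 co-occurrences and FDR<0.05). It rests on several assumptions that are not confirmed here: that the file starts with a header row using exactly the column names listed above, and that count_name_in_gene_set is the co-occurrence count used for filtering. The sample rows, gene symbols, disease names and function names are hypothetical and only serve to make the example self-contained.

```python
import csv
import io

# Hypothetical two-row sample in the documented column layout. All values are
# invented for illustration and are NOT taken from the GS2D files.
SAMPLE_TSV = (
    "gene_id\tsymbol\tname\tcount_name_in_gene_set\tcount_gene_set\t"
    "count_name_in_pubmed_set\tcount_pubmed_set\tmesh_tree\tpmids\t"
    "fold_change\tp_value\tfdr\n"
    "1\tGENE_A\tDisease X\t5\t100\t1000\t1000000\tC00.000\t"
    "11,12,13,14,15\t50\t1e-6\t1e-4\n"
    "2\tGENE_B\tDisease Y\t2\t400\t5000\t1000000\tC00.111,C00.222\t"
    "21,22\t1\t0.6\t0.9\n"
)

INT_COLUMNS = ("count_name_in_gene_set", "count_gene_set",
               "count_name_in_pubmed_set", "count_pubmed_set")
FLOAT_COLUMNS = ("fold_change", "p_value", "fdr")
LIST_COLUMNS = ("mesh_tree", "pmids")


def read_associations(handle):
    """Yield one dict per row, converting count, score and list columns."""
    # Assumption: a header row is present. QUOTE_NONE keeps any quote
    # characters inside disease names as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        for col in FLOAT_COLUMNS:
            row[col] = float(row[col])
        for col in LIST_COLUMNS:
            row[col] = row[col].split(",") if row[col] else []
        yield row


def fold_change(row):
    """Recompute fold_change from the four count columns."""
    gene_ratio = row["count_name_in_gene_set"] / row["count_gene_set"]
    background_ratio = row["count_name_in_pubmed_set"] / row["count_pubmed_set"]
    return gene_ratio / background_ratio


def is_significant(row, min_cooccurrences=3, max_fdr=0.05):
    """Assumed filter: enough co-occurring citations and a low FDR."""
    return (row["count_name_in_gene_set"] >= min_cooccurrences
            and row["fdr"] < max_fdr)


rows = list(read_associations(io.StringIO(SAMPLE_TSV)))
kept = [row for row in rows if is_significant(row)]
for row in kept:
    print(row["symbol"], row["name"], round(fold_change(row), 2))
```

To use this on a downloaded file, replace the io.StringIO(SAMPLE_TSV) argument with an open file handle, for example open(path, newline="", encoding="utf-8"), where path is wherever the file was saved.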
