Load the data from the file

library(DESeq2)
library(dplyr)
de_status <- readRDS("Robjects/de_status.rds")

Convert to a data frame and order by p-value

results_ordered <- as.data.frame(results(de_status)) %>% 
  tibble::rownames_to_column("SYMBOL") %>% 
  arrange(padj)

Check the table…

head(results_ordered)

Use the annotation package to get pathway IDs

# to get names of keys
# keytypes(org.Hs.eg.db)
# to see what you can retrieve
# columns(org.Hs.eg.db)
library(org.Hs.eg.db)
Loading required package: AnnotationDbi

Attaching package: 㤼㸱AnnotationDbi㤼㸲

The following object is masked from 㤼㸱package:dplyr㤼㸲:

    select
anno_GO <- AnnotationDbi::select(org.Hs.eg.db,
                      keytype = "SYMBOL",
                      keys = results_ordered$SYMBOL[1:10],
                      columns = "GO")
'select()' returned 1:many mapping between keys and columns
head(anno_GO)

How many times do we see particular ontology IDs?

count(anno_GO,GO)

It seems that *GO:0000981* comes up a lot. Is this significant? We need to know how many genes belong to the pathway.

pathway_genes <- AnnotationDbi::select(org.Hs.eg.db,
                      keytype = "GO",
                      keys="GO:0000981",
                      columns = "SYMBOL") %>% 
  filter(!duplicated(SYMBOL)) %>% 
  pull(SYMBOL)
'select()' returned 1:many mapping between keys and columns
length(pathway_genes)
[1] 1537

To assess the significance we will add extra columns to the table; is each gene significant? does the gene belong to our pathway of interest?

results_GO <- results_ordered %>% mutate(is_DE = padj < 0.05) %>% 
  mutate(in_pathway = SYMBOL %in% pathway_genes)
results_GO

A chi-squared test can be used to assess significance; this is the basis for gene-set testing

chisq.test(table(pathway=results_GO$in_pathway, DE=results_GO$is_DE))

    Pearson's Chi-squared test with Yates' continuity correction

data:  table(pathway = results_GO$in_pathway, DE = results_GO$is_DE)
X-squared = 5.2087, df = 1, p-value = 0.02247

However, this is just for one pathway. To test all pathways there are various web tools we can use. The tools requires text files that we can create with R.

results/genes_for_GO.txt is a file containing names of all DE genes.

results_GO %>% 
  filter(is_DE) %>% 
  pull(SYMBOL) %>% 
  write.table("results/genes_for_GO.txt",row.names = FALSE,quote=FALSE)

genes_universe.txt is all genes that we tested.

results_GO %>% 
  pull(SYMBOL) %>% 
  write.table("results/gene_universe.txt",row.names=FALSE,quote=FALSE)

these two files can be uploaded to GOrilla for example

GSEA

An alternative to the gene-set testing approach above is GSEA first proposed by the Broad institute. In this method, we don’t require a hard cut-off for significance. However, we have to order (arrange) our table by test statistic.

A java application for GSEA is available online. The input for this tool is a file with two-columns; gene name and test statistic.

The GSEA tool seems to require the input file to have the file extension .rnk. We also have to remove any NA values.

results_ordered %>% 
  filter(!duplicated(SYMBOL)) %>% 
  filter(!is.na(stat)) %>% 
  arrange(stat) %>% dplyr::select(SYMBOL, stat) %>% 
  write.table("results/gsea_input.rnk",sep="\t",quote=FALSE,row.names=FALSE)
LS0tDQp0aXRsZTogIkFubm90YXRpb24gYW5kIEdlbmUgU2V0IFdhbGt0aHJvdWdoIg0KYXV0aG9yOiAiTWFyayBEdW5uaW5nIg0KZGF0ZTogIjIwIEZlYnJ1YXJ5IDIwMjAiDQpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sNCi0tLQ0KDQpMb2FkIHRoZSBkYXRhIGZyb20gdGhlIGZpbGUNCg0KYGBge3IgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0NCmxpYnJhcnkoREVTZXEyKQ0KbGlicmFyeShkcGx5cikNCmRlX3N0YXR1cyA8LSByZWFkUkRTKCJSb2JqZWN0cy9kZV9zdGF0dXMucmRzIikNCmBgYA0KDQpDb252ZXJ0IHRvIGEgZGF0YSBmcmFtZSBhbmQgb3JkZXIgYnkgcC12YWx1ZQ0KDQpgYGB7cn0NCnJlc3VsdHNfb3JkZXJlZCA8LSBhcy5kYXRhLmZyYW1lKHJlc3VsdHMoZGVfc3RhdHVzKSkgJT4lIA0KICB0aWJibGU6OnJvd25hbWVzX3RvX2NvbHVtbigiU1lNQk9MIikgJT4lIA0KICBhcnJhbmdlKHBhZGopDQpgYGANCg0KQ2hlY2sgdGhlIHRhYmxlLi4uDQoNCmBgYHtyfQ0KaGVhZChyZXN1bHRzX29yZGVyZWQpDQpgYGANCg0KVXNlIHRoZSBhbm5vdGF0aW9uIHBhY2thZ2UgdG8gZ2V0IHBhdGh3YXkgSURzDQoNCmBgYHtyfQ0KIyB0byBnZXQgbmFtZXMgb2Yga2V5cw0KIyBrZXl0eXBlcyhvcmcuSHMuZWcuZGIpDQoNCiMgdG8gc2VlIHdoYXQgeW91IGNhbiByZXRyaWV2ZQ0KIyBjb2x1bW5zKG9yZy5Icy5lZy5kYikNCmxpYnJhcnkob3JnLkhzLmVnLmRiKQ0KYW5ub19HTyA8LSBBbm5vdGF0aW9uRGJpOjpzZWxlY3Qob3JnLkhzLmVnLmRiLA0KICAgICAgICAgICAgICAgICAgICAgIGtleXR5cGUgPSAiU1lNQk9MIiwNCiAgICAgICAgICAgICAgICAgICAgICBrZXlzID0gcmVzdWx0c19vcmRlcmVkJFNZTUJPTFsxOjEwXSwNCiAgICAgICAgICAgICAgICAgICAgICBjb2x1bW5zID0gIkdPIikNCmhlYWQoYW5ub19HTykNCmBgYA0KDQpIb3cgbWFueSB0aW1lcyBkbyB3ZSBzZWUgcGFydGljdWxhciBvbnRvbG9neSBJRHM/DQoNCmBgYHtyfQ0KY291bnQoYW5ub19HTyxHTykNCmBgYA0KDQpJdCBzZWVtcyB0aGF0ICpHTzowMDAwOTgxKiBjb21lcyB1cCBhIGxvdC4gSXMgdGhpcyBzaWduaWZpY2FudD8gV2UgbmVlZCB0byBrbm93IGhvdyBtYW55IGdlbmVzIGJlbG9uZyB0byB0aGUgcGF0aHdheS4NCg0KYGBge3J9DQpwYXRod2F5X2dlbmVzIDwtIEFubm90YXRpb25EYmk6OnNlbGVjdChvcmcuSHMuZWcuZGIsDQogICAgICAgICAgICAgICAgICAgICAga2V5dHlwZSA9ICJHTyIsDQogICAgICAgICAgICAgICAgICAgICAga2V5cz0iR086MDAwMDk4MSIsDQogICAgICAgICAgICAgICAgICAgICAgY29sdW1ucyA9ICJTWU1CT0wiKSAlPiUgDQogIGZpbHRlcighZHVwbGljYXRlZChTWU1CT0wpKSAlPiUgDQogIHB1bGwoU1lNQk9MKQ0KDQpsZW5ndGgocGF0aHdheV9nZW5lcykNCmBgYA0KDQpUbyBhc3Nlc3MgdGhlIHNpZ25pZmljYW5jZSB3ZSB3aWxsIGFkZCBleHRyYSBjb2x1bW5zIHRvIHRoZSB0YWJsZTsgaXMgZWFjaCBnZW5lIHNpZ25pZmljYW50PyBkb2VzIHRoZSBnZW5lIGJlbG9uZyB0byBvdXIgcGF0aHdheSBvZiBpbnRlcmVzdD8NCg0KYGBge3J9DQpyZXN1bHRzX0dPIDwtIHJlc3VsdHNfb3JkZXJlZCAlPiUgbXV0YXRlKGlzX0RFID0gcGFkaiA8IDAuMDUpICU+JSANCiAgbXV0YXRlKGluX3BhdGh3YXkgPSBTWU1CT0wgJWluJSBwYXRod2F5X2dlbmVzKQ0KcmVzdWx0c19HTw0KYGBgDQoNCkEgY2hpLXNxdWFyZWQgdGVzdCBjYW4gYmUgdXNlZCB0byBhc3Nlc3Mgc2lnbmlmaWNhbmNlOyB0aGlzIGlzIHRoZSBiYXNpcyBmb3IgZ2VuZS1zZXQgdGVzdGluZw0KDQpgYGB7cn0NCmNoaXNxLnRlc3QodGFibGUocGF0aHdheT1yZXN1bHRzX0dPJGluX3BhdGh3YXksIERFPXJlc3VsdHNfR08kaXNfREUpKQ0KDQpgYGANCg0KSG93ZXZlciwgdGhpcyBpcyBqdXN0IGZvciBvbmUgcGF0aHdheS4gVG8gdGVzdCAqYWxsKiBwYXRod2F5cyB0aGVyZSBhcmUgdmFyaW91cyB3ZWIgdG9vbHMgd2UgY2FuIHVzZS4gVGhlIHRvb2xzIHJlcXVpcmVzIHRleHQgZmlsZXMgdGhhdCB3ZSBjYW4gY3JlYXRlIHdpdGggUi4NCg0KKnJlc3VsdHMvZ2VuZXNfZm9yX0dPLnR4dCogaXMgYSBmaWxlIGNvbnRhaW5pbmcgbmFtZXMgb2YgYWxsIERFIGdlbmVzLg0KDQpgYGB7cn0NCnJlc3VsdHNfR08gJT4lIA0KICBmaWx0ZXIoaXNfREUpICU+JSANCiAgcHVsbChTWU1CT0wpICU+JSANCiAgd3JpdGUudGFibGUoInJlc3VsdHMvZ2VuZXNfZm9yX0dPLnR4dCIscm93Lm5hbWVzID0gRkFMU0UscXVvdGU9RkFMU0UpDQpgYGANCg0KKmdlbmVzX3VuaXZlcnNlLnR4dCogaXMgYWxsIGdlbmVzIHRoYXQgd2UgdGVzdGVkLg0KDQpgYGB7cn0NCnJlc3VsdHNfR08gJT4lIA0KICBwdWxsKFNZTUJPTCkgJT4lIA0KICB3cml0ZS50YWJsZSgicmVzdWx0cy9nZW5lX3VuaXZlcnNlLnR4dCIscm93Lm5hbWVzPUZBTFNFLHF1b3RlPUZBTFNFKQ0KYGBgDQoNCnRoZXNlIHR3byBmaWxlcyBjYW4gYmUgdXBsb2FkZWQgdG8gW0dPcmlsbGFdKGh0dHA6Ly9jYmwtZ29yaWxsYS5jcy50ZWNobmlvbi5hYy5pbC8pIGZvciBleGFtcGxlDQoNCiMjIEdTRUENCg0KQW4gYWx0ZXJuYXRpdmUgdG8gdGhlIGdlbmUtc2V0IHRlc3RpbmcgYXBwcm9hY2ggYWJvdmUgaXMgR1NFQSBmaXJzdCBwcm9wb3NlZCBieSB0aGUgQnJvYWQgaW5zdGl0dXRlLiBJbiB0aGlzIG1ldGhvZCwgd2UgZG9uJ3QgcmVxdWlyZSBhIGhhcmQgY3V0LW9mZiBmb3Igc2lnbmlmaWNhbmNlLiBIb3dldmVyLCB3ZSBoYXZlIHRvIG9yZGVyIChgYXJyYW5nZWApIG91ciB0YWJsZSBieSB0ZXN0IHN0YXRpc3RpYy4NCg0KQSBqYXZhIGFwcGxpY2F0aW9uIGZvciBHU0VBIGlzIGF2YWlsYWJsZSBbb25saW5lXShodHRwczovL3d3dy5nc2VhLW1zaWdkYi5vcmcvZ3NlYS9kb3dubG9hZHMuanNwKS4gVGhlIGlucHV0IGZvciB0aGlzIHRvb2wgaXMgYSBmaWxlIHdpdGggdHdvLWNvbHVtbnM7IGdlbmUgbmFtZSBhbmQgdGVzdCBzdGF0aXN0aWMuDQoNClRoZSBHU0VBIHRvb2wgc2VlbXMgdG8gcmVxdWlyZSB0aGUgaW5wdXQgZmlsZSB0byBoYXZlIHRoZSBmaWxlIGV4dGVuc2lvbiBgLnJua2AuIFdlIGFsc28gaGF2ZSB0byByZW1vdmUgYW55IGBOQWAgdmFsdWVzLg0KDQoNCmBgYHtyfQ0KDQpyZXN1bHRzX29yZGVyZWQgJT4lIA0KICBmaWx0ZXIoIWR1cGxpY2F0ZWQoU1lNQk9MKSkgJT4lIA0KICBmaWx0ZXIoIWlzLm5hKHN0YXQpKSAlPiUgDQogIGFycmFuZ2Uoc3RhdCkgJT4lIGRwbHlyOjpzZWxlY3QoU1lNQk9MLCBzdGF0KSAlPiUgDQogIHdyaXRlLnRhYmxlKCJyZXN1bHRzL2dzZWFfaW5wdXQucm5rIixzZXA9Ilx0IixxdW90ZT1GQUxTRSxyb3cubmFtZXM9RkFMU0UpDQoNCmBgYA0K