CIBERSORT,免疫浸润的老演员
用该包进行免疫浸润确实非常的便捷
但,同时单个算法带来的局限性也非常的明显,并且CIBERSORT本身的代码运行效率就非常的低下
因此,本期推文就给同学们分享如何利用其他算法进行免疫浸润
👇多算法免疫浸润👇
1.加载R包library(immunedeconv)
library(job)
library(tcltk)
library(tidyverse)
library(ggsci)
library(scales)
library(ggtext)
library(reshape2)
library(limma)
library(parallel)
library(ggplot2)
library(ggpubr)
library(future.apply)
library(car)
library(ridge)
library(e1071)
library(preprocessCore)
library(foreach)
library(doParallel)
library(data.table)
这里最关键的是调用了immunedeconv包进行分析
2.生成随机颜色供绘图使用#生成随机颜色
randomColor <- function() {
paste0("#",paste0(sample(c(0:9, letters[1:6]), 6, replace = TRUE),collapse = ""))
}
# 生成200个随机颜色
randomColors <- replicate(200,randomColor())
3.数据准备
以TCGA数据为例
行名为基因名,列名为样品名
4.读取表达矩阵,并定义肿瘤类型# 读入表达数据
exprMatrix <- read.table("TCGA.txt",header=TRUE,sep = "\t",as.is = T,row.names = 1)
#仅保留肿瘤数据
exprMatrix= exprMatrix%>% dplyr::select(str_which(colnames(.), ".01"))%>%as.data.frame()
这里我们对TCGA的数据进行了读取,并提取出肿瘤样品的表达矩阵
需要用GEO数据库的同学,也可以构建一个分组文件,并利用匹配样品名的方式来提取肿瘤样品的表达矩阵cancer="BRCA"#TCGA癌症缩写
brca_vector <- rep(cancer, ncol(exprMatrix))
这里定义了肿瘤的类型,这在后续的各种免疫浸润算法中提供一个正确的参考,各肿瘤的缩写如下
癌症缩写 | 癌症英文全称 | 癌症中文名称 |
ACC | Adrenocortical carcinoma | 肾上腺皮质癌 |
BLCA | Bladder Urothelial Carcinoma | 膀胱尿路上皮癌 |
BRCA | Breast invasive carcinoma | 乳腺浸润癌 |
CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma | 宫颈鳞癌和腺癌 |
CHOL | Cholangiocarcinoma | 胆管癌 |
COAD | Colon adenocarcinoma | 结肠癌 |
DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | 弥漫性大B细胞淋巴瘤 |
ESCA | Esophageal carcinoma | 食管癌 |
FPPP | FFPE Pilot Phase II FFPE | 试点二期 |
GBM | Glioblastoma multiforme | 多形成性胶质细胞瘤 |
GBMLGG | Glioma | 胶质瘤 |
HNSC | Head and Neck squamous cell carcinoma | 头颈鳞状细胞癌 |
KICH | Kidney Chromophobe | 肾嫌色细胞癌 |
KIPAN | Pan-kidney cohort (KICH+KIRC+KIRP) | 混合肾癌 |
KIRC | Kidney renal clear cell carcinoma | 肾透明细胞癌 |
KIRP | Kidney renal papillary cell carcinoma | 肾乳头状细胞癌 |
LAML | Acute Myeloid Leukemia | 急性髓细胞样白血病 |
LGG | Brain Lower Grade Glioma | 脑低级别胶质瘤 |
LIHC | Liver hepatocellular carcinoma | 肝细胞肝癌 |
LUAD | Lung adenocarcinoma | 肺腺癌 |
LUSC | Lung squamous cell carcinoma | 肺鳞癌 |
MESO | Mesothelioma | 间皮瘤 |
OV | Ovarian serous cystadenocarcinoma | 卵巢浆液性囊腺癌 |
PAAD | Pancreatic adenocarcinoma | 胰腺癌 |
PCPG | Pheochromocytoma and Paraganglioma | 嗜铬细胞瘤和副神经节瘤 |
PRAD | Prostate adenocarcinoma | 前列腺癌 |
READ | Rectum adenocarcinoma | 直肠腺癌 |
SARC | Sarcomav | 肉瘤 |
SKCM | Skin Cutaneous Melanoma | 皮肤黑色素瘤 |
STAD | Stomach adenocarcinoma | 胃癌 |
STES | Stomach and Esophageal carcinoma | 胃和食管癌 |
TGCT | Testicular Germ Cell Tumors | 睾丸癌 |
THCA | Thyroid carcinoma | 甲状腺癌 |
THYM | Thymoma | 胸腺癌 |
UCEC | Uterine Corpus Endometrial Carcinoma | 子宫内膜癌 |
UCS | Uterine Carcinosarcoma | 子宫肉瘤 |
UVM | Uveal Melanoma | 葡萄膜黑色素瘤 |
5.进行免疫浸润
1)estimateexp=exprMatrix
res_estimate <- deconvolute(exp, method="estimate",tumor = T,indications = brca_vector)
write.table(res_estimate,"estimate.txt",sep="\t",row.names = F)
res_estimate$cell_type=paste0(res_estimate$cell_type,"_ESTIMATE")
2)TIMERexp=exprMatrix
res_timer <- deconvolute(exp, method="timer",tumor = T,indications = brca_vector)
write.table(res_timer,"timer.txt",sep="\t",row.names = F)
res_timer$cell_type=paste0(res_timer$cell_type,"_TIMER")
3)ABISexp=exprMatrix
res_abis <- deconvolute(exp, method="abis",tumor = T,indications = brca_vector)
write.table(res_abis,"abis.txt",sep="\t",row.names = F)
res_abis$cell_type=paste0(res_abis$cell_type,"_ABIS")
4)ConsensusTMEexp=exprMatrix
res_consensus_tme <- deconvolute(exp, method="consensus_tme",tumor = T,indications = brca_vector)
write.table(res_consensus_tme,"consensus_tme.txt",sep="\t",row.names = F)
res_consensus_tme$cell_type=paste0(res_consensus_tme$cell_type,"_ConsensusTME")
5)xCellexp=exprMatrix
res_xcell <- deconvolute(exp, method="xcell",tumor = T,indications = brca_vector)
write.table(res_xcell,"xcell.txt",sep="\t",row.names = F)
res_xcell$cell_type=paste0(res_xcell$cell_type,"_xCell")
6)EPICexp=exprMatrix
res_epic <- deconvolute(exp, method="epic",tumor = T,indications = brca_vector)
write.table(res_epic,"epic.txt",sep="\t",row.names = F)
res_epic$cell_type=paste0(res_epic$cell_type,"_EPIC")
7)quanTIseqexp=exprMatrix
res_quantiseq <- deconvolute(exp, method="quantiseq",tumor = T,indications = brca_vector)
write.table(res_quantiseq,"quantiseq.txt",sep="\t",row.names = F)
res_quantiseq$cell_type=paste0(res_quantiseq$cell_type,"_quanTIseq")
8)CIBERSORTdata=exprMatrix
v=voom(data, plot=F, save.plot=F)
out=v$E
out=rbind(ID=colnames(out), out)
write.table(out,file="uniq.symbol.txt",sep="\t",quote=F,col.names=F)
source("CIBERSORT.R",encoding = "utf-8")
results=CIBERSORT("LM22.txt", "uniq.symbol.txt", perm=1000, QN=TRUE)
res_CIBERSORT=read.table("CIBERSORT-Results.txt",sep="\t",header=T,row.names=1,check.names=F)
res_CIBERSORT=t(res_CIBERSORT)
rownames(res_CIBERSORT)=paste0(rownames(res_CIBERSORT),"_CIBERSORT")
res_CIBERSORT=data.frame(cell_type=rownames(res_CIBERSORT),res_CIBERSORT)
在这个包里CIBERSORT利用起来有些麻烦,因此我们拼接上我们自己的就行
6.合并多个软件结果
合并之前,先将每个软件中非免疫细胞的数据剔除res_consensus_tme=res_consensus_tme[1:(nrow(res_consensus_tme)-1),]
res_xcell=res_xcell[1:(nrow(res_xcell)-3),]
res_CIBERSORT=res_CIBERSORT[1:(nrow(res_CIBERSORT)-3),]
随后进行合并res=rbind(res_abis,res_consensus_tme,res_epic,res_quantiseq,res_timer,res_xcell,res_CIBERSORT)
res=as.matrix(res)
rownames(res)=res[,1]
exp=res[,2:ncol(res)]
dimnames=list(rownames(exp),colnames(exp))
data=matrix(as.numeric(as.matrix(exp)),nrow=nrow(exp),dimnames=dimnames)
7.进行相关性分析#输入要研究的基因
gene="TP53"
gene_exp=exprMatrix[gene,]
gene_exp=t(gene_exp)
samesample=intersect(rownames(gene_exp),colnames(data))
gene_exp=gene_exp[samesample,]
gene_exp=as.data.frame(gene_exp)
colnames(gene_exp)=gene
data=data[,samesample]
x=as.numeric(gene_exp[,1])
outTab=data.frame()
for(i in rownames(data)){
y=as.numeric(data[i,])
if(sd(y)<0.001){next}
corT=cor.test(x, y, method="spearman")
cor=corT$estimate
pvalue=corT$p.value
if(pvalue<0.05){
outTab=rbind(outTab,cbind(immune=i, cor, pvalue))
}
}
#输出相关性结果
write.table(file="corResult.txt", outTab, sep="\t", quote=F, row.names=F)
分析得到目标基因与每个免疫细胞的相关性和p值
8.进行可视化#绘制气泡图
corResult=read.table("corResult.txt", head=T, sep="\t")
corResult$Software=sapply(strsplit(corResult[,1],"_"), '[', 2)
corResult$Software=factor(corResult$Software,level=as.character(unique(corResult$Software[rev(order(as.character(corResult$Software)))])))
b=corResult[order(corResult$Software),]
b$immune=factor(b$immune,levels=rev(as.character(b$immune)))
colslabels=rep(hue_pal()(length(levels(b$Software))),table(b$Software)) #定义图形颜色
#输出图形
pdf(file="correlation.pdf", width=9, height=10)
ggplot(data=b, aes(x=cor, y=immune, color=Software))+
labs(x="Correlation coefficient",y="Immune cell")+
geom_point(size=4.1)+
theme(panel.background=element_rect(fill="white",size=1,color="black"),
panel.grid=element_line(color="grey75",size=0.5),
axis.ticks = element_line(size=0.5),
axis.text.y = ggtext::element_markdown(colour=rev(colslabels)))
dev.off()
最后便得到目标基因与免疫细胞的相关性图
👇效率优化👇
单个算法的局限性倒是解决了,代码效率低下的问题却貌似越来越大
这份乳腺癌的数据跑下来快花了2个小时
仔细观察代码,每个算法都是调用同一个数据,但是一个分析完成后才能跑下一个分析,这大大的浪费了电脑资源
看了官方文档,这个包没有设置并行参数,但是我们可以曲线救国👇
即将每个分析都丢在后台,并将结果输出到环境中,这样就可以一次性同时跑8个分析
当然,最大的后腿还是低效的CIBERSORT
这里也很简单,只要将关键的循环进行多线程改写就能得到很好的效果👇
👇👇👇
最后运行下来,不到20分钟就能将所有分析完成,这个提升还是蛮大的
当然,具体提升程度还需要看机器本身的硬件条件
多个算法的免疫浸润实现,以及代码效率的提升就给同学们介绍到这
感谢同学们的关注及支持
多线程的示例代码及文件可在下方获取
👇感谢关注支持👇
👇扫码获取示例文件及代码👇
可试读97%
微信扫一扫
关注该公众号