多算法免疫浸润

原创叉叉滴同学叉叉滴同学的生信笔记 2023-09-22 15:32

CIBERSORT，免疫浸润的老演员

用该包进行免疫浸润确实非常的便捷

但，同时单个算法带来的局限性也非常的明显，并且CIBERSORT本身的代码运行效率就非常的低下

因此，本期推文就给同学们分享如何利用其他算法进行免疫浸润

👇多算法免疫浸润👇

1.加载R包

library(immunedeconv)library(job)library(tcltk)library(tidyverse)library(ggsci)library(scales)library(ggtext)library(reshape2)library(limma)library(parallel)library(ggplot2)library(ggpubr)library(future.apply)library(car)library(ridge)library(e1071)library(preprocessCore)library(foreach)  library(doParallel)library(data.table)

这里最关键的是调用了immunedeconv包进行分析

2.生成随机颜色供绘图使用

#生成随机颜色randomColor <- function() {  paste0("#",paste0(sample(c(0:9, letters[1:6]), 6, replace = TRUE),collapse = ""))}
# 生成200个随机颜色randomColors <- replicate(200,randomColor())

3.数据准备

以TCGA数据为例

行名为基因名，列名为样品名

4.读取表达矩阵，并定义肿瘤类型

# 读入表达数据exprMatrix <- read.table("TCGA.txt",header=TRUE,sep = "\t",as.is = T,row.names = 1)#仅保留肿瘤数据exprMatrix= exprMatrix%>% dplyr::select(str_which(colnames(.), ".01"))%>%as.data.frame()

这里我们对TCGA的数据进行了读取，并提取出肿瘤样品的表达矩阵

需要用GEO数据库的同学，也可以构建一个分组文件，并利用匹配样品名的方式来提取肿瘤样品的表达矩阵

cancer="BRCA"#TCGA癌症缩写brca_vector <- rep(cancer, ncol(exprMatrix))

这里定义了肿瘤的类型，这在后续的各种免疫浸润算法中提供一个正确的参考，各肿瘤的缩写如下

癌症缩写	癌症英文全称	癌症中文名称
ACC	Adrenocortical carcinoma	肾上腺皮质癌
BLCA	Bladder Urothelial Carcinoma	膀胱尿路上皮癌
BRCA	Breast invasive carcinoma	乳腺浸润癌
CESC	Cervical squamous cell carcinoma and endocervical adenocarcinoma	宫颈鳞癌和腺癌
CHOL	Cholangiocarcinoma	胆管癌
COAD	Colon adenocarcinoma	结肠癌
DLBC	Lymphoid Neoplasm Diffuse Large B-cell Lymphoma	弥漫性大B细胞淋巴瘤
ESCA	Esophageal carcinoma	食管癌
FPPP	FFPE Pilot Phase II FFPE	试点二期
GBM	Glioblastoma multiforme	多形成性胶质细胞瘤
GBMLGG	Glioma	胶质瘤
HNSC	Head and Neck squamous cell carcinoma	头颈鳞状细胞癌
KICH	Kidney Chromophobe	肾嫌色细胞癌
KIPAN	Pan-kidney cohort (KICH+KIRC+KIRP)	混合肾癌
KIRC	Kidney renal clear cell carcinoma	肾透明细胞癌
KIRP	Kidney renal papillary cell carcinoma	肾乳头状细胞癌
LAML	Acute Myeloid Leukemia	急性髓细胞样白血病
LGG	Brain Lower Grade Glioma	脑低级别胶质瘤
LIHC	Liver hepatocellular carcinoma	肝细胞肝癌
LUAD	Lung adenocarcinoma	肺腺癌
LUSC	Lung squamous cell carcinoma	肺鳞癌
MESO	Mesothelioma	间皮瘤
OV	Ovarian serous cystadenocarcinoma	卵巢浆液性囊腺癌
PAAD	Pancreatic adenocarcinoma	胰腺癌
PCPG	Pheochromocytoma and Paraganglioma	嗜铬细胞瘤和副神经节瘤
PRAD	Prostate adenocarcinoma	前列腺癌
READ	Rectum adenocarcinoma	直肠腺癌
SARC	Sarcomav	肉瘤
SKCM	Skin Cutaneous Melanoma	皮肤黑色素瘤
STAD	Stomach adenocarcinoma	胃癌
STES	Stomach and Esophageal carcinoma	胃和食管癌
TGCT	Testicular Germ Cell Tumors	睾丸癌
THCA	Thyroid carcinoma	甲状腺癌
THYM	Thymoma	胸腺癌
UCEC	Uterine Corpus Endometrial Carcinoma	子宫内膜癌
UCS	Uterine Carcinosarcoma	子宫肉瘤
UVM	Uveal Melanoma	葡萄膜黑色素瘤

5.进行免疫浸润

1）estimate

exp=exprMatrixres_estimate <- deconvolute(exp, method="estimate",tumor = T,indications = brca_vector)write.table(res_estimate,"estimate.txt",sep="\t",row.names = F)res_estimate$cell_type=paste0(res_estimate$cell_type,"_ESTIMATE")

2）TIMER

exp=exprMatrixres_timer <- deconvolute(exp, method="timer",tumor = T,indications = brca_vector)write.table(res_timer,"timer.txt",sep="\t",row.names = F)res_timer$cell_type=paste0(res_timer$cell_type,"_TIMER")

3)ABIS

exp=exprMatrixres_abis <- deconvolute(exp, method="abis",tumor = T,indications = brca_vector)write.table(res_abis,"abis.txt",sep="\t",row.names = F)res_abis$cell_type=paste0(res_abis$cell_type,"_ABIS")

4)ConsensusTME

exp=exprMatrixres_consensus_tme <- deconvolute(exp, method="consensus_tme",tumor = T,indications = brca_vector)write.table(res_consensus_tme,"consensus_tme.txt",sep="\t",row.names = F)res_consensus_tme$cell_type=paste0(res_consensus_tme$cell_type,"_ConsensusTME")

5)xCell

exp=exprMatrixres_xcell <- deconvolute(exp, method="xcell",tumor = T,indications = brca_vector)write.table(res_xcell,"xcell.txt",sep="\t",row.names = F)res_xcell$cell_type=paste0(res_xcell$cell_type,"_xCell")

6)EPIC

exp=exprMatrixres_epic <- deconvolute(exp, method="epic",tumor = T,indications = brca_vector)write.table(res_epic,"epic.txt",sep="\t",row.names = F)res_epic$cell_type=paste0(res_epic$cell_type,"_EPIC")

7)quanTIseq

exp=exprMatrixres_quantiseq <- deconvolute(exp, method="quantiseq",tumor = T,indications = brca_vector) write.table(res_quantiseq,"quantiseq.txt",sep="\t",row.names = F)res_quantiseq$cell_type=paste0(res_quantiseq$cell_type,"_quanTIseq")

8)CIBERSORT

data=exprMatrixv=voom(data, plot=F, save.plot=F)out=v$Eout=rbind(ID=colnames(out), out)write.table(out,file="uniq.symbol.txt",sep="\t",quote=F,col.names=F)source("CIBERSORT.R",encoding = "utf-8")results=CIBERSORT("LM22.txt", "uniq.symbol.txt", perm=1000, QN=TRUE)res_CIBERSORT=read.table("CIBERSORT-Results.txt",sep="\t",header=T,row.names=1,check.names=F)res_CIBERSORT=t(res_CIBERSORT)rownames(res_CIBERSORT)=paste0(rownames(res_CIBERSORT),"_CIBERSORT")res_CIBERSORT=data.frame(cell_type=rownames(res_CIBERSORT),res_CIBERSORT)

在这个包里CIBERSORT利用起来有些麻烦，因此我们拼接上我们自己的就行

6.合并多个软件结果

合并之前，先将每个软件中非免疫细胞的数据剔除

res_consensus_tme=res_consensus_tme[1:(nrow(res_consensus_tme)-1),]res_xcell=res_xcell[1:(nrow(res_xcell)-3),]res_CIBERSORT=res_CIBERSORT[1:(nrow(res_CIBERSORT)-3),]

随后进行合并

res=rbind(res_abis,res_consensus_tme,res_epic,res_quantiseq,res_timer,res_xcell,res_CIBERSORT)res=as.matrix(res)rownames(res)=res[,1]exp=res[,2:ncol(res)]dimnames=list(rownames(exp),colnames(exp))data=matrix(as.numeric(as.matrix(exp)),nrow=nrow(exp),dimnames=dimnames)

7.进行相关性分析

#输入要研究的基因gene="TP53"gene_exp=exprMatrix[gene,]gene_exp=t(gene_exp)
samesample=intersect(rownames(gene_exp),colnames(data))gene_exp=gene_exp[samesample,]gene_exp=as.data.frame(gene_exp)colnames(gene_exp)=genedata=data[,samesample]
x=as.numeric(gene_exp[,1])outTab=data.frame()for(i in rownames(data)){  y=as.numeric(data[i,])  if(sd(y)<0.001){next}  corT=cor.test(x, y, method="spearman")  cor=corT$estimate  pvalue=corT$p.value  if(pvalue<0.05){    outTab=rbind(outTab,cbind(immune=i, cor, pvalue))  }}#输出相关性结果write.table(file="corResult.txt", outTab, sep="\t", quote=F, row.names=F)

分析得到目标基因与每个免疫细胞的相关性和p值

8.进行可视化

#绘制气泡图corResult=read.table("corResult.txt", head=T, sep="\t")corResult$Software=sapply(strsplit(corResult[,1],"_"), '[', 2)corResult$Software=factor(corResult$Software,level=as.character(unique(corResult$Software[rev(order(as.character(corResult$Software)))])))b=corResult[order(corResult$Software),]b$immune=factor(b$immune,levels=rev(as.character(b$immune)))colslabels=rep(hue_pal()(length(levels(b$Software))),table(b$Software))     #定义图形颜色#输出图形pdf(file="correlation.pdf", width=9, height=10)ggplot(data=b, aes(x=cor, y=immune, color=Software))+  labs(x="Correlation coefficient",y="Immune cell")+  geom_point(size=4.1)+  theme(panel.background=element_rect(fill="white",size=1,color="black"),        panel.grid=element_line(color="grey75",size=0.5),        axis.ticks = element_line(size=0.5),        axis.text.y = ggtext::element_markdown(colour=rev(colslabels)))dev.off()

最后便得到目标基因与免疫细胞的相关性图

👇效率优化👇

单个算法的局限性倒是解决了，代码效率低下的问题却貌似越来越大

这份乳腺癌的数据跑下来快花了2个小时

仔细观察代码，每个算法都是调用同一个数据，但是一个分析完成后才能跑下一个分析，这大大的浪费了电脑资源

看了官方文档，这个包没有设置并行参数，但是我们可以曲线救国👇

即将每个分析都丢在后台，并将结果输出到环境中，这样就可以一次性同时跑8个分析

当然，最大的后腿还是低效的CIBERSORT

这里也很简单，只要将关键的循环进行多线程改写就能得到很好的效果👇

👇👇👇

最后运行下来，不到20分钟就能将所有分析完成，这个提升还是蛮大的

当然，具体提升程度还需要看机器本身的硬件条件

多个算法的免疫浸润实现，以及代码效率的提升就给同学们介绍到这

感谢同学们的关注及支持

多线程的示例代码及文件可在下方获取

👇感谢关注支持👇

👇扫码获取示例文件及代码👇

1元 = 10微信豆

可试读97%

1元 = 10微信豆

喜欢此内容的人还喜欢

微信扫一扫
关注该公众号