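The Cancer Cell Line Encyclopedia (CCLE) is a database that provides in-depth multi-omics profiles of more than a thousand cancer cell lines. In pan-cancer analysis, it is often used to examine the expression of a target gene across tumors of different tissue origins and in the corresponding tumor cell lines, which helps flesh out the gene-expression part of the analysis.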
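1) On the home page of the official website, click DATASETS to open the data page
2) Then click CCLE data
3) Find the download entry, File Downloads
4) Download the two required files (OmicsExpressionProteinCodingGenesTPMLogp1.csv and Model.csv, the two files read by the code in this post)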
1) Load the R packages
    library(data.table)    # fread() for fast CSV reading
    library(ggplot2)       # plotting
    library(Hmisc)         # capitalize()
    library(RColorBrewer)  # colour palettes
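2) Read and prepare the expression matrix
    # The first column holds the model IDs; the remaining columns are genes
    dat <- data.table::fread("OmicsExpressionProteinCodingGenesTPMLogp1.csv", data.table = FALSE)
    rownames(dat) <- dat[, 1]
    dat <- dat[, -1]
    # Transpose so that genes are in rows and models are in columns
    exp <- t(dat)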
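To make the orientation of the matrix concrete, here is a minimal sketch on a toy table. It assumes, as in the step above, that the first column holds model IDs and the remaining columns are genes; the IDs and values are invented for illustration.

```r
# Toy stand-in for the expression CSV: first column = model ID, other columns = genes.
# All IDs and values are made up.
dat <- data.frame(
  ModelID = c("ACH-000001", "ACH-000002", "ACH-000003"),
  `TP53 (7157)` = c(4.2, 5.1, 3.3),
  `TP53BP1 (7158)` = c(2.0, 2.5, 1.9),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

rownames(dat) <- dat[, 1]  # use the model IDs as row names
dat <- dat[, -1]           # drop the ID column, leaving only numeric gene columns
exp <- t(dat)              # transpose: genes in rows, models in columns

dim(exp)
rownames(exp)
```

After the transpose, a single gene is one row of `exp`, which is what the later extraction step relies on.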
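3) Read the cell line annotation file
    clinical <- data.table::fread("Model.csv", data.table = FALSE)
    # Keep only the models present in both tables, in the same order
    ModelID <- intersect(colnames(exp), clinical$ModelID)
    exp <- exp[, ModelID]
    clinical <- clinical[match(ModelID, clinical$ModelID), ]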
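The alignment in this step is what allows expression values and annotations to be bound together column by row later on. A small sketch with invented model IDs shows the idea; none of the names below come from the real files.

```r
# Toy expression matrix (genes x models) and annotation table; all values invented.
exp <- matrix(1:6, nrow = 2,
              dimnames = list(c("GENE_A", "GENE_B"), c("M1", "M2", "M3")))
clinical <- data.frame(ModelID = c("M3", "M9", "M1"),
                       CellLineName = c("line3", "line9", "line1"),
                       stringsAsFactors = FALSE)

# Models present in both tables, in the column order of the matrix
ModelID <- intersect(colnames(exp), clinical$ModelID)

# Subset the matrix and reorder the annotation rows to match
exp <- exp[, ModelID, drop = FALSE]
clinical <- clinical[match(ModelID, clinical$ModelID), ]

# Columns of exp and rows of clinical now describe the same models in the same order
identical(colnames(exp), clinical$ModelID)
```

`drop = FALSE` is used here so the result stays a matrix even if only one model is shared.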
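4) Extract the expression of the target gene
    gene <- "TP53"
    # Row names are assumed to look like "TP53 (7157)". Match the symbol exactly,
    # so that genes such as TP53BP1 are not picked up as well.
    idx <- which(sub(" \\(.*$", "", rownames(exp)) == gene)
    pdat <- data.frame(Expression = as.numeric(exp[idx, ]), clinical)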
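A gene symbol used as a bare search pattern can match more than one row, because many symbols share a prefix. The sketch below compares a loose pattern match with an exact comparison on the symbol. It assumes row names in a "SYMBOL (EntrezID)" style; the names listed are only illustrative.

```r
# Row names in the assumed "SYMBOL (EntrezID)" style; illustrative only.
gene_rows <- c("TP53 (7157)", "TP53BP1 (7158)", "TP53I3 (9540)", "BRCA1 (672)")
gene <- "TP53"

# A bare pattern matches every row that contains the string
loose <- grep(gene, gene_rows)

# Stripping the " (EntrezID)" suffix allows an exact comparison on the symbol
symbols <- sub(" \\(.*$", "", gene_rows)
exact <- which(symbols == gene)

length(loose)  # 3
length(exact)  # 1
```

With more than one matching row, `as.numeric()` on the selected rows would return several genes' values in one vector, so the exact comparison is the safer choice.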
5) Tidy the tissue labels and remove non-cancerous models
    pdat$SampleCollectionSite <- gsub("_", " ", pdat$SampleCollectionSite)
    pdat$SampleCollectionSite <- capitalize(pdat$SampleCollectionSite)
    # Filter first, so that no empty "Non-Cancerous" factor level is left behind
    pdat <- pdat[!(pdat$OncotreePrimaryDisease %in% "Non-Cancerous"), ]
    pdat$OncotreePrimaryDisease <- factor(pdat$OncotreePrimaryDisease, levels = sort(unique(pdat$OncotreePrimaryDisease)))
    pdat$SampleCollectionSite <- factor(pdat$SampleCollectionSite, levels = sort(unique(pdat$SampleCollectionSite)))
6) Visualization: expression of the target gene across tissues
    p <- ggplot(pdat, aes(x = SampleCollectionSite, y = Expression, fill = SampleCollectionSite)) +
      geom_boxplot() +
      labs(title = "Sample Collection Site", y = paste0(gene, " Expression (log2(TPM+1))")) +
      theme_bw() +
      theme(axis.text.x = element_text(vjust = 1, hjust = 1, angle = 45, size = 14),
            axis.text.y = element_text(size = 14),
            axis.title.x = element_blank(),
            axis.title.y = element_text(size = 16),
            axis.line = element_line(linewidth = 1),  # ggplot2 >= 3.4; use size = 1 on older versions
            plot.title = element_text(hjust = 0.5, size = 18),
            legend.position = "none")
    pdf("All_tissue_CCLE.pdf", width = 15, height = 10)
    print(p)  # explicit print, so the plot is also written when the script is source()d
    dev.off()
7) Visualization: expression of the target gene in the cell lines of a chosen tissue
    tissue <- "Breast"  # breast as an example
    pdat2 <- pdat[which(pdat$SampleCollectionSite == tissue), ]
    # One fill colour per cell line
    fill_cols <- colorRampPalette(brewer.pal(12, "Set3"))(length(unique(pdat2$CellLineName)))
    p2 <- ggplot(pdat2, aes(x = CellLineName, y = Expression, fill = CellLineName)) +
      geom_bar(stat = "identity", position = "dodge") +
      labs(title = paste0(tissue, " Cell Lines"), y = paste0(gene, " Expression (log2(TPM+1))")) +
      # Start the axis at 0 and let the upper limit follow the data
      scale_y_continuous(expand = expansion(mult = c(0, 0.05)), limits = c(0, NA)) +
      scale_fill_manual(values = fill_cols) +
      theme_bw() +
      theme(axis.text.x = element_text(vjust = 1, hjust = 1, angle = 45, size = 14),
            axis.text.y = element_text(size = 14),
            axis.title.x = element_blank(),
            axis.title.y = element_text(size = 16),
            axis.line = element_line(linewidth = 1),  # ggplot2 >= 3.4; use size = 1 on older versions
            plot.title = element_text(hjust = 0.5, size = 18),
            legend.position = "none")
    pdf(paste0(tissue, "CCLE.pdf"), width = 15, height = 10)
    print(p2)
    dev.off()
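The bar chart needs one fill colour per cell line, and the number of cell lines differs from tissue to tissue. One possible way to get a palette of arbitrary length, sketched here with base R only, is to interpolate between a few anchor colours. The helper name `make_palette`, the anchor colours, and the cell line names are all choices made for this illustration.

```r
# Build n fill colours by interpolating between a few anchor colours.
# The anchor colours are an arbitrary choice for illustration.
make_palette <- function(n) {
  anchors <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")
  grDevices::colorRampPalette(anchors)(n)
}

# Illustrative cell line names
cell_lines <- c("MCF7", "T47D", "MDA-MB-231", "BT-474")

fill_cols <- make_palette(length(cell_lines))
names(fill_cols) <- cell_lines  # a named vector maps each colour to one cell line

fill_cols
```

A named vector like `fill_cols` can be passed to `scale_fill_manual(values = ...)`, which matches colours to fill levels by name.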
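① This is not general-purpose code
② Some background in R is required
③ Readers with no R experience are advised not to obtain and run it on their own
④ No Q&A or debugging support is provided
⑤ After obtaining the code, please follow the steps in this post carefully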