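Can't get your GSEA plots to look right? A hands-on guide to customizing enrichment plots with R's enrichplot package (colors, layout, and labels explained)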
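In bioinformatics, gene set enrichment analysis (GSEA) is a powerful tool for revealing how sets of genes change under specific biological conditions. After running the analysis, however, many researchers hit the same problem: the default GSEA plots fall short of what a publication or presentation needs, in both appearance and clarity. This article shows how to use the enrichplot package in R to polish and customize GSEA plots, from color schemes and layout to labels and advanced decoration, building up to a professional-looking figure step by step.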
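1. Preparation and environment setup
Before polishing any plots, make sure you have completed the basic GSEA and have a gseaResult object in hand. This article assumes you are familiar with the basics of the clusterProfiler package and have already produced GSEA results. We use the gseaplot2 function from enrichplot as the starting point; it is one of the most commonly used tools for visualizing GSEA results.
First, check for and load the required R packages: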

    # Install BiocManager if needed, then the packages used in this article
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install(c("clusterProfiler", "enrichplot", "ggplot2", "colorspace"))

    library(clusterProfiler)
    library(enrichplot)
    library(ggplot2)
    library(colorspace)

For the demonstrations that follow, we create a simulated GSEA result object:

    # Simulated GSEA result, for demonstration only
    set.seed(123)
    geneList <- sort(rnorm(1000), decreasing = TRUE)
    names(geneList) <- paste0("gene", 1:1000)

    pathway_ids <- c("pathway1", "pathway2", "pathway3")

    gsea_result <- new("gseaResult",
        # The statistics below are made up. gseaplot2 recomputes the running
        # score from geneList and geneSets; the p-value table reads from here.
        result = data.frame(
            ID = pathway_ids,
            Description = c("Metabolic pathway", "Signaling pathway", "Cell cycle"),
            setSize = c(50, 80, 120),
            enrichmentScore = c(0.6, -0.55, 0.7),
            NES = c(1.8, -1.6, 2.0),
            pvalue = c(0.001, 0.002, 0.0005),
            p.adjust = c(0.01, 0.015, 0.008),
            qvalues = c(0.009, 0.013, 0.007),
            rank = c(200, 300, 150),
            leading_edge = c("tags=30%, list=20%, signal=28%",
                             "tags=25%, list=30%, signal=23%",
                             "tags=35%, list=15%, signal=33%"),
            core_enrichment = c("gene1,gene2,gene3", "gene4,gene5,gene6", "gene7,gene8,gene9"),
            # Rows are looked up by gene set ID, as in a real gseaResult
            row.names = pathway_ids
        ),
        geneList = geneList,
        geneSets = list(
            pathway1 = paste0("gene", 1:50),
            pathway2 = paste0("gene", 51:130),
            pathway3 = paste0("gene", 131:250)
        ),
        organism = "hsa",
        setType = "KEGG",
        keytype = "ENTREZID",
        permScores = matrix(runif(300), ncol = 3),
        params = list(
            exponent = 1,
            nPerm = 1000,
            minGSSize = 10,
            maxGSSize = 500,
            pvalueCutoff = 0.05,
            pAdjustMethod = "BH"
        )
    )

2. Basic plots and parameters
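Let's start with the basic usage of gseaplot2 and see what each parameter does. A plot produced by gseaplot2 usually contains three subplots:
- Running enrichment score curve: shows how the gene set is enriched along the ranked gene list
- Gene set member positions: marks where the genes of the set fall in the ranked list
- Ranking metric plot: shows the distribution of the ranking metric (e.g. logFC) across all genes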
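To make the first two panels concrete, here is a minimal base-R sketch of how a running enrichment score can be computed. It is a simplified re-implementation for illustration, not enrichplot's own code: the helper name running_es is hypothetical, and it assumes the usual weighted running-sum definition with an exponent of 1.

```r
# Hypothetical helper: running enrichment score along a ranked gene list.
running_es <- function(gene_list, gene_set, exponent = 1) {
  hit <- names(gene_list) %in% gene_set
  weight <- abs(gene_list)^exponent
  # Hits step up in proportion to their ranking metric; misses step down evenly
  step_up <- cumsum(ifelse(hit, weight, 0)) / sum(weight[hit])
  step_down <- cumsum(!hit) / sum(!hit)
  step_up - step_down
}

# Toy data: 1000 ranked genes, with the gene set made of the top 50
set.seed(123)
gene_list <- sort(rnorm(1000), decreasing = TRUE)
names(gene_list) <- paste0("gene", 1:1000)
gene_set <- paste0("gene", 1:50)

score <- running_es(gene_list, gene_set)
peak <- which.max(abs(score))   # rank at which the score is most extreme
es <- score[[peak]]             # enrichment score = most extreme running value

# Panel 1: running score curve; panel 2 (as a rug): positions of set members
plot(seq_along(score), score, type = "l",
     xlab = "Rank in ordered gene list", ylab = "Running enrichment score")
abline(v = peak, lty = 2)
rug(which(names(gene_list) %in% gene_set))
```

Because the toy gene set consists of the 50 top-ranked genes, the curve climbs to 1 at rank 50 and then declines steadily; real gene sets give far less extreme curves.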
The most basic GSEA plot can be generated with:

    gseaplot2(gsea_result, geneSetID = "pathway1", title = "Basic GSEA Plot")

2.1 Core parameters
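gseaplot2 provides several parameters that control the appearance and content of the plot:
- geneSetID: the gene set(s) to visualize, given as a single ID or a vector of IDs
- title: the main title; multi-line titles are supported (separate lines with \n)
- color: the color of the enrichment curve(s), a single color or a vector of colors
- base_size: the base font size, which scales all text elements in the plot
- rel_heights: the relative heights of the three subplots, default c(1.5, 0.5, 1)
- subplots: which subplots to show, e.g. 1:3 for all of them, c(1, 3) for the first and third
- pvalue_table: whether to add a table of p-values and adjusted p-values to the plot
- ES_geom: the geometry of the enrichment score curve, "line" or "dot"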
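Since rel_heights holds relative values, it can help to translate them into actual shares of the figure. The sketch below is plain arithmetic in base R; panel_heights is a hypothetical helper, the 6-inch figure height is an assumed example, and the result ignores titles and margins, so treat it as an approximation.

```r
# Hypothetical helper: turn relative heights into shares and approximate inches.
panel_heights <- function(rel_heights, fig_height_in) {
  share <- rel_heights / sum(rel_heights)
  data.frame(
    panel = seq_along(rel_heights),
    share = round(share, 3),
    height_in = round(share * fig_height_in, 2)
  )
}

# Default layout versus one that emphasizes the running-score panel
default_layout <- panel_heights(c(1.5, 0.5, 1), fig_height_in = 6)
emphasized_layout <- panel_heights(c(2, 0.3, 0.7), fig_height_in = 6)

print(default_layout)
print(emphasized_layout)
```

With the default values the first panel takes half of the figure height; only the ratios matter, so c(3, 1, 2) would give the same layout as c(1.5, 0.5, 1).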
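2.2 Showing multiple pathways and customizing colors
To show the enrichment results of several pathways at once, pass a vector of colors to the color parameter, one per pathway: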

    gseaplot2(gsea_result,
              geneSetID = c("pathway1", "pathway2", "pathway3"),
              color = c("#E41A1C", "#377EB8", "#4DAF4A"),
              title = "Multiple Pathways GSEA Plot")

For more elaborate color schemes, use the palettes provided by the colorspace package:

    library(colorspace)

    # Three colors from the "Geyser" diverging palette
    custom_pal <- divergingx_hcl(3, palette = "Geyser")

    gseaplot2(gsea_result,
              geneSetID = c("pathway1", "pathway2", "pathway3"),
              color = custom_pal,
              title = "Custom Color Scheme GSEA Plot")

3. Advanced layout and style adjustments
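3.1 Subplot layout
The rel_heights parameter adjusts the relative heights of the three subplots. For example, to emphasize the enrichment score curve and de-emphasize the other two subplots: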

    gseaplot2(gsea_result,
              geneSetID = "pathway1",
              rel_heights = c(2, 0.3, 0.7),
              title = "Adjusted Subplot Heights")

To show only the enrichment score curve and the gene position plot:

    gseaplot2(gsea_result,
              geneSetID = "pathway1",
              subplots = c(1, 2),
              rel_heights = c(2, 0.5),
              title = "Selected Subplots Only")

3.2 Fonts and text styling
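The base_size parameter adjusts font sizes across the whole plot. For finer control we can use the ggplot2 theme system. Theme elements added with + apply to a single ggplot object, so this example requests only the running-score panel (subplots = 1), for which gseaplot2 returns a plain ggplot:

    # subplots = 1 returns a single ggplot, so theme() applies to it directly
    gseaplot2(gsea_result,
              geneSetID = "pathway1",
              subplots = 1,
              base_size = 14,
              title = "Increased Base Font Size") +
        theme(plot.title = element_text(face = "bold", hjust = 0.5),
              axis.title = element_text(size = 12),
              axis.text = element_text(size = 10))

3.3 Styling the p-value table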
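When pvalue_table = TRUE, a table of enrichment statistics (the p-value and adjusted p-value of each selected gene set) is added to the plot. The table is drawn inside the running-score panel, so giving that panel more height helps keep the table clear of the curves:

    gseaplot2(gsea_result,
              geneSetID = c("pathway1", "pathway2", "pathway3"),
              pvalue_table = TRUE,
              rel_heights = c(2, 0.5, 1),  # extra room for the panel that holds the table
              title = "GSEA Plot with P-value Table")

The row labels of the table are taken from the Description column of the result, so a simple way to customize them is to edit the descriptions on a copy of the result before plotting:

    # gsea_short is just an illustrative name for the edited copy
    gsea_short <- gsea_result
    gsea_short@result$Description <- c("Metabolism", "Signaling", "Cell cycle")

    p <- gseaplot2(gsea_short,
                   geneSetID = c("pathway1", "pathway2", "pathway3"),
                   pvalue_table = TRUE)
    print(p)

4. Advanced decoration with ggplot2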
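gseaplot2 builds its panels with ggplot2. When a single subplot is requested (for example subplots = 1), it returns an ordinary ggplot object, so any ggplot2 function can be added to it with +. When several subplots are shown, the return value is a composite of several ggplots, and how + behaves on it depends on the enrichplot version, so the examples in this section work on a single panel.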
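One caveat worth checking in practice is that + needs a single ggplot object on its left-hand side. The sketch below shows one way to guard for that before decorating a plot. It uses a stand-in ggplot built from made-up data rather than real gseaplot2 output, and decorate_if_ggplot is a hypothetical helper, not part of enrichplot or ggplot2.

```r
library(ggplot2)

# Hypothetical helper: add ggplot2 components only when `p` is a ggplot object.
decorate_if_ggplot <- function(p, ...) {
  if (!inherits(p, "ggplot")) {
    warning("Not a single ggplot object; returning it unchanged.")
    return(p)
  }
  # Add each component in turn, starting from the plot itself
  Reduce(`+`, list(...), p)
}

# Stand-in for a single-panel plot (illustrative data, not GSEA output)
demo_plot <- ggplot(data.frame(rank = 1:100, score = sin((1:100) / 16)),
                    aes(rank, score)) +
  geom_line()

styled <- decorate_if_ggplot(demo_plot,
                             theme_minimal(base_size = 12),
                             ggtitle("Styled single panel"))

# A non-ggplot object, such as a plain list of plots, passes through untouched
untouched <- suppressWarnings(
  decorate_if_ggplot(list(demo_plot, demo_plot), theme_minimal())
)
```

The same check, inherits(p, "ggplot"), can be run on whatever gseaplot2 returns in your installed version to see whether + will act on a single panel.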
4.1 Adding reference lines and annotations
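
    # The x axis of gseaplot2 is the rank in the ordered gene list (1..N),
    # so the midpoint is half the length of the gene list
    mid_rank <- length(gsea_result@geneList) / 2

    # subplots = 1 returns a single ggplot, so layers can be added with +
    gseaplot2(gsea_result, geneSetID = "pathway1", subplots = 1) +
        geom_vline(xintercept = mid_rank, linetype = "dashed", color = "gray50") +
        annotate("text", x = mid_rank, y = 0.8, label = "Midpoint",
                 angle = 90, vjust = -0.5, color = "gray30")

4.2 Modifying axes and legend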
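
    n_genes <- length(gsea_result@geneList)

    # subplots = 1 returns a single ggplot, so scales and guides can be added with +
    gseaplot2(gsea_result,
              geneSetID = c("pathway1", "pathway2", "pathway3"),
              subplots = 1) +
        # Relabel rank positions as percentages of the ranked list
        scale_x_continuous(breaks = seq(0, n_genes, length.out = 6),
                           labels = paste0(seq(0, 100, by = 20), "%"),
                           expand = c(0, 0)) +
        scale_y_continuous(sec.axis = dup_axis(name = "Secondary Axis")) +
        # Use size = 2 instead of linewidth = 2 on ggplot2 versions before 3.4
        guides(color = guide_legend(title = "Pathway",
                                    override.aes = list(linewidth = 2))) +
        # Make sure the legend title is drawn (gseaplot2's own theme may blank it)
        theme(legend.title = element_text())

4.3 Full customization with the theme system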

    custom_theme <- theme_minimal(base_size = 12) +
        theme(plot.title = element_text(face = "bold", hjust = 0.5),
              panel.grid.major = element_line(color = "gray90"),
              panel.grid.minor = element_blank(),
              legend.position = "bottom",
              legend.box = "horizontal",
              axis.line = element_line(color = "black"),
              strip.background = element_rect(fill = "gray95"))

    # Applied to a single panel (subplots = 1), which is a plain ggplot object
    gseaplot2(gsea_result,
              geneSetID = c("pathway1", "pathway2", "pathway3"),
              subplots = 1) +
        custom_theme

5. Case study: building a publication-quality GSEA plot
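Combining the techniques above, let's create a high-quality GSEA plot suitable for academic publication.
5.1 Color scheme
Choose a color scheme that suits print. The palette below is the first three colors of ColorBrewer's Dark2 set, which is designed to be colorblind-friendly. Its three colors have similar lightness, though, so do not rely on color alone if the figure may be printed in black and white: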

    pub_palette <- c("#1B9E77", "#D95F02", "#7570B3")  # Colorblind-friendly palette

5.2 Layout and styling
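Before assembling the final figure, it can be worth checking how the palette defined above would separate in grayscale. The sketch below uses only base R and approximates perceived brightness with the Rec. 601 luma weights (0.299, 0.587, 0.114); gray_luma is a hypothetical helper, and luma is only a rough proxy for how a printer renders gray.

```r
# Hypothetical helper: approximate grayscale brightness (0-255) of each color.
gray_luma <- function(colors) {
  rgb_values <- grDevices::col2rgb(colors)       # 3 x n matrix: red, green, blue
  luma <- as.vector(c(0.299, 0.587, 0.114) %*% rgb_values)
  names(luma) <- colors
  luma
}

pub_palette <- c("#1B9E77", "#D95F02", "#7570B3")
luma <- gray_luma(pub_palette)

print(round(luma, 1))
print(diff(range(luma)))   # spread between the darkest and lightest color
```

For this palette the three values land within a few units of each other on the 0 to 255 scale, so the curves would look nearly identical in grayscale. If black-and-white printing matters, picking colors with clearly different luma values is one option.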

    # All styling goes through gseaplot2's own arguments: with three subplots the
    # return value is a composite, so theme() or scale_*() calls added with +
    # would not reliably reach the individual panels.
    final_plot <- gseaplot2(gsea_result,
                            geneSetID = c("pathway1", "pathway2", "pathway3"),
                            color = pub_palette,
                            rel_heights = c(1.8, 0.4, 0.8),
                            pvalue_table = TRUE,
                            title = "Gene Set Enrichment Analysis\nTop Significant Pathways",
                            base_size = 12)

    print(final_plot)

5.3 Exporting high-quality figures
For academic publication you usually need to export a high-resolution vector or bitmap image:
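
    # PDF (vector format, good for editing and print)
    ggsave("GSEA_plot.pdf", plot = final_plot,
           width = 8, height = 6, device = cairo_pdf)

    # TIFF (high-resolution bitmap, good for journal submission)
    ggsave("GSEA_plot.tiff", plot = final_plot,
           width = 8, height = 6, dpi = 600, compression = "lzw")

    # PNG (for web display)
    ggsave("GSEA_plot.png", plot = final_plot,
           width = 8, height = 6, dpi = 300)

In real projects, I have found that a width of 8 to 9 inches and a height of 6 to 7 inches usually works well. For complex plots with several pathways, increasing the height helps keep elements from overlapping.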
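To judge whether a bitmap export is large enough, or needlessly large, the size in inches and the dpi can be converted to pixel dimensions. The sketch below is plain arithmetic in base R; export_pixels is a hypothetical helper, and the uncompressed size assumes 3 bytes per pixel (8-bit RGB), which the actual file will usually undercut thanks to compression.

```r
# Hypothetical helper: pixel dimensions and rough uncompressed size of a bitmap.
export_pixels <- function(width_in, height_in, dpi) {
  width_px <- width_in * dpi
  height_px <- height_in * dpi
  c(width_px = width_px,
    height_px = height_px,
    raw_mib = width_px * height_px * 3 / 1024^2)   # 3 bytes per pixel, assumed
}

tiff_px <- export_pixels(8, 6, dpi = 600)   # settings of the TIFF export
png_px <- export_pixels(8, 6, dpi = 300)    # settings of the PNG export

print(round(rbind(tiff = tiff_px, png = png_px), 1))
```

Doubling the dpi doubles each pixel dimension and quadruples the pixel count, which is why the 600 dpi TIFF is far heavier than the 300 dpi PNG at the same size in inches.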
