装饰热图

在绘制热图之后,仍会保留每个热图组件的绘图区域。因此,可以通过获取组件的绘图区域来添加或修改图形

可以使用 list_components() 获取热图的所有组件名称或 viewport 名称,例如

set.seed(123)
mat <- matrix(rnorm(80, 2), 8, 10)
mat <- rbind(mat, matrix(rnorm(40, -2), 4, 10))
rownames(mat) <- paste0("R", 1:12)
colnames(mat) <- paste0("C", 1:10)

ha_column1 <- HeatmapAnnotation(
  points = anno_points(rnorm(10)), 
  annotation_name_side = "left"
)
ht1 <- Heatmap(
  mat, name = "ht1", km = 2, 
  column_title = "Heatmap 1", 
  top_annotation = ha_column1, 
  row_names_side = "left"
)
ha_column2 <- HeatmapAnnotation(
  type = c(rep("a", 5), rep("b", 5)),
  col = list(type = c("a" = "red", "b" = "blue"))
)
ht2 <- Heatmap(
  mat, name = "ht2", row_title = "Heatmap 2", 
  column_title = "Heatmap 2",
  bottom_annotation = ha_column2, column_km = 2
)

ht_list <- ht1 + ht2 + 
  rowAnnotation(
    bar = anno_barplot(
      rowMeans(mat), width = unit(2, "cm"))
)
draw(
  ht_list, row_title = "Heatmap list", 
  column_title = "Heatmap list",
)

列出所有组件

> list_components()
 [1] "ROOT"                           "global"                         "global_layout"                 
 [4] "global-heatmaplist"             "main_heatmap_list"              "heatmap_ht1"                   
 [7] "ht1_heatmap_body_wrap"          "ht1_heatmap_body_1_1"           "ht1_heatmap_body_2_1"          
[10] "ht1_column_title_1"             "ht1_row_title_1"                "ht1_row_title_2"               
[13] "ht1_dend_row_1"                 "ht1_dend_row_2"                 "ht1_dend_column_1"             
[16] "ht1_row_names_1"                "ht1_row_names_2"                "ht1_column_names_1"            
[19] "annotation_points_1"            "heatmap_ht2"                    "ht2_heatmap_body_wrap"         
[22] "ht2_heatmap_body_1_1"           "ht2_heatmap_body_1_2"           "ht2_heatmap_body_2_1"          
[25] "ht2_heatmap_body_2_2"           "ht2_column_title_1"             "ht2_dend_column_1"             
[28] "ht2_dend_column_2"              "ht2_column_names_1"             "ht2_column_names_2"            
[31] "annotation_type_1"              "annotation_type_2"              "heatmap_heatmap_annotation_2"  
[34] "annotation_bar_1"               "annotation_bar_2"               "global-column_title_top"       
[37] "global_column_title"            "global-row_title_left"          "global_row_title"              
[40] "global-heatmap_legend_right"    "heatmap_legend"                 "global-annotation_legend_right"
[43] "annotation_legend"             

1. 装饰函数

获取了所有组件的 viewport 名称之后,我们可以使用 seekViewport() 来获取对应的 viewport。

为了避免使用繁琐的 viewport 名称,ComplexHeatmap 包提供了 decorate_*() 形式的简便函数

  • decorate_heatmap_body():装饰热图主体
  • decorate_annotation():注释
  • decorate_dend():树状图
  • decorate_title():标题
  • decorate_dimnames():行列名称
  • decorate_row_names():行名,相当于 decorate_dimnames(…, which = “row”)
  • decorate_column_names():列名,相当于 decorate_dimnames(…, which = “column”)
  • decorate_row_dend():行树状图,相当于 decorate_dend(…, which = “row”)
  • decorate_column_dend():列树状图,相当于 decorate_dend(…, which = “column”)
  • decorate_row_title():行标题,相当于 decorate_title(…, which = “row”)
  • decorate_column_title():列标题,相当于 to decorate_title(…, which = “column”)

这些函数需要传入热图或注释的名称、绘图块,如果热图分块了,需要设置分块索引

例如,对于下面的热图

ht_list <- draw(ht_list, row_title = "Heatmap list", column_title = "Heatmap list", 
               heatmap_legend_side = "right", annotation_legend_side = "left")

在第一个热图的第二个分块中,文本和线条注释

decorate_heatmap_body("ht1", {
  grid.text("outlier", 1.5/10, 2.5/4, default.units = "npc")
  grid.lines(
    c(0.5, 0.5), c(0, 1), 
    gp = gpar(lty = 2, lwd = 2, col = "green"))
}, slice = 2)

为第一个热图的树形图的不同类别添加不同的背景颜色

decorate_column_dend("ht1", {
  # 获取树状图
  tree = column_dend(ht_list)$ht1[[1]]
  # 将树状图分为两类
  ind = cutree(as.hclust(tree), k = 2)[order.dendrogram(tree)]
  # 获取每类的起始索引
  first_index = function(l) which(l)[1]
  # 获取没类的终止索引
  last_index = function(l) { x = which(l); x[length(x)] }
  x1 = c(first_index(ind == 1), first_index(ind == 2)) - 1
  x2 = c(last_index(ind == 1), last_index(ind == 2))
  # 为没类设置矩形背景色
  grid.rect(x = x1/length(ind), width = (x2 - x1)/length(ind), just = "left",
            default.units = "npc", gp = gpar(fill = c("#FF000040", "#00FF0040"), col = NA))
})

为第一个热图的第二个分块的行名,设置背景填充色

decorate_row_names("ht1", {
  grid.rect(gp = gpar(fill = "#FF000040"))
}, slice = 2)

为第一个热图的第一个分块的行标题设置背景填充色

decorate_row_title("ht1", {
  grid.rect(gp = gpar(fill = "#00FF0040"))
}, slice = 1)

在点图注释中添加一条水平线

decorate_annotation("points", {
  grid.lines(
    c(0, 1), unit(c(0, 0), "native"), 
    gp = gpar(col = "red"))
})

oncoprint

oncoprint 是一种通过热图的方式来可视化多个基因组变异事件。ComplexHeatmap 包提供了 oncoPrint() 函数来绘制这种类型的图。

默认的样式是 cBioPortal 格式,我们也可以根据需要不同类型的图形

1. 输入数据格式

输入数据可以有两种格式:矩阵和矩阵列表

1.1 矩阵

对于矩阵类型的数据,行代表的是基因,列表示的是样本,矩阵的值表示基因在样本中方式的变异类型。例如

mat <- read.table(
  textConnection(
    "s1,s2,s3
    g1,snv;indel,snv,indel
    g2,,snv;indel,snv
    g3,snv,,indel;snv"
  ),
  row.names = 1,
  header = TRUE,
  sep = ",",
  stringsAsFactors = FALSE
)
mat <- as.matrix(mat)

对于这种字符型矩阵,还需要定义相应的变异类型提取函数,例如

> get_type_fun <- function(x) unlist(strsplit(x, ";"))
> get_type_fun(mat[1,1])
[1] "snv"   "indel"

如果变异类型的编码方式为 snv|indel,只需把函数定义为按 | 分割就行

get_type_fun <- function(x) unlist(strsplit(x, "|"))

然后将该函数传递给 oncoPrint() 函数的 get_type 参数。

对于常见的分隔符:;:,|oncoPrint 会自动解析,不需要指定解析函数

alter_fun 参数可以自定义每种变异类型在热图单元格中的绘制函数,函数接受 4 个参数,其中 xy 用于标识格子位置,wh 用于标识格子的大小,并使用 col 来标注颜色

col = c(snv = "#fb8072", indel = "#80b1d3")
oncoPrint(
  mat, alter_fun = list(
    snv = function(x, y, w, h) 
      grid.rect(
        x, y, w*0.9, h*0.9,
        gp = gpar(fill = col["snv"],col = NA)),
    indel = function(x, y, w, h) 
      grid.rect(
        x, y, w*0.9, h*0.4,
        gp = gpar(fill = col["indel"], col = NA))
    ), 
  col = col
)

注意:如果 alter_fun 设置为列表,元素的顺序会影响图形绘制顺序,先定义的先绘制

1.2 矩阵列表

第二种格式是矩阵列表,列表中的每种突变类型对应一个矩阵,矩阵仅包含 0-1 值,用于标识基因在样本中是否发生了这种类型的突变,并且列表名称与变异类型对应

> mat_list <- list(
+   snv = matrix(c(1, 0, 1, 1, 1, 0, 0, 1, 1), nrow = 3),
+   indel = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 0), nrow = 3))
> 
> rownames(mat_list$snv) <- rownames(mat_list$indel) <- c("g1", "g2", "g3")
> colnames(mat_list$snv) <- colnames(mat_list$indel) <- c("s1", "s2", "s3")
> mat_list
$snv
   s1 s2 s3
g1  1  1  0
g2  0  1  1
g3  1  0  1

$indel
   s1 s2 s3
g1  1  0  1
g2  0  1  0
g3  0  0  0

需要保证所有矩阵具有相同的行名和列名

col = c(snv = "#fb8072", indel = "#80b1d3")
oncoPrint(
  mat_list, alter_fun = list(
    snv = function(x, y, w, h) 
      grid.rect(
        x, y, w*0.9, h*0.9,
        gp = gpar(fill = col["snv"],col = NA)),
    indel = function(x, y, w, h) 
      grid.rect(
        x, y, w*0.9, h*0.4,
        gp = gpar(fill = col["indel"], col = NA))
    ), 
  col = col
)

2. 定义 alter_fun

alter_fun 不仅可以传递一个函数列表,还可以传递一个函数,该函数多了一个参数,用于传递一个逻辑值向量,用于标识当前基因在当前样本中是否发生了对应的变异

oncoPrint(
  mat, alter_fun = function(x, y, w, h, v) {
    if (v["snv"])
      grid.rect(x, y, w * 0.9, h * 0.9, 
                gp = gpar(fill = col["snv"], col = NA))
    if (v["indel"])
      grid.rect(x, y, w * 0.9, h * 0.4,
                gp = gpar(fill = col["indel"], col = NA))
  }, 
  col = col
)

设置为单个函数,可以更灵活进行自定义

oncoPrint(
  mat, alter_fun = function(x, y, w, h, v) {
    n = sum(v) # 发生变异的数量
    h = h * 0.9
    if (n)
      grid.rect(x,
                y - h * 0.5 + 1:n / n * h,
                w * 0.9,
                1 / n * h,
                gp = gpar(fill = col[names(which(v))], col = NA),
                just = "top")
  }, col = col)

设置为三角形填充

oncoPrint(
  mat,
  alter_fun = list(
    # 控制背景的绘制,通常在放在第一个
    background = function(x, y, w, h) {
      grid.polygon(
        unit.c(x - 0.5 * w, x - 0.5 * w, x + 0.5 * w),
        unit.c(y - 0.5 * h, y + 0.5 * h, y - 0.5 * h),
        gp = gpar(fill = "grey", col = "white")
      )
      grid.polygon(
        unit.c(x + 0.5 * w, x + 0.5 * w, x - 0.5 * w),
        unit.c(y + 0.5 * h, y - 0.5 * h, y + 0.5 * h),
        gp = gpar(fill = "grey", col = "white")
      )
    },
    snv = function(x, y, w, h) {
      grid.polygon(
        unit.c(x - 0.5 * w, x - 0.5 * w, x + 0.5 * w),
        unit.c(y - 0.5 * h, y + 0.5 * h, y - 0.5 * h),
        gp = gpar(fill = col["snv"], col = "white")
      )
    },
    indel = function(x, y, w, h) {
      grid.polygon(
        unit.c(x + 0.5 * w, x + 0.5 * w, x - 0.5 * w),
        unit.c(y + 0.5 * h, y - 0.5 * h, y + 0.5 * h),
        gp = gpar(fill = col["indel"], col = "white")
      )
    }
  ),
  col = col
)

在上面的例子中,我们添加了一个背景设置 background。背景需要放置第一个,如果想要删除背景,可以设置

background = function(...) NULL

在某些情况下,我们可能需要设置的变异类型较多,为了确保我们 alter_fun 设置正确,可以使用 test_alter_fun() 函数来进行测试。例如

alter_fun <- list(
  mut1 = function(x, y, w, h) 
    grid.rect(x, y, w, h, gp = gpar(fill = "red", col = NA)),
  mut2 = function(x, y, w, h) 
    grid.rect(x, y, w, h, gp = gpar(fill = "blue", col = NA)),
  mut3 = function(x, y, w, h) 
    grid.rect(x, y, w, h, gp = gpar(fill = "yellow", col = NA)),
  mut4 = function(x, y, w, h) 
    grid.rect(x, y, w, h, gp = gpar(fill = "purple", col = NA)),
  mut5 = function(x, y, w, h) 
    grid.rect(x, y, w, h, gp = gpar(fill = NA, lwd = 2)),
  mut6 = function(x, y, w, h) 
    grid.points(x, y, pch = 16),
  mut7 = function(x, y, w, h) 
    grid.segments(x - w*0.5, y - h*0.5, x + w*0.5, y + h*0.5, gp = gpar(lwd = 2))
)
test_alter_fun(alter_fun)

3. 简化 alter_fun

如果只要绘制简单图形,如 矩形和散点图,可以使用 alter_graphic() 函数

oncoPrint(
  mat,
  alter_fun = list(
    snv = alter_graphic(
      "rect",
      width = 0.9,
      height = 0.9,
      fill = col["snv"]
    ),
    indel = alter_graphic(
      "rect",
      width = 0.9,
      height = 0.4,
      fill = col["indel"]
    )
  ),
  col = col
)

4. 复杂变异类型

大多数时候,我们需要展示的变异类型并不是单单一两种,可能会有很多种,如果单单用颜色来区分的话比较困难。

而且,有些变异类型是我们比较关注的,而其他的一些次要的变异类型没那么重要,就有一个主次关系。

例如,snvindel 变异类型中又包含 intronic snvexonic snvintronic indelexonic indel。主分类应该是 snvindel,次分类是 intronicexonic

所以,我们可以为主分类设置同样类型的图形,比如说,设置不同的颜色来区分;而次分类设置为不同的符号类型。

对于下面的数据

type <- c("snv;intronic", "snv;exonic", "indel;intronic", "indel;exonic", "")
m <- matrix(
  sample(type, size = 100, replace = TRUE),
  nrow = 10, ncol = 10,
  dimnames = list(paste0("g", 1:10), paste0("s", 1:10))
  )

定义 alter_fun

alter_fun <- list(
  # 设置背景
  background = function(x, y, w, h) 
    grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "#CCCCCC", col = NA)),
  # SNV 颜色
  snv = function(x, y, w, h) 
    grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "#fb8072", col = NA)),
  # indel 颜色
  indel = function(x, y, w, h) 
    grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "#80b1d3", col = NA)),
  # 内含子设置为点
  intronic = function(x, y, w, h) 
    grid.points(x, y, pch = 16),
  # 外显子设置为 X
  exonic = function(x, y, w, h) {
    grid.segments(x - w*0.4, y - h*0.4, x + w*0.4, y + h*0.4, gp = gpar(lwd = 2))
    grid.segments(x + w*0.4, y - h*0.4, x - w*0.4, y + h*0.4, gp = gpar(lwd = 2))
  }
)

绘制

oncoPrint(m, alter_fun = alter_fun, col = c(snv = "#fb8072", indel = "#80b1d3"))

5. 其他参数设置

oncoPrint 本质上也是热图,所以很多热图的参数都可以使用,例如,显示列名

alter_fun <- list(
  snv = function(x, y, w, h)
    grid.rect(x, y, w * 0.9, h * 0.9,
              gp = gpar(fill = col["snv"], col = NA)),
  indel = function(x, y, w, h)
    grid.rect(x, y, w * 0.9, h * 0.4,
              gp = gpar(fill = col["indel"], col = NA))
)

oncoPrint(
  mat, alter_fun = alter_fun, 
  col = col, show_column_names = TRUE
)

行名和百分比文本的显示可以使用 show_pctshow_row_names,位置可以使用 pct_siderow_names_side 设置,百分比精确度可以使用 pct_digits

oncoPrint(
  mat,
  alter_fun = alter_fun,
  col = col,
  row_names_side = "left",
  pct_side = "right",
  pct_digits = 2
)

使用 anno_oncoprint_barplot() 注释函数来控制条形图

oncoPrint(
  mat,
  alter_fun = alter_fun,
  col = col,
  top_annotation = HeatmapAnnotation(cbar = anno_oncoprint_barplot(height = unit(1, "cm"))),
  right_annotation = rowAnnotation(rbar = anno_oncoprint_barplot(
    width = unit(4, "cm"),
    axis_param = list(
      at = c(0, 2, 4),
      labels = c("zero", "two", "four"),
      side = "top",
      labels_rot = 0
    )
  )),
)

或者,把右边的条形图往左边放放

oncoPrint(
  mat,
  alter_fun = alter_fun,
  col = col,
  left_annotation =  rowAnnotation(rbar = anno_oncoprint_barplot(axis_param = list(direction = "reverse"))),
  right_annotation = NULL
)

应用实例

我们使用 ComplexHeatmap 包中提供的数据,该数据来自于 cBioPortal 数据库

mat <- read.table(
   system.file(
     "extdata",
     package = "ComplexHeatmap",
     "tcga_lung_adenocarcinoma_provisional_ras_raf_mek_jnk_signalling.txt"
   ),
   header = TRUE,
   stringsAsFactors = FALSE,
   sep = "\t"
 )
mat[is.na(mat)] <- ""
rownames(mat) <- mat[, 1]
mat <- mat[,-1]
mat <-  mat[,-ncol(mat)]
mat <- t(as.matrix(mat))

该数据包含 Ras-Raf-MEK-Erk/JNK signaling 通路中的 26 个基因在 172 个肺腺癌样本中的突变即 CNV 变异信息

> mat[1:3,1:3]
     TCGA-05-4384-01 TCGA-05-4390-01 TCGA-05-4425-01
KRAS "  "            "MUT;"          "  "           
HRAS "  "            "  "            "  "           
BRAF "  "            "  "            "  " 

数据中包含 3 种变异:MUTAMPHOMDEL,现在,我们为每种变异类型定义图形

col <- c("HOMDEL" = "#ff7f00", "AMP" = "#984ea3", "MUT" = "#4daf4a")
alter_fun = list(
  background = alter_graphic("rect", fill = "#CCCCCC"),
  HOMDEL = alter_graphic("rect", fill = col["HOMDEL"]),
  AMP = alter_graphic("rect", fill = col["AMP"]),
  MUT = alter_graphic("rect", height = 0.33, fill = col["MUT"])
)

我们只是设置格子的颜色,所以可以使用 alter_graphic 来设置

设置列标题和图例

column_title <- "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling"
heatmap_legend_param <-
  list(
    title = "Alternations",
    at = c("HOMDEL", "AMP", "MUT"),
    labels = c("Deep deletion", "Amplification", "Mutation")
  )

绘制图片

oncoPrint(
  mat, alter_fun = alter_fun, col = col,
  column_title = column_title,
  heatmap_legend_param = heatmap_legend_param
)

我们可以看到,有很多空白的行和列,删掉它们

oncoPrint(
  mat, alter_fun = alter_fun, col = col,
  remove_empty_columns = TRUE, 
  remove_empty_rows = TRUE,
  column_title = column_title,
  heatmap_legend_param = heatmap_legend_param
)

row_ordercolumn_order 可以设置行、列的顺序

oncoPrint(
  mat, alter_fun = alter_fun, col = col,
  column_title = column_title,
  row_order = 1:nrow(mat),
  remove_empty_columns = TRUE, 
  remove_empty_rows = TRUE,
  heatmap_legend_param = heatmap_legend_param
)

我们可以使用 anno_oncoprint_barplot() 来修改条形图注释,且条形图默认都是显示变异的数量,可以在设置 show_fraction = TRUE 来显示频率

oncoPrint(
  mat,
  alter_fun = alter_fun,
  col = col,
  # 上方条形图只显示 MUT 的频率
  top_annotation = HeatmapAnnotation(
    column_barplot = anno_oncoprint_barplot(
      "MUT", border = TRUE,
      show_fraction = TRUE,
      height = unit(4, "cm")
  )),
  # 右侧条形图显示 AMP 和 HOMDEL
  right_annotation = rowAnnotation(
    row_barplot = anno_oncoprint_barplot(
      c("AMP", "HOMDEL"),
      border = TRUE,
      height = unit(4, "cm"),
      axis_param = list(side = "bottom", labels_rot = 90)
  )),
  remove_empty_columns = TRUE,
  remove_empty_rows = TRUE,
  column_title = column_title,
  heatmap_legend_param = heatmap_legend_param
)

类似于热图,我们可以使用 HeatmapAnnotation()rowAnnotation() 来添加行列注释

oncoPrint(
  mat,
  alter_fun = alter_fun,
  col = col,
  remove_empty_columns = TRUE,
  remove_empty_rows = TRUE,
  top_annotation = HeatmapAnnotation(
    cbar = anno_oncoprint_barplot(),
    foo1 = 1:172,
    bar1 = anno_points(1:172)
  ),
  left_annotation = rowAnnotation(foo2 = 1:26),
  right_annotation = rowAnnotation(bar2 = anno_barplot(1:26)),
  column_title = column_title,
  heatmap_legend_param = heatmap_legend_param
)

起始 oncoPrint() 返回的是 Heatmap 对象,所以,我们可以在水平或竖直方向上添加热图或注释

ht_list <- oncoPrint(
  mat,
  alter_fun = alter_fun,
  col = col,
  column_title = column_title,
  heatmap_legend_param = heatmap_legend_param
) +
  Heatmap(
    matrix(rnorm(nrow(mat) * 10), ncol = 10),
    name = "expr",
    col = colorRamp2(c(-3, 0, 3), c("#8c510a", "white", "#01665e")),
    width = unit(4, "cm")
  )
draw(ht_list)

Logo

永洪科技,致力于打造全球领先的数据技术厂商,具备从数据应用方案咨询、BI、AIGC智能分析、数字孪生、数据资产、数据治理、数据实施的端到端大数据价值服务能力。

更多推荐