Heatmaps offer a compelling way to visualize matrix data, transforming numbers into colors to reveal patterns, clusters, and correlations at a glance. When combined with annotations, they become even more powerful, allowing you to layer contextual information onto your visualization. This guide provides a detailed R code example using the built-in `mtcars` dataset to create an informative and visually appealing heatmap with annotations.
Why Use the `mtcars` Dataset?
The `mtcars` dataset, sourced from the 1974 Motor Trend US magazine, provides data on fuel consumption and 10 aspects of automobile design and performance for 32 cars. It's a classic dataset for statistical analysis and visualization in R.
A heatmap is particularly well-suited for exploring `mtcars` because:
- It contains several numeric variables (`mpg`, `hp`, `wt`, `disp`) measured on different scales.
- It includes grouping variables, such as the number of cylinders (`cyl`) or transmission type (`am`), that work well as annotations, helping to explain observed patterns.
Before creating the heatmap, it's often essential to scale the data. The `mtcars` dataset includes variables with vastly different ranges (e.g., `hp` ranges from 52 to 335, while `wt` is recorded in units of 1,000 lb and ranges from about 1.5 to 5.4). Without scaling, variables with larger values would dominate the color representation, potentially obscuring patterns in variables with smaller values. Scaling (typically to Z-scores, with a mean of 0 and standard deviation of 1) ensures that each variable contributes proportionally to the visualization and clustering.
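To make the effect of scaling concrete, the following minimal sketch compares the raw ranges of `hp` and `wt` with their Z-scored versions. It uses only base R; the variable names `raw_ranges`, `scaled`, `col_means`, and `col_sds` are illustrative and not part of the guide's main code.

```r
# Compare the raw ranges of two variables measured on very different scales
raw_ranges <- sapply(mtcars[, c("hp", "wt")], range)
print(raw_ranges)

# scale() centers each column to mean 0 and rescales it to standard deviation 1
scaled <- scale(as.matrix(mtcars[, c("hp", "wt")]))

# After scaling, both columns should have mean ~0 and standard deviation 1
col_means <- colMeans(scaled)
col_sds <- apply(scaled, 2, sd)
print(round(col_means, 10))
print(col_sds)
```

After scaling, both variables live on the same unitless scale, so neither dominates the color mapping or the distance calculations used for clustering.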
We'll use the `pheatmap` package, a popular choice in R for creating visually appealing and informative heatmaps with built-in support for clustering and annotations. It strikes a good balance between ease of use and customization options.
First, ensure you have the `pheatmap` package installed and loaded. Then, load the `mtcars` dataset, which is built into R.
```r
# Install pheatmap if you haven't already
# install.packages("pheatmap")

# Load the package
library(pheatmap)

# Load the built-in mtcars dataset
data(mtcars)
```
Select the numeric columns suitable for the heatmap and scale them.
```r
# All mtcars columns are numeric, so the whole data frame converts cleanly.
# This example uses every column, but you could subset if needed.
mtcars_matrix <- as.matrix(mtcars)

# Scale the data (center each column and scale it to unit variance)
mtcars_scaled <- scale(mtcars_matrix)
```
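If you prefer to subset, one option is to keep only the continuous measurements in the heatmap body and reserve variables such as `cyl` and `am` for annotations. The sketch below is an optional variation, not the approach used in the rest of this guide; the names `continuous_vars` and `mtcars_cont_scaled` are hypothetical.

```r
# Illustrative choice: keep the continuous measurements and leave out
# cyl, vs, am, gear, and carb, which are categorical or small counts
continuous_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# Subset, convert to a matrix, and scale in one step
mtcars_cont_scaled <- scale(as.matrix(mtcars[, continuous_vars]))

# The car names are preserved as row names, which annotations rely on
dim(mtcars_cont_scaled)
head(rownames(mtcars_cont_scaled), 3)
```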
We'll create annotations for the rows (cars) based on the number of cylinders (`cyl`) and transmission type (`am`). Annotations are provided as a data frame where row names must match the row names of the data matrix (`mtcars_scaled` in this case).
```r
# Create a data frame for row annotations
annotation_row_df <- data.frame(
  Cylinders = factor(mtcars$cyl),  # Convert 'cyl' to a factor
  Transmission = factor(mtcars$am, labels = c("Automatic", "Manual"))  # 0 = Automatic, 1 = Manual
)

# IMPORTANT: Set row names of the annotation data frame to match the data matrix
rownames(annotation_row_df) <- rownames(mtcars)
```
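Because a mismatch in row names is an easy mistake to make, a quick sanity check before plotting can help. The following self-contained sketch rebuilds the two objects and verifies that they describe the same set of cars; `names_match` and `group_counts` are illustrative names.

```r
# Rebuild the scaled matrix and the annotation data frame
mtcars_scaled <- scale(as.matrix(mtcars))
annotation_row_df <- data.frame(
  Cylinders = factor(mtcars$cyl),
  Transmission = factor(mtcars$am, labels = c("Automatic", "Manual")),
  row.names = rownames(mtcars)
)

# Every matrix row should have an annotation row, and vice versa
names_match <- setequal(rownames(annotation_row_df), rownames(mtcars_scaled))
stopifnot(names_match)

# Cross-tabulate the two annotations to see how many cars fall in each group
group_counts <- table(annotation_row_df$Cylinders, annotation_row_df$Transmission)
print(group_counts)
```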
Specify colors for different levels within your annotation factors. This makes the annotations much easier to interpret.
```r
# Create a list defining colors for each annotation level
annotation_colors_list <- list(
  Cylinders = c("4" = "#66c2a5", "6" = "#fc8d62", "8" = "#8da0cb"),  # Colors for 4, 6, 8 cylinders
  Transmission = c("Automatic" = "#e78ac3", "Manual" = "#a6d854")    # Colors for transmission types
)
```
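The names in each color vector need to line up with the factor levels of the corresponding annotation column. As a hedged sketch of how you might verify this, the snippet below lists any levels without an assigned color; `missing_colors` is a hypothetical helper variable.

```r
# Rebuild the annotation data frame and color list
annotation_row_df <- data.frame(
  Cylinders = factor(mtcars$cyl),
  Transmission = factor(mtcars$am, labels = c("Automatic", "Manual")),
  row.names = rownames(mtcars)
)
annotation_colors_list <- list(
  Cylinders = c("4" = "#66c2a5", "6" = "#fc8d62", "8" = "#8da0cb"),
  Transmission = c("Automatic" = "#e78ac3", "Manual" = "#a6d854")
)

# For each annotation column, find factor levels that have no color assigned
missing_colors <- lapply(names(annotation_row_df), function(col) {
  setdiff(levels(annotation_row_df[[col]]), names(annotation_colors_list[[col]]))
})
names(missing_colors) <- names(annotation_row_df)

# Empty character vectors mean every level is covered
print(missing_colors)
```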
Now, use the `pheatmap()` function with the prepared data and annotations.
```r
# Generate the heatmap
pheatmap(
  mtcars_scaled,
  main = "Annotated Heatmap of Scaled mtcars Dataset",  # Title
  annotation_row = annotation_row_df,                   # Add row annotations
  annotation_colors = annotation_colors_list,           # Apply custom annotation colors
  cluster_rows = TRUE,                                  # Cluster rows (cars)
  cluster_cols = TRUE,                                  # Cluster columns (variables)
  show_rownames = TRUE,                                 # Show car names
  show_colnames = TRUE,                                 # Show variable names
  fontsize_row = 8,                                     # Adjust row label font size
  fontsize_col = 10,                                    # Adjust column label font size
  color = colorRampPalette(c("navy", "white", "firebrick3"))(50)  # Heatmap color gradient
)
```
The resulting heatmap visualizes the scaled `mtcars` data. Colors represent scaled values: with the palette above, low values appear in navy and high values in red. Dendrograms on the sides show how rows (cars) and columns (variables) are clustered based on similarity. The annotation bars alongside the rows provide immediate visual context related to cylinder count and transmission type, allowing you to see if these factors align with the observed clusters.
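To check numerically whether the row clusters line up with cylinder count, you can cut the row dendrogram into groups and cross-tabulate them against `cyl`. The sketch below uses base R's `hclust()` with Euclidean distance and complete linkage, on the assumption that these match `pheatmap`'s default clustering settings; the choice of `k = 3` is arbitrary and only for illustration.

```r
# Scale the data as in the main example
mtcars_scaled <- scale(as.matrix(mtcars))

# Assumption: Euclidean distance + complete linkage mirrors pheatmap's defaults
row_tree <- hclust(dist(mtcars_scaled, method = "euclidean"), method = "complete")

# Cut the dendrogram into k groups (k = 3 is an illustrative choice)
row_clusters <- cutree(row_tree, k = 3)

# Cross-tabulate cluster membership against the number of cylinders
cluster_vs_cyl <- table(Cluster = row_clusters, Cylinders = mtcars$cyl)
print(cluster_vs_cyl)
```

If most cars with the same cylinder count land in the same cluster, the annotation bar and the dendrogram are telling a consistent story.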
An example heatmap generated in R, demonstrating clustered data visualization.
Creating an annotated heatmap involves several distinct steps. The outline below summarizes the typical workflow, from initial data preparation to the final customized visualization.
1. Load the `mtcars` dataset and the `pheatmap` package
2. Prepare Data
   - Select relevant columns (if needed)
   - Convert to matrix
   - Scale data (e.g., Z-score)
3. Create Annotations
   - Define annotation data frame
   - Match row/column names
   - Define annotation colors (optional)
4. Generate Heatmap
   - Choose package (e.g., `pheatmap`)
   - Call heatmap function
   - Pass data and annotations
5. Customize & Interpret
   - Adjust colors, labels, clustering
   - Analyze patterns and clusters
   - Relate patterns to annotations
   - Save plot (optional)
This structured approach ensures all necessary components are considered, leading to a meaningful and interpretable heatmap visualization.
`pheatmap` vs. `ComplexHeatmap`
While `pheatmap` is excellent for many standard heatmap tasks, the `ComplexHeatmap` package offers significantly more power and flexibility, especially for complex annotations and multi-heatmap layouts. In short, `pheatmap` excels in ease of use for standard tasks, while `ComplexHeatmap` provides superior flexibility for intricate annotations and layouts, albeit with a steeper learning curve. For the annotated `mtcars` heatmap built in this guide, `pheatmap` is sufficient and simpler to implement.
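For comparison, here is a rough sketch of how a similar annotated heatmap might look with `ComplexHeatmap`. It assumes the package is installed (it is distributed through Bioconductor) and is guarded so the code is skipped otherwise; the legend title `"z-score"` and the variable names `row_ha` and `ht` are illustrative choices, not part of the original example.

```r
# Rebuild the inputs used in the pheatmap example
mtcars_scaled <- scale(as.matrix(mtcars))
annotation_row_df <- data.frame(
  Cylinders = factor(mtcars$cyl),
  Transmission = factor(mtcars$am, labels = c("Automatic", "Manual")),
  row.names = rownames(mtcars)
)
annotation_colors_list <- list(
  Cylinders = c("4" = "#66c2a5", "6" = "#fc8d62", "8" = "#8da0cb"),
  Transmission = c("Automatic" = "#e78ac3", "Manual" = "#a6d854")
)

# Only run the plotting code if ComplexHeatmap is available
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  # Row annotations are built as a separate object and attached to the heatmap
  row_ha <- ComplexHeatmap::rowAnnotation(
    df = annotation_row_df,
    col = annotation_colors_list
  )
  ht <- ComplexHeatmap::Heatmap(
    mtcars_scaled,
    name = "z-score",                        # Legend title for the heatmap body
    left_annotation = row_ha,                # Place annotation bars on the left
    row_names_gp = grid::gpar(fontsize = 8)  # Smaller row labels
  )
  ComplexHeatmap::draw(ht)
}
```

The extra step of building an annotation object is part of what makes `ComplexHeatmap` more verbose, but it is also what allows several annotations and heatmaps to be combined in one layout.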
The `pheatmap()` function offers numerous arguments to customize the appearance and behavior of your heatmap. Here are some commonly used options:
| Parameter | Description | Example Usage |
|---|---|---|
| `color` | Specifies the color palette for the heatmap body. | `color = colorRampPalette(rev(RColorBrewer::brewer.pal(n = 7, name = "RdYlBu")))(100)` |
| `scale` | Specifies if values should be centered and scaled ('row', 'column', or 'none'). | `scale = "row"` |
| `cluster_rows` / `cluster_cols` | Boolean indicating whether to cluster rows or columns. | `cluster_rows = FALSE` |
| `clustering_distance_rows` / `clustering_distance_cols` | Distance measure used for clustering ('correlation', 'euclidean', etc.). | `clustering_distance_rows = "correlation"` |
| `clustering_method` | Clustering method used ('ward.D2', 'average', 'complete', etc.). | `clustering_method = "ward.D2"` |
| `annotation_col` | Data frame for column annotations (requires matching column names). | `annotation_col = col_annot_df` |
| `display_numbers` | Boolean indicating whether to display the numeric values on the heatmap cells. | `display_numbers = TRUE` |
| `number_color` | Color for the numbers displayed on cells. | `number_color = "black"` |
| `fontsize`, `fontsize_row`, `fontsize_col` | Controls the font size for general text, row labels, and column labels. | `fontsize = 10, fontsize_row = 7` |
| `filename` | Specifies a file path to save the heatmap (e.g., "my_heatmap.png"). | `filename = "mtcars_heatmap.pdf"` |
| `border_color` | Color of cell borders. Set to `NA` to remove borders. | `border_color = "grey60"` |
Refer to the `pheatmap` documentation (`?pheatmap` in R) for a complete list of options.
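As a hedged illustration of how several of these options combine, the sketch below adds a column annotation, switches the row distance to correlation, removes cell borders, and writes the result to a file. The `Kind` grouping of variables is a hypothetical classification made up for this example, and the output path in a temporary directory is likewise an assumption; the plotting call is guarded so it only runs when `pheatmap` is installed.

```r
# Scale the data as in the main example
mtcars_scaled <- scale(as.matrix(mtcars))

# Hypothetical column annotation: label each variable as discrete or continuous.
# Row names of this data frame must match the column names of the matrix.
discrete_vars <- c("cyl", "vs", "am", "gear", "carb")
col_annot_df <- data.frame(
  Kind = factor(ifelse(colnames(mtcars_scaled) %in% discrete_vars,
                       "Discrete", "Continuous")),
  row.names = colnames(mtcars_scaled)
)

# Illustrative output location; change it to wherever you want the file saved
out_file <- file.path(tempdir(), "mtcars_heatmap.pdf")

if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(
    mtcars_scaled,
    annotation_col = col_annot_df,             # Annotation bar above the columns
    clustering_distance_rows = "correlation",  # Correlation-based row distance
    clustering_method = "average",             # Average linkage
    border_color = NA,                         # No cell borders
    fontsize_row = 7,                          # Smaller row labels
    filename = out_file                        # Save to file instead of the screen
  )
}
```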