bartools: scRNA-seq analysis guide
Henrietta Holze
September 05, 2023
Bartools single-cell guide
The bartools package contains methods to simplify clone-level analyses
from single-cell cellular barcoding datasets. The purpose of this
vignette is to highlight these capabilities of bartools.
0. Setup
This vignette makes use of a simple test SingleCellExperiment (SCE)
object included in the bartools package. The dataset contains 100 cells,
each with 100 randomly sampled genes. Each cell has lineage barcode
information annotated in the test.sce$barcode field.
Dimensionality reduction and clustering were previously performed using
Seurat. Cluster assignments are in the test.sce$seurat_clusters field.
Load the bartools package
library(bartools)
## Loading required package: edgeR
## Loading required package: limma
## Loading required package: ggplot2
knitr::opts_chunk$set(dev="png")
Load the test.sce dataset
data(test.sce)
test.sce
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following object is masked from 'package:limma':
##
## plotMA
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, aperm, append, as.data.frame, basename, cbind,
## colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
## get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
## match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
## Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
##
## Attaching package: 'SingleCellExperiment'
## The following object is masked from 'package:edgeR':
##
## cpm
## class: SingleCellExperiment
## dim: 100 100
## metadata(0):
## assays(3): counts logcounts scaledata
## rownames(100): Mrpl15 Lypla1 ... Tmem131 Cnga3
## rowData names(0):
## colnames(100): GCTACCTAGAGGCCAT ACTGCAAGTGATTCAC ... CGAGTTACAGCGACAA
## TTTATGCGTACAGTTC
## colData names(27): orig.ident nCount_RNA ... seurat_clusters ident
## reducedDimNames(3): PCA TSNE UMAP
## mainExpName: RNA
## altExpNames(1): HTO
1. Analyse dataset metrics using plotMetrics
Single cell RNA sequencing datasets can reveal biologically important
transcriptional differences between groups of cells or cell types. The
plotMetrics function takes a single cell object in Seurat or
SingleCellExperiment format and plots a chosen continuous variable (e.g.
number of transcripts or genes detected per cell) split across any
groups of interest. These groups could be clusters, cell types or, with
lineage barcode information, individual clones. Thus, plotMetrics lets
us go one step further and examine biological differences between
individual groups of cells.
Here, the group parameter defines a grouping variable, present as a
column of metadata in the single cell object. The factor parameter
defines a continuous variable to plot for each level of the group
parameter. The threshold parameter defines the minimum number of cells
required for a level of the group parameter to be included.
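To make the roles of group, factor and threshold concrete, the following base R sketch mimics the filtering and per-group summary on a toy metadata table. It is not the bartools implementation: the data frame meta, its values and the helper summarise_metric() are hypothetical, and it assumes that a level is kept when it has at least threshold cells.

```r
# Toy stand-in for the cell-level metadata of a single cell object
# (hypothetical values, not taken from test.sce).
meta <- data.frame(
  seurat_clusters = c("0", "0", "0", "1", "1", "2"),
  nCount_RNA      = c(1200, 900, 1500, 300, 500, 800)
)

# Hypothetical helper: keep levels of `group` with at least `threshold`
# cells, then return the median of `factor` for each remaining level.
summarise_metric <- function(meta, group, factor, threshold) {
  cells_per_group <- table(meta[[group]])
  keep <- names(cells_per_group)[cells_per_group >= threshold]
  kept <- meta[meta[[group]] %in% keep, , drop = FALSE]
  tapply(kept[[factor]], kept[[group]], median)
}

metric_summary <- summarise_metric(
  meta,
  group = "seurat_clusters",
  factor = "nCount_RNA",
  threshold = 2
)
print(metric_summary)
```

Cluster "2" contains a single cell, so it falls below the threshold and is dropped before the summary is computed.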
plotMetrics - clusters
plotMetrics(test.sce, group = "seurat_clusters", factor = "nCount_RNA", threshold = 10)
Using the trans parameter, the x axis in plotMetrics can be transformed
using standard methods available within R.
plotMetrics(test.sce, group = "seurat_clusters", factor = "nCount_RNA", threshold = 10, trans = "log10")
plotMetrics - barcodes
plotMetrics accepts any grouping variable available in the sample
metadata. Here we examine transcriptional differences between clones
using lineage barcode information.
NB: In this test dataset the number of cells per clone is small.
plotMetrics(test.sce, group = "barcode", factor = "nCount_RNA", threshold = 2)
2. Examine number of cells per grouping variable - plotCellsPerGroup
We may also be interested in basic metrics, such as how many cells
comprise each level of a grouping variable of interest (cluster, cell
type or lineage barcode). The plotCellsPerGroup function allows this to
be easily plotted. Groups above a user-defined threshold are
highlighted.
Here, the group parameter defines a grouping variable, present as a
column of metadata in the single cell object. The threshold parameter
defines the minimum number of cells. Levels of the group parameter above
this threshold will be labelled.
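The counting that underlies such a plot can be sketched in base R as follows. The barcode vector and the threshold value are hypothetical, and the strict comparison (">") is an assumption based on the wording "above this threshold"; this is an illustration rather than the code used by plotCellsPerGroup.

```r
# Hypothetical barcode assignments for eight cells.
barcode <- c("BC_1", "BC_1", "BC_1", "BC_2", "BC_2", "BC_3", "BC_1", "BC_2")

# Count cells per barcode and sort from the largest to the smallest clone.
cells_per_group <- sort(table(barcode), decreasing = TRUE)

# Flag the groups whose size exceeds the threshold (assumed strict, ">").
threshold <- 2
counts <- data.frame(
  group    = names(cells_per_group),
  n_cells  = as.integer(cells_per_group),
  labelled = as.integer(cells_per_group) > threshold
)
print(counts)
```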
Plot cells per cluster
plotCellsPerGroup(test.sce, group = "seurat_clusters", threshold = 5)
Plot cells per lineage barcode
plotCellsPerGroup(test.sce, group = "barcode", threshold = 3)
3. Plot distribution of cells across clusters - plotCellsInClusters
We may also be interested in the distribution of cells within certain
groups across levels of another group. For example, we may want to
reveal the proportion of cells in a certain cell cycle phase across
Louvain clusters within a single cell dataset. The plotCellsInClusters
function allows users to examine these questions.
Here, the group parameter defines a grouping variable, present as a
column of metadata in the single cell object. The factor parameter
defines the level of group whose percentage abundance is calculated
within each level of clusters. The clusters parameter defines a second
grouping variable, present as a column of metadata in the single cell
object.
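As a rough illustration of the calculation described here, the base R sketch below counts, for each level of the second grouping variable, how many cells belong to the chosen level of group and expresses this as a percentage of that cluster. The metadata values are hypothetical, and reading "percentage abundance" as the share of each cluster is an assumption about the function, not a statement of its implementation.

```r
# Hypothetical metadata: cell cycle phase and cluster for ten cells.
meta <- data.frame(
  Phase           = c("G1", "G1", "S", "G2M", "G1", "S", "G1", "G1", "S", "G1"),
  seurat_clusters = c("0",  "0",  "0", "0",   "1",  "1", "1",  "2",  "2", "2")
)

# Use a factor so that clusters without any matching cell are kept as zeros.
cluster <- factor(meta$seurat_clusters)

# Cells per cluster overall, and cells per cluster in the chosen level ("G1").
total_per_cluster <- table(cluster)
level_per_cluster <- table(cluster[meta$Phase == "G1"])

# Percentage of each cluster made up of the chosen level.
pct_in_cluster <- 100 * as.vector(level_per_cluster) / as.vector(total_per_cluster)
names(pct_in_cluster) <- levels(cluster)
print(round(pct_in_cluster, 1))
```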
Cell cycle phase across clusters
plotCellsInClusters(test.sce, group = "Phase", factor = "G1", clusters = "seurat_clusters")
## # A tibble: 7 × 2
## seurat_clusters n
## <fct> <int>
## 1 0 17
## 2 1 15
## 3 2 15
## 4 3 6
## 5 4 4
## 6 5 3
## 7 7 1
## Warning in ggplot2::geom_histogram(stat = "identity"): Ignoring unknown
## parameters: `binwidth`, `bins`, and `pad`
The plotCellsInClusters framework can extend to any discrete variables
present in the dataset. Here we examine the representation of lineage
barcode BC_12904 across clusters.
plotCellsInClusters(test.sce, group = "barcode", factor = "BC_12904", clusters = "seurat_clusters")
## # A tibble: 3 × 2
## seurat_clusters n
## <fct> <int>
## 1 0 2
## 2 1 1
## 3 7 1
## Warning in ggplot2::geom_histogram(stat = "identity"): Ignoring unknown
## parameters: `binwidth`, `bins`, and `pad`
4. Determine enrichment within clusters - plotClusterEnrichment
To check whether a clone is enriched in a cluster or cell type, we can
perform a hypergeometric test using the plotClusterEnrichment function.
Here we test for enrichment of cells in G2M cell cycle phase across
Louvain clusters.
Here, the group parameter defines a grouping variable, present as a
column of metadata in the single cell object. The factor parameter
defines a level of the group parameter to test for enrichment within
each level of clusters. The clusters parameter defines a second grouping
variable; enrichment of factor is tested at each of its levels. The
threshold parameter defines a p-value threshold for the hypergeometric
test.
plotClusterEnrichment(
  test.sce,
  group = "Phase",
  factor = "G2M",
  clusters = "seurat_clusters",
  threshold = 0.01,
  order = TRUE,
  plot = TRUE
)
## The following ident levels had no observations and were removed: 6
## ---
## cluster_0
## all cells: 100
## all G2M cells: 2
## universe: 98
## cluster total cells: 24
## cluster G2M cells: 0
## [1] "Hypergeometric test p-value: 0.424242424242424"
## ---
## cluster_1
## all cells: 100
## all G2M cells: 2
## universe: 98
## cluster total cells: 27
## cluster G2M cells: 0
## [1] "Hypergeometric test p-value: 0.469090909090909"
## ---
## cluster_2
## all cells: 100
## all G2M cells: 2
## universe: 98
## cluster total cells: 28
## cluster G2M cells: 1
## [1] "Hypergeometric test p-value: 0.0763636363636364"
## ---
## cluster_3
## all cells: 100
## all G2M cells: 2
## universe: 98
## cluster total cells: 10
## cluster G2M cells: 1
## [1] "Hypergeometric test p-value: 0.0090909090909091"
## ---
## cluster_4
## all cells: 100
## all G2M cells: 2
## universe: 98
## cluster total cells: 5
## cluster G2M cells: 0
## [1] "Hypergeometric test p-value: 0.0979797979797982"
## ---
## cluster_5
## all cells: 100
## all G2M cells: 2
## universe: 98
## cluster total cells: 5
## cluster G2M cells: 0
## [1] "Hypergeometric test p-value: 0.0979797979797982"
## ---
## cluster_7
## all cells: 100
## all G2M cells: 2
## universe: 98
## cluster total cells: 1
## cluster G2M cells: 0
## [1] "Hypergeometric test p-value: 0.0200000000000001"
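The p-values printed above can be reproduced with the base R function phyper() from the counts shown for each cluster. The sketch below copies those counts by hand; that plotClusterEnrichment computes its p-values with exactly this call is an assumption, although the numbers agree.

```r
# Counts copied from the printed output above.
all_cells     <- 100
all_g2m       <- 2
universe      <- all_cells - all_g2m   # 98 cells that are not in G2M
cluster_total <- c(cluster_0 = 24, cluster_1 = 27, cluster_2 = 28,
                   cluster_3 = 10, cluster_4 = 5, cluster_5 = 5,
                   cluster_7 = 1)
cluster_g2m   <- c(0, 0, 1, 1, 0, 0, 0)

# Upper tail of the hypergeometric distribution: P(X > observed G2M cells).
p_values <- phyper(cluster_g2m, all_g2m, universe, cluster_total,
                   lower.tail = FALSE)
names(p_values) <- names(cluster_total)
print(signif(p_values, 3))

# Clusters passing the p-value threshold used in the call above.
enriched <- names(p_values)[p_values < 0.01]
print(enriched)
```

Note that phyper(q, ..., lower.tail = FALSE) returns P(X > q), the probability of strictly more than the observed number of cells. A test for "the observed number or more" would pass q - 1 instead.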
5. Session Info
## R version 4.2.2 (2022-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS 14.1.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellExperiment_1.20.1 SummarizedExperiment_1.28.0
## [3] Biobase_2.58.0 GenomicRanges_1.50.2
## [5] GenomeInfoDb_1.34.9 IRanges_2.32.0
## [7] S4Vectors_0.36.2 BiocGenerics_0.44.0
## [9] MatrixGenerics_1.10.0 matrixStats_1.0.0
## [11] bartools_1.0.0 ggplot2_3.4.4
## [13] edgeR_3.40.2 limma_3.54.2
##
## loaded via a namespace (and not attached):
## [1] viridis_0.6.4 sass_0.4.7 viridisLite_0.4.2
## [4] jsonlite_1.8.7 splines_4.2.2 bslib_0.6.0
## [7] highr_0.10 GenomeInfoDbData_1.2.9 ggrepel_0.9.4
## [10] yaml_2.3.7 pillar_1.9.0 lattice_0.21-9
## [13] glue_1.6.2 digest_0.6.33 XVector_0.38.0
## [16] colorspace_2.1-0 htmltools_0.5.7 Matrix_1.6-1.1
## [19] pkgconfig_2.0.3 zlibbioc_1.44.0 purrr_1.0.2
## [22] scales_1.2.1 tibble_3.2.1 mgcv_1.9-0
## [25] farver_2.1.1 generics_0.1.3 cachem_1.0.8
## [28] withr_2.5.2 cli_3.6.1 magrittr_2.0.3
## [31] memoise_2.0.1 evaluate_0.23 fs_1.6.3
## [34] fansi_1.0.5 nlme_3.1-163 MASS_7.3-60
## [37] vegan_2.6-4 textshaping_0.3.6 tools_4.2.2
## [40] ineq_0.2-13 lifecycle_1.0.4 stringr_1.5.1
## [43] munsell_0.5.0 locfit_1.5-9.8 DelayedArray_0.24.0
## [46] cluster_2.1.4 compiler_4.2.2 pkgdown_2.0.7
## [49] jquerylib_0.1.4 systemfonts_1.0.5 rlang_1.1.2
## [52] grid_4.2.2 RCurl_1.98-1.12 rstudioapi_0.15.0
## [55] labeling_0.4.3 bitops_1.0-7 rmarkdown_2.25
## [58] gtable_0.3.4 R6_2.5.1 gridExtra_2.3
## [61] knitr_1.45 dplyr_1.1.4 fastmap_1.1.1
## [64] utf8_1.2.4 rprojroot_2.0.4 ragg_1.2.5
## [67] permute_0.9-7 desc_1.4.2 stringi_1.8.1
## [70] parallel_4.2.2 Rcpp_1.0.11 vctrs_0.6.4
## [73] tidyselect_1.2.0 xfun_0.41