Title: | Efficient Estimation of Bayesian SBMs & MLSBMs |
---|---|
Description: | Fit Bayesian stochastic block models (SBMs) and multi-level stochastic block models (MLSBMs) using efficient Gibbs sampling implemented in 'Rcpp'. The models assume symmetric, non-reflexive graphs (no self-loops) with unweighted, binary edges. Data are input as a symmetric binary adjacency matrix (SBMs), or list of such matrices (MLSBMs). |
Authors: | Carter Allen [aut, cre] |
Maintainer: | Carter Allen <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.99.4 |
Built: | 2025-03-05 04:33:39 UTC |
Source: | https://github.com/carter-allen/mlsbm |
A data set containing 3 layers of undirected, symmetric adjacency matrices simulated from an SBM with 3 true clusters
AL
A list of length 3
Function to quickly return credible intervals
col_summarize(MAT, dig = 2, level = 0.95)
MAT |
A matrix |
dig |
Number of digits to round estimates and CrIs to |
level |
Credible level for the intervals (default 0.95) |
A character vector of posterior estimates and intervals
M <- matrix(rnorm(1000), ncol = 4)
col_summarize(M)
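To make the output format concrete, the following base R sketch computes a posterior mean and an equal-tailed quantile interval for each column of a matrix of MCMC draws. The helper name `summarize_columns` and the string layout are assumptions for illustration; the strings returned by `col_summarize()` may be formatted differently.

```r
# Hypothetical helper (not part of mlsbm): column-wise posterior summaries.
summarize_columns <- function(MAT, dig = 2, level = 0.95) {
  alpha <- 1 - level
  apply(MAT, 2, function(x) {
    est <- round(mean(x), dig)
    # Equal-tailed interval from the empirical quantiles of the draws
    ci <- round(quantile(x, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE), dig)
    paste0(est, " (", ci[1], ", ", ci[2], ")")
  })
}

set.seed(1)
M <- matrix(rnorm(1000), ncol = 4)
summaries <- summarize_columns(M)
print(summaries)
```

Each element of the result describes one column, so a matrix with four columns yields a character vector of length four.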
This function allows you to fit multilevel stochastic block models.
fit_mlsbm( A, K, z_init = NULL, a0 = 2, b10 = 1, b20 = 1, n_iter = 1000, burn = 100, verbose = FALSE, r = 1.2 )
A |
An adjacency list of length L, the number of levels. Each level contains an n x n symmetric adjacency matrix. |
K |
The number of clusters specified a priori. |
z_init |
Initialized cluster indicators. If NULL, will initialize automatically with Louvain algorithm. |
a0 |
Dirichlet prior parameter for cluster sizes for clusters 1,...,K. |
b10 |
Beta distribution prior parameter for community connectivity. |
b20 |
Beta distribution prior parameter for community connectivity. |
n_iter |
The number of total MCMC iterations to run. |
burn |
The number of burn-in MCMC iterations to discard. The number of saved iterations will be n_iter - burn. |
verbose |
Whether to print a progress bar to track MCMC progress. Defaults to FALSE. |
r |
Resolution parameter for Louvain initialization. Should be >= 0; higher values give a larger number of smaller clusters. |
A list of MCMC samples, including the MAP estimate of cluster indicators (z)
data(AL)
# increase n_iter in practice
fit <- fit_mlsbm(AL, 3, n_iter = 100)
This function allows you to fit single level stochastic block models.
fit_sbm( A, K, z_init = NULL, a0 = 1, b10 = 2, b20 = 2, n_iter = 1000, burn = 100, verbose = FALSE, r = 1.2 )
A |
An n x n symmetric adjacency matrix. |
K |
The number of clusters specified a priori. |
z_init |
Initialized cluster indicators. If NULL, will initialize automatically with Louvain algorithm. |
a0 |
Dirichlet prior parameter for cluster sizes for clusters 1,...,K. |
b10 |
Beta distribution prior parameter for community connectivity. |
b20 |
Beta distribution prior parameter for community connectivity. |
n_iter |
The number of total MCMC iterations to run. |
burn |
The number of burn-in MCMC iterations to discard. The number of saved iterations will be n_iter - burn. |
verbose |
Whether to print a progress bar to track MCMC progress. Defaults to FALSE. |
r |
Resolution parameter for Louvain initialization. Should be >= 0; higher values give a larger number of smaller clusters. |
A list of MCMC samples, including the MAP estimate of cluster indicators (z)
data(AL)
fit <- fit_sbm(AL[[1]], 3)
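For intuition about the roles of `b10` and `b20`, here is a base R sketch of the standard conjugate Beta update for block connectivity probabilities given fixed cluster labels. It assumes `b10` and `b20` are the two shape parameters of a Beta prior shared by all block pairs. The function name `draw_connectivity` is hypothetical, and the package's 'Rcpp' sampler may be organized differently.

```r
# Hypothetical illustration (not the package's sampler): one conjugate draw of
# the K x K connectivity matrix given labels z and a binary adjacency matrix A.
draw_connectivity <- function(A, z, K, b10 = 2, b20 = 2) {
  P <- matrix(0, K, K)
  for (k in 1:K) {
    for (l in k:K) {
      rows <- which(z == k)
      cols <- which(z == l)
      block <- A[rows, cols, drop = FALSE]
      if (k == l) {
        # Within a cluster, count each unordered pair once and skip the diagonal.
        edges <- sum(block[upper.tri(block)])
        pairs <- length(rows) * (length(rows) - 1) / 2
      } else {
        edges <- sum(block)
        pairs <- length(rows) * length(cols)
      }
      # Beta posterior: prior shapes plus observed edges and non-edges
      P[k, l] <- P[l, k] <- rbeta(1, b10 + edges, b20 + pairs - edges)
    }
  }
  P
}

set.seed(1)
z <- rep(1:2, each = 10)
A <- matrix(0, 20, 20)
A[1:10, 1:10] <- 1      # dense first community
A[11:20, 11:20] <- 1    # dense second community
diag(A) <- 0            # no self-loops
P_draw <- draw_connectivity(A, z, K = 2)
print(round(P_draw, 2))
```

In this toy graph every within-cluster pair is connected and no between-cluster pair is, so the sampled diagonal entries sit near 1 and the off-diagonal entry near 0.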
This function allows you to augment the discrete cell type assignments with continuous propensity and uncertainty scores
get_scores(fit)
fit |
A list returned by fit_sbm() or fit_mlsbm() |
A list with populated entries C_scores (N x K matrix for cell type propensities) and U_scores (N x 1 vector of uncertainty scores)
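As a rough illustration of what continuous scores could look like, the sketch below derives them from a matrix of saved label draws in base R. The definitions used here are assumptions, not taken from the package: propensity is the fraction of saved iterations in which a node is assigned to each cluster, and uncertainty is one minus the largest propensity. The name `scores_from_draws` and the draws matrix `Z` are hypothetical.

```r
# Hypothetical illustration: Z is an (iterations x N) matrix of sampled labels in 1..K.
scores_from_draws <- function(Z, K) {
  # C[i, k] = share of iterations in which node i carries label k
  C <- sapply(1:K, function(k) colMeans(Z == k))
  # Assumed uncertainty definition: 1 minus the largest propensity per node
  U <- 1 - apply(C, 1, max)
  list(C_scores = C, U_scores = U)
}

# Four saved iterations for four nodes; nodes 3 and 4 change label once each.
Z <- rbind(c(1, 2, 3, 1),
           c(1, 2, 3, 2),
           c(1, 2, 3, 1),
           c(1, 2, 1, 1))
s <- scores_from_draws(Z, K = 3)
print(s$C_scores)
print(s$U_scores)
```

Frequencies like these are only meaningful when labels are aligned across iterations, which is why label switching has to be handled before summarizing the draws.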
Simple function to return the mean (95% CrI) for a vector
mean_CRI(y, dig = 2)
y |
A numeric vector |
dig |
The number of digits to round to |
A string of mean and 95% quantile interval rounded to 'dig'
mean_CRI(rnorm(1000))
This package fits Bayesian stochastic block models (SBMs)
The mlsbm functions ...
This function allows you to visualize the community structure of cell sub-populations in matrix format via the connectivity parameters of the BANYAN model
plot_connectivity_matrix(fit)
fit |
A list returned by fit_sbm() or fit_mlsbm() |
A ggplot object
This function allows you to visualize the inferred community structure as a community-community connectivity network
plot_connectivity_network(fit)
fit |
A list returned by fit_sbm() or fit_mlsbm() |
A ggplot object
Avoid label switching by re-mapping sampled mixture component labels at each iteration (Peng and Carvalho 2016).
remap_canonical2(z)
z |
A length-n vector of discrete mixture component labels |
A length-n vector of mixture component labels re-mapped to a canonical sub-space
# parameters
n <- 10 # number of observations
K <- 3 # number of clusters (mixture components)
pi <- rep(1/K,K) # cluster membership probability
z <- sample(1:K, size = n, replace = TRUE, prob = pi) # cluster indicators
z <- remap_canonical2(z)
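One common way to define a canonical labelling is to renumber clusters in order of first appearance, so that the first node always carries label 1. The base R sketch below shows that idea. It is an assumption about what "canonical" means here, and `relabel_by_first_appearance` is a hypothetical name rather than the package's implementation.

```r
# Hypothetical illustration: renumber labels by the order in which they first occur.
relabel_by_first_appearance <- function(z) {
  match(z, unique(z))
}

z <- c(3, 3, 1, 2, 1)
z_canon <- relabel_by_first_appearance(z)
print(z_canon)

# A permuted labelling of the same partition maps to the same canonical vector.
z_perm <- c(2, 2, 3, 1, 3)
print(identical(relabel_by_first_appearance(z_perm), z_canon))
```

Because two label vectors that describe the same partition map to the same result, summaries computed across iterations are not distorted by arbitrary label permutations.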
This function allows you to sample a multilevel stochastic block model.
sample_mlsbm(z, P, L)
z |
An n x 1 vector of community labels for each node |
P |
A K x K symmetric matrix of community connectivity probabilities |
L |
The number of levels to sample |
A list of adjacency matrices, one for each level of the MLSBM
n = 100
K = 3
L = 2
pi = rep(1/K,K)
z = sample(1:K, size = n, replace = TRUE, prob = pi)
p_in = 0.50
p_out = 0.05
P = matrix(p_out, nrow = K, ncol = K)
diag(P) = p_in
AL = sample_mlsbm(z,P,L)
This function allows you to sample a single level stochastic block model.
sample_sbm(z, P)
z |
An n x 1 vector of community labels for each node |
P |
A K x K symmetric matrix of community connectivity probabilities |
An adjacency matrix
n = 100
K = 3
pi = rep(1/K,K)
z = sample(1:K, size = n, replace = TRUE, prob = pi)
p_in = 0.50
p_out = 0.05
P = matrix(p_out, nrow = K, ncol = K)
diag(P) = p_in
A = sample_sbm(z,P)
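The generative step can also be written out in a few lines of base R, which may help when checking what a sampled graph should look like. The sketch below draws each unordered pair of nodes once as a Bernoulli variable with probability `P[z[i], z[j]]`, then mirrors the result so the matrix is symmetric with a zero diagonal. The name `draw_sbm` is hypothetical, and this is not the package's own sampler.

```r
# Hypothetical illustration (not the package's sampler): draw one SBM graph.
draw_sbm <- function(z, P) {
  n <- length(z)
  A <- matrix(0L, n, n)
  probs <- P[z, z]          # n x n matrix with entry [i, j] = P[z[i], z[j]]
  upper <- upper.tri(A)     # each unordered pair i < j exactly once
  A[upper] <- rbinom(sum(upper), size = 1, prob = probs[upper])
  A + t(A)                  # mirror to the lower triangle; diagonal stays 0
}

set.seed(1)
n <- 100
K <- 3
z <- sample(1:K, size = n, replace = TRUE)
P <- matrix(0.05, nrow = K, ncol = K)
diag(P) <- 0.50
A <- draw_sbm(z, P)
print(dim(A))
```

Under the additional assumption that layers are independent given the labels, repeating the draw with the same `z` and `P`, for example via `replicate(L, draw_sbm(z, P), simplify = FALSE)`, would give a list of layers in the shape that `fit_mlsbm()` takes as input.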