For each module provided as input, runs PCA on the module's genes and tests the informativity of successive principal components (PCs), so that a single module can yield several distinct activity scores.

run_activity_analysis(
  expr_mat,
  modules_list,
  norm = FALSE,
  nb_comp_max = 5,
  min_cells_pct = 0.05,
  nCores = 1,
  min_module_size = 10,
  max_contrib = 0.5,
  scale_before_pca = TRUE,
  all_PCs_in_range = FALSE
)

Arguments

expr_mat

Numeric matrix, with genes as rows and cells as columns. This matrix should be normalized. If it is not, set parameter norm to TRUE to perform logCPM normalization.
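
As an illustration of what logCPM normalization typically involves, here is a minimal sketch on a toy count matrix. The base-2 logarithm and the pseudocount of 1 are assumptions; the function's internal normalization may use a different base or offset.

```r
# Hypothetical toy count matrix: 4 genes (rows) x 3 cells (columns).
counts <- matrix(
  c(10,  0,  5,
     0, 20,  5,
    30, 10,  0,
    60, 70, 90),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Counts per million: divide each cell (column) by its library size,
# then scale to one million.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Log-transform. The pseudocount of 1 and base 2 are assumptions.
log_cpm <- log2(cpm + 1)
```

A matrix prepared this way is already normalized, so it would be passed with norm = FALSE.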

modules_list

Named list: the names are the module names and each element is a character vector of the genes in that module. The output of read_gmt, for example.
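
A minimal sketch of the expected structure, built by hand. The module and gene names are hypothetical, and the overlap check is only a sanity check one might run before the analysis, not a description of what the function does internally.

```r
# Hypothetical modules: list names are module names,
# elements are character vectors of gene identifiers.
modules_list <- list(
  MODULE_A = c("gene1", "gene2", "gene3"),
  MODULE_B = c("gene2", "gene4")
)

# Hypothetical row names of the expression matrix.
expr_genes <- c("gene1", "gene2", "gene4")

# Restrict each module to genes present in the matrix and
# count how many genes remain per module.
modules_in_data <- lapply(modules_list, intersect, expr_genes)
module_sizes <- lengths(modules_in_data)
```

Comparing module_sizes with min_module_size gives a quick idea of which modules are large enough to be worth analysing.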

norm

Logical, set to TRUE if your data is not normalized; logCPM normalization will then be performed. Default is FALSE.

nb_comp_max

Integer, maximum number of ways a module can be activated, i.e. the maximum number of successive PCs tested per module. Default is 5.

min_cells_pct

Numeric between 0 and 1, minimum fraction of cells that must be above the informativity threshold for an activity score to be considered interesting. Default is 0.05, i.e. 5% of cells.
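
The check this parameter describes can be sketched as follows. The score values and the threshold are hypothetical; how the function derives its informativity threshold is not covered here.

```r
# Hypothetical activity scores for 10 cells and a hypothetical threshold.
scores <- c(0.10, 0.90, 0.20, 0.80, 0.05, 0.30, 0.10, 0.20, 0.15, 0.25)
threshold <- 0.5
min_cells_pct <- 0.05

# Fraction of cells whose score exceeds the threshold.
frac_above <- mean(scores > threshold)

# The score counts as interesting only if enough cells are above threshold.
is_interesting <- frac_above >= min_cells_pct
```

Here 2 of 10 cells are above the threshold, so the fraction (0.2) passes the 0.05 cutoff.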

nCores

Integer, number of cores to use. Set to 1 if working in a Windows environment; otherwise you can call parallel::detectCores() to find out how many cores are available. Default is 1.
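
One possible way to pick a value, sketched with the parallel package. Leaving one core free is a choice made for this example, not a requirement of the function.

```r
library(parallel)

# detectCores() may return NA on some platforms, so fall back to 1.
n_avail <- detectCores()
if (is.na(n_avail)) n_avail <- 1L

# Use a single core on Windows; elsewhere keep one core free
# (an assumption made here for illustration).
nCores <- if (.Platform$OS.type == "windows") 1L else max(1L, n_avail - 1L)
```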

min_module_size

Integer, minimum number of genes a gene set must contain for it to make sense to compute activity. Default is 10.

Value

List with one element per module. Each element stores the activity scores (raw or scaled between 0 and 1) from the informative PCs, the gene contributions to each informative PC, the variance explained by each PC, the threshold used for informativity, and the number of cells above that threshold.
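
To make the returned quantities more concrete, the sketch below computes comparable values for one module with base R's prcomp on simulated data. It is not the function's implementation: the informativity test is omitted, the data and module are hypothetical, and defining gene contributions as squared loadings is an assumption.

```r
set.seed(1)

# Hypothetical normalized expression matrix: 12 genes x 30 cells.
expr_mat <- matrix(
  rnorm(12 * 30), nrow = 12,
  dimnames = list(paste0("gene", 1:12), paste0("cell", 1:30))
)
module_genes <- paste0("gene", 1:10)  # hypothetical module of 10 genes
nb_comp_max <- 5

# PCA with cells as observations and module genes as variables.
module_expr <- t(expr_mat[module_genes, ])
pca <- prcomp(module_expr, center = TRUE, scale. = TRUE)

# Raw scores on the first nb_comp_max PCs (cells x PCs).
raw_scores <- pca$x[, seq_len(nb_comp_max)]

# Scores rescaled to [0, 1] per PC.
scaled_scores <- apply(raw_scores, 2, function(s) (s - min(s)) / (max(s) - min(s)))

# Proportion of variance explained by each PC.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Gene contributions as squared loadings (assumption); each column sums to 1.
gene_contrib <- pca$rotation^2
```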