This function calls filterGRNAndConnectGenes repeatedly, recording the total number of connections and other statistics for each run so that they can be summarized afterwards. All arguments are identical to those of filterGRNAndConnectGenes; see the help for that function for details. Use plot_stats_connectionSummary afterwards to plot the results.

generateStatsSummary(
  GRN,
  TF_peak.fdr = c(0.001, 0.01, 0.05, 0.1, 0.2),
  TF_peak.connectionTypes = "all",
  peak_gene.fdr = c(0.001, 0.01, 0.05, 0.1, 0.2),
  peak_gene.r_range = c(0, 1),
  gene.types = c("protein_coding"),
  allowMissingGenes = c(FALSE, TRUE),
  allowMissingTFs = c(FALSE),
  forceRerun = FALSE
)

Arguments

GRN

Object of class GRN

TF_peak.fdr

Numeric vector[0,1]. Default c(0.001, 0.01, 0.05, 0.1, 0.2). TF-peak FDR values to iterate over.

TF_peak.connectionTypes

Character vector. Default "all". TF-peak connection types to consider. The special keyword "all" denotes all connection types (e.g., expression and TFActivity) that are found in the GRN object. By default, only expression is present in the object, so "all" and "expression" are usually equivalent unless calculation of TF-peak links based on TF activity has also been enabled.

peak_gene.fdr

Numeric vector[0,1]. Default c(0.001, 0.01, 0.05, 0.1, 0.2). Peak-gene FDR values to iterate over.

peak_gene.r_range

Numeric vector of length 2[-1,1]. Default c(0,1). The correlation range of peak-gene connections to keep.

gene.types

Character vector of supported gene types. Default c("protein_coding"). Filter for gene types to retain; genes whose type is not listed here are removed. The special keyword "all" indicates no filter and retains all gene types. The specified names must match the names as stored in the GRN object (see GRN@annotation$genes$gene.type) and correspond 1:1 to the gene type names as provided by biomaRt, with the exception of lncRNA, which is internally renamed to lincRNA when first fetching all gene types. This is done because of a recent change in biomaRt and aims to keep backwards compatibility with existing GRN objects.

allowMissingGenes

Logical vector of length 1 or 2. Default c(FALSE, TRUE). Allow genes to be missing for peak-gene connections? If both FALSE and TRUE are given, the code loops over both.

allowMissingTFs

Logical vector of length 1 or 2. Default c(FALSE). Allow TFs to be missing for TF-peak connections? If both FALSE and TRUE are given, the code loops over both.

forceRerun

TRUE or FALSE. Default FALSE. Force execution, even if the GRN object already contains the result. Overwrites the old results.
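
Because several arguments are vectors to iterate over, the number of filtering runs grows quickly. The following base R sketch is not part of GRaNIE; it only enumerates the combinations implied by the default values shown in the usage above. It assumes the function evaluates the full cross product of these vectors, which this page does not state explicitly, and param_grid is a hypothetical name used for illustration only.

```r
# Default argument vectors, copied from the usage section of this page
TF_peak.fdr       <- c(0.001, 0.01, 0.05, 0.1, 0.2)
peak_gene.fdr     <- c(0.001, 0.01, 0.05, 0.1, 0.2)
allowMissingGenes <- c(FALSE, TRUE)
allowMissingTFs   <- c(FALSE)

# Hypothetical helper table (not a GRaNIE object): one row per combination.
# Assumption: every value of each vector is combined with every other.
param_grid <- expand.grid(
  TF_peak.fdr       = TF_peak.fdr,
  peak_gene.fdr     = peak_gene.fdr,
  allowMissingGenes = allowMissingGenes,
  allowMissingTFs   = allowMissingTFs
)

nrow(param_grid)  # 5 * 5 * 2 * 1 = 50 combinations
head(param_grid)  # inspect the first few combinations
```

Under that assumption, shortening the FDR vectors is the most direct way to reduce running time, as the example below does by passing two values for each.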

Value

An updated GRN object, with additional information added from this function.

Examples

# See the Workflow vignette on the GRaNIE website for examples
GRN = loadExampleObject()
#> Downloading GRaNIE example object from https://git.embl.de/grp-zaugg/GRaNIE/-/raw/master/data/GRN.rds
#> INFO [2023-08-16 17:28:10] Storing GRN@data$RNA$counts matrix as sparse matrix because fraction of 0s is > 0.1 (0.44)
#> Finished successfully. You may explore the example object. Start by typing the object name to the console to see a summaty. Happy GRaNIE'ing!
GRN = generateStatsSummary(GRN, TF_peak.fdr = c(0.01, 0.1), peak_gene.fdr = c(0.01, 0.1))
#> INFO [2023-08-16 17:28:10] Generating summary. This may take a while...
#> INFO [2023-08-16 17:28:10] 
#> Real data...
#> 
#> INFO [2023-08-16 17:28:10] Calculate network stats for TF-peak FDR of 0.01
#> INFO [2023-08-16 17:28:15] Calculate network stats for TF-peak FDR of 0.1
#> INFO [2023-08-16 17:28:20] 
#> Permuted data...
#> 
#> INFO [2023-08-16 17:28:20] Calculate network stats for TF-peak FDR of 0.01
#> INFO [2023-08-16 17:28:24] Calculate network stats for TF-peak FDR of 0.1
#> INFO [2023-08-16 17:28:29] Finished successfully. Execution time: 18.7 secs