After this function has run, QC plots can be generated with the function plotDiagnosticPlots_peakGene, unless they were already produced because plotDiagnosticPlots = TRUE (the default).

addConnections_peak_gene(
  GRN,
  overlapTypeGene = "TSS",
  corMethod = "pearson",
  promoterRange = 250000,
  TADs = NULL,
  TADs_mergeOverlapping = FALSE,
  shuffleRNACounts = TRUE,
  nCores = 4,
  plotDiagnosticPlots = TRUE,
  plotGeneTypes = list(c("all"), c("protein_coding")),
  outputFolder = NULL,
  forceRerun = FALSE
)

Arguments

GRN

Object of class GRN

overlapTypeGene

Character. "TSS" or "full". Default "TSS". If set to "TSS", only the TSS of the gene is used as reference for finding genes in the neighborhood of a peak. If set to "full", the whole annotated gene (including all exons and introns) is used instead.

corMethod

Character. One of pearson, spearman or bicor. Default pearson. Method for calculating the correlation coefficient. For pearson and spearman, see cor for details. bicor denotes the *biweight midcorrelation*, a median-based correlation measure that is calculated by WGCNA::bicorAndPvalue. Both spearman and bicor are considered more robust measures that are less affected by outliers.
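The effect of outliers on the choice of corMethod can be illustrated with base R alone. The following is a minimal sketch on made-up counts (peak_counts and gene_counts are hypothetical values, not GRaNIE data); bicor is left out because it requires the WGCNA package.

```r
# Made-up counts for ten samples; the last sample is a strong outlier.
peak_counts <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
gene_counts <- c(2, 4, 5, 4, 6, 7, 9, 9, 11, 1)

# Pearson uses the raw values, so the single outlier dominates the estimate.
cor_pearson <- cor(peak_counts, gene_counts, method = "pearson")

# Spearman uses ranks, so the outlier counts as just one discordant sample.
cor_spearman <- cor(peak_counts, gene_counts, method = "spearman")

# In this example, Pearson turns negative while Spearman stays positive.
print(c(pearson = cor_pearson, spearman = cor_spearman))
```

Without the last sample, both coefficients are strongly positive, which is why a rank-based or median-based measure may be preferable for noisy count data.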

promoterRange

Integer >=0. Default 250000. The size of the neighborhood in bp within which peaks and genes are correlated. A peak-gene pair is correlated only if it lies within the specified range. Increasing this value yields more peak-gene pairs and longer running times, while decreasing it has the opposite effect.
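How a fixed-size neighborhood limits the candidate pairs can be sketched in base R. This is only an illustration under stated assumptions: the peaks and genes data frames and their coordinates are hypothetical, and measuring the distance from the peak boundaries to the TSS is an assumption that may differ from what the package does internally.

```r
promoterRange <- 250000

# Hypothetical peaks and gene TSS positions, for illustration only.
peaks <- data.frame(
  peakID = c("peak1", "peak2"),
  chr    = c("chr1", "chr1"),
  start  = c(1000000, 5000000),
  end    = c(1000500, 5000400)
)
genes <- data.frame(
  geneID = c("geneA", "geneB", "geneC"),
  chr    = c("chr1", "chr1", "chr1"),
  TSS    = c(1200000, 1400000, 4800000)
)

# All peak-gene combinations on the same chromosome.
pairs <- merge(peaks, genes, by = "chr")

# Distance between the peak interval and the TSS (0 if the TSS lies in the peak).
pairs$distance <- pmax(pairs$start - pairs$TSS, pairs$TSS - pairs$end, 0)

# Keep only pairs within the neighborhood.
inRange <- pairs[pairs$distance <= promoterRange, c("peakID", "geneID", "distance")]
print(inRange)
```

With these made-up coordinates, raising promoterRange to 500000 would additionally include the pair peak1/geneB, which shows why larger values produce more pairs to correlate.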

TADs

Data frame with TAD domains. Default NULL. If provided, the neighborhood of a peak is defined by the TAD domain the peak is in rather than by a fixed-size neighborhood. The expected format is a BED-like data frame with at least 3 columns in this particular order: chromosome, start, end. A 4th column is optional and, if present, is used as the ID column. All additional columns as well as the column names are ignored. The types of the first 3 columns are checked as part of a data integrity check.
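A data frame in the described layout could look like the following minimal sketch. The coordinates, IDs and the name TADs.df are hypothetical, and the checks shown are a simple stand-in, not the package's own integrity check.

```r
# Hypothetical TAD coordinates in BED-like order: chromosome, start, end, ID.
TADs.df <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(1000000L, 2500000L, 500000L),
  end   = c(2000000L, 3200000L, 1800000L),
  ID    = c("TAD_1", "TAD_2", "TAD_3"),
  stringsAsFactors = FALSE
)

# Simple sanity checks before passing the data frame on (illustrative only).
stopifnot(
  ncol(TADs.df) >= 3,
  is.character(TADs.df[[1]]),
  is.numeric(TADs.df[[2]]),
  is.numeric(TADs.df[[3]]),
  all(TADs.df[[2]] < TADs.df[[3]])
)

# With GRaNIE installed and a GRN object available, the call would then be:
# GRN <- addConnections_peak_gene(GRN, TADs = TADs.df, TADs_mergeOverlapping = FALSE)
```

Because column names are ignored, only the column order matters; a file in BED format read with read.table would typically already satisfy it.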

TADs_mergeOverlapping

TRUE or FALSE. Default FALSE. Should overlapping TADs be merged? Only relevant if TADs are provided.

shuffleRNACounts

TRUE or FALSE. Default TRUE. Should the RNA sample labels be shuffled in addition to testing random peak-gene pairs for the background? If set to FALSE, only the peak-gene pairs are shuffled, and for each pair the peak and RNA counts that are correlated remain matched by sample (i.e., sample 1 counts from the peak data are compared to sample 1 counts from the RNA data). If set to TRUE, the RNA sample labels are shuffled in addition, so that, for example, sample 1 counts from the peak data are compared to sample 4 counts from the RNA data. Shuffling truly randomizes the resulting background eGRN. Note that this parameter and its influence are still being investigated. Until version 1.0.7, this parameter did not exist explicitly and was implicitly set to TRUE.
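The difference between matched and shuffled sample labels can be sketched in base R. The counts below are made up, and the code only illustrates the idea of label shuffling; it is not the package's implementation.

```r
set.seed(1)

# Made-up counts for one peak-gene pair across eight samples.
samples     <- paste0("sample", 1:8)
peak_counts <- setNames(c(5, 9, 12, 20, 26, 31, 40, 44), samples)
gene_counts <- setNames(c(3, 6, 8, 13, 15, 21, 25, 30), samples)

# Matched samples (analogous to shuffleRNACounts = FALSE):
# sample i of the peak data is compared to sample i of the RNA data.
cor_matched <- cor(peak_counts, gene_counts)

# Shuffled RNA sample labels (analogous to shuffleRNACounts = TRUE):
# sample i of the peak data is compared to a random RNA sample.
shuffled_order <- sample(seq_along(gene_counts))
cor_shuffled   <- cor(peak_counts, gene_counts[shuffled_order])

print(c(matched = cor_matched, shuffled = cor_shuffled))
```

The shuffled vector contains the same RNA counts, only assigned to different samples, so any sample-level relationship between peak accessibility and expression is broken in the background.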

nCores

Integer >0. Default 4. Number of cores to use. A value >1 requires the BiocParallel package (as it is listed under Suggests, it may not be installed yet).

plotDiagnosticPlots

TRUE or FALSE. Default TRUE. Generate various diagnostic plots? If set to TRUE, PDF files will be produced and saved in the output directory (in a subfolder called plots).

plotGeneTypes

List of character vectors. Default list(c("all"), c("protein_coding")). Each list element may consist of one or multiple gene types that are plotted collectively in one PDF. The special keyword "all" denotes all gene types that are found (be aware: this typically contains 20+ gene types, see https://www.gencodegenes.org/pages/biotypes.html for details).
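The list structure can be sketched as follows. The third element is a hypothetical addition to the default, and the gene type name "lncRNA" is an assumption that depends on the gene annotation in use.

```r
# One list element per PDF; gene types within an element are plotted together.
plotGeneTypes <- list(
  c("all"),                      # every gene type found
  c("protein_coding"),           # protein-coding genes only
  c("protein_coding", "lncRNA")  # hypothetical: two gene types in one PDF
)

# Number of list elements, i.e. number of gene type groups to plot.
print(length(plotGeneTypes))

# With GRaNIE installed and a GRN object available, the call would then be:
# GRN <- addConnections_peak_gene(GRN, plotGeneTypes = plotGeneTypes)
```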

outputFolder

Character or NULL. Default NULL. If set to NULL, the default output folder as specified when initiating the object in initializeGRN will be used. Otherwise, all output from this function will be put into the specified folder. We recommend specifying an absolute path, although a relative path also works.

forceRerun

TRUE or FALSE. Default FALSE. Force execution, even if the GRN object already contains the result. Overwrites the old results.

Value

An updated GRN object, with additional information added from this function.

Examples

# See the Workflow vignette on the GRaNIE website for examples
GRN = loadExampleObject()
#> Downloading GRaNIE example object from https://git.embl.de/grp-zaugg/GRaNIE/-/raw/master/data/GRN.rds
#> INFO [2023-08-16 17:26:50] Storing GRN@data$RNA$counts matrix as sparse matrix because fraction of 0s is > 0.1 (0.44)
#> Finished successfully. You may explore the example object. Start by typing the object name to the console to see a summaty. Happy GRaNIE'ing!
GRN = addConnections_peak_gene(GRN, promoterRange=10000, plotDiagnosticPlots = FALSE)
#> INFO [2023-08-16 17:26:50] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2023-08-16 17:26:50] Finished successfully. Execution time: 0 secs