GRN object
addConnections_peak_gene.Rd
After running this function, QC plots can be generated with the function plotDiagnosticPlots_peakGene, unless they were already produced during the run because plotDiagnosticPlots = TRUE (the default).
addConnections_peak_gene(
GRN,
overlapTypeGene = "TSS",
corMethod = "pearson",
promoterRange = 250000,
TADs = NULL,
TADs_mergeOverlapping = FALSE,
shuffleRNACounts = TRUE,
nCores = 4,
plotDiagnosticPlots = TRUE,
plotGeneTypes = list(c("all"), c("protein_coding")),
outputFolder = NULL,
forceRerun = FALSE
)
GRN
Object of class GRN.
overlapTypeGene
Character. "TSS" or "full". Default "TSS". If set to "TSS", only the TSS of the gene is used as the reference for finding genes in the neighborhood of a peak. If set to "full", the whole annotated gene (including all exons and introns) is used instead.
corMethod
Character. One of pearson, spearman or bicor. Default pearson. Method for calculating the correlation coefficient. For pearson and spearman, see cor for details. bicor denotes the *biweight midcorrelation*, a correlation measure based on medians, as calculated by WGCNA::bicorAndPvalue. Both spearman and bicor are considered more robust measures that are less affected by outliers.
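To illustrate why a rank-based measure can be more robust, the following minimal sketch compares pearson and spearman on toy data with a single extreme outlier. It uses only base R's cor; the data are made up for illustration and are not GRaNIE data. bicor is not shown because it requires the WGCNA package.

```r
# Toy data (hypothetical): a clean linear trend with one extreme outlier
set.seed(1)
x <- 1:20
y <- x + rnorm(20)
y[20] <- -100  # a single outlier in the last sample

# Pearson works on the raw values and is dominated by the outlier
corPearson <- cor(x, y, method = "pearson")

# Spearman works on ranks, so the outlier only counts as "the smallest value"
corSpearman <- cor(x, y, method = "spearman")

print(c(pearson = corPearson, spearman = corSpearman))
```

In this toy setting the Spearman coefficient stays clearly positive while the Pearson coefficient is pulled down by the one outlying sample.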
promoterRange
Integer >=0. Default 250000. The size of the neighborhood in bp within which peaks and genes are correlated. A peak-gene pair is correlated only if it lies within the specified range. Increasing this value yields more peak-gene pairs and longer running times; decreasing it has the opposite effect.
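As a rough illustration of how the neighborhood size limits which peak-gene pairs are considered, the sketch below applies a distance filter to toy coordinates in base R. The data frames, column names and positions are hypothetical, and this is a conceptual sketch, not the package's internal implementation.

```r
# Conceptual sketch only: toy coordinates, not GRaNIE's internal code
promoterRange <- 250000

peaks <- data.frame(
  peakID = c("peak1", "peak2"),
  chr    = c("chr1", "chr1"),
  pos    = c(1000000, 5000000)
)
genes <- data.frame(
  geneID = c("geneA", "geneB", "geneC"),
  chr    = c("chr1", "chr1", "chr1"),
  tss    = c(1100000, 1400000, 4900000)
)

# All peak-gene combinations on the same chromosome
pairs <- merge(peaks, genes, by = "chr")
pairs$distance <- abs(pairs$pos - pairs$tss)

# Keep only pairs whose TSS lies within the neighborhood of the peak
inRange <- pairs[pairs$distance <= promoterRange, ]
print(inRange)
```

With these toy values, only peak1-geneA and peak2-geneC remain; a larger promoterRange of 500000 would additionally keep peak1-geneB.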
TADs
Data frame with TAD domains. Default NULL. If provided, the neighborhood of a peak is defined by the TAD domain the peak is in rather than by a fixed-size neighborhood. The expected format is a BED-like data frame with at least 3 columns in this particular order: chromosome, start, end. An optional 4th column is taken as the ID column. All additional columns, as well as the column names, are ignored. The types of the first 3 columns are verified as part of a data integrity check.
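A minimal sketch of a BED-like TAD table in the column order described above may look as follows. The coordinates and IDs are hypothetical, and the sanity checks are illustrative only; they are not the package's own integrity check.

```r
# Hypothetical TAD coordinates for illustration only
TADs <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100000L, 900000L, 50000L),
  end   = c(800000L, 1500000L, 600000L),
  ID    = c("TAD_1", "TAD_2", "TAD_3"),  # optional 4th column
  stringsAsFactors = FALSE
)

# Illustrative sanity checks on the first three columns
stopifnot(
  ncol(TADs) >= 3,
  is.character(TADs[[1]]),
  is.numeric(TADs[[2]]),
  is.numeric(TADs[[3]]),
  all(TADs[[2]] < TADs[[3]])
)

# With a GRN object at hand, the table would be passed via the TADs argument:
# GRN <- addConnections_peak_gene(GRN, TADs = TADs)
```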
TADs_mergeOverlapping
TRUE or FALSE. Default FALSE. Should overlapping TADs be merged? Only relevant if TADs are provided.
shuffleRNACounts
TRUE or FALSE. Default TRUE. Should the RNA sample labels be shuffled in addition to testing random peak-gene pairs for the background? When set to FALSE, only the peak-gene pairs are shuffled, but for each pair, the peak and RNA counts that are correlated remain matched by sample (i.e., sample 1 counts from the peak data are compared to sample 1 counts from RNA). If set to TRUE, the RNA sample labels are shuffled as well, so that, for example, sample 1 counts from the peak data are compared to sample 4 counts from RNA. Shuffling truly randomizes the resulting background eGRN. Note that this parameter and its influence are still being investigated. Until version 1.0.7, this parameter did not exist explicitly but was implicitly set to TRUE.
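The effect of shuffling sample labels can be sketched on a single toy peak-gene pair in base R. The vectors below are simulated and hypothetical; the sketch only illustrates the idea of breaking the sample matching and is not how the package implements its background.

```r
# Toy counts (hypothetical) for one peak and one gene across 10 samples
set.seed(42)
nSamples <- 10
peak <- rnorm(nSamples)
rna  <- peak + rnorm(nSamples, sd = 0.1)  # strongly coupled to the peak

# Matched samples: sample i of the peak data vs. sample i of the RNA data
corMatched <- cor(peak, rna)

# Shuffled labels: the same RNA values, but assigned to different samples
rnaShuffled <- rna[sample(nSamples)]
corShuffled <- cor(peak, rnaShuffled)

print(c(matched = corMatched, shuffled = corShuffled))
```

The shuffled vector contains exactly the same values as the original one; only the pairing with the peak samples changes, which is what breaks the sample-level relationship.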
nCores
Integer >0. Default 4. Number of cores to use. A value >1 requires the BiocParallel package (as it is listed under Suggests, it may not be installed yet).
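Because BiocParallel is only a suggested dependency, it can be useful to check for it before requesting multiple cores. The following is a minimal sketch using base R's requireNamespace; the fallback to a single core is an illustrative choice, not documented package behavior.

```r
nCores <- 4

# requireNamespace() returns FALSE (rather than raising an error)
# when the package is not installed
if (nCores > 1 && !requireNamespace("BiocParallel", quietly = TRUE)) {
  message("BiocParallel is not installed; falling back to nCores = 1. ",
          "It can be installed with BiocManager::install(\"BiocParallel\").")
  nCores <- 1
}

# GRN <- addConnections_peak_gene(GRN, nCores = nCores)
```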
plotDiagnosticPlots
TRUE or FALSE. Default TRUE. Run and plot various diagnostic plots? If set to TRUE, PDF files will be produced and saved in the output directory (in a subfolder called plots).
plotGeneTypes
List of character vectors. Default list(c("all"), c("protein_coding")). Each list element may consist of one or multiple gene types that are plotted collectively in one PDF. The special keyword "all" denotes all gene types that are found (be aware: this typically comprises 20+ gene types, see https://www.gencodegenes.org/pages/biotypes.html for details).
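As a small sketch of the expected structure, the list below extends the default by a third element that groups two gene types into one PDF. The combination shown is an assumption for illustration; it presumes that the biotype names "protein_coding" and "lncRNA" occur in the gene annotation in use.

```r
# Each list element corresponds to one PDF; multiple gene types within
# one element are plotted together
plotGeneTypes <- list(
  c("all"),
  c("protein_coding"),
  c("protein_coding", "lncRNA")  # assumed biotype names, for illustration
)

# Number of PDFs that would be requested per plot type
nPDFs <- length(plotGeneTypes)

# GRN <- addConnections_peak_gene(GRN, plotGeneTypes = plotGeneTypes)
```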
outputFolder
Character or NULL. Default NULL. If set to NULL, the default output folder as specified when initializing the object in initializeGRN will be used. Otherwise, all output from this function will be put into the specified folder. We recommend specifying an absolute path, but a relative path also works.
forceRerun
TRUE or FALSE. Default FALSE. Force execution, even if the GRN object already contains the result. Overwrites the old results.
An updated GRN object, with additional information added from this function.
# See the Workflow vignette on the GRaNIE website for examples
GRN = loadExampleObject()
#> Downloading GRaNIE example object from https://git.embl.de/grp-zaugg/GRaNIE/-/raw/master/data/GRN.rds
#> INFO [2023-08-16 17:26:50] Storing GRN@data$RNA$counts matrix as sparse matrix because fraction of 0s is > 0.1 (0.44)
#> Finished successfully. You may explore the example object. Start by typing the object name to the console to see a summaty. Happy GRaNIE'ing!
GRN = addConnections_peak_gene(GRN, promoterRange = 10000, plotDiagnosticPlots = FALSE)
#> INFO [2023-08-16 17:26:50] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2023-08-16 17:26:50] Finished successfully. Execution time: 0 secs