GRN
objectfilterData.Rd
This function marks genes and/or peaks as filtered depending on the chosen filtering criteria. It operates on the count data AFTER any normalization chosen when calling the addData function. Most of the filters may no longer be meaningful after normalization schemes that can give rise to, for example, negative values, such as cyclic loess normalization. If the normalized counts no longer represent counts but rather a deviation from a mean or something similar, the filtering criteria usually do not make sense anymore.
Filtered genes / peaks will then be disregarded when adding connections in subsequent steps via addConnections_TF_peak and addConnections_peak_gene. This function does NOT (re)filter existing connections when the GRN object already contains connections. Thus, upon re-execution of this function with different filtering criteria, all downstream steps have to be re-run.
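The caveat about normalization can be illustrated with a small base R sketch. The vectors below are invented for illustration and are not GRaNIE data; the helper cv is likewise hypothetical. Once values are centered around zero, the mean no longer reflects abundance and the coefficient of variation (standard deviation divided by mean) stops being usable as a filter.

```r
# Toy example: values are invented for illustration only
counts   <- c(10, 12, 8, 11, 9)        # count-like, non-negative values
centered <- counts - mean(counts)      # deviations from the mean, as some normalizations yield

# Coefficient of variation: standard deviation divided by the mean
cv <- function(x) stats::sd(x) / mean(x)

mean(counts)     # 10: interpretable as an average abundance
cv(counts)       # small positive value
mean(centered)   # 0: a minimum-mean threshold is no longer meaningful
cv(centered)     # Inf: division by a mean of zero
```

With real normalized data the mean is rarely exactly zero, but values close to zero or below it make mean and CV thresholds similarly hard to interpret.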
filterData(
  GRN,
  minNormalizedMean_peaks = NULL,
  maxNormalizedMean_peaks = NULL,
  minNormalizedMeanRNA = NULL,
  maxNormalizedMeanRNA = NULL,
  chrToKeep_peaks = NULL,
  minSize_peaks = 20,
  maxSize_peaks = 10000,
  minCV_peaks = NULL,
  maxCV_peaks = NULL,
  minCV_genes = NULL,
  maxCV_genes = NULL,
  forceRerun = FALSE
)
GRN
Object of class GRN
minNormalizedMean_peaks
Numeric[0,] or NULL. Default NULL (as in the usage signature). Minimum mean across all samples for a peak to be retained for the normalized counts table. Set to NULL to not apply the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.
maxNormalizedMean_peaks
Numeric[0,] or NULL. Default NULL. Maximum mean across all samples for a peak to be retained for the normalized counts table. Set to NULL to not apply the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.
minNormalizedMeanRNA
Numeric[0,] or NULL. Default NULL (as in the usage signature). Minimum mean across all samples for a gene to be retained for the normalized counts table. Set to NULL to not apply the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.
maxNormalizedMeanRNA
Numeric[0,] or NULL. Default NULL. Maximum mean across all samples for a gene to be retained for the normalized counts table. Set to NULL to not apply the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.
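chrToKeep_peaks
Character vector or NULL. Default NULL. Vector of chromosomes that peaks are allowed to come from. This filter can be used, for example, to restrict peaks to particular chromosomes: c(paste0("chr", 1:22), "chrX", "chrY") keeps the autosomes and both sex chromosomes, while omitting "chrX" and "chrY" filters out the sex chromosomes.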
minSize_peaks
Integer[1,] or NULL. Default 20. Minimum peak size (width, end - start) for a peak to be retained. Set to NULL to not apply the filter.
maxSize_peaks
Integer[1,] or NULL. Default 10000. Maximum peak size (width, end - start) for a peak to be retained. Set to NULL to not apply the filter.
minCV_peaks
Numeric[0,] or NULL. Default NULL. Minimum CV (coefficient of variation, a unitless measure of variation) for a peak to be retained. Set to NULL to not apply the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.
maxCV_peaks
Numeric[0,] or NULL. Default NULL. Maximum CV (coefficient of variation, a unitless measure of variation) for a peak to be retained. Set to NULL to not apply the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.
minCV_genes
Numeric[0,] or NULL. Default NULL. Minimum CV (coefficient of variation, a unitless measure of variation) for a gene to be retained. Set to NULL to not apply the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.
maxCV_genes
Numeric[0,] or NULL. Default NULL. Maximum CV (coefficient of variation, a unitless measure of variation) for a gene to be retained. Set to NULL to not apply the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.
forceRerun
TRUE or FALSE. Default FALSE. Force execution, even if the GRN object already contains the result. Overwrites the old results.
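The filtering criteria described above can be sketched on toy data in base R. This is only an illustration under stated assumptions, not GRaNIE's internal implementation: the data frame peaks, the matrix counts, and the vector keep are hypothetical, and whether the package treats thresholds as inclusive or exclusive is assumed here (inclusive). The threshold variables are named after the arguments, with NULL meaning that the filter is skipped.

```r
# Hypothetical toy data; not taken from GRaNIE
peaks <- data.frame(
  peakID = c("p1", "p2", "p3", "p4"),
  chr    = c("chr1", "chr2", "chrX", "chrUn_1"),
  start  = c(100, 500, 1000, 2000),
  end    = c(110, 900, 16000, 2300),
  stringsAsFactors = FALSE
)
# Normalized counts: rows = peaks, columns = samples
counts <- matrix(
  c( 1,  2,  3,
    10, 20, 30,
    50, 50, 50,
     4,  6,  8),
  nrow = 4, byrow = TRUE,
  dimnames = list(peaks$peakID, c("s1", "s2", "s3"))
)

# Thresholds named after the arguments; NULL = filter not applied
minNormalizedMean_peaks <- 5
minCV_peaks             <- NULL
minSize_peaks           <- 20
maxSize_peaks           <- 10000
chrToKeep_peaks         <- c(paste0("chr", 1:22), "chrX", "chrY")

rowMean <- rowMeans(counts)
rowCV   <- apply(counts, 1, stats::sd) / rowMean   # coefficient of variation per peak
size    <- peaks$end - peaks$start                 # peak width

# Start by keeping everything, then apply each non-NULL criterion (assumed inclusive)
keep <- rep(TRUE, nrow(peaks))
if (!is.null(minNormalizedMean_peaks)) keep <- keep & rowMean >= minNormalizedMean_peaks
if (!is.null(minCV_peaks))             keep <- keep & rowCV >= minCV_peaks
if (!is.null(minSize_peaks))           keep <- keep & size >= minSize_peaks
if (!is.null(maxSize_peaks))           keep <- keep & size <= maxSize_peaks
if (!is.null(chrToKeep_peaks))         keep <- keep & peaks$chr %in% chrToKeep_peaks

unname(keep)   # FALSE TRUE FALSE FALSE
```

In this toy case p1 fails the mean and minimum size criteria, p3 exceeds the maximum size, and p4 lies on a chromosome that is not in chrToKeep_peaks, so only p2 is retained.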
An updated GRN object, with added data from this function.
All this function does is set (or modify) the filtering flag in GRN@data$peaks$counts_metadata and GRN@data$RNA$counts_metadata, respectively.
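Flagging rather than removing rows means that the metadata tables keep all entries, and later steps can skip the flagged ones. A minimal sketch with a toy stand-in table follows; the column name isFiltered and the table contents are assumptions made for illustration, so inspect the actual counts_metadata slots of a GRN object for the real column names.

```r
# Toy stand-in for a counts metadata table; column name isFiltered is an assumption
counts_metadata <- data.frame(
  ID         = c("gene1", "gene2", "gene3", "gene4"),
  isFiltered = c(FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# All rows remain in the table; the flag only marks them
nrow(counts_metadata)               # 4
table(counts_metadata$isFiltered)   # how many rows are flagged vs. retained

# IDs that a downstream step would still consider
retained <- counts_metadata$ID[!counts_metadata$isFiltered]
retained                            # "gene1" "gene3"
```

Because only the flag changes, re-running the filter with different criteria updates which rows are marked, while connections that were already computed from the earlier flags stay as they are until the downstream steps are re-run.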
# See the Workflow vignette on the GRaNIE website for examples
GRN = loadExampleObject()
#> Downloading GRaNIE example object from https://git.embl.de/grp-zaugg/GRaNIE/-/raw/master/data/GRN.rds
#> INFO [2023-08-16 17:28:02] Storing GRN@data$RNA$counts matrix as sparse matrix because fraction of 0s is > 0.1 (0.44)
#> Finished successfully. You may explore the example object. Start by typing the object name to the console to see a summaty. Happy GRaNIE'ing!
GRN = filterData(GRN, forceRerun = FALSE)
#> INFO [2023-08-16 17:28:02] FILTER PEAKS
#> INFO [2023-08-16 17:28:02] Number of peaks before filtering : 75000
#> INFO [2023-08-16 17:28:02] Filter peaks by CV: Min = 0
#> INFO [2023-08-16 17:28:02] Number of peaks after filtering : 75000
#> INFO [2023-08-16 17:28:02] Finished successfully. Execution time: 0.1 secs
#> INFO [2023-08-16 17:28:02] Filter and sort peaks by size and remain only those bigger than 20 and smaller than 10000
#> INFO [2023-08-16 17:28:02] Number of peaks before filtering: 75000
#> INFO [2023-08-16 17:28:02] Number of peaks after filtering : 75000
#> INFO [2023-08-16 17:28:02] Finished successfully. Execution time: 0.1 secs
#> INFO [2023-08-16 17:28:02] Collectively, filter 0 out of 75000 peaks.
#> INFO [2023-08-16 17:28:02] Number of remaining peaks: 75000
#> INFO [2023-08-16 17:28:02] FILTER RNA-seq
#> INFO [2023-08-16 17:28:02] Number of genes before filtering : 61534
#> INFO [2023-08-16 17:28:02] Filter genes by CV: Min = 0
#> INFO [2023-08-16 17:28:02] Filter genes by mean:
#> INFO [2023-08-16 17:28:02] Number of genes after filtering : 27005
#> INFO [2023-08-16 17:28:02] Finished successfully. Execution time: 0.1 secs
#> INFO [2023-08-16 17:28:02] Flagged 8056 rows due to filtering criteria
#> INFO [2023-08-16 17:28:02] Finished successfully. Execution time: 0.6 secs