This function marks genes and/or peaks as filtered according to the chosen filtering criteria. It operates on the count data AFTER any normalization chosen when running the addData function. Most of the filters are no longer meaningful after normalization schemes that can produce, for example, negative values, such as cyclic loess normalization. If the normalized values no longer represent counts but rather a deviation from a mean or something similar, the filtering criteria usually do not make sense anymore. Filtered genes / peaks are then disregarded when adding connections in subsequent steps via addConnections_TF_peak and addConnections_peak_gene. This function does NOT (re)filter existing connections when the GRN object already contains connections. Thus, after re-running this function with different filtering criteria, all downstream steps have to be re-run.
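The caveat about normalization can be illustrated with a small base R sketch. The values are made up, and the centering step only stands in for any normalization that yields deviations from a mean; it is not what addData does internally.

```r
# Toy counts for one peak across six samples (made-up values)
counts <- c(12, 15, 9, 20, 14, 14)

# Coefficient of variation: standard deviation divided by the mean
cv <- function(x) sd(x) / mean(x)

# On count-like data, both the mean and the CV are interpretable
mean_counts <- mean(counts)
cv_counts   <- cv(counts)

# Stand-in for a normalization that returns deviations from the mean
centered <- counts - mean(counts)

# The mean is now 0, so any positive minimum-mean threshold would flag
# this peak, and the CV is no longer finite (division by zero)
mean_centered <- mean(centered)
cv_centered   <- cv(centered)
```

In such a case, the mean and CV filters are best left at NULL.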

filterData(
  GRN,
  minNormalizedMean_peaks = NULL,
  maxNormalizedMean_peaks = NULL,
  minNormalizedMeanRNA = NULL,
  maxNormalizedMeanRNA = NULL,
  chrToKeep_peaks = NULL,
  minSize_peaks = 20,
  maxSize_peaks = 10000,
  minCV_peaks = NULL,
  maxCV_peaks = NULL,
  minCV_genes = NULL,
  maxCV_genes = NULL,
  forceRerun = FALSE
)

Arguments

GRN

Object of class GRN

minNormalizedMean_peaks

Numeric[0,] or NULL. Default NULL. Minimum mean across all samples for a peak to be retained for the normalized counts table. Set to NULL for not applying the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.

maxNormalizedMean_peaks

Numeric[0,] or NULL. Default NULL. Maximum mean across all samples for a peak to be retained for the normalized counts table. Set to NULL for not applying the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.

minNormalizedMeanRNA

Numeric[0,] or NULL. Default NULL. Minimum mean across all samples for a gene to be retained for the normalized counts table. Set to NULL for not applying the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.

maxNormalizedMeanRNA

Numeric[0,] or NULL. Default NULL. Maximum mean across all samples for a gene to be retained for the normalized counts table. Set to NULL for not applying the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.

chrToKeep_peaks

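Character vector or NULL. Default NULL. Vector of chromosomes that peaks are allowed to come from; peaks on any other chromosome are filtered. This filter can be used, for example, to restrict peaks to the canonical chromosomes (e.g., c(paste0("chr", 1:22), "chrX", "chrY")) or to filter sex chromosomes from the peaks by leaving out "chrX" and "chrY". Set to NULL for not applying the filter.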
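Two candidate vectors for chrToKeep_peaks can be built as follows. This is a sketch that assumes UCSC-style chromosome names with a "chr" prefix; peak_chr is a made-up vector used only to show which peaks each choice would keep.

```r
# Autosomes plus sex chromosomes (the vector from the argument description)
chr_all <- c(paste0("chr", 1:22), "chrX", "chrY")

# Autosomes only: peaks on chrX and chrY would not be kept
chr_autosomes <- paste0("chr", 1:22)

# Hypothetical peak chromosomes, including an unplaced contig
peak_chr <- c("chr1", "chrX", "chr7", "chrUn_GL000220v1", "chrY")

# Membership test: TRUE means the peak's chromosome is in the allowed set
keep_all       <- peak_chr %in% chr_all
keep_autosomes <- peak_chr %in% chr_autosomes
```

The names in the vector have to match the chromosome naming of the peak data; a vector without the "chr" prefix would match nothing in this toy example.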

minSize_peaks

Integer[1,] or NULL. Default 20. Minimum peak size (width, end - start) for a peak to be retained. Set to NULL for not applying the filter.

maxSize_peaks

Integer[1,] or NULL. Default 10000. Maximum peak size (width, end - start) for a peak to be retained. Set to NULL for not applying the filter.
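As a minimal sketch of the size criterion, the following base R code computes the peak width as end - start on a made-up table and compares it against the default thresholds. The peaks data frame is hypothetical, and treating both bounds as inclusive is an assumption; the function's own boundary handling may differ.

```r
# Hypothetical peak coordinates (not taken from any real dataset)
peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2", "chr3"),
  start = c(1000, 5000, 200, 10000),
  end   = c(1010, 5500, 12000, 10020)
)

minSize <- 20      # default of minSize_peaks
maxSize <- 10000   # default of maxSize_peaks

# Peak size as defined in the argument description: end - start
peaks$size <- peaks$end - peaks$start

# Assumption: both bounds are inclusive
peaks$keep <- peaks$size >= minSize & peaks$size <= maxSize
```

Here the first peak (size 10) is too small and the third (size 11800) is too large, so only the second and fourth would be retained.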

minCV_peaks

Numeric[0,] or NULL. Default NULL. Minimum CV (coefficient of variation, a unitless measure of variation) for a peak to be retained. Set to NULL for not applying the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.

maxCV_peaks

Numeric[0,] or NULL. Default NULL. Maximum CV (coefficient of variation, a unitless measure of variation) for a peak to be retained. Set to NULL for not applying the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.

minCV_genes

Numeric[0,] or NULL. Default NULL. Minimum CV (coefficient of variation, a unitless measure of variation) for a gene to be retained. Set to NULL for not applying the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.

maxCV_genes

Numeric[0,] or NULL. Default NULL. Maximum CV (coefficient of variation, a unitless measure of variation) for a gene to be retained. Set to NULL for not applying the filter. Be aware that depending on the chosen normalization, this filter may not make sense and should NOT be applied. See the notes for this function.

forceRerun

TRUE or FALSE. Default FALSE. Force execution, even if the GRN object already contains the result. Overwrites the old results.

Value

An updated GRN object, with added data from this function.

Details

This function only sets (or modifies) the filtering flag in GRN@data$peaks$counts_metadata for peaks and in GRN@data$RNA$counts_metadata for genes. The count data themselves are not changed.
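The idea of flagging rather than removing rows can be sketched with a plain data frame. The table below is a hypothetical stand-in for a counts_metadata table; the column names peakID and isFiltered are assumptions made for this sketch and should be checked against the actual object.

```r
# Hypothetical stand-in for a counts_metadata table
counts_metadata <- data.frame(
  peakID     = c("chr1:1000-1500", "chr2:200-12000", "chr3:50-400"),
  isFiltered = FALSE
)

# Made-up outcome of some filtering criterion, one value per peak
fails_filter <- c(FALSE, TRUE, FALSE)

# Setting the flag leaves the table size untouched
counts_metadata$isFiltered <- fails_filter

n_total     <- nrow(counts_metadata)            # all rows are still present
n_remaining <- sum(!counts_metadata$isFiltered) # rows not flagged
```

Because rows are only flagged, changing the criteria later merely resets the flags.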

Examples

# See the Workflow vignette on the GRaNIE website for examples
GRN = loadExampleObject()
#> Downloading GRaNIE example object from https://git.embl.de/grp-zaugg/GRaNIE/-/raw/master/data/GRN.rds
#> INFO [2023-08-16 17:28:02] Storing GRN@data$RNA$counts matrix as sparse matrix because fraction of 0s is > 0.1 (0.44)
#> Finished successfully. You may explore the example object. Start by typing the object name to the console to see a summaty. Happy GRaNIE'ing!
GRN = filterData(GRN, forceRerun = FALSE)
#> INFO [2023-08-16 17:28:02] FILTER PEAKS
#> INFO [2023-08-16 17:28:02]  Number of peaks before filtering : 75000
#> INFO [2023-08-16 17:28:02]   Filter peaks by CV: Min = 0
#> INFO [2023-08-16 17:28:02]  Number of peaks after filtering : 75000
#> INFO [2023-08-16 17:28:02]  Finished successfully. Execution time: 0.1 secs
#> INFO [2023-08-16 17:28:02] Filter and sort peaks by size and remain only those bigger than 20 and smaller than 10000
#> INFO [2023-08-16 17:28:02]  Number of peaks before filtering: 75000
#> INFO [2023-08-16 17:28:02]  Number of peaks after filtering : 75000
#> INFO [2023-08-16 17:28:02]  Finished successfully. Execution time: 0.1 secs
#> INFO [2023-08-16 17:28:02] Collectively, filter 0 out of 75000 peaks.
#> INFO [2023-08-16 17:28:02] Number of remaining peaks: 75000
#> INFO [2023-08-16 17:28:02] FILTER RNA-seq
#> INFO [2023-08-16 17:28:02]  Number of genes before filtering : 61534
#> INFO [2023-08-16 17:28:02]   Filter genes by CV: Min = 0
#> INFO [2023-08-16 17:28:02]   Filter genes by mean:
#> INFO [2023-08-16 17:28:02]  Number of genes after filtering : 27005
#> INFO [2023-08-16 17:28:02]  Finished successfully. Execution time: 0.1 secs
#> INFO [2023-08-16 17:28:02]  Flagged 8056 rows due to filtering criteria
#> INFO [2023-08-16 17:28:02] Finished successfully. Execution time: 0.6 secs
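A call with explicit filtering criteria might look like the following sketch. The threshold values are arbitrary illustrations, not recommendations, and the requireNamespace guard is only there so the snippet does nothing when the GRaNIE package is not installed. No output is shown because it depends on the data.

```r
# Chromosomes that peaks are allowed to come from (illustrative choice)
chr_keep <- c(paste0("chr", 1:22), "chrX", "chrY")

if (requireNamespace("GRaNIE", quietly = TRUE)) {
  GRN <- GRaNIE::loadExampleObject()

  # forceRerun = TRUE overwrites the result of an earlier filterData run
  GRN <- GRaNIE::filterData(
    GRN,
    minNormalizedMean_peaks = 5,
    minNormalizedMeanRNA    = 5,
    chrToKeep_peaks         = chr_keep,
    minSize_peaks           = 20,
    maxSize_peaks           = 10000,
    forceRerun              = TRUE
  )

  # Existing connections are not re-filtered, so addConnections_TF_peak and
  # addConnections_peak_gene have to be run again afterwards
}
```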