For this, a folder that contains one TFBS file per TF in bed or bed.gz format must be given (see details). The folder must also contain a so-called translation table, see the argument translationTable for details. We provide example files for selected supported genome assemblies (hg19, hg38 and mm10) that are fully compatible with GRaNIE as separate downloads. For more information, check https://difftf.readthedocs.io/en/latest/chapter2.html#dir-tfbs.

addTFBS(
  GRN,
  source = "custom",
  motifFolder = NULL,
  TFs = "all",
  translationTable = "translationTable.csv",
  translationTable_sep = " ",
  filesTFBSPattern = "_TFBS",
  fileEnding = ".bed",
  nTFMax = NULL,
  EnsemblVersion = NULL,
  JASPAR_useSpecificTaxGroup = NULL,
  JASPAR_removeAmbiguousTFs = TRUE,
  forceRerun = FALSE,
  ...
)

Arguments

GRN

Object of class GRN

source

Character. One of custom, JASPAR. Default custom. If a custom source is used, further details about the motif folder and files must be provided (see the other function arguments). If set to JASPAR, the JASPAR2022 database is used.

motifFolder

Character. No default. Only relevant if source = "custom". Path to the folder that contains the TFBS predictions. The files must be in BED format, 6 columns, one file per TF. See the other parameters for more details. The folder must also contain a so-called translation table, see the argument translationTable for details.

TFs

Character vector. Default all. Only relevant if source = "custom". Vector of TF names to include. The special keyword all can be used to include all TFs found in the folder specified by motifFolder. If all is specified anywhere in the vector, all TFs will be included. Otherwise, TF names must match the file names found in the folder, without the file suffix.

translationTable

Character. Default translationTable.csv. Only relevant if source = "custom". Name of the translation table file, which must be located in the same folder as the TFBS files. This file must have at least 2 columns, called ENSEMBL and ID. ID denotes the ID for the TF that is used throughout the pipeline (e.g., AHR) and the prefix of the corresponding file name (e.g., AHR.0.B if the file for AHR is called AHR.0.B_TFBS.bed.gz), while ENSEMBL denotes the Ensembl ID (e.g., ENSG00000106546; dot suffixes are removed automatically if present).
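The folder layout described above can be sketched in base R as follows. This is a minimal illustration only: the temporary folder name, the BED column names, and the coordinates are made up, and the 6 columns are assumed to follow the common BED6 order (chromosome, start, end, name, score, strand).

```r
# Hypothetical demo folder in a temporary location
motifFolder <- file.path(tempdir(), "TFBS_demo")
dir.create(motifFolder, showWarnings = FALSE)

# Translation table with the two required columns, ENSEMBL and ID,
# written with a single space as column separator (the documented default)
translation <- data.frame(
  ENSEMBL = "ENSG00000106546",
  ID = "AHR.0.B",
  stringsAsFactors = FALSE
)
write.table(translation, file.path(motifFolder, "translationTable.csv"),
            sep = " ", quote = FALSE, row.names = FALSE)

# One 6-column BED file per TF; the binding sites below are invented
sites <- data.frame(
  chr = c("chr1", "chr1"),
  start = c(1000L, 5000L),
  end = c(1012L, 5012L),
  name = c("site1", "site2"),
  score = c(10.5, 8.2),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)
write.table(sites, file.path(motifFolder, "AHR.0.B_TFBS.bed"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

# The folder now holds the translation table and one TFBS file
list.files(motifFolder)
```

A folder prepared this way should match the default values of translationTable, translationTable_sep, filesTFBSPattern and fileEnding, so only source = "custom" and motifFolder would need to be set in the call.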

translationTable_sep

Character. Default " " (white space character). Only relevant if source = "custom". The column separator for the translationTable file.

filesTFBSPattern

Character. Default "_TFBS". Only relevant if source = "custom". Suffix for the file names in the TFBS folder that is not part of the TF name. Can be empty. For example, for the TF CTCF, if the file is called CTCF.all.TFBS.bed, set this parameter to ".all.TFBS".

fileEnding

Character. Default ".bed". Only relevant if source = "custom". File ending for the files from the motif folder.
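To illustrate how filesTFBSPattern and fileEnding together determine the TF name taken from a file name, here is a small base R sketch. The helper tfNameFromFile is hypothetical and not part of GRaNIE; the package's internal logic may differ.

```r
# Hypothetical helper, not part of GRaNIE: strip the pattern and the file
# ending from the end of a file name to obtain the TF name
tfNameFromFile <- function(fileName, filesTFBSPattern = "_TFBS", fileEnding = ".bed") {
  suffix <- paste0(filesTFBSPattern, fileEnding)
  if (!endsWith(fileName, suffix)) {
    # File does not follow the expected naming scheme
    return(NA_character_)
  }
  substr(fileName, 1, nchar(fileName) - nchar(suffix))
}

tfNameFromFile("AHR.0.B_TFBS.bed")
# "AHR.0.B"
tfNameFromFile("CTCF.all.TFBS.bed", filesTFBSPattern = ".all.TFBS")
# "CTCF"
```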

nTFMax

NULL or Integer[1,]. Default NULL. Maximum number of TFs to import. Can be used for testing purposes, e.g., setting it to 5 imports only 5 TFs even if the motifFolder contains many more TFs.

EnsemblVersion

NULL or Character(1). Default NULL. Only relevant if source is not set to custom, ignored otherwise. The Ensembl version to use for the retrieval of gene IDs from their provided database names (e.g., JASPAR) via biomaRt. By default (NULL), the newest Ensembl version is used for the most recent genome assembly versions (see biomaRt::listEnsemblArchives() for supported versions). Set this parameter to use a custom (older) version instead.

JASPAR_useSpecificTaxGroup

NULL or Character(1). Default NULL. Should a tax group instead of the specific genome assembly be used for retrieving the TF list? This is useful for genomes other than human or mouse, for which JASPAR otherwise returns too few TFs. If set to NULL, the specific genome version as provided in the object is used within TFBSTools::getMatrixSet in the opts list for species, while tax_group will be used instead if this argument is not set to NULL. For example, it can be set to vertebrates to use the vertebrates TF collection. For more details, see ?TFBSTools::getMatrixSet.

JASPAR_removeAmbiguousTFs

TRUE or FALSE. Default TRUE. Remove TFs for which the name as provided by JASPAR cannot be mapped uniquely to one and only one Ensembl ID?

forceRerun

TRUE or FALSE. Default FALSE. Force execution, even if the GRN object already contains the result. Overwrites the old results.

...

Additional named elements for the opts function argument from ?TFBSTools::getMatrixSet that is used to query the JASPAR database.
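The interplay between JASPAR_useSpecificTaxGroup and the additional ... arguments can be pictured with the following base R sketch. The helper buildJasparOpts is hypothetical and only mimics the behavior described above; it is not GRaNIE's implementation, and the species value 9606 (the NCBI taxonomy ID for human) is an assumption about what would be derived from the genome assembly.

```r
# Hypothetical helper, not GRaNIE's implementation: assemble an opts list
# of the kind that TFBSTools::getMatrixSet() accepts
buildJasparOpts <- function(species, JASPAR_useSpecificTaxGroup = NULL, ...) {
  # Additional named elements are passed through unchanged
  opts <- list(...)
  if (is.null(JASPAR_useSpecificTaxGroup)) {
    # Default: query by species
    opts[["species"]] <- species
  } else {
    # A tax group replaces the species-specific query
    opts[["tax_group"]] <- JASPAR_useSpecificTaxGroup
  }
  opts
}

# Species-specific query (assumed value for a human genome assembly)
str(buildJasparOpts(species = 9606))

# Tax group query with one extra opts element passed via ...
str(buildJasparOpts(species = 9606,
                    JASPAR_useSpecificTaxGroup = "vertebrates",
                    collection = "CORE"))
```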

Value

An updated GRN object, with additional information added from this function (GRN@annotation$TFs in particular).

Examples

# See the Workflow vignette on the GRaNIE website for examples