Configuration file for data encryption
This is the description of the JSON configuration file for a File Utilities data encryption operation.
The configuration file is in JSON format. It contains the following sections:
Global parameters: General information about the data operation.
Task parameters: One or several task blocks containing information about the specific data operation.
Credential parameters: Information about the credentials for the buckets and the PGP public key.
👁️🗨️ Example
Here is an example of a File Utilities configuration file for data encryption:
🌐 Global parameters
Parameter | Description |
---|---|
$schema type: string optional | The URL of the JSON Schema that your configuration must conform to. Most code editors can use it to validate your configuration, display help boxes, and highlight issues. |
configuration_type type: string mandatory | Type of data operation. For a File Utilities data operation, the value is always "file-utilities". |
configuration_id type: string mandatory | ID of the data operation. You can pick any name you want, but it has to be unique for this data operation type. Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to include in your data operation name: |
environment type: string mandatory | Deployment context. Values: PROD, PREPROD, STAGING, DEV. |
account type: string mandatory | Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator. |
activated type: boolean optional | Flag used to enable/disable the execution of the data operation. Default value: true |
archived type: boolean optional | Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio. Default value: false |
doc_md type: string optional | Path to a file containing a detailed description of the data operation. The file must be in Markdown format. |
version type: string mandatory | Use version 2 only; version 1 is deprecated. |
gcp_project_id type: string mandatory | The project where the configuration and the associated cloud functions will be deployed. If not set, the user will be prompted to choose a project ID. |
gcs_bucket type: string mandatory | Name of the bucket. |
gcs_path type: string mandatory | Path where the files will be found, e.g. "some/sub/dir". |
gcs_destination_suffix type: string mandatory | Google Cloud Storage destination path, e.g. "/subdir/subdir_2" to send the files to "gs://BUCKET/subdir/subdir_2/source_file.ext" |
launch_mode type: string mandatory | Choice of triggering system. Choose "gcs" to trigger the operation when a file is created in a bucket. Other modes will be implemented in the future. |
filename_templates type: string mandatory | List of filename templates that will be processed. You can set the value to "*" to process all files. However, this is not recommended, as unnecessary or sensitive files might be included by mistake. Besides, the date value specified in filename_template will be used to sort files in the archive folder. If no date value is specified, all files will be stored together under a single folder named /ALL. The best practice is to specify one or more filename templates with the filename_template and file_description parameters, as described in the next paragraph. |
task_dependencies type: array of strings mandatory | The task_dependencies parameter allows you to create dependencies between the different tasks specified in the workflow parameter (see below). It defines the order in which the workflow tasks run, some of them concurrently, others sequentially. Syntax: for detailed information about the syntax, refer to the Airflow documentation. Example 1: we want to run the following tasks sequentially: taskA (create_gbq_table), taskB (sql) and taskC (copy_gbq_table). The task_dependencies parameter will be as follows: Example 2: we want to run the following tasks concurrently: taskA, taskB and taskC. The task_dependencies parameter will be as follows: Example 3: we have the following 9 tasks to order: taskA, taskD, taskG (create_gbq_table), taskB, taskE, taskH (sql), taskC, taskF, taskI (copy_gbq_table). The task_dependencies parameter will be as follows: Example 4: in the example above, we want taskH to run before taskE so we can use its result for taskE. The task_dependencies parameter will be as follows: |
credentials type: array mandatory | Encrypted credentials needed to read/write data from the source bucket. You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to this page. |
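The global parameters can be checked before deployment with a small script. The following Python sketch is hypothetical and is not part of Tailer: the names MANDATORY_KEYS, check_global_parameters and with_defaults are illustrative, and the script only enforces constraints stated in the table (mandatory keys, the "file-utilities" type, the four environments, a 6-digit account, version 2, and the defaults of activated and archived).

```python
"""Hypothetical pre-deployment check for the global parameters of a
File Utilities data encryption configuration (not part of Tailer)."""

from typing import Any

# Mandatory global parameters, as listed in the global parameters table.
MANDATORY_KEYS = (
    "configuration_type", "configuration_id", "environment", "account",
    "version", "gcp_project_id", "gcs_bucket", "gcs_path",
    "gcs_destination_suffix", "launch_mode", "filename_templates",
    "task_dependencies", "credentials",
)
ENVIRONMENTS = {"PROD", "PREPROD", "STAGING", "DEV"}


def check_global_parameters(config: dict[str, Any]) -> list[str]:
    """Return the problems found; an empty list means no problem was detected."""
    problems = [f"missing mandatory parameter: {key}"
                for key in MANDATORY_KEYS if key not in config]
    # Value checks below only run on keys that are present.
    if config.get("configuration_type", "file-utilities") != "file-utilities":
        problems.append('configuration_type must be "file-utilities"')
    env = config.get("environment")
    if env is not None and env not in ENVIRONMENTS:
        problems.append(f"unknown environment: {env}")
    account = str(config.get("account", "000000"))
    if not (len(account) == 6 and account.isdigit()):
        problems.append("account must be a 6-digit number")
    if str(config.get("version", "2")) != "2":
        problems.append("version must be 2 (version 1 is deprecated)")
    return problems


def with_defaults(config: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults of the optional flags."""
    return {"activated": True, "archived": False, **config}
```

For instance, calling check_global_parameters({"environment": "TEST"}) would report every missing mandatory parameter plus the unknown environment.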
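The task_dependencies row refers to the Airflow syntax, where `a >> b` means that b runs after a, and `[a, b]` groups tasks. The following Python sketch is a hypothetical illustration, not Tailer's implementation: it assumes each entry is a string written in that notation, and the functions parse_dependencies and execution_waves are illustrative names. It groups the tasks into "waves": tasks in the same wave can run concurrently, and each wave waits for the previous one.

```python
"""Hypothetical sketch: how Airflow-style dependency strings such as
"taskA >> taskB >> taskC" decide which tasks run sequentially and which
can run concurrently. Assumes the ">>" notation and "[a, b]" groups."""


def parse_dependencies(entries: list[str]) -> dict[str, set[str]]:
    """Map every task to the set of tasks it must wait for."""
    upstream: dict[str, set[str]] = {}
    for entry in entries:
        # Each ">>"-separated stage is a single task or a "[a, b]" group.
        stages = [
            [name.strip() for name in stage.strip().strip("[]").split(",")
             if name.strip()]
            for stage in entry.split(">>")
        ]
        for stage in stages:
            for task in stage:
                upstream.setdefault(task, set())
        # Every task of a stage depends on every task of the previous stage.
        for previous, current in zip(stages, stages[1:]):
            for task in current:
                upstream[task].update(previous)
    return upstream


def execution_waves(entries: list[str]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave may run concurrently."""
    upstream = parse_dependencies(entries)
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(upstream):
        # A task is ready once all of its upstream tasks are done.
        ready = sorted(task for task, deps in upstream.items()
                       if task not in done and deps <= done)
        if not ready:
            raise ValueError("circular dependency between tasks")
        waves.append(ready)
        done.update(ready)
    return waves
```

With this sketch, `execution_waves(["taskA >> taskB >> taskC"])` puts each task in its own wave (a sequential run), while three independent entries `["taskA", "taskB", "taskC"]` land in a single wave (a concurrent run). A cycle such as `a >> b` plus `b >> a` raises an error instead of looping forever.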
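The filename_templates and gcs_destination_suffix rows can be illustrated together. This Python sketch is hypothetical: it assumes shell-style glob matching (`fnmatch.fnmatchcase`), which may differ from the actual template syntax, it ignores the date-based archive folders, and matches_templates and destination_uri are illustrative names. It reproduces only the documented path example, where the suffix "/subdir/subdir_2" sends source_file.ext to gs://BUCKET/subdir/subdir_2/source_file.ext.

```python
"""Hypothetical sketch of matching a new file against filename_templates
and routing it with gcs_destination_suffix. Glob matching is an
assumption, not the documented template syntax."""

from fnmatch import fnmatchcase


def matches_templates(filename: str, templates: list[str]) -> bool:
    """True if the file name matches at least one template ("*" matches all)."""
    return any(fnmatchcase(filename, template) for template in templates)


def destination_uri(bucket: str, suffix: str, filename: str) -> str:
    """Build gs://<bucket>/<suffix>/<filename>, ignoring extra slashes."""
    parts = [part for part in suffix.split("/") if part]
    return "gs://" + "/".join([bucket, *parts, filename])
```

For example, `destination_uri("BUCKET", "/subdir/subdir_2", "source_file.ext")` returns "gs://BUCKET/subdir/subdir_2/source_file.ext", matching the example in the gcs_destination_suffix row.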
🖥️ PGP encrypt task parameters
Information related to the Google Cloud Compute Engine VM where the script will be executed.
Parameter | Description |
---|---|
task_id type: string mandatory | ID of the task. It must be unique within the data operation. |
task_type type: string mandatory | The value has to be set to "pgp" for this task type. |
pgp_mode type: string mandatory | PGP mode. For data encryption, the value is always "encrypt". |
public_key.pgp type: array mandatory in encrypt mode | Encrypted public key. This array contains two entries: the recipient ("username") of the public key, and the content ("schema"), obtained by passing the credentials through tailer encrypt. |
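A PGP encrypt task block can also be checked before deployment. The sketch below is hypothetical (check_pgp_encrypt_tasks is an illustrative name) and only enforces what the task table states: unique task_id values, task_type "pgp", pgp_mode "encrypt", and a public_key.pgp entry in encrypt mode. It assumes, for illustration, that public_key.pgp is stored as a key of the task block; adjust the lookup if your configuration keeps it elsewhere.

```python
"""Hypothetical check for PGP encrypt task blocks, based only on the
parameters of the task table. Assumes "public_key.pgp" is a task key."""

from typing import Any


def check_pgp_encrypt_tasks(tasks: list[dict[str, Any]]) -> list[str]:
    """Return the problems found; an empty list means no problem was detected."""
    problems: list[str] = []
    seen: set[str] = set()
    for index, task in enumerate(tasks):
        task_id = task.get("task_id")
        label = task_id or f"task #{index}"
        # task_id must be present and unique within the data operation.
        if not task_id:
            problems.append(f"{label}: missing task_id")
        elif task_id in seen:
            problems.append(f"{label}: task_id must be unique")
        else:
            seen.add(task_id)
        if task.get("task_type") != "pgp":
            problems.append(f'{label}: task_type must be "pgp"')
        if task.get("pgp_mode") != "encrypt":
            problems.append(f'{label}: pgp_mode must be "encrypt"')
        elif "public_key.pgp" not in task:
            problems.append(f"{label}: public_key.pgp is mandatory in encrypt mode")
    return problems
```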