Configuration file for data encryption
This is the description of the JSON configuration file for a File Utilities data encryption data operation.
The configuration file is in JSON format. It contains the following sections:
Global parameters: General information about the data operation.
Tasks parameters: One or several task blocks, containing information about the specific data operation.
Credential parameters: Information about the credentials for the buckets and the PGP public key.
👁️🗨️ Example
Here is an example of a File Utilities configuration file for data encryption:
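The example below is a minimal illustrative sketch assembled from the parameters documented on this page: every ID, bucket name, path, and key value is a placeholder, and the `workflow` wrapper around the task block is an assumption based on the workflow parameter mentioned under task_dependencies.

```json
{
  "$schema": "<JSON_SCHEMA_URL>",
  "configuration_type": "file-utilities",
  "configuration_id": "000099_my-source-bucket_some_sub_dir_encrypt",
  "environment": "DEV",
  "account": "000099",
  "activated": true,
  "archived": false,
  "version": "2",
  "gcp_project_id": "my-gcp-project",
  "gcs_bucket": "my-source-bucket",
  "gcs_path": "some/sub/dir",
  "gcs_destination_suffix": "/subdir/subdir_2",
  "launch_mode": "gcs",
  "filename_templates": "*",
  "task_dependencies": ["encrypt_files"],
  "credentials": ["<ENCRYPTED_CREDENTIALS>"],
  "workflow": [
    {
      "task_id": "encrypt_files",
      "task_type": "pgp",
      "pgp_mode": "encrypt",
      "public_key.pgp": [
        { "username": "<RECIPIENT_OF_THE_PUBLIC_KEY>" },
        { "schema": "<PUBLIC_KEY_ENCRYPTED_WITH_TAILER_ENCRYPT>" }
      ]
    }
  ]
}
```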
🌐 Global parameters
$schema
type: string
optional
The URL of the JSON schema that describes the properties your configuration must satisfy. Most code editors can use it to validate your configuration, display help boxes, and highlight issues.
configuration_type
type: string
mandatory
Type of data operation.
For a File Utilities data operation, the value is always "file-utilities".
configuration_id
type: string
mandatory
ID of the data operation.
You can pick any name you want, but it has to be unique for this data operation type.
Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to include in your data operation name (see the example after this list):
your account ID
the source bucket
the source path
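For instance, following that best practice, the ID might look like this (all parts are placeholders):

```json
{
  "configuration_id": "000099_my-source-bucket_some_sub_dir_encrypt"
}
```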
environment
type: string
mandatory
Deployment context.
Values: PROD, PREPROD, STAGING, DEV.
account
type: string
mandatory
Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.
activated
type: boolean
optional
Flag used to enable/disable the execution of the data operation.
Default value: true
archived
type: boolean
optional
Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio.
Default value: false
doc_md
type: string
optional
Path to a file containing a detailed description of the data operation. The file must be in Markdown format.
version
type: string
mandatory
Use only version 2; version 1 is deprecated.
gcp_project_id
type: string
mandatory
Sets the project where the configuration and the associated Cloud Functions are deployed.
If not set, the user will be prompted to choose a project ID.
gcs_bucket
type: string
mandatory
Name of the source bucket where the files to process are located.
gcs_path
type: string
mandatory
Path where the files will be found, e.g. "some/sub/dir".
gcs_destination_suffix
type: string
mandatory
Google Cloud Storage destination path, e.g. "/subdir/subdir_2" to send the files to "gs://BUCKET/subdir/subdir_2/source_file.ext"
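As a sketch of how these three parameters fit together (placeholder values; the destination bucket itself is not covered by the parameters in this section), a file found at gs://my-source-bucket/some/sub/dir/file.ext would be delivered under the /subdir/subdir_2 destination path:

```json
{
  "gcs_bucket": "my-source-bucket",
  "gcs_path": "some/sub/dir",
  "gcs_destination_suffix": "/subdir/subdir_2"
}
```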
launch_mode
type: string
mandatory
Choice of triggering system. Choose "gcs" to trigger the operation upon file creation in a bucket. Future modes will be implemented.
filename_templates
type: string
mandatory
List of filename templates that will be processed.
You can set the value to "*" for all files to be processed. However, this is not recommended, as unnecessary or sensitive files might be included by mistake. Besides, the date value specified in filename_template will be used to sort files in the archive folder. If no date value is specified, all files will be stored together under one folder named /ALL.
The best practice is to specify one or more filename templates with the filename_template and file_description parameters, as sketched below.
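The structure below is an illustrative sketch of that best practice. It assumes filename_templates accepts a list of objects with filename_template and file_description keys, and uses a hypothetical date placeholder token ({{FD_DATE}}) to mark the date value used to sort archived files; the exact token name and object shape are assumptions.

```json
{
  "filename_templates": [
    {
      "filename_template": "sales_{{FD_DATE}}.csv",
      "file_description": "Daily sales export to encrypt"
    },
    {
      "filename_template": "stock_{{FD_DATE}}.csv",
      "file_description": "Daily stock export to encrypt"
    }
  ]
}
```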
task_dependencies
type: array of strings
mandatory
The task_dependencies parameter allows you to create dependencies between the different tasks specified in the workflow parameter (see below). It will define in which order the workflow tasks will run, some of them running concurrently, others sequentially.
Syntax
The double chevron `>>` means that the first task needs to be completed before the next one can start.
The comma `,` means that the tasks will run concurrently.
The square brackets `[` and `]` allow you to define a set of tasks that will run together.
For detailed information about the syntax, refer to the Airflow documentation.
Example 1
We have the following tasks that we want to run sequentially: taskA (create_gbq_table), taskB (sql) and taskC (copy_gbq_table).
The task_dependencies parameter will be as follows: "task_dependencies": [" taskA >> taskB >> taskC "],
Example 2
We have the following tasks that we want to run concurrently: taskA, taskB and taskC.
The task_dependencies parameter will be as follows: "task_dependencies": [" taskA, taskB, taskC "],
Example 3
We have the following nine tasks that we want to order: taskA, taskD, taskG (create_gbq_table), taskB, taskE, taskH (sql), taskC, taskF, taskI (copy_gbq_table).
The task_dependencies parameter will be as follows: "task_dependencies": [" [taskA, taskD, taskG] >> [taskB, taskE, taskH] >> [taskC, taskF, taskI] "],
Example 4
In the example above, we want taskH to run before taskE so that taskE can use its result.
The task_dependencies parameter will be as follows:
"task_dependencies": [" [taskA, taskD, taskG] >> taskH >> [taskB, taskE] >> [taskC, taskF, taskI] "],
credentials
type: array
mandatory
Encrypted credentials needed to read/write data from the source bucket.
You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to this page.
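As a purely hypothetical sketch of the shape this array might take (the exact keys are not documented in this section, and the encrypted blob stands for whatever tailer encrypt outputs):

```json
{
  "credentials": [
    {
      "name": "gcs-source-credentials",
      "content": "<OUTPUT_OF_TAILER_ENCRYPT>"
    }
  ]
}
```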
🖥️ PGP encrypt task parameters
Information related to the PGP encryption task to execute.
task_id
type: string
mandatory
ID of the task. It must be unique within the data operation.
task_type
type: string
mandatory
The value has to be set to "pgp" for this task type.
pgp_mode
type: string
mandatory
PGP mode.
For data encryption, the value is always "encrypt".
public_key.pgp
type: array
mandatory in encrypt mode
Encrypted public key. This array contains two entities:
the recipient: the "username" of the public key
the content: the "schema" of the key after passing it through tailer encrypt
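Read literally, the two entities could be laid out as the two array entries below. The key names ("username", "schema") follow the description above, but the overall shape and the placeholder values are assumptions:

```json
{
  "public_key.pgp": [
    { "username": "<RECIPIENT_OF_THE_PUBLIC_KEY>" },
    { "schema": "<PUBLIC_KEY_ENCRYPTED_WITH_TAILER_ENCRYPT>" }
  ]
}
```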