Configuration file for data encryption

This page describes the JSON configuration file for a File Utilities data encryption operation.

The configuration file is in JSON format. It contains the following sections:

  • Global parameters: General information about the data operation.

  • Task parameters: One or more task blocks, containing information about the specific data operation.

  • Credential parameters: Information about the credentials for the buckets and the PGP public key.

👁️‍🗨️ Example

Here is an example of a File Utilities configuration file for data encryption:

{
    "$schema": "http://jsonschema.tailer.ai/schema/file-utilities-veditor",
    "configuration_type": "file-utilities",
    "configuration_id": "000010-file-utilities-demo",
    "environment": "DEV",
    "account": "000099",
    "activated": true,
    "archived": false,
    "version": "2",
    "doc_md": "readme.md",
    "gcp_project_id": "my-project",
    "gcs_bucket": "my-bucket",
    "gcs_path": "output",
    "gcs_destination_suffix": "output_encrypt",
    "launch_mode": "gcs",
    "filename_templates": [{
            "filename_template": "{{FD_DATE}}_PRODUITS-{{FD_BLOB_12}}.csv",
            "file_description": "Product data from demo system"
        },
        {
            "filename_template": "{{FD_DATE}}_SITES-{{FD_BLOB_12}}.csv",
            "file_description": "Site data from demo system"
        }
    ],
    "task_dependencies": [
        "pgp_encrypt"
    ],
    "tasks": [{
        "task_id": "pgp_encrypt",
        "task_type": "pgp",
        "pgp_mode": "encrypt",
        "public_key.pgp": {
            "recipient": "me@my-domain.com",
            "content": {
                "cipher_aes": "f7f...",
                "tag": "a0c...",
                "ciphertext": "cag...",
                "enc_session_key": "a3f..."
            }
        }
    }],
    "credentials": {
        "gcp-credentials.json": {
            "content": {
                "cipher_aes": "gf5...", 
                "tag": "cvh...", 
                "ciphertext": "4et...", 
                "enc_session_key": "g5d..."
            }
        }
    }
}

🌐 Global parameters

Parameter | Description

$schema

type: string

optional

The URL of the JSON Schema that defines the properties your configuration must satisfy. Most code editors can use it to validate your configuration, display help boxes and highlight issues.

configuration_type

type: string

mandatory

Type of data operation.

For a File Utilities data operation, the value is always "file-utilities".

configuration_id

type: string

mandatory

ID of the data operation.

You can pick any name you want, but it has to be unique for this data operation type.

Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to include in your data operation name:

  • your account ID

  • the source bucket

  • the source path

environment

type: string

mandatory

Deployment context.

Values: PROD, PREPROD, STAGING, DEV.

account

type: string

mandatory

Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.

activated

type: boolean

optional

Flag used to enable/disable the execution of the data operation.

Default value: true

archived

type: boolean

optional

Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio.

Default value: false

doc_md

type: string

optional

Path to a file containing a detailed description of the data operation. The file must be in Markdown format.

version

type: string

mandatory

Use only version 2; version 1 is deprecated.

gcp_project_id

type: string

mandatory

Project in which the configuration and the associated cloud functions will be deployed.

If not set, the user will be prompted to choose a project ID.

gcs_bucket

type: string

mandatory

Name of the bucket.

gcs_path

type: string

mandatory

Path where the files will be found, e.g. "some/sub/dir".

gcs_destination_suffix

type: string

mandatory

Google Cloud Storage destination path, e.g. "/subdir/subdir_2" to send the files to "gs://BUCKET/subdir/subdir_2/source_file.ext"
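The mapping from suffix to destination described above can be sketched as follows. This is a minimal illustration and not Tailer's actual implementation: the helper name destination_uri is hypothetical, and it assumes the suffix is applied directly under the bucket root, as in the "gs://BUCKET/subdir/subdir_2/source_file.ext" example.

```python
def destination_uri(bucket: str, destination_suffix: str, filename: str) -> str:
    """Build the destination URI for a file (hypothetical helper).

    Assumption: the suffix is relative to the bucket root, and leading or
    trailing slashes in the suffix are not significant.
    """
    suffix = destination_suffix.strip("/")
    parts = [bucket]
    if suffix:
        parts.append(suffix)
    parts.append(filename)
    return "gs://" + "/".join(parts)


if __name__ == "__main__":
    print(destination_uri("BUCKET", "/subdir/subdir_2", "source_file.ext"))
```

With this sketch, a suffix written with or without a leading slash ("/output_encrypt" or "output_encrypt") yields the same destination.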

launch_mode

type: string

mandatory

Choice of triggering system. Choose "gcs" to trigger the operation when a file is created in a bucket. Other modes will be implemented in the future.

filename_templates

type: string

mandatory

List of filename templates that will be processed.

You can set the value to "*" for all files to be copied. However, this is not recommended, as unnecessary or sensitive files might be included by mistake. Besides, the date value specified in filename_template will be used to sort files in the archive folder. If no date value is specified, all files will be stored together under one folder named /ALL.

The best practice is to specify one or more filename templates with the filename_template and file_description parameters, as shown in the example above.
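To illustrate how a filename template selects files, here is a minimal sketch that converts a template into a regular expression. It is not Tailer's implementation. It assumes that {{FD_DATE}} stands for an 8-digit date and that {{FD_BLOB_n}} stands for any n characters; this page does not define these placeholders, and the function name template_to_regex is hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def template_to_regex(template: str) -> re.Pattern:
    """Translate a filename template into a compiled regex (sketch).

    Assumptions (not stated on this page):
      - {{FD_DATE}} matches exactly 8 digits and is captured as a group
      - {{FD_BLOB_n}} matches exactly n arbitrary characters
    """
    pattern = ""
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        # Escape the literal text between two placeholders.
        pattern += re.escape(template[pos:match.start()])
        name = match.group(1)
        if name == "FD_DATE":
            pattern += r"(\d{8})"
        elif name.startswith("FD_BLOB_"):
            length = int(name[len("FD_BLOB_"):])
            pattern += ".{%d}" % length
        else:
            raise ValueError("unsupported placeholder: " + name)
        pos = match.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern)


if __name__ == "__main__":
    rx = template_to_regex("{{FD_DATE}}_PRODUITS-{{FD_BLOB_12}}.csv")
    print(bool(rx.fullmatch("20240131_PRODUITS-abcdef123456.csv")))
```

Under these assumptions, the captured date group is the value that would be used to sort files in the archive folder.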

task_dependencies

type: array of strings

mandatory

The task_dependencies parameter allows you to create dependencies between the different tasks specified in the tasks parameter. It defines the order in which the tasks will run, some of them running concurrently, others sequentially.

Syntax

  • The double chevron >> means that the first task needs to be completed before the next one can start.

  • The comma , means that the tasks will run concurrently.

  • The square brackets [ and ] allow you to define a set of tasks that will run together.

For detailed information about the syntax, refer to the Airflow documentation.

Example 1

We have the following tasks that we want to run sequentially: taskA (create_gbq_table), taskB (sql) and taskC (copy_gbq_table).

The task_dependencies parameter will be as follows:

"task_dependencies": [" taskA >> taskB >> taskC "],

Example 2

We have the following tasks that we want to run concurrently: taskA, taskB and taskC.

The task_dependencies parameter will be as follows:

"task_dependencies": [" taskA, taskB, taskC "],

Example 3

We have the following 9 tasks we want to order: taskA, taskD, taskG (create_gbq_table), taskB, taskE, taskH (sql), taskC, taskF, taskI (copy_gbq_table).

The task_dependencies parameter will be as follows:

"task_dependencies": [" [taskA, taskD, taskG] >> [taskB, taskE, taskH] >> [taskC, taskF, taskI] "],

Example 4

In the example above, we want taskH to run before taskE so we can use its result for taskE.

The task_dependencies parameter will be as follows:

"task_dependencies": [" [taskA, taskD, taskG] >> taskH >> [taskB, taskE] >> [taskC, taskF, taskI] "],

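The syntax rules above can be illustrated with a small parser that splits a dependency expression into sequential stages, each stage being a set of tasks that run together. This is a simplified sketch for flat expressions like the ones in the examples; it is not how Tailer or Airflow evaluates the expression, and the function name parse_stages is hypothetical.

```python
def parse_stages(expression: str) -> list[list[str]]:
    """Split a task_dependencies expression into ordered stages (sketch).

    ">>" separates sequential stages; "," separates tasks that run
    concurrently inside a stage; "[" and "]" only group a stage.
    Nested brackets are not supported by this simplified version.
    """
    stages = []
    for part in expression.split(">>"):
        names = part.strip().strip("[]").split(",")
        stage = [name.strip() for name in names if name.strip()]
        if stage:
            stages.append(stage)
    return stages


if __name__ == "__main__":
    expr = " [taskA, taskD, taskG] >> taskH >> [taskB, taskE] >> [taskC, taskF, taskI] "
    for index, stage in enumerate(parse_stages(expr), start=1):
        print(index, stage)
```

For the configuration on this page, which has a single task, the expression "pgp_encrypt" yields one stage containing only that task.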
credentials

type: array

mandatory

Encrypted credentials needed to read/write data from the source bucket.

You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to this page.
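Before deploying, it can be useful to check a configuration against the table above. The following is a minimal sketch of such a check, not an official validator: the function name check_global_parameters is hypothetical, the list of mandatory parameters is copied from this page, and the only value checks are the ones stated in the descriptions.

```python
MANDATORY_GLOBAL_PARAMETERS = [
    "configuration_type", "configuration_id", "environment", "account",
    "version", "gcp_project_id", "gcs_bucket", "gcs_path",
    "gcs_destination_suffix", "launch_mode", "filename_templates",
    "task_dependencies", "credentials",
]
ALLOWED_ENVIRONMENTS = {"PROD", "PREPROD", "STAGING", "DEV"}


def check_global_parameters(config: dict) -> list[str]:
    """Return a list of problems found in the global parameters (sketch)."""
    problems = [
        "missing mandatory parameter: " + name
        for name in MANDATORY_GLOBAL_PARAMETERS
        if name not in config
    ]
    if "configuration_type" in config and config["configuration_type"] != "file-utilities":
        problems.append('configuration_type must be "file-utilities"')
    if "environment" in config and config["environment"] not in ALLOWED_ENVIRONMENTS:
        problems.append("environment must be one of PROD, PREPROD, STAGING, DEV")
    account = config.get("account")
    if account is not None and not (
        isinstance(account, str) and len(account) == 6 and account.isdigit()
    ):
        problems.append("account must be a 6-digit string")
    if "version" in config and config["version"] != "2":
        problems.append('version must be "2"')
    return problems


if __name__ == "__main__":
    import json
    import sys

    # Usage: python check_config.py my-configuration.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in check_global_parameters(json.load(handle)):
            print(problem)
```

The JSON Schema referenced by $schema remains the authoritative definition; this sketch only mirrors what the table says.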

🖥️ PGP encrypt task parameters

Information related to the PGP encryption task.

Parameter | Description

task_id

type: string

mandatory

ID of the task. It must be unique within the data operation.

task_type

type: string

mandatory

The value has to be set to "pgp" for this task type.

pgp_mode

type: string

mandatory

PGP mode.

For data encryption, the value is always "encrypt".

public_key.pgp

type: array

mandatory in encrypt mode

Encrypted public key. This array contains two entities:

  • the recipient "username" of the public key

  • the content "schema" credentials after passing it through tailer encrypt.
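As an illustration of the task parameters above, the following sketch checks the shape of a PGP encrypt task block. It is not an official validator: the function name check_pgp_encrypt_task is hypothetical, and the four fields expected under content are taken from the example configuration on this page, so treat them as an assumption about the output of tailer encrypt.

```python
ENCRYPTED_CONTENT_FIELDS = ("cipher_aes", "tag", "ciphertext", "enc_session_key")


def check_pgp_encrypt_task(task: dict) -> list[str]:
    """Return a list of problems found in a PGP encrypt task block (sketch)."""
    problems = []
    if not task.get("task_id"):
        problems.append("task_id is mandatory")
    if task.get("task_type") != "pgp":
        problems.append('task_type must be "pgp"')
    if task.get("pgp_mode") != "encrypt":
        problems.append('pgp_mode must be "encrypt"')

    key = task.get("public_key.pgp")
    if not isinstance(key, dict):
        problems.append("public_key.pgp is mandatory in encrypt mode")
        return problems
    if not key.get("recipient"):
        problems.append("public_key.pgp: recipient is missing")
    content = key.get("content")
    if not isinstance(content, dict):
        problems.append("public_key.pgp: content is missing")
    else:
        # Field names assumed from the example configuration.
        for field in ENCRYPTED_CONTENT_FIELDS:
            if field not in content:
                problems.append("public_key.pgp: content is missing " + field)
    return problems


if __name__ == "__main__":
    task = {"task_id": "pgp_encrypt", "task_type": "pgp", "pgp_mode": "encrypt"}
    print(check_pgp_encrypt_task(task))
```

The same check can be run on every entry of the tasks list before deployment.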