Configuration file for data decryption

This is the description of the JSON configuration file for a File Utilities data operation used for data decryption.


The configuration file is in JSON format. It contains the following sections:

  • Global parameters: General information about the data operation.

  • Tasks parameters: One or several task blocks, containing information about the specific data operation.

  • Credential parameters: Information about the credentials for the buckets and the PGP private key.

Example

Here is an example of a File Utilities configuration file for data decryption:

{
    "configuration_type": "file-utilities",
    "configuration_id": "000010-file-utilities_demo",
    "environment": "DEV",
    "account": "000099",
    "activated": true,
    "archived": false,
    "version": "2",
    "doc_md": "readme.md",
    "gcp_project_id": "my-project",
    "gcs_bucket": "my-bucket",
    "gcs_path": "input",
    "gcs_destination_suffix": "input_decrypt",
    "launch_mode": "gcs",
    "filename_templates": [{
            "filename_template": "SCORES_{{FD_DATE}}-{{FD_BLOB_36}}.{{FD_BLOB_13}}.csv.gz.gpg",
            "file_description": "Scores"
        },
        {
            "filename_template": "ADR_{{FD_DATE}}-{{FD_BLOB_36}}.csv.gpg",
            "file_description": "Addresses"
        }
    ],
    "task_dependencies": [
        "pgp_decrypt"
    ],
    "tasks": [{
        "task_id": "pgp_decrypt",
        "task_type": "pgp",
        "pgp_mode": "decrypt",
        "private_key.pgp": {
            "passphrase": {
                "cipher_aes": "af3...",
                "tag": "fl9...",
                "ciphertext": "1e2...",
                "enc_session_key": "qvt..."
            },
            "recipient": "me@my-domain.com",
            "content": {
                "cipher_aes": "hk5...",
                "tag": "dfh...",
                "ciphertext": "cj5...",
                "enc_session_key": "2fk..."
            }
        }
    }],
    "credentials": {
        "gcp-credentials.json": {
            "content": {
                "cipher_aes": "gf5...", 
                "tag": "cvh...", 
                "ciphertext": "4et...", 
                "enc_session_key": "g5d..."
            }
        }
    }
}
Global parameters

configuration_type (type: string, mandatory)
Type of data operation. For a File Utilities data operation, the value is always "file-utilities".

configuration_id (type: string, mandatory)
ID of the data operation.
You can pick any name you want, but it has to be unique for this data operation type.
Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to include in your data operation name:
  • your account ID
  • the source bucket
  • the source path

environment (type: string, mandatory)
Deployment context.
Values: PROD, PREPROD, STAGING, DEV.

account (type: string, mandatory)
Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.

activated (type: boolean, optional)
Flag used to enable/disable the execution of the data operation.
Default value: true

archived (type: boolean, optional)
Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio.
Default value: false

doc_md (type: string, optional)
Path to a file containing a detailed description of the data operation. The file must be in Markdown format.

version (type: string, optional)
Enter "2" if you are using contexts and environments, and "1" if not.

gcp_project_id (type: string, mandatory)
Project where the configuration and the associated cloud functions will be deployed.
If not set, the user will be prompted to choose a project ID.

gcs_bucket (type: string, mandatory)
Name of the bucket.

gcs_path (type: string, mandatory)
Path where the files will be found, e.g. "some/sub/dir".

gcs_destination_suffix (type: string, mandatory)
Google Cloud Storage destination path, e.g. "/subdir/subdir_2" to send the files to "gs://BUCKET/subdir/subdir_2/source_file.ext".
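
For instance, with the values used in the example above (bucket and path names are illustrative), the relevant parameters are:

{
    "gcs_bucket": "my-bucket",
    "gcs_path": "input",
    "gcs_destination_suffix": "input_decrypt"
}

A file dropped into "gs://my-bucket/input/" is then processed, and the decrypted output is written under the destination suffix, i.e. "gs://my-bucket/input_decrypt/".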

launch_mode (type: string, mandatory)
Choice of triggering system. Choose "gcs" to trigger the operation upon file creation in a bucket. Future modes will be implemented.

filename_templates (type: array, mandatory)
List of filename templates that will be processed.
You can set the value to "*" for all files to be copied. However, this is not recommended, as unnecessary or sensitive files might be included by mistake. Besides, the date value specified in filename_template will be used to sort files in the archive folder. If no date value is specified, all files will be stored together under one folder named /ALL.
The best practice is to specify one or more filename templates with the filename_template and file_description parameters, as in the sketch below.
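
For instance, the example above declares a template for the address files; a minimal sketch, assuming (based on the example filenames) that {{FD_DATE}} stands for a date stamp and {{FD_BLOB_36}} for a 36-character run:

{
    "filename_templates": [
        {
            "filename_template": "ADR_{{FD_DATE}}-{{FD_BLOB_36}}.csv.gpg",
            "file_description": "Addresses"
        }
    ]
}

With this template, an incoming file such as "ADR_20240101-<36 characters>.csv.gpg" would match and be processed.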

task_dependencies (type: array of strings, mandatory)
The task_dependencies parameter allows you to create dependencies between the different tasks specified in the tasks parameter (see below). It defines the order in which the tasks will run, some of them running concurrently, others sequentially.

Syntax
  • The double chevron >> means that the first task needs to be completed before the next one can start.
  • The comma , means that the tasks will run concurrently.
  • The square brackets [ and ] allow you to define a set of tasks that will run together.

Example 1
We have the following tasks that we want to run sequentially: taskA (create_gbq_table), taskB (sql) and taskC (copy_gbq_table). The task_dependencies parameter will be as follows:
"task_dependencies": ["taskA >> taskB >> taskC"]

Example 2
We have the following tasks that we want to run concurrently: taskA, taskB and taskC. The task_dependencies parameter will be as follows:
"task_dependencies": ["taskA, taskB, taskC"]

Example 3
We have the following 9 tasks we want to order: taskA, taskD, taskG (create_gbq_table), taskB, taskE, taskH (sql), taskC, taskF, taskI (copy_gbq_table). The task_dependencies parameter will be as follows:
"task_dependencies": ["[taskA, taskD, taskG] >> [taskB, taskE, taskH] >> [taskC, taskF, taskI]"]

Example 4
In the example above, we want taskH to run before taskE so we can use its result for taskE. The task_dependencies parameter will be as follows:
"task_dependencies": ["[taskA, taskD, taskG] >> taskH >> [taskB, taskE] >> [taskC, taskF, taskI]"]

For detailed information about the syntax, refer to the Airflow documentation.

tasks (type: array, mandatory)
One or several task blocks, each describing a task of the data operation (see PGP decrypt task parameters below).

credentials (type: array, mandatory)
Encrypted credentials needed to read/move data from the source bucket. You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to the Encrypt your credentials page.

PGP decrypt task parameters

task_id (type: string, mandatory)
ID of the task. It must be unique within the data operation.

task_type (type: string, mandatory)
The value has to be set to "pgp" for this task type.

pgp_mode (type: string, mandatory)
PGP mode. For data decryption, the value is always "decrypt".

private_key.pgp (type: array, mandatory)
Encrypted private key. This array contains three entities:
  • the passphrase ("password") of the private key
  • the recipient ("username") of the private key
  • the content ("schema") of the credentials after passing them through tailer encrypt
