Storage to Storage configuration file

This page describes the JSON configuration file of a Storage to Storage (STS) data operation.

The configuration file contains the following sections:

  • Global parameters: General information about the data operation.

  • Source parameters: One source block, containing information about the data source.

  • Destination parameters: One or several destination blocks, containing information about the data destinations.

👁️‍🗨️ Example

Here is an example of an STS configuration file for a GCS to SFTP transfer:

{
  "$schema": "http://jsonschema.tailer.ai/schema/storage-to-storage-veditor",
  "configuration_type": "storage-to-storage",
  "configuration_id": "copy-my-files-gcs-to-sftp",
  "doc_md": "readme.md",
  "environment": "PROD",
  "account": "000099",
  "version": "3",
  "activated": true,
  "archived": false,
  "filename_templates": [
    {
      "filename_template": "{{FD_DATE}}_sales_file.txt",
      "file_description": "This is a description for sales_file.txt."
    },
    {
      "filename_template": "{{FD_DATE}}_products.txt",
      "file_description": "This is a description for products.txt."
    }
  ],
  
  "source": {
    "type": "gcs",
    "gcp_project_id": "my_gcp_project",
    "gcs_source_bucket" : "my-input-bucket",
    "gcs_source_prefix" : "input-folder",
    "archive_prefix": "archive-folder",
    "gcp_credentials_secret": {
      "cipher_aes": "b42724dcbbf0aba89a0f106d1c4",
      "tag": "5c8816ea0a7aded7c6f2df61f5b9",
      "ciphertext": "fd096e",
      "enc_session_key": "8f6f7c"
    }
  },
  
  "destinations": [
    {
      "type": "sftp",
      "generate_top_file": "REPLACE_EXTENSION",
      "sftp_host": "sftp.domain.com",
      "sftp_port": 22,
      "sftp_userid": "john_doe",
      "sftp_password_secret": {
        "cipher_aes": "3926f71cd00d10b07d0fee4e",
        "tag": "1f5c066351d91041343a2ab37aebe",
        "ciphertext": "921776fd04228fe8aaa42af04",
        "enc_session_key": "2fb0ad2b0df9771"
      },
      "sftp_destination_dir": "/",
      "sftp_destination_dir_create": false
    }
  ]
}
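
As a rough illustration of the three sections listed above, the following Python sketch loads a configuration and checks its overall shape. The function name `check_sections` and the checks it performs are assumptions made for this example; it is not part of Tailer and is not a substitute for the JSON schema referenced by "$schema".

```python
import json


def check_sections(config: dict) -> list:
    """Return a list of problems found in the top-level layout (empty if none)."""
    problems = []
    # Global parameters marked as mandatory on this page.
    for key in ("configuration_type", "configuration_id", "environment",
                "account", "filename_templates"):
        if key not in config:
            problems.append(f"missing global parameter: {key}")
    # Exactly one source block, expressed as a JSON object.
    if not isinstance(config.get("source"), dict):
        problems.append("'source' must be a single object")
    # One or several destination blocks, expressed as a JSON array.
    destinations = config.get("destinations")
    if not isinstance(destinations, list) or not destinations:
        problems.append("'destinations' must be a non-empty list")
    return problems


# Trimmed-down configuration used only to exercise the check.
sample = json.loads("""
{
  "configuration_type": "storage-to-storage",
  "configuration_id": "copy-my-files-gcs-to-sftp",
  "environment": "PROD",
  "account": "000099",
  "filename_templates": "*",
  "source": {"type": "gcs"},
  "destinations": [{"type": "sftp"}]
}
""")
print(check_sections(sample))  # prints []
```

The check only looks at the presence and JSON type of each section; the content of the source and destination blocks is described in the sections that follow.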

🌐 Global parameters

Parameter
Description

$schema

type: string

optional

The URL of the JSON schema that contains the properties your configuration must satisfy. Most code editors can use it to validate your configuration, display help boxes, and highlight issues.

configuration_type

type: string

mandatory

Type of data operation.

For an STS data operation, the value is always "storage-to-storage".

configuration_id

type: string

mandatory

ID of the data operation.

You can pick any name you want, but it has to be unique for this data operation type.

Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to name your data operation by concatenating:

  • your account ID,

  • the source bucket name,

  • and the source directory name.

environment

type: string

mandatory

Deployment context.

Values: PROD, PREPROD, STAGING, DEV.

account

type: string

mandatory

Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.

version

type: string

optional

Version of the configuration. Specifying it lets you use new features.

filename_templates

type: array, or the string "*"

mandatory

List of filename templates that will be processed.

You can set the value to "*" for all files to be copied. However, this is not recommended, as unnecessary or sensitive files might be included by mistake. Besides, the date value specified in filename_template will be used to sort files in the archive folder. If no date value is specified, all files will be stored together under one folder named /ALL.

The best practice is to specify one or more filename templates with the filename_template and file_description parameters, as described in the next section.

activated

type: boolean

optional

Flag used to enable/disable the execution of the data operation.

If not specified, the default value will be "true".

archived

type: boolean

optional

Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio.

If not specified, the default value will be "false".

max_active_runs

type: integer

optional

This parameter limits the number of concurrent runs for this data operation.

If not set, the default value is 5.

empty_file_policy

type: string

optional

This parameter will tell Tailer how to behave when an empty file (0 bytes) is read.

  • "NONE": the file is ignored and left in place.

  • "PROCESS": the file will be processed normally.

  • "TRIGGER_FAILED_STATUS": a Tailer run will be traced and set to FAILED.

  • "TRIGGER_SUCCESS_STATUS": a Tailer run will be traced and set to SUCCESS.

The default value is "NONE".

short_description

type: string

optional

Short description of the data operation.

doc_md

type: string

optional

Path to a file containing a detailed description. The file must be in Markdown format.
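
To make the defaults and allowed values listed above easier to see at a glance, here is a minimal Python sketch. The names `resolve_globals` and `suggest_configuration_id` are hypothetical helpers written for this page, and the "_" separator used to build the ID is an assumption: the best practice above only says to concatenate the three values.

```python
# Defaults and allowed values as listed in the parameter descriptions above.
GLOBAL_DEFAULTS = {
    "activated": True,
    "archived": False,
    "max_active_runs": 5,
    "empty_file_policy": "NONE",
}
ENVIRONMENTS = {"PROD", "PREPROD", "STAGING", "DEV"}
EMPTY_FILE_POLICIES = {
    "NONE", "PROCESS", "TRIGGER_FAILED_STATUS", "TRIGGER_SUCCESS_STATUS",
}


def resolve_globals(config: dict) -> dict:
    """Return a copy of config with the documented defaults filled in."""
    resolved = {**GLOBAL_DEFAULTS, **config}
    if resolved.get("environment") not in ENVIRONMENTS:
        raise ValueError(f"unexpected environment: {resolved.get('environment')!r}")
    if resolved["empty_file_policy"] not in EMPTY_FILE_POLICIES:
        raise ValueError(
            f"unexpected empty_file_policy: {resolved['empty_file_policy']!r}"
        )
    return resolved


def suggest_configuration_id(account: str, bucket: str, directory: str) -> str:
    """Concatenate account ID, source bucket name and source directory name.

    The "_" separator and the replacement of "/" are assumptions of this sketch.
    """
    parts = [account, bucket, directory.strip("/").replace("/", "_")]
    return "_".join(part for part in parts if part)


resolved = resolve_globals({"environment": "DEV", "archived": True})
print(resolved["max_active_runs"], resolved["archived"])  # prints: 5 True
print(suggest_configuration_id("000099", "my-input-bucket", "input-folder"))
# prints: 000099_my-input-bucket_input-folder
```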

"Filename Templates" sub-object parameters

The "filename_templates" object contains the definition of expected source files to copy to the destinations.

Parameter
Description

filename_template

type: string

mandatory

Template for the files to be processed. The following placeholders are currently supported:

  • "FD_DATE" looks for an 8-digit date (e.g. "20191015").

  • "FD_DATE_YEAR_4" looks for a 4-digit year (e.g. "2021").

  • "FD_DATE_YEAR_2" looks for a 2-digit year (e.g. "21").

  • "FD_DATE_MONTH" looks for a 2-digit month (e.g. "05").

  • "FD_DATE_DAY" looks for a 2-digit day (e.g. "12").

  • "FD_TIME" looks for a 6-digit time (e.g. "124213").

  • "FD_BLOB_XYZ", where XYZ is a non-zero positive integer, looks for a string of characters of XYZ length.

Information:

  • if "FD_DATE" is specified, it takes priority over "FD_DATE_YEAR_X".

  • if "FD_DATE_YEAR_4" or "FD_DATE_YEAR_2" is specified, the final date is built by concatenating the year with "FD_DATE_MONTH" and "FD_DATE_DAY".

  • if "FD_DATE_YEAR_2" is specified, it will be prefixed by "20".

  • if only "FD_DATE_YEAR_4" or "FD_DATE_YEAR_2" is specified, "FD_DATE_MONTH" and "FD_DATE_DAY" will be set to "01".

Example 1

This template:

"stores_{{FD_DATE}}_{{FD_TIME}}.txt"

will allow you to process files such as:

"stores_20201116_124213.txt"

Example 2

This template:

"{{FD_DATE}}_{{FD_BLOB_5}}_fixedvalue_{{FD_BLOB_11}}.gz"

will allow you to process files such as:

"20201116_12397_fixedvalue_12312378934.gz"

file_description

type: string

optional

Short description of the files that will match the filename template.
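
To illustrate how the placeholders and date rules described for filename_template fit together, here is a Python sketch that translates a template into a regular expression and derives the resulting date. It is only one plausible reading of the rules above, not Tailer's actual implementation: the helper names `template_to_regex` and `extract_date` are hypothetical, and defaulting a missing day to "01" when a year and a month are given is an assumption. The file names used in the usage lines are the ones from the two examples above.

```python
import re

# Regex fragment for each fixed placeholder; digit counts come from the list above.
PLACEHOLDER_PATTERNS = {
    "FD_DATE": r"(?P<FD_DATE>\d{8})",
    "FD_DATE_YEAR_4": r"(?P<FD_DATE_YEAR_4>\d{4})",
    "FD_DATE_YEAR_2": r"(?P<FD_DATE_YEAR_2>\d{2})",
    "FD_DATE_MONTH": r"(?P<FD_DATE_MONTH>\d{2})",
    "FD_DATE_DAY": r"(?P<FD_DATE_DAY>\d{2})",
    "FD_TIME": r"(?P<FD_TIME>\d{6})",
}
TOKEN = re.compile(r"\{\{(\w+)\}\}")


def template_to_regex(template: str):
    """Translate a filename template into a compiled regular expression.

    Limitation of this sketch: each date/time placeholder may appear only once.
    """
    pieces, position = [], 0
    for token in TOKEN.finditer(template):
        # Literal text between placeholders is matched as-is.
        pieces.append(re.escape(template[position:token.start()]))
        name = token.group(1)
        blob = re.fullmatch(r"FD_BLOB_([1-9]\d*)", name)
        if blob:
            pieces.append(".{%s}" % blob.group(1))  # any characters, fixed length
        elif name in PLACEHOLDER_PATTERNS:
            pieces.append(PLACEHOLDER_PATTERNS[name])
        else:
            raise ValueError(f"unsupported placeholder: {name}")
        position = token.end()
    pieces.append(re.escape(template[position:]))
    return re.compile("".join(pieces))


def extract_date(template: str, filename: str):
    """Return the YYYYMMDD date carried by filename.

    Returns None if the file name does not match or carries no date.
    """
    match = template_to_regex(template).fullmatch(filename)
    if match is None:
        return None
    found = match.groupdict()
    if "FD_DATE" in found:  # FD_DATE takes priority over the year placeholders
        return found["FD_DATE"]
    if "FD_DATE_YEAR_4" in found:
        year = found["FD_DATE_YEAR_4"]
    elif "FD_DATE_YEAR_2" in found:
        year = "20" + found["FD_DATE_YEAR_2"]  # 2-digit years are prefixed by "20"
    else:
        return None  # template without any date placeholder
    # Missing month or day falls back to "01".
    return year + found.get("FD_DATE_MONTH", "01") + found.get("FD_DATE_DAY", "01")


print(extract_date("stores_{{FD_DATE}}_{{FD_TIME}}.txt", "stores_20201116_124213.txt"))
# prints: 20201116
pattern = template_to_regex("{{FD_DATE}}_{{FD_BLOB_5}}_fixedvalue_{{FD_BLOB_11}}.gz")
print(bool(pattern.fullmatch("20201116_12397_fixedvalue_12312378934.gz")))
# prints: True
```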

⬇️ Source parameters

There can only be one source block, as STS data operations can only process one source at a time.

Google Cloud Storage source

Example:

{
  "source": {
    "type": "gcs",
    "gcp_project_id": "my_gcp_project",
    "gcs_source_bucket" : "my_bucket",
    "gcs_source_prefix" : "INPUT_SOMEDIR",
    "archive_prefix": "archive",
    "gcp_credentials_secret": {
      "cipher_aes": "b42724dcbbf6c3310aba89a0f106d1c4",
      "tag": "5c8816ea0a7aded9cb47c6f2df61f5b9",
      "ciphertext": "fdf09c6e",
      "enc_session_key": "8f63f7c"
    }
  }
}

Parameter

Description

type

type: string

mandatory

Type of source.

In this case: "gcs".

gcp_project_id

type: string

mandatory

GCP project in which the source configuration and the associated cloud functions are deployed.

If not set, the user will be prompted to choose a profile in which to deploy the configuration.

gcs_source_bucket

type: string

mandatory

Name of the source bucket.

gcs_source_prefix

type: string

mandatory

Path where the files will be found, e.g. "some/sub/dir".

archive_prefix

type: string

optional

Path where the source files will be archived.

If present and populated, the STS data operation will archive the source files in the location specified, in the GCS source bucket.

If not present or empty, there will be no archiving.

gcp_credentials_secret

type: dict

mandatory

Encrypted credentials needed to read/move data from the source bucket.

You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to this page.

Amazon S3 source

Example:

{
  "source": {
    "type": "s3",
    "s3_source_bucket": "my_s3_bucket",
    "s3_source_prefix": "input/my_source/",
    "archive_prefix": "archive",
    "aws_access_key": "3VJ3F6JJQBA2",
    "aws_access_key_secret": {
      "cipher_aes": "e6f5a68d4de8af89e83ea93e42facbed",
      "tag": "20e174e34c5d0c537be77d85ed8dda33",
      "ciphertext": "60a98b884110aab84",
      "enc_session_key": "9c4648e"
    }
  }
}

Parameter

Description

type

type: string

mandatory

Type of source.

In this case: "s3".

s3_source_bucket

type: string

mandatory

Name of the source S3 bucket.

s3_source_prefix

type: string

mandatory

Path where the files will be found, e.g. "some/sub/dir".

archive_prefix

type: string

optional

Path where the source files will be archived.

If present and populated, the STS data operation will archive the source files in the location specified, in the GCS source bucket.

If not present or empty, there will be no archiving.

aws_access_key

type: string

mandatory

Amazon S3 access key ID.

aws_access_key_secret

type: dict

mandatory

Encrypted Amazon S3 access private key.

This is needed to read/move data from the source bucket. To learn how to encrypt the private key value, refer to this page.

Azure source

Example:

{
    "source": {
        "type": "azure",
        "azure_source_storage": "my_azure_storage",
        "azure_source_prefix": "input/my_source/",
        "archive_prefix": "archive",
        "azure_connection_string_secret": {
            "cipher_aes": "f1c4xxxx",
            "tag": "0052fxxxx",
            "ciphertext": "7e1a3xxxx",
            "enc_session_key": "2dc2xxxx"
        }
    }
}

Parameter

Description

type

type: string

mandatory

Type of source.

In this case: "azure".

azure_source_storage

type: string

mandatory

Name of the source Azure storage.

azure_source_prefix

type: string

mandatory

Path where the files will be found, e.g. "some/sub/dir".

archive_prefix

type: string

optional

Path where the source files will be archived.

If present and populated, the STS data operation will archive the source files in the location specified, in the GCS source bucket.

If not present or empty, there will be no archiving.

azure_connection_string_secret

type: dict

mandatory

Encrypted Azure access private key.

This is needed to read/move data from the source bucket. To learn how to encrypt the private key value, refer to this page.

SFTP source

Example:

{
  "source": {
    "type": "sftp",
    "sftp_source_directory": "/",
    "sftp_source_filename": "20190621_test_file.txt",
    "archive_prefix": "archive",
    "sftp_host": "sftp.domain.com",
    "sftp_port": 22,
    "sftp_userid": "john_doe",
    "sftp_authentication_method": "USERNAME_PASSWORD",
    "sftp_password_secret": {
      "cipher_aes": "3926f71cd00f8d2b812d10b07d0fee4e",
      "tag": "1f5c066351db5f91041343a2ab37aebe",
      "ciphertext": "921776fd0caa71a04228fe8aaa42af04",
      "enc_session_key": "2fb2f8d271"
    }
  }
}

Parameter

Description

type

type: string

mandatory

Type of source.

In this case: "sftp".

sftp_source_directory

type: string

mandatory

Sub-path to switch to before downloading the file.

sftp_source_filename

type: string

mandatory

File to retrieve.

archive_prefix

type: string

optional

Path where the source files will be archived.

If present and populated, the STS data operation will archive the source files in the location specified, in the GCS source bucket.

If not present or empty, there will be no archiving.

sftp_host

type: string

mandatory

SFTP host, e.g. "sftp.something.com".

sftp_port

type: integer

mandatory

SFTP port, e.g. 22.

sftp_userid

type: string

mandatory

SFTP user ID, e.g. "john_doe".

sftp_authentication_method

type: string

optional

Authentication method used to connect to the SFTP server.

The following methods are supported:

  • USERNAME_PASSWORD

  • PRIVATE_KEY

Default: USERNAME_PASSWORD

sftp_password_secret

type: dict

optional

Encrypted SFTP password for the user ID.

This is needed to read/move data from the source SFTP.

This attribute MUST be set if sftp_authentication_method is set to USERNAME_PASSWORD.

To learn how to encrypt the password, refer to this page.

sftp_private_key_secret

type: dict

optional

Encrypted SFTP private key.

This attribute MUST be set if sftp_authentication_method is set to PRIVATE_KEY.

To learn how to encrypt the private key, refer to this page.

sftp_private_key_passphrase_secret

type: dict

optional

Encrypted SFTP private key passphrase, if provided.

This attribute MUST be set if sftp_authentication_method is set to PRIVATE_KEY.

To learn how to encrypt the passphrase, refer to this page.
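
The relationship between sftp_authentication_method and the secret attributes can be sketched as follows. This is a hypothetical helper (`check_sftp_authentication` is not a Tailer function) shown only to make the rules above concrete. The passphrase is deliberately left unchecked, on the assumption that it is only needed when the private key is protected by one, as the words "if provided" suggest.

```python
def check_sftp_authentication(block: dict) -> list:
    """Return the problems found in the authentication settings of an SFTP block."""
    # USERNAME_PASSWORD is the documented default when the method is omitted.
    method = block.get("sftp_authentication_method", "USERNAME_PASSWORD")
    if method == "USERNAME_PASSWORD":
        required = ["sftp_password_secret"]
    elif method == "PRIVATE_KEY":
        required = ["sftp_private_key_secret"]
    else:
        return [f"unsupported sftp_authentication_method: {method}"]
    return [f"{key} is required with {method}" for key in required if key not in block]


source = {
    "type": "sftp",
    "sftp_source_directory": "/",
    "sftp_source_filename": "20190621_test_file.txt",
    "sftp_host": "sftp.domain.com",
    "sftp_port": 22,
    "sftp_userid": "john_doe",
    "sftp_authentication_method": "PRIVATE_KEY",
    # Placeholder values; real ones come from the encryption procedure.
    "sftp_private_key_secret": {
        "cipher_aes": "xxxx",
        "tag": "xxxx",
        "ciphertext": "xxxx",
        "enc_session_key": "xxxx",
    },
}
print(check_sftp_authentication(source))  # prints []
```

The `source` dictionary mirrors the JSON structure of an SFTP source block that uses key-based authentication instead of a password.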

⬆️ Destination parameters

These parameters allow you to specify a list of destinations. You can add as many "destination" sub-objects as you want; they will all be processed.

Google Cloud Storage destination

Example:

{
  "destinations": [
    {
      "type": "gcs",
      "gcs_destination_bucket": "my_dest_bucket",
      "gcs_destination_prefix": "DEV/output",
      "gcp_credentials_secret": {
        "cipher_aes": "b42724dcbbf6c3310aba89a0f106d1c4",
        "tag": "5c8816ea0a7aded9cb47c6f2df61f5b9",
        "ciphertext": "fdf09c6e",
        "enc_session_key": "8f634f7f7c"
      }
    }
  ]
}
Parameter
Description

type

type: string

mandatory

Type of destination.

In this case: "gcs".

gcs_destination_bucket

type: string

mandatory

Google Cloud Storage destination bucket.

gcs_destination_prefix

type: string

mandatory

Google Cloud Storage destination path, e.g. "/subdir/subdir_2" to send the files to "gs://BUCKET/subdir/subdir_2/source_file.ext".

gcp_credentials_secret

type: dict

mandatory

Encrypted credentials needed to read/write/move data from the destination bucket.

You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to this page.
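
The way gcs_destination_bucket and gcs_destination_prefix combine into the final object location can be sketched as follows. The helper name `gcs_destination_uri` is hypothetical, and stripping leading and trailing slashes from the prefix is an assumption made so that the output matches the example given for gcs_destination_prefix.

```python
def gcs_destination_uri(bucket: str, prefix: str, filename: str) -> str:
    """Build the gs:// URI at which a copied file would be expected."""
    # Leading and trailing slashes are stripped so that "/subdir/subdir_2"
    # and "subdir/subdir_2" give the same result (an assumption of this sketch).
    parts = [bucket, prefix.strip("/"), filename]
    return "gs://" + "/".join(part for part in parts if part)


print(gcs_destination_uri("BUCKET", "/subdir/subdir_2", "source_file.ext"))
# prints: gs://BUCKET/subdir/subdir_2/source_file.ext
```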

Amazon S3 destination

Example:

{
  "destinations": [
    {
      "type": "s3",
      "s3_bucket" : "my_dest_bucket",
      "s3_destination_prefix": "PROD/output",
      "aws_access_key": "J3F6JLUVJQ",
      "aws_access_key_secret": {
        "cipher_aes": "e6f5a68dxxxx",
        "tag": "20e1xxxx",
        "ciphertext": "60a84xxxx",
        "enc_session_key": "9c4619048exxxx"
      }
    }
  ]
}
Parameter
Description

type

type: string

mandatory

Type of destination.

In this case: "s3".

s3_bucket

type: string

mandatory

Amazon S3 bucket name.

s3_destination_prefix

type: string

mandatory

Amazon S3 destination path, e.g. "subdir_A/subdir_B" to send the files to "s3://bucket/subdir_A/subdir_B/source_file.ext".

aws_access_key

type: string

mandatory

Amazon S3 access key ID.

aws_access_key_secret

type: dict

mandatory

Encrypted Amazon S3 access private key.

This is needed to read/write/move data from the destination bucket.

To learn how to encrypt the private key value, refer to this page.

Azure destination

Example:

{
  "destinations": [
    {
      "type": "azure",
      "azure_destination_prefix": "my_azure_bucket/output",
      "azure_connection_string_secret": {
        "cipher_aes": "3926fxxxx",
        "tag": "1f5cxxxx",
        "ciphertext": "9217xxxx",
        "enc_session_key": "2fb0xxxx"
      }
    }
  ]
}
Parameter
Description

type

type: string

mandatory

Type of destination.

In this case: "azure".

azure_destination_prefix

type: string

mandatory

Complete Azure destination path, i.e. storage name and subdirectory if needed.

azure_connection_string_secret

type: dict

mandatory

Encrypted Azure access private key.

This is needed to write data to the destination storage. To learn how to encrypt the private key value, refer to this page.

SFTP destination

Example:

{
  "destinations": [
    {
      "type": "sftp",
      "generate_top_file": "REPLACE_EXTENSION",
      "sftp_destination_dir": "/",
      "sftp_destination_dir_create": false,
      "sftp_host": "sftp.domain.com",
      "sftp_port": 22,
      "sftp_userid": "john_doe",
      "sftp_password_secret": {
        "cipher_aes": "3926f71cd00f8d2b812d10b07d0fee4e",
        "tag": "1f5c066351db5f91041343a2ab37aebe",
        "ciphertext": "921776fd0caa71a04228fe8aaa42af04",
        "enc_session_key": "2fb0adf8d271"
      }
    }
  ]
}
Parameter
Description

type

type: string

mandatory

Type of destination.

In this case: "sftp".

generate_top_file

type: string

optional

This flag, if set, will generate a TOP file along with the file copied.

Possible values are:

  • "REPLACE_EXTENSION": if the source file is "20190708_data.txt", the TOP file will be "20190708_data.top".

  • "ADD_EXTENSION": if the source file is "20190708_data.txt", the TOP file will be "20190708_data.txt.top".

sftp_destination_dir

type: string

mandatory

Path to switch to before uploading the file.

sftp_destination_dir_create

type: boolean

mandatory

If set to true, Tailer will try to create the subdirectory specified in sftp_destination_dir on the SFTP file system before switching to it and copying files.

sftp_host

type: string

mandatory

SFTP host, e.g. "sftp.something.com".

sftp_port

type: integer

mandatory

SFTP port, e.g. 22.

sftp_userid

type: string

mandatory

SFTP user ID, e.g. "john_doe".

sftp_authentication_method

type: string

optional

Authentication method used to connect to the SFTP server.

The following methods are supported:

  • USERNAME_PASSWORD

  • PRIVATE_KEY

Default: USERNAME_PASSWORD

sftp_password_secret

type: dict

optional

Encrypted SFTP password for the user ID.

This is needed to write data to the destination SFTP server.

This attribute MUST be set if sftp_authentication_method is set to USERNAME_PASSWORD.

To learn how to encrypt the password, refer to this page.

sftp_private_key_secret

type: dict

optional

Encrypted SFTP private key.

This attribute MUST be set if sftp_authentication_method is set to PRIVATE_KEY.

To learn how to encrypt the private key, refer to this page.

sftp_private_key_passphrase_secret

type: dict

optional

Encrypted SFTP private key passphrase, if provided.

This attribute MUST be set if sftp_authentication_method is set to PRIVATE_KEY.

To learn how to encrypt the passphrase, refer to this page.
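
The two generate_top_file values described earlier in this section can be illustrated with a short Python sketch. The function `top_file_name` is a hypothetical helper, not part of Tailer. For file names with several extensions (such as ".tar.gz"), replacing only the last one is an assumption of this example.

```python
import os


def top_file_name(source_name: str, mode: str) -> str:
    """Return the name of the TOP file generated for source_name."""
    if mode == "REPLACE_EXTENSION":
        # Drop the last extension, then append ".top".
        stem, _extension = os.path.splitext(source_name)
        return stem + ".top"
    if mode == "ADD_EXTENSION":
        # Keep the full file name and append ".top".
        return source_name + ".top"
    raise ValueError(f"unsupported generate_top_file value: {mode}")


print(top_file_name("20190708_data.txt", "REPLACE_EXTENSION"))  # prints: 20190708_data.top
print(top_file_name("20190708_data.txt", "ADD_EXTENSION"))  # prints: 20190708_data.txt.top
```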
