Storage to Storage configuration file

This page describes the JSON configuration file of a Storage to Storage (STS) data operation.
It contains the following sections:
  • Global parameters: General information about the data operation.
  • Source parameters: One source block, containing information about the data source.
  • Destination parameters: One or several destination blocks, containing information about the data destinations.

👁️‍🗨️
Example

Here is an example of an STS configuration file for a GCS to SFTP transfer:
{
  "$schema": "http://jsonschema.tailer.ai/schema/storage-to-storage-veditor",
  "configuration_type": "storage-to-storage",
  "configuration_id": "copy-my-files-gcs-to-sftp",
  "doc_md": "readme.md",
  "environment": "PROD",
  "account": "000099",
  "version": "3",
  "activated": true,
  "archived": false,
  "filename_templates": [
    {
      "filename_template": "{{FD_DATE}}_sales_file.txt",
      "file_description": "This is a description for sales_file.txt."
    },
    {
      "filename_template": "{{FD_DATE}}_products.txt",
      "file_description": "This is a description for products.txt."
    }
  ],
  "source": {
    "type": "gcs",
    "gcp_project_id": "my_gcp_project",
    "gcs_source_bucket": "my-input-bucket",
    "gcs_source_prefix": "input-folder",
    "archive_prefix": "archive-folder",
    "gcp_credentials_secret": {
      "cipher_aes": "b42724dcbbf0aba89a0f106d1c4",
      "tag": "5c8816ea0a7aded7c6f2df61f5b9",
      "ciphertext": "fd096e",
      "enc_session_key": "8f6f7c"
    }
  },
  "destinations": [
    {
      "type": "sftp",
      "generate_top_file": "REPLACE_EXTENSION",
      "sftp_host": "sftp.domain.com",
      "sftp_port": 22,
      "sftp_userid": "john_doe",
      "sftp_password_secret": {
        "cipher_aes": "3926f71cd00d10b07d0fee4e",
        "tag": "1f5c066351d91041343a2ab37aebe",
        "ciphertext": "921776fd04228fe8aaa42af04",
        "enc_session_key": "2fb0ad2b0df9771"
      },
      "sftp_destination_dir": "/",
      "sftp_destination_dir_create": false
    }
  ]
}
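Since the configuration is plain JSON, it can be worth checking before deployment that the file parses and has the expected top-level layout: a single "source" object and a non-empty "destinations" list. The Python sketch below performs only these structural checks; the function name check_structure is hypothetical and is not part of the Tailer tooling.

```python
import json


def check_structure(text: str) -> dict:
    """Parse an STS configuration and check its top-level layout (hypothetical helper)."""
    # json.loads raises json.JSONDecodeError (a subclass of ValueError) on
    # malformed JSON, e.g. a missing comma between two attributes.
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("the configuration must be a JSON object")
    # Exactly one source block, written as an object.
    if not isinstance(config.get("source"), dict):
        raise ValueError('"source" must be a single object')
    # One or several destination blocks, written as a list of objects.
    destinations = config.get("destinations")
    if not isinstance(destinations, list) or not destinations:
        raise ValueError('"destinations" must be a non-empty list')
    if not all(isinstance(d, dict) for d in destinations):
        raise ValueError("each destination must be an object")
    return config
```

For example, check_structure(open("sts.json").read()) returns the parsed configuration, or raises ValueError describing the first structural problem found.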

🌐
Global parameters

Parameter
Description
$schema
type: string
optional
URL of the JSON schema that your configuration must conform to. Most code editors can use it to validate your configuration, display help boxes and highlight issues.
configuration_type
type: string
mandatory
Type of data operation.
For an STS data operation, the value is always "storage-to-storage".
configuration_id
type: string
mandatory
ID of the data operation.
You can pick any name you want, but it has to be unique for this data operation type.
Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to name your data operation by concatenating:
  • your account ID,
  • the source bucket name,
  • and the source directory name.
environment
type: string
mandatory
Deployment context.
Values: PROD, PREPROD, STAGING, DEV.
account
type: string
mandatory
Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.
version
type: string
optional
Version of the configuration format, used to enable new features.
filename_templates
type: list
mandatory
List of filename templates that will be processed.
You can set the template to "*" to copy all files. However, this is not recommended, as unnecessary or sensitive files might be included by mistake. In addition, the date extracted through filename_template is used to sort files in the archive folder: if the template contains no date placeholder, all files are stored together in a single folder named /ALL.
The best practice is to specify one or more filename templates with the filename_template and file_description parameters, as described in the next section.
activated
type: boolean
optional
Flag used to enable/disable the execution of the data operation.
If not specified, the default value will be "true".
archived
type: boolean
optional
Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio.
If not specified, the default value will be "false".
max_active_runs
type: integer
optional
This parameter limits the number of concurrent runs for this data operation.
If not set, the default value is 5.
short_description
type: string
optional
Short description of the data operation.
doc_md
type: string
optional
Path to a file containing a detailed description. The file must be in Markdown format.
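The rules in the table above (mandatory parameters, allowed environment values, 6-digit account ID, documented defaults, and the naming best practice for configuration_id) can be expressed as a few checks. This is a minimal sketch, not Tailer's own validator: the function names are hypothetical, and the "-" separator used by suggested_configuration_id is an assumption, since the documentation only says to concatenate the three values.

```python
import re

# Values and defaults as listed in the global parameters table.
ENVIRONMENTS = {"PROD", "PREPROD", "STAGING", "DEV"}
MANDATORY_GLOBALS = ("configuration_type", "configuration_id", "environment",
                     "account", "filename_templates")
DEFAULTS = {"activated": True, "archived": False, "max_active_runs": 5}


def check_globals(config: dict) -> list:
    """Return the problems found in the global parameters (empty list if none)."""
    problems = [f"missing mandatory parameter: {key}"
                for key in MANDATORY_GLOBALS if key not in config]
    if config.get("configuration_type", "storage-to-storage") != "storage-to-storage":
        problems.append('configuration_type must be "storage-to-storage"')
    if "environment" in config and config["environment"] not in ENVIRONMENTS:
        problems.append(f"unknown environment: {config['environment']}")
    if "account" in config and not re.fullmatch(r"\d{6}", str(config["account"])):
        problems.append("account must be a 6-digit string")
    return problems


def with_defaults(config: dict) -> dict:
    """Fill in the documented defaults for the optional flags not set explicitly."""
    return {**DEFAULTS, **config}


def suggested_configuration_id(account: str, bucket: str, directory: str) -> str:
    """Concatenate account ID, source bucket and source directory (assumed "-" separator)."""
    parts = [account, bucket, directory.strip("/").replace("/", "-")]
    return "-".join(part for part in parts if part)
```

With the values of the example at the top of this page, suggested_configuration_id("000099", "my-input-bucket", "input-folder") gives "000099-my-input-bucket-input-folder".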

"Filename Templates" sub-object parameters

The "filename_templates" list contains the definitions of the source files expected to be copied to the destinations. Each entry accepts the following parameters:
Parameter
Description
filename_template
type: string
mandatory
Template for the files to be processed. Placeholders are written between double curly braces, e.g. "{{FD_DATE}}". The following placeholders are currently supported:
  • "FD_DATE" looks for an 8-digit date (e.g. "20191015").
  • "FD_DATE_YEAR_4" looks for a 4-digit year (e.g. "2021").
  • "FD_DATE_YEAR_2" looks for a 2-digit year (e.g. "21").
  • "FD_DATE_MONTH" looks for a 2-digit month (e.g. "05").
  • "FD_DATE_DAY" looks for a 2-digit day (e.g. "12").
  • "FD_TIME" looks for a 6-digit time (e.g. "124213").
  • "FD_BLOB_XYZ", where XYZ is a positive integer, looks for a string of exactly XYZ characters.
Information:
  • if "FD_DATE" is specified, it takes precedence over "FD_DATE_YEAR_X".
  • if "FD_DATE_YEAR_4" or "FD_DATE_YEAR_2" is specified, the final date is built by concatenating the year with "FD_DATE_MONTH" and "FD_DATE_DAY".
  • if "FD_DATE_YEAR_2" is specified, it is prefixed with "20".
  • if only "FD_DATE_YEAR_4" or "FD_DATE_YEAR_2" is specified, "FD_DATE_MONTH" and "FD_DATE_DAY" are set to "01".
Example 1
This template:
"stores_{{FD_DATE}}_{{FD_TIME}}.txt"
matches files such as:
"stores_20201116_124213.txt"
Example 2
This template:
"{{FD_DATE}}_{{FD_BLOB_5}}_fixedvalue_{{FD_BLOB_11}}.gz"
matches files such as:
"20201116_12397_fixedvalue_12312378934.gz"
file_description
type: string
optional
Short description of the files that will match the filename template.
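To make the placeholder rules concrete, here is a Python sketch that turns a filename template into a regular expression and then applies the date precedence rules listed under "Information". It is an illustration under stated assumptions, not the platform's matching code: the helper names are hypothetical, date and time placeholders are assumed to match digits only, and defaulting a missing month or day to "01" individually slightly generalizes the documented year-only case.

```python
import re
from typing import Optional

# Assumed regular expression for each fixed-length date/time placeholder.
PLACEHOLDERS = {
    "FD_DATE": r"\d{8}",
    "FD_DATE_YEAR_4": r"\d{4}",
    "FD_DATE_YEAR_2": r"\d{2}",
    "FD_DATE_MONTH": r"\d{2}",
    "FD_DATE_DAY": r"\d{2}",
    "FD_TIME": r"\d{6}",
}
TOKEN = re.compile(r"\{\{(FD_[A-Z0-9_]+)\}\}")


def template_to_regex(template: str):
    """Compile a filename template into a regular expression (hypothetical helper)."""
    pattern, pos = "", 0
    for token in TOKEN.finditer(template):
        pattern += re.escape(template[pos:token.start()])  # literal text
        name = token.group(1)
        if name.startswith("FD_BLOB_"):
            # FD_BLOB_XYZ: any string of exactly XYZ characters, not captured.
            pattern += ".{%d}" % int(name[len("FD_BLOB_"):])
        else:
            # Date/time parts are captured under their placeholder name.
            pattern += "(?P<%s>%s)" % (name, PLACEHOLDERS[name])
        pos = token.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern)


def resolve_date(groups: dict) -> Optional[str]:
    """Build a YYYYMMDD date from the captured placeholders, or None."""
    if groups.get("FD_DATE"):
        return groups["FD_DATE"]  # FD_DATE takes precedence
    if groups.get("FD_DATE_YEAR_4"):
        year = groups["FD_DATE_YEAR_4"]
    elif groups.get("FD_DATE_YEAR_2"):
        year = "20" + groups["FD_DATE_YEAR_2"]  # 2-digit years get the "20" prefix
    else:
        return None  # no date placeholder: files are archived under /ALL
    month = groups.get("FD_DATE_MONTH") or "01"  # assumed default when absent
    day = groups.get("FD_DATE_DAY") or "01"
    return year + month + day


match = template_to_regex("stores_{{FD_DATE}}_{{FD_TIME}}.txt").fullmatch("stores_20201116_124213.txt")
print(resolve_date(match.groupdict()))  # 20201116
```

Using fullmatch rather than search means a file only qualifies when the whole name fits the template, which is the conservative reading of "files that will match the filename template".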

⬇️
Source parameters

There can only be one source block, as STS data operations can only process one source at a time.

Google Cloud Storage source

Example:
{
  "source": {
    "type": "gcs",
    "gcp_project_id": "my_gcp_project",
    "gcs_source_bucket": "my_bucket",
    "gcs_source_prefix": "INPUT_SOMEDIR",
    "archive_prefix": "archive",
    "gcp_credentials_secret": {
      "cipher_aes": "b42724dcbbf6c3310aba89a0f106d1c4",
      "tag": "5c8816ea0a7aded9cb47c6f2df61f5b9",
      "ciphertext": "fdf09c6e",
      "enc_session_key": "8f63f7c"
    }
  }
}
Parameter
Description
type
type: string
mandatory
Type of source.
In this case : "gcs".
gcp_project_id
type: string
mandatory
GCP project where the source configuration and the associated Cloud Functions are deployed.
If not set, you will be prompted to choose a profile in which to deploy the configuration.
gcs_source_bucket
type: string
mandatory
Name of the source bucket.
gcs_source_prefix
type: string
mandatory
Path where the files will be found, e.g. "some/sub/dir".
archive_prefix
type: string
optional
Path where the source files will be archived.
If present and populated, the STS data operation will archive the source files in the location specified, in the GCS source bucket.
If not present or empty, there will be no archiving.
gcp_credentials_secret
type: dict
mandatory
Encrypted credentials needed to read/move data from the source bucket.
You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to this page.

Amazon S3 source

Example:
{
  "source": {
    "type": "s3",
    "s3_source_bucket": "my_s3_bucket",
    "s3_source_prefix": "input/my_source/",
    "archive_prefix": "archive",
    "aws_access_key": "3VJ3F6JJQBA2",
    "aws_access_key_secret": {
      "cipher_aes": "e6f5a68d4de8af89e83ea93e42facbed",
      "tag": "20e174e34c5d0c537be77d85ed8dda33",
      "ciphertext": "60a98b884110aab84",
      "enc_session_key": "9c4648e"
    }
  }
}
Parameter
Description
type
type: string
mandatory
Type of source.
In this case : "s3".
s3_source_bucket
type: string
mandatory
Name of the source S3 bucket.
s3_source_prefix
type: string
mandatory
Path where the files will be found, e.g. "some/sub/dir".
archive_prefix
type: string
optional
Path where the source files will be archived.
If present and populated, the STS data operation will archive the source files in the location specified, in the S3 source bucket.
If not present or empty, there will be no archiving.
aws_access_key
type: string
mandatory
Amazon S3 access key ID.
aws_access_key_secret
type: dict
mandatory
Encrypted Amazon S3 secret access key.
This is needed to read/move data from the source bucket. To learn how to encrypt the secret access key, refer to this page.

Azure source

Example:
{
  "source": {
    "type": "azure",
    "azure_source_storage": "my_azure_storage",
    "azure_source_prefix": "input/my_source/",
    "archive_prefix": "archive",
    "azure_connection_string_secret": {
      "cipher_aes": "f1c4xxxx",
      "tag": "0052fxxxx",
      "ciphertext": "7e1a3xxxx",
      "enc_session_key": "2dc2xxxx"
    }
  }
}
Parameter
Description
type
type: string
mandatory
Type of source.
In this case : "azure"
azure_source_storage
type: string
mandatory
Name of the source Azure storage.
azure_source_prefix
type: string
mandatory
Path where the files will be found, e.g. "some/sub/dir".
archive_prefix
type: string
optional
Path where the source files will be archived.
If present and populated, the STS data operation will archive the source files in the location specified, in the source Azure storage.
If not present or empty, there will be no archiving.
azure_connection_string_secret
type: dict
mandatory
Encrypted Azure storage connection string.
This is needed to read/move data from the source storage. To learn how to encrypt the connection string, refer to this page.

SFTP source

Example:
{
  "source": {
    "type": "sftp",
    "sftp_source_directory": "/",
    "sftp_source_filename": "20190621_test_file.txt",
    "archive_prefix": "archive",
    "sftp_host": "sftp.domain.com",
    "sftp_port": 22,
    "sftp_userid": "john_doe",
    "sftp_authentication_method": "USERNAME_PASSWORD",
    "sftp_password_secret": {
      "cipher_aes": "3926f71cd00f8d2b812d10b07d0fee4e",
      "tag": "1f5c066351db5f91041343a2ab37aebe",
      "ciphertext": "921776fd0caa71a04228fe8aaa42af04",
      "enc_session_key": "2fb2f8d271"
    }
  }
}
Parameter
Description
type
type: string
mandatory
Type of source.
In this case : "sftp".
sftp_source_directory
type: string
mandatory
Sub-path to switch to before downloading the file.
sftp_source_filename
type: string
mandatory
File to retrieve.
archive_prefix
type: string
optional
Path where the source files will be archived.
If present and populated, the STS data operation will archive the source files in the location specified, in the GCS source bucket.
If not present or empty, there will be no archiving.
sftp_host
type: string
mandatory
SFTP host, e.g. "sftp.something.com".
sftp_port
type: integer
mandatory
SFTP port, e.g. 22.
sftp_userid
type: string
mandatory
SFTP user ID, e.g. "john_doe".
sftp_authentication_method
type: string
optional
Authentication method used to connect to the SFTP server.
The following methods are supported:
  • USERNAME_PASSWORD
  • PRIVATE_KEY
Default : USERNAME_PASSWORD
sftp_password_secret
type: dict
optional
Encrypted SFTP password for the user ID.
This is needed to read/move data from the source SFTP.
This attribute MUST be set if sftp_authentication_method is set to USERNAME_PASSWORD.
To learn how to encrypt the password, refer to this page.
sftp_private_key_secret
type: dict
optional
Encrypted SFTP private key.
This attribute MUST be set if sftp_authentication_method is set to PRIVATE_KEY.
To learn how to encrypt the private key, refer to this page.
sftp_private_key_passphrase_secret
type: dict
optional
Encrypted passphrase of the SFTP private key, if the key has one.
This attribute MUST be set if sftp_authentication_method is set to PRIVATE_KEY.
To learn how to encrypt the passphrase, refer to this page.
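The mandatory attributes of the four source types, the secrets required by each SFTP authentication method, and the shape of the encrypted secret blocks can be gathered into one check. This is a hypothetical sketch derived only from the tables above: it does not decrypt anything, and it assumes every encrypted block carries the four fields shown in the examples (cipher_aes, tag, ciphertext, enc_session_key).

```python
# Mandatory attributes of each source type, taken from the tables above.
MANDATORY_SOURCE_KEYS = {
    "gcs": {"gcp_project_id", "gcs_source_bucket", "gcs_source_prefix",
            "gcp_credentials_secret"},
    "s3": {"s3_source_bucket", "s3_source_prefix", "aws_access_key",
           "aws_access_key_secret"},
    "azure": {"azure_source_storage", "azure_source_prefix",
              "azure_connection_string_secret"},
    "sftp": {"sftp_source_directory", "sftp_source_filename", "sftp_host",
             "sftp_port", "sftp_userid"},
}
# Secrets that each SFTP authentication method makes mandatory.
SFTP_AUTH_SECRETS = {
    "USERNAME_PASSWORD": {"sftp_password_secret"},
    "PRIVATE_KEY": {"sftp_private_key_secret", "sftp_private_key_passphrase_secret"},
}
# Fields present in every encrypted secret block shown on this page.
SECRET_FIELDS = {"cipher_aes", "tag", "ciphertext", "enc_session_key"}


def check_source(source: dict) -> list:
    """Return the problems found in a source block (hypothetical helper)."""
    source_type = source.get("type")
    if source_type not in MANDATORY_SOURCE_KEYS:
        return [f"unsupported source type: {source_type!r}"]
    required = set(MANDATORY_SOURCE_KEYS[source_type])
    if source_type == "sftp":
        method = source.get("sftp_authentication_method", "USERNAME_PASSWORD")
        if method not in SFTP_AUTH_SECRETS:
            return [f"unsupported sftp_authentication_method: {method!r}"]
        required |= SFTP_AUTH_SECRETS[method]
    problems = [f"missing {key}" for key in sorted(required - set(source))]
    # Every *_secret attribute must hold an encrypted block, never a plain value.
    for key, value in source.items():
        if key.endswith("_secret") and not (
            isinstance(value, dict) and SECRET_FIELDS <= set(value)
        ):
            problems.append(f"{key} is not an encrypted secret block")
    return problems
```

A typical mistake this catches is an SFTP source switched to PRIVATE_KEY without the passphrase secret, or a password pasted in clear text instead of its encrypted block.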

⬆️
Destination parameters

These parameters allow you to specify a list of destinations. You can add as many "destination" sub-objects as you want; all of them will be processed.

Google Cloud Storage destination

Example:
{
  "destinations": [
    {
      "type": "gcs",
      "gcs_destination_bucket": "my_dest_bucket",
      "gcs_destination_prefix": "DEV/output",
      "gcp_credentials_secret": {
        "cipher_aes": "b42724dcbbf6c3310aba89a0f106d1c4",
        "tag": "5c8816ea0a7aded9cb47c6f2df61f5b9",
        "ciphertext": "fdf09c6e",
        "enc_session_key": "8f634f7f7c"
      }
    }
  ]
}
Parameter
Description
type
type: string
mandatory
Type of destination.
In this case : "gcs".
gcs_destination_bucket
type: string
mandatory
Google Cloud Storage destination bucket.
gcs_destination_prefix
type: string
mandatory
Google Cloud Storage destination path, e.g. "/subdir/subdir_2" to send the files to "gs://BUCKET/subdir/subdir_2/source_file.ext".
gcp_credentials_secret
type: dict
mandatory
Encrypted credentials needed to read, write and move data in the destination bucket.
You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to this page.

Amazon S3 destination

Example:
{
  "destinations": [
    {
      "type": "s3",
      "s3_bucket": "my_dest_bucket",
      "s3_destination_prefix": "PROD/output",
      "aws_access_key": "J3F6JLUVJQ",
      "aws_access_key_secret": {
        "cipher_aes": "e6f5a68dxxxx",
        "tag": "20e1xxxx",
        "ciphertext": "60a84xxxx",
        "enc_session_key": "9c4619048exxxx"
      }
    }
  ]
}
Parameter
Description
type
type: string
mandatory
Type of destination.
In this case : "s3".
s3_bucket
type: string
mandatory
Amazon S3 bucket name.
s3_destination_prefix
type: string
mandatory
Amazon S3 destination path, e.g. "subdir_A/subdir_B" to send the files to "s3://bucket/subdir_A/subdir_B/source_file.ext".
aws_access_key
type: string
mandatory
Amazon S3 access key ID.
aws_access_key_secret
type: dict
mandatory
Encrypted Amazon S3 secret access key.
This is needed to read, write and move data in the destination bucket.
To learn how to encrypt the secret access key, refer to this page.
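The GCS and S3 destination prefixes follow the same rule: a copied file lands under the prefix, keeping its source file name. The Python sketch below reproduces the two documented examples; the helper name is hypothetical, and treating a leading or trailing "/" in the prefix as insignificant is an assumption based on the GCS example "/subdir/subdir_2".

```python
def destination_uri(scheme: str, bucket: str, prefix: str, source_file: str) -> str:
    """Return the URI a copied file is expected to land at (hypothetical helper)."""
    # Empty path segments are dropped, so leading, trailing or doubled "/"
    # characters in the prefix do not change the result.
    segments = [s for s in prefix.split("/") if s]
    return f"{scheme}://{bucket}/" + "/".join(segments + [source_file])


print(destination_uri("gs", "BUCKET", "/subdir/subdir_2", "source_file.ext"))
# gs://BUCKET/subdir/subdir_2/source_file.ext
print(destination_uri("s3", "bucket", "subdir_A/subdir_B", "source_file.ext"))
# s3://bucket/subdir_A/subdir_B/source_file.ext
```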

Azure destination

Example:
{
  "destinations": [
    {
      "type": "azure",
      "azure_destination_prefix": "my_azure_bucket/output",
      "azure_connection_string_secret": {
        "cipher_aes": "3926fxxxx",
        "tag": "1f5cxxxx",
        "ciphertext": "9217xxxx",
        "enc_session_key": "2fb0xxxx"
      }
    }
  ]
}
Parameter
Description
type
type: string
mandatory
Type of destination.
In this case : "azure".
azure_destination_prefix
type: string
mandatory
Complete Azure destination path, i.e. storage name and subdirectory if needed.
azure_connection_string_secret
type: dict
mandatory
Encrypted Azure storage connection string.
This is needed to write data to the destination storage. To learn how to encrypt the connection string, refer to this page.
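Since azure_destination_prefix packs the storage name and the optional subdirectory into a single path, a short sketch shows how such a value can be read. It assumes, based only on the description and the "my_azure_bucket/output" example, that the first path segment is the storage name and everything after it is the subdirectory; the helper name is hypothetical.

```python
def split_azure_prefix(azure_destination_prefix: str) -> tuple:
    """Split "storage/sub/dir" into (storage name, subdirectory) (hypothetical helper)."""
    # First segment: storage name. Remainder (possibly empty): subdirectory.
    storage, _, subdirectory = azure_destination_prefix.strip("/").partition("/")
    return storage, subdirectory


print(split_azure_prefix("my_azure_bucket/output"))  # ('my_azure_bucket', 'output')
```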

SFTP destination

Example:
{
  "destinations": [
    {
      "type": "sftp",
      "generate_top_file": "REPLACE_EXTENSION",
      "sftp_destination_dir": "/",
      "sftp_destination_dir_create": false,
      "sftp_host": "sftp.domain.com",
      "sftp_port": 22,
      "sftp_userid": "john_doe",
      "sftp_password_secret": {
        "cipher_aes": "3926f71cd00f8d2b812d10b07d0fee4e",
        "tag": "1f5c066351db5f91041343a2ab37aebe",
        "ciphertext": "921776fd0caa71a04228fe8aaa42af04",
        "enc_session_key": "2fb0adf8d271"
      }
    }
  ]
}
Parameter
Description
type
type: string
mandatory
Type of destination.
In this case : "sftp".
generate_top_file
type: string
optional
If set, a TOP file is generated alongside each copied file.
Possible values are:
  • "REPLACE_EXTENSION": if the source file is "20190708_data.txt", the TOP file will be "20190708_data.top".
  • "ADD_EXTENSION": if the source file is "20190708_data.txt", the TOP file will be "20190708_data.txt.top".
sftp_destination_dir
type: string
mandatory
Path to switch to before uploading the file.
sftp_destination_dir_create
type: boolean
mandatory
If true, the data operation tries to create the subdirectory specified in sftp_destination_dir on the SFTP filesystem before switching to it and copying the files.
sftp_host
type: string
mandatory
SFTP host, e.g. "sftp.something.com".
sftp_port
type: integer
mandatory
SFTP port, e.g. 22.
sftp_userid
type: string
mandatory
SFTP user ID, e.g. "john_doe".
sftp_authentication_method
type: string
optional
Authentication method used to connect to the SFTP server.
The following methods are supported:
  • USERNAME_PASSWORD
  • PRIVATE_KEY
Default : USERNAME_PASSWORD
sftp_password_secret
type: dict
optional
Encrypted SFTP password for the user ID.
This is needed to write data to the destination SFTP server.
This attribute MUST be set if sftp_authentication_method is set to USERNAME_PASSWORD.
To learn how to encrypt the password, refer to this page.
sftp_private_key_secret
type: dict
optional
Encrypted SFTP private key.
This attribute MUST be set if sftp_authentication_method is set to PRIVATE_KEY.
To learn how to encrypt the private key, refer to this page.
sftp_private_key_passphrase_secret
type: dict
optional
Encrypted passphrase of the SFTP private key, if the key has one.
This attribute MUST be set if sftp_authentication_method is set to PRIVATE_KEY.
To learn how to encrypt the passphrase, refer to this page.
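The two generate_top_file modes differ only in how the TOP file name is derived from the copied file. The Python sketch below reproduces the documented "20190708_data.txt" examples; the helper name is hypothetical, and the behaviour for names with several dots (only the last extension is replaced) is an assumption, since the documentation does not cover that case.

```python
from pathlib import PurePosixPath
from typing import Optional


def top_file_name(source_file: str, generate_top_file: Optional[str]) -> Optional[str]:
    """Name of the TOP file sent with `source_file`, or None (hypothetical helper)."""
    if generate_top_file is None:
        return None  # parameter not set: no TOP file is generated
    if generate_top_file == "REPLACE_EXTENSION":
        # with_suffix() replaces only the last extension: "a.tar.gz" -> "a.tar.top"
        return str(PurePosixPath(source_file).with_suffix(".top"))
    if generate_top_file == "ADD_EXTENSION":
        return source_file + ".top"
    raise ValueError(f"unsupported generate_top_file value: {generate_top_file!r}")


print(top_file_name("20190708_data.txt", "REPLACE_EXTENSION"))  # 20190708_data.top
print(top_file_name("20190708_data.txt", "ADD_EXTENSION"))  # 20190708_data.txt.top
```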