VM Launcher configuration file for code processing

This page describes the JSON configuration file for a VM Launcher code processing data operation.

The configuration file contains the following sections:

  • Global parameters: General information about the data operation.

  • Script parameters: Information about the script location and instructions to execute on the VM.

  • VM parameters: Information about the VM on which the script is executed.

👁️‍🗨️ Example

Here is an example of a VM Launcher configuration file for code processing:

{
    "$schema": "http://jsonschema.tailer.ai/schema/vm-launcher-veditor",
    "configuration_type": "vm-launcher",
    "configuration_id": "000010_my-vm-job",
    "version": "2",
    "environment": "DEV",
    "account": "000099",
    "doc_md": "readme.md",
    "start_date": "2022, 11, 16",
    "schedule_interval": "0 7 * * *",
    "activated": false,
    "archived": false,
    "gcp_project_id": "my-project",
    "gcs_bucket": "my-bucket",
    "gcs_working_directory": "/",
    "credentials": {
        "gcp-credentials.json": {
            "content": {
                "cipher_aes": "xxx",
                "tag": "xxx",
                "ciphertext": "xxx",
                "enc_session_key": "xxx"
            }
        }
    },
    "script_to_execute": [
        "mkdir -p input_DEV",
        "cd ./input_DEV && python3 my-python-script.py"
    ],
    "vm_delete": true,
    "vm_core_number": "2",
    "vm_memory_amount": "4",
    "vm_disk_size": "20",
    "vm_compute_zone": "europe-west1-b",
    "vm_custom_os_image_family": "ubuntu-2004-lts",
    "vm_custom_os_image_project": "ubuntu-os-cloud"
}

🌐 Global parameters

Parameter | Description

$schema

type: string

optional

The URL of the JSON schema that defines the properties your configuration must satisfy. Most code editors can use it to validate your configuration, display help boxes, and highlight issues.

configuration_type

type: string

mandatory

Type of data operation.

For a VM Launcher data operation, the value is always "vm-launcher".

configuration_id

type: string

mandatory

ID of the data operation.

You can pick any name you want, but it has to be unique for this data operation type.

Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to name your data operation by concatenating:

  • your account ID,

  • the source bucket name,

  • and the source directory name.
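As a minimal sketch of this naming practice, the helper below concatenates the three parts. The function name, the "_" separator, and the replacement of slashes with hyphens are assumptions made for illustration; the platform does not prescribe them here.

```python
def build_configuration_id(account: str, bucket: str, directory: str) -> str:
    """Concatenate account ID, bucket name and directory name into one ID.

    Assumptions (not prescribed by the platform): parts are joined with "_"
    and slashes in the directory are replaced with "-".
    """
    cleaned_directory = directory.strip("/").replace("/", "-")
    # Skip empty parts, e.g. when the directory is the bucket root "/".
    return "_".join(part for part in (account, bucket, cleaned_directory) if part)


print(build_configuration_id("000099", "my-bucket", "some/sub/dir"))
# 000099_my-bucket_some-sub-dir
```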

version

type: string

optional

Version of the configuration. Must be "2" in order to use the latest features.

Default: "1". Only version "2" supports start_date and schedule_interval; version "1" is deprecated.

environment

type: string

mandatory

Deployment context.

Values: PROD, PREPROD, STAGING, DEV.

account

type: string

mandatory

Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.

doc_md

type: string

optional

Path to a file containing a detailed description of the data operation. The file must be in Markdown format.

start_date

type: string

optional

only available for version "2" and later

mandatory if you want to specify a schedule_interval

Start date of the data operation.

The format must be: "YYYY, MM, DD"

Where:

  • YYYY >= 1970

  • MM = [1, 12]

  • DD = [1, 31]
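The format rules above can be checked locally before deploying. The following is a minimal sketch, not the platform's own validation: the function name parse_start_date is hypothetical, and tolerating extra whitespace around the commas is an assumption.

```python
from datetime import date


def parse_start_date(value: str) -> date:
    """Parse a "YYYY, MM, DD" string, raising ValueError if it is invalid."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 'YYYY, MM, DD', got {value!r}")
    year, month, day = (int(part) for part in parts)  # int() rejects non-numbers
    if year < 1970:
        raise ValueError("year must be >= 1970")
    # date() rejects months outside 1-12 and days that do not exist in the month.
    return date(year, month, day)


print(parse_start_date("2022, 11, 16"))  # 2022-11-16
```

Note that this sketch is slightly stricter than the ranges listed above, since it also rejects dates such as "2022, 02, 31".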

schedule_interval

type: string

optional

only available for version "2" and later

A VM Launcher can be launched in two different ways:

  • If schedule_interval is set to "None", the data operation will need to be started with a Workflow, when a given condition is met.

  • If you want the data operation to start at regular intervals, you can define this in the schedule_interval parameter with a Cron expression.

Example:

For the VM Launcher to start every day at 7:00, you need to set it as follows:

"schedule_interval": "0 7 * * *",

You can find online tools to help you edit your Cron expression (for example, crontab.guru).
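The dependencies between version, start_date and schedule_interval can be checked with a short script. This is a sketch under assumptions: the function name scheduling_errors is hypothetical, and treating "None" as "no schedule, so start_date is not required" is an interpretation of the rules above rather than confirmed platform behavior.

```python
from typing import List


def scheduling_errors(config: dict) -> List[str]:
    """Return a list of scheduling inconsistencies found in a configuration."""
    errors = []
    interval = config.get("schedule_interval")
    # Assumption: "None" means the operation is started by a Workflow instead.
    has_schedule = interval is not None and interval != "None"
    uses_v2_fields = "start_date" in config or "schedule_interval" in config
    # "version" defaults to "1" when it is not specified.
    if uses_v2_fields and config.get("version", "1") != "2":
        errors.append('start_date and schedule_interval require "version": "2"')
    if has_schedule and "start_date" not in config:
        errors.append("schedule_interval is set but start_date is missing")
    return errors


config = {"version": "2", "start_date": "2022, 11, 16", "schedule_interval": "0 7 * * *"}
print(scheduling_errors(config))  # []
```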

activated

type: boolean

optional

Flag used to enable/disable the execution of the data operation.

If not specified, the default value will be "true".

archived

type: boolean

optional

Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio.

If not specified, the default value will be "false".
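To illustrate the constraints on the mandatory global parameters, here is a minimal pre-deployment check. It is a sketch only: the function and constant names are hypothetical, and the platform may apply additional or different validation.

```python
import re

ALLOWED_ENVIRONMENTS = {"PROD", "PREPROD", "STAGING", "DEV"}
MANDATORY_GLOBAL_KEYS = ("configuration_type", "configuration_id", "environment", "account")


def global_parameter_errors(config: dict) -> list:
    """Return a list of problems found in the global parameters."""
    errors = [f"missing mandatory parameter: {key}"
              for key in MANDATORY_GLOBAL_KEYS if key not in config]
    if "configuration_type" in config and config["configuration_type"] != "vm-launcher":
        errors.append('configuration_type must be "vm-launcher"')
    if "environment" in config and config["environment"] not in ALLOWED_ENVIRONMENTS:
        errors.append("environment must be one of PROD, PREPROD, STAGING, DEV")
    account = config.get("account")
    # The account ID is a 6-digit number stored as a string.
    if "account" in config and not (isinstance(account, str)
                                    and re.fullmatch(r"[0-9]{6}", account)):
        errors.append("account must be a 6-digit string")
    return errors


config = {
    "configuration_type": "vm-launcher",
    "configuration_id": "000010_my-vm-job",
    "environment": "DEV",
    "account": "000099",
}
print(global_parameter_errors(config))  # []
```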

✍️ Script parameters

Information about the script location and instructions to execute it.

Parameter | Description

gcp_project_id

type: string

mandatory

Google Cloud Platform project ID for the bucket containing the script.

gcs_bucket

type: string

mandatory

Name of the GCS bucket containing the script.

gcs_working_directory

type: string

mandatory

Path in the GCS bucket containing the script, e.g. "some/sub/dir".

gcp_credentials_secret

type: dict

mandatory

Encrypted credentials needed to read/move data from the source bucket.

You should have generated credentials when setting up GCP. To learn how to encrypt them, refer to this page.

script_to_execute

type: array

mandatory

List of Unix commands to be executed (similar to a Bash script) on the VM.

💻 VM parameters

Information related to the Google Cloud Compute Engine VM where the script will be executed.

Parameter | Description

vm_delete

type: boolean

optional

If set to "true", this parameter forces the deletion of the VM at the end of the data operation. A Compute Engine VM that keeps running incurs extra costs, so it is recommended to leave this parameter set to "true".

Default value: true

vm_core_number

type: string

optional

Virtual CPU (vCPU) count. It is recommended to keep the default value, as this should allow sufficient performance to run a standard script.

Default value: 2

vm_memory_amount

type: string

optional

System memory size (in GB).

It is recommended to keep the default value, as this should allow sufficient performance to run a standard script.

Default value: 4

vm_disk_size

type: string

optional

Persistent disk size (in GB).

It is recommended to keep the default value, as this should provide enough space to store the data to process.

Default value: 20

vm_compute_zone

type: string

optional

Compute Engine zone in which the VM runs its jobs.

Default value: europe-west1-b

vm_custom_os_image_family

type: string

optional

Image family of the custom image.

Note that for the time being, custom OS images MUST be based on Ubuntu 20.04 LTS.

Default value: ubuntu-2004-lts

vm_custom_os_image_project

type: string

optional

GCP Project hosting the custom image.

Note that this parameter is mandatory if vm_custom_os_image_family is set.

Default value: ubuntu-os-cloud
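The defaults documented above, together with the dependency between the two custom image parameters, can be summarized in a small sketch. The names VM_DEFAULTS and resolve_vm_parameters are hypothetical, and the numeric defaults are written as strings on the assumption that they follow the documented "string" type; how the platform itself merges defaults is not shown in this page.

```python
VM_DEFAULTS = {
    "vm_delete": True,
    "vm_core_number": "2",
    "vm_memory_amount": "4",
    "vm_disk_size": "20",
    "vm_compute_zone": "europe-west1-b",
    "vm_custom_os_image_family": "ubuntu-2004-lts",
    "vm_custom_os_image_project": "ubuntu-os-cloud",
}


def resolve_vm_parameters(config: dict) -> dict:
    """Return the VM parameters with documented defaults filled in."""
    # A custom image family must come with the project that hosts it.
    if "vm_custom_os_image_family" in config and "vm_custom_os_image_project" not in config:
        raise ValueError(
            "vm_custom_os_image_project is mandatory when vm_custom_os_image_family is set"
        )
    resolved = dict(VM_DEFAULTS)
    # Only VM parameters are copied; other configuration keys are ignored here.
    resolved.update({key: value for key, value in config.items() if key in VM_DEFAULTS})
    return resolved


print(resolve_vm_parameters({"vm_core_number": "4"})["vm_core_number"])  # 4
```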
