VM Launcher: configuration file

After the data extraction through a table-to-storage data operation, the GBQ to Firestore data pipeline continues with a vm-launcher data operation. This operation launches a VM on Google Compute Engine, executes the Python script that loads the JSON data files into Firestore, and then stops the VM. You can find the global parameters of this configuration on the VM Launcher configuration page.

As a reminder, we've just seen in the previous pages how to create a table-to-storage data operation that generates a set of freshness_next_execution_data-*.json data files in the gbq-to-firestore/000001/next_execution/ directory of the tailer-freshness GCS bucket.

The VM Launcher data operation executes the commands listed in the "script_to_execute" parameter, using the "gcs_working_directory" of the "gcs_bucket" as its working directory. The first command installs the Python packages the script depends on. The second command executes file-to-firestore.py. This Python script is described on the next page. It takes a few arguments, as described below.

The file-to-firestore.py Python script must be uploaded to the Google Cloud Storage working directory before the VM Launcher data operation is executed.

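In the "script_to_execute" parameter of the example below, file-to-firestore.py receives two arguments: a directory path (./000001), relative to the working directory, and a JSON string. In that JSON string, "sub_dir" and "file_template" correspond to the next_execution directory and to the freshness_next_execution_data-*.json files generated by the previous data operation. Example: "python3 file-to-firestore.py ./000001 '{\"next_execution\":{\"sub_dir\":\"next_execution\",\"file_template\":\"freshness_next_execution_data-*.json\"}}'"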
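To make the structure of that command easier to see, here is a minimal sketch that splits it the way a POSIX shell would and decodes the JSON argument. The helper name split_command is hypothetical and used only for illustration. This is not the code of file-to-firestore.py, which is described on the next page.

```python
import json
import shlex

# The command as it reads once the configuration file has been JSON-decoded
# (the \" sequences of the configuration file become plain double quotes).
command = (
    "python3 file-to-firestore.py ./000001 "
    "'{\"next_execution\":{\"sub_dir\":\"next_execution\","
    "\"file_template\":\"freshness_next_execution_data-*.json\"}}'"
)


def split_command(command):
    """Hypothetical helper: split the command into its four shell tokens
    and decode the last one, which is a JSON string."""
    interpreter, script, base_dir, json_arg = shlex.split(command)
    return interpreter, script, base_dir, json.loads(json_arg)


interpreter, script, base_dir, mapping = split_command(command)

# Show, for each top-level key, which directory and file pattern it points to.
for name, spec in mapping.items():
    print(name, base_dir, spec["sub_dir"], spec["file_template"])
```

The single quotes around the JSON argument keep the shell from interpreting the double quotes and the * wildcard, so the script receives the JSON text unchanged.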

Here is an example of a VM Launcher configuration file that loads data from the tailer-freshness Google Cloud Storage bucket. The Firestore destination is specified inside the Python script (see the next page).

{
    "configuration_type": "vm-launcher",
    "configuration_id": "000001_json_to_firestore_freshness_next_execution",
    "environment": "DEV",
    "account": "000099",
    "activated": true,
    "archived": false,
    "direct_execution": true,
    "gcp_project_id": "my-project",
    "gcs_bucket": "my-bucket",
    "gcs_working_directory": "gbq-to-firestore",
    "credentials": {
        "gcp-credentials.json": {
            "content": {
                "cipher_aes": "xxx",
                "tag": "xxx",
                "ciphertext": "xxx",
                "enc_session_key": "xxx"
            }
        }
    },
    "script_to_execute": [
        "pip3 install google-cloud-firestore simplejson pytz",
        "python3 file-to-firestore.py ./000001 '{\"next_execution\":{\"sub_dir\":\"next_execution\",\"file_template\":\"freshness_next_execution_data-*.json\"}}'"
    ],
    "vm_delete": true,
    "vm_core_number": "2",
    "vm_memory_amount": "4",
    "vm_disk_size": "20",
    "vm_compute_zone": "europe-west1-b",
    "vm_custom_os_image_family": "ubuntu-2004-lts",
    "vm_custom_os_image_project": "ubuntu-os-cloud"
}
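Escaping the JSON argument by hand inside "script_to_execute" is easy to get wrong. As a sketch only, the second command can be generated from a Python dictionary so that json.dumps and shlex.quote handle the quoting. The function name build_script_line and the variable names are hypothetical. The package list and argument values are copied from the configuration above.

```python
import json
import shlex


def build_script_line(base_dir, file_spec):
    """Hypothetical helper: build the file-to-firestore.py command line.

    json.dumps produces the compact JSON argument and shlex.quote wraps it
    in single quotes so the shell passes it to the script unchanged.
    """
    json_arg = json.dumps(file_spec, separators=(",", ":"))
    return f"python3 file-to-firestore.py {shlex.quote(base_dir)} {shlex.quote(json_arg)}"


file_spec = {
    "next_execution": {
        "sub_dir": "next_execution",
        "file_template": "freshness_next_execution_data-*.json",
    }
}

line = build_script_line("./000001", file_spec)

# Serializing the list again adds the \" escapes required in the JSON file.
fragment = json.dumps(
    {
        "script_to_execute": [
            "pip3 install google-cloud-firestore simplejson pytz",
            line,
        ]
    },
    indent=4,
)
print(fragment)
```

The printed fragment can then be pasted into the configuration file in place of a hand-written "script_to_execute" entry.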

This configuration uses the gbq-to-firestore directory of the tailer-freshness GCS bucket as a working directory. The Python file and the data directory are both located there:

Inside the data directory, you will find the data files, named as specified by the file template:
