VM Launcher: configuration file

After the data extraction through a table-to-storage data operation, the GBQ to Firestore data pipeline continues with a vm-launcher data operation. It launches a VM on Google Compute Engine, executes the Python script that loads the JSON data files into Firestore, and then stops the VM. You can find the global parameters of this configuration on the VM Launcher configuration file page.

As a reminder, we've just seen in the previous pages how to create a table-to-storage data operation that generates a set of freshness_next_execution_data-*.json data files in the gbq-to-firestore/000001/next_execution/ directory of the tailer-freshness GCS bucket.

The VM Launcher data operation executes the script specified in the "script_to_execute" parameter, using the "gcs_working_directory" folder of the "gcs_bucket" bucket as its working directory. The first script line installs a few prerequisites. The second line executes file-to-firestore.py, the Python script described on the next page. A few arguments must be specified, as described below.

The file-to-firestore.py Python script must be uploaded to the Google Cloud Storage working directory before executing the VM Launcher data operation.
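
You can upload the script with gsutil or with the google-cloud-storage Python client. Here is a minimal sketch, assuming the placeholder project, bucket, and working directory used in the example configuration below:

from google.cloud import storage

# Upload file-to-firestore.py into the "gcs_working_directory" of the
# "gcs_bucket" defined in the vm-launcher configuration (placeholder names).
client = storage.Client(project="my-project")
bucket = client.bucket("my-bucket")
blob = bucket.blob("gbq-to-firestore/file-to-firestore.py")
blob.upload_from_filename("file-to-firestore.py")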

file-to-firestore python call

In the "script_to_execute" parameter in the example below, you see a few arguments that are explained here. Example: "python3 file-to-firestore.py ./000001 '{\"next_execution\":{\"sub_dir\":\"next_execution\",\"file_template\":\"freshness_next_execution_data-*.json\"}}'"

Parameters and descriptions:

• file-to-firestore.py

Name of the Python file to execute. It must be uploaded to the working directory of the GCS bucket defined in the vm-launcher configuration file before the first execution. You can modify the Python script (provided on the next page) and rename it if you like.

• ./000001

First argument of the Python script: the relative path, inside the working directory, of the folder that contains the data files. You can specify sub-directories in the next argument.

• '{"next_execution":{"sub_dir":"next_execution","file_template":"freshness_next_execution_data-*.json"}}'

Second argument of the Python script: a stringified JSON object that describes the files to process. Please note that in the configuration file you must escape the double quote characters with a backslash. In this example, there is only one use case, named "next_execution", but you could have several of them. For each use case, you need to specify a "sub_dir" and a "file_template" (see the parsing sketch after this list):

• sub_dir is the relative path of the sub-directory where the data files are stored. Here it is "next_execution".

• file_template is the template of the names of your data files. It can contain a wildcard "*" that stands for any sequence of characters. In our example, this wildcard handles the fact that we could have several data files named, for example, freshness_next_execution_data-00000000.json and freshness_next_execution_data-00000001.json.
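
As an illustration, here is a minimal, hypothetical sketch of how file-to-firestore.py might read these two arguments and select the matching files; the actual script is provided on the next page, and the variable names here are assumptions:

import fnmatch
import json
import os
import sys

# First argument: relative path of the folder containing the data files.
data_dir = sys.argv[1]  # e.g. "./000001"

# Second argument: stringified JSON object mapping each use case
# to a sub-directory and a file name template.
use_cases = json.loads(sys.argv[2])

for use_case, params in use_cases.items():
    folder = os.path.join(data_dir, params["sub_dir"])
    # Keep only the files whose names match the template (wildcard "*").
    matching = [f for f in os.listdir(folder)
                if fnmatch.fnmatch(f, params["file_template"])]
    print(use_case, matching)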

See the screenshots of the bucket and files below.

Example

Here is an example of a VM Launcher configuration file that loads data from the tailer-freshness Google Cloud Storage bucket. The Firestore destination is specified inside the Python script (see next page).

{
    "configuration_type": "vm-launcher",
    "configuration_id": "000001_json_to_firestore_freshness_next_execution",
    "environment": "DEV",
    "account": "000099",
    "activated": true,
    "archived": false,
    "direct_execution": true,
    "gcp_project_id": "my-project",
    "gcs_bucket": "my-bucket",
    "gcs_working_directory": "gbq-to-firestore",
    "credentials": {
        "gcp-credentials.json": {
            "content": {
                "cipher_aes": "xxx", 
                "tag": "xxx", 
                "ciphertext": "xxx", 
                "enc_session_key": "xxx"
            }
        }
    },
    "script_to_execute": [
        "pip3 install google-cloud-firestore simplejson pytz",
        "python3 file-to-firestore.py ./000001 '{\"next_execution\":{\"sub_dir\":\"next_execution\",\"file_template\":\"freshness_next_execution_data-*.json\"}}'"
    ],
    "vm_delete": true,
    "vm_core_number": "2",
    "vm_memory_amount": "4",
    "vm_disk_size": "20",
    "vm_compute_zone": "europe-west1-b",
    "vm_custom_os_image_family": "ubuntu-2004-lts",
    "vm_custom_os_image_project": "ubuntu-os-cloud"
}

This configuration uses the gbq-to-firestore directory of the tailer-freshness GCS bucket as a working directory. The Python file and the data directory are located there.

Inside the data directory, you find the data files as specified below.
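
Given the configuration above and the file template from the previous section, the working directory would contain paths like these (the numeric suffixes are illustrative):

gs://tailer-freshness/gbq-to-firestore/file-to-firestore.py
gs://tailer-freshness/gbq-to-firestore/000001/next_execution/freshness_next_execution_data-00000000.json
gs://tailer-freshness/gbq-to-firestore/000001/next_execution/freshness_next_execution_data-00000001.json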

[Screenshot: VM Launcher configuration]
[Screenshot: location where file-to-firestore.py is deposited for the example]
[Screenshot: GCS path of the files read by file-to-firestore.py]