Copy files from one bucket to another

The first data operation of this tutorial consists in transferring files from a bucket located in one GCP project to a bucket located in a different GCP project.

Create a JSON file

  1. Access your tailer folder (created during installation).

  2. Create a working folder named tailer-demo for this tutorial, and inside it create a folder named 1-Copy_files for this step.

  3. In your 1-Copy_files folder, create a JSON file named 000099-tailer-demo-sts.json for your data operation.

  4. Copy the following contents into your file:

    {
      "$schema": "http://jsonschema.tailer.ai/schema/storage-to-storage-veditor",
      "configuration_type": "storage-to-storage",
      "configuration_id": "000099-tailer-demo-copy-files-YOUR-NAME",
      "environment": "DEV",
      "account": "000099",
      "version": "3",
      "activated": true,
      "archived": false,
      "filename_templates": [
        {
          "filename_template": "stores-{{FD_DATE}}-{{FD_TIME}}.csv",
          "file_description": "Stores repository. The store listing is the file could evolve over time"
        },
        {
          "filename_template": "products-{{FD_DATE}}-{{FD_TIME}}.csv",
          "file_description": "Products repository. The product listing in the file could evolve over time"
        },
        {
          "filename_template": "sales_{{FD_BLOB_8}}-{{FD_DATE}}.csv",
          "file_description": "Daily Sales. There are many days in each files. And some days are repeated in different files"
        },
        {
          "filename_template": "sales_{{FD_DATE}}.csv",
          "file_description": "Daily Sales. There are many days in each files. And some days are repeated in different files"
        }
      ],
      
      "source": {
        "type": "gcs",
        "gcp_project_id": "my_gcp_project",
        "gcs_source_bucket" : "my-source-bucket",
        "gcs_source_prefix" : "input-folder-YOUR-NAME",
        "archive_prefix": "archive-folder-YOUR-NAME",
        "gcp_credentials_secret": {
          "cipher_aes": "b42xxx",
          "tag": "5c8xxx",
          "ciphertext": "fd0xxx",
          "enc_session_key": "8f6xxx"
        }
      },
      
      "destinations": [
        {
          "type": "gcs",
          "gcs_destination_bucket": "my-destination-bucket",
          "gcs_destination_prefix": "tailer-demo-input-folder-YOUR-NAME",
          "gcp_credentials_secret": {
            "cipher_aes": "b42xxx",
            "tag": "5c8xxx",
            "ciphertext": "fd0xxx",
            "enc_session_key": "8f6xxx"
          }
        }
      ]
    }
  5. Take note of the different parameters. For detailed information on storage-to-storage configuration file parameters, refer to the Storage to Storage configuration file page.

  6. Edit the following values:
     ◾ In the source section, replace my-gcp-project with the ID of the GCP project containing your source bucket.
     ◾ In the source section, replace my-source-bucket with the name of the GCS bucket containing the source files.
     ◾ In the source section, replace the value of the gcp_credentials_secret parameter with the encrypted service account credentials for the source GCP project. If you haven't generated them yet, refer to Encrypt your credentials.
     ◾ In the destinations section, replace my-destination-bucket with the name of the GCS bucket that will contain the output files.
     ◾ In the destinations section, replace the value of the gcp_credentials_secret parameter with the encrypted service account credentials for the destination GCP project. If you haven't generated them yet, refer to Encrypt your credentials.
     ◾ If you share the demo project with other developers, replace YOUR-NAME in the configuration_id with a personal value, such as your name, so you won't overwrite a configuration deployed by someone else. Also add your name to the source's gcs_source_prefix and archive_prefix, and to the destinations' gcs_destination_prefix, to avoid any interference with another developer's data operation.

Your JSON file is now ready to use.
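
As a quick illustration of the filename_templates above, here is how incoming file names would match them. The names below are made up, and we assume {{FD_DATE}} captures an 8-digit date, {{FD_TIME}} a 6-digit time, and {{FD_BLOB_8}} any 8 characters; the exact patterns are described in the Storage to Storage configuration file page:

    stores-20240102-093000.csv      # stores-{{FD_DATE}}-{{FD_TIME}}.csv
    products-20240102-093000.csv    # products-{{FD_DATE}}-{{FD_TIME}}.csv
    sales_a1b2c3d4-20240102.csv     # sales_{{FD_BLOB_8}}-{{FD_DATE}}.csv
    sales_20240102.csv              # sales_{{FD_DATE}}.csv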

Deploy a first data operation

Once your JSON file is ready, you can deploy the data operation:

  1. Access your working folder by running the following command:

    cd "[path to your tailer folder]\tailer-demo\1-Copy_files"
  2. To deploy the data operation, run the following command:

    tailer deploy configuration 000099-tailer-demo-sts.json

You may be asked to select a context (see Set constants with Context for more information). If you haven't deployed any context, choose "no context". You can also use the --context flag to specify the context of your choice, or NO_CONTEXT if that's what you want:

    tailer deploy configuration 000099-tailer-demo-sts.json --context NO_CONTEXT

Check the data operation in Tailer Studio

Access Tailer Studio.

  1. Sign in with your Tailer Platform credentials.

  2. In the left navigation menu, select Storage-to-storage.

  3. In the Configurations tab, search for your data operation. You can see its status is Activated.

  4. Click the data operation ID to display its parameters and full JSON file, or to leave comments about it.

Check the result in GCS

Now that our configuration is deployed, we can test it. Let's mimic production behavior. Access the folders you created when preparing the demonstration environment:

  • In your source bucket, copy a file (see the gsutil example after this list). The file name must match one of the filename_template values specified in the configuration.

  • In Tailer Studio, in the Storage-to-Storage section's Runs tab, you should see a run for your data operation. It should appear as "running", then quickly get the status "success".

  • In your source bucket, input-folder-YOUR-NAME should now be empty.

  • In your source bucket, archive-folder-YOUR-NAME should contain a folder for each input file, named after the date in the file name.

  • In your destination bucket, tailer-demo-input-folder-YOUR-NAME should contain a copy of the input files.
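
For example, assuming you have the gsutil CLI installed, you could drop a test file into the source folder like this (the local file name is illustrative and matches the sales_{{FD_DATE}}.csv template; bucket and prefix names come from the configuration above):

    # Copy a test file into the monitored source folder to trigger a run.
    gsutil cp sales_20240102.csv gs://my-source-bucket/input-folder-YOUR-NAME/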

🚀 Further steps

You can see in the Storage to Storage documentation that you can handle different use cases with this data operation:

  • You can add multiple destinations to share data with different consumers (see the sketch after this list).

  • You can work with Google Cloud Storage, but also Amazon S3, Azure and SFTP.

  • You can send data to external partners. You just need a service account (or a user and a password) that can access the destination. You then generate the credentials associated with it (see Encrypt your credentials) and specify them in the destinations object.
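
For instance, here is a minimal sketch of a destinations array with two GCS destinations. The second bucket name and prefix are made-up examples; each destination carries its own gcp_credentials_secret, with placeholder values as in the configuration above:

    "destinations": [
      {
        "type": "gcs",
        "gcs_destination_bucket": "my-destination-bucket",
        "gcs_destination_prefix": "tailer-demo-input-folder-YOUR-NAME",
        "gcp_credentials_secret": { "cipher_aes": "b42xxx", "tag": "5c8xxx", "ciphertext": "fd0xxx", "enc_session_key": "8f6xxx" }
      },
      {
        "type": "gcs",
        "gcs_destination_bucket": "partner-destination-bucket",
        "gcs_destination_prefix": "incoming-from-tailer",
        "gcp_credentials_secret": { "cipher_aes": "a11xxx", "tag": "2b3xxx", "ciphertext": "c45xxx", "enc_session_key": "6d7xxx" }
      }
    ]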
